Thread starter: tulipsliu

[Frontier Topics] [QuantEcon] Mixing MATLAB with FORTRAN

#41 | tulipsliu | posted 2020-12-13 22:35:11
$$
\mathit{sz} = \frac{\log\left(\bar{\omega}_{t}\right)+\frac{\sigma_{t-1}^{2}}{2}}{\sigma_{t-1}}
$$

#42 | tulipsliu | posted 2020-12-13 22:35:32
$$
\mathit{szplus} = \frac{\log\left(\bar{\omega}_{t+1}\right)+\frac{\sigma_{t}^{2}}{2}}{\sigma_{t}}
$$
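
These two posts apply the same transformation at consecutive dates: $\mathit{sz}$ pairs $\bar{\omega}_{t}$ with $\sigma_{t-1}$, while $\mathit{szplus}$ pairs $\bar{\omega}_{t+1}$ with $\sigma_{t}$. A minimal numpy sketch of the computation follows; the function name and numerical values are purely illustrative, since the surrounding model is not shown in this thread.

```python
import numpy as np

def standardized_cutoff(omegabar, sigma):
    """z = (log(omegabar) + sigma**2 / 2) / sigma, applied elementwise."""
    omegabar = np.asarray(omegabar, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return (np.log(omegabar) + sigma ** 2 / 2.0) / sigma

# illustrative values only: sz pairs omegabar_t with sigma_{t-1},
# szplus pairs omegabar_{t+1} with sigma_t
sz = standardized_cutoff(0.50, 0.25)
szplus = standardized_cutoff(0.52, 0.24)
print(sz, szplus)
```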

#43 | tulipsliu | posted 2020-12-14 08:25:26
$$
\min_{\beta_0,\beta} \frac{1}{N} \sum_{i=1}^{N} w_i l(y_i,\beta_0+\beta^T x_i) + \lambda\left[(1-\alpha)||\beta||_2^2/2 + \alpha ||\beta||_1\right],
$$
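
As a minimal sketch of this objective, the following numpy function evaluates it for the Gaussian case, using squared-error loss $l(y,\eta)=(y-\eta)^2/2$ as a placeholder for $l$; the function and argument names are illustrative only.

```python
import numpy as np

def elastic_net_objective(beta0, beta, X, y, w, lam, alpha):
    """(1/N) * sum_i w_i * l(y_i, beta0 + x_i^T beta)
    + lam * [(1 - alpha) * ||beta||_2^2 / 2 + alpha * ||beta||_1],
    with squared-error loss l(y, eta) = (y - eta)**2 / 2 as the example loss."""
    eta = beta0 + X @ beta                        # linear predictor for every observation
    loss = np.sum(w * 0.5 * (y - eta) ** 2) / len(y)
    penalty = lam * ((1 - alpha) * np.sum(beta ** 2) / 2 + alpha * np.sum(np.abs(beta)))
    return loss + penalty
```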

#44 | tulipsliu | posted 2020-12-14 08:26:14
$$
\tilde{\beta}_j \leftarrow \frac{S(\frac{1}{N}\sum_{i=1}^N x_{ij}(y_i-\tilde{y}_i^{(j)}),\lambda \alpha)}{1+\lambda(1-\alpha)},
$$
where $\tilde{y}_i^{(j)} = \tilde{\beta}_0 + \sum_{\ell \neq j} x_{i\ell} \tilde{\beta}_\ell$, and $S(z, \gamma)$ is the soft-thresholding operator with value $\text{sign}(z)(|z|-\gamma)_+$.
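
A minimal numpy sketch of this update, assuming unit observation weights and standardized predictors (so that the denominator is exactly $1+\lambda(1-\alpha)$); the function names are illustrative.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * (|z| - gamma)_+."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def coordinate_update(j, beta0, beta, X, y, lam, alpha):
    """One coordinate-descent update of beta_j for the elastic-net objective above."""
    N = len(y)
    y_tilde_j = beta0 + X @ beta - X[:, j] * beta[j]   # fitted values excluding x_j
    z = X[:, j] @ (y - y_tilde_j) / N
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1 - alpha))
```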

#45 | tulipsliu | posted 2020-12-14 08:26:53
Here we solve the following problem:
$$
\min_{(\beta_0, \beta) \in \mathbb{R}^{(p+1)\times K}}\frac{1}{2N} \sum_{i=1}^N ||y_i -\beta_0-\beta^T x_i||^2_F+\lambda \left[ (1-\alpha)||\beta||_F^2/2 + \alpha\sum_{j=1}^p||\beta_j||_2\right].
$$
Here $\beta_j$ is the jth row of the $p\times K$ coefficient matrix $\beta$, and we replace the absolute penalty on each single coefficient by a group-lasso penalty on each coefficient K-vector $\beta_j$ for a single predictor $x_j$.

We use a set of data generated beforehand for illustration.
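
A minimal sketch of this multiresponse objective on synthetic data (standing in for the pre-generated data mentioned above); the sizes, seed, and tuning values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 100, 20, 4
X = rng.standard_normal((N, p))
B_true = np.zeros((p, K))
B_true[:5] = rng.standard_normal((5, K))              # only the first 5 predictors are active
Y = X @ B_true + 0.1 * rng.standard_normal((N, K))

def mgaussian_objective(beta0, B, X, Y, lam, alpha):
    """(1/2N) * sum_i ||y_i - beta0 - B^T x_i||^2
    + lam * [(1 - alpha) * ||B||_F^2 / 2 + alpha * sum_j ||B_j||_2]."""
    R = Y - beta0 - X @ B                              # N x K residual matrix
    loss = np.sum(R ** 2) / (2 * len(Y))
    group = np.sum(np.linalg.norm(B, axis=1))          # group-lasso penalty over the rows beta_j
    penalty = lam * ((1 - alpha) * np.sum(B ** 2) / 2 + alpha * group)
    return loss + penalty

print(mgaussian_objective(np.zeros(K), B_true, X, Y, lam=0.1, alpha=0.9))
```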

#46 | tulipsliu | posted 2020-12-14 08:27:37
### Binomial Models

For the binomial model, suppose the response variable takes values in $\mathcal{G}=\{1,2\}$. Denote $y_i = I(g_i=2)$, so that $y_i$ indicates the class whose probability is modeled below. We model
$$\mbox{Pr}(G=2|X=x)=\frac{e^{\beta_0+\beta^Tx}}{1+e^{\beta_0+\beta^Tx}},$$
which can be written in the following form
$$\log\frac{\mbox{Pr}(G=2|X=x)}{\mbox{Pr}(G=1|X=x)}=\beta_0+\beta^Tx,$$
the so-called "logistic" or log-odds transformation.

The objective function for penalized logistic regression uses the negative of the binomial log-likelihood, and is
$$
\min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}} -\left[\frac{1}{N} \sum_{i=1}^N y_i \cdot (\beta_0 + x_i^T \beta) - \log (1+e^{(\beta_0+x_i^T \beta)})\right] + \lambda \big[ (1-\alpha)||\beta||_2^2/2 + \alpha||\beta||_1\big].
$$
Logistic regression is often plagued with degeneracies when $p > N$ and exhibits wild behavior even when $N$ is close to $p$; the elastic-net penalty alleviates these issues, and also regularizes and selects variables.
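
A minimal numpy sketch of this penalized logistic objective, with $y$ the 0/1 indicator defined above; the names are illustrative.

```python
import numpy as np

def logistic_objective(beta0, beta, X, y, lam, alpha):
    """-(1/N) * sum_i [y_i*(beta0 + x_i^T beta) - log(1 + exp(beta0 + x_i^T beta))]
    + elastic-net penalty."""
    eta = beta0 + X @ beta
    nll = -np.mean(y * eta - np.logaddexp(0.0, eta))   # logaddexp gives a stable log(1 + e^eta)
    penalty = lam * ((1 - alpha) * np.sum(beta ** 2) / 2 + alpha * np.sum(np.abs(beta)))
    return nll + penalty
```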

#47 | tulipsliu | posted 2020-12-14 08:28:13
### Multinomial Models

For the multinomial model, suppose the response variable has $K$ levels ${\cal G}=\{1,2,\ldots,K\}$. Here we model
$$\mbox{Pr}(G=k|X=x)=\frac{e^{\beta_{0k}+\beta_k^Tx}}{\sum_{\ell=1}^Ke^{\beta_{0\ell}+\beta_\ell^Tx}}.$$

Let ${Y}$ be the $N \times K$ indicator response matrix, with elements $y_{i\ell} = I(g_i=\ell)$. Then the elastic-net penalized negative log-likelihood function becomes
$$
\ell(\{\beta_{0k},\beta_{k}\}_1^K) = -\left[\frac{1}{N} \sum_{i=1}^N \Big(\sum_{k=1}^K y_{ik} (\beta_{0k} + x_i^T \beta_k)- \log \big(\sum_{k=1}^K e^{\beta_{0k}+x_i^T \beta_k}\big)\Big)\right] +\lambda \left[ (1-\alpha)||\beta||_F^2/2 + \alpha\sum_{j=1}^p||\beta_j||_q\right].
$$
Here we really abuse notation: $\beta$ is a $p\times K$ matrix of coefficients, $\beta_k$ refers to the $k$th column (for outcome category $k$), and $\beta_j$ to the $j$th row (the vector of $K$ coefficients for variable $j$).
For the last penalty term $||\beta_j||_q$, there are two options for $q$: $q\in \{1,2\}$.
When $q=1$, this is a lasso penalty on each individual parameter. When $q=2$, this is a grouped-lasso penalty on all $K$ coefficients for a particular variable, which forces them to be all zero or all nonzero together.
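
A minimal numpy sketch of this multinomial objective, covering both choices of $q$; the names are illustrative.

```python
import numpy as np

def multinomial_objective(beta0, B, X, Y, lam, alpha, q=2):
    """Penalized negative multinomial log-likelihood; B is p x K, Y is the N x K indicator matrix."""
    eta = beta0 + X @ B                                       # N x K matrix of beta_{0k} + x_i^T beta_k
    m = eta.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(eta - m).sum(axis=1)) + m[:, 0]  # stable log sum_k exp(eta_ik)
    nll = -np.mean((Y * eta).sum(axis=1) - log_norm)
    if q == 1:
        last_term = np.sum(np.abs(B))                         # lasso on every single coefficient
    else:
        last_term = np.sum(np.linalg.norm(B, axis=1))         # grouped lasso over each row beta_j
    return nll + lam * ((1 - alpha) * np.sum(B ** 2) / 2 + alpha * last_term)
```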

#48 | tulipsliu | posted 2020-12-14 08:28:52
## Poisson Models

Poisson regression is used to model count data under the assumption of Poisson error, or otherwise non-negative data where the mean and variance are proportional. Like the Gaussian and binomial models, the Poisson distribution is a member of the exponential family of distributions. We usually model its positive mean on the log scale: $\log \mu(x) = \beta_0+\beta^T x$.
The log-likelihood for observations $\{x_i,y_i\}_1^N$ is given by
$$
l(\beta|X, Y) = \sum_{i=1}^N \left(y_i (\beta_0+\beta^T x_i) - e^{\beta_0+\beta^T x_i}\right).
$$
As before, we optimize the penalized log-likelihood:
$$
\min_{\beta_0,\beta} -\frac1N l(\beta|X, Y)  + \lambda \left[(1-\alpha)||\beta||_2^2/2 +\alpha ||\beta||_1\right].
$$
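
A minimal numpy sketch of this penalized Poisson objective; the names are illustrative.

```python
import numpy as np

def poisson_objective(beta0, beta, X, y, lam, alpha):
    """-(1/N) * sum_i [y_i*(beta0 + x_i^T beta) - exp(beta0 + x_i^T beta)] + elastic-net penalty."""
    eta = beta0 + X @ beta
    nll = -np.mean(y * eta - np.exp(eta))
    penalty = lam * ((1 - alpha) * np.sum(beta ** 2) / 2 + alpha * np.sum(np.abs(beta)))
    return nll + penalty
```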

#49 | tulipsliu | posted 2020-12-14 08:29:22
## Cox Models

The Cox proportional hazards model is commonly used to study the relationship between predictor variables and survival time. In the usual survival analysis framework, we have data of the form $(y_1, x_1, \delta_1), \ldots, (y_n, x_n, \delta_n)$, where $y_i$, the observed time, is a failure time if $\delta_i$ is 1 or a right-censoring time if $\delta_i$ is 0. We also let $t_1 < t_2 < \ldots < t_m$ be the increasing list of unique failure times, and $j(i)$ denote the index of the observation failing at time $t_i$.

The Cox model assumes a semi-parametric form for the hazard
$$
h_i(t) = h_0(t) e^{x_i^T \beta},
$$
where $h_i(t)$ is the hazard for patient $i$ at time $t$, $h_0(t)$ is a shared baseline hazard, and $\beta$ is a fixed, length $p$ vector. In the classic setting $n \geq p$, inference is made via the partial likelihood
$$
L(\beta) = \prod_{i=1}^m \frac{e^{x_{j(i)}^T \beta}}{\sum_{j \in R_i} e^{x_j^T \beta}},
$$
where $R_i$ is the set of indices $j$ with $y_j \geq t_i$ (those at risk at time $t_i$).
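
A minimal numpy sketch of the log of this partial likelihood, assuming no tied failure times (so the Breslow/Efron corrections are not needed); the names are illustrative.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, status):
    """sum over observed failures of [x_{j(i)}^T beta - log sum_{j in R_i} exp(x_j^T beta)]."""
    eta = X @ beta
    log_lik = 0.0
    for i in np.where(status == 1)[0]:        # loop over failure events only
        at_risk = time >= time[i]             # risk set R_i: still under observation at t_i
        log_lik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return log_lik
```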

#50 | tulipsliu | posted 2020-12-14 08:30:50
## Appendix 0: Convergence Criteria

Glmnet uses a convergence criterion that focuses not on the change in the coefficients themselves, but rather on the impact of that change on the fitted values, and hence on the loss part of the objective. The net result is a weighted norm of the coefficient-change vector.

For Gaussian models it uses the following. Suppose observation $i$ has weight $w_i$, and let $v_j$ be the (weighted) sum of squares for variable $x_j$:
$$v_j=\sum_{i=1}^Nw_ix_{ij}^2.$$
If there is an intercept in the model, these $x_j$ will be centered by
the weighted mean, and hence this would be a weighted variance.
After $\hat\beta_j^o$ has been updated to $\hat\beta_j^n$,    we compute
$\Delta_j=v_j(\hat\beta_j^o-\hat\beta_j^n)^2$. After a complete cycle of coordinate descent, we look at
$\Delta_{max}=\max_j\Delta_j$.  Why this measure?
We can write
$$\Delta_j=\sum_{i=1}^N w_i(x_{ij}\hat\beta_j^o-x_{ij}\hat\beta_j^n)^2,$$
which is the weighted sum of squares of changes in fitted values for this term, and hence measures the impact of the change in this coefficient on the fit. If the largest such change is negligible, we stop.
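
A minimal numpy sketch of this Gaussian stopping rule; the names are illustrative.

```python
import numpy as np

def max_fitted_change(X, w, beta_old, beta_new):
    """Delta_max = max_j v_j * (beta_old_j - beta_new_j)^2, with v_j = sum_i w_i * x_ij^2."""
    v = np.sum(w[:, None] * X ** 2, axis=0)   # weighted sum of squares of each column x_j
    delta = v * (beta_old - beta_new) ** 2
    return delta.max()

# stop the coordinate-descent cycle once max_fitted_change(X, w, beta_old, beta_new) < tol
```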


For logistic regression and other non-Gaussian models, the criterion is similar for the inner loop. Only now the weights for each observation are more complex. For example, for logistic regression the weights are those that arise from the current Newton step, namely $w_i^*=w_i\hat p_i(1-\hat p_i)$. Here $\hat p_i$ are the fitted probabilities as we entered the current inner loop. The intuition is the same: the measure captures the impact of the coefficient change on the current weighted least squares loss, or quadratic approximation to the log-likelihood loss.
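
A minimal sketch of those Newton-step weights for logistic regression; the resulting $w_i^*$ simply replace the $w_i$ in the Gaussian criterion above, and the names are illustrative.

```python
import numpy as np

def newton_weights(beta0, beta, X, w):
    """w_i* = w_i * p_i * (1 - p_i), with p_i the fitted probabilities entering the inner loop."""
    p = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
    return w * p * (1.0 - p)
```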

What about outer-loop convergence? We use the same measure, except now
$\hat\beta^o$ is the coefficient vector before we entered this inner
loop, and $\hat\beta^n$ the converged solution for this inner
loop. Hence if this Newton step had no impact, we declare outer-loop convergence.
