Thread starter: tulipsliu

[Frontier Topics] [QuantEcon] Mixing MATLAB with FORTRAN

#41 | tulipsliu | posted 2020-12-13 22:35:11
$$
\mathit{sz} = \frac{\log\left(\bar{\omega}_{t}\right)+\frac{\sigma_{t-1}^{2}}{2}}{\sigma_{t-1}}
$$

#42 | tulipsliu | posted 2020-12-13 22:35:32
$$
\mathit{szplus} = \frac{\log\left(\bar{\omega}_{t+1}\right)+\frac{\sigma_{t}^{2}}{2}}{\sigma_{t}}
$$
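
These two posts apply the same transformation at consecutive dates: $\mathit{sz}$ pairs $\bar{\omega}_{t}$ with $\sigma_{t-1}$, while $\mathit{szplus}$ pairs $\bar{\omega}_{t+1}$ with $\sigma_{t}$. A minimal numpy sketch of the computation follows; the function name and numerical values are purely illustrative, since the surrounding model is not shown in this thread.

```python
import numpy as np

def standardized_cutoff(omegabar, sigma):
    """z = (log(omegabar) + sigma**2 / 2) / sigma, applied elementwise."""
    omegabar = np.asarray(omegabar, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return (np.log(omegabar) + sigma ** 2 / 2.0) / sigma

# illustrative values only: sz pairs omegabar_t with sigma_{t-1},
# szplus pairs omegabar_{t+1} with sigma_t
sz = standardized_cutoff(0.50, 0.25)
szplus = standardized_cutoff(0.52, 0.24)
print(sz, szplus)
```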

#43 | tulipsliu | posted 2020-12-14 08:25:26
$$
\min_{\beta_0,\beta} \frac{1}{N} \sum_{i=1}^{N} w_i l(y_i,\beta_0+\beta^T x_i) + \lambda\left[(1-\alpha)||\beta||_2^2/2 + \alpha ||\beta||_1\right],
$$
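
As a minimal sketch of this objective, the following numpy function evaluates it for the Gaussian case, using squared-error loss $l(y,\eta)=(y-\eta)^2/2$ as a placeholder for $l$; the function and argument names are illustrative only.

```python
import numpy as np

def elastic_net_objective(beta0, beta, X, y, w, lam, alpha):
    """(1/N) * sum_i w_i * l(y_i, beta0 + x_i^T beta)
    + lam * [(1 - alpha) * ||beta||_2^2 / 2 + alpha * ||beta||_1],
    with squared-error loss l(y, eta) = (y - eta)**2 / 2 as the example loss."""
    eta = beta0 + X @ beta                        # linear predictor for every observation
    loss = np.sum(w * 0.5 * (y - eta) ** 2) / len(y)
    penalty = lam * ((1 - alpha) * np.sum(beta ** 2) / 2 + alpha * np.sum(np.abs(beta)))
    return loss + penalty
```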

#44 | tulipsliu | posted 2020-12-14 08:26:14
$$
\tilde{\beta}_j \leftarrow \frac{S(\frac{1}{N}\sum_{i=1}^N x_{ij}(y_i-\tilde{y}_i^{(j)}),\lambda \alpha)}{1+\lambda(1-\alpha)},
$$
where $\tilde{y}_i^{(j)} = \tilde{\beta}_0 + \sum_{\ell \neq j} x_{i\ell} \tilde{\beta}_\ell$, and $S(z, \gamma)$ is the soft-thresholding operator with value $\text{sign}(z)(|z|-\gamma)_+$.
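
A minimal numpy sketch of this update, assuming unit observation weights and standardized predictors (so that the denominator is exactly $1+\lambda(1-\alpha)$); the function names are illustrative.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * (|z| - gamma)_+."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def coordinate_update(j, beta0, beta, X, y, lam, alpha):
    """One coordinate-descent update of beta_j for the elastic-net objective above."""
    N = len(y)
    y_tilde_j = beta0 + X @ beta - X[:, j] * beta[j]   # fitted values excluding x_j
    z = X[:, j] @ (y - y_tilde_j) / N
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1 - alpha))
```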

#45 | tulipsliu | posted 2020-12-14 08:26:53
Here we solve the following problem:
$$
\min_{(\beta_0, \beta) \in \mathbb{R}^{(p+1)\times K}}\frac{1}{2N} \sum_{i=1}^N ||y_i -\beta_0-\beta^T x_i||^2_F+\lambda \left[ (1-\alpha)||\beta||_F^2/2 + \alpha\sum_{j=1}^p||\beta_j||_2\right].
$$
Here $\beta_j$ is the jth row of the $p\times K$ coefficient matrix $\beta$, and we replace the absolute penalty on each single coefficient by a group-lasso penalty on each coefficient K-vector $\beta_j$ for a single predictor $x_j$.

We use a set of data generated beforehand for illustration.
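
A minimal sketch of this multiresponse objective on synthetic data (standing in for the pre-generated data mentioned above); the sizes, seed, and tuning values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 100, 20, 4
X = rng.standard_normal((N, p))
B_true = np.zeros((p, K))
B_true[:5] = rng.standard_normal((5, K))              # only the first 5 predictors are active
Y = X @ B_true + 0.1 * rng.standard_normal((N, K))

def mgaussian_objective(beta0, B, X, Y, lam, alpha):
    """(1/2N) * sum_i ||y_i - beta0 - B^T x_i||^2
    + lam * [(1 - alpha) * ||B||_F^2 / 2 + alpha * sum_j ||B_j||_2]."""
    R = Y - beta0 - X @ B                              # N x K residual matrix
    loss = np.sum(R ** 2) / (2 * len(Y))
    group = np.sum(np.linalg.norm(B, axis=1))          # group-lasso penalty over the rows beta_j
    penalty = lam * ((1 - alpha) * np.sum(B ** 2) / 2 + alpha * group)
    return loss + penalty

print(mgaussian_objective(np.zeros(K), B_true, X, Y, lam=0.1, alpha=0.9))
```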

#46 | tulipsliu | posted 2020-12-14 08:27:37
### Binomial Models

For the binomial model, suppose the response variable takes values in $\mathcal{G}=\{1,2\}$. Denote $y_i = I(g_i=2)$, so that $y_i$ indicates the class whose probability is modeled below. We model
$$\mbox{Pr}(G=2|X=x)=\frac{e^{\beta_0+\beta^Tx}}{1+e^{\beta_0+\beta^Tx}},$$
which can be written in the following form
$$\log\frac{\mbox{Pr}(G=2|X=x)}{\mbox{Pr}(G=1|X=x)}=\beta_0+\beta^Tx,$$
the so-called "logistic" or log-odds transformation.

The objective function for penalized logistic regression uses the negative of the binomial log-likelihood, and is
$$
\min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}} -\left[\frac{1}{N} \sum_{i=1}^N y_i \cdot (\beta_0 + x_i^T \beta) - \log (1+e^{(\beta_0+x_i^T \beta)})\right] + \lambda \big[ (1-\alpha)||\beta||_2^2/2 + \alpha||\beta||_1\big].
$$
Logistic regression is often plagued with degeneracies when $p > N$ and exhibits wild behavior even when $N$ is close to $p$; the elastic-net penalty alleviates these issues, and also regularizes and selects variables.
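
A minimal numpy sketch of this penalized logistic objective, with $y$ the 0/1 indicator defined above; the names are illustrative.

```python
import numpy as np

def logistic_objective(beta0, beta, X, y, lam, alpha):
    """-(1/N) * sum_i [y_i*(beta0 + x_i^T beta) - log(1 + exp(beta0 + x_i^T beta))]
    + elastic-net penalty."""
    eta = beta0 + X @ beta
    nll = -np.mean(y * eta - np.logaddexp(0.0, eta))   # logaddexp gives a stable log(1 + e^eta)
    penalty = lam * ((1 - alpha) * np.sum(beta ** 2) / 2 + alpha * np.sum(np.abs(beta)))
    return nll + penalty
```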

#47 | tulipsliu | posted 2020-12-14 08:28:13
### Multinomial Models

For the multinomial model, suppose the response variable has $K$ levels ${\cal G}=\{1,2,\ldots,K\}$. Here we model
$$\mbox{Pr}(G=k|X=x)=\frac{e^{\beta_{0k}+\beta_k^Tx}}{\sum_{\ell=1}^Ke^{\beta_{0\ell}+\beta_\ell^Tx}}.$$

Let ${Y}$ be the $N \times K$ indicator response matrix, with elements $y_{i\ell} = I(g_i=\ell)$. Then the elastic-net penalized negative log-likelihood function becomes
$$
\ell(\{\beta_{0k},\beta_{k}\}_1^K) = -\left[\frac{1}{N} \sum_{i=1}^N \Big(\sum_{k=1}^K y_{ik} (\beta_{0k} + x_i^T \beta_k)- \log \big(\sum_{k=1}^K e^{\beta_{0k}+x_i^T \beta_k}\big)\Big)\right] +\lambda \left[ (1-\alpha)||\beta||_F^2/2 + \alpha\sum_{j=1}^p||\beta_j||_q\right].
$$
Here we really abuse notation: $\beta$ is a $p\times K$ matrix of coefficients, $\beta_k$ refers to the $k$th column (for outcome category $k$), and $\beta_j$ to the $j$th row (the vector of $K$ coefficients for variable $j$).
For the last penalty term $||\beta_j||_q$, there are two options for $q$: $q\in \{1,2\}$.
When $q=1$, this is a lasso penalty on each individual parameter. When $q=2$, this is a grouped-lasso penalty on all $K$ coefficients for a particular variable, which forces them to be all zero or all nonzero together.
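
A minimal numpy sketch of this multinomial objective, covering both choices of $q$; the names are illustrative.

```python
import numpy as np

def multinomial_objective(beta0, B, X, Y, lam, alpha, q=2):
    """Penalized negative multinomial log-likelihood; B is p x K, Y is the N x K indicator matrix."""
    eta = beta0 + X @ B                                       # N x K matrix of beta_{0k} + x_i^T beta_k
    m = eta.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(eta - m).sum(axis=1)) + m[:, 0]  # stable log sum_k exp(eta_ik)
    nll = -np.mean((Y * eta).sum(axis=1) - log_norm)
    if q == 1:
        last_term = np.sum(np.abs(B))                         # lasso on every single coefficient
    else:
        last_term = np.sum(np.linalg.norm(B, axis=1))         # grouped lasso over each row beta_j
    return nll + lam * ((1 - alpha) * np.sum(B ** 2) / 2 + alpha * last_term)
```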

#48 | tulipsliu | posted 2020-12-14 08:28:52
## Poisson Models

Poisson regression is used to model count data under the assumption of Poisson error, or otherwise non-negative data where the mean and variance are proportional. Like the Gaussian and binomial models, the Poisson distribution is a member of the exponential family of distributions. We usually model its positive mean on the log scale: $\log \mu(x) = \beta_0+\beta^T x$.
The log-likelihood for observations $\{x_i,y_i\}_1^N$ is given by
$$
l(\beta|X, Y) = \sum_{i=1}^N \left(y_i (\beta_0+\beta^T x_i) - e^{\beta_0+\beta^T x_i}\right).
$$
As before, we optimize the penalized log-likelihood:
$$
\min_{\beta_0,\beta} -\frac1N l(\beta|X, Y)  + \lambda \left[(1-\alpha)||\beta||_2^2/2 +\alpha ||\beta||_1\right].
$$
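
A minimal numpy sketch of this penalized Poisson objective; the names are illustrative.

```python
import numpy as np

def poisson_objective(beta0, beta, X, y, lam, alpha):
    """-(1/N) * sum_i [y_i*(beta0 + x_i^T beta) - exp(beta0 + x_i^T beta)] + elastic-net penalty."""
    eta = beta0 + X @ beta
    nll = -np.mean(y * eta - np.exp(eta))
    penalty = lam * ((1 - alpha) * np.sum(beta ** 2) / 2 + alpha * np.sum(np.abs(beta)))
    return nll + penalty
```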

#49 | tulipsliu | posted 2020-12-14 08:29:22
## Cox Models

The Cox proportional hazards model is commonly used to study the relationship between predictor variables and survival time. In the usual survival analysis framework, we have data of the form $(y_1, x_1, \delta_1), \ldots, (y_n, x_n, \delta_n)$, where $y_i$, the observed time, is a failure time if $\delta_i$ is 1 or a right-censoring time if $\delta_i$ is 0. We also let $t_1 < t_2 < \ldots < t_m$ be the increasing list of unique failure times, and $j(i)$ denote the index of the observation failing at time $t_i$.

The Cox model assumes a semi-parametric form for the hazard
$$
h_i(t) = h_0(t) e^{x_i^T \beta},
$$
where $h_i(t)$ is the hazard for patient $i$ at time $t$, $h_0(t)$ is a shared baseline hazard, and $\beta$ is a fixed, length $p$ vector. In the classic setting $n \geq p$, inference is made via the partial likelihood
$$
L(\beta) = \prod_{i=1}^m \frac{e^{x_{j(i)}^T \beta}}{\sum_{j \in R_i} e^{x_j^T \beta}},
$$
where $R_i$ is the set of indices $j$ with $y_j \geq t_i$ (those at risk at time $t_i$).
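
A minimal numpy sketch of the log of this partial likelihood, assuming no tied failure times (so the Breslow/Efron corrections are not needed); the names are illustrative.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, status):
    """sum over observed failures of [x_{j(i)}^T beta - log sum_{j in R_i} exp(x_j^T beta)]."""
    eta = X @ beta
    log_lik = 0.0
    for i in np.where(status == 1)[0]:        # loop over failure events only
        at_risk = time >= time[i]             # risk set R_i: still under observation at t_i
        log_lik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return log_lik
```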

#50 | tulipsliu | posted 2020-12-14 08:30:50
## Appendix 0: Convergence Criteria

Glmnet uses a convergence criterion that focuses not on the change in the coefficients themselves, but rather on the impact of that change on the fitted values, and hence on the loss part of the objective. The net result is a weighted norm of the coefficient-change vector.

For Gaussian models it uses the following. Suppose observation $i$ has weight $w_i$, and let $v_j$ be the (weighted) sum of squares for variable $x_j$:
$$v_j=\sum_{i=1}^Nw_ix_{ij}^2.$$
If there is an intercept in the model, these $x_j$ will be centered by
the weighted mean, and hence this would be a weighted variance.
After $\hat\beta_j^o$ has been updated to $\hat\beta_j^n$,    we compute
$\Delta_j=v_j(\hat\beta_j^o-\hat\beta_j^n)^2$. After a complete cycle of coordinate descent, we look at
$\Delta_{max}=\max_j\Delta_j$.  Why this measure?
We can write
$$\Delta_j=\sum_{i=1}^N w_i(x_{ij}\hat\beta_j^o-x_{ij}\hat\beta_j^n)^2,$$
which is the weighted sum of squares of changes in fitted values for this term, and hence measures the impact of the change in this coefficient on the fit. If the largest such change is negligible, we stop.
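
A minimal numpy sketch of this Gaussian stopping rule; the names are illustrative.

```python
import numpy as np

def max_fitted_change(X, w, beta_old, beta_new):
    """Delta_max = max_j v_j * (beta_old_j - beta_new_j)^2, with v_j = sum_i w_i * x_ij^2."""
    v = np.sum(w[:, None] * X ** 2, axis=0)   # weighted sum of squares of each column x_j
    delta = v * (beta_old - beta_new) ** 2
    return delta.max()

# stop the coordinate-descent cycle once max_fitted_change(X, w, beta_old, beta_new) < tol
```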


For logistic regression and other non-Gaussian models, the criterion is similar for the inner loop. Only now the weights for each observation are more complex. For example, for logistic regression the weights are those that arise from the current Newton step, namely $w_i^*=w_i\hat p_i(1-\hat p_i)$. Here $\hat p_i$ are the fitted probabilities as we entered the current inner loop. The intuition is the same: the measure captures the impact of the coefficient change on the current weighted least squares loss, or quadratic approximation to the log-likelihood loss.
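
A minimal sketch of those Newton-step weights for logistic regression; the resulting $w_i^*$ simply replace the $w_i$ in the Gaussian criterion above, and the names are illustrative.

```python
import numpy as np

def newton_weights(beta0, beta, X, w):
    """w_i* = w_i * p_i * (1 - p_i), with p_i the fitted probabilities entering the inner loop."""
    p = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
    return w * p * (1.0 - p)
```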

What about outer-loop convergence? We use the same measure, except now
$\hat\beta^o$ is the coefficient vector before we entered this inner
loop, and $\hat\beta^n$ the converged solution for this inner
loop. Hence if this Newton step had no impact, we declare outer-loop convergence.
