Thread starter: tulipsliu

[Frontier Topics] [QuantEcon] Mixed-language programming: MATLAB with FORTRAN

#71 · tulipsliu · posted 2020-12-15 17:53:54
In both theory and practice, and in both frequentist and Bayesian inference, the log-likelihood is used in place of the likelihood at both the record level and the model level. The model-level product of record-level likelihoods can underflow (or overflow) the range of numbers a computer can store, a problem that grows worse with sample size. By working with record-level log-likelihoods instead, the model-level log-likelihood becomes the sum of the record-level log-likelihoods rather than the product of the record-level likelihoods,

$$\log[p(\textbf{y} | \theta)] = \sum^n_{i=1} \log[p(\textbf{y}_i | \theta)]$$

rather than

$$p(\textbf{y} | \theta) = \prod^n_{i=1} p(\textbf{y}_i | \theta)$$
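
To make the numerical point concrete, here is a minimal Python sketch (my own illustration, not from the original text) with a normal likelihood: the product of record-level densities underflows for a moderately large sample, while the sum of record-level log-densities remains well behaved.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=10_000)    # simulated sample

theta = {"mu": 2.0, "sigma": 1.0}                  # candidate parameter values

# Record-level densities multiplied together underflow to 0.0 for large n ...
likelihood = np.prod(stats.norm.pdf(y, theta["mu"], theta["sigma"]))

# ... whereas the model-level log-likelihood is a well-behaved sum.
log_likelihood = np.sum(stats.norm.logpdf(y, theta["mu"], theta["sigma"]))

print(likelihood)       # 0.0 (underflow)
print(log_likelihood)   # a finite number, roughly -14,000 for this sample
```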

As a function of $\theta$, the unnormalized joint posterior distribution is the product of the likelihood function and the prior distributions. To continue with the example of Bayesian linear regression, here is the unnormalized joint posterior distribution

$$p(\beta, \sigma^2 | \textbf{y}) = p(\textbf{y} | \beta, \sigma^2)p(\beta_1)p(\beta_2)p(\beta_3)p(\sigma^2)$$

More commonly, the logarithm of the unnormalized joint posterior distribution is used, which is the sum of the log-likelihood and the logarithms of the prior densities. Here is the logarithm of the unnormalized joint posterior distribution for this example

$$\log[p(\beta, \sigma^2 | \textbf{y})] = \log[p(\textbf{y} | \beta, \sigma^2)] + \log[p(\beta_1)] + \log[p(\beta_2)] + \log[p(\beta_3)] + \log[p(\sigma^2)]$$

The logarithm of the unnormalized joint posterior distribution is maximized with numerical approximation.
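
As a hedged sketch of how this maximization might be carried out, the following Python code builds the unnormalized log joint posterior for a small linear regression and maximizes it numerically. The simulated data, the vague $\mathcal{N}(0, 1000^2)$ priors on the $\beta$s, and the half-Cauchy prior on $\sigma$ are illustrative choices of mine, not part of the original example.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + two predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

def log_posterior(params):
    """Unnormalized log joint posterior: log-likelihood plus log-priors."""
    beta, sigma = params[:3], np.exp(params[3])               # optimize over log(sigma) so it stays positive
    log_lik = np.sum(stats.norm.logpdf(y, loc=X @ beta, scale=sigma))
    log_prior = np.sum(stats.norm.logpdf(beta, 0.0, 1000.0))  # vague N(0, 1000^2) priors on the betas
    log_prior += stats.halfcauchy.logpdf(sigma, scale=5.0)    # weakly informative prior on sigma
    return log_lik + log_prior

# Maximize the log of the unnormalized joint posterior by numerical approximation.
fit = optimize.minimize(lambda p: -log_posterior(p), x0=np.zeros(4), method="Nelder-Mead")
print(fit.x[:3], np.exp(fit.x[3]))   # approximate posterior mode of beta and sigma
```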


#72 · tulipsliu · posted 2020-12-15 17:54:37
Approximate Bayesian Computation (ABC), also called likelihood-free estimation, is a family of numerical approximation techniques in Bayesian inference. ABC is especially useful when evaluation of the likelihood, $p(\textbf{y} | \Theta)$, is computationally prohibitive, or when suitable likelihoods are unavailable. As such, ABC algorithms estimate likelihood-free approximations. ABC is usually faster than a similar likelihood-based numerical approximation technique, because the likelihood is not evaluated directly, but replaced with an approximation that is usually easier to calculate. The likelihood is usually approximated with a measure of distance between the observed sample, $\textbf{y}$, and its replicate given the model, $\textbf{y}^{rep}$, or with summary statistics of the observed and replicated samples.
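
As a hedged illustration of the idea (not from the post), the following Python sketch runs a plain rejection-ABC scheme for the mean of a normal model: candidate parameters are drawn from the prior, data are replicated, and candidates are kept only when a distance between observed and replicated summary statistics falls below a tolerance. The prior, the summaries, and the tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
y_obs = rng.normal(loc=3.0, scale=1.0, size=100)        # observed sample
s_obs = np.array([y_obs.mean(), y_obs.std()])            # observed summary statistics

def distance(s_rep, s_obs):
    """Euclidean distance between replicated and observed summaries."""
    return np.sqrt(np.sum((s_rep - s_obs) ** 2))

accepted = []
epsilon = 0.3                                             # tolerance (tuned in practice)
for _ in range(20_000):
    theta = rng.normal(0.0, 10.0)                         # draw a candidate mean from a vague prior
    y_rep = rng.normal(theta, 1.0, size=100)              # replicate the data given the candidate
    s_rep = np.array([y_rep.mean(), y_rep.std()])
    if distance(s_rep, s_obs) < epsilon:                  # keep candidates whose replicates are "close"
        accepted.append(theta)

print(len(accepted), np.mean(accepted))                   # approximate posterior mean of theta
```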


#73 · tulipsliu · posted 2020-12-15 17:55:08
The ``posterior predictive distribution'' is either the replication of $\textbf{y}$ given the model (usually represented as $\textbf{y}^{rep}$), or the prediction of a new and unobserved $\textbf{y}$ (usually represented as $\textbf{y}^{new}$ or $\textbf{y}'$), given the model. This is the likelihood of the replicated or predicted data, averaged over the posterior distribution $p(\Theta | \textbf{y})$

$$p(\textbf{y}^{rep} | \textbf{y}) = \int p(\textbf{y}^{rep} | \Theta)p(\Theta | \textbf{y}) d\Theta$$

or

$$p(\textbf{y}^{new} | \textbf{y}) = \int p(\textbf{y}^{new} | \Theta)p(\Theta | \textbf{y}) d\Theta$$

If $\textbf{y}$ has missing values, then the missing $\textbf{y}$s can be estimated with the posterior predictive distribution\footnote{The predictive distribution was introduced by \citet{jeffreys61}.} as $\textbf{y}^{new}$ from within the model. For the linear regression example, the integral for prediction is

$$p(\textbf{y}^{new} | \textbf{y}) = \int p(\textbf{y}^{new} | \beta,\sigma^2)p(\beta,\sigma^2 | \textbf{y}) d\beta d\sigma^2$$

The posterior predictive distribution is easy to estimate

$$\textbf{y}^{new} \sim \mathcal{N}(\mu, \sigma^2)$$

where $\mu = \textbf{X}\beta$ is the conditional mean and $\sigma^2$ is the residual variance.
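
A minimal sketch of simulating the posterior predictive distribution for the regression example: assuming arrays of posterior draws `beta_draws` and `sigma2_draws` are already available from some sampler (here replaced by hypothetical stand-in values), each draw generates a replicate $\textbf{y}^{new} \sim \mathcal{N}(\textbf{X}^{new}\beta, \sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical inputs: S posterior draws from p(beta, sigma^2 | y) and a new design matrix.
S, k = 4000, 3
beta_draws = rng.normal(size=(S, k))          # stand-in for draws of beta from a sampler
sigma2_draws = rng.gamma(2.0, 0.5, size=S)    # stand-in for draws of sigma^2
X_new = np.column_stack([np.ones(10), rng.normal(size=(10, k - 1))])

# For each posterior draw, simulate y_new ~ N(X_new beta, sigma^2);
# the resulting S x 10 array approximates the posterior predictive distribution.
mu = beta_draws @ X_new.T                                       # S x 10 conditional means
y_new = rng.normal(loc=mu, scale=np.sqrt(sigma2_draws)[:, None])

print(y_new.mean(axis=0))   # posterior predictive means for the 10 new cases
```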


#74 · tulipsliu · posted 2020-12-15 17:56:22

Hypothesis testing with Bayes factors is more robust than frequentist hypothesis testing, since the Bayesian form avoids model selection bias, can evaluate evidence in favor of the null hypothesis, incorporates model uncertainty, and allows non-nested models to be compared (though of course the models must share the same dependent variable). Also, frequentist significance tests become biased in favor of rejecting the null hypothesis as the sample size grows sufficiently large.

The Bayes factor for comparing two models may be approximated as the ratio of the marginal likelihoods of the data under model 1 and model 2. Formally, the Bayes factor in this case is

$$B = \frac{p(\textbf{y}|\mathcal{M}_1)}{p(\textbf{y}|\mathcal{M}_2)} = \frac{\int p(\textbf{y}|\Theta_1,\mathcal{M}_1)p(\Theta_1|\mathcal{M}_1)d\Theta_1}{\int p(\textbf{y}|\Theta_2,\mathcal{M}_2)p(\Theta_2|\mathcal{M}_2)d\Theta_2}$$

where $p(\textbf{y}|\mathcal{M}_1)$ is the marginal likelihood of the data in model 1.

The Bayes factor, $B$, is the posterior odds in favor of the hypothesis divided by the prior odds in favor of the hypothesis, where the hypothesis is usually that $\mathcal{M}_1$ is preferred to $\mathcal{M}_2$. Put another way,

$$B = \frac{p(\mathcal{M}_1 | \textbf{y}) / p(\mathcal{M}_2 | \textbf{y})}{p(\mathcal{M}_1) / p(\mathcal{M}_2)}$$


#75 · tulipsliu · posted 2020-12-15 17:57:13
For example, when $B=2$, the data favor $\mathcal{M}_1$ over $\mathcal{M}_2$ with 2:1 odds.

In a non-hierarchical model, the marginal likelihood may easily be approximated with the Laplace-Metropolis Estimator for model $m$ as

$$p(\textbf{y}|m) = (2\pi)^{d_m/2}|\Sigma_m|^{1/2}p(\textbf{y}|\Theta_m,m)p(\Theta_m|m)$$

where $d_m$ is the number of parameters and $\Sigma_m$ is the inverse of the negative of the Hessian matrix of second derivatives of the log of the unnormalized joint posterior, with the likelihood and prior evaluated at the posterior mode. \citet{lewis97} introduce the Laplace-Metropolis method of approximating the marginal likelihood from MCMC output, though it naturally works with Laplace Approximation as well. For a hierarchical model that involves both fixed and random effects, the Compound Laplace-Metropolis Estimator must be used.
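
Here is a hedged Python sketch of the Laplace approximation to the marginal likelihood. It uses numerical optimization and the optimizer's approximate Hessian rather than MCMC output, so it is the plain Laplace estimator rather than the Laplace-Metropolis variant described above; the normal-mean model and its prior are illustrative assumptions of mine. A Bayes factor then follows as the exponential of the difference of two such log marginal likelihoods.

```python
import numpy as np
from scipy import stats, optimize

def log_unnorm_posterior(theta, y):
    """Log-likelihood plus log-prior for a simple normal-mean model (illustrative)."""
    mu = theta[0]
    return (np.sum(stats.norm.logpdf(y, loc=mu, scale=1.0))   # likelihood, sigma fixed at 1
            + stats.norm.logpdf(mu, 0.0, 10.0))               # N(0, 10^2) prior on mu

def log_marginal_laplace(y):
    """Laplace approximation: log p(y|m) = (d/2)log(2*pi) + 0.5 log|Sigma| + log p(y|theta)p(theta)."""
    fit = optimize.minimize(lambda t: -log_unnorm_posterior(t, y), x0=np.zeros(1), method="BFGS")
    d = fit.x.size
    Sigma = fit.hess_inv               # inverse of the negative Hessian of the log posterior;
                                       # in the Laplace-Metropolis variant, Sigma would instead be
                                       # estimated from the covariance of posterior samples
    sign, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * d * np.log(2 * np.pi) + 0.5 * logdet + log_unnorm_posterior(fit.x, y)

rng = np.random.default_rng(4)
y = rng.normal(1.0, 1.0, size=50)
print(log_marginal_laplace(y))         # approximate log marginal likelihood of this model
```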

Gelman finds Bayes factors generally to be irrelevant, because they compute the relative probabilities of the models conditional on one of them being true. Gelman prefers approaches that measure the distance of the data to each of the approximate models \citep[p. 180]{gelman04}. However, \citet{kass95} explain that ``the logarithm of the marginal probability of the data may also be viewed as a predictive score. This is of interest, because it leads to an interpretation of the Bayes factor that does not depend on viewing one of the models as `true'.''


#76 · tulipsliu · posted 2020-12-15 17:57:59
In Bayesian inference, the most common method of assessing the goodness of fit of an estimated statistical model is a generalization of the frequentist Akaike Information Criterion (AIC). The Bayesian method, like AIC, is not a test of the model in the sense of hypothesis testing, though Bayesian inference has Bayes factors for such purposes. Instead, like AIC, Bayesian inference provides a model fit statistic that is used as a tool to refine the current model or to select the better-fitting of several candidate models.

To begin with, model fit can be summarized with deviance, which is defined as -2 times the log-likelihood \citep[p. 180]{gelman04}, such as

$$D(\textbf{y},\Theta) = -2\log[p(\textbf{y} | \Theta)]$$

Just as with the likelihood, $p(\textbf{y} | \Theta)$, or log-likelihood, the deviance exists at both the record and model levels. Due to the development of \proglang{BUGS} software \citep{gilks94}, deviance is defined differently in Bayesian inference than in frequentist inference. In frequentist inference, deviance is -2 times the log-likelihood ratio of a reduced model compared to a full model, whereas in Bayesian inference, deviance is simply -2 times the log-likelihood. In Bayesian inference, the model with the lowest expected deviance has the highest posterior probability \citep[p. 181]{gelman04}.

The effective number of parameters, pD, is commonly estimated as the difference between the posterior mean deviance and the deviance evaluated at the posterior mean of the parameters, $\mathrm{pD} = \bar{D} - D(\bar{\Theta})$. A related way to measure model complexity is as half the posterior variance of the model-level deviance, known as pV \citep[p. 182]{gelman04}

$$\mathrm{pV} = \mathrm{var}(D) / 2$$

The effective number of parameters, pD or pV, can be thought of as the number of `unconstrained' parameters in the model, where a parameter counts as 1 if it is estimated with no constraints or prior information, 0 if it is fully constrained or if all the information about the parameter comes from the prior distribution, or an intermediate value if both the data and the prior are informative \citep[p. 182]{gelman04}. Therefore, by including prior information, Bayesian inference is more efficient than frequentist inference in terms of the effective number of parameters. Hierarchical, mixed effects, or multilevel models are even more efficient in this regard.
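
As a hedged sketch (my own illustration), the following Python code computes the model-level deviance over posterior draws for a simple normal-mean model and, from it, the effective number of parameters pD and pV; the Deviance Information Criterion (DIC), the kind of AIC generalization referred to above, is then the posterior mean deviance plus pD. The stand-in posterior draws replace what would normally come from an MCMC sampler.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.normal(1.0, 1.0, size=100)

# Stand-in posterior draws of the mean of a normal model with sigma fixed at 1
# (in practice these come from an MCMC sampler).
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=4000)

# Model-level deviance for each posterior draw: D = -2 log p(y | Theta).
deviance = np.array([-2.0 * np.sum(stats.norm.logpdf(y, mu, 1.0)) for mu in mu_draws])

D_bar = deviance.mean()                                            # posterior mean deviance
D_hat = -2.0 * np.sum(stats.norm.logpdf(y, mu_draws.mean(), 1.0))  # deviance at the posterior mean
pD = D_bar - D_hat                                                 # effective number of parameters
pV = deviance.var() / 2.0                                          # variance-based alternative
DIC = D_bar + pD

print(pD, pV, DIC)   # pD and pV should both be close to 1 here (one free parameter)
```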


#77 · tulipsliu · posted 2020-12-16 19:00:47
$$
y = \alpha_{j} + \beta_{j} v + \boldsymbol{\gamma}_{j} \boldsymbol{F} + \boldsymbol{\delta}_{j} \boldsymbol{D}_{j} + \varepsilon
$$

where $j$ indexes regression models, $\boldsymbol{F}$ is the full set of free variables that will be included in every regression model, $\boldsymbol{D}_{j}$ is a vector of $k$ variables taken from the set $\boldsymbol{X}$ of doubtful variables, and $\varepsilon$ is the error term. While  $\boldsymbol{D}_{j}$ has conventionally been limited to no more than three doubtful variables per model \citep{LevineRenelt1992, Achen2005}, the particular choice of $k$, the number of doubtful variables to be included in each combination, is up to the researcher.

The above regression is estimated for each of the $M$ possible combinations of $\boldsymbol{D}_{j} \subset \boldsymbol{X}$. The estimated regression coefficients $\hat{\beta}_{j}$ on the focus variable $v$, along with the corresponding standard errors $\hat{\sigma}_{j}$, are collected and stored for use in later calculations. In the original formulation of extreme bounds analysis, the regressions were estimated by Ordinary Least Squares (OLS). In recent research, however, other types of regression models have also been used, such as ordered probit models \citep{Bjornskov2008, Hafner-Burton2005} or logistic models \citep{HegreSambanis2006, MoserSturm2011, Gassebner2013}.
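
A hedged Python sketch of this step (my own illustration, using simulated data and the `statsmodels` OLS routine): every combination of $k$ doubtful variables is added to the focus and free variables, the regression is estimated, and the coefficient and standard error on the focus variable are stored.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from itertools import combinations

rng = np.random.default_rng(6)
n = 500

# Hypothetical data: focus variable v, one free variable f, and five doubtful variables x1..x5.
data = pd.DataFrame(rng.normal(size=(n, 7)), columns=["v", "f", "x1", "x2", "x3", "x4", "x5"])
y = 0.5 * data["v"] + 0.3 * data["f"] + rng.normal(size=n)

doubtful = ["x1", "x2", "x3", "x4", "x5"]
k = 3                                          # number of doubtful variables per regression
betas, ses = [], []

for D_j in combinations(doubtful, k):          # all M = C(5, 3) = 10 combinations of doubtful variables
    X = sm.add_constant(data[["v", "f"] + list(D_j)])
    fit = sm.OLS(y, X).fit()
    betas.append(fit.params["v"])              # coefficient on the focus variable
    ses.append(fit.bse["v"])                   # its standard error

betas, ses = np.array(betas), np.array(ses)
print(len(betas), "regressions estimated")
```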


#78 · tulipsliu · posted 2020-12-16 19:01:51
In order to determine whether a determinant is robust or fragile, Leamer's extreme bounds analysis focuses only on the extreme bounds of the regression coefficients \citep{Leamer1985}. For any focus variable $v$, the lower and upper extreme bounds are defined as the minimum and maximum values of $\hat{\beta}_{j} \pm \tau  \hat{\sigma}_{j}$ across the $M$ estimated regression models, where $\tau$ is the critical value for the requested confidence level. For the conventional 95-percent confidence level, $\tau$ will thus be equal to approximately 1.96. If the upper and lower extreme bounds have the same sign, the focus variable $v$ is said to be robust. Conversely, if the bounds have opposite signs, the variable is declared fragile.

The interval between the lower and upper extreme bound represents the set of values that are not statistically significantly distinguishable from the coefficient estimate $\hat{\beta}_{j}$. In other words, a simple t-test would fail to reject the null hypothesis that the true parameter $\beta_{j}$ equals any value between the extreme bounds. Intuitively, Leamer's version of extreme bounds analysis scans a large number of model specifications for the lowest and highest value that the $\beta_{j}$ parameter could plausibly take at the requested confidence level. It then labels variables robust and fragile based on whether these extreme bounds have the same or opposite signs, respectively.
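
Continuing that sketch, the extreme bounds and the robust/fragile verdict follow directly from the stored coefficients and standard errors; the numeric values below are stand-ins for the `betas` and `ses` arrays collected in the previous sketch.

```python
import numpy as np
from scipy import stats

# Stand-in values for the focus-variable coefficients and standard errors
# collected across the M estimated regressions.
betas = np.array([0.48, 0.52, 0.45, 0.50, 0.55, 0.47, 0.51, 0.49, 0.53, 0.46])
ses = np.full(10, 0.05)

tau = stats.norm.ppf(0.975)                    # ~1.96 for the conventional 95-percent level

lower_bound = np.min(betas - tau * ses)        # lower extreme bound across the M models
upper_bound = np.max(betas + tau * ses)        # upper extreme bound

verdict = "robust" if np.sign(lower_bound) == np.sign(upper_bound) else "fragile"
print(lower_bound, upper_bound, verdict)       # same sign on both bounds -> robust
```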


#79 · tulipsliu · posted 2020-12-16 19:17:58
$$
\begin{aligned}
    p_x &= \sqrt{\frac{2}{\pi \phi}} \frac{e^{(\phi\mu)^{-1}}}{x!}
    \left(
      \sqrt{2\phi \left( 1 + \frac{1}{2\phi\mu^2} \right)}
    \right)^{-(x - \frac{1}{2})} \\
    &\phantom{=} \times K_{x - \frac{1}{2}} \left( \sqrt{\frac{2}{\phi}\left(1
          + \frac{1}{2\phi\mu^2}\right)} \right),
\end{aligned}
$$
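
This appears to be the probability mass function of a Poisson-inverse Gaussian distribution with mean $\mu$ and dispersion $\phi$ (an interpretation I am inferring from the formula, since the post gives no surrounding text). A hedged Python sketch evaluates it directly with the modified Bessel function of the second kind:

```python
import numpy as np
from scipy.special import kv, gammaln

def pig_pmf_bessel(x, mu, phi):
    """Evaluate the displayed pmf directly via the modified Bessel function K."""
    x = np.asarray(x, dtype=float)
    c = 1.0 + 1.0 / (2.0 * phi * mu**2)
    coef = np.sqrt(2.0 / (np.pi * phi)) * np.exp(1.0 / (phi * mu) - gammaln(x + 1))  # e^{1/(phi*mu)} / x!
    base = np.sqrt(2.0 * phi * c) ** (-(x - 0.5))
    return coef * base * kv(x - 0.5, np.sqrt((2.0 / phi) * c))

probs = pig_pmf_bessel(np.arange(20), mu=2.0, phi=0.5)
print(probs.sum())   # should be close to 1 if the tail beyond x = 19 is negligible
```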


#80 · tulipsliu · posted 2020-12-16 19:18:24
$$
\begin{aligned}
p_0 &= \exp\left\{
      \frac{1}{\phi\mu} \left(1 - \sqrt{1 + 2\phi\mu^2}\right)
    \right\} \\
p_1 &= \frac{\mu}{\sqrt{1 + 2\phi\mu^2}}\, p_0 \\
p_x &= \frac{2\phi\mu^2}{1 + 2\phi\mu^2} \left( 1 - \frac{3}{2x}
    \right) p_{x - 1} + \frac{\mu^2}{1 + 2\phi\mu^2} \frac{1}{x(x -
      1)}\, p_{x - 2}, \quad x = 2, 3, \dots
\end{aligned}
$$
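
The recursion lends itself to direct computation. Here is a hedged Python sketch (same interpretation of $\mu$ and $\phi$ as above) that builds $p_0, p_1, \dots$ iteratively and should reproduce the Bessel-function evaluation from the previous post.

```python
import numpy as np

def pig_pmf_recursive(x_max, mu, phi):
    """Compute p_0, p_1, ..., p_{x_max} with the recursion displayed above."""
    denom = 1.0 + 2.0 * phi * mu**2
    p = np.zeros(x_max + 1)
    p[0] = np.exp((1.0 - np.sqrt(denom)) / (phi * mu))
    if x_max >= 1:
        p[1] = mu / np.sqrt(denom) * p[0]
    for x in range(2, x_max + 1):
        p[x] = (2.0 * phi * mu**2 / denom) * (1.0 - 3.0 / (2.0 * x)) * p[x - 1] \
             + (mu**2 / denom) / (x * (x - 1.0)) * p[x - 2]
    return p

probs = pig_pmf_recursive(19, mu=2.0, phi=0.5)
print(probs.sum())    # matches the Bessel-function evaluation above (close to 1)
```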

