Original poster: tulipsliu

[Frontier Topics] [QuantEcon] Mixed-Language Programming with MATLAB and FORTRAN

101
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-16 20:05:26
Another way to state Bayes' theorem is

$$\Pr(A_i | B) = \frac{\Pr(B | A_i)\Pr(A_i)}{\Pr(B | A_1)\Pr(A_1) + \dots + \Pr(B | A_n)\Pr(A_n)}$$

Let's examine our \textit{burning} question by replacing $A_i$ with Hell or Heaven, and replacing $B$ with Consort:

\begin{itemize}
\item $\Pr(A_1) = \Pr(\mathrm{Hell})$
\item $\Pr(A_2) = \Pr(\mathrm{Heaven})$
\item $\Pr(B) = \Pr(\mathrm{Consort})$
\item $\Pr(A_1 | B) = \Pr(\mathrm{Hell} | \mathrm{Consort})$
\item $\Pr(A_2 | B) = \Pr(\mathrm{Heaven} | \mathrm{Consort})$
\item $\Pr(B | A_1) = \Pr(\mathrm{Consort} | \mathrm{Hell})$
\item $\Pr(B | A_2) = \Pr(\mathrm{Consort} | \mathrm{Heaven})$
\end{itemize}
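As a minimal numerical sketch of this form of Bayes' theorem, the Python snippet below computes $\Pr(\mathrm{Hell} | \mathrm{Consort})$ and $\Pr(\mathrm{Heaven} | \mathrm{Consort})$ for the two-hypothesis case. The function name and all four input probabilities are hypothetical values chosen only for illustration; they are not taken from the text above.

```python
def bayes_posterior(priors, likelihoods):
    """Return Pr(A_i | B) for every i, given Pr(A_i) and Pr(B | A_i)."""
    # Denominator of Bayes' theorem: Pr(B) = sum_k Pr(B | A_k) * Pr(A_k)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


# Hypothetical inputs (assumptions for illustration only)
priors = [0.75, 0.25]       # Pr(Hell), Pr(Heaven)
likelihoods = [0.6, 0.02]   # Pr(Consort | Hell), Pr(Consort | Heaven)

pr_hell_given_consort, pr_heaven_given_consort = bayes_posterior(priors, likelihoods)
```

Because the denominator sums over every $A_k$, the two posterior probabilities add up to one whatever inputs are used.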

102
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-16 20:05:51
The basis for Bayesian inference is derived from Bayes' theorem. Here is Bayes' theorem, equation \ref{bayestheorem}, again

$$\Pr(A | B) = \frac{\Pr(B | A)\Pr(A)}{\Pr(B)}$$

Replacing $B$ with observations $\textbf{y}$, $A$ with parameter set $\Theta$, and probabilities $\Pr$ with densities $p$ (or sometimes $\pi$ or function $f$), results in the following

$$
p(\Theta | \textbf{y}) = \frac{p(\textbf{y} | \Theta)p(\Theta)}{p(\textbf{y})}$$

where $p(\textbf{y})$ will be discussed below, $p(\Theta)$ is the set of prior distributions of parameter set $\Theta$ before $\textbf{y}$ is observed, $p(\textbf{y} | \Theta)$ is the likelihood of $\textbf{y}$ under a model, and $p(\Theta | \textbf{y})$ is the joint posterior distribution of parameter set $\Theta$, sometimes called the full posterior distribution. The joint posterior expresses uncertainty about $\Theta$ after taking both the prior and the data into account. Since there are usually multiple parameters, $\Theta$ represents a set of $j$ parameters, and may be considered hereafter in this article as

$$\Theta = \theta_1,...,\theta_j$$

The denominator

$$p(\textbf{y}) = \int p(\textbf{y} | \Theta)p(\Theta) d\Theta$$

defines the ``marginal likelihood'' of $\textbf{y}$, or the ``prior predictive distribution'' of $\textbf{y}$, and may be set to an unknown constant $\textbf{c}$. The prior predictive distribution\footnote{The predictive distribution was introduced by \citet{jeffreys61}.} indicates what $\textbf{y}$ should look like, given the model, before $\textbf{y}$ has been observed. Only the set of prior probabilities and the model's likelihood function are used for the marginal likelihood of $\textbf{y}$. The presence of the marginal likelihood of $\textbf{y}$ normalizes the joint posterior distribution, $p(\Theta | \textbf{y})$, ensuring it is a proper distribution and integrates to one.
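The role of $p(\textbf{y})$ as a normalizing constant can be checked numerically. The sketch below is only an illustration under stated assumptions: a single parameter $\theta \in (0, 1)$, a binomial likelihood, a uniform prior, hypothetical data (7 successes in 10 trials), and a midpoint-rule grid in place of the exact integral. None of these choices come from the text above.

```python
from math import comb


def likelihood(theta, successes, trials):
    """Binomial likelihood p(y | theta)."""
    return comb(trials, successes) * theta**successes * (1 - theta) ** (trials - successes)


def prior(theta):
    """Uniform prior density on (0, 1); an assumption for this sketch."""
    return 1.0


def grid_posterior(successes, trials, n_grid=10_000):
    """Approximate p(theta | y) on a grid of interval midpoints."""
    width = 1.0 / n_grid
    grid = [(k + 0.5) * width for k in range(n_grid)]
    unnormalized = [likelihood(t, successes, trials) * prior(t) for t in grid]
    # Marginal likelihood: p(y) = integral of p(y | theta) p(theta) d theta
    marginal = sum(unnormalized) * width
    posterior = [u / marginal for u in unnormalized]
    return grid, posterior, marginal, width


grid, posterior, marginal, width = grid_posterior(successes=7, trials=10)
total_mass = sum(posterior) * width  # integral of the posterior density
```

Dividing by `marginal` is what makes `total_mass` equal to one up to floating-point error. With a uniform prior and a binomial likelihood, the exact marginal likelihood is $1/(n+1)$, which gives an independent check on the grid approximation.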

103
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-17 09:39:42
The core of models implemented in \pkg{brms} is the prediction of the response $y$ through predicting all parameters $\theta_p$ of the response distribution $D$, which is also called the model \code{family} in many R packages. We write
$$y_i \sim D(\theta_{1i}, \theta_{2i}, ...)$$
to stress the dependency on the $i$\textsuperscript{th} observation. Every parameter $\theta_p$ may be regressed on its own predictor term $\eta_p$ transformed by the inverse link function $f_p$, that is, $\theta_{pi} = f_p(\eta_{pi})$\footnote{A parameter can also be assumed constant across observations so that a linear predictor is not required.}. Such models are typically referred to as \emph{distributional models}\footnote{The models described in \citet{brms1} are a sub-class of the models described here.}. Details about the parameterization of each \code{family} are given in \code{vignette("brms\_families")}.
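To make the idea of predicting every distributional parameter concrete, here is a minimal Python sketch, not \pkg{brms} code. It assumes a normal response distribution whose mean uses an identity link and whose standard deviation uses a log link, each with its own linear predictor in one covariate. The function name, coefficient values, and covariate values are hypothetical.

```python
import math
import random


def simulate_distributional(x, beta_mu, beta_sigma, seed=1):
    """Draw y_i ~ Normal(mu_i, sigma_i) with both parameters depending on x_i."""
    rng = random.Random(seed)
    draws = []
    for x_i in x:
        eta_mu = beta_mu[0] + beta_mu[1] * x_i           # predictor for the mean
        eta_sigma = beta_sigma[0] + beta_sigma[1] * x_i  # predictor for the sd
        mu_i = eta_mu                   # identity inverse link
        sigma_i = math.exp(eta_sigma)   # exp inverse link keeps sigma_i positive
        draws.append((mu_i, sigma_i, rng.gauss(mu_i, sigma_i)))
    return draws


# Hypothetical covariate values and coefficients
draws = simulate_distributional([0.0, 1.0, 2.0], beta_mu=(1.0, 0.5), beta_sigma=(-1.0, 0.3))
```

The inverse link is what lets an unconstrained predictor $\eta_{pi}$ drive a constrained parameter such as a standard deviation.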

Suppressing the index $p$ for simplicity, a predictor term $\eta$ can generally be written as
$$
\eta = \mathbf{X} \beta + \mathbf{Z} u + \sum_{k = 1}^K s_k(x_k)
$$
In this equation, $\beta$ and $u$ are the coefficients at the population level and group level, respectively, and $\mathbf{X}, \mathbf{Z}$ are the corresponding design matrices. The terms $s_k(x_k)$ symbolize optional smooth functions of unspecified form based on covariates $x_k$ fitted via splines (see \citet{wood2011} for the underlying implementation in the \pkg{mgcv} package) or Gaussian processes \citep{williams1996}. The response $y$ as well as $\mathbf{X}$, $\mathbf{Z}$, and $x_k$ make up the data, whereas $\beta$, $u$, and the smooth functions $s_k$ are the model parameters being estimated. The coefficients $\beta$ and $u$ may be more commonly known as fixed and random effects, but I avoid these terms following the recommendations of \citet{gelmanMLM2006}. Details about prior distributions of $\beta$ and $u$ can be found in \citet{brms1} and under \code{help("set\_prior")}.
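The additive structure of $\eta$ can be sketched in a few lines of plain Python. This is an illustration only: the design matrices, coefficients, and the use of `math.sin` as a stand-in for a fitted smooth $s_1$ are all hypothetical, and nothing here reflects how \pkg{brms} or \pkg{mgcv} build these terms internally.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predictor(X, beta, Z, u, smooths, covariates):
    """Compute eta = X beta + Z u + sum_k s_k(x_k), one value per observation."""
    population_part = matvec(X, beta)
    group_part = matvec(Z, u)
    n_obs = len(X)
    smooth_part = [
        sum(s_k(x_k[i]) for s_k, x_k in zip(smooths, covariates))
        for i in range(n_obs)
    ]
    return [p + g + s for p, g, s in zip(population_part, group_part, smooth_part)]


# Hypothetical data: 4 observations in 2 groups
X = [[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]]   # intercept and one covariate
beta = [0.5, 2.0]                              # population-level coefficients
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]           # group membership indicators
u = [-0.3, 0.3]                                # group-level intercepts
smooths = [math.sin]                           # stand-in for a fitted smooth s_1
covariates = [[0.0, 0.5, 1.0, 1.5]]            # x_1 for each observation

eta = predictor(X, beta, Z, u, smooths, covariates)
```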

104
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-17 09:40:15
As an alternative to the strictly additive formulation described above, predictor terms may also have any form specifiable in Stan. We call it a \emph{non-linear} predictor and write
$$\eta = f(c_1, c_2, ..., \phi_1, \phi_2, ...)$$
The structure of the function $f$ is given by the user, $c_r$ are known or observed covariates, and $\phi_s$ are non-linear parameters each having its own linear predictor term $\eta_{\phi_s}$ of the form specified above. In fact, we should think of non-linear parameters as placeholders for linear predictor terms rather than as parameters themselves. A frequentist implementation of such models, which inspired the non-linear syntax in \pkg{brms}, can be found in the \pkg{nlme} package \citep{nlme2016}.

105
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-17 09:40:42
While some non-linear relationships, such as quadratic relationships, can be expressed within the basic R formula syntax, other more complicated ones cannot. For this reason, it is possible in \pkg{brms} to fully specify non-linear predictor terms similar to how it is done in \pkg{nlme}, but fully compatible with the extended multilevel syntax described above. Suppose, for instance, we want to model the non-linear growth curve
$$
y = b_1 \left(1 - \exp\left(-(x / b_2)^{b_3}\right)\right)
$$
between $y$ and $x$ with parameters $b_1$, $b_2$, and $b_3$ (see Example 3 in this paper for an implementation of this model with real data). Furthermore, we want all three parameters to vary by a grouping variable $g$ and model those group-level effects as correlated. Additionally, $b_1$ should be predicted by a covariate $z$. We can express this in \pkg{brms} using multiple formulas, one for the non-linear model itself and one per non-linear parameter.
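The following Python sketch is not \pkg{brms} syntax. It only evaluates the growth curve numerically and shows how $b_1$ acts as a placeholder for its own linear predictor in $z$ plus a group-level deviation. The function names and every numeric value are hypothetical.

```python
import math


def growth_curve(x, b1, b2, b3):
    """Evaluate y = b1 * (1 - exp(-(x / b2) ** b3)) for x >= 0 and b2 > 0."""
    return b1 * (1.0 - math.exp(-((x / b2) ** b3)))


def b1_linear_predictor(z, intercept, slope, group_effect):
    """Linear predictor for b1: a population-level part plus a group-level deviation."""
    return intercept + slope * z + group_effect


# Hypothetical values for one observation in one group
b1 = b1_linear_predictor(z=1.0, intercept=10.0, slope=2.0, group_effect=-0.5)
y_at_scale = growth_curve(x=3.0, b1=b1, b2=3.0, b3=1.5)
```

At $x = b_2$ the curve equals $b_1 (1 - e^{-1})$ regardless of $b_3$, and it approaches the asymptote $b_1$ as $x$ grows.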


107
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-17 10:45:05
Model
The Bayesian one-sample t-test makes the assumption that the observations are normally distributed with mean $\mu$ and variance $\sigma^2$. The model is then reparametrized in terms of the standardized effect size $\delta = \mu/\sigma$. For the standardized effect size, a Cauchy prior with location zero and scale $r = 1/\sqrt{2}$ is used. For the variance $\sigma^2$, Jeffreys's prior is used: $p(\sigma^2) \propto 1/\sigma^2$.
  
In this example, we are interested in comparing the null model $\mathcal{H}_0$, which posits that the effect size $\delta$ is zero, to the alternative hypothesis $\mathcal{H}_1$, which assigns $\delta$ the Cauchy prior described above.
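To get a feel for the Cauchy prior on $\delta$ under $\mathcal{H}_1$, the Python sketch below codes its density and distribution function with scale $r = 1/\sqrt{2}$ and computes how much prior mass falls within one scale unit of zero. The function and variable names are illustrative assumptions; this is not the implementation used by any particular package.

```python
import math

R_SCALE = 1 / math.sqrt(2)  # prior scale r from the text


def cauchy_pdf(delta, scale=R_SCALE):
    """Density of a Cauchy distribution with location zero."""
    return 1.0 / (math.pi * scale * (1.0 + (delta / scale) ** 2))


def cauchy_cdf(delta, scale=R_SCALE):
    """Distribution function of a Cauchy distribution with location zero."""
    return 0.5 + math.atan(delta / scale) / math.pi


# Prior probability that |delta| < r under H1
mass_within_scale = cauchy_cdf(R_SCALE) - cauchy_cdf(-R_SCALE)
```

For any Cauchy distribution centered at zero, exactly half of the mass lies within one scale unit of the center, so this prior places probability $0.5$ on $|\delta| < 1/\sqrt{2}$.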
  

108
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-17 10:46:43
Model and Data
The model that we will use assumes that each of the $n$ observations $y_i$ (where $i$ indexes the observation, $i = 1,2,...,n$) is normally distributed with corresponding mean $\theta_i$ and a common known variance $\sigma^2$: $y_i \sim \mathcal{N}(\theta_i, \sigma^2)$. Each $\theta_i$ is drawn from a normal group-level distribution with mean $\mu$ and variance $\tau^2$: $\theta_i \sim \mathcal{N}(\mu, \tau^2)$. For the group-level mean $\mu$, we use a normal prior distribution of the form $\mathcal{N}(\mu_0, \tau^2_0)$. For the group-level variance $\tau^2$, we use an inverse-gamma prior of the form $\text{Inv-Gamma}(\alpha, \beta)$.

In this example, we are interested in comparing the null model $\mathcal{H}_0$, which posits that the group-level mean $\mu = 0$, to the alternative model $\mathcal{H}_1$, which allows $\mu$ to be different from zero. First, we generate some data from the null model:
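A minimal Python sketch of this data-generating step is given below. The sample size, the variances, and the seed are hypothetical values chosen for illustration, and the function name is an assumption; only the two-stage structure ($\theta_i$ drawn around $\mu = 0$, then $y_i$ drawn around $\theta_i$) comes from the model description above.

```python
import math
import random


def simulate_null_data(n, tau2, sigma2, mu=0.0, seed=12345):
    """Simulate theta_i ~ N(mu, tau2) and y_i ~ N(theta_i, sigma2) with mu = 0."""
    rng = random.Random(seed)
    # random.gauss takes a standard deviation, so convert from variances
    theta = [rng.gauss(mu, math.sqrt(tau2)) for _ in range(n)]
    y = [rng.gauss(theta_i, math.sqrt(sigma2)) for theta_i in theta]
    return theta, y


# Hypothetical settings (assumptions for illustration only)
theta, y = simulate_null_data(n=20, tau2=0.5, sigma2=1.0)
```

Fixing the seed makes the simulated data reproducible, which matters when the same data set is later used to compare $\mathcal{H}_0$ and $\mathcal{H}_1$.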

109
tulipsliu (user without verified transactions), employment verified, posted on 2020-12-17 10:47:25
In this vignette, we explain how bridge sampling can be performed when we are faced with parameter spaces that are in some way non-standard. We will look at simplex parameters and circular parameters.

Simplex parameters are encountered often, in particular in mixture models or when modeling compositional data, where a set of parameters $\theta_1, \dots, \theta_k$ is constrained by $0 \leq \theta_j \leq 1$ for all $j$ and $\sum_{j=1}^k \theta_j = 1$. This happens often when we use relative weights of several components, or when we model proportions or probabilities.
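One common way to handle the simplex constraint is to work with $k - 1$ unconstrained real numbers and map them onto the simplex. The Python sketch below uses a basic stick-breaking construction as an example of such a map. It is an assumption made for illustration and is not claimed to be the transformation used by any specific bridge sampling implementation.

```python
import math


def stick_breaking(z):
    """Map k-1 unconstrained reals to k weights that lie on the simplex."""
    theta = []
    remaining = 1.0  # length of the stick that is still unassigned
    for z_j in z:
        fraction = 1.0 / (1.0 + math.exp(-z_j))  # logistic map into (0, 1)
        piece = remaining * fraction
        theta.append(piece)
        remaining -= piece
    theta.append(remaining)  # the last weight takes whatever is left
    return theta


theta = stick_breaking([0.0, 0.0])  # each break takes half of what remains
```

Every output satisfies both constraints by construction: each weight lies between zero and one, and the weights sum to one.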

Circular parameters are angles that lie on the circle, that is, the parameters are given in degrees ($0^\circ$ to $360^\circ$) or radians ($0$ to $2\pi$). The core property of this type of parameter space is that it is periodic: for example, $\theta = 0^\circ$ and $\theta = 360^\circ$ denote the same angle. Another way to think of such parameters is as two-dimensional unit vectors, $\boldsymbol{x} = (x_1, x_2)$, which are constrained by $\sqrt{x_1^2 + x_2^2} = 1$.
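The two views of a circular parameter, as an angle and as a unit vector, can be converted into each other. The Python sketch below shows the conversion in radians; the function names are hypothetical and the code is only an illustration of the periodicity described above.

```python
import math


def angle_to_unit_vector(theta):
    """Represent an angle (in radians) as a point on the unit circle."""
    return (math.cos(theta), math.sin(theta))


def unit_vector_to_angle(x1, x2):
    """Recover the angle in [0, 2*pi) from a point on the unit circle."""
    return math.atan2(x2, x1) % (2 * math.pi)


# Angles that differ by a full turn map to the same unit vector
start = angle_to_unit_vector(0.0)
full_turn = angle_to_unit_vector(2 * math.pi)

# Round trip for an angle in the second half of the circle
recovered = unit_vector_to_angle(*angle_to_unit_vector(4.0))
```

The unit-vector form removes the artificial jump between $2\pi$ and $0$, since nearby angles always map to nearby points on the circle.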

