Bayesian inference is based on Bayes' theorem. Here is Bayes' theorem, equation \ref{bayestheorem}, again
$$\Pr(A | B) = \frac{\Pr(B | A)\Pr(A)}{\Pr(B)}$$
Replacing $B$ with observations $\textbf{y}$, $A$ with parameter set $\Theta$, and probabilities $\Pr$ with densities $p$ (or sometimes $\pi$ or function $f$) results in the following
$$p(\Theta | \textbf{y}) = \frac{p(\textbf{y} | \Theta)p(\Theta)}{p(\textbf{y})}$$
where $p(\textbf{y})$ will be discussed below, $p(\Theta)$ is the set of prior distributions of parameter set $\Theta$ before $\textbf{y}$ is observed, $p(\textbf{y} | \Theta)$ is the likelihood of $\textbf{y}$ under a model, and $p(\Theta | \textbf{y})$ is the joint posterior distribution, sometimes called the full posterior distribution, which expresses uncertainty about parameter set $\Theta$ after taking both the prior and the data into account. Since there are usually multiple parameters, $\Theta$ represents a set of $j$ parameters and may be considered hereafter in this article as
$$\Theta = \theta_1,...,\theta_j$$
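As a brief illustration (not drawn from the surrounding text, with $\alpha$, $\beta$, $n$, and $s$ introduced only for this example), suppose a single parameter $\theta \in (0,1)$ has a $\mathrm{Beta}(\alpha,\beta)$ prior and that $\textbf{y}$ consists of $n$ Bernoulli observations with $s$ successes. The formula above then gives
$$p(\theta | \textbf{y}) = \frac{\theta^s (1-\theta)^{n-s} \, \theta^{\alpha-1}(1-\theta)^{\beta-1} / B(\alpha,\beta)}{\int_0^1 \theta^s (1-\theta)^{n-s} \, \theta^{\alpha-1}(1-\theta)^{\beta-1} / B(\alpha,\beta) \, d\theta}$$
which simplifies to a $\mathrm{Beta}(\alpha+s, \beta+n-s)$ density, and the integral in the denominator is exactly the marginal likelihood discussed next.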
The denominator
$$p(\textbf{y}) = \int p(\textbf{y} | \Theta)p(\Theta) d\Theta$$
defines the ``marginal likelihood'' of $\textbf{y}$, or the ``prior predictive distribution'' of $\textbf{y}$, and, because it does not depend on $\Theta$, may be treated as an unknown constant $c$. The prior predictive distribution\footnote{The predictive distribution was introduced by \citet{jeffreys61}.} indicates what $\textbf{y}$ should look like, given the model, before $\textbf{y}$ has been observed. Only the set of prior probabilities and the model's likelihood function are used for the marginal likelihood of $\textbf{y}$. Dividing by the marginal likelihood of $\textbf{y}$ normalizes the joint posterior distribution, $p(\Theta | \textbf{y})$, ensuring that it is a proper distribution that integrates to one.
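To make this normalization concrete, the following sketch (not part of the discussion above) approximates the marginal likelihood on a grid for a single parameter $\theta$; the $\mathrm{Beta}(1,1)$ prior, the Bernoulli likelihood, the data vector, and the use of NumPy/SciPy are all illustrative assumptions.
\begin{verbatim}
# Illustrative sketch (not from the text): grid approximation of the
# marginal likelihood p(y) for a single parameter theta, assuming a
# Beta(1, 1) prior, Bernoulli observations, and an assumed data vector.
import numpy as np
from scipy import stats

y = np.array([1, 0, 1, 1, 0, 1, 1, 1])    # hypothetical observations
theta = np.linspace(0.001, 0.999, 999)    # grid over the parameter space

prior = stats.beta.pdf(theta, 1, 1)       # p(theta)
# Likelihood p(y | theta) evaluated at each grid point
likelihood = np.array([stats.bernoulli.pmf(y, t).prod() for t in theta])

# p(y) = integral of p(y | theta) p(theta) dtheta, via the trapezoidal rule
marginal = np.trapz(likelihood * prior, theta)

# Dividing by p(y) normalizes the joint posterior so it integrates to one
posterior = likelihood * prior / marginal
print(marginal, np.trapz(posterior, theta))  # the second value is ~1
\end{verbatim}
The same idea extends to multiple parameters, but the integral quickly becomes expensive to evaluate, which is one reason $p(\textbf{y})$ is often treated simply as the unknown constant $c$ above.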