Dynamic Regression Models
Luc Bauwens
Michel Lubrano
Jean-François Richard
DOI:10.1093/acprof:oso/9780198773122.003.0005
Abstract and Keywords
This chapter examines the use of dynamic regression models for inference and prediction with dynamic econometric models. It shows how to extend to the dynamic case the notion of Bayesian cut seen in the static case to justify conditional inference. The chapter also explains how Bayesian inference can be used for single-equation dynamic models. It then treats the particular case of models with autoregressive errors, discusses the specific issues raised by moving average errors, and illustrates the empirical use of the error correction model by an analysis of a money demand function for Belgium.
Keywords: dynamic regression models, econometric models, Bayesian inference, single-equation models, autoregressive errors, moving average errors, error correction, money demand
5.1 Introduction
The previous chapters have introduced all the essential tools of Bayesian analysis. Beyond this, our purpose in the rest of this book is to explain and illustrate how these tools can be used for inference and prediction with dynamic econometric models. This class of models is obviously very large, but stochastic difference equations that are linear in the variables (although not necessarily in the parameters) have been intensively used by econometricians for the last 20 years or so. Their justification, which is to a large extent due to their relative empirical success in economics, has even been grounded by the statistical theory of ‘reduction of dynamic experiments’ of Florens and Mouchart (1982, 1985a, 1985b) as explained in Section 5.2. We show how to extend to the dynamic case the notion of Bayesian cut seen in the static case to justify conditional inference, how to take account of non-stationarity in the Bayesian approach, and how to treat initial conditions which necessarily occur in dynamic models. In Section 5.3, we explain how Bayesian inference can be used for single-equation dynamic models and particularly a popular reparameterization known as the error correction model after Hendry and Richard (1982). In Section 5.4, we treat the particular case of models with autoregressive errors, and in Section 5.5, we discuss the specific issues of moving average errors. Finally, in Section 5.6, we illustrate the empirical use of the error correction model by an analysis of a money demand function for Belgium.
5.2 Statistical Issues Specific to Dynamic Models
Broadly speaking, a model is dynamic whenever the variables are indexed by time and appear with different time lags. For instance, yt = β0xt + β1xt−1 + ut is a simple dynamic model, called a distributed lag model. Here the dynamic structure appears in the exogenous variables. It can also appear in the endogenous variables, as in yt = αyt−1 + ut, which is an autoregressive (AR) model. Finally, the dynamic structure can also appear in the error process, as considered in Sections 5.4 and 5.5. In Chapter 8, we consider models where the dynamic structure is non-linear in the lags of the dependent variable, whereas it is linear in the models of this chapter. In Chapter 7, we consider a different class of models, where the dynamic structure is put on the variance of the error process. In Chapter 9, we consider linear dynamic systems of equations.
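As a purely illustrative sketch (the parameter values, variable names, and zero initial conditions below are our own choices, not part of the text), the two simple dynamic structures just mentioned can be simulated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
beta0, beta1, alpha = 1.0, 0.5, 0.8

# exogenous variable and white-noise errors
x = rng.normal(size=T)
u = rng.normal(scale=0.1, size=T)

# distributed lag model: y_t = beta0*x_t + beta1*x_{t-1} + u_t
y_dl = np.empty(T)
y_dl[0] = beta0 * x[0] + u[0]  # x_{-1} set to zero as initial condition
y_dl[1:] = beta0 * x[1:] + beta1 * x[:-1] + u[1:]

# autoregressive model: y_t = alpha*y_{t-1} + u_t, started at y_0 = 0
y_ar = np.empty(T)
y_ar[0] = u[0]
for t in range(1, T):
    y_ar[t] = alpha * y_ar[t - 1] + u[t]
```

Note how each simulated value of the AR process depends on the previous one, so an initial condition must be supplied before the recursion can start; this is the issue taken up in Subsection 5.2.3.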
(p.130) 5.2.1 Reductions: Exogeneity and Causality
Reductions by marginalization or by conditioning were introduced in Section 2.5 quite generally. No attention was paid to the issue of whether the model might be dynamic rather than static. The notions of cut and of exogeneity were defined relative to a sample of size T, i.e. they were global notions. In a static model (i.e. of independent observations), it does not make any difference if the cut is defined for the complete sample or for each observation. In the sequel of this book, we consider dynamic data generating processes, i.e. processes where the generated random variable is indexed by time. The function which associates t to xt is called the trajectory of the process and the ordered collection of observations the history of the process. It is convenient to note
Xst = (xs, xs+1, …, xt), 0 ≤ s ≤ t ≤ T, (5.1)
so that X0T denotes the whole history and X0t−1 the past of xt.
The first observation x0 plays a specific role and is called the initial condition of the process. It represents presample information, the state of the system when it begins to be observable. The model can be characterized by its data density f(X1T|x0, θ), obtained by conditioning on the initial value x0 (other approaches are discussed in Subsection 5.2.3). For model building, it is convenient to consider the data density from the point of view of sequential analysis, which is based on the analysis of the generating process of an observation xt conditional on the past and on the initial condition. This amounts to considering the following factorization:
f(X1T|x0, θ) = ∏t=1T f(xt|X0t−1, θ). (5.2)
In a sequential model like (5.2) it can be made apparent how prior information is revised by the arrival of a new observation. We can now introduce the definition of a sequential cut.
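The factorization (5.2) can be checked numerically in a simple case. The sketch below (a Gaussian AR(1) process with arbitrary parameter values of our choosing) evaluates the log data density both sequentially, as a sum of one-step-ahead conditional log densities, and directly, as the log density of the joint normal distribution of the sample given x0; the two evaluations coincide up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
a, sig, T, x0 = 0.6, 0.5, 50, 1.0

# simulate x_t = a*x_{t-1} + u_t conditionally on the initial value x_0
u = rng.normal(scale=sig, size=T)
x = np.empty(T)
prev = x0
for t in range(T):
    x[t] = a * prev + u[t]
    prev = x[t]

# (a) sequential evaluation (5.2): sum of one-step conditional log densities
lags = np.concatenate(([x0], x[:-1]))
resid = x - a * lags
loglik_seq = np.sum(-0.5 * np.log(2 * np.pi * sig**2)
                    - resid**2 / (2 * sig**2))

# (b) direct evaluation: x_1..x_T given x_0 is jointly normal with mean
# a^t x_0 and covariance sig^2 * L L', where L[t, i] = a^(t-i) for i <= t
t_idx = np.arange(1, T + 1)
mean = a ** t_idx * x0
L = np.tril(a ** (t_idx[:, None] - t_idx[None, :]))
Sigma = sig**2 * L @ L.T
dev = x - mean
_, logdet = np.linalg.slogdet(Sigma)
loglik_joint = -0.5 * (T * np.log(2 * np.pi) + logdet
                       + dev @ np.linalg.solve(Sigma, dev))
```

The sequential form is of course far cheaper to compute, which is one reason it is the natural starting point for model building.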
Definition 5.1
Let us consider a reparameterization of θ in α and β and a partition of xt into yt and zt. A Bayesian sequential cut is obtained if α and β are a priori independent and if
f(X1T|x0, θ) = ∏t=1T f(yt|zt, X0t−1, α) ∏t=1T f(zt|X0t−1, β). (5.3)
Viewed as a function of the parameters, the likelihood then factorizes correspondingly:
l(α, β; X1T) ∝ ∏t=1T f(yt|zt, X0t−1, α) ∏t=1T f(zt|X0t−1, β). (5.4)
An immediate consequence of a sequential cut is that the two members of the likelihood function (5.4) can be treated separately for inference as we shall see below. We have the following theorem given in Florens and Mouchart (1985a).
Theorem 5.2
If α, β, and zt operate a Bayesian sequential cut, then α and β are a posteriori independent.
Engle, Hendry, and Richard (1983) call this type of exogeneity weak exogeneity. In a dynamic model, there are subtleties due to the occurrence of lagged (p.131) variables. In (5.3), the first product does not represent the sampling density of (Y1T|Z1T, x0) and the second product is not the sampling density of (Z1T|x0). Therefore, at the stage of model building, a sequential cut is not a sufficient condition to separate the generating process into two subprocesses that could be specified separately. Because we are in a dynamic framework, we have to introduce a new definition, considering what we can call a global cut (called an initial cut by Florens and Mouchart 1985a).
Definition 5.3
Let us consider a reparameterization of θ in α and β and a partition of xt into yt and zt. A Bayesian global cut is obtained if α and β are a priori independent and if the data density can be factorized as
f(X1T|x0, θ) = f(Y1T|Z1T, x0, α) f(Z1T|x0, β). (5.5)
We must point out that the parameters α and β are not necessarily the same in the sequential and in the global cut. A global cut introduces a restriction on the marginal process of zt which is
f(zt|X0t−1, β) = f(zt|Z0t−1, β), ∀t.
This means that the past of yt is of no use for predicting zt. This is the notion of non-causality due to Granger (1969). When both a sequential and a global cut hold, we have the notion of strong exogeneity introduced by Engle, Hendry, and Richard (1983).
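Granger non-causality is easy to visualize by simulation. In the bivariate sketch below (with arbitrary coefficients of our choosing), zt is generated without any feedback from past yt; regressing zt on one lag of both variables then yields a coefficient on lagged yt close to zero, while lagged zt does help predict yt.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000

# bivariate VAR(1) in (y, z) with A_zy = 0: past y does not enter z's process
y = np.zeros(T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.normal()               # z depends on own past only
    y[t] = 0.4 * y[t - 1] + 0.3 * z[t - 1] + rng.normal()

# regress z_t on a constant, z_{t-1}, and y_{t-1}
X = np.column_stack([np.ones(T - 1), z[:-1], y[:-1]])
coef, *_ = np.linalg.lstsq(X, z[1:], rcond=None)
# coef[2], the coefficient on lagged y, is close to zero: y does not
# Granger-cause z in this design
```

Here zt is nevertheless correlated with current and past yt; non-causality is a statement about predictive irrelevance of the past of y, not about independence.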
Definition 5.4
Let us consider a stochastic process in xt indexed by θ and a partition of xt into yt and zt. The variable zt is said to be strongly exogenous if the reparameterization of θ into α and β operates a sequential cut (zt is weakly exogenous for inference on α) and yt does not Granger-cause zt.
If there is weak exogeneity, the only part of the data density that is relevant for inference on α is the first product in (5.4), which is indeed the likelihood kernel of α. The posterior density of α is obtained by
φ(α|X1T, x0) ∝ φ(α) ∏t=1T f(yt|zt, X0t−1, α). (5.6)
The important thing to notice in (5.6) is that we do not need the second product in (5.3) even though it depends on y. We do not need to specify the marginal density f(zt|X0t−1, β) for inference on α.
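To illustrate (5.6), the following sketch computes the posterior of a scalar coefficient α in the conditional model yt = αzt + ut with known error variance and a normal prior. Note that the code never specifies the marginal process of zt, exactly as weak exogeneity permits; the prior values and the data generating choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
T, sig2 = 200, 1.0

# z follows some marginal process whose parameters (beta) are never needed
z = np.cumsum(rng.normal(size=T)) * 0.1 + rng.normal(size=T)
alpha_true = 0.7
y = alpha_true * z + rng.normal(scale=np.sqrt(sig2), size=T)

# normal prior alpha ~ N(m0, v0); the posterior uses only the conditional
# likelihood prod_t f(y_t | z_t, alpha), as in (5.6)
m0, v0 = 0.0, 10.0
v_post = 1.0 / (1.0 / v0 + z @ z / sig2)          # posterior variance
m_post = v_post * (m0 / v0 + z @ y / sig2)        # posterior mean
```

This is the standard normal-normal updating formula; the only role of z is as a conditioning variable in the likelihood of α.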
For predictive inference on y given z, weak exogeneity is not sufficient. Let us start from the predictive density of X1T which is
f(X1T|x0) = ∫ f(X1T|x0, θ)φ(θ) dθ. (5.7)
(p.132) If there is a global cut, this becomes
f(X1T|x0) = ∫ f(Y1T|Z1T, x0, α)φ(α) dα × ∫ f(Z1T|x0, β)φ(β) dβ, (5.8)
because
φ(θ) = φ(α)φ(β) (5.9)
and the data density factorizes as in (5.5).
From (5.8) we see that we can forget f(Z1T|x0, β) and φ(β) to compute the predictive density of y given z. However, if there is only weak exogeneity, the first product in (5.3) is not the conditional density f(Y1T|Z1T, x0, α) that we need to compute f(Y1T|Z1T, x0), because the second product in (5.3) depends on y. So in dynamic models, weak exogeneity is necessary and sufficient for posterior inference, as in static models, but strong exogeneity is necessary for prediction.
5.2.2 Reduction of a VAR Model to an ADL Equation
A VAR (vector autoregressive) model with independent normal error terms is a commonly used representation of a dynamic multivariate stochastic process. Let us consider the k-dimensional random variable xt. The VAR model is written
[Ik − A(L)]xt = vt (5.10)
where A(L) is a matrix of lag polynomials of order p (without a term of degree 0 in L):
A(L) = A1L + A2L2 + … + ApLp (5.11)
and vt ~ Nk(0,Σ). For simplicity, we do not introduce deterministic variables in (5.10). In what follows, we assume that the initial conditions x0 … x−p are known, but we do not write them explicitly as conditioning variables. So we write simply Xt−1 for the past of xt, including the initial conditions. The VAR model gained considerable popularity with the work of Sims (1980). Because it requires many parameters, and therefore many observations, and because it often lacks a ‘structural’ interpretation, econometricians are interested in admissible reductions of this model. We can partition xt and Σ conformably in
xt = (yt, zt′)′,  Σ = [Σyy Σyz; Σzy Σzz], (5.12)
where yt is a scalar and zt has k − 1 elements. This partition is made because we wish to find a regression equation where yt is the explained variable and zt contains the explanatory variables. We continue by proposing the conformable partitioning of A(L) in
A(L) = (Ay(L); Az(L)) = [Ayy(L) Ayz(L); Azy(L) Azz(L)], (5.13)
where Ay(L) is the first row of A(L) and Az(L) contains the remaining k − 1 rows.
(p.133) We can factorize the normal distribution of xt
xt|Xt−1 ~ Nk(A(L)xt,Σ) (5.14)
into the marginal distribution of zt
zt|Xt−1 ~ Nk−1(Az(L)xt, Σzz) (5.15)
and the conditional distribution of yt|zt
yt|zt,Xt−1 ~ N(c′zt + b(L)′xt,σ2), (5.16)
where
c = Σzz−1Σzy,
b(L) = Ay(L) − c′Az(L), (5.17)
σ2 = Σyy − ΣyzΣzz−1Σzy.
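The mapping (5.17) from the joint parameters to the conditional-model parameters can be checked numerically. In the sketch below (with an arbitrary positive definite Σ of our choosing), c and σ2 are computed from the partition of Σ, and a Monte Carlo regression of y on z recovers c.

```python
import numpy as np

rng = np.random.default_rng(4)

# a 3-variable example: y_t is the first component, z_t the remaining two
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])
Syy, Syz = Sigma[0, 0], Sigma[0, 1:]
Szy, Szz = Sigma[1:, 0], Sigma[1:, 1:]

# conditional-model parameters as in (5.17)
c = np.linalg.solve(Szz, Szy)                 # c = Szz^{-1} Szy
s2 = Syy - Syz @ np.linalg.solve(Szz, Szy)    # sigma^2

# Monte Carlo check: regressing y on z recovers c
V = rng.multivariate_normal(np.zeros(3), Sigma, size=200000)
y, z = V[:, 0], V[:, 1:]
c_hat, *_ = np.linalg.lstsq(z, y, rcond=None)
```

As in the static case, these are just the formulas for conditioning in a multivariate normal distribution, here applied to the innovation of the VAR.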
A sequential cut is obtained if we define the parameters α and β introduced in (5.3) as
α = [c, b(L), σ2],
β = [Az(L), Σzz]. (5.18)
We assume prior independence between α and β. From (5.15) and (5.16), we see that z is weakly exogenous for α without further restrictions, because (5.3) holds automatically in the VAR model. For strong exogeneity, however, we need the restriction of Granger non-causality:
Azy(L)=0, (5.19)
so that lagged values of yt do not appear in the marginal model (5.15). As weak exogeneity is automatically satisfied, the conditional model (5.16) can be analysed independently of the marginal model (5.15). This leads to the regression equation
yt = c′zt + b(L)′xt + ut, (5.20)
where ut ~ N(0,σ2). Introducing the partition
b(L)′ = (by(L), bz(L)′) (5.21)
we can express (5.20) as
yt = by(L)yt + c′ zt + bz(L)′zt + ut. (5.22)
Inference in this type of dynamic regression model, called the ADL (Autoregressive Distributed Lag) model is studied in the next section.
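As an illustration of the ADL form (5.22), the sketch below simulates an ADL(1,1) equation with one explanatory variable and arbitrary coefficients of our choosing, and recovers them by least squares from the stacked regressor matrix (yt−1, zt, zt−1).

```python
import numpy as np

rng = np.random.default_rng(5)
T = 10000
by, c, bz = 0.6, 0.5, -0.2   # illustrative ADL(1,1) coefficients

z = rng.normal(size=T)
u = rng.normal(scale=0.3, size=T)
y = np.zeros(T)
for t in range(1, T):
    # ADL(1,1): y_t = by*y_{t-1} + c*z_t + bz*z_{t-1} + u_t
    y[t] = by * y[t - 1] + c * z[t] + bz * z[t - 1] + u[t]

# stack the regressors (y_{t-1}, z_t, z_{t-1}) and estimate by least squares
X = np.column_stack([y[:-1], z[1:], z[:-1]])
theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
```

With |by| < 1 the process is stable and least squares is consistent despite the lagged dependent variable; Bayesian inference in this model is the topic of the next section.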
(p.134) As in the static case, the (weak) exogeneity property in the dynamic case is a direct consequence of the properties of the multivariate normal distribution. Suppose now that we have incidental parameters, so that (5.10) becomes
[Ik − A(L)](xt − μt) = vt. (5.23)
Let us partition the incidental mean vector μt as xt in (5.12) and let us assume that μt is constrained by the linear relation
μyt = cˉ′ μzt. (5.24)
We have the joint distribution of xt
xt|Xt−1 ~ Nk(μt + A(L) (xt − μt), Σ), (5.25)
which factorizes into the marginal model
zt|Xt−1 ~ Nk−1(μzt + Az(L) (xt − μt), Σzz), (5.26)
and, given (5.24), the conditional model
yt|zt, Xt−1 ~ N(c′zt + b(L)′xt − b(L)′μt + (cˉ − c)′μzt, σ2). (5.27)
The parameters of (5.27) are defined by (5.17). To obtain a conditional model without incidental parameters, we need the restriction
c = cˉ (5.28)
to eliminate the term (cˉ − c)′μzt, and, given (5.28) and (5.24),
cˉ′Azz(L) − Ayz(L) = [Ayy(L) − cˉ′Azy(L)]cˉ′ (5.29)
to eliminate the term b(L)′μt. With these restrictions, if we define the parameters α and β of (5.4) by
α = [cˉ, b(L), σ2],
β = [Az(L), Σzz, μ1, …, μT], (5.30)
z is weakly exogenous for α. Imposing (5.19) also gives strong exogeneity.
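The role of these restrictions can be verified numerically. In the sketch below (one lag, a scalar yt, a two-dimensional zt, coefficients drawn at random by us), Ayz is chosen so that, with c = cˉ imposed, the term b(L)′μt vanishes whenever the incidental means satisfy μyt = cˉ′μzt; the eliminated term is then zero up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(6)

# one lag, y scalar, z with 2 elements; pick the free blocks of A at random
cbar = np.array([0.4, -0.3])
Ayy = 0.5
Azy = rng.normal(size=2)        # coefficients of lagged y in the z-rows
Azz = rng.normal(size=(2, 2))   # coefficients of lagged z in the z-rows
# choose Ayz so that b(L)' mu_t vanishes whenever mu_yt = cbar' mu_zt
Ayz = cbar @ Azz - (Ayy - cbar @ Azy) * cbar

# b(L)' = Ay(L) - c' Az(L), with c = cbar imposed
b_y = Ayy - cbar @ Azy
b_z = Ayz - cbar @ Azz

# any constrained incidental mean mu_t = (cbar' mu_zt, mu_zt')'
mu_z = rng.normal(size=2)
mu_y = cbar @ mu_z
val = b_y * mu_y + b_z @ mu_z   # b(L)' mu_t for this single lag
```

With the restriction imposed, b_z = −(Ayy − cbar′Azy)cbar′, so the two terms of b(L)′μt cancel for every admissible μzt.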
5.2.3 Treatment of Initial Observations
In a dynamic model, the distribution of the initial observations (hereafter denoted y0) plays a special role.