抽样问题 - 第2页 - 经管之家

11楼

ywh19860616 发表于 2010-10-18 10:35:52

#epoh,谢谢您的热心帮助

12楼

zhangtao 发表于 2010-10-18 21:52:36

ywh19860616 朋友，能不能把你在9楼所附文件的完整文章传上来，我学习学习，非常感谢！
要是不方便，就算了，还是非常感谢你！

13楼

ywh19860616 发表于 2010-10-19 13:21:28

zhangtao兄弟，完整文章我就不上传了，同学的
您可以搜索，作者：林馨怡，国立政治大学经济学系

14楼

zhangtao 发表于 2010-10-19 16:04:04

非常非常感谢！

15楼

epoh 发表于 2010-10-19 17:00:46

Block Bootstrap:resample blocks instead of single elements.
                        the dependence structure is preserved.

##An illustration of "Moving Block Bootstrap";
##the blue points in the three yellow rectangles
##(blocks) are bootstrap samples.

x = 5 * sin(seq(0, pi, length = 90)) + rnorm(90)
plot(x, main = "Illustration of Moving Block Bootstrap (MBB)")
for (idx in 1:(length(x) - 30 + 1)) {
rect(idx, min(x[idx:(idx + 30 - 1)]), idx + 30 - 1, max(x[idx:(idx +
30 - 1)]), border = "gray")
Sys.sleep(0.2)
}
bt = sample(1:(length(x) - 30 + 1), 3, rep = T)
for (b in bt) {
rect(b, min(x[b:(b + 30 - 1)]), b + 30 - 1, max(x[b:(b +
30 - 1)]), col = "yellow")
points(b:(b + 30 - 1), x[b:(b + 30 - 1)], col = "blue", pch = 19)
}

##optimal Block-Length Selection
Automatic Block-Length Selection for the Dependent Bootstrap
http://econ.duke.edu/~ap172/Politis_White_2004.pdf

CORRECTION TO “AUTOMATIC BLOCK-LENGTH SELECTION...."
http://econ.duke.edu/~ap172/Patton_Politis_White_2009.pdf

## ppw.R
## $Id: ppw.R,v 1.47 2008/12/12 14:52:17 jracine Exp jracine $

## Original code in Matlab by A. Patton, R translation and
## modifications by C. Parmeter and J. Racine.
##
## We are grateful to Andrew Patton and Dimitris Politis for their
## assistance and feedback. Kindly report features, deficiencies, and
## improvements to racinej@mcmaster.ca.
##
## The citation is A. Patton, D.N. Politis, and H. White (2008,
## forthcoming), "CORRECTION TO `Automatic Block-Length Selection for
## the Dependent Bootstrap' by D.N. Politis and H. White". This is
## based on the article by Politis, D.N., and H. White (2004),
## "Automatic block-length selection for the dependent bootstrap."
## Econometric Reviews, vol. 23.
##
## INPUTS:  data, an n x k matrix.
##
## OUTPUTS: b.star, a 2 x k vector of optimal bootstrap block lengths
## for the stationary bootstrap and circular bootstrap (BstarSB,
## BstarCB).

## The function lam() is used to construct a "flat-top" lag window for
## spectral estimation based on Politis, D.N. and J.P. Romano (1995),
## "Bias-Corrected Nonparametric Spectral Estimation", Journal of Time
## Series Analysis, vol. 16, No. 1.

lam <- function(s){
  return((abs(s)>=0)*(abs(s)<0.5)+2*(1-abs(s))*(abs(s)>=0.5)*(abs(s)<=1))
}

## The function b.star() returns the optimal bootstrap block
## lengths. Note that an example for usage appears at the bottom of
## this file. If you use this function as input into a routine such as
## tsboot() in the boot library (Angelo Canty and Brian Ripley
## (2008). boot: Bootstrap R (S-Plus) Functions. R package version
## 1.2-34.) you ought to use the option round=TRUE.

b.star <- function(data,
               Kn = NULL,
               mmax= NULL,
               Bmax = NULL,
               c = NULL,
               round = FALSE){

  ## Convert the data object to a data frame to handle both vectors
  ## and matrices.

  data <- data.frame(data)
  n <- nrow(data)
  k <- ncol(data)

  ## Set Defaults. Note that in footnote c, page 59, for Kn Politis
  ## and White (2004) use max(5,log10(n)). Since this must be an
  ## integer we use ceiling(log10(n)).

  if (is.null(Kn)) Kn <- max(5,ceiling(log10(n)))
  if (is.null(mmax)) mmax <- ceiling(sqrt(n))+Kn
  if (is.null(Bmax)) Bmax <- ceiling(min(3*sqrt(n),n/3))
  if (is.null(c)) c <- qnorm(0.975)

  ## Create two vectors of length k in which we store results.

  BstarSB <- numeric(length=k)
  BstarCB <- numeric(length=k)

  ## Now we loop through each variable in data (i.e., column,
  ## data[,i]).

  for(i in 1:k) {

## We first obtain the autocorrelations rho(1),...,rho(mmax) (we
## need to drop the first autocorrelation as it is rho(0), hence
## acf[-1]). This is the default in acf [type="correlation"]. Note
## that Patton uses sample correlations after dropping the first
## mmax observations, while we instead use the acf to obtain
## rho(k).

rho.k <- acf(data[,i],
               lag.max = mmax,
               type = "correlation",
               plot = FALSE)$acf[-1]

## Next we compute mhat. The use of c*sqrt(log10(n)/n) for
## critical values is given in footnote c of Politis and White
## (2004, page 59), and the approach for determining mhat is
## described in footnote c.

rho.k.crit <- c*sqrt(log10(n)/n)

## Compute the number of insignificant runs following each rho(k),
## k=1,...,mmax.

num.insignificant <- sapply(1:(mmax-Kn+1),
                              function(j){
                              sum((abs(rho.k) < rho.k.crit)[j:(j+Kn-1)])
                              })

## If there are any values of rho(k) for which the Kn proceeding
## values of rho(k+j), j=1,...,Kn are all insignificant, take the
## smallest rho(k) such that this holds (see footnote c for
## further details).

if(any(num.insignificant==Kn)) {
   mhat <- which(num.insignificant==Kn)[1]
} else {

   ## If no runs of length Kn are insignificant, take the smallest
   ## value of rho(k) that is significant.

   if(any(abs(rho.k) > rho.k.crit)) {

      lag.sig <- which(abs(rho.k) > rho.k.crit)
      k.sig <- length(lag.sig)

      if(k.sig == 1) {

      ## When only one lag is significant, mhat is the sole
      ## significant rho(k).

      mhat <- lag.sig

      } else {

      ## If there are more than one significant lags but no runs
      ## of length Kn, take the largest value of rho(k) that is
      ## significant.

      mhat <- max(lag.sig)

      }

   } else {

      ## When there are no significant lags, mhat must be the
      ## smallest positive integer (footnote c), hence mhat is set
      ## to one.

      mhat <- 1

   }

}

## Compute M (mhat is at least one).

M <- ifelse(2*mhat > mmax, mmax, 2*mhat)

## We compute BstarSB and BstarCB using the formulas in the above
## references. Now we require the autocovariance R(k) (hence
## type="covariance" in the acf call). Note that Patton uses
## sample covariances after dropping the first mmax observations,
## while we instead use the acf with type="covariance" to obtain
## R(k). Note also that we require R(0) hence we do not drop it as
## we did for rho(k) via acf(...)$acf[-1].

kk <- seq(-M,M)

R.k <- ccf(data[,i], data[,i],
            lag.max = M,
            type = "covariance",
            plot = FALSE)$acf

Ghat <- sum(lam(kk/M)*abs(kk)*R.k)
DCBhat <- 4/3*sum(lam(kk/M)*R.k)^2
DSBhat <- 2*sum(lam(kk/M)*R.k)^2
BstarSB <- ((2*Ghat^2)/DSBhat)^(1/3)*n^(1/3)
BstarCB <- ((2*(Ghat^2)/DCBhat)^(1/3))*(n^(1/3))

  }

  ## The user can choose whether they want rounded values returned or
  ## not. BstarCB is rounded up, BstarSB simply rounded but both must
  ## be positive integers.

  if(round == FALSE) {

BstarSB <- ifelse(BstarSB > Bmax, Bmax, BstarSB)
BstarCB <- ifelse(BstarCB > Bmax, Bmax, BstarCB)

  } else {

BstarSB <- ifelse(BstarSB > Bmax, Bmax, ifelse(BstarSB < 1, 1, round(BstarSB)))
BstarCB <- ifelse(BstarCB > Bmax, Bmax, ifelse(BstarCB < 1, 1, ceiling(BstarCB)))

  }

  return(cbind(BstarSB,BstarCB))

}

## Here is a simple example with an n x 2 matrix containing n=10^5
## observations, where column 2 of x is more persistent than column
## 1. This requires that you first install the forecast library (i.e.,
## install.packages("forecast")).
##
##  library(forecast)
##  set.seed(123)
##  x <- cbind(arima.sim(n = 100000, list(ar = c(.5,.0), ma = c(0,0)),sd = 1),
##          arima.sim(n = 100000, list(ar = c(.5,.4), ma = c(0,0)),sd = 1))
##  b.star(x)
##  b.star(x,round=TRUE)
##> b.star(x)
##    BstarSB BstarCB
##[1,]  50.39272  57.68526
##[2,] 251.62894 288.04323
##> b.star(x,round=TRUE)
##    BstarSB BstarCB
##[1,]    50    58
##[2,]    252    289

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 1	+ 1	+ 1	谢谢，这个程序我也看过，但是好像不适合面板数据选择最优block。

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

16楼

ywh19860616 发表于 2010-10-19 17:52:06

#epoh
我知道这个是自动选择分块数的程序b.star。但是对于面板数据，解释变量多于1个时，好像不适用
您看这个结果：
##    BstarSB BstarCB
##[1,]    50    58
##[2,]    252    289
对应2个解释变量，有不同分法的，这应该是不可以的。因为我数据y x1 x2肯定是放在一起抽样的，才会对应
而不是分开一个一个抽样。您觉得我的说法对吗？
问题：
如果我想把这个程序用于我的问题，应该怎么解决？
如果这个程序不能用，我的数据是以年度为单位的。block_length应该怎么选择为好，是直接选择等于截面个数吗？我觉得如果直接选择截面个数了，那就相当于在每个截面分别抽取个数相同的样本，这样可以理解。
在实际应用中，我的数据是如下排列的（也是R中面板数据排列格式），如数据格式为：
id  year
1 1997
1 1998
1 1999
1 2000
1 2001
2 1997
2 1998
2 1999
2 2000
2 2001
我取block-length=5，那就相当于以每个截面抽样，您帮我写的算法是这样的吗

17楼

楚韵荆风

发表于 2010-10-19 19:30:06

看来都是R高手，学习了！

共享是一种彼此的快乐

18楼

epoh 发表于 2010-10-19 19:32:05

看看package "plm" Linear Models for Panel Data
是不是适合你
data(EmplUK)
   firm year sector    emp wage capital output
1    1 1977    7 5.041 13.1516  0.5894  95.7072
2    1 1978    7 5.600 12.3018  0.6318  97.3569
3    1 1979    7 5.015 12.8395  0.6771  99.6083
4    1 1980    7 4.715 13.8039  0.6171 100.5501
5    1 1981    7 4.093 14.2897  0.5076  99.5581
6    1 1982    7 3.166 14.8681  0.4229  98.6151
7    1 1983    7 2.936 13.7784  0.3920 100.0301
8    2 1977    7  71.319 14.7909 16.9363  95.7072
9    2 1978    7  70.643 14.1036 17.2422  97.3569
10    2 1979    7  70.918 14.9534 17.5413  99.6083
11    2 1980    7  72.031 15.4910 17.6574 100.5501
12    2 1981    7  73.689 16.1969 16.7133  99.5581
13    2 1982    7  72.419 16.1314 16.2469  98.6151
14    2 1983    7  68.518 16.3051 17.3696 100.0301
15    3 1977    7  19.156 22.6920  7.0975  95.7072
..........

library(plm)
data("EmplUK", package="plm")

z2 <- pgmm(dynformula(log(emp)~log(wage)+log(capital),list(1,1,1)),
   data=EmplUK, effect="twoways", model="onestep",
   gmm.inst=~log(emp)+log(wage)+log(capital),lag.gmm=c(2,99),
   transformation="ld")
summary(z2,robust=TRUE)

Twoways effects One step model

Call:
pgmm(formula = dynformula(log(emp) ~ log(wage) + log(capital),
list(1, 1, 1)), data = EmplUK, effect = "twoways", model = "onestep",
gmm.inst = ~log(emp) + log(wage) + log(capital), lag.gmm = c(2,
      99), transformation = "ld")

Unbalanced Panel: n=140, T=7-9, N=1031

Number of Observations Used:  891

Residuals
   Min. 1st Qu.    Median    Mean 3rd Qu.    Max.
-6.571e-01 -4.703e-02  5.937e-03 -2.110e-13  5.784e-02  5.045e-01

Coefficients
                        Estimate    Std. Error z-value  Pr(>|z|)
lag(log(emp), 1)    0.935605 0.026295 35.5810 < 2.2e-16 ***
log(wage)             -0.630976 0.118054  -5.3448 9.050e-08 ***
lag(log(wage), 1) 0.482620 0.136887 3.5257 0.0004224 ***
log(capital)          0.483930 0.053867 8.9838 < 2.2e-16 ***
lag(log(capital), 1) -0.424393 0.058479 -7.2572 3.952e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Sargan Test: chisq(100) = 236.0249 (p.value=5.2115e-13)
Autocorrelation test (1): normal = -4.808434 (p.value=7.6059e-07)
Autocorrelation test (2): normal = -0.2800133 (p.value=0.38973)
Wald test for coefficients: chisq(5) = 11174.82 (p.value=< 2.22e-16)

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 1	+ 1	+ 1	epoh，谢谢我不是为了估计方程我是说您发的block块自动选择的程序好

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

19楼

ywh19860616 发表于 2010-10-19 19:37:40

#epoh，谢谢您，我不是为了估计面板数据
我上面列出只是给出数据格式，问问怎么抽样？因为我发现b.star不适合有两个解释变量时X1和X2
因为结果有两种：
##    BstarSB BstarCB
##[1,]    50    58
##[2,]    252    289

呵呵，让你费心了

20楼

epoh 发表于 2010-10-19 21:38:00

BOOTSTRAP FOR PANEL DATA MODELS.pdf

BOOTSTRAP FOR PANEL DATA MODELS.pdf (199.57 KB)

page 3/30
A panel dataset with N individuals
and T time periods is represented
by a matrix Y of N rows and T columns.

page 6/28 Block Bootstrap
Assume that T = Kl, with l the length of a block,
then there are K non-overlapping blocks.

依你的数据 year1997,1998,1999,2000,2001
T=5,K=5,l=1
l没的选了.

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 1	+ 1	+ 1	谢谢epoh，那数据是我假设的您的算法就是按照您给我的文章编写的？

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

抽样问题 [推广有奖]

浏览过的帖子

浏览过的版块

本版微信群