楼主: ywh19860616
19229 58

抽样问题 [推广有奖]

11
ywh19860616 发表于 2010-10-18 10:35:52
#epoh,谢谢您的热心帮助

12
zhangtao 发表于 2010-10-18 21:52:36
ywh19860616 朋友,能不能把你在9楼所附文件的完整文章传上来,我学习学习,非常感谢!
要是不方便,就算了,还是非常感谢你!

13
ywh19860616 发表于 2010-10-19 13:21:28
zhangtao兄弟,完整文章我就不上传了,同学的
您可以搜索,作者:林馨怡,国立政治大学经济学系

14
zhangtao 发表于 2010-10-19 16:04:04
非常非常感谢!

15
epoh 发表于 2010-10-19 17:00:46
Block Bootstrap:resample blocks instead of single elements.
                           the dependence structure is preserved.

##An illustration of "Moving Block Bootstrap";
##the blue points in the three yellow rectangles
##(blocks) are bootstrap samples.

x = 5 * sin(seq(0, pi, length = 90)) + rnorm(90)
plot(x, main = "Illustration of Moving Block Bootstrap (MBB)")
for (idx in 1:(length(x) - 30 + 1)) {
rect(idx, min(x[idx:(idx + 30 - 1)]), idx + 30 - 1, max(x[idx:(idx +
30 - 1)]), border = "gray")
Sys.sleep(0.2)
}
bt = sample(1:(length(x) - 30 + 1), 3, rep = T)
for (b in bt) {
rect(b, min(x[b:(b + 30 - 1)]), b + 30 - 1, max(x[b:(b +
30 - 1)]), col = "yellow")
points(b:(b + 30 - 1), x[b:(b + 30 - 1)], col = "blue", pch = 19)
}

##optimal Block-Length Selection
Automatic Block-Length Selection for the Dependent Bootstrap
http://econ.duke.edu/~ap172/Politis_White_2004.pdf

CORRECTION TO “AUTOMATIC BLOCK-LENGTH SELECTION...."
http://econ.duke.edu/~ap172/Patton_Politis_White_2009.pdf

## ppw.R
## $Id: ppw.R,v 1.47 2008/12/12 14:52:17 jracine Exp jracine $

## Original code in Matlab by A. Patton, R translation and
## modifications by C. Parmeter and J. Racine.
##
## We are grateful to Andrew Patton and Dimitris Politis for their
## assistance and feedback. Kindly report features, deficiencies, and
## improvements to racinej@mcmaster.ca.
##
## The citation is A. Patton, D.N. Politis, and H. White (2008,
## forthcoming), "CORRECTION TO `Automatic Block-Length Selection for
## the Dependent Bootstrap' by D.N. Politis and H. White". This is
## based on the article by Politis, D.N., and H. White (2004),
## "Automatic block-length selection for the dependent bootstrap."
## Econometric Reviews, vol. 23.
##
## INPUTS:  data, an n x k matrix.
##
## OUTPUTS: b.star, a 2 x k vector of optimal bootstrap block lengths
## for the stationary bootstrap and circular bootstrap (BstarSB,
## BstarCB).

## The function lam() is used to construct a "flat-top" lag window for
## spectral estimation based on Politis, D.N. and J.P. Romano (1995),
## "Bias-Corrected Nonparametric Spectral Estimation", Journal of Time
## Series Analysis, vol. 16, No. 1.

lam <- function(s){
  return((abs(s)>=0)*(abs(s)<0.5)+2*(1-abs(s))*(abs(s)>=0.5)*(abs(s)<=1))
}

## The function b.star() returns the optimal bootstrap block
## lengths. Note that an example for usage appears at the bottom of
## this file. If you use this function as input into a routine such as
## tsboot() in the boot library (Angelo Canty and Brian Ripley
## (2008). boot: Bootstrap R (S-Plus) Functions. R package version
## 1.2-34.) you ought to use the option round=TRUE.

b.star <- function(data,
                   Kn = NULL,
                   mmax= NULL,
                   Bmax = NULL,
                   c = NULL,
                   round = FALSE){

  ## Convert the data object to a data frame to handle both vectors
  ## and matrices.

  data <- data.frame(data)
  n <- nrow(data)
  k <- ncol(data)

  ## Set Defaults. Note that in footnote c, page 59, for Kn Politis
  ## and White (2004) use max(5,log10(n)). Since this must be an
  ## integer we use ceiling(log10(n)).

  if (is.null(Kn)) Kn <- max(5,ceiling(log10(n)))
  if (is.null(mmax)) mmax <- ceiling(sqrt(n))+Kn
  if (is.null(Bmax)) Bmax <- ceiling(min(3*sqrt(n),n/3))
  if (is.null(c)) c <- qnorm(0.975)

  ## Create two vectors of length k in which we store results.

  BstarSB <- numeric(length=k)
  BstarCB <- numeric(length=k)

  ## Now we loop through each variable in data (i.e., column,
  ## data[,i]).

  for(i in 1:k) {

    ## We first obtain the autocorrelations rho(1),...,rho(mmax) (we
    ## need to drop the first autocorrelation as it is rho(0), hence
    ## acf[-1]). This is the default in acf [type="correlation"]. Note
    ## that Patton uses sample correlations after dropping the first
    ## mmax observations, while we instead use the acf to obtain
    ## rho(k).

    rho.k <- acf(data[,i],
                 lag.max = mmax,
                 type = "correlation",
                 plot = FALSE)$acf[-1]

    ## Next we compute mhat. The use of c*sqrt(log10(n)/n) for
    ## critical values is given in footnote c of Politis and White
    ## (2004, page 59), and the approach for determining mhat is
    ## described in footnote c.

    rho.k.crit <- c*sqrt(log10(n)/n)

    ## Compute the number of insignificant runs following each rho(k),
    ## k=1,...,mmax.
   
    num.insignificant <- sapply(1:(mmax-Kn+1),
                                function(j){
                                  sum((abs(rho.k) < rho.k.crit)[j:(j+Kn-1)])
                                })

    ## If there are any values of rho(k) for which the Kn proceeding
    ## values of rho(k+j), j=1,...,Kn are all insignificant, take the
    ## smallest rho(k) such that this holds (see footnote c for
    ## further details).

    if(any(num.insignificant==Kn)) {
      mhat <- which(num.insignificant==Kn)[1]
    } else {

      ## If no runs of length Kn are insignificant, take the smallest
      ## value of rho(k) that is significant.

      if(any(abs(rho.k) > rho.k.crit)) {
        
        lag.sig <- which(abs(rho.k) > rho.k.crit)
        k.sig <- length(lag.sig)
        
        if(k.sig == 1) {
         
          ## When only one lag is significant, mhat is the sole
          ## significant rho(k).
         
          mhat <- lag.sig
         
        } else {

          ## If there are more than one significant lags but no runs
          ## of length Kn, take the largest value of rho(k) that is
          ## significant.
         
          mhat <- max(lag.sig)
         
        }
        
      } else {
        
        ## When there are no significant lags, mhat must be the
        ## smallest positive integer (footnote c), hence mhat is set
        ## to one.
        
        mhat <- 1
        
      }

    }

    ## Compute M (mhat is at least one).

    M <- ifelse(2*mhat > mmax, mmax, 2*mhat)

    ## We compute BstarSB and BstarCB using the formulas in the above
    ## references. Now we require the autocovariance R(k) (hence
    ## type="covariance" in the acf call). Note that Patton uses
    ## sample covariances after dropping the first mmax observations,
    ## while we instead use the acf with type="covariance" to obtain
    ## R(k). Note also that we require R(0) hence we do not drop it as
    ## we did for rho(k) via acf(...)$acf[-1].
      
    kk <- seq(-M,M)

    R.k <- ccf(data[,i], data[,i],
               lag.max = M,
               type = "covariance",
               plot = FALSE)$acf
   
    Ghat <- sum(lam(kk/M)*abs(kk)*R.k)
    DCBhat <- 4/3*sum(lam(kk/M)*R.k)^2
    DSBhat <- 2*sum(lam(kk/M)*R.k)^2
    BstarSB <- ((2*Ghat^2)/DSBhat)^(1/3)*n^(1/3)
    BstarCB <- ((2*(Ghat^2)/DCBhat)^(1/3))*(n^(1/3))
      
  }

  ## The user can choose whether they want rounded values returned or
  ## not. BstarCB is rounded up, BstarSB simply rounded but both must
  ## be positive integers.

  if(round == FALSE) {
   
    BstarSB <- ifelse(BstarSB > Bmax, Bmax, BstarSB)
    BstarCB <- ifelse(BstarCB > Bmax, Bmax, BstarCB)

  } else {

    BstarSB <- ifelse(BstarSB > Bmax, Bmax, ifelse(BstarSB < 1, 1, round(BstarSB)))
    BstarCB <- ifelse(BstarCB > Bmax, Bmax, ifelse(BstarCB < 1, 1, ceiling(BstarCB)))

  }
  
  return(cbind(BstarSB,BstarCB))
  
}

## Here is a simple example with an n x 2 matrix containing n=10^5
## observations, where column 2 of x is more persistent than column
## 1. This requires that you first install the forecast library (i.e.,
## install.packages("forecast")).
##
##  library(forecast)
##  set.seed(123)
##  x <- cbind(arima.sim(n = 100000, list(ar = c(.5,.0), ma = c(0,0)),sd = 1),
##             arima.sim(n = 100000, list(ar = c(.5,.4), ma = c(0,0)),sd = 1))
##  b.star(x)  
##  b.star(x,round=TRUE)
##>   b.star(x)  
##       BstarSB   BstarCB
##[1,]  50.39272  57.68526
##[2,] 251.62894 288.04323
##>   b.star(x,round=TRUE)
##     BstarSB BstarCB
##[1,]      50      58
##[2,]     252     289
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
ywh19860616 + 1 + 1 + 1 谢谢,这个程序我也看过,但是好像不适合面板数据选择最优block。

总评分: 学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

16
ywh19860616 发表于 2010-10-19 17:52:06
#epoh
我知道这个是自动选择分块数的程序b.star。但是对于面板数据,解释变量多于1个时,好像不适用
您看这个结果:
##     BstarSB BstarCB
##[1,]      50      58
##[2,]     252     289
对应2个解释变量,有不同分法的,这应该是不可以的。因为我数据y x1 x2肯定是放在一起抽样的,才会对应
而不是分开一个一个抽样。您觉得我的说法对吗?
问题:
如果我想把这个程序用于我的问题,应该怎么解决?
如果这个程序不能用,我的数据是以年度为单位的。block_length应该怎么选择为好,是直接选择等于截面个数吗?我觉得如果直接选择截面个数了,那就相当于在每个截面分别抽取个数相同的样本,这样可以理解。
在实际应用中,我的数据是如下排列的(也是R中面板数据排列格式),如数据格式为:
id  year
1   1997
1   1998
1   1999
1   2000
1   2001
2   1997
2   1998
2   1999
2    2000
2   2001
我取block-length=5,那就相当于以每个截面抽样,您帮我写的算法是这样的吗

17
楚韵荆风 学生认证  发表于 2010-10-19 19:30:06
看来都是R高手,学习了!
共享是一种彼此的快乐

18
epoh 发表于 2010-10-19 19:32:05
看看package "plm" Linear Models for Panel Data
是不是适合你
data(EmplUK)
     firm year sector     emp    wage capital   output
1       1 1977      7   5.041 13.1516  0.5894  95.7072
2       1 1978      7   5.600 12.3018  0.6318  97.3569
3       1 1979      7   5.015 12.8395  0.6771  99.6083
4       1 1980      7   4.715 13.8039  0.6171 100.5501
5       1 1981      7   4.093 14.2897  0.5076  99.5581
6       1 1982      7   3.166 14.8681  0.4229  98.6151
7       1 1983      7   2.936 13.7784  0.3920 100.0301
8       2 1977      7  71.319 14.7909 16.9363  95.7072
9       2 1978      7  70.643 14.1036 17.2422  97.3569
10      2 1979      7  70.918 14.9534 17.5413  99.6083
11      2 1980      7  72.031 15.4910 17.6574 100.5501
12      2 1981      7  73.689 16.1969 16.7133  99.5581
13      2 1982      7  72.419 16.1314 16.2469  98.6151
14      2 1983      7  68.518 16.3051 17.3696 100.0301
15      3 1977      7  19.156 22.6920  7.0975  95.7072
..........

library(plm)
data("EmplUK", package="plm")

z2 <- pgmm(dynformula(log(emp)~log(wage)+log(capital),list(1,1,1)),
     data=EmplUK, effect="twoways", model="onestep",
     gmm.inst=~log(emp)+log(wage)+log(capital),lag.gmm=c(2,99),
     transformation="ld")
summary(z2,robust=TRUE)

Twoways effects One step model

Call:
pgmm(formula = dynformula(log(emp) ~ log(wage) + log(capital),
    list(1, 1, 1)), data = EmplUK, effect = "twoways", model = "onestep",
    gmm.inst = ~log(emp) + log(wage) + log(capital), lag.gmm = c(2,
        99), transformation = "ld")

Unbalanced Panel: n=140, T=7-9, N=1031

Number of Observations Used:  891

Residuals
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-6.571e-01 -4.703e-02  5.937e-03 -2.110e-13  5.784e-02  5.045e-01

Coefficients
                            Estimate      Std. Error z-value  Pr(>|z|)   
lag(log(emp), 1)      0.935605   0.026295 35.5810 < 2.2e-16 ***
log(wage)              -0.630976   0.118054  -5.3448 9.050e-08 ***
lag(log(wage), 1)    0.482620   0.136887   3.5257 0.0004224 ***
log(capital)             0.483930   0.053867   8.9838 < 2.2e-16 ***
lag(log(capital), 1) -0.424393   0.058479 -7.2572 3.952e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Sargan Test: chisq(100) = 236.0249 (p.value=5.2115e-13)
Autocorrelation test (1): normal = -4.808434 (p.value=7.6059e-07)
Autocorrelation test (2): normal = -0.2800133 (p.value=0.38973)
Wald test for coefficients: chisq(5) = 11174.82 (p.value=< 2.22e-16)
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
ywh19860616 + 1 + 1 + 1 epoh,谢谢 我不是为了估计方程 我是说您发的block块自动选择的程序好

总评分: 学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

19
ywh19860616 发表于 2010-10-19 19:37:40
#epoh,谢谢您,我不是为了估计面板数据
我上面列出只是给出数据格式,问问怎么抽样?因为我发现b.star不适合有两个解释变量时X1和X2
因为结果有两种:
##     BstarSB BstarCB
##[1,]      50      58
##[2,]     252     289

呵呵,让你费心了

20
epoh 发表于 2010-10-19 21:38:00
BOOTSTRAP FOR PANEL DATA MODELS.pdf
BOOTSTRAP FOR PANEL DATA MODELS.pdf (199.57 KB)

page 3/30
A panel dataset with N individuals
and T time periods is represented
by a matrix Y of N rows and T columns.

page 6/28 Block Bootstrap
Assume that T = Kl, with l the length of a block,
then there are K non-overlapping blocks.

依你的数据 year1997,1998,1999,2000,2001
T=5,K=5,l=1
l没的选了.
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
ywh19860616 + 1 + 1 + 1 谢谢epoh,那数据是我假设的 您的算法就是按照您给我的文章编写的?

总评分: 学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2025-12-26 01:52