抽样问题 - 经管之家

40关注
166
粉丝

已卖：3份资源

学术权威

32%

还不是VIP/贵宾

-

0%

威望: 0 级
论坛币: 725 个
通用积分: 4293.7287
学术水平: 867 点
热心指数: 1004 点
信用等级: 649 点
经验: 116372 点
帖子: 3968
精华: 0
在线时间: 7743 小时
注册时间: 2009-9-3
最后登录: 2025-9-7

楼主

ywh19860616 发表于 2010-10-16 07:58:07 |AI写论文

是否 +2 论坛币

k人参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群

赵安豆老师微信：zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

立即领取

感谢您参与论坛问题回答

经管之家送您两个论坛币！

+2 论坛币

用R使用了如下命令：
> y <- rnorm(50)
> x <- matrix(rnorm(100),50)
> fit <- rq(y~x,tau = .4)
Call:
rq(formula = y ~ x, tau = 0.4)
Coefficients:
(Intercept)       x1       x2
-0.38203826  0.09384216  0.09347414
Degrees of freedom: 50 total; 47 residual
Call: rq(formula = y ~ x, tau = 0.4)
tau: [1] 0.4
Coefficients:
         Value Std. Error t value  Pr(>|t|)
(Intercept) -0.38204  0.21700 -1.76056  0.08482
x1          0.09384  0.18203 0.51552  0.60861
x2          0.09347  0.24560 0.38059  0.70522

但是我现在想做的数据结构不是上面的一列数据格式（简称为截面），我现在想做面板（问一句，对于bootstrap，不同数据类型会不同吗，如面板数据和截面数据？），计算系数的程序已经
基本上有了，就是R中quantreg的rq.fit.sfn。但是这个程序只运行出coeff，而t值和p值都没有，所以我想用bootstrap计算（使用block bootstrap即可）
应该怎么实现？

> summary.rq
function (object, se = "nid", covariance = FALSE, hs = TRUE,
...)
{
if (object$method == "lasso")
      stop("no inference for lasso'd rq fitting: try rqss (if brave)")
mt <- terms(object)
m <- model.frame(object)
y <- model.response(m)
x <- model.matrix(mt, m, contrasts = object$contrasts)
wt <- model.weights(object$model)
tau <- object$tau
eps <- .Machine$double.eps^(2/3)
coef <- coefficients(object)
if (is.matrix(coef))
      coef <- coef[, 1]
vnames <- dimnames(x)[[2]]
resid <- object$residuals
n <- length(resid)
p <- length(coef)
rdf <- n - p
if (!is.null(wt)) {
      resid <- resid * wt
      x <- x * wt
      y <- y * wt
}
if (missing(se)) {
      if (n < 1001 & covariance == FALSE)
         se <- "rank"
      else se <- "nid"
}
if (se == "rank") {
      f <- rq.fit.br(x, y, tau = tau, ci = TRUE, ...)
}
if (se == "iid") {
      xxinv <- diag(p)
      xxinv <- backsolve(qr(x)$qr[1:p, 1:p, drop = FALSE],
         xxinv)
      xxinv <- xxinv %*% t(xxinv)
      pz <- sum(abs(resid) < eps)
      h <- max(p + 1, ceiling(n * bandwidth.rq(tau, n, hs = hs)))
      ir <- (pz + 1):(h + pz + 1)
      ord.resid <- sort(resid[order(abs(resid))][ir])
      xt <- ir/(n - p)
      sparsity <- rq(ord.resid ~ xt)$coef[2]
      cov <- sparsity^2 * xxinv * tau * (1 - tau)
      scale <- 1/sparsity
      serr <- sqrt(diag(cov))
}
else if (se == "nid") {
      h <- bandwidth.rq(tau, n, hs = hs)
      if (tau + h > 1)
         stop("tau + h > 1:  error in summary.rq")
      if (tau - h < 0)
         stop("tau - h < 0:  error in summary.rq")
      bhi <- rq.fit.fnb(x, y, tau = tau + h)$coef
      blo <- rq.fit.fnb(x, y, tau = tau - h)$coef
      dyhat <- x %*% (bhi - blo)
      if (any(dyhat <= 0))
         warning(paste(sum(dyhat <= 0), "non-positive fis"))
      f <- pmax(0, (2 * h)/(dyhat - eps))
      fxxinv <- diag(p)
      fxxinv <- backsolve(qr(sqrt(f) * x)$qr[1:p, 1:p, drop = FALSE],
         fxxinv)
      fxxinv <- fxxinv %*% t(fxxinv)
      cov <- tau * (1 - tau) * fxxinv %*% crossprod(x) %*%
         fxxinv
      scale <- mean(f)
      serr <- sqrt(diag(cov))
}
else if (se == "ker") {
      h <- bandwidth.rq(tau, n, hs = hs)
      if (tau + h > 1)
         stop("tau + h > 1:  error in summary.rq")
      if (tau - h < 0)
         stop("tau - h < 0:  error in summary.rq")
      uhat <- c(y - x %*% coef)
      h <- (qnorm(tau + h) - qnorm(tau - h)) * min(sqrt(var(uhat)),
         (quantile(uhat, 0.75) - quantile(uhat, 0.25))/1.34)
      f <- dnorm(uhat/h)/h
      fxxinv <- diag(p)
      fxxinv <- backsolve(qr(sqrt(f) * x)$qr[1:p, 1:p, drop = FALSE],
         fxxinv)
      fxxinv <- fxxinv %*% t(fxxinv)
      cov <- tau * (1 - tau) * fxxinv %*% crossprod(x) %*%
         fxxinv
      scale <- mean(f)
      serr <- sqrt(diag(cov))
}
else if (se == "boot") {
      B <- boot.rq(x, y, tau, ...)
      cov <- cov(B)
      serr <- sqrt(diag(cov))
}
if (se == "rank") {
      coef <- f$coef
}
else {
      coef <- array(coef, c(p, 4))
      dimnames(coef) <- list(vnames, c("Value", "Std. Error",
         "t value", "Pr(>|t|)"))
      coef[, 2] <- serr
      coef[, 3] <- coef[, 1]/coef[, 2]
      coef[, 4] <- if (rdf > 0)
         2 * (1 - pt(abs(coef[, 3]), rdf))
      else NA
}
object <- object[c("call", "terms")]
if (covariance == TRUE) {
      object$cov <- cov
      if (se == "iid")
         object$scale <- scale
      if (se %in% c("nid", "ker")) {
         object$Hinv <- fxxinv
         object$J <- crossprod(x)
         object$scale <- scale
      }
      else if (se == "boot") {
         object$B <- B
      }
}
object$coefficients <- coef
object$rdf <- rdf
object$tau <- tau
class(object) <- "summary.rq"
object
}

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

分享0 收藏4 回帖

关键词：covariance Bootstrap Inference Intercept Bootstra 程序

本帖被以下文库推荐

· Survey Sampling NewOccidental|主题: 206, 订阅: 18

沙发

epoh 发表于 2010-10-17 11:33:45

Normal bootstrap works by creating data
by randomly selecting observations

Block bootstrap does this by randomly selecting
blocks of data, rather than individual data.

想了解一下,楼主是想保留data Block部分的当时特性?
如果是的话,等于是不用summary
重新编程,算出系数的standard errors.

已有 2 人评分	经验	论坛币	学术水平	热心指数	信用等级	收起理由
ltx5151	+ 20	+ 20				根据规定进行奖励
ywh19860616			+ 1	+ 1	+ 1	谢谢，我主要是计算系数的standard errors，从而计算t和p

总评分: 经验 + 20 论坛币 + 20 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

藤椅

ywh19860616 发表于 2010-10-17 13:18:28

#epoh
嗯，我用了一个面板数据模型，因为我程序只运行出来系数，而没有给出对应的p值和t值。看了文献，文献提到使用block bootstrap抽样来计算standard errors，从而计算出对应的t值和p-value等拟合优度。我的目的就是这个。
我看了一些文献，说面板数据抽样和一般数据抽样存在不同，所以想问这应该怎么实现？

板凳

epoh 发表于 2010-10-17 16:32:06

library(quantreg)
n=1000    #n observations
y <- rnorm(n)
x <- matrix(rnorm(2*n),n)
fit <- rq(y~x,tau = .4)
fit
summary(fit,se = "boot", bsmethod= "xy")
#Coefficients:
#          Value Std. Error t value  Pr(>|t|)
#(Intercept) -0.24258  0.03358 -7.22396  0.00000
#x1             0.05268  0.03514 1.49913  0.13416
#x2             0.03694  0.02999 1.23160  0.21839
######block bootstrap
b=1000    #Number of bootstrap samples
boot_bhat=matrix(NA,b, dim(x)[2]+1)
block_length = 50
num_blocks = 20 #n/block_length
Indices = seq(1:n)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){       #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]    #Find which data are in each block  250 x 20
Ind_sim = c(Ind_sim)    #1000 x 1
Xsim = x[Ind_sim,1:2]    #Construct the x data
Ysim = y[Ind_sim]          #Construct the y data
boot_bhat[i,] = rq(Ysim~Xsim,tau=0.4)$coefficients
}
bhat=colMeans(boot_bhat)
bhat
#[1] -0.24659193  0.05058387  0.04053553
cov=cov(boot_bhat)
serr = sqrt(diag(cov))
serr
# 0.02940081 0.03847271 0.03450373

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 1	+ 1	+ 1	非常感谢您。您能帮我再看看吗我不知如何套用您的程序

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

报纸

ywh19860616 发表于 2010-10-17 18:32:29

非常感谢#epoh，您写的很详细了，但是我初学R，不知道怎么转换成我想要的。
程序运行步骤是这样的，如下例：
m <- 3
n <- 10
s <- rep(1:n,rep(m,n))
x <- exp(rnorm(n*m))
X <- cbind(1,x)
u <- x*rnorm(m*n) + (1-x)*rf(m*n,3,3)
a <- rep(rnorm(n),rep(m,n))
y <- a + u
fit <- rq.fit.panel(X,y,s)
运行结果：
$coef
[1]  3.7116467 -2.6259010  3.8272535 -1.4451341  4.1059513 -0.9171851
[7] -3.8163941 -2.7247383 -3.2609396 -2.1087291 -1.8900982 -3.0358369
[13] -0.6968870 -0.1394642 -1.3196756 -2.5104425
$ierr  ##这是Error code for the internal Fortran routine srqfnc，不是标准误差
[1] 0
$it
[1] 10
$time
[1] 0
从上面运行结果可以看出，只有coef是我要的系数，但是其对应的t值和p-values未给出。所以想用
block bootstrap抽样得到。您上面程序我不知道如何套用。因为我运行时除了x，y，好有s(应该是用于识别面板结构的)。您能帮我再看看吗。
function(X,y,s,w=c(.25,.5,.25),taus=(1:3)/4,lambda = 1){

      require(SparseM)
      require(quantreg)
      K <- length(w)
      if(K != length(taus))
            stop("length of w and taus must match")
      X <- as.matrix(X)
      p <- ncol(X)
      n <- length(levels(as.factor(s)))
      N <- length(y)
      if(N != length(s) || N != nrow(X))
            stop("dimensions of y,X,s must match")
      Z <- as.matrix.csr(model.matrix(~as.factor(s)-1))
      Fidelity <- cbind(as(w,"matrix.diag.csr") %x% X,w %x% Z)
      Penalty <- cbind(as.matrix.csr(0,n,K*p),lambda*as(n,"matrix.diag.csr"))
      D <- rbind(Fidelity,Penalty)
      y <- c(w %x% y,rep(0,n))
      a <- c((w*(1-taus)) %x% (t(X)%*%rep(1,N)),
            sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1,n))
      rq.fit.sfn(D,y,rhs=a)
      }
function (a, y, tau = 0.5, rhs = (1 - tau) * c(t(a) %*% rep(1,
length(y))), nsubmax, tmpmax, nnzlmax, cachsz = 64, small = 1e-06,
maxiter = 100, warn.mesg = TRUE)
{
y <- -y
n <- length(y)
m <- a@dimension[2]
if (n != a@dimension[1])
      stop("Dimensions of design matrix and the response vector are not compatible")
u <- rep(1, length = n)
x <- rep((1 - tau), length = n)
nnzdmax <- nnza <- a@ia[n + 1] - 1
iwmax <- 7 * m + 3
ao <- t(a)
e <- ao %*% a
nnzemax <- e@ia[m + 1] - 1
if (missing(nsubmax))
      nsubmax <- nnzemax
if (missing(nnzlmax))
      nnzlmax <- 4 * nnzdmax
if (missing(tmpmax))
      tmpmax <- 6 * m
wwm <- vector("numeric", 3 * m)
s <- u - x
b1 <- solve(e, ao %*% y, tmpmax = tmpmax, nnzlmax = nnzlmax,
      nsubmax = nsubmax)
r <- y - a %*% b1
z <- ifelse(abs(r) < small, (r * (r > 0) + small), r * (r >
      0))
w <- z - r
wwn <- matrix(0, n, 14)
wwn[, 1] <- r
wwn[, 2] <- z
wwn[, 3] <- w
srqfnb.o <- .Fortran("srqfn", n = as.integer(n), m = as.integer(m),
      nnza = as.integer(nnza), a = as.double(a@ra), ja = as.integer(a@ja),
      ia = as.integer(a@ia), ao = as.double(ao@ra), jao = as.integer(ao@ja),
      iao = as.integer(ao@ia), nnzdmax = as.integer(nnzdmax),
      d = double(nnzdmax), jd = integer(nnzdmax), id = integer(m +
         1), dsub = double(nnzemax + 1), jdsub = integer(nnzemax +
         1), nnzemax = as.integer(nnzemax), e = as.double(e@ra),
      je = as.integer(e@ja), ie = as.integer(e@ia), nsubmax = as.integer(nsubmax),
      lindx = integer(nsubmax), xlindx = integer(m + 1), nnzlmax = as.integer(nnzlmax),
      lnz = double(nnzlmax), xlnz = integer(m + 1), iw = integer(m *
         5), iwmax = as.integer(iwmax), iwork = integer(iwmax),
      xsuper = integer(m + 1), tmpmax = as.integer(tmpmax),
      tmpvec = double(tmpmax), wwm = as.double(wwm), wwn = as.double(wwn),
      cachsz = as.integer(cachsz), level = as.integer(8), x = as.double(x),
      s = as.double(s), u = as.double(u), c = as.double(y),
      sol = as.double(b1), rhs = as.double(rhs), small = as.double(small),
      ierr = integer(1), maxiter = as.integer(maxiter), time = double(7),
      PACKAGE = "quantreg")[c("sol", "ierr", "maxiter", "time")]
ierr <- srqfnb.o$ierr
if (!(ierr == 0) && warn.mesg)
      warning(sfnMessage(ierr))
list(coef = -srqfnb.o$sol, ierr = ierr, it = srqfnb.o$maxiter,
      time = sum(srqfnb.o$time))
}

地板

epoh 发表于 2010-10-17 20:25:01

library(quantreg)
m <- 3
n <- 10
s <- rep(1:n,rep(m,n))
x <- exp(rnorm(n*m))
X <- cbind(1,x)
u <- x*rnorm(m*n) + (1-x)*rf(m*n,3,3)
a <- rep(rnorm(n),rep(m,n))
y <- a + u
#########Sparse Regression Quantile Fitting
sX <- as.matrix.csr(X)
fit.sfn=rq.fit.sfn(sX, y)
fit.sfn
#$coef
#[1]  0.7230244 -1.5599576

######block bootstrap
n1=length(y)  #30
b=1000    #Number of bootstrap samples
boot_bhat=matrix(NA,b, dim(X)[2])
block_length = 10
num_blocks = n1/block_length #n/block_length  #3
Indices = seq(1:n1)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){       #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]    #Find which data are in each block
Ind_sim = c(Ind_sim)
Xsim = X[Ind_sim,1:2]    #Construct the x data
sXsim <- as.matrix.csr(Xsim)
Ysim = y[Ind_sim]       #Construct the y data
boot_bhat[i,] = rq.fit.sfn(sXsim, y)$coef
}
bhat=colMeans(boot_bhat)
#bhat
cov=cov(boot_bhat)
serr = sqrt(diag(cov))
#serr

p <- length(bhat)
rdf <- n1 - p
vnames<- dimnames(x)[[2]]
coef <- array(bhat, c(p, 4))
dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
coef[, 2] <- serr
coef[, 3] <- coef[, 1]/coef[, 2]
coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
coef

         Value       Std. Error    t value       Pr(>|t|)
#[1,]  0.03135153  0.7788350  0.04025439 0.9681760
#[2,] -0.52056648  0.8418555 -0.61835611 0.5413385

##############################
rq.fit.panel<-function(X,y,s,w=c(.25,.5,.25),taus=(1:3)/4,lambda = 1){
# prototype function for panel data fitting of QR models
# the matrix X is assumed to contain an intercept
# the vector s is a strata indicator assumed (so far) to be a one-way layout
# NB:
# 1.  The value of the shrinkage parameter lambda is an open research problem in
#    the simplest homogneous settings it should be the ratio of the scale parameters
#    of the fixed effects and the idiocyncratic errors
# 2.  On return the coefficient vector has m*p + n elements where m is the number
#    quantiles being estimated, p is the number of colums of X, and n is the
#    number of distinct values of s.  The first m*p coefficients are the
#    slope estimates, and the last n are the "fixed effects"
# 3.  Like all shrinkage (regularization) estimators, asymptotic inference is somewhat
#    problematic... so the bootstrap is the natural first resort.

      require(SparseM)
      require(quantreg)
      K <- length(w)
      if(K != length(taus))
            stop("length of w and taus must match")
      X <- as.matrix(X)
      p <- ncol(X)
      n <- length(levels(as.factor(s)))
      N <- length(y)
      if(N != length(s) || N != nrow(X))
            stop("dimensions of y,X,s must match")
      Z <- as.matrix.csr(model.matrix(~as.factor(s)-1))
      Fidelity <- cbind(as(w,"matrix.diag.csr") %x% X,w %x% Z)
      Penalty <- cbind(as.matrix.csr(0,n,K*p),lambda*as(n,"matrix.diag.csr"))
      D <- rbind(Fidelity,Penalty)
      y <- c(w %x% y,rep(0,n))
      a <- c((w*(1-taus)) %x% (t(X)%*%rep(1,N)),
            sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1,n))
      rq.fit.sfn(D,y,rhs=a)
      }
#########rq.fit.panel
fit.panel <- rq.fit.panel(X,y,s)
fit.panel
#$coef
#[1]  3.5696496 -3.8264048  3.0765997 -2.0488377  2.9257768 -0.8172033 -3.6342146 -3.6971347
#[9] -0.1692880 -0.5130398 -2.1588006 -0.9896829 -3.6029453 -0.9873640 -1.0739194 -1.5147111

######block bootstrap
n1=length(y)  #30
b=1000    #Number of bootstrap samples
ncoef=16
boot_bhat=matrix(NA,b, ncoef)
block_length = 10
num_blocks = n1/block_length #n/block_length
Indices = seq(1:n1)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){       #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]    #Find which data are in each block
Ind_sim = c(Ind_sim)
Xsim = X[Ind_sim,1:dim(X)[2]]    #Construct the x data
Ysim = y[Ind_sim]       #Construct the y data
boot_bhat[i,] = rq.fit.panel(Xsim,Ysim,s)$coef
}
bhat=colMeans(boot_bhat)
#bhat

cov=cov(boot_bhat)
serr = sqrt(diag(cov))
#serr

p <- length(bhat)
rdf <- n1 - p
vnames<- dimnames(x)[[2]]
coef <- array(bhat, c(p, 4))
dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
coef[, 2] <- serr
coef[, 3] <- coef[, 1]/coef[, 2]
coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
coef
> coef
         Value          Std. Error t value       Pr(>|t|)
[1,]  3.5833974 1.224927  2.925397 0.011071952
[2,] -3.2408558 0.836343 -3.875032 0.001682470
[3,]  3.4498021 1.443538  2.389824 0.031476152
[4,] -1.7417926 1.514311 -1.150221 0.269321804
[5,]  3.2779020 1.430895  2.290806 0.038006777
[6,] -0.3387540 1.453237 -0.233103 0.819055078
[7,] -2.8649529 1.526798 -1.876445 0.081595934
[8,] -2.2847649 1.831747 -1.247315 0.232744869
[9,] -2.2825497 1.571369 -1.452587 0.168381907
[10,] -2.6825952 0.885694 -3.028806 0.009021254
[11,] -2.6366763 1.210942 -2.177375 0.047051236
[12,] -1.1954863 0.885605 -1.349909 0.198467848
[13,] -3.6453777 1.662293 -2.192982 0.045697269
[14,] -1.7934437 1.630096 -1.100208 0.289800173
[15,] -1.7644270 1.041941 -1.693405 0.112501121
[16,] -2.6937529 1.733657 -1.553798 0.142542035

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 1	+ 1	+ 1	十分感谢您的热心帮助，谢谢我自己看看，不懂再向您请教

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

7楼

ywh19860616 发表于 2010-10-17 20:58:31

#epoh
感谢您的热心解答，非常感谢

8楼

zhangtao 发表于 2010-10-17 22:08:42

非常佩服epoh,为什么我的R运行结果如下？
> library(quantreg)
> m <- 3
> n <- 10
> s <- rep(1:n,rep(m,n))
> x <- exp(rnorm(n*m))
> X <- cbind(1,x)
> u <- x*rnorm(m*n) + (1-x)*rf(m*n,3,3)
> a <- rep(rnorm(n),rep(m,n))
> y <- a + u
> #########Sparse Regression Quantile Fitting
> sX <- as.matrix.csr(X)
> fit.sfn=rq.fit.sfn(sX, y)
> fit.sfn
$coef
[1]  1.5907851 -0.9816827

$ierr
[1] 0

$it
[1] 8

$time
[1] 0

> #$coef
> #[1]  0.7230244 -1.5599576
>
> ######block bootstrap
> n1=length(y)  #30
> b=1000    #Number of bootstrap samples
> boot_bhat=matrix(NA,b, dim(X)[2])
> block_length = 10
> num_blocks = n1/block_length #n/block_length  #3
> Indices = seq(1:n1)  # All of the indices from 1 to n
> Indices = matrix(Indices,block_length,num_blocks)
> for (i in 1:b){       #Number of bootstrap samples
+ randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
+ Ind_sim = Indices[,randblock]    #Find which data are in each block
+ Ind_sim = c(Ind_sim)
+ Xsim = X[Ind_sim,1:2]    #Construct the x data
+ sXsim <- as.matrix.csr(Xsim)
+ Ysim = y[Ind_sim]       #Construct the y data
+ boot_bhat[i,] = rq.fit.sfn(sXsim, y)$coef
+ }
> bhat=colMeans(boot_bhat)
> #bhat
> cov=cov(boot_bhat)
> serr = sqrt(diag(cov))
> #serr
>
> p <- length(bhat)
> rdf <- n1 - p
> vnames<- dimnames(x)[[2]]
> coef <- array(bhat, c(p, 4))
> dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
> coef[, 2] <- serr
> coef[, 3] <- coef[, 1]/coef[, 2]
> coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
> coef
         Value Std. Error    t value  Pr(>|t|)
[1,] -0.01807533  1.0159156 -0.01779216 0.9859308
[2,] -0.16323621  0.5183736 -0.31490071 0.7551718
>
>             Value       Std. Error    t value       Pr(>|t|)
错误: 意外的符号在"          Value       Std."里
> #[1,]  0.03135153  0.7788350  0.04025439 0.9681760
> #[2,] -0.52056648  0.8418555 -0.61835611 0.5413385
>
> ##############################
> rq.fit.panel<-function(X,y,s,w=c(.25,.5,.25),taus=(1:3)/4,lambda = 1){
+ # prototype function for panel data fitting of QR models
+ # the matrix X is assumed to contain an intercept
+ # the vector s is a strata indicator assumed (so far) to be a one-way layout
+ # NB:
+ # 1.  The value of the shrinkage parameter lambda is an open research problem in
+ #    the simplest homogneous settings it should be the ratio of the scale parameters
+ #    of the fixed effects and the idiocyncratic errors
+ # 2.  On return the coefficient vector has m*p + n elements where m is the number
+ #    quantiles being estimated, p is the number of colums of X, and n is the
+ #    number of distinct values of s.  The first m*p coefficients are the
+ #    slope estimates, and the last n are the "fixed effects"
+ # 3.  Like all shrinkage (regularization) estimators, asymptotic inference is somewhat
+ #    problematic... so the bootstrap is the natural first resort.
+
+
+       require(SparseM)
+       require(quantreg)
+       K <- length(w)
+       if(K != length(taus))
+                stop("length of w and taus must match")
+       X <- as.matrix(X)
+       p <- ncol(X)
+       n <- length(levels(as.factor(s)))
+       N <- length(y)
+       if(N != length(s) || N != nrow(X))
+                stop("dimensions of y,X,s must match")
+       Z <- as.matrix.csr(model.matrix(~as.factor(s)-1))
+       Fidelity <- cbind(as(w,"matrix.diag.csr") %x% X,w %x% Z)
+       Penalty <- cbind(as.matrix.csr(0,n,K*p),lambda*as(n,"matrix.diag.csr"))
+       D <- rbind(Fidelity,Penalty)
+       y <- c(w %x% y,rep(0,n))
+       a <- c((w*(1-taus)) %x% (t(X)%*%rep(1,N)),
+                sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1,n))
+       rq.fit.sfn(D,y,rhs=a)
+       }
> #########rq.fit.panel
> fit.panel <- rq.fit.panel(X,y,s)
> fit.panel
$coef
[1]  7.6260607 -2.1070905  7.5975820 -0.9305439  7.9502965 -0.6942313
[7] -8.0326189 -8.3645710 -6.1608858 -7.8483822 -9.1936181 -6.8814008
[13] -2.8142175 -4.5646799 -5.0150503 -5.2894871

$ierr
[1] 0

$it
[1] 9

$time
[1] 0

> #$coef
> #[1]  3.5696496 -3.8264048  3.0765997 -2.0488377  2.9257768 -0.8172033 -3.6342146 -3.6971347
> #[9] -0.1692880 -0.5130398 -2.1588006 -0.9896829 -3.6029453 -0.9873640 -1.0739194 -1.5147111
>
> ######block bootstrap
> n1=length(y)  #30
> b=1000    #Number of bootstrap samples
> ncoef=16
> boot_bhat=matrix(NA,b, ncoef)
> block_length = 10
> num_blocks = n1/block_length #n/block_length
> Indices = seq(1:n1)  # All of the indices from 1 to n
> Indices = matrix(Indices,block_length,num_blocks)
> for (i in 1:b){       #Number of bootstrap samples
+ randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
+ Ind_sim = Indices[,randblock]    #Find which data are in each block
+ Ind_sim = c(Ind_sim)
+ Xsim = X[Ind_sim,1:dim(X)[2]]    #Construct the x data
+ Ysim = y[Ind_sim]       #Construct the y data
+ boot_bhat[i,] = rq.fit.panel(Xsim,Ysim,s)$coef
+ }
共有43个警告 (用warnings()来显示)
> bhat=colMeans(boot_bhat)
> #bhat
>
> cov=cov(boot_bhat)
> serr = sqrt(diag(cov))
> #serr
>
> p <- length(bhat)
> rdf <- n1 - p
> vnames<- dimnames(x)[[2]]
> coef <- array(bhat, c(p, 4))
> dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
> coef[, 2] <- serr
> coef[, 3] <- coef[, 1]/coef[, 2]
> coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
> coef
      Value Std. Error t value  Pr(>|t|)
[1,]  7.534053 4.751090  1.585753 0.1351177
[2,] -2.856759 2.167977 -1.317707 0.2087586
[3,]  7.426722 4.896351  1.516787 0.1515725
[4,] -1.594420 1.660894 -0.959977 0.3533539
[5,]  7.847167 4.793434  1.637066 0.1238897
[6,] -1.187242 1.503429 -0.789689 0.4428796
[7,] -5.581834 4.066550 -1.372621 0.1914601
[8,] -5.812536 4.126287 -1.408660 0.1807565
[9,] -5.574963 3.945628 -1.412947 0.1795168
[10,] -5.235239 3.849973 -1.359812 0.1953872
[11,] -6.375304 3.684595 -1.730259 0.1055559
[12,] -5.187254 4.007587 -1.294359 0.2164851
[13,] -3.887769 2.977477 -1.305726 0.2126952
[14,] -5.792848 3.690437 -1.569691 0.1388069
[15,] -4.975033 3.700436 -1.344445 0.2001844
[16,] -5.451939 3.380240 -1.612885 0.1290763
> > coef
错误: 意外的'>'在">"里

9楼

ywh19860616 发表于 2010-10-18 07:35:00

楼上的，我运行不会出现这个问题

#epoh ，谢谢您，能帮我再看下这个问题吗
我想问一下，block_length = 10 这个是如何定的？是根据n=10来确定的吗？即按时间或者截面个数来确定吗？
就是block bootstrap分块是按照什么标准来分的？您能否推荐一些文献。您帮我编写的程序是按下图的抽样方法吗？

10楼

epoh 发表于 2010-10-18 09:45:08

block_length = 10,只是为了程序运行,
随意订的.
block_length,应该跟数据本质有关,
譬如一星期,一个月,一季,一年.

你上传的是最基本的bootstrap
假设数据data是:
   x    y
1  10 100
2  20 200
3  30 300
4  40 400
5  50 500
6  60 600
7  70 700
8  80 800
9  90 900
10 99 999

n=10
index=sample(seq(1:n),n,replace = TRUE)
index
#[1]  7  7  9  5  9  2 10 10  9  5
#随机抽样后,取出又放回,所以replace = TRUE
#抽出后的新样本就是:
data[index,]
      x    y
[1,] 70 700
[2,] 70 700
[3,] 90 900
[4,] 50 500
[5,] 90 900
[6,] 20 200
[7,] 99 999
[8,] 99 999
[9,] 90 900
[10,] 50 500

然后以新样本回归,再重新抽样,进行B次

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 1	+ 1	+ 1	太感谢您了，我样本是以年度数据为单位的，block-length=10合适吗

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

抽样问题 [推广有奖]

经管之家送您一份

经管之家联合CDA

感谢您参与论坛问题回答

扫码加我拉你入群

相关帖子

本帖被以下文库推荐

浏览过的帖子

浏览过的版块

本版微信群

抽样问题 [推广有奖]

经管之家送您一份

经管之家联合CDA

感谢您参与论坛问题回答

扫码加我 拉你入群

相关帖子

本帖被以下文库推荐

浏览过的帖子

浏览过的版块

本版微信群

扫码加我拉你入群