楼主: ywh19860616
19227 58

抽样问题 [推广有奖]

已卖:3份资源

学术权威

32%

还不是VIP/贵宾

-

威望
0
论坛币
725 个
通用积分
4238.2714
学术水平
867 点
热心指数
1004 点
信用等级
649 点
经验
116372 点
帖子
3968
精华
0
在线时间
7743 小时
注册时间
2009-9-3
最后登录
2025-9-7

楼主
ywh19860616 发表于 2010-10-16 07:58:07 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
用R使用了如下命令:
> y <- rnorm(50)
> x <- matrix(rnorm(100),50)
> fit <- rq(y~x,tau = .4)
Call:
rq(formula = y ~ x, tau = 0.4)
Coefficients:
(Intercept)          x1          x2
-0.38203826  0.09384216  0.09347414
Degrees of freedom: 50 total; 47 residual
Call: rq(formula = y ~ x, tau = 0.4)
tau: [1] 0.4
Coefficients:
            Value    Std. Error t value  Pr(>|t|)
(Intercept) -0.38204  0.21700   -1.76056  0.08482
x1           0.09384  0.18203    0.51552  0.60861
x2           0.09347  0.24560    0.38059  0.70522

但是我现在想做的数据结构不是上面的一列数据格式(简称为截面),我现在想做面板(问一句,对于bootstrap,不同数据类型会不同吗,如面板数据和截面数据?),计算系数的程序已经
基本上有了,就是R中quantreg的rq.fit.sfn。但是这个程序只运行出coeff,而t值和p值都没有,所以我想用bootstrap计算(使用block bootstrap即可)
应该怎么实现?

> summary.rq
function (object, se = "nid", covariance = FALSE, hs = TRUE,
    ...)
{
    if (object$method == "lasso")
        stop("no inference for lasso'd rq fitting: try rqss (if brave)")
    mt <- terms(object)
    m <- model.frame(object)
    y <- model.response(m)
    x <- model.matrix(mt, m, contrasts = object$contrasts)
    wt <- model.weights(object$model)
    tau <- object$tau
    eps <- .Machine$double.eps^(2/3)
    coef <- coefficients(object)
    if (is.matrix(coef))
        coef <- coef[, 1]
    vnames <- dimnames(x)[[2]]
    resid <- object$residuals
    n <- length(resid)
    p <- length(coef)
    rdf <- n - p
    if (!is.null(wt)) {
        resid <- resid * wt
        x <- x * wt
        y <- y * wt
    }
    if (missing(se)) {
        if (n < 1001 & covariance == FALSE)
            se <- "rank"
        else se <- "nid"
    }
    if (se == "rank") {
        f <- rq.fit.br(x, y, tau = tau, ci = TRUE, ...)
    }
    if (se == "iid") {
        xxinv <- diag(p)
        xxinv <- backsolve(qr(x)$qr[1:p, 1:p, drop = FALSE],
            xxinv)
        xxinv <- xxinv %*% t(xxinv)
        pz <- sum(abs(resid) < eps)
        h <- max(p + 1, ceiling(n * bandwidth.rq(tau, n, hs = hs)))
        ir <- (pz + 1):(h + pz + 1)
        ord.resid <- sort(resid[order(abs(resid))][ir])
        xt <- ir/(n - p)
        sparsity <- rq(ord.resid ~ xt)$coef[2]
        cov <- sparsity^2 * xxinv * tau * (1 - tau)
        scale <- 1/sparsity
        serr <- sqrt(diag(cov))
    }
    else if (se == "nid") {
        h <- bandwidth.rq(tau, n, hs = hs)
        if (tau + h > 1)
            stop("tau + h > 1:  error in summary.rq")
        if (tau - h < 0)
            stop("tau - h < 0:  error in summary.rq")
        bhi <- rq.fit.fnb(x, y, tau = tau + h)$coef
        blo <- rq.fit.fnb(x, y, tau = tau - h)$coef
        dyhat <- x %*% (bhi - blo)
        if (any(dyhat <= 0))
            warning(paste(sum(dyhat <= 0), "non-positive fis"))
        f <- pmax(0, (2 * h)/(dyhat - eps))
        fxxinv <- diag(p)
        fxxinv <- backsolve(qr(sqrt(f) * x)$qr[1:p, 1:p, drop = FALSE],
            fxxinv)
        fxxinv <- fxxinv %*% t(fxxinv)
        cov <- tau * (1 - tau) * fxxinv %*% crossprod(x) %*%
            fxxinv
        scale <- mean(f)
        serr <- sqrt(diag(cov))
    }
    else if (se == "ker") {
        h <- bandwidth.rq(tau, n, hs = hs)
        if (tau + h > 1)
            stop("tau + h > 1:  error in summary.rq")
        if (tau - h < 0)
            stop("tau - h < 0:  error in summary.rq")
        uhat <- c(y - x %*% coef)
        h <- (qnorm(tau + h) - qnorm(tau - h)) * min(sqrt(var(uhat)),
            (quantile(uhat, 0.75) - quantile(uhat, 0.25))/1.34)
        f <- dnorm(uhat/h)/h
        fxxinv <- diag(p)
        fxxinv <- backsolve(qr(sqrt(f) * x)$qr[1:p, 1:p, drop = FALSE],
            fxxinv)
        fxxinv <- fxxinv %*% t(fxxinv)
        cov <- tau * (1 - tau) * fxxinv %*% crossprod(x) %*%
            fxxinv
        scale <- mean(f)
        serr <- sqrt(diag(cov))
    }
    else if (se == "boot") {
        B <- boot.rq(x, y, tau, ...)
        cov <- cov(B)
        serr <- sqrt(diag(cov))
    }
    if (se == "rank") {
        coef <- f$coef
    }
    else {
        coef <- array(coef, c(p, 4))
        dimnames(coef) <- list(vnames, c("Value", "Std. Error",
            "t value", "Pr(>|t|)"))
        coef[, 2] <- serr
        coef[, 3] <- coef[, 1]/coef[, 2]
        coef[, 4] <- if (rdf > 0)
            2 * (1 - pt(abs(coef[, 3]), rdf))
        else NA
    }
    object <- object[c("call", "terms")]
    if (covariance == TRUE) {
        object$cov <- cov
        if (se == "iid")
            object$scale <- scale
        if (se %in% c("nid", "ker")) {
            object$Hinv <- fxxinv
            object$J <- crossprod(x)
            object$scale <- scale
        }
        else if (se == "boot") {
            object$B <- B
        }
    }
    object$coefficients <- coef
    object$rdf <- rdf
    object$tau <- tau
    class(object) <- "summary.rq"
    object
}
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:covariance Bootstrap Inference Intercept Bootstra 程序

本帖被以下文库推荐

沙发
epoh 发表于 2010-10-17 11:33:45
Normal bootstrap works by creating data
by randomly selecting observations

Block bootstrap does this by randomly selecting
blocks of data, rather than individual data.

想了解一下,楼主是想保留data Block部分的当时特性?
如果是的话,等于是不用summary
重新编程,算出系数的standard errors.
已有 2 人评分经验 论坛币 学术水平 热心指数 信用等级 收起 理由
ltx5151 + 20 + 20 根据规定进行奖励
ywh19860616 + 1 + 1 + 1 谢谢,我主要是计算系数的standard errors,从而计算t和p

总评分: 经验 + 20  论坛币 + 20  学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

藤椅
ywh19860616 发表于 2010-10-17 13:18:28
#epoh
嗯,我用了一个面板数据模型,因为我程序只运行出来系数,而没有给出对应的p值和t值。看了文献,文献提到使用block bootstrap抽样来计算standard errors,从而计算出对应的t值和p-value等拟合优度。我的目的就是这个。
我看了一些文献,说面板数据抽样和一般数据抽样存在不同,所以想问这应该怎么实现?

板凳
epoh 发表于 2010-10-17 16:32:06
library(quantreg)
n=1000     #n observations
y <- rnorm(n)
x <- matrix(rnorm(2*n),n)
fit <- rq(y~x,tau = .4)
fit
summary(fit,se = "boot", bsmethod= "xy")
#Coefficients:
#            Value    Std. Error t value  Pr(>|t|)
#(Intercept) -0.24258  0.03358   -7.22396  0.00000
#x1               0.05268  0.03514    1.49913  0.13416
#x2               0.03694  0.02999    1.23160  0.21839
######block bootstrap
b=1000     #Number of bootstrap samples
boot_bhat=matrix(NA,b, dim(x)[2]+1)
block_length = 50  
num_blocks = 20   #n/block_length
Indices = seq(1:n)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){          #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]       #Find which data are in each block  250 x 20
Ind_sim = c(Ind_sim)       #1000 x 1
Xsim = x[Ind_sim,1:2]      #Construct the x data
Ysim = y[Ind_sim]            #Construct the y data
boot_bhat[i,] = rq(Ysim~Xsim,tau=0.4)$coefficients
}
bhat=colMeans(boot_bhat)
bhat
#[1] -0.24659193  0.05058387  0.04053553
cov=cov(boot_bhat)
serr = sqrt(diag(cov))
serr   
# 0.02940081 0.03847271 0.03450373
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
ywh19860616 + 1 + 1 + 1 非常感谢您。您能帮我再看看吗 我不知如何套用您的程序

总评分: 学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

报纸
ywh19860616 发表于 2010-10-17 18:32:29
非常感谢#epoh,您写的很详细了,但是我初学R,不知道怎么转换成我想要的。
程序运行步骤是这样的,如下例:
m <- 3
n <- 10
s <- rep(1:n,rep(m,n))
x <- exp(rnorm(n*m))
X <- cbind(1,x)
u <- x*rnorm(m*n) + (1-x)*rf(m*n,3,3)
a <- rep(rnorm(n),rep(m,n))
y <- a + u
fit <- rq.fit.panel(X,y,s)
运行结果:
$coef
[1]  3.7116467 -2.6259010  3.8272535 -1.4451341  4.1059513 -0.9171851
[7] -3.8163941 -2.7247383 -3.2609396 -2.1087291 -1.8900982 -3.0358369
[13] -0.6968870 -0.1394642 -1.3196756 -2.5104425
$ierr  ##这是Error code for the internal Fortran routine srqfnc,不是标准误差
[1] 0
$it
[1] 10
$time
[1] 0
从上面运行结果可以看出,只有coef是我要的系数,但是其对应的t值和p-values未给出。所以想用
block bootstrap抽样得到。您上面程序我不知道如何套用。因为我运行时除了x,y,好有s(应该是用于识别面板结构的)。您能帮我再看看吗。
function(X,y,s,w=c(.25,.5,.25),taus=(1:3)/4,lambda = 1){

        require(SparseM)
        require(quantreg)
        K <- length(w)
        if(K != length(taus))
                stop("length of w and taus must match")
        X <- as.matrix(X)
        p <- ncol(X)
        n <- length(levels(as.factor(s)))
        N <- length(y)
        if(N != length(s) || N != nrow(X))
                stop("dimensions of y,X,s must match")
        Z <- as.matrix.csr(model.matrix(~as.factor(s)-1))
        Fidelity <- cbind(as(w,"matrix.diag.csr") %x% X,w %x% Z)
        Penalty <- cbind(as.matrix.csr(0,n,K*p),lambda*as(n,"matrix.diag.csr"))
        D <- rbind(Fidelity,Penalty)
        y <- c(w %x% y,rep(0,n))
        a <- c((w*(1-taus)) %x% (t(X)%*%rep(1,N)),
                sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1,n))
        rq.fit.sfn(D,y,rhs=a)
        }
function (a, y, tau = 0.5, rhs = (1 - tau) * c(t(a) %*% rep(1,
    length(y))), nsubmax, tmpmax, nnzlmax, cachsz = 64, small = 1e-06,
    maxiter = 100, warn.mesg = TRUE)
{
    y <- -y
    n <- length(y)
    m <- a@dimension[2]
    if (n != a@dimension[1])
        stop("Dimensions of design matrix and the response vector are not compatible")
    u <- rep(1, length = n)
    x <- rep((1 - tau), length = n)
    nnzdmax <- nnza <- a@ia[n + 1] - 1
    iwmax <- 7 * m + 3
    ao <- t(a)
    e <- ao %*% a
    nnzemax <- e@ia[m + 1] - 1
    if (missing(nsubmax))
        nsubmax <- nnzemax
    if (missing(nnzlmax))
        nnzlmax <- 4 * nnzdmax
    if (missing(tmpmax))
        tmpmax <- 6 * m
    wwm <- vector("numeric", 3 * m)
    s <- u - x
    b1 <- solve(e, ao %*% y, tmpmax = tmpmax, nnzlmax = nnzlmax,
        nsubmax = nsubmax)
    r <- y - a %*% b1
    z <- ifelse(abs(r) < small, (r * (r > 0) + small), r * (r >
        0))
    w <- z - r
    wwn <- matrix(0, n, 14)
    wwn[, 1] <- r
    wwn[, 2] <- z
    wwn[, 3] <- w
    srqfnb.o <- .Fortran("srqfn", n = as.integer(n), m = as.integer(m),
        nnza = as.integer(nnza), a = as.double(a@ra), ja = as.integer(a@ja),
        ia = as.integer(a@ia), ao = as.double(ao@ra), jao = as.integer(ao@ja),
        iao = as.integer(ao@ia), nnzdmax = as.integer(nnzdmax),
        d = double(nnzdmax), jd = integer(nnzdmax), id = integer(m +
            1), dsub = double(nnzemax + 1), jdsub = integer(nnzemax +
            1), nnzemax = as.integer(nnzemax), e = as.double(e@ra),
        je = as.integer(e@ja), ie = as.integer(e@ia), nsubmax = as.integer(nsubmax),
        lindx = integer(nsubmax), xlindx = integer(m + 1), nnzlmax = as.integer(nnzlmax),
        lnz = double(nnzlmax), xlnz = integer(m + 1), iw = integer(m *
            5), iwmax = as.integer(iwmax), iwork = integer(iwmax),
        xsuper = integer(m + 1), tmpmax = as.integer(tmpmax),
        tmpvec = double(tmpmax), wwm = as.double(wwm), wwn = as.double(wwn),
        cachsz = as.integer(cachsz), level = as.integer(8), x = as.double(x),
        s = as.double(s), u = as.double(u), c = as.double(y),
        sol = as.double(b1), rhs = as.double(rhs), small = as.double(small),
        ierr = integer(1), maxiter = as.integer(maxiter), time = double(7),
        PACKAGE = "quantreg")[c("sol", "ierr", "maxiter", "time")]
    ierr <- srqfnb.o$ierr
    if (!(ierr == 0) && warn.mesg)
        warning(sfnMessage(ierr))
    list(coef = -srqfnb.o$sol, ierr = ierr, it = srqfnb.o$maxiter,
        time = sum(srqfnb.o$time))
}

地板
epoh 发表于 2010-10-17 20:25:01
library(quantreg)
m <- 3
n <- 10
s <- rep(1:n,rep(m,n))
x <- exp(rnorm(n*m))
X <- cbind(1,x)
u <- x*rnorm(m*n) + (1-x)*rf(m*n,3,3)
a <- rep(rnorm(n),rep(m,n))
y <- a + u
#########Sparse Regression Quantile Fitting
sX <- as.matrix.csr(X)
fit.sfn=rq.fit.sfn(sX, y)
fit.sfn
#$coef
#[1]  0.7230244 -1.5599576

######block bootstrap
n1=length(y)  #30
b=1000     #Number of bootstrap samples
boot_bhat=matrix(NA,b, dim(X)[2])
block_length = 10  
num_blocks = n1/block_length    #n/block_length  #3
Indices = seq(1:n1)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){          #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]       #Find which data are in each block  
Ind_sim = c(Ind_sim)      
Xsim = X[Ind_sim,1:2]      #Construct the x data
sXsim <- as.matrix.csr(Xsim)
Ysim = y[Ind_sim]          #Construct the y data
boot_bhat[i,] = rq.fit.sfn(sXsim, y)$coef
}
bhat=colMeans(boot_bhat)
#bhat
cov=cov(boot_bhat)
serr = sqrt(diag(cov))
#serr   

p <- length(bhat)
rdf <- n1 - p
vnames<- dimnames(x)[[2]]
coef <- array(bhat, c(p, 4))
dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
coef[, 2] <- serr
coef[, 3] <- coef[, 1]/coef[, 2]
coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
coef

             Value          Std. Error       t value          Pr(>|t|)
#[1,]  0.03135153  0.7788350  0.04025439 0.9681760
#[2,] -0.52056648  0.8418555 -0.61835611 0.5413385

##############################
rq.fit.panel<-function(X,y,s,w=c(.25,.5,.25),taus=(1:3)/4,lambda = 1){
# prototype function for panel data fitting of QR models
# the matrix X is assumed to contain an intercept
# the vector s is a strata indicator assumed (so far) to be a one-way layout
# NB:  
# 1.  The value of the shrinkage parameter lambda is an open research problem in
#       the simplest homogneous settings it should be the ratio of the scale parameters
#       of the fixed effects and the idiocyncratic errors
# 2.  On return the coefficient vector has m*p + n elements where m is the number
#       quantiles being estimated, p is the number of colums of X, and n is the
#       number of distinct values of s.  The first m*p coefficients are the
#       slope estimates, and the last n are the "fixed effects"
# 3.  Like all shrinkage (regularization) estimators, asymptotic inference is somewhat
#       problematic... so the bootstrap is the natural first resort.


        require(SparseM)
        require(quantreg)
        K <- length(w)
        if(K != length(taus))
                stop("length of w and taus must match")
        X <- as.matrix(X)
        p <- ncol(X)
        n <- length(levels(as.factor(s)))
        N <- length(y)
        if(N != length(s) || N != nrow(X))
                stop("dimensions of y,X,s must match")
        Z <- as.matrix.csr(model.matrix(~as.factor(s)-1))
        Fidelity <- cbind(as(w,"matrix.diag.csr") %x% X,w %x% Z)
        Penalty <- cbind(as.matrix.csr(0,n,K*p),lambda*as(n,"matrix.diag.csr"))
        D <- rbind(Fidelity,Penalty)
        y <- c(w %x% y,rep(0,n))
        a <- c((w*(1-taus)) %x% (t(X)%*%rep(1,N)),
                sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1,n))
        rq.fit.sfn(D,y,rhs=a)
        }
#########rq.fit.panel
fit.panel <- rq.fit.panel(X,y,s)
fit.panel
#$coef
#[1]  3.5696496 -3.8264048  3.0765997 -2.0488377  2.9257768 -0.8172033 -3.6342146 -3.6971347
#[9] -0.1692880 -0.5130398 -2.1588006 -0.9896829 -3.6029453 -0.9873640 -1.0739194 -1.5147111

######block bootstrap
n1=length(y)  #30
b=1000     #Number of bootstrap samples
ncoef=16
boot_bhat=matrix(NA,b, ncoef)
block_length = 10  
num_blocks = n1/block_length    #n/block_length  
Indices = seq(1:n1)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){          #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]       #Find which data are in each block  
Ind_sim = c(Ind_sim)      
Xsim = X[Ind_sim,1:dim(X)[2]]      #Construct the x data
Ysim = y[Ind_sim]          #Construct the y data
boot_bhat[i,] = rq.fit.panel(Xsim,Ysim,s)$coef
}
bhat=colMeans(boot_bhat)
#bhat

cov=cov(boot_bhat)
serr = sqrt(diag(cov))
#serr   

p <- length(bhat)
rdf <- n1 - p
vnames<- dimnames(x)[[2]]
coef <- array(bhat, c(p, 4))
dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
coef[, 2] <- serr
coef[, 3] <- coef[, 1]/coef[, 2]
coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
coef
> coef
           Value           Std. Error   t value        Pr(>|t|)
[1,]  3.5833974   1.224927  2.925397 0.011071952
[2,] -3.2408558   0.836343 -3.875032 0.001682470
[3,]  3.4498021   1.443538  2.389824 0.031476152
[4,] -1.7417926   1.514311 -1.150221 0.269321804
[5,]  3.2779020   1.430895  2.290806 0.038006777
[6,] -0.3387540   1.453237 -0.233103 0.819055078
[7,] -2.8649529   1.526798 -1.876445 0.081595934
[8,] -2.2847649   1.831747 -1.247315 0.232744869
[9,] -2.2825497   1.571369 -1.452587 0.168381907
[10,] -2.6825952   0.885694 -3.028806 0.009021254
[11,] -2.6366763   1.210942 -2.177375 0.047051236
[12,] -1.1954863   0.885605 -1.349909 0.198467848
[13,] -3.6453777   1.662293 -2.192982 0.045697269
[14,] -1.7934437   1.630096 -1.100208 0.289800173
[15,] -1.7644270   1.041941 -1.693405 0.112501121
[16,] -2.6937529   1.733657 -1.553798 0.142542035
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
ywh19860616 + 1 + 1 + 1 十分感谢您的热心帮助,谢谢 我自己看看,不懂再向您请教

总评分: 学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

7
ywh19860616 发表于 2010-10-17 20:58:31
#epoh
感谢您的热心解答,非常感谢

8
zhangtao 发表于 2010-10-17 22:08:42
非常佩服epoh,为什么我的R运行结果如下?
> library(quantreg)
> m <- 3
> n <- 10
> s <- rep(1:n,rep(m,n))
> x <- exp(rnorm(n*m))
> X <- cbind(1,x)
> u <- x*rnorm(m*n) + (1-x)*rf(m*n,3,3)
> a <- rep(rnorm(n),rep(m,n))
> y <- a + u
> #########Sparse Regression Quantile Fitting
> sX <- as.matrix.csr(X)
> fit.sfn=rq.fit.sfn(sX, y)
> fit.sfn
$coef
[1]  1.5907851 -0.9816827

$ierr
[1] 0

$it
[1] 8

$time
[1] 0

> #$coef
> #[1]  0.7230244 -1.5599576
>
> ######block bootstrap
> n1=length(y)  #30
> b=1000     #Number of bootstrap samples
> boot_bhat=matrix(NA,b, dim(X)[2])
> block_length = 10  
> num_blocks = n1/block_length    #n/block_length  #3
> Indices = seq(1:n1)  # All of the indices from 1 to n
> Indices = matrix(Indices,block_length,num_blocks)
> for (i in 1:b){          #Number of bootstrap samples
+ randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
+ Ind_sim = Indices[,randblock]       #Find which data are in each block  
+ Ind_sim = c(Ind_sim)      
+ Xsim = X[Ind_sim,1:2]      #Construct the x data
+ sXsim <- as.matrix.csr(Xsim)
+ Ysim = y[Ind_sim]          #Construct the y data
+ boot_bhat[i,] = rq.fit.sfn(sXsim, y)$coef
+ }
> bhat=colMeans(boot_bhat)
> #bhat
> cov=cov(boot_bhat)
> serr = sqrt(diag(cov))
> #serr   
>
> p <- length(bhat)
> rdf <- n1 - p
> vnames<- dimnames(x)[[2]]
> coef <- array(bhat, c(p, 4))
> dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
> coef[, 2] <- serr
> coef[, 3] <- coef[, 1]/coef[, 2]
> coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
> coef
           Value Std. Error     t value  Pr(>|t|)
[1,] -0.01807533  1.0159156 -0.01779216 0.9859308
[2,] -0.16323621  0.5183736 -0.31490071 0.7551718
>
>              Value          Std. Error       t value          Pr(>|t|)
错误: 意外的符号在"             Value          Std."里
> #[1,]  0.03135153  0.7788350  0.04025439 0.9681760
> #[2,] -0.52056648  0.8418555 -0.61835611 0.5413385
>
> ##############################
> rq.fit.panel<-function(X,y,s,w=c(.25,.5,.25),taus=(1:3)/4,lambda = 1){
+ # prototype function for panel data fitting of QR models
+ # the matrix X is assumed to contain an intercept
+ # the vector s is a strata indicator assumed (so far) to be a one-way layout
+ # NB:  
+ # 1.  The value of the shrinkage parameter lambda is an open research problem in
+ #       the simplest homogneous settings it should be the ratio of the scale parameters
+ #       of the fixed effects and the idiocyncratic errors
+ # 2.  On return the coefficient vector has m*p + n elements where m is the number
+ #       quantiles being estimated, p is the number of colums of X, and n is the
+ #       number of distinct values of s.  The first m*p coefficients are the
+ #       slope estimates, and the last n are the "fixed effects"
+ # 3.  Like all shrinkage (regularization) estimators, asymptotic inference is somewhat
+ #       problematic... so the bootstrap is the natural first resort.
+
+
+         require(SparseM)
+         require(quantreg)
+         K <- length(w)
+         if(K != length(taus))
+                 stop("length of w and taus must match")
+         X <- as.matrix(X)
+         p <- ncol(X)
+         n <- length(levels(as.factor(s)))
+         N <- length(y)
+         if(N != length(s) || N != nrow(X))
+                 stop("dimensions of y,X,s must match")
+         Z <- as.matrix.csr(model.matrix(~as.factor(s)-1))
+         Fidelity <- cbind(as(w,"matrix.diag.csr") %x% X,w %x% Z)
+         Penalty <- cbind(as.matrix.csr(0,n,K*p),lambda*as(n,"matrix.diag.csr"))
+         D <- rbind(Fidelity,Penalty)
+         y <- c(w %x% y,rep(0,n))
+         a <- c((w*(1-taus)) %x% (t(X)%*%rep(1,N)),
+                 sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1,n))
+         rq.fit.sfn(D,y,rhs=a)
+         }
> #########rq.fit.panel
> fit.panel <- rq.fit.panel(X,y,s)
> fit.panel
$coef
[1]  7.6260607 -2.1070905  7.5975820 -0.9305439  7.9502965 -0.6942313
[7] -8.0326189 -8.3645710 -6.1608858 -7.8483822 -9.1936181 -6.8814008
[13] -2.8142175 -4.5646799 -5.0150503 -5.2894871

$ierr
[1] 0

$it
[1] 9

$time
[1] 0

> #$coef
> #[1]  3.5696496 -3.8264048  3.0765997 -2.0488377  2.9257768 -0.8172033 -3.6342146 -3.6971347
> #[9] -0.1692880 -0.5130398 -2.1588006 -0.9896829 -3.6029453 -0.9873640 -1.0739194 -1.5147111
>
> ######block bootstrap
> n1=length(y)  #30
> b=1000     #Number of bootstrap samples
> ncoef=16
> boot_bhat=matrix(NA,b, ncoef)
> block_length = 10  
> num_blocks = n1/block_length    #n/block_length  
> Indices = seq(1:n1)  # All of the indices from 1 to n
> Indices = matrix(Indices,block_length,num_blocks)
> for (i in 1:b){          #Number of bootstrap samples
+ randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
+ Ind_sim = Indices[,randblock]       #Find which data are in each block  
+ Ind_sim = c(Ind_sim)      
+ Xsim = X[Ind_sim,1:dim(X)[2]]      #Construct the x data
+ Ysim = y[Ind_sim]          #Construct the y data
+ boot_bhat[i,] = rq.fit.panel(Xsim,Ysim,s)$coef
+ }
共有43个警告 (用warnings()来显示)
> bhat=colMeans(boot_bhat)
> #bhat
>
> cov=cov(boot_bhat)
> serr = sqrt(diag(cov))
> #serr   
>
> p <- length(bhat)
> rdf <- n1 - p
> vnames<- dimnames(x)[[2]]
> coef <- array(bhat, c(p, 4))
> dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
> coef[, 2] <- serr
> coef[, 3] <- coef[, 1]/coef[, 2]
> coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
> coef
          Value Std. Error   t value  Pr(>|t|)
[1,]  7.534053   4.751090  1.585753 0.1351177
[2,] -2.856759   2.167977 -1.317707 0.2087586
[3,]  7.426722   4.896351  1.516787 0.1515725
[4,] -1.594420   1.660894 -0.959977 0.3533539
[5,]  7.847167   4.793434  1.637066 0.1238897
[6,] -1.187242   1.503429 -0.789689 0.4428796
[7,] -5.581834   4.066550 -1.372621 0.1914601
[8,] -5.812536   4.126287 -1.408660 0.1807565
[9,] -5.574963   3.945628 -1.412947 0.1795168
[10,] -5.235239   3.849973 -1.359812 0.1953872
[11,] -6.375304   3.684595 -1.730259 0.1055559
[12,] -5.187254   4.007587 -1.294359 0.2164851
[13,] -3.887769   2.977477 -1.305726 0.2126952
[14,] -5.792848   3.690437 -1.569691 0.1388069
[15,] -4.975033   3.700436 -1.344445 0.2001844
[16,] -5.451939   3.380240 -1.612885 0.1290763
> > coef
错误: 意外的'>'在">"里

9
ywh19860616 发表于 2010-10-18 07:35:00
楼上的,我运行不会出现这个问题

#epoh ,谢谢您,能帮我再看下这个问题吗
我想问一下,block_length = 10  这个是如何定的?是根据n=10来确定的吗?即按时间或者截面个数来确定吗?
就是block bootstrap分块是按照什么标准来分的?您能否推荐一些文献。您帮我编写的程序是按下图的抽样方法吗?

10
epoh 发表于 2010-10-18 09:45:08
block_length = 10,只是为了程序运行,
随意订的.
block_length,应该跟数据本质有关,
譬如一星期,一个月,一季,一年.

你上传的是最基本的bootstrap
假设数据data是:
      x      y
1  10   100
2  20   200
3  30   300
4  40   400
5  50   500
6  60   600
7  70   700
8  80   800
9  90   900
10 99   999

n=10
index=sample(seq(1:n),n,replace = TRUE)
index
#[1]  7  7  9  5  9  2 10 10  9  5
#随机抽样后,取出又放回,所以replace = TRUE
#抽出后的新样本就是:
data[index,]
        x     y
[1,] 70 700
[2,] 70 700
[3,] 90 900
[4,] 50 500
[5,] 90 900
[6,] 20 200
[7,] 99 999
[8,] 99 999
[9,] 90 900
[10,] 50 500

然后以新样本回归,再重新抽样,进行B次
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
ywh19860616 + 1 + 1 + 1 太感谢您了,我样本是以年度数据为单位的,block-length=10合适吗

总评分: 学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2025-12-26 00:03