人大经济论坛 › 论坛 › 数据科学与人工智能 › 数据分析与数据科学 › R语言论坛 › 带置信区间的画图问题

CDA数据分析研究院

商业数据分析与大数据领航教育品牌



经管云课堂

经管/金融/财会/社科/名师公开课



学术培训

Stata 空间计量 SSCI Python

贵宾：通行论坛特权+数据库权限
+案例库+下载特权 VIP：论坛特权+更多下载次数
+ccerdata数据库+更高阅读权限+……

发帖

楼主: ywh19860616

64613 65

带置信区间的画图问题 [推广有奖]

31楼

ywh19860616 发表于 2011-4-4 00:09:51 |只看作者 |坛友微信交流群

epoh前辈，您说过rq和rqss根本不能实现多个分位数同时估计的，而且使用rq函数，内部就有画图程序，效果和您的是一样的。如plot（summary（...））就可以。
我的意思是在rq.fit.sfn函数基础上（即使用rq.fit.panel函数），使用bootstrap出来的标准误直接计算出置信区间，进而画出图形

n1=length(y)  #30
b=1000    #Number of bootstrap samples
ncoef=16
boot_bhat=matrix(NA,b, ncoef)
block_length = 10
num_blocks = n1/block_length #n/block_length
Indices = seq(1:n1)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){       #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]    #Find which data are in each block
Ind_sim = c(Ind_sim)
Xsim = X[Ind_sim,1:dim(X)[2]]    #Construct the x data
Ysim = y[Ind_sim]       #Construct the y data
boot_bhat[i,] = rq.fit.panel(Xsim,Ysim,s)$coef
}
bhat=colMeans(boot_bhat)
#bhat

cov=cov(boot_bhat)
serr = sqrt(diag(cov))
#serr

p <- length(bhat)
rdf <- n1 - p
vnames<- dimnames(x)[[2]]
coef <- array(bhat, c(p, 4))
dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
coef[, 2] <- serr
coef[, 3] <- coef[, 1]/coef[, 2]
coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
coef
就是在上面计算出系数coef后，直接计算置信区间，进而再画出图形（尽管现在估计的分位点较少，但是我只是想看大致趋势，没有关系）

一份耕耘，一份收获。

使用道具举报

32楼

epoh 发表于 2011-4-4 09:59:57 |只看作者 |坛友微信交流群

在function rq.fit.sfn()基础上
底下结果都相同,你要的是哪种图形
fit1=rq.fit.sfn(xx,yy,tau=0.5,a)
fit1
fit2=rq.fit.sfn(xx,yy,tau=0.3,a)
fit2
fit3=rq.fit.sfn(xx,yy,tau=0.7,a)
fit3
fit4=rq.fit.sfn(xx,yy,tau=c(0.1,0.25,0.5,0.75,0.9) ,a)
fit4

另针你说用bootstrap结果不稳定,
应该说bootstrap结果是稳定,
是我block bootstrap程序,忙中出错.

还是回到我提供的BOOTSTRAP FOR PANEL DATA MODELS.pdf
page 6/30 block bootstrap resampling is operating with columns
所以根据page 2/30
panel dataset with N individuals and T time periods is represented
            by a matrix Y of N rows and T columns

就以提供的数据为例:
year=c(1991,1992,1993,1994,1995,1996,1991,1992,1993,1994,1995,1996)
x1=c(10,20,30,40,50,60,70,80,90,100,110,120)
x2=c(101,201,301,401,501,601,701,801,901,1001,1101,1201)
x=cbind(x1,x2)
y=year
#x
> x
   year  x1 x2
[1,] 1991  10  101
[2,] 1992  20  201
[3,] 1993  30  301
[4,] 1994  40  401
[5,] 1995  50  501
[6,] 1996  60  601
[7,] 1991  70  701
[8,] 1992  80  801
[9,] 1993  90  901
[10,] 1994 100 1001
[11,] 1995 110 1101
[12,] 1996 120 1201

数据需先转换成下列pattern, 再以 column 抽样

T 1991 1992 1993 1994 1995 1996
            10    20    30    40    50    60
   id  701 801 901 1001  1101 1201

然后再还原成paneldata格式,进行回归.

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 5	+ 5	+ 5	epoh前辈，谢谢您我不是说bootstrap不稳定，而是我自己参数设置问题

总评分: 学术水平 + 5 热心指数 + 5 信用等级 + 5 查看全部评分

使用道具举报

33楼

ywh19860616 发表于 2011-4-4 10:50:21 |只看作者 |坛友微信交流群

epoh前辈，呵呵，我还是没有表达清楚，勿怪
我要的图形应该是fit4=rq.fit.sfn(xx,yy,tau=c(0.1,0.25,0.5,0.75,0.9) ,a)
意思就是说假如现在我有4个解释变量（包括intercept），根据上面命令，每个解释变量都估计了5个分位点
我想直观观察每个解释变量随着分位点不同，其值的变化规律。
就是以分位点（0.1,0.25,0.5,0.75,0.9）为横轴，而4个解释变量的系数值为纵轴，这样就画出了4幅反映
各解释变量规律的图像了。如
n1=length(y)  #30
b=1000    #Number of bootstrap samples
ncoef=16
boot_bhat=matrix(NA,b, ncoef)
block_length = 10
num_blocks = n1/block_length #n/block_length
Indices = seq(1:n1)  # All of the indices from 1 to n
Indices = matrix(Indices,block_length,num_blocks)
for (i in 1:b){       #Number of bootstrap samples
randblock =sample(seq(1:num_blocks),num_blocks,replace = TRUE) # Choose which blocks to use
Ind_sim = Indices[,randblock]    #Find which data are in each block
Ind_sim = c(Ind_sim)
Xsim = X[Ind_sim,1:dim(X)[2]]    #Construct the x data
Ysim = y[Ind_sim]       #Construct the y data
boot_bhat[i,] = rq.fit.panel(Xsim,Ysim,s)$coef
}
bhat=colMeans(boot_bhat)
#bhat

cov=cov(boot_bhat)
serr = sqrt(diag(cov))
#serr

p <- length(bhat)
rdf <- n1 - p
vnames<- dimnames(x)[[2]]
coef <- array(bhat, c(p, 4))
dimnames(coef) <- list(vnames, c("Value", "Std. Error", "t value","Pr(>|t|)"))
coef[, 2] <- serr
coef[, 3] <- coef[, 1]/coef[, 2]
coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
coef
如上面您已经bootstrap得出了各解释变量在不同分位点的系数，那就想把这些系数趋势画在图形中反映出来

我前面已经说了，呵呵，不是bootstrap不稳定问题，而是我自己参数设置问题
啊，bootstrap抽样有错误吗？那前辈能修改吗？

一份耕耘，一份收获。

使用道具举报

34楼

epoh 发表于 2011-4-4 13:48:34 |只看作者 |坛友微信交流群

library(SparseM)
library(quantreg)
library(Ecdat)
data(Grunfeld)
#panel data Grunfeld
#n=10, T=20, N=200
x1=Grunfeld$value
x2=Grunfeld$capital
X=cbind(x1,x2)
y=Grunfeld$inv
s= rep(1:10,rep(20,10))
lambda=1
w=c(0.1,0.2,0.3,0.4,0.5,0.4,0.3,0.2,0.1)
taus=c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9)
ntaus=length(taus)
K <- length(w)
X <- as.matrix(X)
p <- ncol(X)
n <- length(levels(as.factor(s)))
N <- length(y)
Z <- as.matrix.csr(model.matrix(~as.factor(s)-1))
Fidelity <- cbind(as(w,"matrix.diag.csr") %x% X,w %x% Z)
Penalty <- cbind(as.matrix.csr(0,n,K*p),lambda*as(n,"matrix.diag.csr"))
D <- rbind(Fidelity,Penalty)
yy <- c(w %x% y,rep(0,n))
a <- c((w*(1-taus)) %x% (t(X)%*%rep(1,N)),
   sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1,n))
rq.fit.sfn(D,yy,tau=taus,rhs=a)
#####block bootstrap
#n=10, T=20, N=200
N=length(y)  #200
T=20
n=10
blockmat=matrix(NA,n,T)
for(i in 1:n){
for(j in 1:T){
  # blockmat[i,j]=(j-1)*T+i
      blockmat[i,j]=(i-1)*T+j
   }
}
b=5000 #Number of bootstrap samples
ncoef=28
boot_bhat=matrix(NA,b, ncoef)
indices = seq(1:T)  # All of the indices from 1 to 20
####

for (j in 1:b){       #Number of bootstrap samples
randblock =sample(indices,replace = TRUE) # Choose which blocks to use
ind_sim=0*seq(1:n)
for(i in 1:n){
ind_sim[((i-1)*T+1):(i*T)]=(i-1)*T+randblock
      }
xsim=cbind(x1[ind_sim],x2[ind_sim]) #Construct the X data
xx1=cbind(as(w,"matrix.diag.csr") %x% xsim,w %x% Z)
xx<-rbind(xx1,Penalty)
ysim = y[ind_sim]       #Construct the y data
yy <- c(w %x% ysim,rep(0,n))
a <- c((w*(1-taus)) %x% (t(xsim)%*%rep(1,N)),
            sum(w*(1-taus)) * (t(Z) %*% rep(1,N)) + lambda * rep(1/2,n))
boot_bhat[j,] = rq.fit.sfn(xx,yy,taus,a)$coef
}

bhat=colMeans(boot_bhat)
#bhat
cov=cov(boot_bhat)
serr = sqrt(diag(cov))
#serr
p <- length(bhat)
rdf <- N-ncoef
coef <- array(bhat, c(p, 4))
dimnames(coef) <- list(seq(1:ncoef), c("Value", "Std. Error", "t value","Pr(>|t|)"))
coef[, 2] <- serr
coef[, 3] <- coef[, 1]/coef[, 2]
coef[, 4] <- if (rdf > 0) 2 * (1 - pt(abs(coef[, 3]), rdf))
coef
> coef
         Value  Std. Error t value    Pr(>|t|)
1    0.06107249  0.01386247  4.4055984 1.850521e-05
2    0.15487193  0.02205991  7.0205165 4.903056e-11
3    0.07234995  0.01482287  4.8809663 2.396156e-06
4    0.16567351  0.02101250  7.8845210 3.477219e-13
5    0.08053246  0.01565537  5.1440794 7.267875e-07
6    0.17087047  0.02093900  8.1603941 6.794565e-14
7    0.08633521  0.01577335  5.4734852 1.539569e-07
8    0.17589873  0.02234190  7.8730426 3.721468e-13
9    0.08989036  0.01543300  5.8245559 2.752130e-08
10 0.18514027  0.02556903  7.2408017 1.425859e-11
11 0.09275225  0.01509710  6.1437128 5.434659e-09
12 0.19914111  0.03041511  6.5474402 6.494829e-10
13 0.09563246  0.01524897  6.2714042 2.799298e-09
14 0.21819694  0.03619383  6.0285673 9.816594e-09
15 0.09971964  0.01643330  6.0681427 8.017352e-09
16 0.24902179  0.05148496  4.8367874 2.915302e-06
17 0.10538962  0.02081095  5.0641423 1.048965e-06
18 0.31456336  0.07023116  4.4789711 1.362888e-05
19 70.87586275 53.31286209  1.3294327 1.854658e-01
20  178.11811868 39.54191280  4.5045398 1.224064e-05
21 -145.88272883 33.85621760 -4.3088903 2.754051e-05
22 -1.70397795  8.63275211 -0.1973853 8.437591e-01
23  -51.72378988 17.42847433 -2.9677750 3.427315e-03
24 -1.91490600  6.15461311 -0.3111334 7.560759e-01
25  -24.71403762  8.68786452 -2.8446619 4.985616e-03
26  -31.82235648 11.51383878 -2.7638355 6.335172e-03
27  -42.08427967 10.15349925 -4.1448055 5.328559e-05
28 -4.48096796  0.96710385 -4.6333886 7.076866e-06
###############
coef1mat=matrix(NA,ntaus,5)
coef2mat=matrix(NA,ntaus,5)
#If n is large(>1000), t=1.96
t=1.96
for (i in 1:ntaus){
#coefficient1
coef1mat[i,1]=taus
coef1mat[i,2]=coef[(2*(i-1)+1),1]
coef1mat[i,3]=coef[(2*(i-1)+1),2]
coef1mat[i,4]=coef[(2*(i-1)+1),1]+t*coef[(2*(i-1)+1),2]
coef1mat[i,5]=coef[(2*(i-1)+1),1]-t*coef[(2*(i-1)+1),2]
#coefficient2
coef2mat[i,1]=taus
coef2mat[i,2]=coef[2*i,1]
coef2mat[i,3]=coef[2*i,2]
coef2mat[i,4]=coef[2*i,1]+t*coef[2*i,2]
coef2mat[i,5]=coef[2*i,1]-t*coef[2*i,2]
}
coef1mat
coef2mat
> coef1mat
   [,1]    [,2]    [,3]    [,4]    [,5]
[1,]  0.1 0.06107249 0.01386247 0.08824294 0.03390204
[2,]  0.2 0.07234995 0.01482287 0.10140278 0.04329712
[3,]  0.3 0.08053246 0.01565537 0.11121698 0.04984793
[4,]  0.4 0.08633521 0.01577335 0.11725098 0.05541944
[5,]  0.5 0.08989036 0.01543300 0.12013903 0.05964168
[6,]  0.6 0.09275225 0.01509710 0.12234257 0.06316193
[7,]  0.7 0.09563246 0.01524897 0.12552045 0.06574448
[8,]  0.8 0.09971964 0.01643330 0.13192891 0.06751036
[9,]  0.9 0.10538962 0.02081095 0.14617909 0.06460016
> coef2mat
   [,1]    [,2]    [,3]    [,4]    [,5]
[1,]  0.1 0.1548719 0.02205991 0.1981093 0.1116345
[2,]  0.2 0.1656735 0.02101250 0.2068580 0.1244890
[3,]  0.3 0.1708705 0.02093900 0.2119109 0.1298300
[4,]  0.4 0.1758987 0.02234190 0.2196889 0.1321086
[5,]  0.5 0.1851403 0.02556903 0.2352556 0.1350250
[6,]  0.6 0.1991411 0.03041511 0.2587547 0.1395275
[7,]  0.7 0.2181969 0.03619383 0.2891369 0.1472570
[8,]  0.8 0.2490218 0.05148496 0.3499323 0.1481113
[9,]  0.9 0.3145634 0.07023116 0.4522164 0.1769103
###plot coef1,coef2 95%CI
par(mfrow = c(1, 2))
####coefficient1
upperbd1=coef1mat[,4]
lowerbd1=coef1mat[,5]
coef1=coef1mat[,2]
plot(taus,upperbd1,type = 'n',  ylim = c(0.00, 0.20),
   ylab = 'Coef1 Value', xlab = 'Quantile')
lines(taus, lowerbd1, col = 'grey')
lines(taus, upperbd1, col = 'grey')
polygon(c(taus, rev(taus)), c(upperbd1, rev(lowerbd1)),
   col = "lightblue", border = 'black')
points(taus, coef1,col='blue',pch=21)
lines(taus, coef1,col='blue')
title("gross Investment vs value of the firm")
####coefficient2
upperbd2=coef2mat[,4]
lowerbd2=coef2mat[,5]
coef2=coef2mat[,2]
plot(taus,upperbd2,type = 'n',  ylim = c(0.00, 0.50),
   ylab = 'Coef2 Capital', xlab = 'Quantile')
lines(taus, lowerbd2, col = 'grey')
lines(taus, upperbd2, col = 'grey')
polygon(c(taus, rev(taus)), c(upperbd2, rev(lowerbd2)),
   col = "lightblue", border = 'black')
points(taus, coef2,col='blue',pch=21)
lines(taus, coef2,col='blue')
title("gross Investment vs stock of plant and equipment")


2011-4-4 13:48:34 上传
下载附件 (87.81 KB)

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 5	+ 5	+ 5	非常感谢epoh前辈，呵呵，您做事很认真负责，好的

总评分: 学术水平 + 5 热心指数 + 5 信用等级 + 5 查看全部评分

使用道具举报

35楼

ywh19860616 发表于 2011-4-4 17:25:52 |只看作者 |坛友微信交流群

epoh前辈，谢谢您
可以看出，您很认真负责

一份耕耘，一份收获。

使用道具举报

36楼

ywh19860616 发表于 2011-4-4 23:08:46 |只看作者 |坛友微信交流群

epoh前辈，我运行这一段时，会提示错误
t=1.96
for (i in 1:ntaus){
#coefficient1
coef1mat[i,1]=taus
coef1mat[i,2]=coef[(2*(i-1)+1),1]
coef1mat[i,3]=coef[(2*(i-1)+1),2]
coef1mat[i,4]=coef[(2*(i-1)+1),1]+t*coef[(2*(i-1)+1),2]
coef1mat[i,5]=coef[(2*(i-1)+1),1]-t*coef[(2*(i-1)+1),2]
#coefficient2
coef2mat[i,1]=taus
coef2mat[i,2]=coef[2*i,1]
coef2mat[i,3]=coef[2*i,2]
coef2mat[i,4]=coef[2*i,1]+t*coef[2*i,2]
coef2mat[i,5]=coef[2*i,1]-t*coef[2*i,2]
}
coef1mat
coef2mat

错误于coef1mat[i, 1] = taus : 被替换的项目不是替换值长度的倍数

我哪里出错？

一份耕耘，一份收获。

使用道具举报

37楼

epoh 发表于 2011-4-4 23:15:08 |只看作者 |坛友微信交流群

coef1mat[i,1]=taus[i]
coef2mat[i,1]=taus[i]

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
ywh19860616	+ 5	+ 5	+ 5	谢谢您，epoh前辈

总评分: 学术水平 + 5 热心指数 + 5 信用等级 + 5 查看全部评分

使用道具举报

38楼

ywh19860616 发表于 2011-4-4 23:48:35 |只看作者 |坛友微信交流群

额，epoh老师，已经可以了
谢谢您的指导

一份耕耘，一份收获。

使用道具举报

39楼

ywh19860616 发表于 2011-4-8 18:32:28 |只看作者 |坛友微信交流群

epoh老师，又碰到疑问了
我看到一些文章运行出各分位点的系数后，如在0.1,0.2,0.3,0.4,0.5的系数值分别为coeff1，coeff2，coeff3，coeff4，coeff5，文章提到了通过统计量检验这些系数间是否存在显著差异。有些文章提到了系数的联合显著性检验（ie，有些文章提到：对FDI在不同分位点的各系数差异性的联合F检验），这些在R有相关命令吗？
附件中有一篇文章，其中我用红色标志了2个地方，您看在R中有相关命令可以实现吗？

Quantile regression.pdf

477.97 KB

一份耕耘，一份收获。

使用道具举报

40楼

epoh 发表于 2011-4-9 08:56:00 |只看作者 |坛友微信交流群

function anova.rq()

There are two (as yet) distinct forms of the test.
In the first the fitted objects all have the same specified quantile (tau) and
the intent is to test the hypothesis that smaller models are adequate
relative to the largest specified model.

In the second form of the test the linear predictor of the fits
are all the same, but the specified quantiles (taus) are different