OP: Lisrelchen

STT465: Bayesian Statistical Methods (MSU)


#1 (OP)
Lisrelchen posted on 2016-12-18 03:54:40

STT465: Bayesian Statistical Methods (MSU)

Hidden content in this post:

STT465-master.zip (1.18 MB)





#2
Lisrelchen posted on 2016-12-18 03:55:32
## Example: Effect of Sample Size on Likelihood and Posterior Inferences

The following example displays the likelihood and the posterior density (both up to a constant) of the Beta-Binomial model, assuming that the sample yielded a sample mean of 0.1 (the average number of successes per trial), with the sample size varied from 5 to 100; the conjugate update used in the code is sketched below.
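For reference, a brief sketch of that conjugate update: combining the Beta(shape1, shape2) prior with the binomial likelihood (writing the number of successes as n·xBar) gives

```latex
p(\theta \mid y) \;\propto\; \theta^{\,n\bar{x}}(1-\theta)^{\,n(1-\bar{x})}\,\theta^{\,a-1}(1-\theta)^{\,b-1}
\;\;\Longrightarrow\;\;
\theta \mid y \;\sim\; \mathrm{Beta}\big(n\bar{x}+a,\; n(1-\bar{x})+b\big),
```

with a = shape1 and b = shape2; this is exactly the `dbeta` call used for the posterior density in the code.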

```R
# Data
  N=c(5,10,20,50,100)
  xBar=.1

# Prior
  shape1=3
  shape2=3

# Grid of values for theta
  theta=seq(from=1/1000,to=999/1000,by=1/1000)

# Likelihood
  L=matrix(nrow=length(theta),ncol=length(N))

  logLik=function(xBar,n,theta){
          nSuc=round(xBar*n)
          nFail=(n-nSuc)
          log_lik=nSuc*log(theta)+nFail*log(1-theta)
          return(log_lik)
  }

  for(i in 1:length(N)){
          L[,i]=exp(logLik(xBar=xBar,n=N[i],theta=theta))
  }

# Posterior density
  PD=L
  for(i in 1:length(N)){
          PD[,i]=dbeta(x=theta,shape1=xBar*N[i]+shape1,shape2=(1-xBar)*N[i]+shape2)
  }

# Scaling to get nice plots
  for(i in 1:length(N)){
          PD[,i]=PD[,i]/max(PD[,i])
          L[,i]=L[,i]/max(L[,i])
  }

# Empty plot; posterior densities as solid lines, likelihoods as dashed lines
  plot(0,type='n',ylim=c(0,1),xlim=range(theta),xlab=expression(theta),ylab='')

  for(i in 1:length(N)){
          lines(x=theta,y=PD[,i],col=i,lty=1)
          lines(x=theta,y=L[,i],col=i,lty=2)
  }
```
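To tell the curves for the five sample sizes apart, a legend can be added to the plot; a small sketch:

```R
# Add a legend mapping colors to sample sizes
legend('topright',legend=paste('n =',N),col=1:length(N),lty=1,bty='n')
```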

#3
Lisrelchen posted on 2016-12-18 03:56:22
### A Gibbs Sampler for a linear regression model

```R
## Toy simulation to test the program
n=100
x=rnorm(n=n)
int=100
beta=2
y=int+x*beta+rnorm(n)

## Incidence matrix for the effects (intercept and regression coefficient)
  X=cbind(1,x)
  n=nrow(X)
  p=ncol(X)

## Hyper-parameters
df0=0
S0=0  # these two give the prior p(var)=1/var
b0=rep(0,p)
varB=rep(1e6,p) # this gives a flat prior for the intercept and regression coefficient

## Parameters that control the algorithm
nIter=11000
burnIn=1000

## Objects that will store samples
B=matrix(nrow=nIter,ncol=p,NA)
B[1,1]=mean(y) # initializing the intercept
B[1,-1]=0 # now the regression coefficients
varE=rep(NA,nIter)
varE[1]=var(y)
SSx=colSums(X^2) # we will need this in the sampler

## Sampler
for(i in 2:nIter){
  b=B[i-1,]
  for(j in 1:p){
    yStar=y-X[,-j,drop=F]%*%b[-j]
    rhs=sum(X[,j]*yStar)/varE[i-1]+b0[j]/varB[j]
    C=SSx[j]/varE[i-1]+1/varB[j]
    sol=rhs/C
    b[j]=rnorm(n=1,mean=sol,sd=sqrt(1/C))
  }
  B[i,]=b
  # sampling the error variance
  error=y-X%*%b
  SS=sum(error^2)+S0
  DF=n+df0
  varE[i]=SS/rchisq(n=1,df=DF)
  print(i)
}

plot(varE)
plot(B[,1],type='o')
plot(B[,2],type='o')

# 95% Credible Regions
  apply(FUN=quantile,probs=c(.025,.975),X=B[-c(1:burnIn),],MARGIN=2)

# Posterior Means
  colMeans(B[-c(1:burnIn),])

# Posterior SD
  apply(FUN=sd,X=B[-c(1:burnIn),],MARGIN=2)
```
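For reference, the `rhs`, `C`, and `sol` lines above implement the fully conditional distribution of each coefficient under its normal prior; a sketch of the algebra (b0_j and varB_j denote the prior mean and prior variance of the j-th coefficient, and y* the partial residual computed in `yStar`):

```latex
b_j \mid \cdot \;\sim\; N\!\left(
\frac{\sum_i x_{ij}\, y^*_i/\sigma^2_e \;+\; b_{0j}/\sigma^2_{b_j}}
{\sum_i x_{ij}^2/\sigma^2_e \;+\; 1/\sigma^2_{b_j}},\;
\Big(\textstyle\sum_i x_{ij}^2/\sigma^2_e + 1/\sigma^2_{b_j}\Big)^{-1}\right),
```

so the mean equals `rhs/C` and the variance equals `1/C` in the code.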

#4
Lisrelchen posted on 2016-12-18 03:57:20
### A simple Gibbs Sampler for the mean and variance of a normal model

```R
## Toy simulation to test the program
n=100
y=rnorm(mean=125,sd=1,n=n)

## Hyper-parameters
df0=0
S0=0  # these two give the prior p(var)=1/var
mu0=0
varMu=1e6 # this gives a flat prior for the mean

## Parameters that control the algorithm
nIter=12000
burnIn=2000

## Objects that will store samples
mu=rep(NA,nIter)
varE=rep(NA,nIter)

## Initializing
varE[1]=var(y)
mu[1]=mean(y)

## Objects that we will use
Sy=sum(y)

## Sampler
for(i in 2:nIter){
  # sample the mean
  rhs=Sy/varE[i-1]+mu0/varMu
  C=n/varE[i-1]+1/varMu
  sol=rhs/C
  mu[i]=rnorm(sd=sqrt(1/C),mean=sol,n=1)

  # sample the variance
  SS=sum((y-mu[i])^2)+S0
  DF=n+df0
  varE[i]=SS/rchisq(n=1,df=DF)
  print(i)
}

plot(varE)
abline(h=var(y),lwd=2,col=2)
hist(varE,30)
abline(v=var(y),lwd=2,col=2)
plot(mu,varE)
```
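The `burnIn` value defined above is not used in the plots; a minimal sketch of posterior summaries after discarding the burn-in draws:

```R
# Posterior summaries after discarding the burn-in samples
keep=(burnIn+1):nIter
mean(mu[keep])
quantile(mu[keep],probs=c(.025,.975))
mean(varE[keep])
quantile(varE[keep],probs=c(.025,.975))
```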

#5
Lisrelchen posted on 2016-12-18 03:58:09
# Logistic regression

Logistic regression is one of the most commonly used regression methods for binary outcomes. The regression function is defined at the level of the logit:

            log(theta_i/(1-theta_i)) = xi'beta

The following function evaluates the negative log-likelihood for a logistic regression.

```R
negLogLikLogistic=function(y,X,beta){
       eta=X%*%beta
       theta=exp(eta)/(1+exp(eta))
       out=ifelse(y==1,log(theta),log(1-theta))
       return(-sum(out))
}
```
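If the `gout.RData` file used below is not at hand, the function can still be checked on simulated data; a minimal sketch (the parameter values here are arbitrary, for illustration only):

```R
# Quick check of negLogLikLogistic on simulated data (illustrative values only)
set.seed(195021)
n=200
X=cbind(1,rnorm(n))                    # intercept plus one predictor
beta=c(-1,2)
theta=exp(X%*%beta)/(1+exp(X%*%beta))  # true success probabilities
y=rbinom(n=n,size=1,prob=theta)
negLogLikLogistic(y=y,X=X,beta=beta)   # typically smaller than the value at beta=c(0,0)
negLogLikLogistic(y=y,X=X,beta=c(0,0))
```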

Now we show how the above function can be used in combination with `optim` to obtain Maximum Likelihood Estimates.

We compare our ML estimates with those obtained using the `glm` function.

```R
load('~/GitHub/EPI853B/gout.RData')

y=ifelse(Y$Gout=='Y',1,0)
X=as.matrix(model.matrix(~UricAcid+Race+Sex+Age,data=Y))

# Centering all columns of X does not change the ML estimates but facilitates convergence
  for(i in 2:ncol(X)){ X[,i]=X[,i]-mean(X[,i]) }

INT=log(mean(y)/(1-mean(y)))
initialValues=c(INT, rep(0,ncol(X)-1))

# optim is a general-purpose optimization (minimization) function
  fm=optim(fn=negLogLikLogistic,y=y,X=X,par=initialValues)
  names(fm$par)=colnames(X)

# glm performs ML estimation for logistic regression
  fm2=glm(y~X-1,family=binomial)

# Comparison of results
  cbind(fm$par,coef(fm2))
```

#6
Lisrelchen posted on 2016-12-18 03:59:05
## Example 1. OLS regression using 'cell means'

```R
rm(list=ls())
library(BGLR)
data(mice)

counts=table(mice.pheno$cage)
tmp=names(counts)[counts>1]
Y=mice.pheno[mice.pheno$cage%in%tmp,]
y=Y$Obesity.BMI
cage=factor(Y$cage)

# Note 1: OLS on a set of orthogonal dummy variables (one per cage) produces 'cell means'
  cageMeans=tapply(FUN=mean,X=y,INDEX=cage)
  fm=lm(y~cage-1)
  plot(cageMeans,coef(fm))

# Note 2: SEs of the OLS estimates are inversely proportional to the square root of the group counts
  SE.lm=summary(fm)$coef[,2]
  n=counts[match(names(SE.lm),paste0('cage',names(counts)))]
  plot(n,SE.lm)
```

When the number of records per group (cages in our case) is small, the sampling variance of the estimates is large and over-fitting becomes a risk. In these cases, Bayesian estimators that shrink estimates towards a prior mean can help.

So far we have assigned IID normal priors to effects with null mean and very large variance, so that the prior has little influence on inferences.

However, in this case we want the prior to influence the estimates. Using a smaller prior variance induces shrinkage of the estimates towards zero. But what value should we choose for the prior variance? It turns out that we can estimate this variance from the data by simply treating it as unknown. For simplicity we will assign it a scaled-inverse chi-square prior; the fully conditional density this implies is sketched below.
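A brief sketch of why the variance draws in these samplers take the form `SS/rchisq(n=1,df=DF)`: with a scaled-inverse chi-square prior with degrees of freedom df0 and scale S0, and SS the sum of squares of the current residuals (or effects, with n the corresponding count), the fully conditional density is again of the scaled-inverse chi-square form,

```latex
p(\sigma^2 \mid \cdot)\;\propto\;(\sigma^2)^{-\left(\frac{df_0+n}{2}+1\right)}
\exp\!\left\{-\frac{S_0+SS}{2\sigma^2}\right\}
\;\;\Longrightarrow\;\;
\sigma^2 \mid \cdot \;\overset{d}{=}\; \frac{S_0+SS}{\chi^2_{df_0+n}},
```

and setting df0 = 0 and S0 = 0 gives the improper prior p(var) = 1/var used as the default in the code.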


## Bayesian Shrinkage Estimation (Bayesian 'Ridge Regression')

The variance of effects can be treated as unknown, in the same way we treated the error variance as unknown. The following sampler allows us to do that. The algorithm involves minimal changes relative to the one we used for a model with a 'flat' prior for the effects. I took the code of the [Gibbs Sampler for Multiple Linear Regression](https://github.com/gdlc/STT465/blob/master/gibbsLinearRegression.md) and modified it so that predictors can be grouped into sets. Each set has its own variance of effects. For some groups we can specify `type='fixed'`, in which case the prior variance is assigned a large value and not updated, or `type='random'`, in which case the variance parameter is sampled from its fully conditional posterior density.

#### Gibbs Sampler
The following Gibbs Sampler implements a multiple linear regression model of the form `y=Xb+e`. Here, `y` is an nx1 vector (NAs allowed), `X` (nxp) is an incidence matrix for effects (if you want an intercept, include it in `X`), `b` (px1) is a vector of effects, and `e` is a vector of model residuals which are assumed to be IID normal with null mean and common variance (varE). The columns of `X` are grouped into terms according to the index provided in `group` (integer, px1). The priors of the effects are normal with zero mean and group-specific variance. The vector `type` indicates, for every group, whether to assign a flat (`"fixed"`) or non-flat (`"random"`) prior. For flat priors the variance is set to a very large value; for non-flat priors the variance is sampled from its fully conditional density (one variance per group). All variances are assigned scaled-inverse chi-square prior densities.
```R
GIBBS.MM=function(y,X,group,type,nIter){

 whichNA=which(is.na(y))
 nNA=length(whichNA)

 n=nrow(X)
 p=ncol(X)

 ## Centering all columns except the 1st (intercept)
  for(i in 2:p){ X[,i]=X[,i]-mean(X[,i]) }
  SSx=colSums(X^2) # we will need the sums of squares in the sampler

 ## Hyper-parameters
  nGroups=length(unique(group))
  groupSize=table(group)
  df0.b=rep(0,nGroups)
  S0.b=rep(0,nGroups)  # these two give the prior p(var)=1/var
  b0=rep(0,nGroups)
  df0.e=0
  S0.e=0

 ## Objects that will store samples
  B=matrix(nrow=nIter,ncol=p,NA)
  B[1,1]=mean(y,na.rm=T) # initializing the intercept
  B[1,-1]=0 # now the regression coefficients

  error=y-B[1,1] #*#
  if(nNA>0){
    error[whichNA]=0
    yStar=rep(B[1,1],nNA)
  }
  varE=rep(NA,nIter)
  varE[1]=var(y,na.rm=T)

  varB=matrix(nrow=nIter,ncol=nGroups)
  colVar=apply(FUN=var,X=X,MARGIN=2)

  Vy=varE[1]
  Vmodel=.5*Vy
  Vb=Vmodel/sum(colVar)
  for(i in 1:nGroups){
    if(type[i]=='fixed'){
      varB[,i]=1e6
    }else{
      varB[1,i]=Vb
    }
  }

 ## Sampler
 for(i in 2:nIter){
   # Sampling the effects
   b=B[i-1,]
   for(j in 1:p){
     xj=X[,j]
     error=error+xj*b[j] #*#
     rhs=sum(xj*error)/varE[i-1]+b0[group[j]]/varB[i-1,group[j]] #*#
     C=SSx[j]/varE[i-1]+1/varB[i-1,group[j]] #*#
     sol=rhs/C
     b[j]=rnorm(n=1,mean=sol,sd=sqrt(1/C))
     error=error-xj*b[j]
   }
   B[i,]=b

   # Sampling the error variance
   SS=sum(error^2)+S0.e
   DF=n+df0.e
   varE[i]=SS/rchisq(n=1,df=DF)
   print(i)

   # Sampling the variances of effects #*#
   for(j in 1:nGroups){
     if(type[j]!='fixed'){
       tmp=b[group==j]-b0[j]
       SS=sum(tmp^2)+S0.b[j]
       DF=groupSize[j]+df0.b[j]
       varB[i,j]=SS/rchisq(n=1,df=DF)
     }
   }

   # Sampling missing values
   if(nNA>0){
     yHat=(yStar-error[whichNA])
     yStar=rnorm(n=nNA,sd=sqrt(varE[i]),mean=yHat)
     error[whichNA]=yStar-yHat
   }
 }
 out=list(varE=varE,B=B,varB=varB)
 return(out)
}
```

## Example 1: Without NAs

```R
  library(BGLR)
  data(mice)

  # keep only cages with at least three mice
   mice.pheno$cage=as.character(mice.pheno$cage)
   counts=table(mice.pheno$cage)
   tmp=names(counts)[counts>2]
   tmp=mice.pheno$cage%in%tmp
   mice.pheno=mice.pheno[tmp,]

  # phenotype and predictor
   y=scale(mice.pheno$Obesity.BMI)
   cage=factor(mice.pheno$cage)

  # incidence matrix
   Z=as.matrix(model.matrix(~cage-1))

  # OLS
   fm=lm(y~cage-1)

  # Bayesian
   groups=c(1,rep(2,ncol(Z)))
   type=c("fixed","random")
   fmB=GIBBS.MM(y=y,X=cbind(1,Z),group=groups,type=type,nIter=600)

  # Comparison of OLS and Bayesian estimates
   bHatB=colMeans(fmB$B[-(1:100),])
   yHatB=cbind(1,Z)%*%bHatB
   yHatOLS=predict(fm)
   tmp=range(yHatB,yHatOLS)
   plot(predict(fm),yHatB,ylim=tmp,xlim=tmp)
   abline(a=0,b=1,col=2)
```
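A quick way to inspect the variance components sampled by `GIBBS.MM` in this run (a sketch; column 2 of `varB` corresponds to the cage-effects group, while column 1 is the 'fixed' intercept group and stays at 1e6):

```R
# Trace plots and posterior means of the variance components (discarding 100 burn-in draws)
plot(fmB$varE,type='o',ylab='varE')
plot(fmB$varB[,2],type='o',ylab='varB (cage effects)')
colMeans(cbind(varE=fmB$varE,varB.cage=fmB$varB[,2])[-(1:100),])
```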

## Example 2: With NAs

Here we create a testing set `tst` by assigning some mice, within cages, to a testing set.

```R
# Sampling a testing set
  tst=sample(1:nrow(Z),size=200)
  yNA=y
  yNA[tst]=NA

# OLS
  fmOLS=lm(yNA~Z-1)
  bHatOLS=coef(fmOLS)
  yHatOLS=Z%*%bHatOLS

# Bayesian
  fmB=GIBBS.MM(y=yNA,X=cbind(1,Z),group=groups,type=type,nIter=600)
  bHatB=colMeans(fmB$B[-(1:10),])
  yHatB=cbind(1,Z)%*%bHatB

# Correlations between predictions and phenotypes in training (TRN) and testing (TST) sets
  COR=matrix(nrow=2,ncol=2)
  colnames(COR)=c('TRN','TST')
  rownames(COR)=c('OLS','Bayes')

  COR[1,1]=cor(yHatOLS[-tst],y[-tst])
  COR[1,2]=cor(yHatOLS[tst],y[tst])
  COR[2,1]=cor(yHatB[-tst],y[-tst])
  COR[2,2]=cor(yHatB[tst],y[tst])
  COR
```

## Example: regression using cage and SNPs (genetic markers)

```R
   tmp=which(rownames(mice.X)%in%mice.pheno$SUBJECT.NAME)

   W=scale(mice.X[tmp,])
   # keep every 10th marker to speed up the example
   W=W[,rep(c(TRUE,rep(FALSE,9)),times=ceiling(ncol(W)/10))[1:ncol(W)]]
   groups=c(1,rep(2,ncol(Z)),rep(3,ncol(W)))
   type=c('fixed','random','random')
   fmB2=GIBBS.MM(y=yNA,X=cbind(1,Z,W),group=groups,type=type,nIter=600)
   yHatB2=cbind(1,Z,W)%*%colMeans(fmB2$B[-c(1:100),])
   cor(yHatB2[tst],y[tst])
```
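To see how much variance is attributed to each random group in this last model, the sampled group variances returned by `GIBBS.MM` can be summarized (a sketch; columns 2 and 3 of `varB` correspond to the cage and marker groups, respectively):

```R
# Posterior means of the group-specific variances (discarding 100 burn-in draws)
colMeans(fmB2$varB[-(1:100),2:3])
plot(fmB2$varB[,3],type='o',ylab='varB (markers)')
```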

#7
Lisrelchen posted on 2016-12-18 04:01:07
## Computing the predictive distribution using MC methods

**Example 1: Binomial**

Here, because the future data point (yF) is a dummy variable, the predictive distribution must be Bernoulli; we just need to find the success probability. It turns out that the success probability (see the derivation in class or in the book, p. 40) is the posterior mean of theta, as sketched below. The example that follows compares the predictive distribution derived using this result with one obtained using MC methods.
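A one-line sketch of that result: since yF can only take the values 0 and 1, the predictive success probability is obtained by averaging theta over its posterior,

```latex
P(y_F = 1 \mid y) \;=\; \int_0^1 \theta \, p(\theta \mid y)\, d\theta \;=\; E[\theta \mid y]
\;=\; \frac{N\bar{x} + a}{N + a + b},
```

where a and b are the prior shape parameters (shape1 and shape2 in the code).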


```R
B=5e6

shape1=1.5
shape2=1.5
N=20
xBar=.2

# posterior density is Beta with these shape parameters
  post.a=N*xBar+shape1
  post.b=N*(1-xBar)+shape2

# Monte Carlo: draw theta from the posterior, then yF given theta
  theta=rbeta(shape1=post.a,shape2=post.b,n=B)
  yF=rbinom(prob=theta,n=B,size=1)

  mean(yF) # estimated success probability obtained by averaging over possible realizations of theta
  (N*xBar+shape1)/(N+shape1+shape2) # analytical solution
```

**Example 2: Poisson-Gamma**

In this case the predictive density is Negative Binomial, but let's try to obtain it using MC methods.


```R
# Simulating data from a Poisson distribution
  N=20
  x=rpois(lambda=4,n=N)

# Setting the prior: say our prior expectation is E[lambda]=3 and CV[lambda]=1 (large variance relative to the mean)
  prior.mean=3
  prior.CV=1

  prior.shape=(1/prior.CV)^2
  prior.rate=prior.shape/prior.mean

# The posterior distribution is Gamma with these parameters
  post.shape=prior.shape+sum(x)
  post.rate=N+prior.rate

# Generating the predictive distribution by MC: draw lambda from its posterior, then yF given lambda
  B=1e6
  samplesLambda=rgamma(n=B,shape=post.shape,rate=post.rate)
  yF=rpois(n=B,lambda=samplesLambda)

# The predictive variance exceeds that of a Poisson with lambda fixed at its posterior mean
  var(yF)
  var(rpois(n=B,lambda=post.shape/post.rate))
```
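As a check, the MC predictive can be compared with the analytical Negative Binomial implied by the Gamma-Poisson mixture; a sketch (with lambda ~ Gamma(shape=post.shape, rate=post.rate), yF is Negative Binomial with size=post.shape and prob=post.rate/(post.rate+1)):

```R
# Compare MC frequencies with the analytical Negative Binomial probabilities
  tab=table(yF)/B
  xVals=as.integer(names(tab))
  round(cbind(MC=as.numeric(tab),
              NegBin=dnbinom(xVals,size=post.shape,prob=post.rate/(post.rate+1))),4)
```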

#8
franky_sas posted on 2016-12-18 11:31:34

#9
southlander posted on 2016-12-20 10:37:40
cccccccccc

#10
tianwk posted on 2019-7-8 13:57:17
thanks for sharing
