When the number of records per group (cages in our case) is small, the sampling variance of the estimates is large and over-fitting becomes a risk. In these cases, Bayesian estimators that shrink estimates towards a prior mean can help.
Previously, we assigned IID normal priors with null mean and very large variance to the effects, so that the prior had little influence on inferences.
However, in this case we want the prior to influence estimates: using a smaller prior variance induces shrinkage of the estimates towards zero. But what value should we choose for the prior variance? It turns out that we can estimate this variance from the data simply by treating it as unknown. For simplicity, we will assign it a scaled-inverse chi-square prior.
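To see what treating the variance as unknown involves, here is a minimal sketch of sampling a variance of effects from its fully conditional (scaled-inverse chi-square) posterior. The function name `rScaledInvChisq`, the simulated effects `b`, and the hyper-parameter values `df0` and `S0` are illustrative assumptions, not part of the sampler presented below.

```r
# If W ~ chi-square(df), then df*S/W follows a scaled-inverse
# chi-square distribution with degrees of freedom df and scale S.
rScaledInvChisq <- function(n, df, S) { df * S / rchisq(n, df = df) }

set.seed(195021)
b <- rnorm(50, sd = 2)   # illustrative 'effects' with true variance 4
df0 <- 4; S0 <- 1        # assumed prior hyper-parameters

# The fully conditional posterior of the variance of effects is also
# scaled-inverse chi-square, with updated df and scale:
dfPost <- df0 + length(b)
SPost  <- (df0 * S0 + sum(b^2)) / dfPost
varB   <- rScaledInvChisq(10000, df = dfPost, S = SPost)
mean(varB)  # close to sum(b^2)/length(b): the data inform the variance
```

With the small hyper-parameter values used here, the posterior mean of the variance is dominated by `sum(b^2)`, which is why the prior exerts little pull once the group has many effects.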
The variance of effects can be treated as unknown, in the same way we treated the error variance as unknown. The following sampler allows us to do that, and requires only minimal changes relative to the one we used for a model with a 'flat' prior for the effects. I took the code of the [Gibbs Sampler for Multiple Linear Regression](https://github.com/gdlc/STT465/blob/master/gibbsLinearRegression.md) and modified it so that predictors can be grouped into sets, each with its own variance of effects. For each set we can specify `type='fixed'`, in which case the prior variance is assigned a large value and not updated, or `type='random'`, in which case the variance parameter is sampled from its posterior density.
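As a sketch of how the inputs to such a sampler might be assembled, the following builds an incidence matrix with an intercept, dummy columns for a treatment factor, and cage indicators, together with the `group` and `type` vectors. The data, factor names, and group layout here are made-up assumptions for illustration.

```r
set.seed(195021)
n <- 60
treatment <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
cage      <- factor(sample(1:10, n, replace = TRUE))

Z1 <- model.matrix(~ treatment)[, -1, drop = FALSE]  # treatment dummies
Z2 <- model.matrix(~ cage - 1)                       # one indicator per cage
X  <- cbind(1, Z1, Z2)                               # intercept included in X

# One 'group' entry per column of X; one 'type' entry per group.
group <- c(1, rep(2, ncol(Z1)), rep(3, ncol(Z2)))    # intercept | treatment | cage
type  <- c("fixed", "fixed", "random")               # shrink only cage effects
```

With these inputs, the intercept and treatment effects get a flat prior, while the many cage effects share a single unknown variance and are shrunk towards zero.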
#### Gibbs Sampler
The following Gibbs Sampler implements a multiple linear regression model of the form `y=Xb+e`. Here, `y` is an nx1 vector (NAs allowed), `X` (nxp) is an incidence matrix of effects (if you want an intercept, include it in `X`), `b` (px1) is a vector of effects, and `e` is a vector of model residuals, assumed IID normal with null mean and common variance (varE). The columns of `X` are grouped into terms according to the index provided in `group` (integer, px1). The priors of the effects are normal with zero mean and group-specific variance. The vector `type` indicates, for every group, whether to assign a flat (`"fixed"`) or non-flat (`"random"`) prior. For flat priors the variance is set to a very large value; for non-flat priors the variance is sampled from its fully conditional density (one variance per group). All variances are assigned scaled-inverse chi-square prior densities.
```R
GIBBS.MM=function(y,X,group,type,nIter){
whichNA=which(is.na(y))
nNA=length(whichNA)
n=nrow(X)
p=ncol(X)
## Centering all columns except the 1st (intercept)
for(i in 2:p){ X[,i]=X[,i]-mean(X[,i]) }
SSx=colSums(X^2) # we will need the sum of squares in the sampler
## Hyper-parameters
nGroups=length(unique(group))
groupSize=table(group)
df0.b=rep(0,nGroups)
S0.b=rep(0,nGroups) # these two give the prior p(var)=1/var
b0=rep(0,nGroups)
df0.e=0
S0.e=0
## Objects that will store samples
B=matrix(nrow=nIter,ncol=p,NA)
B[1,1]=mean(y,na.rm=T) #initializing the intercept