When the number of records per group (cages in our case) is small, the sampling variance of the estimates is large and over-fitting becomes a risk. In these cases, Bayesian estimators that shrink estimates towards a prior mean can help.
Previously, we assigned IID normal priors with null mean and very large variance to the effects, so that the prior had little influence on inferences.
However, in this case we want the prior to influence estimates: using a smaller prior variance induces shrinkage of the estimates towards zero. But what value should we choose for the prior variance? It turns out that we can estimate this variance from the data simply by treating it as unknown. For simplicity, we will assign it a scaled-inverse chi-square prior.
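To see what treating the variance as unknown involves, here is a minimal sketch of sampling a variance of effects from its fully conditional (scaled-inverse chi-square) posterior. The function name `rScaledInvChisq`, the simulated effects `b`, and the hyper-parameter values `df0` and `S0` are illustrative assumptions, not part of the sampler presented below.

```r
# If W ~ chi-square(df), then df*S/W follows a scaled-inverse
# chi-square distribution with degrees of freedom df and scale S.
rScaledInvChisq <- function(n, df, S) { df * S / rchisq(n, df = df) }

set.seed(195021)
b <- rnorm(50, sd = 2)   # illustrative 'effects' with true variance 4
df0 <- 4; S0 <- 1        # assumed prior hyper-parameters

# The fully conditional posterior of the variance of effects is also
# scaled-inverse chi-square, with updated df and scale:
dfPost <- df0 + length(b)
SPost  <- (df0 * S0 + sum(b^2)) / dfPost
varB   <- rScaledInvChisq(10000, df = dfPost, S = SPost)
mean(varB)  # close to sum(b^2)/length(b): the data inform the variance
```

With the small hyper-parameter values used here, the posterior mean of the variance is dominated by `sum(b^2)`, which is why the prior exerts little pull once the group has many effects.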
The variance of effects can be treated as unknown, in the same way we treated the error variance as unknown. The following sampler allows us to do that, and requires only minimal changes relative to the one we used for a model with a 'flat' prior for the effects. I took the code of the [Gibbs Sampler for Multiple Linear Regression](https://github.com/gdlc/STT465/blob/master/gibbsLinearRegression.md) and modified it so that predictors can be grouped into sets, each with its own variance of effects. For each set we can specify `type='fixed'`, in which case the prior variance is assigned a large value and not updated, or `type='random'`, in which case the variance parameter is sampled from its posterior density.
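As a sketch of how the inputs to such a sampler might be assembled, the following builds an incidence matrix with an intercept, dummy columns for a treatment factor, and cage indicators, together with the `group` and `type` vectors. The data, factor names, and group layout here are made-up assumptions for illustration.

```r
set.seed(195021)
n <- 60
treatment <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
cage      <- factor(sample(1:10, n, replace = TRUE))

Z1 <- model.matrix(~ treatment)[, -1, drop = FALSE]  # treatment dummies
Z2 <- model.matrix(~ cage - 1)                       # one indicator per cage
X  <- cbind(1, Z1, Z2)                               # intercept included in X

# One 'group' entry per column of X; one 'type' entry per group.
group <- c(1, rep(2, ncol(Z1)), rep(3, ncol(Z2)))    # intercept | treatment | cage
type  <- c("fixed", "fixed", "random")               # shrink only cage effects
```

With these inputs, the intercept and treatment effects get a flat prior, while the many cage effects share a single unknown variance and are shrunk towards zero.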
#### Gibbs Sampler
The following Gibbs Sampler implements a multiple linear regression model of the form `y=Xb+e`. Here, `y` is an nx1 vector (NAs allowed), `X` (nxp) is an incidence matrix of effects (if you want an intercept, include it in `X`), `b` (px1) is a vector of effects, and `e` is a vector of model residuals, assumed IID normal with null mean and common variance (varE). The columns of `X` are grouped into terms according to the index provided in `group` (integer, px1). The priors of the effects are normal with zero mean and group-specific variance. The vector `type` indicates, for every group, whether to assign a flat (`"fixed"`) or non-flat (`"random"`) prior. For flat priors the variance is set to a very large value; for non-flat priors the variance is sampled from its fully conditional density (one variance per group). All variances are assigned scaled-inverse chi-square prior densities.
```R
GIBBS.MM=function(y,X,group,type,nIter){
whichNA=which(is.na(y))
nNA=length(whichNA)
n=nrow(X)
p=ncol(X)
## Centering all columns except the 1st (intercept)
for(i in 2:p){ X[,i]=X[,i]-mean(X[,i]) }
SSx=colSums(X^2) # we will need the sum of squares in the sampler
## Hyper-parameters
nGroups=length(unique(group))
groupSize=table(group)
df0.b=rep(0,nGroups)
S0.b=rep(0,nGroups) # these two give the prior p(var)=1/var
b0=rep(0,nGroups)
df0.e=0
S0.e=0
## Objects that will store samples
B=matrix(nrow=nIter,ncol=p,NA)
B[1,1]=mean(y,na.rm=T) #initializing the intercept