请选择 进入手机版 | 继续访问电脑版
楼主: oliyiyi
1699 1

Hierarchical Linear Model [推广有奖]

版主

泰斗

0%

还不是VIP/贵宾

-

TA的文库  其他...

计量文库

威望
7
论坛币
272091 个
通用积分
31269.1753
学术水平
1435 点
热心指数
1554 点
信用等级
1345 点
经验
383778 点
帖子
9599
精华
66
在线时间
5466 小时
注册时间
2007-5-21
最后登录
2024-3-21

初级学术勋章 初级热心勋章 初级信用勋章 中级信用勋章 中级学术勋章 中级热心勋章 高级热心勋章 高级学术勋章 高级信用勋章 特级热心勋章 特级学术勋章 特级信用勋章

oliyiyi 发表于 2016-11-18 13:10:44 |显示全部楼层 |坛友微信交流群

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
Hierarchical Linear Model

Linear regression probably is the most familiar technique of data analysis, but its application is often hamstrung by model assumptions. For instance, if the data has a hierarchical structure, quite often the assumptions of linear regression are feasible only at local levels. We will investigate an extension of the linear model to bi-level hierarchies.


Example

We borrow an example from Rossi, Allenby and McCulloch (2005) for demonstration. It is based upon a data set called ’cheese’ from the baysem package. The data set contains marketing data of certain brand name processed cheese, such as the weekly sales volume (VOLUME), unit retail price (PRICE), and display activity level (DISP) in various regional retailer accounts. For each account, we can define the following linear regression model of the log sales volume, where β1 is the intercept term, β2 is the display measure coefficient, and β3 is the log price coefficient.


The stochastic term ϵ relies on regional market conditions, and we would not expect it to have the same dispersion among retailers. For the same reason, we cannot expect identical regression coefficients for all accounts, or attempt to define a single linear regression model for the entire data set.

Nevertheless, we do expect regression coefficients of the retailer accounts to be related. A common approach to simulate the relationship is the hierarchical linear model, which treats the regression coefficients as random variables of yet another linear regression at the system level.


Problem

Fit the data set cheese with the hierarchical linear model, and estimate the average impact on sales volumes of the retailers if the unit retail price is to be raised by 5%.


Solution

Our first task is to divide the data according to regional retailers. The following shows that the RETAILER column has the factor data type, which allow us to extract the entire list of retailer accounts with the levels method.

> library(bayesm)
> data(cheese)
> str(cheese)
’data.frame’:  5555 obs. of  4 variables:
$ RETAILER: Factor w/ 88 levels ...
$ VOLUME  : int  21374 6427 17302 ...
$ DISP    : num  0.162 0.1241 0.102 ...
$ PRICE   : num  2.58 3.73 2.71 2.65 ...

> retailer <- levels(cheese$RETAILER)
> nreg <- length(retailer); nreg
[1] 88

Let i to be an integer between 1 and the number of retailer accounts. We define a filter for the i-th account as follows.

    filter <- cheese$RETAILER==retailer

We now loop through the accounts, and create a list of data items consisting of the Xand y components of the linear regression model in each account. The columns of Xbelow contains the intercept placeholder, the display measure, and log price data.

regdata <- NULL
for (i in 1:nreg) {
    filter <- cheese$RETAILER==retailer
    y <- log(cheese$VOLUME[filter])
    X <- cbind(1,      # intercept placeholder
            cheese$DISP[filter],
            log(cheese$PRICE[filter]))
    regdata[] <- list(y=y, X=X)
}

Finally, we wrap the regdata and the iteration parameters in lists, and invoke therhierLinearModel method of the bayesm package. It takes less than 2 seconds for 2,000 MCMC iterations on a quad-core CPU.

> Data <- list(regdata=regdata)
> Mcmc <- list(R=2000)

> system.time(
+ out <- bayesm::rhierLinearModel(
+         Data=Data,
+         Mcmc=Mcmc))
....................
   user  system elapsed
  2.374   3.804   1.639

We can perform the same MCMC simulation with an identically named method in rpudplus. The process finishes at about the same rate. The speedup will be more pronounced for larger data sets such as the one in Exercise 3 below. Note the extra output option that we explicitly set to be ’bayesm’ for compatibility.

> library(rpud)    # load rpudplus
> system.time(
+ out <- rpud::rhierLinearModel(
+         Data=Data,
+         Mcmc=Mcmc,
+         output="bayesm"))
....................
   user  system elapsed
  0.131   1.090   1.249

Observe that the log price data is in the third column of the X component. Hence we can estimate the log price coefficient from the third component of the second dimension in the betadraw attribute of the MCMC output. We also drop the first 10% samples for burn-in, i.e. 200 MCMC draws, before evaluating the mean of the simulation values.

> beta.3 <- mean(as.vector(out$betadraw[, 3, 201:2000]))
> beta.3
[1] -2.146

A 5% increase of the unit price amounts to an increment of log(1.05) in the log price. Therefore the log sales volume will be adjusted by -2.146 * log(1.05). The computation below shows that the sales volume is expected to decrease by 10% on average.

> exp(beta.3 * log(1.05))
[1] 0.9006


Exercise
  • Confirm MCMC convergence in the simulation of the hierarchical linear model of the cheese data set. As a hint, there is a ’coda’ output option in the rpud::rhierLinearModel method for this purpose. Make sure the coda package is installed beforehand.
  • Create a plot for the posterior mean of display measures and the matching least square coefficients of the cheese data set as illustrated in Fig 3.6 of Rossi, Allenby and McCulloch (2005). Define the Gibbs priors to be ν = 6 and V = 0.6 * I3. Consult the text for an explanation of the prior parameters. Observe outliers of the least square coefficients.
  • The cudaBayesreg package employs a hierarchical linear model for analysis of fMRI data. There is sample code in Silva (2010) that creates a brain activity image as below. Replace the cudaMultireg.slice method in the sample code by rpud::rhierLinearModel for the same effect.


Note

Due to sequential nature of the MCMC algorithm and modest data size, the current implementation of the rpud::rhierLinearModel method is mostly CPU bound. Its only CUDA dependency is the random number generator for MCMC simulation. There is plenty of room for optimization with large data sets.


References





二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:Hierarchical Linear model Linea ARCH

已有 1 人评分学术水平 热心指数 信用等级 收起 理由
janyiyi + 3 + 3 + 3 精彩帖子

总评分: 学术水平 + 3  热心指数 + 3  信用等级 + 3   查看全部评分

缺少币币的网友请访问有奖回帖集合
https://bbs.pinggu.org/thread-3990750-1-1.html
janyiyi 发表于 2016-12-6 12:35:52 |显示全部楼层 |坛友微信交流群
谢谢分享
已有 1 人评分论坛币 学术水平 热心指数 信用等级 收起 理由
oliyiyi + 20 + 1 + 1 + 1 精彩帖子

总评分: 论坛币 + 20  学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

使用道具

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注jltj
拉您入交流群

京ICP备16021002-2号 京B2-20170662号 京公网安备 11010802022788号 论坛法律顾问:王进律师 知识产权保护声明   免责及隐私声明

GMT+8, 2024-3-29 17:43