楼主: Lisrelchen
1783 6

【Case Study】Hierarchical Linear Regression in PyMC3 [推广有奖]

  • 0关注
  • 62粉丝

VIP

已卖:4194份资源

院士

67%

还不是VIP/贵宾

-

TA的文库  其他...

Bayesian NewOccidental

Spatial Data Analysis

东西方数据挖掘

威望
0
论坛币
50288 个
通用积分
83.6306
学术水平
253 点
热心指数
300 点
信用等级
208 点
经验
41518 点
帖子
3256
精华
14
在线时间
766 小时
注册时间
2006-5-4
最后登录
2022-11-6

楼主
Lisrelchen 发表于 2016-12-18 06:47:49 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币

Authors: Danne Elbers, Thomas Wiecki



Today's blog post is co-written by Danne Elbers who is doing her masters thesis with me on computational psychiatry using Bayesian modeling. This post also borrows heavily from a Notebook by Chris Fonnesbeck.

The power of Bayesian modelling really clicked for me when I was first introduced to hierarchical modelling. In this blog post we will:

  • Provide and intuitive explanation of hierarchical/multi-level Bayesian modeling;
  • Show how this type of model can easily be built and estimated in PyMC3;
  • Demonstrate the advantage of using hierarchical Bayesian modelling as opposed to non-hierarchical Bayesian modelling by comparing the two;
  • Visualize the "shrinkage effect" (explained below); and
  • Highlight connections to the frequentist version of this model.

本帖隐藏的内容

Hierarchical Linear Regression in PyMC3.pdf (1.64 MB)



二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:Hierarchical Case study regression regressio regress really Chris power blog

沙发
Lisrelchen 发表于 2016-12-18 06:53:48

1.Load Data

  1. %matplotlib inline
  2. import matplotlib.pyplot as plt
  3. import numpy as np
  4. import pymc3 as pm
  5. import pandas as pd

  6. data = pd.read_csv('data/radon.csv')

  7. county_names = data.county.unique()
  8. county_idx = data['county_code'].values
  9. n_counties = len(data.county.unique())
复制代码

藤椅
Lisrelchen 发表于 2016-12-18 06:55:05
  1. indiv_traces = {}
  2. for county_name in county_names:
  3.     # Select subset of data belonging to county
  4.     c_data = data.ix[data.county == county_name]
  5.     c_log_radon = c_data.log_radon
  6.     c_floor_measure = c_data.floor.values
  7.    
  8.     with pm.Model() as individual_model:
  9.         # Intercept prior (variance == sd**2)
  10.         a = pm.Normal('alpha', mu=0, sd=100**2)
  11.         # Slope prior
  12.         b = pm.Normal('beta', mu=0, sd=100**2)
  13.    
  14.         # Model error prior
  15.         eps = pm.Uniform('eps', lower=0, upper=100)
  16.    
  17.         # Linear model
  18.         radon_est = a + b * c_floor_measure
  19.    
  20.         # Data likelihood
  21.         radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=c_log_radon)

  22.         # Inference button (TM)!
  23.         step = pm.NUTS()
  24.         trace = pm.sample(2000, step=step, progressbar=False)
  25.    
  26.     # keep trace for later analysis
  27.     indiv_traces[county_name] = trace
复制代码

板凳
Lisrelchen 发表于 2016-12-18 06:56:20
  1. with pm.Model() as hierarchical_model:
  2.     # Hyperpriors for group nodes
  3.     mu_a = pm.Normal('mu_alpha', mu=0., sd=100**2)
  4.     sigma_a = pm.Uniform('sigma_alpha', lower=0, upper=100)
  5.     mu_b = pm.Normal('mu_beta', mu=0., sd=100**2)
  6.     sigma_b = pm.Uniform('sigma_beta', lower=0, upper=100)
  7.    
  8.     # Intercept for each county, distributed around group mean mu_a
  9.     # Above we just set mu and sd to a fixed value while here we
  10.     # plug in a common group distribution for all a and b (which are
  11.     # vectors of length n_counties).
  12.     a = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=n_counties)
  13.     # Intercept for each county, distributed around group mean mu_a
  14.     b = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=n_counties)
  15.    
  16.     # Model error
  17.     eps = pm.Uniform('eps', lower=0, upper=100)
  18.    
  19.     # Model prediction of radon level
  20.     # a[county_idx] translates to a[0, 0, 0, 1, 1, ...],
  21.     # we thus link multiple household measures of a county
  22.     # to its coefficients.
  23.     radon_est = a[county_idx] + b[county_idx] * data.floor.values
  24.    
  25.     # Data likelihood
  26.     radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)
复制代码

报纸
Lisrelchen 发表于 2016-12-18 06:57:32
  1. # Inference button (TM)!
  2. with hierarchical_model:
  3.     # Use ADVI for initialization
  4.     mu, sds, elbo = pm.variational.advi(n=100000)
  5.     step = pm.NUTS(scaling=hierarchical_model.dict_to_array(sds)**2,
  6.                    is_cov=True)
  7.     hierarchical_trace = pm.sample(5000, step, start=mu)
复制代码

地板
Lisrelchen 发表于 2016-12-18 06:58:01
  1. # Plotting the hierarchical model trace -its found values- from 500 iterations onwards (right side plot)
  2. # and its accumulated marginal values (left side plot)
  3. pm.traceplot(hierarchical_trace[500:]);
复制代码

7
franky_sas 发表于 2016-12-18 11:30:48

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注jltj
拉您入交流群
GMT+8, 2026-1-2 18:23