楼主: Nicolle
442 18

A Primer on Bayesian Multilevel Modeling using PyStan [推广有奖]

版主

巨擘

0%

还不是VIP/贵宾

-

TA的文库  其他...

Python Programming

SAS Programming

Structural Equation Modeling

威望
16
论坛币
12330212 个
学术水平
2838 点
热心指数
2822 点
信用等级
2646 点
经验
433401 点
帖子
18722
精华
83
在线时间
7096 小时
注册时间
2005-4-23
最后登录
2018-11-17

Nicolle 学生认证  发表于 2018-2-14 23:05:13 |显示全部楼层

本帖隐藏的内容

A Primer on Bayesian Multilevel Modeling using PyStan.pdf (792.59 KB)

  1. Author: Chris Fonnesbeck
  2. License: Apache, Version 2.0 (code); CC-BY 3.0 (text)

  3. Hierarchical or multilevel modeling is a generalization of regression modeling.

  4. Multilevel models are regression models in which the constituent model parameters are given probability models. This implies that model parameters are allowed to vary by group.

  5. Observational units are often naturally clustered. Clustering induces dependence between observations, despite random sampling of clusters and random sampling within clusters.

  6. A hierarchical model is a particular multilevel model where parameters are nested within one another.

  7. Some multilevel structures are not hierarchical.

  8. e.g. "country" and "year" are not nested, but may represent separate, but overlapping, clusters of parameters
  9. We will motivate this topic using an environmental epidemiology example.
复制代码



本帖被以下文库推荐

stata SPSS
Nicolle 学生认证  发表于 2018-2-14 23:05:45 |显示全部楼层
  1. %matplotlib inline
  2. import numpy as np
  3. import pandas as pd
  4. import matplotlib.pyplot as plt
  5. import seaborn as sns; sns.set_context('notebook')

  6. # Import radon data
  7. srrs2 = pd.read_csv('../data/srrs2.dat')
  8. srrs2.columns = srrs2.columns.map(str.strip)
  9. srrs_mn = srrs2.assign(fips=srrs2.stfips*1000 + srrs2.cntyfips)[srrs2.state=='MN']
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:06:18 |显示全部楼层
  1. Next, obtain the county-level predictor, uranium, by combining two variables.

  2. cty = pd.read_csv('../data/cty.dat')
  3. cty_mn = cty[cty.st=='MN'].copy()
  4. cty_mn[ 'fips'] = 1000*cty_mn.stfips + cty_mn.ctfips
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:06:49 |显示全部楼层
  1. Use the merge method to combine home- and county-level information in a single DataFrame.

  2. srrs_mn = srrs_mn.merge(cty_mn[['fips', 'Uppm']], on='fips')
  3. srrs_mn = srrs_mn.drop_duplicates(subset='idnum')
  4. u = np.log(srrs_mn.Uppm)

  5. n = len(srrs_mn)
  6. srrs_mn.head()
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:07:19 |显示全部楼层
  1. We also need a lookup table (dict) for each unique county, for indexing.

  2. srrs_mn.county = srrs_mn.county.str.strip()
  3. mn_counties = srrs_mn.county.unique()
  4. counties = len(mn_counties)
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:07:52 |显示全部楼层
  1. Finally, create local copies of variables.

  2. county_lookup = dict(zip(mn_counties, range(len(mn_counties))))
  3. county = srrs_mn['county_code'] = srrs_mn.county.replace(county_lookup).values
  4. radon = srrs_mn.activity
  5. srrs_mn['log_radon'] = log_radon = np.log(radon + 0.1).values
  6. floor_measure = srrs_mn.floor.values
  7. Distribution of radon levels in MN (log scale):

  8. srrs_mn.activity.apply(lambda x: np.log(x+0.1)).hist(bins=25)
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:08:39 |显示全部楼层
  1. To specify this model in Stan, we begin by constructing the data block, which includes vectors of log-radon measurements (y) and floor measurement covariates (x), as well as the number of samples (N).

  2. pooled_data = """
  3. data {
  4.   int<lower=0> N;
  5.   vector[N] x;
  6.   vector[N] y;
  7. }
  8. """
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:09:08 |显示全部楼层
  1. pooled_parameters = """
  2. parameters {
  3.   vector[2] beta;
  4.   real<lower=0> sigma;
  5. }
  6. """
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:09:35 |显示全部楼层
  1. Finally, we model the log-radon measurements as a normal sample with a mean that is a function of the floor measurement.

  2. pooled_model = """
  3. model {
  4.   y ~ normal(beta[1] + beta[2] * x, sigma);
  5. }
  6. """
复制代码
回复

使用道具 举报

Nicolle 学生认证  发表于 2018-2-14 23:09:59 |显示全部楼层
  1. We then pass the code, data, and parameters to the stan function. The sampling requires specifying how many iterations we want, and how many parallel chains to sample. Here, we will sample 2 chains of length 1000.

  2. import pystan

  3. pooled_data_dict = {'N': len(log_radon),
  4.                'x': floor_measure,
  5.                'y': log_radon}

  6. pooled_fit = pystan.stan(model_code=pooled_data + pooled_parameters + pooled_model,
  7.                          data=pooled_data_dict, iter=1000, chains=2)
复制代码
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 我要注册

GMT+8, 2018-11-17 13:11