【Programming using PyMC3】Doing Bayesian Data Analysis by John K. Kruschke

Original poster

ReneeBK posted on 2016-12-11 11:04:26


Attachment: Doing Bayesian Data Analysis-master.zip (11.59 MB, requires 5 forum coins)


Keywords: Programming Bayesian Analysis Kruschke Program


2nd floor

ReneeBK posted on 2016-12-11 11:04:47

"""
Goal: Toss a coin N times and compute the running proportion of heads.
"""
import matplotlib.pyplot as plt
import numpy as np
# Specify the total number of flips, denoted N.
N = 500
# Generate a random sample of N flips for a fair coin (heads=1, tails=0);
np.random.seed(47405)
flip_sequence = np.random.choice(a=(0, 1), p=(.5, .5), size=N, replace=True)
# Compute the running proportion of heads:
r = np.cumsum(flip_sequence)
n = np.linspace(1, N, N) # n is a vector.
run_prop = r/n # component by component division.
# Graph the running proportion:
plt.plot(n, run_prop, '-o', )
plt.xscale('log') # an alternative to plot() and xscale() is semilogx()
plt.xlim(1, N)
plt.ylim(0, 1)
plt.xlabel('Flip Number')
plt.ylabel('Proportion Heads')
plt.title('Running Proportion of Heads')
# Plot a dotted horizontal line at y=.5, just as a reference line:
plt.axhline(y=.5, ls='dashed')
# Display the beginning of the flip sequence.
flipletters = ''
for i in flip_sequence[:10]:
    if i == 1:
        flipletters += 'H'
    else:
        flipletters += 'T'
plt.text(10, 0.8, 'Flip Sequence = %s...' % flipletters)
# Display the relative frequency at the end of the sequence.
plt.text(25, 0.2, 'End Proportion = %s' % run_prop[-1])
plt.savefig('Figure_3.1.png')

3rd floor

ReneeBK posted on 2016-12-11 11:07:23

"""
Inferring a binomial proportion via exact mathematical analysis.
"""
import sys
import numpy as np
from scipy.stats import beta
from scipy.special import beta as beta_func
import matplotlib.pyplot as plt
from HDIofICDF import *
def bern_beta(prior_shape, data_vec, cred_mass=0.95):
    """Bayesian updating for Bernoulli likelihood and beta prior.
    Input arguments:
      prior_shape
        vector of parameter values for the prior beta distribution.
      data_vec
        vector of 1's and 0's.
      cred_mass
        the probability mass of the HDI.
    Output:
      intervals
        lower and upper limits of the HDI of the posterior beta distribution.
    Graphics:
      Creates a three-panel graph of prior, likelihood, and posterior
      with highest posterior density interval.
    Example of use:
      intervals = bern_beta(prior_shape=[1,1] , data_vec=[1,0,0,1,1])"""
    # Check for errors in input arguments:
    if len(prior_shape) != 2:
        sys.exit('prior_shape must have two components.')
    if any([i <= 0 for i in prior_shape]):
        sys.exit('prior_shape components must be positive.')
    if any([i != 0 and i != 1 for i in data_vec]):
        sys.exit('data_vec must be a vector of 1s and 0s.')
    if cred_mass <= 0 or cred_mass >= 1.0:
        sys.exit('cred_mass must be between 0 and 1.')
    # Rename the prior shape parameters, for convenience:
    a = prior_shape[0]
    b = prior_shape[1]
    # Create summary values of the data:
    data_vec = np.asarray(data_vec)  # so that a plain list also works
    z = int(np.sum(data_vec == 1))  # number of 1's in data_vec
    N = len(data_vec)  # number of flips in data_vec
    # Compute the posterior shape parameters:
    post_shape = [a+z, b+N-z]
    # Compute the evidence, p(D):
    p_data = beta_func(z+a, N-z+b)/beta_func(a, b)
    # Construct grid of theta values, used for graphing.
    bin_width = 0.005  # Arbitrary small value for comb on theta.
    theta = np.arange(bin_width/2, 1-(bin_width/2)+bin_width, bin_width)
    # Compute the prior at each value of theta.
    p_theta = beta.pdf(theta, a, b)
    # Compute the likelihood of the data at each value of theta.
    p_data_given_theta = theta**z * (1-theta)**(N-z)
    # Compute the posterior at each value of theta.
    post_a = a + z
    post_b = b+N-z
    p_theta_given_data = beta.pdf(theta, a+z, b+N-z)
    # Determine the limits of the highest density interval
    intervals = HDIofICDF(beta, cred_mass, a=post_shape[0], b=post_shape[1])
    # Plot the results.
    plt.figure(figsize=(12, 12))
    plt.subplots_adjust(hspace=0.7)
    # Plot the prior.
    locx = 0.05
    plt.subplot(3, 1, 1)
    plt.plot(theta, p_theta)
    plt.xlim(0, 1)
    plt.ylim(0, np.max(p_theta)*1.2)
    plt.xlabel(r'$\theta$')
    plt.ylabel(r'$P(\theta)$')
    plt.title('Prior')
    plt.text(locx, np.max(p_theta)/2, r'beta($\theta$;%s,%s)' % (a, b))
    # Plot the likelihood:
    plt.subplot(3, 1, 2)
    plt.plot(theta, p_data_given_theta)
    plt.xlim(0, 1)
    plt.ylim(0, np.max(p_data_given_theta)*1.2)
    plt.xlabel(r'$\theta$')
    plt.ylabel(r'$P(D|\theta)$')
    plt.title('Likelihood')
    plt.text(locx, np.max(p_data_given_theta)/2, 'Data: z=%s, N=%s' % (z, N))
    # Plot the posterior:
    plt.subplot(3, 1, 3)
    plt.plot(theta, p_theta_given_data)
    plt.xlim(0, 1)
    plt.ylim(0, np.max(p_theta_given_data)*1.2)
    plt.xlabel(r'$\theta$')
    plt.ylabel(r'$P(\theta|D)$')
    plt.title('Posterior')
    locy = np.linspace(0, np.max(p_theta_given_data), 5)
    plt.text(locx, locy[1], r'beta($\theta$;%s,%s)' % (post_a, post_b))
    plt.text(locx, locy[2], 'P(D) = %g' % p_data)
    # Plot the HDI
    plt.text(locx, locy[3],
             'Intervals = %.3f - %.3f' % (intervals[0], intervals[1]))
    plt.fill_between(theta, 0, p_theta_given_data,
                     where=np.logical_and(theta > intervals[0],
                                          theta < intervals[1]),
                     color='blue', alpha=0.3)
    return intervals
data_vec = np.repeat([1, 0], [11, 3])  # 11 heads, 3 tails
intervals = bern_beta(prior_shape=[100, 100], data_vec=data_vec)
plt.savefig('Figure_5.2.png')
plt.show()
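
The script above imports HDIofICDF from a module that is not shown in this thread (presumably it ships with the attached archive). For readers without that file, here is a minimal sketch of how a highest density interval can be computed from an inverse CDF. The function name hdi_of_icdf and its signature are assumptions made for illustration, not the archive's actual code; the idea is to search over how much of the excluded mass sits in the lower tail and keep the narrowest interval.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import beta, norm


def hdi_of_icdf(dist, cred_mass=0.95, **shape_args):
    """Return (low, high) of the narrowest interval containing cred_mass.

    dist is a scipy.stats continuous distribution (e.g. beta or norm) and
    shape_args are its shape parameters. Hypothetical helper, for illustration.
    """
    frozen = dist(**shape_args)
    tail_mass = 1.0 - cred_mass

    def width(low_tail):
        # Width of the interval when low_tail of the excluded mass is
        # placed in the lower tail and the rest in the upper tail.
        return frozen.ppf(low_tail + cred_mass) - frozen.ppf(low_tail)

    # Search the lower-tail mass in (0, tail_mass) for the narrowest interval.
    best = minimize_scalar(width, bounds=(0.0, tail_mass), method='bounded')
    low_tail = best.x
    return frozen.ppf(low_tail), frozen.ppf(low_tail + cred_mass)


# Posterior from the example above: prior beta(100, 100), 11 heads, 3 tails.
low, high = hdi_of_icdf(beta, 0.95, a=111, b=103)
print('95%% HDI = %.3f - %.3f' % (low, high))
```

For a unimodal density the narrowest interval with the requested mass is the HDI; for a multimodal density this sketch would not be appropriate.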

4th floor

ReneeBK posted on 2016-12-11 11:08:50

"""
Posterior predictive check. Examine the veracity of the winning model by
simulating data sampled from the winning model and see if the simulated data
'look like' the actual data.
"""
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
# Specify known values of prior and actual data.
prior_a = 100
prior_b = 1
actual_data_Z = 8
actual_data_N = 12
# Compute posterior parameter values.
post_a = prior_a + actual_data_Z
post_b = prior_b + actual_data_N - actual_data_Z
# Number of flips in a simulated sample should match the actual sample size:
sim_sample_size = actual_data_N
# Designate an arbitrarily large number of simulated samples.
n_sim_samples = 1000
# Set aside a vector in which to store the simulation results.
sim_sample_Z_record = np.zeros(n_sim_samples)
# Now generate samples from the posterior.
for sample_idx in range(0, n_sim_samples):
    # Generate a theta value for the new sample from the posterior.
    sample_theta = beta.rvs(post_a, post_b)
    # Generate a sample, using sample_theta.
    sample_data = np.random.choice([0, 1], p=[1-sample_theta, sample_theta],
                                   size=sim_sample_size, replace=True)
    sim_sample_Z_record[sample_idx] = sum(sample_data)
## Make a histogram of the number of heads in the samples.
plt.hist(sim_sample_Z_record)
plt.show()
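
The loop above draws one theta per simulated experiment and then counts heads in the simulated flips. The same two-stage sampling can be written without a Python loop. The following is a sketch under the same prior and data values, assuming NumPy's Generator API is available; the variable prop_at_or_below is a hypothetical summary added for illustration and is not part of the original script.

```python
import numpy as np

rng = np.random.default_rng(47405)

# Same prior and data as in the script above.
prior_a, prior_b = 100, 1
actual_data_Z, actual_data_N = 8, 12
post_a = prior_a + actual_data_Z
post_b = prior_b + actual_data_N - actual_data_Z
n_sim_samples = 1000

# Stage 1: one theta per simulated experiment, drawn from the beta posterior.
sample_theta = rng.beta(post_a, post_b, size=n_sim_samples)
# Stage 2: number of heads in actual_data_N flips, one draw per theta.
sim_sample_Z = rng.binomial(actual_data_N, sample_theta)

# Fraction of simulated experiments with no more heads than actually observed.
prop_at_or_below = np.mean(sim_sample_Z <= actual_data_Z)
print('P(simulated Z <= %d) is about %.3f' % (actual_data_Z, prop_at_or_below))
```

A small value of this fraction would suggest that the observed count is unusual under the fitted model, which is the kind of mismatch a posterior predictive check is meant to reveal.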

5th floor

ReneeBK posted on 2016-12-11 11:10:40

"""
Use this program as a template for experimenting with the Metropolis algorithm
applied to 2 parameters called theta1,theta2 defined on the domain [0,1]x[0,1].
"""
from __future__ import division
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
# Define the likelihood function.
# The input argument is a vector: theta = [theta1 , theta2]
def likelihood(theta):
    # Data are constants, specified here:
    z1, N1, z2, N2 = 5, 7, 2, 7
    likelihood = (theta[0]**z1 * (1-theta[0])**(N1-z1)
                  * theta[1]**z2 * (1-theta[1])**(N2-z2))
    return likelihood
# Define the prior density function.
# The input argument is a vector: theta = [theta1 , theta2]
def prior(theta):
    # Here's a beta-beta prior:
    a1, b1, a2, b2 = 3, 3, 3, 3
    prior = beta.pdf(theta[0], a1, b1) * beta.pdf(theta[1], a2, b2)
    return prior
# Define the relative probability of the target distribution, as a function
# of theta. The input argument is a vector: theta = [theta1 , theta2].
# For our purposes, the value returned is the UNnormalized posterior prob.
def target_rel_prob(theta):
    if ((theta >= 0.0).all() & (theta <= 1.0).all()):
        target_rel_probVal = likelihood(theta) * prior(theta)
    else:
        # This part is important so that the Metropolis algorithm
        # never accepts a jump to an invalid parameter value.
        target_rel_probVal = 0.0
    return target_rel_probVal
# Specify the length of the trajectory, i.e., the number of jumps to try:
traj_length = 5000 # arbitrary large number
# Initialize the vector that will store the results.
trajectory = np.zeros((traj_length, 2))
# Specify where to start the trajectory
trajectory[0, ] = [0.50, 0.50] # arbitrary start values of the two param's
# Specify the burn-in period.
burn_in = int(np.ceil(.1 * traj_length)) # arbitrary number; int so it can be used as a slice index
# Initialize accepted, rejected counters, just to monitor performance.
n_accepted = 0
n_rejected = 0
# Specify the seed, so the trajectory can be reproduced.
np.random.seed(47405)
# Specify the covariance matrix for multivariate normal proposal distribution.
n_dim, sd1, sd2 = 2, 0.2, 0.2
covar_mat = [[sd1**2, 0], [0, sd2**2]]
# Now generate the random walk. step is the step in the walk.
for step in range(traj_length-1):
    current_position = trajectory[step, ]
    # Use the proposal distribution to generate a proposed jump.
    # The shape and variance of the proposal distribution can be changed
    # to whatever you think is appropriate for the target distribution.
    proposed_jump = np.random.multivariate_normal(mean=np.zeros((n_dim)),
                                                  cov=covar_mat)
    # Compute the probability of accepting the proposed jump.
    prob_accept = np.minimum(1, target_rel_prob(current_position + proposed_jump)
                             / target_rel_prob(current_position))
    # Generate a random uniform value from the interval [0,1] to
    # decide whether or not to accept the proposed jump.
    if np.random.rand() < prob_accept:
        # accept the proposed jump
        trajectory[step+1, ] = current_position + proposed_jump
        # increment the accepted counter, just to monitor performance
        if step > burn_in:
            n_accepted += 1
    else:
        # reject the proposed jump, stay at current position
        trajectory[step+1, ] = current_position
        # increment the rejected counter, just to monitor performance
        if step > burn_in:
            n_rejected += 1
# End of Metropolis algorithm.
#-----------------------------------------------------------------------
# Begin making inferences by using the sample generated by the
# Metropolis algorithm.
# Extract just the post-burnIn portion of the trajectory.
accepted_traj = trajectory[burn_in:]
# Compute the means of the accepted points.
mean_traj = np.mean(accepted_traj, axis=0)
# Compute the standard deviations of the accepted points.
stdTraj = np.std(accepted_traj, axis=0)
# Plot the post-burn-in trajectory.
plt.plot(accepted_traj[:,0], accepted_traj[:,1], marker='o', alpha=0.3)
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.xlabel(r'$\theta1$')
plt.ylabel(r'$\theta2$')
# Display means in plot.
plt.plot(0, label='M = %.3f, %.3f' % (mean_traj[0], mean_traj[1]), alpha=0.0)
# Display rejected/accepted ratio in the plot.
plt.plot(0, label=r'$N_{pro}=%s$ $\frac{N_{acc}}{N_{pro}} = %.3f$' % (len(accepted_traj), (n_accepted/len(accepted_traj))), alpha=0)
# Evidence for model, p(D).
# Compute a,b parameters for beta distribution that has the same mean
# and stdev as the sample from the posterior. This is a useful choice
# when the likelihood function is binomial.
a = mean_traj * ((mean_traj*(1-mean_traj)/stdTraj**2) - np.ones(n_dim))
b = (1-mean_traj) * ( (mean_traj*(1-mean_traj)/stdTraj**2) - np.ones(n_dim))
# For every theta value in the posterior sample, compute
# beta.pdf(theta, a, b) / (likelihood(theta) * prior(theta))
# This computation assumes that likelihood and prior are properly normalized,
# i.e., not just relative probabilities.
wtd_evid = np.zeros(np.shape(accepted_traj)[0])
for idx in range(np.shape(accepted_traj)[0]):
    wtd_evid[idx] = (beta.pdf(accepted_traj[idx,0],a[0],b[0] )
                     * beta.pdf(accepted_traj[idx,1],a[1],b[1]) /
                     (likelihood(accepted_traj[idx,]) * prior(accepted_traj[idx,])))
p_data = 1 / np.mean(wtd_evid)
# Display p(D) in the graph
plt.plot(0, label='p(D) = %.3e' % p_data, alpha=0)
plt.legend(loc='upper left')
plt.savefig('Figure_8.3.png')
# Estimate highest density region by evaluating posterior at each point.
accepted_traj = trajectory[burn_in:]
npts = np.shape(accepted_traj)[0]
post_prob = np.zeros((npts))
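for ptIdx in range(npts):
    post_prob[ptIdx] = target_rel_prob(accepted_traj[ptIdx,])
# Determine the level at which credmass points are above:
credmass = 0.95
# np.percentile expects a percentage in [0, 100]; the waterline is the
# (1 - credmass) quantile, so that a fraction credmass of the points lie above it.
waterline = np.percentile(post_prob, 100 * (1 - credmass))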
HDI_points = accepted_traj[post_prob > waterline, ]
plt.figure()
plt.plot(HDI_points[:,0], HDI_points[:,1], 'ro')
plt.xlim(0,1)
plt.ylim(0,1)
plt.xlabel(r'$\theta1$')
plt.ylabel(r'$\theta2$')
# Display means in plot.
plt.plot(0, label='M = %.3f, %.3f' % (mean_traj[0], mean_traj[1]), alpha=0.0)
# Display rejected/accepted ratio in the plot.
plt.plot(0, label=r'$N_{pro}=%s$ $\frac{N_{acc}}{N_{pro}} = %.3f$' % (len(accepted_traj), (n_accepted/len(accepted_traj))), alpha=0)
# Display p(D) in the graph
plt.plot(0, label='p(D) = %.3e' % p_data, alpha=0)
plt.legend(loc='upper left')
plt.savefig('Figure_8.3_HDI.png')
plt.show()
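
Because both priors in the script above are beta(3, 3) and the likelihood is Bernoulli, the evidence p(D) that the script estimates from the Metropolis sample also has a closed form: the same ratio of beta functions used for p(D) in the bern_beta post earlier in this thread, once per coin. A minimal sketch for checking the MCMC estimate, assuming the data constants from likelihood() and the prior constants from prior(); the helper name exact_evidence is hypothetical.

```python
from scipy.special import beta as beta_func


def exact_evidence(z, N, a, b):
    """p(D) for z heads in N Bernoulli flips under a beta(a, b) prior."""
    return beta_func(z + a, N - z + b) / beta_func(a, b)


# Data (z1, N1, z2, N2) = (5, 7, 2, 7), priors beta(3, 3) for both thetas.
# The two coins are independent under the prior, so the evidences multiply.
p_data_exact = exact_evidence(5, 7, 3, 3) * exact_evidence(2, 7, 3, 3)
print('exact p(D) = %.3e' % p_data_exact)
```

If the value printed in the figure legend differs from this by much more than sampling noise, the trajectory is probably too short or the proposal distribution poorly tuned.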

6th floor

soccy posted on 2016-12-11 18:49:02

......

7th floor

franky_sas posted on 2016-12-12 12:04:48

8th floor

terryzhao1 posted on 2016-12-24 17:11:20

Good stuff, worth a look.

9th floor

terryzhao1 posted on 2016-12-24 17:11:21

Good stuff, worth a look.

10th floor

restalker posted on 2016-12-26 17:28:25

Already replied.
