Reposted from 数析学院. Only part of the article is excerpted here; readers who want the full version can go straight to the original.
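How to download the data
1) Register for and log in to "Project Tycho".
2) Go to the Level 1 data, then search for and retrieve the data.
3) Search criteria: geographic level := state; disease outcome := incidence
4) Add all results (select everything with Ctrl+A, or Cmd+A on a Mac).
5) Scroll down and click the link to download the results as an Excel file.
6) Open the file in Excel and export it as a CSV file.
7) Put all the data files in a folder named `data` inside the notebook's directory.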
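The code below reads each exported file with `skiprows=2` and `na_values='-'`. The following is a minimal sketch of what those two arguments do, assuming (as the code implies) that the CSV has two lines above the `YEAR,WEEK,...` header row and uses `-` for weeks with no report. The sample rows are invented for illustration and are not real Tycho data.

```python
import io

import pandas as pd

# Hypothetical excerpt of an exported file: two lines above the header row,
# then YEAR, WEEK and one column per state; '-' marks a week with no report.
SAMPLE_CSV = """\
META LINE 1
META LINE 2
YEAR,WEEK,ALABAMA,ALASKA
1928,1,0.1,-
1928,2,-,0.2
1929,1,0.3,0.4
"""

# skiprows=2 drops the first two lines; na_values='-' turns dashes into NaN,
# so every state column is parsed as float instead of text.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=2, na_values='-')
print(df)
print(df.dtypes)
```

Without `na_values='-'`, a column containing dashes would be read as text (object dtype) rather than numbers.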
1. Import the libraries and plot the polio data
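    # Jupyter magic: draw figures inline in the notebook (remove when running as a plain script)
    %matplotlib inline

    import matplotlib.pyplot as plt
    import seaborn as sb
    import pandas as pd
    import numpy as np

    sb.set_style('white')  # plain white background, no grid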
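    # Read the weekly polio incidence file: skip the 2 lines above the header, treat '-' as missing
    polio_data = pd.read_csv('data/POLIO_Incidence_1928-1969_20160304121200.csv', skiprows=2, na_values='-')

    # Years for the x axis and state names for the y axis
    polio_years = list(polio_data['YEAR'].unique())
    polio_states = polio_data.drop(['YEAR', 'WEEK'], axis=1).columns.values
    polio_states = [state.title() for state in polio_states]  # "NEW YORK" -> "New York"

    # Sum the weekly values into yearly totals, then flip into a states x years matrix
    polio_data.drop(['WEEK'], axis=1, inplace=True)
    polio_data = polio_data.groupby('YEAR').sum()
    polio_data = polio_data.transpose().values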
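The drop / groupby / transpose steps above turn a weekly table into the matrix the heatmap expects. Here is a small sketch on an invented four-row table (the values are made up) that shows the shape change:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly table with the same layout as the Tycho export.
weekly = pd.DataFrame({
    'YEAR':    [1954, 1954, 1955, 1955],
    'WEEK':    [1, 2, 1, 2],
    'ALABAMA': [1.0, 2.0, np.nan, 0.5],
    'ALASKA':  [0.0, np.nan, 1.5, 1.5],
})

# One row per year, one column per state; sum() skips NaN by default.
yearly = weekly.drop(columns=['WEEK']).groupby('YEAR').sum()

# Transpose so rows are states and columns are years, then take the raw array.
matrix = yearly.transpose().values
print(yearly)
print(matrix)
```

Because `sum()` skips NaN, a state-year with no reports at all becomes 0 rather than missing; `groupby('YEAR').sum(min_count=1)` would keep it as NaN (shown as a blank cell) instead.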
    plt.figure(figsize=(12, 12))

    # One row per state, one column per year; robust=True sets the colour range from the
    # 2nd-98th percentiles so a few extreme years do not wash out the rest
    sb.heatmap(polio_data, cmap='Reds', robust=True,
               xticklabels=[year if year % 5 == 0 or year == max(polio_years) else '' for year in polio_years],
               yticklabels=polio_states)

    # Vertical line at 1955, the year the vaccine was introduced
    plt.plot([polio_years.index(1955) + 0.425, polio_years.index(1955) + 0.425],
             [0, 51],
             color='black', lw=1.5)

    # Relabel the colorbar (the figure's last axes); assumes it shows exactly these five ticks
    cax = plt.gcf().axes[-1]
    cax.set_yticklabels([10, 20, 30, 40, '50+'])
    cax.tick_params(labelsize=12)

    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)

    # Title, vaccine annotation and credit line, placed in data coordinates
    # (positions were tuned by hand; adjust them if the text lands in the wrong place)
    plt.text(0, 51.5, 'Polio cases in the United States', fontsize=14)
    plt.text(polio_years.index(1955) + 0.15, 51.5, 'Vaccine introduced', fontsize=12, weight='bold')
    plt.text(-8, -3, 'Data source: Project Tycho (tycho.pitt.edu) '
             '| Author: Randy Olson (randalolson.com / @randal_olson)',
             fontsize=10)

    plt.savefig('polio-cases-heatmap-sequential-colormap.png', bbox_inches='tight')
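Both the sparse year labels and the vaccine marker depend on a year's position in the year list, because heatmap column i spans x = i to i + 1 regardless of the year it represents. A possible refactor pulls that logic into two small helpers so every chart looks positions up in the list that matches its own columns. The names `sparse_year_labels` and `vaccine_line_x` are hypothetical; the 0.425 offset is the value the polio code uses.

```python
def sparse_year_labels(years, step, keep_last=False):
    """Label every year divisible by `step` (and optionally the final year);
    blank out the rest so the x axis stays readable."""
    last = max(years)
    return [year if year % step == 0 or (keep_last and year == last) else ''
            for year in years]


def vaccine_line_x(years, vaccine_year, offset=0.425):
    """x position of the vertical marker: the year's column index plus an offset
    inside the cell. list.index raises ValueError if the year is not present."""
    return years.index(vaccine_year) + offset


# Usage with a made-up year range:
years = list(range(1950, 1962))
labels = sparse_year_labels(years, step=5, keep_last=True)
x = vaccine_line_x(years, 1955)
print(labels)
print(x)
```

With these, the polio call becomes `xticklabels=sparse_year_labels(polio_years, 5, keep_last=True)`, and the measles chart would pass `measles_years` to both helpers.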
2. Apply the same visualization to the measles data
    # Same preparation steps as for polio
    measles_data = pd.read_csv('data/MEASLES_Incidence_1928-2003_20160304120254.csv', skiprows=2, na_values='-')

    measles_years = list(measles_data['YEAR'].unique())
    measles_states = measles_data.drop(['YEAR', 'WEEK'], axis=1).columns.values
    measles_states = [state.title() for state in measles_states]

    # Weekly -> yearly totals, then a states x years matrix
    measles_data.drop(['WEEK'], axis=1, inplace=True)
    measles_data = measles_data.groupby('YEAR').sum()
    measles_data = measles_data.transpose().values
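The measles preparation repeats the polio steps line for line. One way to avoid the duplication is a single loader. The sketch below is a hypothetical helper (`load_tycho_matrix` is not part of pandas or the original article). It takes the year list from the grouped index rather than from `unique()`, which guarantees the labels line up with the matrix columns even if the file is not sorted by year.

```python
import io

import pandas as pd


def load_tycho_matrix(path_or_buffer):
    """Hypothetical helper: read a Tycho incidence CSV and return
    (years, state names, states x years matrix), mirroring the steps above."""
    df = pd.read_csv(path_or_buffer, skiprows=2, na_values='-')
    states = [name.title() for name in df.drop(columns=['YEAR', 'WEEK']).columns]
    yearly = df.drop(columns=['WEEK']).groupby('YEAR').sum()
    years = list(yearly.index)  # sorted, and matches the matrix columns
    return years, states, yearly.transpose().values


# Usage with an invented in-memory file:
sample = io.StringIO(
    "META LINE 1\nMETA LINE 2\n"
    "YEAR,WEEK,NEW YORK,OHIO\n"
    "1962,1,5,-\n"
    "1962,2,5,1\n"
    "1963,1,2,2\n"
)
years, states, matrix = load_tycho_matrix(sample)
print(years, states)
print(matrix)

# With the real files this would be, for example:
# measles_years, measles_states, measles_data = load_tycho_matrix(
#     'data/MEASLES_Incidence_1928-2003_20160304120254.csv')
```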
    plt.figure(figsize=(12, 12))

    # Same heatmap layout as the polio chart, with a label every 10 years
    sb.heatmap(measles_data, cmap='Reds', robust=True,
               xticklabels=[year if year % 10 == 0 else '' for year in measles_years],
               yticklabels=measles_states)

    # Vertical line at 1963, the year the vaccine was introduced; the column index
    # comes from measles_years, the list that matches this heatmap's columns
    plt.plot([measles_years.index(1963) + 0.425, measles_years.index(1963) + 0.425],
             [0, 51],
             color='black', lw=1.5)

    # Relabel the colorbar; assumes it shows exactly these six ticks
    cax = plt.gcf().axes[-1]
    cax.set_yticklabels([0, 200, 400, 600, 800, '1,000+'])
    cax.tick_params(labelsize=12)

    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)

    # Title, vaccine annotation and credit line (hand-tuned data coordinates)
    plt.text(0, 51.5, 'Measles cases in the United States', fontsize=14)
    plt.text(measles_years.index(1963) + 0.05, 51.5, 'Vaccine introduced', fontsize=12, weight='bold')
    plt.text(-14.5, -3, 'Data source: Project Tycho (tycho.pitt.edu) '
             '| Author: Randy Olson (randalolson.com / @randal_olson)',
             fontsize=10)

    plt.savefig('measles-cases-heatmap-sequential-colormap.png', bbox_inches='tight')
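Both charts pass `robust=True` to `sb.heatmap`. When `vmin`/`vmax` are not given, seaborn then sets the colour range from the 2nd and 98th percentiles of the data (ignoring missing values) instead of the minimum and maximum. The hypothetical helper below reproduces that calculation with NumPy on toy data. Values above the upper limit are drawn in the darkest colour, which is presumably why the top colorbar labels read "50+" and "1,000+".

```python
import numpy as np


def robust_color_limits(matrix, low=2, high=98):
    """Hypothetical helper approximating seaborn's robust=True colour limits:
    the low/high percentiles of all values, with NaN ignored."""
    return np.nanpercentile(matrix, low), np.nanpercentile(matrix, high)


# Toy data: 0..99, one extreme outlier (10000) and one missing value.
values = np.append(np.arange(100.0), [10000.0, np.nan]).reshape(6, 17)
vmin, vmax = robust_color_limits(values)
print(vmin, vmax)  # the single outlier does not stretch the upper limit
```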