Original poster: 时光人

The 25 Most Useful Matplotlib Plots for Data Analysis [Share]


时光人 (verified student), posted 2019-12-3 09:58:26




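A compilation of 25 Matplotlib plots that are most useful in data analysis and visualization. Use this list to choose which visualization to show, using Python's Matplotlib and Seaborn libraries.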

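1. Correlation

  • Scatter plot
  • Bubble plot with encircling
  • Scatter plot with linear regression line of best fit
  • Jitter plot
  • Counts plot
  • Marginal histogram
  • Marginal boxplot
  • Correlogram
  • Pairwise plot

2. Deviation

  • Diverging bars
  • Diverging text
  • Diverging dot plot
  • Diverging lollipop chart with markers
  • Area chart

3. Ranking

  • Ordered bar chart
  • Lollipop chart
  • Dot plot
  • Slope chart
  • Dumbbell plot

4. Distribution

  • Histogram for a continuous variable
  • Histogram for a categorical variable
  • Density plot
  • Density curves with histogram
  • Joy plot
  • Distributed dot plot
  • Dot + box plot
  • Violin plot
  • Population pyramid
  • Categorical plots

5. Composition

  • Waffle chart
  • Pie chart
  • Treemap
  • Bar chart

6. Change

  • Time series plot
  • Time series with peaks and troughs annotated
  • Autocorrelation and partial autocorrelation plots
  • Cross-correlation plot
  • Time series decomposition plot
  • Multiple time series
  • Plotting different scales with a secondary Y axis
  • Time series with error bands
  • Stacked area chart
  • Unstacked area chart
  • Calendar heat map
  • Seasonal plot

7. Groups

  • Dendrogram
  • Cluster plot
  • Andrews curves
  • Parallel coordinates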

    # !pip install brewer2mpl
    import numpy as np
    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import seaborn as sns
    import warnings; warnings.filterwarnings(action='once')

    large = 22; med = 16; small = 12
    # 'axes.titlesize' was listed twice in the original dict; the later value (med) wins, so only it is kept
    params = {'legend.fontsize': med,
              'figure.figsize': (16, 10),
              'axes.labelsize': med,
              'axes.titlesize': med,
              'xtick.labelsize': med,
              'ytick.labelsize': med,
              'figure.titlesize': large}
    plt.rcParams.update(params)
    # On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
    plt.style.use('seaborn-whitegrid')
    sns.set_style("white")
    # Jupyter/IPython magic; omit it in a plain Python script
    %matplotlib inline

    # Version
    print(mpl.__version__)  #> 3.0.0
    print(sns.__version__)  #> 0.9.0


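Correlation

Correlation plots visualize the relationship between two or more variables, that is, how one variable changes with respect to another.


1. Scatter plot

The scatter plot is the classic, fundamental plot for studying the relationship between two variables. If the data contains several groups, you may want to draw each group in a different color. In Matplotlib this is straightforward; the code below makes one plt.scatter() call per group.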

    # Import dataset
    midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

    # Prepare Data
    # Create as many colors as there are unique midwest['category']
    categories = np.unique(midwest['category'])
    colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

    # Draw Plot for Each Category
    plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')
    for i, category in enumerate(categories):
        plt.scatter('area', 'poptotal',
                    data=midwest.loc[midwest.category==category, :],
                    s=20, c=colors[i], label=str(category))

    # Decorations
    plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
                  xlabel='Area', ylabel='Population')
    plt.xticks(fontsize=12); plt.yticks(fontsize=12)
    plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
    plt.legend(fontsize=12)
    plt.show()
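The one-scatter-call-per-group pattern can also be tried without downloading anything. The following is only a sketch on synthetic data: the DataFrame `demo` and its column names `x`, `y` and `group` are made up for illustration and are not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the midwest data (hypothetical column names)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x": rng.random(90),
    "y": rng.random(90),
    "group": np.repeat(["A", "B", "C"], 30),
})

groups = np.unique(demo["group"])
# One colour per group, taken from consecutive entries of the tab10 palette
palette = [plt.cm.tab10(i) for i in range(len(groups))]

fig, ax = plt.subplots(figsize=(8, 5))
for colour, group in zip(palette, groups):
    subset = demo[demo["group"] == group]
    # Each call adds one point collection and one legend entry
    ax.scatter(subset["x"], subset["y"], s=20, color=colour, label=str(group))

ax.set(xlabel="x", ylabel="y", title="One scatter call per group")
ax.legend()
```

Drawing each group separately is what gives every group its own legend entry; a single scatter call with a colour array would need the legend built by hand.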


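2. Bubble plot with encircling

Sometimes you want to show a group of points inside a boundary to emphasize their importance. In this example, you select the records to be encircled from the dataframe and pass them to the encircle() function defined in the code below.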

    from matplotlib import patches
    from scipy.spatial import ConvexHull
    import warnings; warnings.simplefilter('ignore')
    sns.set_style("white")

    # Step 1: Prepare Data
    midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

    # As many colors as there are unique midwest['category']
    categories = np.unique(midwest['category'])
    colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

    # Step 2: Draw Scatterplot with unique color for each category
    fig = plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')
    for i, category in enumerate(categories):
        # s='dot_size' takes the marker sizes from the dot_size column of the data
        plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :],
                    s='dot_size', c=colors[i], label=str(category), edgecolors='black', linewidths=.5)

    # Step 3: Encircling
    # https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
    def encircle(x, y, ax=None, **kw):
        """Draw the convex hull of the points (x, y) as a polygon on ax."""
        if not ax: ax = plt.gca()
        p = np.c_[x, y]
        hull = ConvexHull(p)
        poly = plt.Polygon(p[hull.vertices, :], **kw)
        ax.add_patch(poly)

    # Select data to be encircled
    midwest_encircle_data = midwest.loc[midwest.state=='IN', :]

    # Draw polygon surrounding vertices
    encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
    encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)

    # Step 4: Decorations
    plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
                  xlabel='Area', ylabel='Population')
    plt.xticks(fontsize=12); plt.yticks(fontsize=12)
    plt.title("Bubble Plot with Encircling", fontsize=22)
    plt.legend(fontsize=12)
    plt.show()
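The encircle() helper relies on scipy's ConvexHull to find which points lie on the boundary. As a small sketch of that one step, using made-up points (a unit square plus one interior point) in place of the midwest data:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of a unit square plus one interior point (illustrative data)
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(points)

# hull.vertices holds the indices of the boundary points; for 2-D input they
# come in counterclockwise order, so they can be used directly as a polygon outline
outline = points[hull.vertices]

# The interior point (index 4) is not part of the hull
print(sorted(hull.vertices.tolist()))
```

Because the vertices are already ordered around the boundary, `points[hull.vertices]` can be handed to a polygon patch without any further sorting, which is what encircle() does.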



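3. Scatter plot with linear regression line of best fit

If you want to understand how two variables change with respect to each other, the line of best fit is the way to go. The plot below shows how the line of best fit differs between groups in the data. To disable the grouping and draw a single line of best fit for the whole dataset, remove the hue="cyl" argument from the sns.lmplot() call below.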

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_select = df.loc[df.cyl.isin([4,8]), :]

    # Plot
    sns.set_style("white")
    gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                         height=7, aspect=1.6, robust=True, palette='tab10',
                         scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

    # Decorations
    gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
    plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
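To make "line of best fit" concrete, here is a minimal sketch of an ordinary least-squares fit with NumPy on made-up points. Note that this is plain least squares; the lmplot call above passes robust=True, which fits a robust regression instead, so the two are related but not identical.

```python
import numpy as np

# Made-up points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit returns the coefficients highest power first:
# (slope, intercept)
slope, intercept = np.polyfit(x, y, deg=1)

# Values of the fitted line at the observed x positions
fitted = slope * x + intercept
```

With noisy data the fitted line no longer passes through every point; it is the line that minimises the sum of squared vertical distances to the points.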


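Each regression line in its own column

Alternatively, you can show the line of best fit for each group in its own column. You do this by setting the col parameter inside sns.lmplot(), as with col="cyl" below.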

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_select = df.loc[df.cyl.isin([4,8]), :]

    # Each line in its own column
    sns.set_style("white")
    gridobj = sns.lmplot(x="displ", y="hwy",
                         data=df_select,
                         height=7,
                         robust=True,
                         palette='Set1',
                         col="cyl",
                         scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

    # Decorations
    gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
    plt.show()




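4. Jitter plot

Often, several data points have exactly the same X and Y values. As a result, the points are drawn on top of one another and hidden. To avoid this, jitter the points slightly so that you can see them all. Seaborn's stripplot(), used below, makes this convenient.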

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

    # Draw Stripplot (x and y passed by keyword; recent seaborn releases no longer accept them positionally)
    fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
    sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)

    # Decorations
    plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
    plt.show()
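The idea behind jittering can be sketched by hand: add a small random offset to each coordinate so that identical values no longer coincide. This illustrates the general technique only and is not necessarily how seaborn implements it; the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Ten observations that all share the same x value and would overplot
x = np.full(10, 3.0)

# Shift each point by a small uniform offset; 0.25 mirrors the jitter value used above
width = 0.25
x_jittered = x + rng.uniform(-width, width, size=x.size)
```

The offsets change only where a point is drawn, so the jitter width should stay small relative to the spacing between distinct values.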


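5. Counts plot

Another way to avoid overlapping points is to scale the size of each dot by the number of observations at that location. The larger the dot, the more points are concentrated there.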

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    # Number of observations for each (hwy, cty) combination
    df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')

    # Draw Stripplot (x and y passed by keyword)
    fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
    sns.stripplot(x=df_counts.cty, y=df_counts.hwy, size=df_counts.counts*2, ax=ax)

    # Decorations
    plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
    plt.show()
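The counting step is the heart of this plot. A small sketch with made-up (cty, hwy) pairs shows what groupby(...).size() produces; the values are illustrative and not taken from the mpg dataset.

```python
import pandas as pd

# Made-up (cty, hwy) pairs with repeats
pairs = pd.DataFrame({"cty": [18, 18, 18, 21, 21, 15],
                      "hwy": [26, 26, 26, 29, 29, 22]})

# size() counts the rows in each (hwy, cty) group;
# reset_index turns the group keys back into ordinary columns
counts = pairs.groupby(["hwy", "cty"]).size().reset_index(name="counts")
```

Each distinct pair now appears once, with a `counts` column that can drive the marker size.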


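6. Marginal histogram

A marginal histogram adds histograms of the X and Y variables along the axes. It shows the relationship between X and Y together with the univariate distribution of each. This plot is often used in exploratory data analysis (EDA).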

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

    # Create Fig and gridspec
    fig = plt.figure(figsize=(16, 10), dpi= 80)
    grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

    # Define the axes
    ax_main = fig.add_subplot(grid[:-1, :-1])
    ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
    ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

    # Scatterplot on main ax
    ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes,
                    alpha=.9, data=df, cmap="tab10", edgecolors='gray', linewidths=.5)

    # Histogram of displ along the bottom
    ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink')
    ax_bottom.invert_yaxis()

    # Histogram of hwy on the right
    ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')

    # Decorations
    ax_main.set(title='Scatterplot with Histograms\ndispl vs hwy', xlabel='displ', ylabel='hwy')
    ax_main.title.set_fontsize(20)
    for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
        item.set_fontsize(14)

    xlabels = ax_main.get_xticks().tolist()
    ax_main.set_xticklabels(xlabels)
    plt.show()


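7. Marginal boxplot

The marginal boxplot serves a similar purpose to the marginal histogram. However, the boxplot helps pinpoint the median and the 25th and 75th percentiles of X and Y.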

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

    # Create Fig and gridspec
    fig = plt.figure(figsize=(16, 10), dpi= 80)
    grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

    # Define the axes
    ax_main = fig.add_subplot(grid[:-1, :-1])
    ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
    ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

    # Scatterplot on main ax
    ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes,
                    alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)

    # Add a boxplot in each margin: vertical for hwy, horizontal for displ
    sns.boxplot(y=df.hwy, ax=ax_right)
    sns.boxplot(x=df.displ, ax=ax_bottom)

    # Decorations ------------------
    # Remove the axis names from the boxplots
    ax_bottom.set(xlabel='')
    ax_right.set(ylabel='')

    # Main Title, Xlabel and YLabel
    ax_main.set(title='Scatterplot with Boxplots\ndispl vs hwy', xlabel='displ', ylabel='hwy')

    # Set font size of different components
    ax_main.title.set_fontsize(20)
    for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
        item.set_fontsize(14)

    plt.show()
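The three values a boxplot's box marks (25th percentile, median, 75th percentile) can also be computed directly. A minimal sketch on made-up numbers, using NumPy's default linear interpolation:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # illustrative data

# Lower quartile, median and upper quartile in one call
q1, median, q3 = np.percentile(values, [25, 50, 75])

# Interquartile range: the height of the box in a boxplot
iqr = q3 - q1
```

Reading these numbers alongside the plot is a quick way to check that the box edges sit where you expect.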


8. Correlogram

A correlogram shows at a glance the correlation metric between every possible pair of numeric variables in a given dataframe (or 2D array).
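The post gives no code for this plot, so what follows is only a hedged sketch and not the original author's example. It draws the correlation matrix of a synthetic dataframe with plain Matplotlib; the data, the column names `a`, `b`, `c`, and the choice of the RdYlGn colormap are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: b is strongly tied to a, c is independent noise
rng = np.random.default_rng(1)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "a": base,
    "b": base * 2 + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

corr = demo.corr()  # pairwise Pearson correlations

fig, ax = plt.subplots(figsize=(5, 4))
# Fix the colour scale to [-1, 1] so that zero correlation sits mid-scale
image = ax.imshow(corr.to_numpy(), cmap="RdYlGn", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)

# Write each coefficient into its cell
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(image, ax=ax)
ax.set_title("Correlogram of synthetic data")
```

Seaborn's heatmap() offers a shorter route to the same picture if seaborn is already imported.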

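9. Pairwise plot

The pairwise plot is a favorite in exploratory analysis for understanding the relationship between all possible pairs of numeric variables. It is a must-have tool for bivariate analysis.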

    # Load Dataset
    df = sns.load_dataset('iris')

    # Plot
    plt.figure(figsize=(10,8), dpi= 80)
    sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
    plt.show()


    # Load Dataset
    df = sns.load_dataset('iris')

    # Plot
    plt.figure(figsize=(10,8), dpi= 80)
    sns.pairplot(df, kind="reg", hue="species")
    plt.show()


Deviation




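10. Diverging bars

If you want to see how items vary on a single metric and visualize the order and size of that variation, diverging bars are a great tool. They quickly differentiate the performance of groups in your data, are quite intuitive, and convey the point instantly.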

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    # Standardize mpg to a z-score and color by its sign
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    plt.figure(figsize=(14,10), dpi= 80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)

    # Decorations
    plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
    plt.yticks(df.index, df.cars, fontsize=12)
    plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.show()


11. Diverging text

Diverging text is similar to diverging bars, and is preferable when you want to show the value of each item within the chart in a neat, presentable way.

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    plt.figure(figsize=(14,14), dpi= 80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
    # Label each bar with its rounded z-score, placed on the outer side of the bar
    for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
        t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left',
                     verticalalignment='center', fontdict={'color':'red' if x < 0 else 'green', 'size':14})

    # Decorations
    plt.yticks(df.index, df.cars, fontsize=12)
    plt.title('Diverging Text Bars of Car Mileage', fontdict={'size':20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.xlim(-2.5, 2.5)
    plt.show()
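The diverging charts in this section all start from the same preparation step: standardize the metric to a z-score and pick a color from its sign. A minimal sketch of that step on three made-up mileage values:

```python
import pandas as pd

mpg = pd.Series([10.0, 20.0, 30.0], name="mpg")  # made-up values

# pandas' std() uses the sample standard deviation (ddof=1) by default
mpg_z = (mpg - mpg.mean()) / mpg.std()

# Below-average items in red, the rest in green
colors = ["red" if z < 0 else "green" for z in mpg_z]
```

A z-score of 0 marks the average item, which is why the bars, labels and dots all diverge from the vertical line at zero.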


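12. Diverging dot plot

The diverging dot plot is also similar to diverging bars. However, compared with diverging bars, the absence of bars reduces the contrast and disparity between the groups.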

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = ['red' if x < 0 else 'darkgreen' for x in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    plt.figure(figsize=(14,16), dpi= 80)
    plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
    # Write the rounded z-score inside each dot
    for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
        t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                     verticalalignment='center', fontdict={'color':'white'})

    # Decorations
    # Lighten borders
    plt.gca().spines["top"].set_alpha(.3)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(.3)
    plt.gca().spines["left"].set_alpha(.3)

    plt.yticks(df.index, df.cars)
    plt.title('Diverging Dotplot of Car Mileage', fontdict={'size':20})
    plt.xlabel('$Mileage$')
    plt.grid(linestyle='--', alpha=0.5)
    plt.xlim(-2.5, 2.5)
    plt.show()


13. Diverging lollipop chart with markers

A lollipop chart with markers provides a flexible way to visualize divergence by emphasizing any significant data points you want to draw attention to and annotating the chart with the reasoning.

    # Prepare Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df.loc[:, ['mpg']]
    df['mpg_z'] = (x - x.mean())/x.std()
    df['colors'] = 'black'

    # color fiat differently
    df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange'
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)

    # Draw plot
    import matplotlib.patches as patches

    plt.figure(figsize=(14,16), dpi= 80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1)
    plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6)
    plt.yticks(df.index, df.cars)
    plt.xticks(fontsize=12)

    # Annotate
    plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data',
                 fontsize=15, ha='center', va='center',
                 bbox=dict(boxstyle='square', fc='firebrick'),
                 arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white')

    # Add Patches
    p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red')
    p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green')
    plt.gca().add_patch(p1)
    plt.gca().add_patch(p2)

    # Decorate
    plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.show()





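14. Area Chart

By coloring the area between the axis and the line, an area chart emphasizes not only the peaks and troughs but also how long the highs and lows last. The longer a high persists, the larger the area under the line.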

import numpy as np
import pandas as pd

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
x = np.arange(df.shape[0])
# Month-over-month change in the personal savings rate, in percent
y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100

# Plot: green where the return is non-negative, red where it is non-positive
plt.figure(figsize=(16,10), dpi= 80)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7)

# Annotate
plt.annotate('Peak\n1975', xy=(94.0, 21.0), xytext=(88.0, 28),
             bbox=dict(boxstyle='square', fc='firebrick'),
             arrowprops=dict(facecolor='steelblue', shrink=0.05), fontsize=15, color='white')

# Decorations
xtickvals = [str(m)[:3].upper()+"-"+str(y) for y,m in zip(df.date.dt.year, df.date.dt.month_name())]
plt.gca().set_xticks(x[::6])
plt.gca().set_xticklabels(xtickvals[::6], rotation=90, fontdict={'horizontalalignment': 'center', 'verticalalignment': 'center_baseline'})
plt.ylim(-35,35)
plt.xlim(1,100)
plt.title("Month Economics Return %", fontsize=22)
plt.ylabel('Monthly returns %')
plt.grid(alpha=0.5)
plt.show()
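
For anyone unsure what the returns line in the area chart computes, here is a small sketch on made-up numbers. The `rate` series is hypothetical and is not the economics.csv data. It suggests that the `diff()/shift(1)` formula matches pandas' built-in `pct_change()`, and it shows the kind of boolean masks that go into `fill_between(where=...)`.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly savings-rate values (not the economics.csv data)
rate = pd.Series([10.0, 11.0, 9.9, 9.9, 12.375])

# Same formula as the listing: change divided by the previous value, in percent
returns_manual = (rate.diff() / rate.shift(1)).fillna(0) * 100

# pandas ships the same calculation as pct_change()
returns_builtin = rate.pct_change().fillna(0) * 100

# Boolean masks of the kind passed to fill_between(where=...)
gains = returns_manual >= 0
losses = returns_manual <= 0

print(returns_manual.round(2).tolist())
print(gains.tolist(), losses.tolist())
```

Months with a return of exactly zero land in both masks. That is harmless here, because a zero-height fill draws nothing.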



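Ranking

15. Ordered Bar Chart

An ordered bar chart conveys the rank order of items effectively. Adding the metric's value above each bar also lets the reader get precise numbers from the chart itself.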

import matplotlib.patches as patches

# Prepare Data: mean city mileage per manufacturer, sorted ascending
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw.groupby('manufacturer', as_index=False)['cty'].mean()
df.sort_values('cty', inplace=True)
df.reset_index(drop=True, inplace=True)

# Draw plot
fig, ax = plt.subplots(figsize=(16,10), facecolor='white', dpi= 80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20)

# Annotate Text
for i, cty in enumerate(df.cty):
    ax.text(i, cty+0.5, round(cty, 1), horizontalalignment='center')

# Title, Label, Ticks and Ylim
ax.set_title('Bar Chart for City Mileage', fontdict={'size':22})
ax.set(ylabel='Miles Per Gallon', ylim=(0, 30))
plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12)

# Add patches to color the X axis labels
p1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure)
p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure)
fig.add_artist(p1)
fig.add_artist(p2)
plt.show()
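
The ranking step shared by the charts in this section is a group-wise mean followed by a sort. Below is a minimal sketch of that step on a small invented table. The `make` and `mpg` column names are hypothetical stand-ins, not columns of mpg_ggplot2.csv.

```python
import pandas as pd

# Hypothetical data standing in for the real CSV
cars = pd.DataFrame({
    "make": ["audi", "audi", "ford", "ford", "honda", "honda"],
    "mpg":  [18, 20, 14, 16, 24, 28],
})

# Mean per group, taken on the numeric column only, then ranked ascending
ranked = (cars.groupby("make", as_index=False)["mpg"]
              .mean()
              .sort_values("mpg")
              .reset_index(drop=True))

# The fresh integer index doubles as the bar position on the x axis
for pos, row in ranked.iterrows():
    print(pos, row["make"], round(row["mpg"], 1))
```

Selecting the numeric column before calling `mean()` avoids relying on pandas to skip the text column by itself.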



16. Lollipop Chart

A lollipop chart serves a similar purpose to an ordered bar chart, in a visually pleasing way.

# Prepare Data: mean city mileage per manufacturer, sorted ascending
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw.groupby('manufacturer', as_index=False)['cty'].mean()
df.sort_values('cty', inplace=True)
df.reset_index(drop=True, inplace=True)

# Draw plot: thin stems plus a dot at the top of each
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2)
ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7)

# Title, Label, Ticks and Ylim
ax.set_title('Lollipop Chart for City Mileage', fontdict={'size':22})
ax.set_ylabel('Miles Per Gallon')
ax.set_xticks(df.index)
ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size':12})
ax.set_ylim(0, 30)

# Annotate
for row in df.itertuples():
    ax.text(row.Index, row.cty+.5, s=round(row.cty, 2), horizontalalignment= 'center', verticalalignment='bottom', fontsize=14)
plt.show()
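
As an alternative sketch, matplotlib's own `Axes.stem` draws the stems and heads in one call. The labels and means below are invented and assumed to be sorted already; this is not the approach used in the listing.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical, already-sorted group means
labels = ["ford", "audi", "honda"]
means = [15.0, 19.0, 26.0]

fig, ax = plt.subplots(figsize=(6, 4))
# stem() returns the marker line, the stem lines and the baseline
markerline, stemlines, baseline = ax.stem(range(len(means)), means)
ax.set_xticks(range(len(means)))
ax.set_xticklabels(labels)
ax.set_ylim(0, 30)

# Value labels above each head
for x, m in enumerate(means):
    ax.text(x, m + 0.5, f"{m:.1f}", ha="center", va="bottom")
fig.canvas.draw()
```

The `vlines` plus `scatter` combination in the listing gives finer control over colors and sizes, while `stem` is shorter when the defaults are good enough.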


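17. Dot Plot

A dot plot conveys the rank order of items. Because the dots are aligned along the horizontal axis, it is easier to see how far apart the points are from each other.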

# Prepare Data: mean city mileage per manufacturer, sorted ascending
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw.groupby('manufacturer', as_index=False)['cty'].mean()
df.sort_values('cty', inplace=True)
df.reset_index(drop=True, inplace=True)

# Draw plot: one guide line per manufacturer, one dot at its mean
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.hlines(y=df.index, xmin=11, xmax=26, color='gray', alpha=0.7, linewidth=1, linestyles='dashdot')
ax.scatter(y=df.index, x=df.cty, s=75, color='firebrick', alpha=0.7)

# Title, Label, Ticks and Xlim
ax.set_title('Dot Plot for City Mileage', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon')
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'})
ax.set_xlim(10, 27)
plt.show()




时光人 (verified student), posted 2019-12-3 10:01:57

18. Slope Chart

A slope chart is best suited to comparing the "before" and "after" positions of a given person or item.

import matplotlib.lines as mlines

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")

# Draw a line segment between two points
# https://stackoverflow.com/questions/36470343/how-to-draw-a-line-with-matplotlib/36479941
def newline(p1, p2):
    ax = plt.gca()
    # Red when the value falls from p1 to p2, green otherwise
    line = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='red' if p1[1]-p2[1] > 0 else 'green', marker='o', markersize=6)
    ax.add_line(line)
    return line

fig, ax = plt.subplots(1,1,figsize=(14,14), dpi= 80)

# Vertical Lines
ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7)
ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)

# Line Segments and Annotation
for p1, p2, c in zip(df['1952'], df['1957'], df['continent']):
    newline([1,p1], [3,p2])
    ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14})
    ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14})

# 'Before' and 'After' Annotations
ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700})
ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700})

# Decoration
ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22})
ax.set(xlim=(0,4), ylim=(0,14000), ylabel='Mean GDP Per Capita')
ax.set_xticks([1,3])
ax.set_xticklabels(["1952", "1957"])
plt.yticks(np.arange(500, 13000, 2000), fontsize=12)

# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.0)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.0)
plt.show()
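
The core of a slope chart is one line per item from a "before" x position to an "after" x position, colored by direction. Here is a stripped-down sketch with invented regions and values (none of them from gdppercap.csv), using plain `ax.plot` instead of a `Line2D` helper.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical before/after values per item
before = {"North": 120.0, "South": 95.0, "East": 60.0}
after = {"North": 150.0, "South": 80.0, "East": 75.0}

fig, ax = plt.subplots(figsize=(5, 6))
for name in before:
    y0, y1 = before[name], after[name]
    color = "green" if y1 >= y0 else "red"   # rising lines green, falling red
    ax.plot([1, 3], [y0, y1], color=color, marker="o")
    ax.text(0.95, y0, f"{name}, {y0:.0f}", ha="right", va="center")
    ax.text(3.05, y1, f"{name}, {y1:.0f}", ha="left", va="center")

ax.set_xlim(0, 4)
ax.set_xticks([1, 3])
ax.set_xticklabels(["Before", "After"])

# Collect the colors actually assigned, for a quick sanity check
line_colors = [line.get_color() for line in ax.get_lines()]
fig.canvas.draw()
```

Placing the two time points at x=1 and x=3 leaves room on both sides for the text labels, which is the same layout trick the listing uses.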



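19. Dumbbell Chart

A dumbbell chart conveys the "before" and "after" positions of various items along with their rank order. It is very useful if you want to visualize the effect of a particular project or initiative on different objects.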

import matplotlib.lines as mlines

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")
df.sort_values('pct_2014', inplace=True)
df.reset_index(drop=True, inplace=True)  # index now follows the sorted order

# Func to draw line segment
def newline(p1, p2, color='skyblue'):
    ax = plt.gca()
    line = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color=color)
    ax.add_line(line)
    return line

# Figure and Axes
fig, ax = plt.subplots(1,1,figsize=(14,14), facecolor='#f7f7f7', dpi= 80)

# Vertical Lines
ax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df.index, x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7)
ax.scatter(y=df.index, x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)

# Line Segments
for i, p1, p2 in zip(df.index, df['pct_2013'], df['pct_2014']):
    newline([p1, i], [p2, i])

# Decoration
ax.set_facecolor('#f7f7f7')
ax.set_title("Dumbbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22})
ax.set(xlim=(0,.25), ylim=(-1, 27), ylabel='Rank by pct_2014')
ax.set_xticks([.05, .1, .15, .20])
ax.set_xticklabels(['5%', '10%', '15%', '20%'])
plt.show()
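
Here is a minimal dumbbell sketch on invented percentages. The `city`, `pct_before` and `pct_after` names are assumptions and not columns of health.csv. It sorts first, uses the fresh index as the row position, and derives the tick labels from the tick positions so the two cannot drift apart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical before/after shares per city
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "pct_before": [0.12, 0.05, 0.18, 0.09],
    "pct_after": [0.10, 0.07, 0.11, 0.04],
})
# Sort by the "after" value; drop=True makes the index the new rank
df = df.sort_values("pct_after").reset_index(drop=True)

fig, ax = plt.subplots(figsize=(6, 4))
# One connecting bar per row, then the two end dots
ax.hlines(y=df.index, xmin=df["pct_before"], xmax=df["pct_after"], colors="skyblue")
ax.scatter(df["pct_before"], df.index, s=50, color="#0e668b", zorder=3)
ax.scatter(df["pct_after"], df.index, s=50, color="#a3c4dc", zorder=3)
ax.set_yticks(df.index)
ax.set_yticklabels(df["city"])

# Build labels from the positions instead of typing them by hand
tick_positions = [0.05, 0.10, 0.15, 0.20]
tick_labels = [f"{t:.0%}" for t in tick_positions]
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)
fig.canvas.draw()
```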



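Distribution

20. Histogram for a Continuous Variable

A histogram shows the frequency distribution of a given variable. The version below groups the frequency bars by a categorical variable, which gives better insight into both the continuous variable and the categorical variable.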

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data: one list of x values per group
x_var = 'displ'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
groups = [name for name, _ in df_agg]
vals = [grp[x_var].values.tolist() for _, grp in df_agg]

# Draw
plt.figure(figsize=(16,9), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors, label=groups)

# Decoration
plt.legend()
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 25)
plt.xticks(ticks=bins[::3], labels=[round(b,1) for b in bins[::3]])
plt.show()
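
To make the stacking concrete, here is a NumPy-only sketch of the bookkeeping behind a stacked histogram. The two groups are random samples made up for this example and are not the mpg data. Every group is counted on the same bin edges, and each layer sits on the running total of the layers below it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical groups of a continuous variable
groups = {
    "compact": rng.normal(2.3, 0.4, 40),
    "suv": rng.normal(4.5, 0.8, 60),
}

# Shared bin edges so the bars of every group line up
all_values = np.concatenate(list(groups.values()))
edges = np.histogram_bin_edges(all_values, bins=10)

# Per-group counts on those edges
counts = np.array([np.histogram(v, bins=edges)[0] for v in groups.values()])

# Top of each layer = cumulative count over the groups stacked so far
stack_tops = counts.cumsum(axis=0)

print(counts)
print(stack_tops[-1])  # total bar heights
```

Because the edges come from the pooled data, every observation falls in some bin and the final layer's heights add up to the total sample size.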





时光人 (verified student), posted 2019-12-3 10:03:17

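21. Histogram for a Categorical Variable

A histogram of a categorical variable shows that variable's frequency distribution. By coloring the bars, you can relate the distribution to a second categorical variable represented by the colors.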

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data: map each manufacturer to an integer position
x_var = 'manufacturer'
groupby_var = 'class'
categories = sorted(df[x_var].unique())
codes = {cat: i for i, cat in enumerate(categories)}
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
groups = [name for name, _ in df_agg]
vals = [grp[x_var].map(codes).tolist() for _, grp in df_agg]

# Draw: one bin per manufacturer, centred on its integer code
plt.figure(figsize=(16,9), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
bin_edges = np.arange(len(categories) + 1) - 0.5
n, bins, patches = plt.hist(vals, bins=bin_edges, stacked=True, density=False, color=colors, label=groups)

# Decoration
plt.legend()
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 40)
plt.xticks(ticks=range(len(categories)), labels=categories, rotation=90)
plt.show()
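
For categorical data, the bar heights are just a two-way count table. As a sketch on an invented six-row table (not the real mpg data), `pd.crosstab` produces that table directly, and its rows and columns are labelled, so the bar labels cannot get out of step with the counts.

```python
import pandas as pd

# Hypothetical stand-in for the manufacturer/class columns
cars = pd.DataFrame({
    "make":  ["audi", "audi", "ford", "ford", "ford", "honda"],
    "class": ["compact", "midsize", "suv", "suv", "pickup", "compact"],
})

# Rows: x-axis categories; columns: the color groups; cells: counts
table = pd.crosstab(cars["make"], cars["class"])
print(table)

# With matplotlib installed, table.plot(kind="bar", stacked=True)
# would draw one stacked bar per row of this table.
```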


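22. Density Plot

Density plots are a commonly used tool for visualizing the distribution of a continuous variable. By grouping them by the "response" variable, you can inspect the relationship between X and Y. The case below is for illustration: it shows how the distribution of city mileage varies with the number of cylinders.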

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot: one density curve per cylinder count
# On seaborn >= 0.11, pass fill=True in place of the deprecated shade=True
plt.figure(figsize=(16,10), dpi= 80)
sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], shade=True, color="g", label="Cyl=4", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], shade=True, color="deeppink", label="Cyl=5", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], shade=True, color="dodgerblue", label="Cyl=6", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], shade=True, color="orange", label="Cyl=8", alpha=.7)

# Decoration
plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)
plt.legend()
plt.show()
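
To show roughly what a density curve is, here is a hand-rolled Gaussian kernel density estimate in plain NumPy. This is a simplified sketch, not seaborn's implementation: the sample is synthetic, `gaussian_kde_1d` is a hypothetical helper, and the bandwidth is fixed by hand, whereas seaborn picks one automatically.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

rng = np.random.default_rng(1)
city_mpg = rng.normal(21, 3, 200)      # hypothetical sample
grid = np.linspace(0, 45, 451)
density = gaussian_kde_1d(city_mpg, grid, bandwidth=1.0)

# A density should integrate to about 1 (simple Riemann sum)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve, and a smaller one follows the individual samples more closely.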


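23. Density Curves with Histogram

A density curve drawn over a histogram combines the information conveyed by both plots, so you can show them in one figure instead of two.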

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot
# sns.distplot is deprecated since seaborn 0.11; histplot(..., kde=True) is the suggested replacement
plt.figure(figsize=(13,10), dpi= 80)
sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
plt.ylim(0, 0.35)

# Decoration
plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)
plt.legend()
plt.show()


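24. Joy Plot

A Joy Plot lets the density curves of different groups overlap, which is a great way to visualize the distributions of a large number of groups relative to one another. It is pleasing to the eye and conveys the right information clearly. It can be built easily with the joypy package, which is based on matplotlib.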

# !pip install joypy
import joypy

# Import Data
mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot (joyplot creates its own figure)
fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10))

# Decoration
plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)
plt.show()


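25. Distributed Dot Plot

A distributed dot plot shows the univariate distribution of points segmented by group. The darker the points, the more concentrated the data points are in that region. Coloring the median differently makes the real positioning of each group apparent at once.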

import matplotlib.patches as mpatches

# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'}
df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)

# Mean and Median city mileage by make
df = df_raw.groupby('manufacturer', as_index=False)['cty'].mean()
df.sort_values('cty', ascending=False, inplace=True)
df.reset_index(drop=True, inplace=True)
df_median = df_raw.groupby('manufacturer')[['cty']].median()

# Draw horizontal lines
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')

# Draw the Dots: every car in white, the group median in red
for i, make in enumerate(df.manufacturer):
    df_make = df_raw.loc[df_raw.manufacturer==make, :]
    ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5)
    ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')

# Annotate
ax.text(33, 13, r"$red \; dots \; are \; the \: median$", fontdict={'size':12}, color='firebrick')

# Decorations
red_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median")
plt.legend(handles=red_patch)
ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7)
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7)
ax.set_xlim(1, 40)
plt.xticks(alpha=0.7)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis='both', alpha=.4, linewidth=.1)
plt.show()






ibmandwto (verified enterprise), posted 2019-12-3 11:03:17
Thanks for the hard work, OP.

peyzf, posted 2019-12-3 23:23:45
Learning from this.

luhaoyu, posted 2019-12-4 08:21:47
Learned a lot.

笑开心, posted 2019-12-4 10:24:56
BUDDHA BLESS!!!
