数据分析最有用的25个 Matplotlib图(二)——AIU人工智能学院-经管之家官网!

人大经济论坛-经管之家 收藏本站
您当前的位置> 数据>>

数据分析

>>

数据分析最有用的25个 Matplotlib图(二)——AIU人工智能学院

数据分析最有用的25个 Matplotlib图(二)——AIU人工智能学院

发布:AIU人工智能学院 | 分类:数据分析

关于本站

人大经济论坛-经管之家:分享大学、考研、论文、会计、留学、数据、经济学、金融学、管理学、统计学、博弈论、统计年鉴、行业分析包括等相关资源。
经管之家是国内活跃的在线教育咨询平台!

经管之家新媒体交易平台

提供"微信号、微博、抖音、快手、头条、小红书、百家号、企鹅号、UC号、一点资讯"等虚拟账号交易,真正实现买卖双方的共赢。【请点击这里访问】

提供微信号、微博、抖音、快手、头条、小红书、百家号、企鹅号、UC号、一点资讯等虚拟账号交易,真正实现买卖双方的共赢。【请点击这里访问】

AIU人工智能学院:数据科学、人工智能从业者的在线学院。数据科学(Python/R/Julia)数据分析、机器学习、深度学习作者|zsx_yiyiyi来源|python大本营25个Matplotlib图的汇编,在数据分析和可视化中最有用。此列表允许 ...
扫码加入数据分析学习群


AIU人工智能学院:数据科学、人工智能从业者的在线学院。

数据科学(Python/R/Julia)数据分析、机器学习、深度学习


作者 | zsx_yiyiyi
来源 | python大本营

25个Matplotlib图的汇编,在数据分析和可视化中最有用。此列表允许您使用Python的Matplotlib和Seaborn库选择要显示的可视化对象。今天给大家分享剩余的12个,更多干货后续奉上,大家可持续关注我们。

14.面积图

通过对轴和线之间的区域进行着色,区域图不仅强调峰值和低谷,而且还强调高点和低点的持续时间。高点持续时间越长,线下面积越大。

import numpy as npimport pandas as pd# Prepare Datadf = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)x = np.arange(df.shape[0])y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100# Plotplt.figure(figsize=(16,10), dpi= 80)plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7)# Annotateplt.annotate('Peak 1975', xy=(94.0, 21.0), xytext=(88.0, 28), bbox=dict(boxcolor:rgb(62, 61, 59)">https://p1.pstatp.com/large/pgc-image/6d9e546e882e4fb0be273cd9521914cb

排序

15. 有序条形图

有序条形图有效地传达了项目的排名顺序。但是,在图表上方添加度量标准的值,用户可以从图表本身获取精确信息。

# Prepare Datadf_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())df.sort_values('cty', inplace=True)df.reset_index(inplace=True)# Draw plotimport matplotlib.patches as patchesfig, ax = plt.subplots(figsize=(16,10), facecolor='white', dpi= 80)ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20)# Annotate Textfor i, cty in enumerate(df.cty): ax.text(i, cty+0.5, round(cty, 1), horizontalalignment='center')# Title, Label, Ticks and Ylimax.set_title('Bar Chart for Highway Mileage', fontdict={'size':22})ax.set(ylabel='Miles Per Gallon', ylim=(0, 30))plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12)# Add patches to color the X axis labelsp1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure)p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure)fig.add_artist(p1)fig.add_artist(p2)plt.show()https://p1.pstatp.com/large/pgc-image/7f77a829faa940e69e3faae01b7994a8

16. 棒棒糖图

棒棒糖图表以一种视觉上令人愉悦的方式提供与有序条形图类似的目的。

# Prepare Datadf_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())df.sort_values('cty', inplace=True)df.reset_index(inplace=True)# Draw plotfig, ax = plt.subplots(figsize=(16,10), dpi= 80)ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2)ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7)# Title, Label, Ticks and Ylimax.set_title('Lollipop Chart for Highway Mileage', fontdict={'size':22})ax.set_ylabel('Miles Per Gallon')ax.set_xticks(df.index)ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size':12})ax.set_ylim(0, 30)# Annotatefor row in df.itertuples(): ax.text(row.Index, row.cty+.5, s=round(row.cty, 2), horizontalalignment= 'center', verticalalignment='bottom', fontsize=14)plt.show()https://p1.pstatp.com/large/pgc-image/184e7ea1c11b4de18cf52e31deb85466

17. 包点图

点图表传达了项目的排名顺序。由于它沿水平轴对齐,因此您可以更容易地看到点彼此之间的距离。

# Prepare Datadf_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())df.sort_values('cty', inplace=True)df.reset_index(inplace=True)# Draw plotfig, ax = plt.subplots(figsize=(16,10), dpi= 80)ax.hlines(y=df.index, xmin=11, xmax=26, color='gray', alpha=0.7, linewidth=1, linestyles='dashdot')ax.scatter(y=df.index, x=df.cty, s=75, color='firebrick', alpha=0.7)# Title, Label, Ticks and Ylimax.set_title('Dot Plot for Highway Mileage', fontdict={'size':22})ax.set_xlabel('Miles Per Gallon')ax.set_yticks(df.index)ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'})ax.set_xlim(10, 27)plt.show()https://p1.pstatp.com/large/pgc-image/35bab37856f7449f988c4939f317179a

18. 坡度图

斜率图最适合比较给定人/项目的“之前”和“之后”位置。

import matplotlib.lines as mlines# Import Datadf = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")left_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1952'])]right_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1957'])]klass = ['red' if (y1-y2) < 0 else 'green' for y1, y2 in zip(df['1952'], df['1957'])]# draw line# https://stackoverflow.com/questi ... o-draw-a-line-with-matplotlib/36479941def newline(p1, p2, color='black'): ax = plt.gca() l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='red' if p1[1]-p2[1] > 0 else 'green', marker='o', markersize=6) ax.add_line(l) return lfig, ax = plt.subplots(1,1,figsize=(14,14), dpi= 80)# Vertical Linesax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')# Pointsax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7)ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)# Line Segmentsand Annotationfor p1, p2, c in zip(df['1952'], df['1957'], df['continent']): newline([1,p1], [3,p2]) ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14}) ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14})# 'Before' and 'After' Annotationsax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700})ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700})# Decorationax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22})ax.set(xlim=(0,4), ylim=(0,14000), ylabel='Mean GDP Per Capita')ax.set_xticks([1,3])ax.set_xticklabels(["1952", "1957"])plt.yticks(np.arange(500, 13000, 2000), fontsize=12)# Lighten bordersplt.gca().spines["top"].set_alpha(.0)plt.gca().spines["bottom"].set_alpha(.0)plt.gca().spines["right"].set_alpha(.0)plt.gca().spines["left"].set_alpha(.0)plt.show()https://p9.pstatp.com/large/pgc-image/2fc680bc47ae4d6e93ecb92a92bf069e

19. 哑铃图

哑铃图传达各种项目的“前”和“后”位置以及项目的排序。如果您想要将特定项目/计划对不同对象的影响可视化,那么它非常有用。

import matplotlib.lines as mlines# Import Datadf = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")df.sort_values('pct_2014', inplace=True)df.reset_index(inplace=True)# Func to draw line segmentdef newline(p1, p2, color='black'): ax = plt.gca() l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='skyblue') ax.add_line(l) return l# Figure and Axesfig, ax = plt.subplots(1,1,figsize=(14,14), facecolor='#f7f7f7', dpi= 80)# Vertical Linesax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')# Pointsax.scatter(y=df['index'], x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7)ax.scatter(y=df['index'], x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)# Line Segmentsfor i, p1, p2 in zip(df['index'], df['pct_2013'], df['pct_2014']): newline([p1, i], [p2, i])# Decorationax.set_facecolor('#f7f7f7')ax.set_title("Dumbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22})ax.set(xlim=(0,.25), ylim=(-1, 27), ylabel='Mean GDP Per Capita')ax.set_xticks([.05, .1, .15, .20])ax.set_xticklabels(['5%', '15%', '20%', '25%'])ax.set_xticklabels(['5%', '15%', '20%', '25%']) plt.show()https://p1.pstatp.com/large/pgc-image/429d66cf31704177b88849626a6e6c51

分配

20. 连续变量的直方图

直方图显示给定变量的频率分布。下面的表示基于分类变量对频率条进行分组,从而更好地了解连续变量和串联变量。

# Import Datadf = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare datax_var = 'displ'groupby_var = 'class'df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)vals = [df[x_var].values.tolist() for i, df in df_agg]# Drawplt.figure(figsize=(16,9), dpi= 80)colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors[:len(vals)])# Decorationplt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)plt.xlabel(x_var)plt.ylabel("Frequency")plt.ylim(0, 25)plt.xticks(ticks=bins[::3], labels=[round(b,1) for b in bins[::3]])plt.show()https://p1.pstatp.com/large/pgc-image/95679a5303a748ca88cb123161dcf70c

21. 类型变量的直方图

分类变量的直方图显示该变量的频率分布。通过对条形图进行着色,您可以将分布与表示颜色的另一个分类变量相关联。

# Import Datadf = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare datax_var = 'manufacturer'groupby_var = 'class'df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)vals = [df[x_var].values.tolist() for i, df in df_agg]# Drawplt.figure(figsize=(16,9), dpi= 80)colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]n, bins, patches = plt.hist(vals, df[x_var].unique().__len__(), stacked=True, density=False, color=colors[:len(vals)])# Decorationplt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)plt.xlabel(x_var)plt.ylabel("Frequency")plt.ylim(0, 40)plt.xticks(ticks=bins, labels=np.unique(df[x_var]).tolist(), rotation=90, horizontalalignment='left')plt.show()https://p1.pstatp.com/large/pgc-image/6e6eaea64c4f4e1381031ef158bcff2c

22. 密度图

密度图是一种常用工具,可视化连续变量的分布。通过“响应”变量对它们进行分组,您可以检查X和Y之间的关系。以下情况,如果出于代表性目的来描述城市里程的分布如何随着汽缸数的变化而变化。

# Import Datadf = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plotplt.figure(figsize=(16,10), dpi= 80)sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], shade=True, color="g", label="Cyl=4", alpha=.7)sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], shade=True, color="deeppink", label="Cyl=5", alpha=.7)sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], shade=True, color="dodgerblue", label="Cyl=6", alpha=.7)sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], shade=True, color="orange", label="Cyl=8", alpha=.7)# Decorationplt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)plt.legend()https://p3.pstatp.com/large/pgc-image/c3820bc7c1a048d3a32e47f3cafc1181

23. 直方密度线图

带有直方图的密度曲线将两个图表传达的集体信息汇集在一起,这样您就可以将它们放在一个图形而不是两个图形中。

# Import Datadf = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plotplt.figure(figsize=(13,10), dpi= 80)sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})plt.ylim(0, 0.35)# Decorationplt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)plt.legend()plt.show()https://p1.pstatp.com/large/pgc-image/86e27195f6a64a489f98854d7ab5fe7e

24. Joy Plot

Joy Plot允许不同组的密度曲线重叠,这是一种可视化相对于彼此的大量组的分布的好方法。它看起来很悦目,并清楚地传达了正确的信息。它可以使用joypy基于的包来轻松构建matplotlib。

# !pip install joypy# Import Datampg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plotplt.figure(figsize=(16,10), dpi= 80)fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10))# Decorationplt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)plt.show()https://p1.pstatp.com/large/pgc-image/956684a19cd94494a6585089469ad1e1

25. 分布式点图

分布点图显示按组分割的点的单变量分布。点数越暗,该区域的数据点集中度越高。通过对中位数进行不同着色,组的真实定位立即变得明显。

import matplotlib.patches as mpatches# Prepare Datadf_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'}df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)# Mean and Median city mileage by makedf = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())df.sort_values('cty', ascending=False, inplace=True)df.reset_index(inplace=True)df_median = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.median())# Draw horizontal linesfig, ax = plt.subplots(figsize=(16,10), dpi= 80)ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')# Draw the Dotsfor i, make in enumerate(df.manufacturer): df_make = df_raw.loc[df_raw.manufacturer==make, :] ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5) ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')# Annotate ax.text(33, 13, "$red ; dots ; are ; the : median$", fontdict={'size':12}, color='firebrick')# Decorationsred_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median")plt.legend(handles=red_patch)ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22})ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7)ax.set_yticks(df.index)ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7)ax.set_xlim(1, 40)plt.xticks(alpha=0.7)plt.gca().spines["top"].set_visible(False) plt.gca().spines["bottom"].set_visible(False) plt.gca().spines["right"].set_visible(False) plt.gca().spines["left"].set_visible(False) plt.grid(axis='both', alpha=.4, linewidth=.1)plt.show()https://p9.pstatp.com/large/pgc-image/bad90b16e9b24a3182f4f26e9c1d8953


关注“AIU人工智能实验室”,回复“录播”获取更多人工智能精选直播视频!


完 谢谢观看
「经管之家」APP:经管人学习、答疑、交友,就上经管之家!
免流量费下载资料----在经管之家app可以下载论坛上的所有资源,并且不额外收取下载高峰期的论坛币。
涵盖所有经管领域的优秀内容----覆盖经济、管理、金融投资、计量统计、数据分析、国贸、财会等专业的学习宝库,各类资料应有尽有。
来自五湖四海的经管达人----已经有上千万的经管人来到这里,你可以找到任何学科方向、有共同话题的朋友。
经管之家(原人大经济论坛),跨越高校的围墙,带你走进经管知识的新世界。
扫描下方二维码下载并注册APP
本文关键词:

本文论坛网址:https://bbs.pinggu.org/thread-8363263-1-1.html

人气文章

1.凡人大经济论坛-经管之家转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。