Original poster: sunrun19840306


[Homework] Groupby - Pandas User Guide: translation of the original text, part 3


sunrun19840306 posted on 2020-5-14 16:04:29


GroupBy with MultiIndex

With hierarchically-indexed data, it’s quite natural to group by one of the levels of the hierarchy.

Let’s create a Series with a two-level MultiIndex.

In [35]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ....:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
   ....:

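This builds arrays, a list of two label lists that is used below to construct the index.

In [36]: index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

This creates the MultiIndex from the arrays defined above; its two levels are named first and second.

In [37]: s = pd.Series(np.random.randn(8), index=index)

This creates a Series named s that holds 8 random numbers and uses the index defined above.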

In [38]: s
Out[38]:
first  second
bar    one      -0.919854
       two      -0.042379
baz    one       1.247642
       two      -0.009920
foo    one       0.290213
       two       0.495767
qux    one       0.362949
       two       1.548106
dtype: float64

The data type is floating point (float64).
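The construction steps above can be put together into one self-contained script. This is a minimal sketch: the seeded generator `rng` is an assumption added here so that the values are reproducible, whereas the post uses `np.random.randn`, which gives different numbers on every run.

```python
import numpy as np
import pandas as pd

# Same labels as in the post: the first list is the outer level,
# the second list is the inner level.
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

# Assumption: a seeded generator replaces np.random.randn for reproducibility.
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(8), index=index)

print(s)
print(s.index.nlevels)      # number of index levels
print(list(s.index.names))  # names of the levels
```

The two label lists must have the same length as the data, because each position contributes one (first, second) pair to the index.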

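We can then group by one of the levels in s.

In [39]: grouped = s.groupby(level=0)

Because the index has two levels, the first level is level=0 and the second is level=1. Grouping on the first level splits the data by bar, baz, and so on; grouping on the second level splits it by one and two.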

In [40]: grouped.sum()
Out[40]:
first
bar   -0.962232
baz    1.237723
foo    0.785980
qux    1.911055
dtype: float64
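To make the effect of the level number easy to check by hand, here is a small sketch with a reduced index. The integer values are an assumption chosen for illustration; they stand in for the random numbers in the post.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]],
    names=["first", "second"],
)
# Assumption: small integers replace the random values so the sums are obvious.
s = pd.Series([1, 2, 10, 20], index=index)

by_first = s.groupby(level=0).sum()   # groups: bar, baz
by_second = s.groupby(level=1).sum()  # groups: one, two

print(by_first)   # bar = 1 + 2, baz = 10 + 20
print(by_second)  # one = 1 + 10, two = 2 + 20
```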

If the MultiIndex has names specified, these can be passed instead of the level number:

In [41]: s.groupby(level='second').sum()
Out[41]:
second
one    0.980950
two    1.991575
dtype: float64

In other words, level=0 can be written as level='first', and level=1 as level='second'.
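The equivalence between a level number and a level name can be checked directly. The sketch below again uses small integer values, which are an assumption for illustration only.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]],
    names=["first", "second"],
)
s = pd.Series([1, 2, 10, 20], index=index)

by_number = s.groupby(level=1).sum()
by_name = s.groupby(level="second").sum()

# The result index takes its name from the level that was grouped on.
print(by_name.index.name)
# Both spellings produce the same Series.
print(by_number.equals(by_name))
```

Using the name is usually more robust than the number, since it keeps working if the order of the levels changes.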

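The aggregation functions such as sum will take the level parameter directly. Additionally, the resulting index will be named according to the chosen level:

This describes the pandas version the original guide was written for. The level argument of sum was deprecated in pandas 1.3 and removed in pandas 2.0; on those versions s.groupby(level='second').sum() gives the same result.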

In [42]: s.sum(level='second')
Out[42]:
second
one    0.980950
two    1.991575
dtype: float64
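On pandas 2.0 and later, `s.sum(level='second')` is no longer available, so the same aggregation has to go through `groupby`. The sketch below rebuilds s from the rounded values printed in Out[38] (so the last digit may differ slightly from the output above) and computes the per-level sums that way.

```python
import pandas as pd

# Values copied from Out[38] in the post (rounded to 6 decimals there).
index = pd.MultiIndex.from_arrays(
    [
        ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
        ["one", "two", "one", "two", "one", "two", "one", "two"],
    ],
    names=["first", "second"],
)
s = pd.Series(
    [-0.919854, -0.042379, 1.247642, -0.009920,
     0.290213, 0.495767, 0.362949, 1.548106],
    index=index,
)

# groupby(level=...).sum() does not rely on the level argument of Series.sum.
per_second = s.groupby(level="second").sum()
print(per_second.round(5))
```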

Grouping with multiple levels is supported.

Here s has been regenerated with a third index level added.

In [43]: s
Out[43]:
first  second  third
bar    doo     one     -1.131345
               two     -0.089329
baz    bee     one      0.337863
               two     -0.945867
foo    bop     one     -0.932132
               two      1.956030
qux    bop     one      0.017587
               two     -0.016692
dtype: float64

In [44]: s.groupby(level=['first', 'second']).sum()
Out[44]:
first  second
bar    doo      -1.220674
baz    bee      -0.608004
foo    bop       1.023898
qux    bop       0.000895
dtype: float64

This sums the values within each combination of the index levels named first and second.
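The three-level example can be reproduced from the values printed in Out[43]. This is a sketch under the assumption that `MultiIndex.from_tuples` is an acceptable way to rebuild the index, since the post does not show how the three-level s was created.

```python
import pandas as pd

# Index labels and values copied from Out[43] in the post.
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "doo", "one"), ("bar", "doo", "two"),
        ("baz", "bee", "one"), ("baz", "bee", "two"),
        ("foo", "bop", "one"), ("foo", "bop", "two"),
        ("qux", "bop", "one"), ("qux", "bop", "two"),
    ],
    names=["first", "second", "third"],
)
s = pd.Series(
    [-1.131345, -0.089329, 0.337863, -0.945867,
     -0.932132, 1.956030, 0.017587, -0.016692],
    index=index,
)

# Group on two of the three levels; the remaining level ("third") is summed away.
result = s.groupby(level=["first", "second"]).sum()
print(result.round(6))
```

The result keeps a two-level index made of the levels that were grouped on, in the order they were listed.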

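Index level names may be supplied as keys.

The difference from the previous example is that there the grouping was specified through the level argument, whereas here the level names are passed directly to groupby as grouping keys.

The poster's own display of s:

Out[19]:
first  second  third
bar    doo     one     -0.194029
               two     -0.325353
baz    bee     one      0.107807
               two      1.445677
foo    bop     one      2.356253
               two      0.585753
qux    bop     one     -1.327697
               two      0.494866
dtype: float64

The three-level s was regenerated for this run, so the random numbers differ from Out[43]; this does not affect understanding. The Out[45] result that follows is the one from the original guide and corresponds to the s shown in Out[43].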

In [45]: s.groupby(['first', 'second']).sum()
Out[45]:
first  second
bar    doo      -1.220674
baz    bee      -0.608004
foo    bop       1.023898
qux    bop       0.000895
dtype: float64
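That passing level names as keys gives the same result as passing them through `level` can be verified with a short sketch. The index built with `MultiIndex.from_product` and the integer values are assumptions for illustration; they are not the data from the post.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["doo", "bee"], ["one", "two"]],
    names=["first", "second", "third"],
)
# Assumption: consecutive integers stand in for the random values.
s = pd.Series(range(8), index=index)

by_level = s.groupby(level=["first", "second"]).sum()
by_key = s.groupby(["first", "second"]).sum()

print(by_key)
print(by_level.equals(by_key))
```

A string key is resolved against the index level names here because a Series has no columns for it to match.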
