
[Homework] Groupby: Pandas User Guide, original text with translation, part 6

sunrun19840306 posted on 2020-5-21 13:33:34


Aggregation

Once the GroupBy object has been created, several methods are available to perform a computation on the grouped data. These operations are similar to the aggregating API, window functions API, and resample API.

An obvious one is aggregation via the aggregate() or equivalently agg() method:

In [62]: grouped = df.groupby('A')

Group the rows by column A.

In [63]: grouped.aggregate(np.sum)

Sum each remaining column within each group of A.

Out[63]:
            C         D
bar  0.392940  1.732707
foo -1.796421  2.824590

In [64]: grouped = df.groupby(['A', 'B'])

In [65]: grouped.aggregate(np.sum)

Out[65]:
                  C         D
A   B
bar one    0.254161  1.511763
    three  0.215897 -0.990582
    two   -0.077118  1.211526
foo one   -0.983776  1.614581
    three -0.862495  0.024580
    two    0.049851  1.185429
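The df used in the post comes from an earlier part and is not shown here, so here is a minimal self-contained sketch of the same two calls. The data is made up for illustration. It passes the string "sum" instead of np.sum because, as far as I know, recent pandas releases warn when a NumPy function is passed to agg(); column B is selected out in the single-key case because it holds strings.

```python
import pandas as pd

# Hypothetical data, not the frame used in the post.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "B": ["one", "one", "two", "two", "one"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0],
        "D": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# Single key: the result is indexed by the distinct values of A.
by_a = df.groupby("A")[["C", "D"]].agg("sum")

# Two keys: the result carries a MultiIndex with levels A and B.
by_ab = df.groupby(["A", "B"]).agg("sum")

print(by_a)
print(by_ab)
```

aggregate() and agg() are the same method, so either spelling works here.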

As you can see, the result of the aggregation will have the group names as the new index along the grouped axis. In the case of multiple keys, the result is a MultiIndex by default, though this can be changed by using the as_index option:

In [66]: grouped = df.groupby(['A', 'B'], as_index=False)

In [67]: grouped.aggregate(np.sum)

Out[67]:
     A      B         C         D
0  bar    one  0.254161  1.511763
1  bar  three  0.215897 -0.990582
2  bar    two -0.077118  1.211526
3  foo    one -0.983776  1.614581
4  foo  three -0.862495  0.024580
5  foo    two  0.049851  1.185429

In [68]: df.groupby('A', as_index=False).sum()

Out[68]:
     A         C         D
0  bar  0.392940  1.732707
1  foo -1.796421  2.824590
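To see what as_index=False changes, here is a small sketch on made-up data (the frame and its values are my own assumption, not the post's df). The group keys come back as ordinary columns and the rows get a plain integer index.

```python
import pandas as pd

# Hypothetical data, not the frame used in the post.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "B": ["one", "one", "two", "two", "one"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0],
        "D": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# With as_index=False the keys A and B stay as regular columns.
flat = df.groupby(["A", "B"], as_index=False)[["C", "D"]].sum()

print(flat.columns.tolist())
print(flat)
```

This shape is usually handier when the result is going to be merged or exported, since no index levels need to be unpacked afterwards.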

Note that you could use the reset_index DataFrame function to achieve the same result, as the column names are stored in the resulting MultiIndex:

In [69]: df.groupby(['A', 'B']).sum().reset_index()

That is, reset_index() takes the place of as_index=False.

Out[69]:
     A      B         C         D
0  bar    one  0.254161  1.511763
1  bar  three  0.215897 -0.990582
2  bar    two -0.077118  1.211526
3  foo    one -0.983776  1.614581
4  foo  three -0.862495  0.024580
5  foo    two  0.049851  1.185429
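A quick way to convince yourself that the two routes agree is to build both results and compare them. This is only a sketch on made-up data (the frame is my own assumption), not a statement about every pandas version or every input.

```python
import pandas as pd

# Hypothetical data, not the frame used in the post.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "B": ["one", "one", "two", "two", "one"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0],
        "D": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# Route 1: ask groupby not to build an index from the keys.
via_option = df.groupby(["A", "B"], as_index=False).sum()

# Route 2: build the MultiIndex, then move its levels back into columns.
via_reset = df.groupby(["A", "B"]).sum().reset_index()

# DataFrame.equals compares shape, column labels, and values.
print(via_option.equals(via_reset))
```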

Another simple aggregation example is to compute the size of each group. This is included in GroupBy as the size method. It returns a Series whose index is the group names and whose values are the sizes of each group.

In [70]: grouped.size()

Out[70]:
A    B
bar  one      1
     three    1
     two      1
foo  one      2
     three    1
     two      2
dtype: int64
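Here is a minimal sketch of size() on made-up data (the frame is an assumption for illustration). Each value is simply the number of rows that fell into that group, so the sizes add up to the length of the frame.

```python
import pandas as pd

# Hypothetical data, not the frame used in the post.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "B": ["one", "one", "two", "two", "one"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# size() returns a Series indexed by the group keys.
sizes = df.groupby(["A", "B"]).size()

print(sizes)
print(sizes.sum() == len(df))
```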

In [71]: grouped.describe()
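The output of this call is not reproduced in the post. As a rough illustration of what describe() gives back, here is a sketch on made-up data (my own assumption, not the post's df), restricted to one numeric column to keep the result narrow.

```python
import pandas as pd

# Hypothetical data, not the frame used in the post.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# One row per group, one column per summary statistic.
summary = df.groupby("A")["C"].describe()

print(summary.columns.tolist())
print(summary)
```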

Note

Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

Function    Description
mean()      Compute mean of groups
sum()       Compute sum of group values
size()      Compute group sizes
count()     Compute count of group
std()       Standard deviation of groups
var()       Compute variance of groups
sem()       Standard error of the mean of groups
describe()  Generates descriptive statistics
first()     Compute first of group values
last()      Compute last of group values
nth()       Take nth value, or a subset if n is a list
min()       Compute min of group values
max()       Compute max of group values
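Several of the functions in the table can be requested in one go by passing their names as a list to agg(). The sketch below uses made-up data (an assumption for illustration) and leaves out nth(), size(), and describe(), which do not fit the one-scalar-per-group pattern as neatly.

```python
import pandas as pd

# Hypothetical data, not the frame used in the post.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Each name in the list becomes one column of the result.
stats = df.groupby("A")["C"].agg(
    ["mean", "sum", "count", "std", "var", "sem", "min", "max", "first", "last"]
)

print(stats)
```

Each of these also exists as a method, so df.groupby("A")["C"].mean() gives the same numbers as the "mean" column.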

The aggregating functions above will exclude NA values. Any function which reduces a Series to a scalar value is an aggregation function and will work; a trivial example is df.groupby('A').agg(lambda ser: 1). Note that nth() can act as a reducer or a filter.
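A small sketch of both points, on made-up data with one missing value (the frame and the "spread" function are my own assumptions). sum() and count() skip the NaN, while size() counts rows and therefore still includes it. The lambda shows a custom Series-to-scalar reducer.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in group "foo".
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar"],
        "C": [1.0, np.nan, 3.0, 5.0],
    }
)
g = df.groupby("A")["C"]

totals = g.sum()    # NaN is skipped when summing
counts = g.count()  # number of non-NA values per group
sizes = g.size()    # number of rows per group, NA rows included

# Any function mapping a Series to a scalar can be passed to agg().
spread = g.agg(lambda ser: ser.max() - ser.min())

print(totals, counts, sizes, spread, sep="\n\n")
```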
