Applying multiple functions at once
With a grouped Series you can also pass a list or dict of functions to aggregate with, which outputs a DataFrame:
In [72]: grouped = df.groupby('A')
Note: the "grouped Series" here means grouped['C'], as the types show:
print(type(grouped), type(grouped["C"]))
<class 'pandas.core.groupby.generic.DataFrameGroupBy'> <class 'pandas.core.groupby.generic.SeriesGroupBy'>
Note: "passing a list or dict of functions" refers to a call such as .agg([np.sum, np.mean, np.std]).
In [73]: grouped['C'].agg([np.sum, np.mean, np.std])
Out[73]:
          sum      mean       std
A
bar  0.392940  0.130980  0.181231
foo -1.796421 -0.359284  0.912265
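The section never shows how df is built, so here is a minimal self-contained sketch of the same call. The DataFrame is made-up illustration data, not the author's, and the functions are passed by string name ('sum', 'mean', 'std') instead of the NumPy callables used above.

```python
import pandas as pd

# Hypothetical data: the original df is not shown in this section.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "D": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

grouped = df.groupby("A")

# A list of functions applied to a SeriesGroupBy yields a DataFrame
# with one column per function and one row per group.
result = grouped["C"].agg(["sum", "mean", "std"])

print(type(result).__name__)
print(result)
```

Each group key becomes a row label and each function name becomes a column label, which is why the result is a DataFrame even though the input was a single column.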
On a grouped DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index:
In [74]: grouped.agg([np.sum, np.mean, np.std])
Out[74]:
            C                             D
          sum      mean       std       sum      mean       std
A
bar  0.392940  0.130980  0.181231  1.732707  0.577569  1.366330
foo -1.796421 -0.359284  0.912265  2.824590  0.564918  0.884785
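Note: the hierarchical index here is the two-level column header, with C and D on the top level and the function names beneath them.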
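To make the two-level column header concrete, the sketch below aggregates a small made-up DataFrame (an assumption, since the original df is not shown) and then selects from the result. It only illustrates how the labels are laid out.

```python
import pandas as pd

# Hypothetical data: the original df is not shown in this section.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "D": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

result = df.groupby("A").agg(["sum", "mean"])

# The columns form a two-level MultiIndex of (source column, function name).
print(result.columns.tolist())

# A tuple selects a single aggregated column...
c_mean = result[("C", "mean")]
print(c_mean)

# ...while a top-level label selects every aggregate of that column.
c_all = result["C"]
print(c_all)
```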
The resulting aggregations are named for the functions themselves. If you need to rename, then you can add in a chained operation for a Series like this:
In [75]: (grouped['C'].agg([np.sum, np.mean, np.std])
   ....:     .rename(columns={'sum': 'foo',
   ....:                      'mean': 'bar',
   ....:                      'std': 'baz'}))
   ....:
Out[75]:
          foo       bar       baz
A
bar  0.392940  0.130980  0.181231
foo -1.796421 -0.359284  0.912265
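A handy technique: the output labels become much more explicit. Note that .rename here is a method call chained onto the result of grouped['C'].agg([np.sum, np.mean, np.std]), which is a DataFrame rather than a Series.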
For a grouped DataFrame, you can rename in a similar manner:
Note: the same chained .rename works when the aggregated object comes from a grouped DataFrame.
In [76]: (grouped.agg([np.sum, np.mean, np.std])
   ....:     .rename(columns={'sum': 'foo',
   ....:                      'mean': 'bar',
   ....:                      'std': 'baz'}))
   ....:
Out[76]:
            C                             D
          foo       bar       baz       foo       bar       baz
A
bar  0.392940  0.130980  0.181231  1.732707  0.577569  1.366330
foo -1.796421 -0.359284  0.912265  2.824590  0.564918  0.884785
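With hierarchical columns it can be useful to say which level the mapping applies to. The sketch below uses the level parameter of DataFrame.rename to restrict the rename to the function-name level. The data and the new labels ('total', 'average') are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the original df is not shown in this section.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "D": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

renamed = (
    df.groupby("A")
      .agg(["sum", "mean"])
      # level=1 limits the mapping to the inner (function name) level,
      # so a top-level column that happened to be called "sum" is untouched.
      .rename(columns={"sum": "total", "mean": "average"}, level=1)
)

print(renamed)
```

Leaving out level, as in the example above, also works here because none of the top-level labels match a key in the mapping.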
Note
In general, the output column names should be unique. You can’t apply the same function (or two functions with the same name) to the same column.
In [77]: grouped['C'].agg(['sum', 'sum'])
Out[77]:
          sum       sum
A
bar  0.392940  0.392940
foo -1.796421 -1.796421
Pandas does allow you to provide multiple lambdas. In this case, pandas will mangle the name of the (nameless) lambda functions, appending _<i> to each subsequent lambda.
In [78]: grouped['C'].agg([lambda x: x.max() - x.min(),
   ....:                   lambda x: x.median() - x.mean()])
   ....:
Out[78]:
     <lambda_0>  <lambda_1>
A
bar    0.331279    0.084917
foo    2.337259   -0.215962
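The mangled names are ordinary column labels, so the chained .rename shown earlier can give them readable names. A minimal sketch, assuming a pandas version that produces the <lambda_0>, <lambda_1> labels shown above. The data and the replacement names ('range', 'median_minus_mean') are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the original df is not shown in this section.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0, 8.0, 9.0],
})

result = df.groupby("A")["C"].agg([
    lambda x: x.max() - x.min(),      # spread of each group
    lambda x: x.median() - x.mean(),  # gap between median and mean
])

# Each anonymous function gets a numbered placeholder name.
print(result.columns.tolist())

# The placeholders can be replaced like any other column label.
readable = result.rename(columns={"<lambda_0>": "range",
                                  "<lambda_1>": "median_minus_mean"})
print(readable)
```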