For more posts on finance and Python, search for 大白学财经 among WeChat official accounts.
Named aggregation
New in version 0.25.0.
To support column-specific aggregation with control over the output column names, pandas accepts a special syntax in GroupBy.agg(), known as "named aggregation", where
· The keywords are the output column names.
· The values are tuples whose first element is the column to select and whose second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
In [79]: animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
   ....:                         'height': [9.1, 6.0, 9.5, 34.0],
   ....:                         'weight': [7.9, 7.5, 9.9, 198.0]})
This builds a DataFrame from a dict of lists, which is a very common way to construct one.
In [80]: animals  # display the DataFrame
In [81]: animals.groupby("kind").agg(
....: min_height=pd.NamedAgg(column='height', aggfunc='min'),
....: max_height=pd.NamedAgg(column='height', aggfunc='max'),
....: average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
....: )
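This groups by kind and calls GroupBy.agg with pandas.NamedAgg:
the first output column is named min_height and holds the minimum of the height column,
the second is named max_height and holds the maximum of the height column,
the third is named average_weight and holds the mean of the weight column; aggfunc is the function that gets applied.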
pandas.NamedAgg is just a namedtuple. Plain tuples are allowed as well, as the next example shows.
In [82]: animals.groupby("kind").agg(
....: min_height=('height', 'min'),
....: max_height=('height', 'max'),
....: average_weight=('weight', np.mean),
....: )
....:
Out[82]:
      min_height  max_height  average_weight
kind
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75
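As a quick check of the claim that pandas.NamedAgg and plain tuples are interchangeable, the sketch below runs the same spec both ways and compares the results. It rebuilds the animals frame from the example above; the variable names with_namedagg, with_tuples and spec are only illustrative, and the string alias 'mean' is used in place of np.mean.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

grouped = animals.groupby("kind")

# Spec written with pd.NamedAgg, using its two field names.
with_namedagg = grouped.agg(
    min_height=pd.NamedAgg(column="height", aggfunc="min"),
    average_weight=pd.NamedAgg(column="weight", aggfunc="mean"),
)

# The same spec written with plain (column, aggfunc) tuples.
with_tuples = grouped.agg(
    min_height=("height", "min"),
    average_weight=("weight", "mean"),
)

# The fields of a NamedAgg are readable by name.
spec = pd.NamedAgg(column="height", aggfunc="min")
print(spec.column, spec.aggfunc)

# Both spellings should give the same DataFrame.
print(with_namedagg.equals(with_tuples))
```

NamedAgg mostly buys readability: the field names document which element is the column and which is the function.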
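If your desired output column names are not valid Python keyword-argument names (for example, they contain a space), construct a dictionary and unpack it as keyword arguments.
In the example below, the braces build a dict: the string before each colon is the key (the output column name) and what follows the colon is the value (the aggregation spec); ** unpacks the dict into keyword arguments.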
In [83]: animals.groupby("kind").agg(**{
....: 'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
....: })
....:
Out[83]:
      total weight
kind
cat           17.8
dog          205.5
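Because the spec is an ordinary dict, it can also be built programmatically before unpacking. The following is a minimal sketch under that assumption; the "<column> (<function>)" naming scheme is hypothetical and chosen only because such names cannot be written as keyword arguments.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Hypothetical naming scheme: keys such as "height (min)" contain spaces
# and parentheses, so they can only be passed through ** unpacking.
spec = {
    f"{column} ({func})": (column, func)
    for column in ["height", "weight"]
    for func in ["min", "max"]
}

result = animals.groupby("kind").agg(**spec)
print(result)
```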
Additional keyword arguments are not passed through to the aggregation functions. Only pairs of (column, aggfunc) should be passed as **kwargs. If your aggregation function requires additional arguments, partially apply them with functools.partial().
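To illustrate the functools.partial() advice, here is a small sketch that binds an extra argument before handing the function to agg(). The choice of pd.Series.quantile with q=0.75 and the output names height_q75 and max_weight are assumptions made for the example, not something the original prescribes.

```python
from functools import partial

import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Bind q=0.75 up front; agg() then calls the partial with each
# group's Series as the only argument.
q75 = partial(pd.Series.quantile, q=0.75)

result = animals.groupby("kind").agg(
    height_q75=("height", q75),
    max_weight=("weight", "max"),
)
print(result)
```

Writing q=0.75 directly as a keyword to agg() would not work here, since every keyword is interpreted as an output column name.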
Note
For Python 3.5 and earlier, the order of **kwargs in a function was not preserved. This means that the output column ordering would not be consistent. To ensure consistent ordering, the keys (and so output columns) will always be sorted for Python 3.5.
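On Python 3.6 and later, where **kwargs keeps its order, the output columns should follow the order in which the keywords are written. A minimal sketch, assuming such an interpreter:

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# max_height is deliberately written first, although it sorts
# alphabetically before min_height either way round.
result = animals.groupby("kind").agg(
    max_height=("height", "max"),
    min_height=("height", "min"),
)

# Column order mirrors keyword order on Python 3.6+.
print(list(result.columns))
```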
Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions.
In [84]: animals.groupby("kind").height.agg(
....: min_height='min',
....: max_height='max',
....: )
....:
Out[84]:
      min_height  max_height
kind
cat          9.1         9.5
dog          6.0        34.0
Here animals.groupby("kind").height is a SeriesGroupBy object, so agg is the Series groupby version of the method.
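The types involved can be inspected directly. The sketch below is illustrative (the names height_by_kind and summary are not from the original) and checks that selecting one column from the groupby yields a SeriesGroupBy, while named aggregation on it returns a DataFrame with one column per keyword.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Selecting a single column from the groupby gives a Series groupby.
height_by_kind = animals.groupby("kind").height
print(type(height_by_kind).__name__)

# With no column to select, each keyword maps straight to a function.
summary = height_by_kind.agg(min_height="min", max_height="max")
print(type(summary).__name__)
print(summary)
```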