GroupBy with MultiIndex
With hierarchically-indexed data, it's quite natural to group by one of the levels of the hierarchy.
Let's create a Series with a two-level MultiIndex.
In [35]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ....:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
   ....:
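Create the arrays that will serve as the index levels.
In [36]: index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
Build a MultiIndex from the arrays created above; the two index levels are named first and second.
In [37]: s = pd.Series(np.random.randn(8), index=index)
Create a Series named s that holds 8 random numbers and uses the index defined above.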
In [38]: s
Out[38]:
first  second
bar    one      -0.919854
       two      -0.042379
baz    one       1.247642
       two      -0.009920
foo    one       0.290213
       two       0.495767
qux    one       0.362949
       two       1.548106
dtype: float64
The data type is float64 (floating point).
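As a self-contained version of the steps above, the following sketch builds the same two-level index and inspects it. The seeded generator `rng` is an assumption added here for reproducibility; the original session uses an unseeded np.random.randn, so the values will differ from Out[38].

```python
import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

# Assumption: a seeded generator, so repeated runs produce the same numbers.
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(8), index=index)

print(s.index.nlevels)      # number of index levels
print(list(s.index.names))  # names of the index levels
print(s)
```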
We can then group by one of the levels in s.
In [39]: grouped = s.groupby(level=0)
Because the index has two levels, level=0 refers to the first level (bar, baz, and so on) and level=1 refers to the second level (one, two).
In [40]: grouped.sum()
Out[40]:
first
bar -0.962232
baz 1.237723
foo 0.785980
qux 1.911055
dtype: float64
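To make the level=0 grouping concrete, here is a small sketch that rebuilds s from the values printed in Out[38] and checks one group by hand. The names `by_first` and `manual_bar` are illustrative only, and because the printed values are rounded, the sums can differ from Out[40] in the last digit.

```python
import pandas as pd

# from_product yields bar/one, bar/two, baz/one, ... in the same order as Out[38].
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]],
    names=["first", "second"],
)
values = [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106]
s = pd.Series(values, index=index)

# Group on the first index level and sum within each group.
by_first = s.groupby(level=0).sum()

# The same number for one group, computed by selecting that group directly.
manual_bar = s.loc["bar"].sum()

print(by_first)
print(manual_bar)
```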
If the MultiIndex has names specified, these can be passed instead of the level number:
In [41]: s.groupby(level='second').sum()
Out[41]:
second
one 0.980950
two 1.991575
dtype: float64
In other words, level=0 can be written as level='first', and level=1 as level='second'.
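The equivalence between a level number and a level name can be checked directly. This sketch reuses the values printed in Out[38]; `by_position` and `by_name` are names chosen here for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]],
    names=["first", "second"],
)
values = [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106]
s = pd.Series(values, index=index)

by_position = s.groupby(level=1).sum()      # level given by position
by_name = s.groupby(level="second").sum()   # same level given by name

# Raises AssertionError if the two results differ.
pd.testing.assert_series_equal(by_position, by_name)
print(by_name)
```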
In older pandas versions, aggregation functions such as sum take the level parameter directly, and the resulting index is named according to the chosen level. The level argument of these reductions was deprecated in pandas 1.3 and removed in pandas 2.0, where s.groupby(level='second').sum() gives the same result:
In [42]: s.sum(level='second')
Out[42]:
second
one 0.980950
two 1.991575
dtype: float64
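A sketch of the same per-level aggregation written with groupby, which does not rely on reductions accepting level. It uses the values printed in Out[38]; the name `summary` and the choice of sum and mean are illustrative assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]],
    names=["first", "second"],
)
values = [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106]
s = pd.Series(values, index=index)

# Several aggregations per group; the result is a DataFrame whose index
# carries the name of the grouped level.
summary = s.groupby(level="second").agg(["sum", "mean"])

print(summary.index.name)
print(summary)
```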
Grouping with multiple levels is supported. Here s has been regenerated with a third index level added.
In [43]: s
Out[43]:
first  second  third
bar    doo     one     -1.131345
               two     -0.089329
baz    bee     one      0.337863
               two     -0.945867
foo    bop     one     -0.932132
               two      1.956030
qux    bop     one      0.017587
               two     -0.016692
dtype: float64
In [44]: s.groupby(level=['first', 'second']).sum()
This sums the values within each combination of the index levels named first and second.
Out[44]:
first  second
bar    doo      -1.220674
baz    bee      -0.608004
foo    bop       1.023898
qux    bop       0.000895
dtype: float64
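The source does not show how the three-level s was built. One plausible reconstruction, using the values printed in Out[43], is sketched below; the construction via from_arrays is an assumption, and `by_two` is an illustrative name.

```python
import pandas as pd

# Assumed construction of the three-level index shown in Out[43].
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["doo", "doo", "bee", "bee", "bop", "bop", "bop", "bop"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second", "third"])
values = [-1.131345, -0.089329, 0.337863, -0.945867,
          -0.932132, 1.956030, 0.017587, -0.016692]
s = pd.Series(values, index=index)

# Group by two of the three levels; the third level is aggregated away.
by_two = s.groupby(level=["first", "second"]).sum()
print(by_two)
```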
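Index level names may be supplied as keys. The difference from the previous example is that the grouping there was specified through the level argument, whereas here the level names are passed directly as groupby keys. The Series s is displayed first:
Out[19]: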
first  second  third
bar    doo     one     -0.194029
               two     -0.325353
baz    bee     one      0.107807
               two      1.445677
foo    bop     one      2.356253
               two      0.585753
qux    bop     one     -1.327697
               two      0.494866
dtype: float64
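This s was regenerated, so its random values differ from those in Out[43]; that does not affect the idea. Note that the sums in Out[45] correspond to the earlier s from Out[43], not to the values displayed in Out[19].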
In [45]: s.groupby(['first', 'second']).sum()
Out[45]:
first  second
bar    doo      -1.220674
baz    bee      -0.608004
foo    bop       1.023898
qux    bop       0.000895
dtype: float64
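To confirm that passing level names as keys matches grouping with the level argument, the following sketch applies both forms to the values displayed in Out[19]. The index construction is an assumption, since the source does not show it, and `by_keys` and `by_level` are illustrative names.

```python
import pandas as pd

# Assumed construction of the index displayed in Out[19].
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["doo", "doo", "bee", "bee", "bop", "bop", "bop", "bop"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second", "third"])
values = [-0.194029, -0.325353, 0.107807, 1.445677,
          2.356253, 0.585753, -1.327697, 0.494866]
s = pd.Series(values, index=index)

by_keys = s.groupby(["first", "second"]).sum()         # level names as keys
by_level = s.groupby(level=["first", "second"]).sum()  # explicit level argument

# Raises AssertionError if the two results differ.
pd.testing.assert_series_equal(by_keys, by_level)
print(by_keys)
```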