Note that groupby will preserve the order in which observations are sorted within each group. For example, the groups created by groupby() below are in the order they appeared in the original DataFrame:
df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})
In [25]: df3.groupby(['X']).get_group('A')
Out[25]:
   X  Y
0  A  1
2  A  3
In [26]: df3.groupby(['X']).get_group('B')
Out[26]:
   X  Y
1  B  4
3  B  2
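PS: The order referred to here is the order within each group, that is, within group A and within group B: the Y values of group A are 1, 3 and the Y values of group B are 4, 2.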
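The within-group ordering can be checked directly. The following is a minimal sketch that rebuilds df3 from the example above; the names grouped3, group_a and group_b are only illustrative, and the key is passed as a plain string rather than a one-element list:

```python
import pandas as pd

df3 = pd.DataFrame({"X": ["A", "B", "A", "B"], "Y": [1, 4, 3, 2]})

# Group on the single column X.
grouped3 = df3.groupby("X")

group_a = grouped3.get_group("A")
group_b = grouped3.get_group("B")

# Rows keep the relative order they had in df3 (index 0 before 2, 1 before 3).
# Group B shows that the rows are not re-sorted by Y: 4 still comes before 2.
print(group_a["Y"].tolist())  # [1, 3]
print(group_b["Y"].tolist())  # [4, 2]
```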
GroupBy object attributes
The groups attribute is a dict whose keys are the computed unique groups and whose corresponding values are the axis labels belonging to each group. For the example DataFrame df we have:
In [7]: df
Out[7]:
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860
In [27]: df.groupby('A').groups
Out[27]:
{'bar': Int64Index([1, 3, 5], dtype='int64'),
 'foo': Int64Index([0, 2, 4, 6, 7], dtype='int64')}
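Here df is grouped by the unique values of column A, bar and foo. The groups attribute is a dict in which bar and foo are the keys and the row labels belonging to each group are the values.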
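Because groups is an ordinary mapping, it can be iterated over and its values can be used to look up rows. A minimal sketch, assuming only columns A and B of df are needed (C and D play no role in the grouping and are left out); the name bar_rows is illustrative. Recent pandas versions display the values as Index rather than Int64Index, but the labels should be the same:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    }
)

groups = df.groupby("A").groups

# Keys are the unique values of column A; values are the matching row labels.
for key, labels in groups.items():
    print(key, list(labels))

# The labels stored under a key select exactly that group's rows.
bar_rows = df.loc[groups["bar"]]
print(bar_rows)
```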
In [28]: df.groupby(get_letter_type, axis=1).groups
Out[28]:
{'consonant': Index(['B', 'C', 'D'], dtype='object'),
 'vowel': Index(['A'], dtype='object')}
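The helper get_letter_type is not defined in this section, so the sketch below assumes a simple definition that labels a column name as a vowel or a consonant. Since the axis argument of groupby is deprecated in recent pandas releases, the sketch groups the transposed frame instead, which should yield the same column labels. The data values are illustrative only:

```python
import pandas as pd


def get_letter_type(letter):
    # Assumed helper: classify a one-letter column label.
    return "vowel" if letter.lower() in "aeiou" else "consonant"


# Small illustrative frame with the same column names as df.
df = pd.DataFrame(
    {
        "A": ["foo", "bar"],
        "B": ["one", "two"],
        "C": [0.5, -0.3],
        "D": [-0.9, 1.1],
    }
)

# The column labels of df are the row index of df.T, so grouping df.T
# with a function applies that function to each column label.
column_groups = df.T.groupby(get_letter_type).groups

for kind, labels in column_groups.items():
    print(kind, list(labels))
```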
Calling the standard Python len function on the GroupBy object just returns the length of the groups dict, so it is largely just a convenience:
In [29]: grouped = df.groupby(['A', 'B'])
Group df by the two keys A and B together, and name the resulting object grouped.
In [30]: grouped.groups
Out[30]:
{('bar', 'one'): Int64Index([1], dtype='int64'),
 ('bar', 'three'): Int64Index([3], dtype='int64'),
 ('bar', 'two'): Int64Index([5], dtype='int64'),
 ('foo', 'one'): Int64Index([0, 6], dtype='int64'),
 ('foo', 'three'): Int64Index([7], dtype='int64'),
 ('foo', 'two'): Int64Index([2, 4], dtype='int64')}
Show the groups attribute of the grouped object.
In [31]: len(grouped)
Out[31]: 6
Get the length of the grouped object, which is the number of groups.
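The equivalence between len and the size of the groups dict can be confirmed with a short sketch. It rebuilds only columns A and B of df, and also compares against the ngroups attribute of the GroupBy object:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    }
)

grouped = df.groupby(["A", "B"])

# All three expressions count the distinct (A, B) combinations.
print(len(grouped))          # 6
print(len(grouped.groups))   # 6
print(grouped.ngroups)       # 6
```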
GroupBy will tab complete column names (and other attributes):
grouped.<TAB>  # type a dot after grouped, then press the TAB key
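Column names show up as completion candidates because they are reachable as attributes of the GroupBy object. A minimal sketch with a small illustrative frame (the data and the names by_attribute, by_item and column_candidates are assumptions for the example), showing that attribute access and item access select the same column:

```python
import pandas as pd

# Illustrative data, not the df used earlier.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "two"],
        "C": [1.0, 2.0, 3.0, 4.0],
    }
)
grouped = df.groupby("A")

# grouped.C and grouped["C"] refer to the same column selection.
by_attribute = grouped.C.sum()
by_item = grouped["C"].sum()
print(by_attribute.equals(by_item))

# Completion in many interactive shells draws on dir(); list the entries
# that are also column names of df.
column_candidates = [name for name in dir(grouped) if name in df.columns]
print(column_candidates)
```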