The following is reposted from 数析学院. It is only an excerpt; readers who need the rest can go straight to the original article.
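Main topics:
1. Selecting from grouped data
2. Column slicing
3. Selecting on multiple conditions
4. Maximum value within each group
5. Selecting with a MultiIndex
6. Resetting column names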
Setup:
- import pandas as pd
- import numpy as np
- import sys
- %matplotlib inline  # IPython/Jupyter magic; omit it in a plain Python script
1. Selecting a random sample after grouping
- # Create the DataFrame
- df = pd.DataFrame({'group1': ["a","b","a","a","b","c","c","c","c",
-                               "c","a","a","a","b","b","b","b"],
-                    'group2': [1,2,3,4,1,3,5,6,5,4,1,2,3,4,3,2,1],
-                    'value': ["apple","pear","orange","apple",
-                              "banana","durian","lemon","lime",
-                              "raspberry","durian","peach","nectarine",
-                              "banana","lemon","guava","blackberry","grape"]})
- df
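- # The goal is not simply to pick random rows from df:
- # first group by (group1, group2), then pick one random row from each group
- from random import choice
- # Build the groups first
- grouped = df.groupby(['group1','group2'])
- grouped.size()
- # Note that group (a, 1) contains two rows
- # Note that group (a, 2) contains one row
- # So a random pick from group (a, 1) yields either "apple" or "peach"
- # And a random pick from group (a, 2) always yields "nectarine"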
group1  group2
a       1         2
        2         1
        3         2
        4         1
b       1         2
        2         2
        3         1
        4         1
c       3         1
        4         1
        5         2
        6         1
dtype: int64
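To check the claim about groups (a, 1) and (a, 2) directly, the rows of a single group can be inspected. The following is a minimal sketch that rebuilds the same DataFrame so it runs on its own; the variable names `a1_values` and `a2_values` are illustrative and do not come from the original article.

```python
import pandas as pd

# Same data as in the post, repeated so this snippet is self-contained
df = pd.DataFrame({
    'group1': ["a", "b", "a", "a", "b", "c", "c", "c", "c",
               "c", "a", "a", "a", "b", "b", "b", "b"],
    'group2': [1, 2, 3, 4, 1, 3, 5, 6, 5, 4, 1, 2, 3, 4, 3, 2, 1],
    'value': ["apple", "pear", "orange", "apple",
              "banana", "durian", "lemon", "lime",
              "raspberry", "durian", "peach", "nectarine",
              "banana", "lemon", "guava", "blackberry", "grape"],
})

grouped = df.groupby(['group1', 'group2'])

# get_group takes the group key as a tuple when grouping by several columns
a1_values = grouped.get_group(('a', 1))['value'].tolist()
a2_values = grouped.get_group(('a', 2))['value'].tolist()

print(a1_values)  # the two candidates for group (a, 1)
print(a2_values)  # the single candidate for group (a, 2)
```

Group (a, 1) holds "apple" and "peach", while group (a, 2) holds only "nectarine", which is why a random pick from the latter is always the same.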
- # df.loc[one randomly chosen row label from each group]
- # grouped.groups maps each group key to the index labels of its rows
- df.loc[[choice(x) for x in grouped.groups.values()]]
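The selection above returns different rows on each run. As a rough sketch of two ways to make it reproducible, the snippet below seeds a `random.Random` instance for the same `choice`-based approach, and then shows the built-in `GroupBy.sample` method as an alternative, assuming pandas 1.1 or later. The seed value 42 and the names `rng`, `picked`, and `sampled` are arbitrary choices for illustration and are not part of the original article.

```python
import random
import pandas as pd

# Same data as in the post, repeated so this snippet is self-contained
df = pd.DataFrame({
    'group1': ["a", "b", "a", "a", "b", "c", "c", "c", "c",
               "c", "a", "a", "a", "b", "b", "b", "b"],
    'group2': [1, 2, 3, 4, 1, 3, 5, 6, 5, 4, 1, 2, 3, 4, 3, 2, 1],
    'value': ["apple", "pear", "orange", "apple",
              "banana", "durian", "lemon", "lime",
              "raspberry", "durian", "peach", "nectarine",
              "banana", "lemon", "guava", "blackberry", "grape"],
})

grouped = df.groupby(['group1', 'group2'])

# Option A: the choice-based approach with a seeded generator,
# so the same row labels are picked on every run
rng = random.Random(42)
picked_labels = [rng.choice(list(labels)) for labels in grouped.groups.values()]
picked = df.loc[picked_labels]

# Option B: let pandas draw one row per group (assumes pandas >= 1.1)
sampled = grouped.sample(n=1, random_state=42)

print(picked)
print(sampled)
```

Both results contain one row for each (group1, group2) pair. The two options will not necessarily pick the same rows as each other, since they use different random number generators.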
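The content above is reposted from 数析学院. The remaining selection methods will be added when time permits; readers who need them can go straight to the original article.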

