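The following is reposted from 数析学院. Only part of the original is excerpted here; readers who need more can consult the original article directly.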
- import pandas as pd
- import sys
- # Create a DataFrame indexed by date
- States = ['NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'GA', 'GA', 'FL', 'FL']
- data = [1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10]
- idx = pd.date_range('1/1/2012', periods=10, freq='MS')
- df1 = pd.DataFrame(data, index=idx, columns=['Revenue'])
- df1['State'] = States
- # Create a second DataFrame
- data2 = [10.0, 10.0, 9, 9, 8, 8, 7, 7, 6, 6]
- idx2 = pd.date_range('1/1/2013', periods=10, freq='MS')
- df2 = pd.DataFrame(data2, index=idx2, columns=['Revenue'])
- df2['State'] = States
- # Concatenate the two DataFrames
- df = pd.concat([df1,df2])
- df
Methods for identifying outliers
- # Method 1
- # Make a copy of df
- newdf = df.copy()
- newdf['x-Mean'] = abs(newdf['Revenue'] - newdf['Revenue'].mean())
- newdf['1.96*std'] = 1.96*newdf['Revenue'].std()
- newdf['Outlier'] = abs(newdf['Revenue'] - newdf['Revenue'].mean()) > 1.96*newdf['Revenue'].std()
- newdf
- # Method 2
- # Group by State
- # Make a copy of df
- newdf = df.copy()
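- # Select 'Revenue' so each transform returns a Series and ignores columns added to newdf later
- State = newdf.groupby('State')['Revenue']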
- newdf['Outlier'] = State.transform( lambda x: abs(x-x.mean()) > 1.96*x.std() )
- newdf['x-Mean'] = State.transform( lambda x: abs(x-x.mean()) )
- newdf['1.96*std'] = State.transform( lambda x: 1.96*x.std() )
- newdf
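Both methods above apply the same rule: a value is flagged when its distance from the mean exceeds 1.96 standard deviations, either over the whole column or within each State. As a rough sketch only, the two can be folded into one helper. The name `flag_outliers` and its parameters are hypothetical and not from the original article, and the sample frame uses a default integer index rather than the date index.

```python
import pandas as pd


def flag_outliers(frame, value_col, group_col=None, z=1.96):
    """Return a boolean Series that is True where |x - mean| > z * std.

    Hypothetical helper: with group_col=None it mirrors Method 1 (overall
    mean/std); with group_col set it mirrors Method 2 (per-group mean/std).
    """
    values = frame[value_col]
    if group_col is None:
        mean, std = values.mean(), values.std()
    else:
        grouped = values.groupby(frame[group_col])
        # transform broadcasts each group's statistic back to the original rows
        mean, std = grouped.transform('mean'), grouped.transform('std')
    return (values - mean).abs() > z * std


# Same revenue figures and states as df above, without the date index
states = ['NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'GA', 'GA', 'FL', 'FL']
frame = pd.DataFrame({
    'Revenue': [1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                10.0, 10.0, 9, 9, 8, 8, 7, 7, 6, 6],
    'State': states * 2,
})

overall = flag_outliers(frame, 'Revenue')
by_state = flag_outliers(frame, 'Revenue', group_col='State')
print(frame.assign(Overall=overall, ByState=by_state))
```

On this data the overall rule flags only the first row (Revenue 1.0), while the per-State rule flags nothing, because each State's own spread is wide enough to cover its values. Which of the two is appropriate depends on whether the States are expected to share one distribution.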
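The content above is excerpted from 数析学院. The original goes on to cover several more identification methods; readers who need them can consult the original at 数析学院.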









