I. Finding outliers
1. adjacent
adjacent lists adjacent values for a set of numeric variables in varlist. Calculate the upper and lower quartiles, p75 and p25, and thus the interquartile range iqr = p75 - p25. Then the adjacent values are the highest value not greater than p75 + 3/2 iqr and the lowest value not less than p25 - 3/2 iqr.
For example:
sysuse auto, clear
adjacent price, by(foreign)
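The definition above can be reproduced by hand with built-in commands. This is a minimal sketch, assuming the quartiles reported by summarize, detail are acceptable; adjacent itself may compute quartiles slightly differently, so results could differ at the margin. The local names lower and upper are illustrative only.

```stata
sysuse auto, clear
forvalues g = 0/1 {
    // quartiles of price within this group
    quietly summarize price if foreign == `g', detail
    local upper = r(p75) + 1.5 * (r(p75) - r(p25))   // upper fence
    local lower = r(p25) - 1.5 * (r(p75) - r(p25))   // lower fence
    // adjacent values: the extreme observations that still lie inside the fences
    quietly summarize price if foreign == `g' & inrange(price, `lower', `upper')
    display "foreign = `g': lower adjacent = " r(min) ", upper adjacent = " r(max)
}
```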
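2. egenmore
egen out2 = outside(price), factor(2) // fences are p75 + 2 * IQR and p25 - 2 * IQR (IQR = interquartile range); values inside the fences are returned as missing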
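Because outside() keeps only the values beyond the fences and returns missing for everything else, the flagged observations can be inspected directly. A small sketch, assuming egenmore has been installed from SSC:

```stata
* ssc install egenmore        // run once if egenmore is not yet installed
sysuse auto, clear
egen out2 = outside(price), factor(2)
count if !missing(out2)                  // number of flagged observations
list make price out2 if !missing(out2)   // show which cars were flagged
```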
II. Handling outliers
1. Deletion
sysuse auto, clear
adjacent price, by(foreign)
drop if (price>8814&foreign==0) | (price>9735&foreign==1)
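The thresholds above are typed in by hand from the adjacent output. As an alternative, the fences can be computed per group so that no numbers need to be copied. This is a sketch under the assumption that egen's pctile() quartiles are acceptable; the variable names p25, p75 and iqr are illustrative. Dropping everything beyond the fences is equivalent to dropping everything beyond the adjacent values, since the adjacent values are the most extreme observations inside the fences.

```stata
sysuse auto, clear
egen p25 = pctile(price), p(25) by(foreign)   // lower quartile within group
egen p75 = pctile(price), p(75) by(foreign)   // upper quartile within group
generate iqr = p75 - p25
// drop observations outside either fence
drop if price > p75 + 1.5 * iqr | price < p25 - 1.5 * iqr
drop p25 p75 iqr
```

Unlike the hard-coded version, this also removes low outliers, if any exist.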
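2. Log transformation
For right-skewed variables, a log transformation compresses the upper tail, so some values that count as outliers on the raw scale usually no longer do after the transformation.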
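One way to check the effect is to count the outside values before and after the transformation. A minimal sketch, assuming egenmore is installed; the variable names ln_price, out_raw and out_log are illustrative:

```stata
sysuse auto, clear
generate ln_price = ln(price)
egen out_raw = outside(price)      // default factor is 1.5
egen out_log = outside(ln_price)
count if !missing(out_raw)         // outside values on the raw scale
count if !missing(out_log)         // outside values on the log scale
```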
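3. Using the winsor command
For example:
winsor price, gen(P_2) p(0.025) // two-sided winsorizing: p(0.025) replaces 2.5% of the observations in each tail, so about 5% of the observations change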
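The idea can be approximated with built-in commands by clamping values to the 2.5th and 97.5th percentiles. This is only a sketch: winsor works from a count of order statistics in each tail, so its results may differ slightly from this percentile-based version. The names lo, hi and price_w are illustrative.

```stata
sysuse auto, clear
_pctile price, percentiles(2.5 97.5)
local lo = r(r1)    // 2.5th percentile
local hi = r(r2)    // 97.5th percentile
generate price_w = price
replace price_w = `lo' if price < `lo'
replace price_w = `hi' if price > `hi' & !missing(price)   // missing counts as +infinity in Stata
```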