```r
library(dplyr)
library(moments)   # skewness(), kurtosis()
library(pastecs)

# Example panel: 6 ids, each observed from 2001 to 2008
data <- data.frame(year = rep(2001:2008, 6),
                   id = rep(1:6, each = 8),
                   value = c(5, 100, 3, 10, 15, 6, 9, 11,
                             30, 35, 20, 32, 40, 42, 43, 43,
                             10, 15, 9, 26, 27, 26, 26, 26,
                             45, 60, 57, 120, 63, 65, 64, 63,
                             80, 100, 5, 90, 98, 87, 85, 82,
                             11, 19, 14, 14, 13, 14, 14, 14))

# Per-id summary statistics used to screen for groups that contain outliers
outlier <- data %>%
  group_by(id) %>%
  summarise(mean = mean(value),
            median = median(value),
            skew = skewness(value),
            kurt = kurtosis(value))
```
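When there are many groups, one approach I thought of is to pick out the ids whose skewness is above +1 or below -1 and whose kurtosis is above 3, and then handle those ids separately. However, these statistics measure departure from a normal distribution rather than the presence of outliers, so this causes problems. For example, id 6 has no outliers within the group, yet its skewness = 1.1523064 and kurtosis = 4.352835. Is there a better method?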
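The screening rule described above can be written as a single `filter()` on the per-id summary. This is a minimal sketch, assuming the rule means |skewness| > 1 and kurtosis > 3 (both conditions at once); the object names `shape` and `flagged` are illustrative and not from the original post. Running it reproduces the problem in the question: id 6 passes both thresholds, because skewness and kurtosis describe the shape of the whole group rather than pointing at individual extreme values.

```r
library(dplyr)
library(moments)

# Same example data as in the question: 6 ids observed over 2001-2008
data <- data.frame(
  year  = rep(2001:2008, 6),
  id    = rep(1:6, each = 8),
  value = c(5, 100, 3, 10, 15, 6, 9, 11,
            30, 35, 20, 32, 40, 42, 43, 43,
            10, 15, 9, 26, 27, 26, 26, 26,
            45, 60, 57, 120, 63, 65, 64, 63,
            80, 100, 5, 90, 98, 87, 85, 82,
            11, 19, 14, 14, 13, 14, 14, 14)
)

# Per-id shape statistics. moments::kurtosis() returns the non-excess
# (Pearson) kurtosis, so a normal distribution gives about 3.
shape <- data %>%
  group_by(id) %>%
  summarise(skew = skewness(value),
            kurt = kurtosis(value),
            .groups = "drop")

# Assumed reading of the screening rule: |skewness| > 1 AND kurtosis > 3
flagged <- shape %>%
  filter(abs(skew) > 1, kurt > 3)

print(flagged)
```

Because the kurtosis threshold of 3 is compared against the non-excess kurtosis from `moments`, it corresponds to "heavier tails than a normal distribution"; with only 8 observations per id, a single value that differs from a run of ties (as in id 6) is enough to push both statistics past the thresholds.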

