Original poster: oliyiyi

Anomaly Detection in R

oliyiyi posted on 2015-12-18 20:52:12

Introduction

Inspired by this Netflix post, I decided to write a post based on this topic using R.

Several nice packages can achieve this goal; the one we're going to review is AnomalyDetection.

Download the full (and tiny) R code of this post here.

Normal Vs. Abnormal

An abnormal element, or outlier, is one that does not follow the behaviour of the majority.

Data has noise. Think of a radio without a good signal: you end up listening to some background noise.

[Figure: Noise in time series data]

The orange section could be noise in the data, since it oscillates around a value without showing a defined pattern. In other words: white noise.
Are the red circles noise, or are they peaks from an underlying pattern?
A good algorithm can detect abnormal points while accounting for the inherent noise and leaving it out. The AnomalyDetectionTs function in the AnomalyDetection package performs this task quite well.

Hands on anomaly detection!

In this example, the data comes from the well-known Wikipedia, which offers an API to download into R the daily page views for any {term + language} pair.

In this case, we've got page views for the term fifa, language en, from 2013-02-22 up to today.

[Figure: wikipedia fifa term page views]

After applying the algorithm, we can plot the original time series plus the abnormal points, where the page views were above the expected value.

[Figure: wikipedia fifa term page views]

About the algorithm

The parameters passed to the algorithm are max_anoms=0.01 (to allow at most 1% of the data points to be flagged as outliers in the final result) and direction="pos" (to detect anomalies above, not below, the expected value).
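A minimal sketch of a call with these parameters. It assumes the AnomalyDetection package is installed (it is distributed through GitHub rather than CRAN), so the call is skipped when the package is missing. The data frame `page_views` is synthetic stand-in data with hypothetical injected spikes, not the Wikipedia series from this post, and `e_value = TRUE` is assumed here to be the option that adds the expected value to the output.

```r
# Synthetic stand-in for the page-view series (hypothetical data):
# one row per day, a weekly cycle, some noise, and three injected spikes.
set.seed(42)
n_days <- 364
day_index <- seq_len(n_days)
page_views <- data.frame(
  timestamp = as.POSIXct(seq(as.Date("2014-01-01"), by = "day", length.out = n_days)),
  count     = round(1000 + 100 * sin(2 * pi * day_index / 7) + rnorm(n_days, sd = 30))
)
spike_rows <- c(60, 180, 300)
page_views$count[spike_rows] <- page_views$count[spike_rows] + 2000

# Run the detector only if the package is available.
if (requireNamespace("AnomalyDetection", quietly = TRUE)) {
  res <- AnomalyDetection::AnomalyDetectionTs(
    page_views,
    max_anoms = 0.01,   # upper bound on anomalies, as a fraction of the data (0.01 = 1%)
    direction = "pos",  # only flag points above the expected value
    e_value   = TRUE,   # also return the expected value for each anomaly
    plot      = FALSE
  )
  print(res$anoms)
}
```

The input is a two-column data frame: timestamps first, numeric values second. With 364 rows, max_anoms = 0.01 caps the result at 3 anomalies.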

As a result, 8 anomalous dates were detected. Additionally, the algorithm returns what the expected value would have been, and an extra calculation expresses the difference as a percentage, perc_diff.

[Figure: wikipedia fifa term page views]
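The perc_diff calculation can be sketched as follows. The exact formula is not given in this post, so this assumes perc_diff is the difference between the observed and the expected value, relative to the expected value. The table `anomaly_table`, its dates, and its numbers are hypothetical and only mimic the shape of the algorithm's output.

```r
# Hypothetical rows in the shape assumed for the detector's output:
# the observed value (anoms) and the expected value for each flagged date.
anomaly_table <- data.frame(
  timestamp      = as.Date(c("2000-01-01", "2000-01-02")),
  anoms          = c(30000, 45000),
  expected_value = c(20000, 25000)
)

# Percentage by which the observed value exceeds the expected one (assumed formula).
anomaly_table$perc_diff <- round(
  100 * (anomaly_table$anoms - anomaly_table$expected_value) /
    anomaly_table$expected_value
)

print(anomaly_table)
```

Under this definition, a perc_diff of 0 means the observed and expected values coincide.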

If you want to know more about the maths behind it, google: Generalized ESD and time series decomposition
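As a rough illustration of those two ideas, here is a sketch in base R: `stl()` removes the seasonal and trend components, and a textbook generalized ESD test (Rosner, 1983) is run on what remains. This is not the package's implementation, which may differ in important details; `generalized_esd` is a hypothetical helper written for this example, and the series is synthetic.

```r
# Generalized ESD test: returns the positions of the detected outliers.
# x: numeric vector; max_outliers: upper bound k; alpha: significance level.
generalized_esd <- function(x, max_outliers, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)        # original positions of the points still in play
  removed <- integer(0)    # positions removed so far, in removal order
  n_detected <- 0
  for (i in seq_len(max_outliers)) {
    dev <- abs(x[idx] - mean(x[idx]))
    r_i <- max(dev) / sd(x[idx])               # test statistic R_i
    worst <- idx[which.max(dev)]
    # Critical value lambda_i, based on the t distribution
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit /
      sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    removed <- c(removed, worst)
    idx <- setdiff(idx, worst)
    if (r_i > lambda_i) n_detected <- i        # keep the largest i that passes
  }
  removed[seq_len(n_detected)]
}

# Synthetic daily series: slow trend + weekly seasonality + noise.
set.seed(1)
n <- 140                                       # 20 weeks of daily values
day <- seq_len(n)
y <- 100 + 0.1 * day + 10 * sin(2 * pi * day / 7) + rnorm(n, sd = 2)
y[c(30, 95)] <- y[c(30, 95)] + 40              # two injected anomalies

# Decompose, then test only the remainder component.
fit <- stl(ts(y, frequency = 7), s.window = "periodic", robust = TRUE)
remainder <- as.numeric(fit$time.series[, "remainder"])
outliers <- generalized_esd(remainder, max_outliers = 5)
print(sort(outliers))
```

Testing the remainder rather than the raw series is what keeps regular weekly peaks from being flagged: only points that deviate from the trend plus seasonal pattern stand out.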

Something went wrong:
Something is strange, since the first expected value is the same as the value in the series (34028 page views). As a matter of fact, perc_diff is 0, while it should be a really low number. However, the anomaly is well detected, and apparently the next ones are too. If you know why, you can email and share the knowledge :)

Discovering anomalies

[Figure: wikipedia fifa term page views]

The last plot shows a line indicating the linear trend over a specific period (clearly decreasing) and two black circles. It's interesting to note that these black points were not detected by the algorithm because they are part of a decreasing tendency (noise perhaps?).

A really nice shot by this algorithm, since detection focuses on changes in the general pattern. Just take a look at the last detected point in that period: it was a peak that didn't follow the decreasing pattern (it occurred on 2014-07-12).

Checking with the news

These anomalies for the term fifa are correlated with the news: the first group of anomalies is related to the FIFA World Cup (around Jun/Jul 2014), and the second group, centered on May 2015, is related to the FIFA scandal.

The LA Times has a timeline of the scandal that includes two important dates, May 27th and 28th, both of which were found by the algorithm.


First reply
seahhj posted on 2015-12-18 20:58:09
thanks for sharing
