A very good answer on Stack Exchange:
https://stats.stackexchange.com/a/132832/133228
Abbreviations
AUC = Area Under the Curve.
AUROC = Area Under the Receiver Operating Characteristic curve.
AUC is used most of the time to mean AUROC, which is bad practice: as Marc Claesen pointed out, AUC is ambiguous (it could be the area under any curve), while AUROC is not.
Interpreting the AUROC
The AUROC has several equivalent interpretations:
The expectation that a uniformly drawn random positive is ranked before a uniformly drawn random negative (checked numerically in the sketch after this list).
The expected proportion of positives ranked before a uniformly drawn random negative.
The expected true positive rate if the ranking is split just before a uniformly drawn random negative.
The expected proportion of negatives ranked after a uniformly drawn random positive.
The expected false positive rate if the ranking is split just after a uniformly drawn random positive.
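To make the first interpretation concrete, it can be checked numerically: compare the scores of every positive-negative pair and see that the resulting fraction matches a library AUROC. Below is a minimal Python sketch; the labels and scores are invented for illustration, and scikit-learn is assumed to be available.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Invented labels (y_true) and classifier scores (y_score), for illustration only.
    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]

    # Fraction of positive-negative pairs in which the positive is ranked higher
    # (ties count as 1/2): the first interpretation of the AUROC.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    auroc_by_ranking = wins / (len(pos) * len(neg))

    print(auroc_by_ranking)                # 0.875
    print(roc_auc_score(y_true, y_score))  # 0.875, matches

Both numbers agree because roc_auc_score effectively measures the same ranking quantity.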
Computing the AUROC
Assume we have a probabilistic, binary classifier such as logistic regression.
Before presenting the ROC (Receiver Operating Characteristic) curve, the concept of a confusion matrix must be understood. When we make a binary prediction, there can be 4 types of outcomes:
We predict 0 while the class is actually 0: this is called a True Negative, i.e. we correctly predict that the class is negative (0). For example, an antivirus did not flag a harmless file as a virus.
We predict 0 while the class is actually 1: this is called a False Negative, i.e. we incorrectly predict that the class is negative (0). For example, an antivirus failed to detect a virus.
We predict 1 while the class is actually 0: this is called a False Positive, i.e. we incorrectly predict that the class is positive (1). For example, an antivirus considered a harmless file to be a virus.
We predict 1 while the class is actually 1: this is called a True Positive, i.e. we correctly predict that the class is positive (1). For example, an antivirus rightfully detected a virus.
To get the confusion matrix, we go over all the predictions made by the model, and count how many times each of those 4 types of outcomes occurs:
[Image: example confusion matrix]
In this example of a confusion matrix, among the 50 data points that are classified, 45 are correctly classified and 5 are misclassified.
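As a sketch of how such a matrix can be obtained in practice (the labels and hard 0/1 predictions below are invented, and scikit-learn is assumed to be available):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Invented ground-truth labels and hard 0/1 predictions, for illustration only.
    y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
    y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0])

    # For binary labels, rows are actual classes and columns are predicted classes:
    # [[TN, FP],
    #  [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")  # TN=4  FP=1  FN=1  TP=4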
Since, to compare two different models, it is often more convenient to have a single metric rather than several, we compute two metrics from the confusion matrix, which we will later combine into one:
True positive rate (TPR), aka. sensitivity, hit rate, and recall:
TPR = TP / (TP + FN)
Intuitively this metric corresponds to the proportion of positive data points that are correctly considered as positive, with respect to all positive data points. In other words, the higher the TPR, the fewer positive data points we will miss.
False positive rate (FPR), aka. fall-out:
FPR = FP / (FP + TN)
Intuitively this metric corresponds to the proportion of negative data points that are mistakenly considered as positive, with respect to all negative data points. In other words, the higher the FPR, the more negative data points will be misclassified.
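Both rates follow directly from the confusion-matrix counts. A minimal continuation of the sketch above, reusing the invented counts TN=4, FP=1, FN=1, TP=4:

    # Counts taken from the invented confusion-matrix example above.
    tn, fp, fn, tp = 4, 1, 1, 4

    tpr = tp / (tp + fn)  # sensitivity / recall: share of positives correctly found
    fpr = fp / (fp + tn)  # fall-out: share of negatives wrongly flagged as positive

    print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")  # TPR = 0.80, FPR = 0.20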
To combine the FPR and the TPR into one single metric, we first compute both metrics for many different classification thresholds (for example, thresholds ranging from 0 to 1), then plot them on a single graph, with the FPR values on the x-axis and the TPR values on the y-axis. The resulting curve is called the ROC curve, and the metric we consider is the area under this curve, which we call the AUROC.
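A minimal sketch of that procedure, sweeping thresholds by hand and then measuring the area under the resulting (FPR, TPR) curve; the data are the same invented labels and scores as in the first sketch, and scikit-learn is assumed to be available for comparison with the built-in AUROC:

    import numpy as np
    from sklearn.metrics import auc, roc_auc_score

    # Same invented labels and scores as before, for illustration only.
    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

    # Sweep thresholds from high to low so that FPR and TPR grow monotonically.
    fpr, tpr = [], []
    for t in np.linspace(1.0, 0.0, 101):
        y_pred = (y_score >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))

    # Area under the ROC curve via the trapezoidal rule, compared with the built-in AUROC.
    print(auc(fpr, tpr))                   # ~0.875
    print(roc_auc_score(y_true, y_score))  # 0.875

Note that this brute-force sweep is only meant to illustrate the definition; in practice roc_curve and roc_auc_score compute the same curve and area directly from the scores.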