OP: oliyiyi

Deep Learning Lets Regulated Industries Refocus on Accuracy



oliyiyi posted on 2017-3-1 16:36:32


Summary: Count yourself lucky if you’re not in one of the regulated industries, where regulation requires you to value interpretability over accuracy.  This has been a serious financial weight on the economy, but innovations in Deep Learning point a way out.




As Data Scientists we tend to take it as gospel that more accuracy is better.  There are practical limits to this: it may not be profitable to keep working a model for days or weeks when the remaining improvement is minor, specifically when the cost of that accuracy exceeds the financial gain.  It is at the core of our beliefs that decreasing cost or increasing profit for the organization is the sole appropriate goal, and that it follows directly from better, more accurate models.  We create financial value.

If you are in an environment where you can live by this rule, you are fortunate.  A significant share of our fellow practitioners are denied it: those who work in the so-called regulated industries, mostly banking, financial services, and insurance, but increasingly other areas like real estate and any industry that can designate individuals as winners or losers.

The guiding regulatory rules, which are by no means consistent, boil down to this: if your algorithm can create a negative financial impact on an individual, you need to be able to explain why that particular individual was so rated.

If your credit rating algorithm says an individual is not eligible for the most preferred terms, or your underwriting algorithm says they are not eligible for this mortgage rate (or for a mortgage at all), for life insurance on the most beneficial terms, or for any of dozens of similar outcomes, you are only allowed to use predictive techniques that are transparent.  This is the classic battle between accuracy and interpretability, written as a regulatory mandate.

Increasingly, though, this is spilling over into other fields.  If a rental agency uses an algorithm that denies someone the ability to rent, that may increasingly be covered.  In the not-too-distant future, when the availability of certain high-cost medical procedures must be paid for from the public pocketbook, the decision about who receives treatment and who does not may be made by algorithm and fall under the same scrutiny.


Issues Raised – Social versus Financial


Now that algorithmic decision making is becoming so ingrained in our society, we increasingly hear voices raising a warning against its unregulated use.  This isn’t new, but the volume is getting louder.  The Fair Credit Reporting Act dates back to the 70s.  HIPAA is a reflection of this.  Regulatory restrictions and reporting requirements have been a fact of life in banking, financial services, and insurance for many years, with very uneven interpretation and implementation.

Basel II and Dodd-Frank place a great deal of emphasis on financial institutions constantly evaluating and adjusting their capital requirements for risk of all sorts.  This has become so important that larger institutions have had to establish independent Model Validation Groups (MVGs), separate from their operational predictive analytics teams, whose sole role is to constantly challenge whether the models in use comply with regulations.

There has been an ongoing threat of even more intrusive regulation, possibly based on such unscientific theories as ‘disparate impact’, where the government assumes that correlation means causation.

Increasing regulatory oversight is not going away.  In fact, as our data science capabilities accelerate with deep learning, AI, even bigger data sets, and more unstructured data there is a very real concern that regulation will not keep up and outmoded standards will be applied to new technologies and techniques.

It’s likely that this will result in a continuing dynamic tension between well-intentioned individuals seeking to protect against perceived economic misdeeds and the very real but hidden costs imposed on the whole economy by sub-optimized processes.  Very small changes in model accuracy can leverage into much larger increases in the success of different types of campaigns and decision criteria.  Intentionally forgoing the benefit of that incremental accuracy imposes a cost on all of society.


How Does Data Science Adapt

The ability of Data Scientists to explain their work is not a new requirement.  The tension between accuracy and interpretability isn’t just the result of regulation.  Often it comes from the need to explain a business decision based on predictive analytics to a group of executives who are not familiar with these techniques at all.  In many of those cases, interpretability can be achieved even with black-box models, using data visualization and good storytelling that give management some confidence in the outcome you promise.

In regulated industries however, the distinction is much more severe.  If the man in the street who has been impacted by your decision cannot understand why he specifically was rated in a certain way, your technique is not allowed.

Essentially this throws our predictive analysts back on only two techniques: simple linear models and simple decision trees.  Techniques of any complexity, including Random Forests, pretty much any multi-model ensemble, GBM, and all the flavors of SVM and Neural Nets, are clearly out of bounds.

These Data Scientists are not insensitive to the dilemma this presents, and some workarounds have been developed.  In many cases this involves convincing the individual regulator that your new technique is compliant with existing regulations.  Not an easy task.


Specialized Regression Techniques:  

There are a few specialized techniques that can give better accuracy while maintaining interpretability.  These include penalized regression techniques, which work well when there are more columns than rows by selecting a small number of variables for the final model.  Generalized Additive Models fit linear terms to subsets of the data, dividing nonlinear data into segments that can be represented as a group of separate linear sub-models.  Quantile Regression achieves a similar result, dividing nonlinear data into subsets or customer segments that can each be addressed with linear regression.
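As a hypothetical sketch of the penalized-regression idea (scikit-learn's `Lasso` on synthetic data; the variable names and settings here are illustrative, not from the article): an L1 penalty drives most coefficients to exactly zero, leaving a small, readable set of variables even when columns outnumber rows.

```python
# Illustrative sketch only: L1-penalized regression selecting a sparse,
# interpretable subset of variables when there are more columns than rows.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))      # 50 rows, 200 columns
# Only the first two variables actually drive the outcome
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)   # indices of surviving variables
print("variables kept:", len(selected), "of", X.shape[1])
```

The surviving coefficients read directly as a small linear scorecard, which is the interpretability property regulators ask for.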


Surrogate Modeling With Black Box Methods:

Make no mistake; the Data Scientists in regulated industries are exploring their data with the more accurate, less interpretable black-box methods, just not putting them into production.  Since techniques like Neural Nets naturally capture a large number of variable interactions, a linear model that dramatically underperforms them is a tip-off that additional feature engineering may be necessary.

Another technique is to first train a more accurate black-box model, and then use the output of that model (not the original training data) to train a linear model or decision tree.  It’s likely that the newly trained surrogate decision tree will be more accurate than the original linear model while serving as an interpretable proxy.
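A minimal sketch of the surrogate idea, assuming scikit-learn and synthetic data (the model choices below are one possible instantiation, not the article's prescription): the forest plays the black box, and a shallow tree is trained on the forest's predictions rather than on the raw labels.

```python
# Illustrative sketch: surrogate modeling with a black-box method.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 1: accurate but opaque model
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
y_proxy = black_box.predict(X)           # black-box outputs become the new target

# Step 2: interpretable surrogate trained to mimic the black box
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_proxy)
fidelity = surrogate.score(X, y_proxy)   # how faithfully the tree mimics the forest

print(export_text(surrogate))            # human-readable decision rules
print(f"fidelity to black box: {fidelity:.2f}")
```

The `fidelity` score is worth reporting alongside the surrogate: it tells you how much of the black box's behavior the interpretable proxy actually captures.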


Input Impact to Explain Performance:

GBM, random forests, and neural nets can provide tables that show the importance of each variable to the final outcome.  If your regulator will allow this, explanation of input impacts may meet the requirement for interpretability.
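For example, with scikit-learn's gradient boosting on synthetic data (an illustrative setup, not a regulatory-grade one), the fitted model exposes exactly this kind of per-variable importance table:

```python
# Illustrative sketch: variable-importance table from a GBM,
# one candidate way to explain input impact to a regulator.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=8,
                           n_informative=3, random_state=1)
gbm = GradientBoostingClassifier(random_state=1).fit(X, y)

# Rank inputs by their contribution to the fitted model
ranked = sorted(enumerate(gbm.feature_importances_), key=lambda t: -t[1])
for idx, imp in ranked:
    print(f"feature {idx}: {imp:.3f}")
```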


Regulated Industries Innovate and Push Back with Deep Learning


We tend to think of Deep Learning as applicable mostly to image processing (Convolutional Neural Nets) and natural language processing (Recurrent Neural Nets), both major components of AI.  However, Neural Nets in their simpler forms, perceptrons and feed-forward networks, have long been a regular part of the predictive analytics toolbox.

The fact is that Deep Neural Nets with many hidden layers can just as easily be applied to traditional row-and-column predictive modeling problems, and the gains in accuracy can be remarkable.
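As an illustrative sketch (not Equifax's method, and on synthetic data), a multi-hidden-layer feed-forward net can be pointed at an ordinary tabular problem through the same fit/score interface as a linear model; scikit-learn's `MLPClassifier` is assumed here:

```python
# Illustrative sketch: a deep feed-forward net on an ordinary
# row-and-column problem, compared against a plain linear model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
deep = MLPClassifier(hidden_layer_sizes=(64, 64, 32),  # three hidden layers
                     max_iter=500, random_state=0).fit(X_tr, y_tr)

lin_acc = linear.score(X_te, y_te)
deep_acc = deep.score(X_te, y_te)
print(f"linear: {lin_acc:.3f}  deep: {deep_acc:.3f}")
```

On tabular data with real interaction effects, the extra hidden layers are what buy the accuracy; whether the gap justifies the interpretability fight is exactly the trade-off the article describes.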

Equifax, the credit rating company, recently decided the restrictions on modeling were too severe and chose to work with Deep Neural Nets.  Taking advantage of MPP and the other techniques associated with Deep Learning, they were able to look at a massive data file covering the last 72 months across hundreds of thousands of attributes.  This also wholly eliminated the need to manually down-sample the data into different homogeneous segments, since each hidden layer can effectively emulate a different, not previously defined, customer segment.

Peter Maynard, Senior Vice President of Global Analytics at Equifax, says the experiment improved model accuracy by 15% and reduced manual data science time by 20%.  The real magic, however, was that they were able to reverse-engineer the DNN to make it interpretable.  Maynard said in a Forbes interview:


“My team decided to challenge that and find a way to make neural nets interpretable.  We developed a mathematical proof that shows that we could generate a neural net solution that can be completely interpretable for regulatory purposes. Each of the inputs can map into the hidden layer of the neural network and we imposed a set of criteria that enable us to interpret the attributes coming into the final model. We stripped apart the black box so we can have an interpretable outcome. That was revolutionary; no one has ever done that before.”



Replies:

刘鸣鸣爱学习 (2017-3-1 22:29:08): (no text)
stormchao (2017-3-1 22:47:51): A good book.
soccy (2017-3-1 23:55:37): (no text)
雪过无痕 (2017-3-7 18:05:07): Thanks for Sharing!
司徒洁贞 (2017-3-16 23:17:00): Thanks to the OP for sharing.
尔特容易让他 (2017-3-19 16:13:35): Thanks for sharing.
aaajf2008 (2017-3-19 21:18:39): Very good.
儒雅的KB (2017-3-20 23:40:22): I’ll give it a try too.
edmcheng (2017-3-21 06:12:33): Thanks for sharing.
