
Why Does Deep Learning Work?


oliyiyi posted on 2016-8-12 08:01:46


By Charles H Martin, PhD (Calculation Consulting).

This is the big question on everyone’s mind these days. C’mon, we all know the answer already:

“the long-term behavior of certain neural network models are governed by the statistical mechanism of infinite-range Ising spin-glass Hamiltonians” [1]

In other words,

Multilayer Neural Networks are just Spin Glasses? Right?

This is kinda true, depending on what you mean by a spin glass.

In a recent paper, LeCun and coauthors attempt to extend our understanding of training neural networks by studying the SGD approach to solving the multilayer Neural Network optimization problem [1]. Furthermore, they claim:

“None of these works however make the attempt to explain the paradigm of optimizing the highly non-convex neural network objective function through the prism of spin-glass theory and thus in this respect our approach is very novel.”

And, again, this is kinda true.

But here’s the thing: we already have a good idea of what the Energy Landscape of multiscale spin glass models* looks like, from early theoretical protein folding work (by Wolynes, Dill, etc. [2,3,4]). In fact, here is a typical surface:

Energy Landscape of MultiScale Spin Glass Model

*[technically these are Ising spin models with multi-spin interactions]

Let us consider the nodes, which above represent partially folded states, as nodes in a multiscale spin glass or, say, a multilayer neural network. Immediately we see the analogy and the appearance of the ‘Energy funnel’. In fact, researchers studied these ‘folding funnels’ of spin glass models over 20 years ago [2,3,4]. And we knew then that

as we increase the network size, the funnel gets sharper

3D Energy Landscape of the Folding Funnel of a Spin Glass

Note: the Wolynes protein-folding spin-glass model is significantly different from the p-spin Hopfield model that LeCun discusses because it contains multi-scale, multi-spin interactions. These details matter.
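For orientation, the two textbook Hamiltonians behind these names can be written out as below. These are the standard pairwise (Sherrington-Kirkpatrick) and p-spin Ising forms, given here only as a reference point; they are not copied from [1], and the multi-scale model of [2,3,4] has additional structure that is not reproduced here. The symbols N, s_i, J and p are the usual conventions, assumed for illustration.

```latex
% Infinite-range (Sherrington-Kirkpatrick) Ising spin glass: pairwise couplings
H_{2}(s) = -\sum_{1 \le i < j \le N} J_{ij}\, s_i s_j,
\qquad s_i \in \{-1, +1\}, \quad J_{ij} \sim \mathcal{N}\!\left(0, \tfrac{J^2}{N}\right)

% p-spin generalization: each term couples p spins at once,
% with random couplings suitably scaled in N
H_{p}(s) = -\sum_{1 \le i_1 < \cdots < i_p \le N} J_{i_1 \cdots i_p}\, s_{i_1} \cdots s_{i_p}
```

"Infinite-range" means every spin is coupled to every other spin, so there is no notion of spatial neighbors in either sum.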

Spin glasses and spin funnels are quite different. Spin glasses are highly non-convex, with lots of local minima, saddle points, etc. Spin funnels, however, are designed to find the spin glass of minimal frustration, and have a convex, funnel-shaped energy landscape.
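The contrast can be made concrete on a system small enough to enumerate. The sketch below is a toy, not the model from [2,3,4]: it assumes random Gaussian pairwise couplings for the "glass", and Mattis-type couplings J_ij = xi_i * xi_j (a single hidden pattern xi, so no frustration) as a stand-in for a minimally frustrated system. All function names and the size N = 10 are made up for illustration. It counts the states that no single spin flip can improve.

```python
import itertools
import math
import random


def energy(spins, J):
    """Pairwise Ising energy H = -sum_{i<j} J[i][j] * s_i * s_j."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))


def count_local_minima(J):
    """Count states whose energy cannot be lowered by flipping any one spin."""
    n = len(J)
    count = 0
    for spins in itertools.product((-1, 1), repeat=n):
        e0 = energy(spins, J)
        is_minimum = True
        for k in range(n):
            flipped = spins[:k] + (-spins[k],) + spins[k + 1:]
            if energy(flipped, J) < e0:
                is_minimum = False
                break
        if is_minimum:
            count += 1
    return count


def glass_couplings(n, rng):
    """Assumed 'glass': independent Gaussian couplings with variance 1/n."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0 / math.sqrt(n))
    return J


def funnel_couplings(n, rng):
    """Assumed 'funnel': Mattis couplings J_ij = xi_i * xi_j (unfrustrated)."""
    xi = [rng.choice((-1, 1)) for _ in range(n)]
    return [[0.0 if i == j else float(xi[i] * xi[j]) for j in range(n)]
            for i in range(n)]


rng = random.Random(0)
N = 10  # 2**10 states, small enough for brute force
n_glass = count_local_minima(glass_couplings(N, rng))
n_funnel = count_local_minima(funnel_couplings(N, rng))
print("local minima, random couplings:", n_glass)
print("local minima, unfrustrated couplings:", n_funnel)
```

The unfrustrated case has exactly two minima (the pattern xi and its mirror image), because any misaligned spin can always be flipped to lower the energy. The random case always has at least that symmetric pair and typically picks up more as N grows, which is the ruggedness the text refers to.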

A funnel of this kind seemed to be necessary to resolve one of the great mysteries of protein folding: Levinthal’s paradox [5]. If nature just used statistical sampling to fold a protein, it would take longer than the ‘known’ lifetime of the Universe. This is why Machine Learning is not just statistics.
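The arithmetic behind Levinthal's paradox is easy to reproduce. The numbers below are illustrative assumptions of the kind usually used for this estimate, not values from [5]: a 100-residue chain, 3 conformations per residue, 10^13 conformations sampled per second, and an age of the universe of roughly 13.8 billion years.

```python
import math

# All four values are assumptions chosen for illustration.
RESIDUES = 100
STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13
AGE_OF_UNIVERSE_YEARS = 1.38e10

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    conformations = states_per_residue ** residues  # exact integer
    return conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"exhaustive search: about 10^{math.log10(years):.0f} years")
print(f"that is about 10^{math.log10(ratio):.0f} times the age of the universe")
```

Changing the assumed rate or chain length by a few orders of magnitude does not change the conclusion, since the number of conformations grows exponentially with chain length.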

Spin Funnels (DFM) vs Spin Glasses (SG) [4]

Deep Learning Networks are (probably) Spin Funnels

So with a surface like this, it is not so surprising that an SGD method might be able to find the energy minimum (called the Native State in protein folding theory). We just need to jump around until we reach the top of the funnel, and then it is a straight shot down. This, in fact, defines a so-called ‘folding funnel’ [4].

So it is not surprising at all that SGD may work.
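A one-dimensional toy makes the "jump around, then slide down" picture concrete. Everything in this sketch is an assumption made for illustration: the landscape is a quadratic bowl with small cosine ripples (a crude "rugged funnel"), and SGD's minibatch noise is imitated by adding Gaussian noise that is slowly annealed to zero. It is not the landscape of any real network.

```python
import math
import random

RIPPLE = 0.3  # assumed ripple amplitude
FREQ = 8.0    # assumed ripple frequency


def energy(x):
    """Rugged funnel: a convex bowl plus small ripples."""
    return 0.5 * x * x + RIPPLE * (1.0 - math.cos(FREQ * x))


def grad(x):
    """Derivative of energy(x)."""
    return x + RIPPLE * FREQ * math.sin(FREQ * x)


def descend(x0, steps=5000, lr=0.01, t0=0.0, rng=None):
    """Gradient descent with optional Gaussian noise.

    t0 is the initial noise 'temperature', decayed linearly to zero.
    With t0 == 0 this is plain deterministic gradient descent.
    """
    x = x0
    for t in range(steps):
        temp = t0 * (1.0 - t / steps)
        noise = 0.0
        if temp > 0.0:
            noise = math.sqrt(2.0 * lr * temp) * rng.gauss(0.0, 1.0)
        x = x - lr * grad(x) + noise
    return x


START = 2.0
x_plain = descend(START)
noisy_runs = [descend(START, t0=0.4, rng=random.Random(seed))
              for seed in range(20)]

e_plain = energy(x_plain)
e_noisy = sum(energy(x) for x in noisy_runs) / len(noisy_runs)
print(f"plain descent stops at x = {x_plain:.2f}, energy = {e_plain:.2f}")
print(f"noisy descent, mean final energy over 20 seeds = {e_noisy:.2f}")
```

Without noise the walker stalls in the first ripple it meets. With noise it hops over the ripples while the overall slope of the bowl keeps biasing it toward the bottom, which is the behavior the funnel picture predicts.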

Recent research at Google and Stanford confirms that the Deep Learning Energy Landscapes appear to be quite smooth and generally convex! [6]

Note that a real theory of protein folding, which would actually be able to fold a protein correctly (i.e. Freed’s approach [7]), would be a lot more detailed than a simple spin glass model. Likewise, real Deep Learning systems are going to have a lot more engineering details, added to avoid overtraining (Dropout, Pooling, Momentum), than a theoretical spin funnel.

It is not that Deep Learning is non-convex; it is that we need to avoid over-training.

Still, hopefully we can learn something using the techniques developed to study the energy landscape of multi-scale spin glass / spin funnel models [8,9], thereby utilizing methods from theoretical chemistry and condensed matter physics.

Indeed, I believe this is the first conjecture that Supervised Deep Learning is related to a Spin Funnel. In the next post, I will examine the relationship between Unsupervised Deep Learning and the Variational Renormalization Group.


oliyiyi replied on 2016-8-12 08:02:40

[1] LeCun et al., The Loss Surfaces of Multilayer Networks, 2015

[2] Spin Glasses and the Statistical Mechanics of Protein Folding, PNAS, 1987

[3] Theory of Protein Folding: The Energy Landscape Perspective, Annu. Rev. Phys. Chem., 1997

[4] Energy Landscapes, Supergraphs, and “Folding Funnels” in Spin Systems, 1999

[5] From Levinthal to Pathways to Funnels, Nature, 1997

[6] Qualitatively Characterizing Neural Network Optimization Problems, Google Research, 2015

[7] Mimicking the Folding Pathway to Improve Homology-Free Protein Structure Prediction, 2008

[8] Funnels in Energy Landscapes, 2007

[9] Landscape Statistics of the Low Autocorrelated Binary String Problem, 2007

[10] A Common Logic to Seeing Cats and Cosmos, 2014
