Original poster: oliyiyi

Deep Learning, Pachinko, and James Watt: Efficiency is the Driver of Uncertainty


A reasoned discussion of why the next generation of data-efficient learning approaches relies on us developing new algorithms that can propagate stochasticity or uncertainty right through the model, and which are mathematically more involved than the standard approaches.

By Neil Lawrence, University of Sheffield.

It seems it may only be a matter of time before the best Go player on the planet is a computer.1 AlphaGo beat the European Go champion, and it was driven by machine learning, a technology that has underpinned the recent major advances in artificial intelligence in computer vision, speech recognition and language translation.
(Editor's note: this post was written in March 2016, shortly before AlphaGo beat the world's top Go player, Lee Se-dol.)

Machine learning is a data-driven approach to artificial intelligence. AlphaGo learnt how to play Go by playing many games against itself, and by observing a large history of games played by professional players.

The end result is that by the time of its first match against the European champion, AlphaGo had already played many more games of Go than any human could possibly play in a lifetime. And since that win AlphaGo has been actively learning to improve itself, relentlessly playing all day and all night in an effort to ready itself to play the world champion.

Our Data Delusion


This phenomenon isn't restricted to AlphaGo. Our human-level performance in other domains is also driven by an unearthly amount of data. Our vision systems see far more labeled images than a person needs to learn to recognize objects, our speech systems hear many more words than we do to learn to understand speech, and our translation systems require many more examples of translated text than a human could ever read.


Assessment of the increase in the world’s data capacity.2

So while we are making considerable progress on tasks once thought extremely difficult or impossible, the truth is that the progress is driven far more by the availability of data than by improvements in algorithms. Indeed, when we first tried to address tasks such as object recognition and language translation, they would have been impossible had we applied the state-of-the-art methodologies of the time to the data we had then. It is the explosion of data that has rendered them tractable.

Steam


The situation reminds me greatly of the early stages of the industrial revolution. Thomas Newcomen was born in the South West of England, an area well known since Roman times for tin mining. Mines need pumping out to keep water at bay, and in 1712 Newcomen invented a steam engine for doing just that. His engine consisted of a large piston sitting on top of a boiler. The cylinder was alternately filled with steam from the boiler and then cooled by a direct injection of water; the condensing steam created a partial vacuum that drove the piston and produced motion.

Coal


As it happened, Newcomen's engines had relatively little effect at his local tin mines: they were so inefficient that they were impractical there. They were widely used, but in the coalfields, where they could be easily fueled.

This brings to mind the situation today: the major internet companies can profit from our current generation of inference engines because they are the equivalent of the coalfields of yesteryear. They have enormous quantities of data readily available.

Medical Applications


The equivalent of the tin mines, and of every other kind of mine, is currently missing out. In application domains such as medicine we face challenges because, firstly, the complexity of the system is much greater than in speech, vision or even the game of Go. Our interventions are often at a biochemical level, and yet their manifestations occur at the global level of our health.

Secondly, for rare or complex diseases, those whose causes are driven by a combination of environmental and genetic factors, we will never, with the current set of inefficient models, have sufficient data for these complex models to learn what we need to know to diagnose early and deliver the cures we need.

Separate Condenser


The steam engine is far more associated in our minds with the name of James Watt. Watt made the steam engine practical by introducing the separate condenser: rather than directly injecting water into the cylinder, he sucked the steam out of the cylinder and cooled it separately.


Schematic of James Watt’s engine with the separate condenser.

This was more efficient as the cylinder itself no longer had to go through cycles of heating and cooling. The resulting doubling in efficiency made the steam engine practical, not just for Cornish tin mines, but for railways and traction engines.

Intelligence


My definition of intelligence is the use of information to save energy.3

Intelligent decision making implies that we assimilate the facts and make a decision that reduces our expenditure relative to actions we would have taken had we not had those facts. Under that definition, we can become more intelligent by either saving more energy with our resulting decisions, or by using less information.
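One rough way to make this definition concrete (my own sketch, not a formula from the article) is to score a decision-maker by the energy its decisions save per unit of information consumed:

```latex
% A hedged formalisation of "information used to save energy":
% the numerator is the expected energy saved by acting on the information I,
% the denominator is the amount of information consumed. The score rises either
% by saving more energy or by needing less information, matching the text above.
\[
  \text{intelligence} \;\propto\;
  \frac{\mathbb{E}[\text{energy spent} \mid \text{no information}]
        \;-\; \mathbb{E}[\text{energy spent} \mid I]}{|I|}
\]
```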

In a very real sense there is currently a data-efficiency deficit, just as large as the efficiency deficit present in Newcomen's engine. Machine learning needs its separate-condenser moment: a revolution in data efficiency equivalent to Watt's innovation.

Function Composition and Deep Learning



Architecture of DeepFace, Facebook’s recognition system from their paper. See footnote for link.4

In mathematics we can sometimes think of a function as a deterministic process, so we can think of deep learning as a combination of such processes. Individual processes are composed together to create a more complex process, and presenting data to that process applies a set of transformations to reach a result. Above is an example from Facebook's DeepFace algorithm; the aim is to classify whether or not this is Calista Flockhart's face.

In machine learning this is known as deep learning. For some authors it is related to the brain or to a fundamental way of thinking about AI, but we can simply think of it as the sensible idea of applying a set of simple transformations to an image to build a complex transformation.
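As a minimal sketch of this idea (my own toy code, not anything from the DeepFace paper; the layer sizes and random weights are placeholders), a "deep" function is just a few simple transformations applied one after another:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One simple transformation: a linear map followed by a ReLU nonlinearity."""
    return np.maximum(0.0, W @ x + b)

# Three arbitrary layers; in a real network these weights would be learned.
W1, b1 = rng.normal(size=(64, 128)), np.zeros(64)
W2, b2 = rng.normal(size=(32, 64)), np.zeros(32)
W3, b3 = rng.normal(size=(2, 32)), np.zeros(2)

x = rng.normal(size=128)   # stand-in for a flattened input image
h1 = layer(x, W1, b1)      # first simple transformation
h2 = layer(h1, W2, b2)     # composed with a second
scores = W3 @ h2 + b3      # final linear map to two class scores,
print(scores)              # e.g. "Calista's face" vs "not Calista's face"
```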

The challenge of machine learning is how to determine what these transformations should be. Each of the simpler deterministic transformations can actually have very many parameters. In the case of DeepFace there are more than 120 million parameters overall.
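To see how counts like 120 million arise, the toy calculation below simply adds up the weights and biases of a small stack of fully connected layers; the layer widths are invented for illustration and are not DeepFace's actual architecture.

```python
# Invented layer widths: a flattened 32x32 colour image, two hidden layers, 1000 outputs.
layer_sizes = [32 * 32 * 3, 4096, 4096, 1000]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    total += n_in * n_out + n_out    # weight matrix plus bias vector
print(f"{total:,} parameters")       # tens of millions, even for this modest stack
```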

When we think of deep learning, we can think of any given data point passing through the deterministic processes like a ball falling through an early pinball machine (or a Pachinko machine)5. Each layer of pins is equivalent to a layer in the deep model.

The objective of machine learning is to reconstruct the configuration of those pins in such a way that Calista's face is correctly detected. Think of the face as a set of initial conditions; it passes through each process to emerge in modified form. The task of inference is to ensure that the right set of initial conditions leads to the right conclusion.

Our deep model is like a Pachinko machine, but one where we have control over the location of the pins. The objective in deep learning is to move the pins around in such a way that for the right set of initial conditions (as given by the image in our analogy), the correct result is achieved.

The parameters of the model are sometimes known as network weights. They are like the positions of the pins, so we can think of DeepFace as having 120 million pins in its Pachinko machine.
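Here is a minimal sketch of "moving the pins" (my own toy example, with an invented input, target and learning rate): gradient descent nudges the weights of a one-layer classifier so that the given initial conditions fall towards the desired outcome.

```python
import numpy as np

rng = np.random.default_rng(1)

W = rng.normal(size=(2, 4)) * 0.1   # the "pins": weights of a tiny one-layer network
x = rng.normal(size=4)              # the "initial conditions": one input pattern
target = np.array([1.0, 0.0])       # desired outcome: class 0 ("this is the face")

def forward(W, x):
    """Where the ball lands: class probabilities via a softmax."""
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(100):
    p = forward(W, x)
    grad = np.outer(p - target, x)  # gradient of the cross-entropy loss w.r.t. W
    W -= 0.5 * grad                 # move the pins a little downhill
print(forward(W, x))                # probability mass has shifted towards the target
```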

We can think of what the network 'thinks' about a particular pattern as the left-right position of the ball as it falls through the machine. Of course, in the Pachinko machine, at each layer of pins this 'thought' is only one value; it's one dimensional. Real deep networks have many, many dimensions, so it's like a Pachinko machine where the pins sit in a high-dimensional space, normally called a hyperspace. This means the network's 'thoughts' are higher dimensional and more complex.
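The quick sketch below (arbitrary layer widths of my own choosing) prints the dimension of the activation vector at each layer, to show that the "thought" a real network carries between layers is a high-dimensional vector rather than a single left-right position:

```python
import numpy as np

rng = np.random.default_rng(2)

widths = [1024, 512, 256, 64]        # arbitrary layer widths, for illustration only
x = rng.normal(size=widths[0])       # the "ball" entering the machine
for n_in, n_out in zip(widths[:-1], widths[1:]):
    W = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)
    x = np.maximum(0.0, W @ x)       # the network's "thought" after this layer
    print(f"thought dimension after this layer: {x.shape[0]}")
```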



Reply #1
hjtoh, posted 2016-7-6 07:51:14 from mobile
Quoting oliyiyi (2016-7-6 07:45): "A reasoned discussion of why the next generation of data efficient learning approaches rely on us de ..."
Thanks for sharing.
