[Alpha Series] will regularly introduce the latest industry results and a variety of alpha strategies! To keep track of [Alpha Series], click "Follow" under the avatar. Once you are following, check here: "Round up a thousand good books in three steps"!
[Related Reading]
[Alpha Series] (resource index post with links, continually updated)
[Machine Learning Series] (resource index post with links, continually updated)
[Classic Textbook Series] Deep Learning (2016, Goodfellow)
[Classic Textbook Series] Reinforcement Learning: An Introduction, 2nd Edition
AI primer: The Age of Spiritual Machines: When Computers Exceed Human Intelligence
AI primer: The Singularity Is Near: When Humans Transcend Biology
AI primer: Machine Learning (Tom Mitchell)
AI primer: On Intelligence
AI primer: Artificial Intelligence: A Guide to Intelligent Systems, 3rd Edition
AI primer: Artificial Intelligence: A Modern Approach (3rd Edition, high-quality scan)
AI primer: Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th Edition
[Bestseller Series] Rise of the Robots: Technology and the Threat of a Jobless Future
[Bestseller Series] Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence
[Bestseller Series] The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies
The dominant theme at this year's International Conference on Machine Learning (ICML), held in June 2016, was deep learning. The program centered on four main topics: recurrent neural networks, unsupervised learning, supervised training methods, and deep reinforcement learning. The papers in the attachment represent the latest frontiers in each of these research directions.
Deep learning is a branch of machine learning. Several deep learning architectures now exist, such as deep neural networks, convolutional neural networks, deep belief networks, and recurrent neural networks. Some have already been applied in computer vision, speech recognition, natural language processing, audio recognition, and bioinformatics, with excellent results.
It is worth noting that AlphaGo, the Go-playing AI program developed by Google subsidiary DeepMind, relied on exactly this "deep reinforcement learning" technology (https://deepmind.com/blog/deep-reinforcement-learning/). In March 2016, in a five-game match against the Korean 9-dan professional Lee Sedol, it became the first machine in history to defeat a top human Go player. Because the complexity of Go far exceeds that of any other game, people had long assumed that a machine victory over humans was still a distant prospect. Over the five games, public sentiment shifted from initial dismissal and doubt, to astonishment and confusion, and finally to despair and awakening, an unexpectedly profound experience. AlphaGo's victory points to the boundless possibilities of artificial intelligence; it is a milestone event fit for the history books!
Meanwhile, in quantitative investing, practitioners are also exploring how machine learning techniques might be applied. Many hedge funds (for example, Man Group, Two Sigma, and D. E. Shaw) have committed substantial talent and capital to gain an early edge. The article below is a summary of this year's ICML by Vinod Valsalam, a research scientist at the well-known US hedge fund Two Sigma (the papers are in the attachment). It is easy to imagine that, in the not-so-distant future, machine learning will become a decisive weapon in quantitative investing!
(Reply to view the download address; 17 recent papers, discounted for 3 days)
Machine learning offers powerful techniques to find patterns in data for solving challenging predictive problems. The dominant track at the International Conference on Machine Learning (ICML) in New York this year was deep learning, which uses artificial neural networks to solve problems by learning feature representations from large amounts of data.
Significant recent successes in applications such as image and speech recognition, and natural language processing, have helped fuel an explosion of interest in deep learning. And new research in the field is continuing to push the boundaries of applications, techniques, and theory. Below, Two Sigma research scientist Vinod Valsalam provides an overview of some of the most interesting research presented at ICML 2016, covering recurrent neural networks, unsupervised learning, supervised training methods, and deep reinforcement learning.
1. Recurrent Neural Networks
Unlike feed-forward networks, the outputs of recurrent neural networks (RNNs) can depend on past inputs, providing a natural framework for learning from time series and sequential data. But training them for tasks that require long-term memory is especially difficult due to the vanishing and exploding gradients problem, i.e., the error signals for adapting network weights become increasingly difficult to propagate through the network. Specialized network architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) mitigate this problem by utilizing gating units, a technique that has been very successful in tasks such as speech recognition and language modeling. An alternative approach that is now gaining more focus is to constrain the weight matrices in a way that is more conducive to gradient propagation, as explored in the following papers.
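The gradient pathology described above is easy to see in a toy calculation. The sketch below (an illustration, not drawn from any of the papers) backpropagates an error signal through T steps of a purely linear recurrence h_t = W h_{t-1}; the signal's norm scales like the T-th power of the largest singular value of W:

```python
import numpy as np

# Toy illustration of vanishing/exploding gradients: backpropagating through
# T steps of a linear recurrence h_t = W h_{t-1} multiplies the error signal
# by W^T at each step, so its norm scales like sigma_max(W)^T, where
# sigma_max is the largest singular value of W.
T = 50
norms = {}
for scale in (0.5, 1.5):
    W = scale * np.eye(4)        # spectral radius (and sigma_max) = scale
    g = np.ones(4)               # error signal at the final time step
    for _ in range(T):
        g = W.T @ g              # one step of backpropagation through time
    norms[scale] = np.linalg.norm(g)

print(norms)                     # collapses for 0.5, blows up for 1.5
```

With sigma_max < 1 the gradient vanishes geometrically; with sigma_max > 1 it explodes. This is precisely why unitary and orthogonal weight constraints, which pin all singular values to 1, are attractive.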
Unitary Evolution Recurrent Neural Networks
Arjovsky, M., Shah, A., & Bengio, Y. (2016)
The problem of vanishing and exploding gradients occurs when the magnitude of the eigenvalues of weight matrices deviate from 1. Therefore, the authors use weight matrices that are unitary to guarantee that the eigenvalues have magnitude 1. The challenge with this constraint is to ensure that the matrices remain unitary when updating them during training without performing excessive computations. Their strategy is to decompose each unitary weight matrix into the product of several simple unitary matrices. The resulting parameterization makes it possible to learn the weights efficiently while providing sufficient expressiveness. They demonstrate state of the art performance on standard benchmark problems such as the copy and addition tasks. An additional benefit of their approach is that it is relatively insensitive to parameter initialization, since unitary matrices preserve norms.
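A minimal sketch of the idea (simplified relative to the paper, whose full parameterization also interleaves FFT steps): compose a unitary matrix from cheap, exactly unitary factors, then verify that the product preserves norms and has unit-magnitude eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Each factor below is cheaply parameterized and exactly unitary,
# so their product is unitary by construction.
D = np.diag(np.exp(1j * rng.uniform(-np.pi, np.pi, n)))      # phases: n params
v = rng.normal(size=n) + 1j * rng.normal(size=n)
R = np.eye(n) - 2 * np.outer(v, v.conj()) / (v.conj() @ v)   # Householder reflection
P = np.eye(n)[rng.permutation(n)]                            # fixed permutation

W = D @ R @ P

# Unitary matrices preserve norms, so all eigenvalues lie on the unit circle.
x = rng.normal(size=n) + 1j * rng.normal(size=n)
print(np.linalg.norm(W @ x) - np.linalg.norm(x))   # ~0
print(np.abs(np.linalg.eigvals(W)))                # all ~1
```

Updating the parameters of each simple factor (phases, the reflection vector) keeps W exactly unitary during training, with no expensive re-orthogonalization step.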
Recurrent Orthogonal Networks and Long-Memory Tasks
Henaff, M., Szlam, A., & LeCun, Y. (2016)
In this paper, the authors construct explicit solutions based on orthogonal weight matrices for the copy and addition benchmark tasks. Orthogonal matrices avoid the vanishing and exploding gradients problem in the same way as unitary matrices, but they have real-valued entries instead of complex-valued entries. The authors show that their hand-designed networks work well when applied to the task for which they are designed, but produce poor results when applied to other tasks. These experiments illustrate the difficulty of designing general networks that perform well on a range of tasks.
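For contrast, an orthogonal weight matrix gives the same spectral guarantee with real-valued entries. A quick check (illustrative, not from the paper): take the Q factor from a QR decomposition of a random real matrix; Q is orthogonal, so it preserves norms and its eigenvalues, though possibly complex, all have magnitude 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# The QR decomposition of any full-rank real matrix yields an orthogonal Q.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))

x = rng.normal(size=n)
print(np.linalg.norm(Q @ x) - np.linalg.norm(x))   # ~0: norms preserved
print(np.abs(np.linalg.eigvals(Q)))                # all ~1
print(np.isrealobj(Q))                             # True: real entries
```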
Strongly-Typed Recurrent Neural Networks
Balduzzi, D., & Ghifary, M. (2016)
Physics has the notion of dimensional homogeneity, i.e. it is only meaningful to add quantities of the same physical units. Types in programming languages express a similar idea. The authors extend these ideas to constrain RNN design. They define a type as an inner product space with an orthonormal basis. The operations and transformations that a neural network performs can then be expressed in terms of types. For example, applying an activation function to a vector preserves its type. In contrast, applying an orthogonal weight matrix to a vector transforms its type. The authors argue that the feedback loop of RNNs produces vectors that are type-inconsistent with the feed-forward vectors for addition. While symmetric weight matrices are one way to preserve types in feedback loops, the authors tweak the LSTM and GRU networks to produce variants that have strong types. Experiments were inconclusive in showing better generalization of typed networks, but they are an interesting avenue for further research.
2. Unsupervised Learning
The resurgence of deep learning in the mid-2000s was made possible to a large extent by using unsupervised learning to pre-train deep neural networks to establish good initial weights for later supervised training. Later, using large labeled data sets for supervised training was found to obviate the need for unsupervised pre-training. But more recently, there has been renewed interest in utilizing unsupervised learning to improve the performance of supervised training, particularly by combining both into the same training phase.
Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification
Zhang, Y., Lee, K., & Lee, H. (2016)
This paper starts out with a brief history of using unsupervised and semi-supervised methods in deep learning. The authors show how such methods can be scaled to solve large-scale problems. Using their approach, existing neural network architectures for image classification can be augmented with unsupervised decoding pathways for image reconstruction. The decoding pathways consist of a deconvolutional network that mirrors the original network using autoencoders. They initialized the weights for the encoding pathway with the original network and for the decoding pathway with random values. Initially, they trained only the decoding pathway while keeping the encoding pathway fixed. Then they fine-tuned the full network with a reduced learning rate. Applying this method to a state-of-the-art image classification network boosted its performance significantly.
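A bare-bones sketch of the two-stage training recipe, with linear layers and plain gradient descent standing in for the paper's convolutional and deconvolutional networks (all names, sizes, and hyperparameters here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))            # stand-in for input images

# "Pretrained" encoding pathway, kept frozen during stage 1.
W_enc = 0.5 * rng.normal(size=(8, 3))
# Decoding pathway mirrors the encoder's shape, initialized randomly.
W_dec = 0.1 * rng.normal(size=(3, 8))

def recon_loss(W_dec):
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

loss_before = recon_loss(W_dec)

# Stage 1: train only the decoder to reconstruct the input.
lr = 0.01
for _ in range(500):
    H = X @ W_enc                        # frozen encoder
    grad = 2 * H.T @ (H @ W_dec - X) / X.size
    W_dec -= lr * grad

loss_after = recon_loss(W_dec)
print(loss_before, "->", loss_after)     # reconstruction error drops
# Stage 2 in the paper then fine-tunes the full network (classification plus
# reconstruction objectives) with a reduced learning rate; omitted here.
```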
Deconstructing the Ladder Network Architecture
Pezeshki, M., Fan, L., Brakel, P., Courville, A., & Bengio, Y. (2016)
A different approach for combining supervised and unsupervised training of deep neural networks is the Ladder Network architecture. It also improves the performance of an existing classifier network by augmenting it with an auxiliary decoder network, but it has additional lateral connections between the original and decoder networks. The resultant network forms a deep stack of denoising autoencoders that is trained to reconstruct each layer from a noisy version. In this paper, the authors studied the ladder architecture systematically by removing its components one at a time to see how much each component contributed to performance. They found that the lateral connections are the most important, followed by the injection of noise, and finally by the choice of the combinator function that combines the vertical and lateral connections. They also introduced a new combinator function that improved the already impressive performance of the ladder network on the Permutation-Invariant MNIST handwritten digit recognition task, both for the supervised and semi-supervised settings.
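The combinator is a small per-unit function that merges the lateral (noisy, same-layer) signal z with the vertical (top-down) signal u. A sketch of the commonly used vanilla form, z_hat = (z - mu(u)) * v(u) + mu(u) with learned sigmoid-shaped mu and v (parameter names here are illustrative, and the paper's new combinator differs):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vanilla_combinator(z_lat, u_ver, p):
    """Merge lateral signal z_lat with vertical signal u_ver, per unit.

    z_hat = (z - mu(u)) * v(u) + mu(u): v(u) gates how much of the lateral
    (denoising) path survives versus the top-down prediction mu(u).
    """
    mu = p["a1"] * sigmoid(p["a2"] * u_ver + p["a3"]) + p["a4"] * u_ver + p["a5"]
    v = p["a6"] * sigmoid(p["a7"] * u_ver + p["a8"]) + p["a9"] * u_ver + p["a10"]
    return (z_lat - mu) * v + mu

# With v(u) pinned to 1 the combinator passes the lateral path through
# untouched; with v(u) = 0 it falls back entirely to mu(u).
p = {f"a{i}": 0.0 for i in range(1, 11)}
p["a10"] = 1.0                           # forces v(u) = 1
z, u = np.array([0.3, -1.2]), np.array([0.5, 0.5])
print(vanilla_combinator(z, u, p))       # equals z
```

The ablation result above can be read through this lens: removing the lateral input z degrades performance most, because the decoder then has nothing layer-local to denoise.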