Automated Machine Learning

1关注
185
粉丝

版主

已卖：2994份资源

泰斗

1%

还不是VIP/贵宾

-

TA的文库 其他...

计量文库

0%

威望: 7 级
论坛币: 84105 个
通用积分: 31671.0967
学术水平: 1454 点
热心指数: 1573 点
信用等级: 1364 点
经验: 384134 点
帖子: 9629
精华: 66
在线时间: 5508 小时
注册时间: 2007-5-21
最后登录: 2025-7-8

楼主

oliyiyi 发表于 2017-2-19 12:33:46 |AI写论文

是否 +2 论坛币

k人参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群

赵安豆老师微信：zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

立即领取

感谢您参与论坛问题回答

经管之家送您两个论坛币！

+2 论坛币

本帖隐藏的内容

Automated Machine Learning (AutoML) has become a topic of considerable interest over the past year. A recent KDnuggets blog competition focused on this topic, resulting in a handful of interesting ideas and projects. Several AutoML tools have been generating notable interest and gaining respect and notoriety in this time frame as well.

This post will provide a brief explanation of AutoML, argue for its justification and adoption, present a pair of contemporary tools for its pursuit, and discuss AutoML's anticipated future and direction.

What is Automated Machine Learning?

We can talk about what automated machine learning is, and we can talk about what automated machine learning is not.

AutoML is not automated data science. While there is undoubtedly overlap, machine learning is but one of many tools in the data science toolkit, and its use does not actually factor in to all data science tasks. For example, if prediction will be part of a given data science task, machine learning will be a useful component; however, machine learning may not play in to a descriptive analytics task at all.

Even for predictive tasks, data science encompasses much more than the actual predictive modeling. Data scientist Sandro Saitta, when discussing the potential confusion between AutoML and automated data science, had this to say:

The misconception comes from the confusion between the whole Data Science process (see for example CRISP-DM) and the sub-tasks of data preparation (feature extraction, etc.) and modeling (algorithm selection, hyper-parameters tuning, etc.) which I call Machine Learning.
[...]
When you read news about tools that automate Data Science and Data Science competitions, people with no industry experience may be confused and think that Data Science is only modeling and can be fully automated.

He is absolutely correct, and it's not just a matter of semantics. If you want (need?) more clarification on the relationship between machine learning and data science (and several other related concepts), read this.

Further, data scientist and leading automated machine learning proponent Randy Olson states that effective machine learning design requires us to:

Always tune the hyperparameters for our models
Always try out many different models
Always explore numerous feature representations for our data

Taking all of the above into account, if we consider AutoML to be the tasks of algorithm selection, hyperparameter tuning, iterative modeling, and model assessment, we can start to define what AutoML actually is. There will not be total agreement on this definition (for comparison, ask 10 people to define "data science," and then compare the 11 answers you get), but it arguably starts us off on the right foot.

Why Do We Need It?

While we are done with defining concepts, as an exercise in considering why AutoML may be beneficial, let's have a look at why machine learning is hard.

Credit: S. Zayd Enam

AI Researcher and Stanford University PhD candidate S. Zayd Enam, in a fantastic blog post titled "Why is machine learning 'hard'?," recently wrote the following (emphasis added):

[M]achine learning remains a relatively ‘hard’ problem. There is no doubt the science of advancing machine learning algorithms through research is difficult. It requires creativity, experimentation and tenacity. Machine learning remains a hard problem when implementing existing algorithms and models to work well for your new application.

Note that, while Enam is primarily referring to machine learning research, he also touches on the implementation of existing algorithms in use cases (see emphasis).

Enam goes on to elaborate on the difficulties of machine learning, and focuses on the nature of algorithms (again, emphasis added):

An aspect of this difficulty involves building an intuition for what tool should be leveraged to solve a problem. This requires being aware of available algorithms and models and the trade-offs and constraints of each one.
[...]
The difficulty is that machine learning is a fundamentally hard debugging problem. Debugging for machine learning happens in two cases: 1) your algorithm doesn't work or 2) your algorithm doesn't work well enough.[...] Very rarely does an algorithm work the first time and so this ends up being where the majority of time is spent in building algorithms.

Enam then eloquantly elaborates this framed problem from the algorithm research point of view. Again, however, what he says applies to... well, applying algorithms. If an algorithm does not work, or does not do so well enough, and the process of choosing and refinining becomes iterative, this exposes an opportunity for automation, hence automated machine learning.

I have previously attempted to capture AutoML's essence as follows:

If, as Sebastian Raschka has described it, computer programming is about automation, and machine learning is "all about automating automation," then automated machine learning is "the automation of automating automation." Follow me, here: programming relieves us by managing rote tasks; machine learning allows computers to learn how to best perform these rote tasks; automated machine learning allows for computers to learn how to optimize the outcome of learning how to perform these rote actions.
This is a very powerful idea; while we previously have had to worry about tuning parameters and hyperparameters, automated machine learning systems can learn the best way to tune these for optimal outcomes by a number of different possible methods.

The rationale for AutoML stems from this idea: if numerous machine learning models must be built, using a variety of algorithms and a number of differing hyperparameter configurations, then this model building can be automated, as can the comparison of model performance and accuracy.

Simple, right?

A Comparison of Select Automated Machine Learning Tools

Now that we know what AutoML is, and why we would use it... how do we do it? The following is an overview and comparison of a pair of contemporary Python AutoML tools which take different approaches in an attempt to achieve more or less the same goal, that of automating the machine learning process.

[size=+1]Auto-sklearn

Auto-sklearn is "an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator." It also happens to be the winner of KDnuggets' recent automated data science and machine learning blog contest.

auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advantages in Bayesian optimization, meta-learning and ensemble construction. Learn more about the technology behind auto-sklearn by reading this paper published at the NIPS 2015.

As the above excerpt from the project's documentation notes, Auto-sklearn performs hyperparameter optimization by way of Bayesian optimization, which proceeds by iterating the following steps:

Build a probabilistic model to capture the relationship between hyperparameter settings and their performance
Use the model to select useful hyperparameter settings to try next by trading off exploration (searching in parts of the space where the model is uncertain) and exploitation (focussing on parts of the space predicted to perform well)
Run the machine learning algorithm with those hyperparameter settings

Further explanation of how this process plays out follows:

This process can be generalized to jointly select algorithms, preprocessing methods, and their hyperparameters as follows: the choices of classifier / regressor and preprocessing methods are top-level, categorical hyperparameters, and based on their settings the hyperparameters of the selected methods become active. The combined space can then be searched with Bayesian optimization methods that handle such high-dimensional, conditional spaces; we use the random-forest-based SMAC, which has been shown to work best for such cases.

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

分享0 收藏3 回帖

关键词：Automated Learning earning machine Learn interest handful present provide respect

Automated Machine Learning [推广有奖]

经管之家送您一份

经管之家联合CDA

感谢您参与论坛问题回答

本帖隐藏的内容

扫码加我拉你入群

相关帖子

浏览过的帖子

浏览过的版块

初级学术勋章

初级热心勋章

初级信用勋章

中级信用勋章

中级学术勋章

中级热心勋章

高级热心勋章

高级学术勋章

高级信用勋章

特级热心勋章

特级学术勋章

特级信用勋章

本版微信群

Automated Machine Learning [推广有奖]

经管之家送您一份

经管之家联合CDA

感谢您参与论坛问题回答

本帖隐藏的内容

扫码加我 拉你入群

相关帖子

浏览过的帖子

浏览过的版块

初级学术勋章

初级热心勋章

初级信用勋章

中级信用勋章

中级学术勋章

中级热心勋章

高级热心勋章

高级学术勋章

高级信用勋章

特级热心勋章

特级学术勋章

特级信用勋章

本版微信群

扫码加我拉你入群