Original poster: oliyiyi

How to Get the Most From Your Machine Learning Data

oliyiyi posted on 2018-4-19 21:01:27

The data that you use, and how you use it, will likely define the success of your predictive modeling problem.
Data and the framing of your problem may be the point of biggest leverage on your project.
Choosing the wrong data or the wrong framing for your problem may lead to a model with poor performance or, at worst, a model that cannot converge.
It is not possible to analytically calculate what data to use or how to use it, but it is possible to use a trial-and-error process to discover how to best use the data that you have.
In this post, you will discover how to get the most from your data on your machine learning project.
After reading this post, you will know:

The importance of exploring alternate framings of your predictive modeling problem.
The need to develop a suite of “views” on your input data and to systematically test each.
The notion that feature selection, engineering, and preparation are ways of creating more views on your problem.

Let’s get started.

How to Get the Most From Your Machine Learning Data
Photo by Jean-Marc Bolfing, some rights reserved.

Overview
This post is divided into 8 parts; they are:

Problem Framing
Collect More Data
Study Your Data
Training Data Sample Size
Feature Selection
Feature Engineering
Data Preparation
Go Further

1. Problem Framing
Brainstorm multiple ways to frame your predictive modeling problem.
The framing of the problem means the combination of:

Inputs
Outputs
Problem Type

For example:

Can you use more or less data as inputs to the model?
Can you predict something else instead?
Can you change the problem to be regression/classification/sequence/etc.?
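
As a rough illustration of changing the problem type, the sketch below takes one numeric target and frames it two ways: as a regression target (the number itself) and as a classification target (the band the number falls in). The prices, thresholds, and band names are made-up values, and `to_classes` is a hypothetical helper, not something from the original post.

```python
from bisect import bisect_right


def to_classes(values, thresholds, labels):
    """Map each numeric target to a class label using sorted thresholds."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_right counts how many thresholds are <= the value,
    # which is the index of the band the value belongs to.
    return [labels[bisect_right(thresholds, v)] for v in values]


# Hypothetical regression targets, e.g. a price in thousands.
prices = [95.0, 120.5, 180.0, 310.0, 250.0]

# Framing A: regression, predict the number itself.
y_regression = prices

# Framing B: classification, predict which band the number falls in.
y_classification = to_classes(
    prices, thresholds=[100.0, 200.0], labels=["low", "medium", "high"]
)
print(y_classification)
```

Each framing can then be modeled and evaluated separately. Where the band boundaries go is itself part of the framing, so it is worth trying more than one set of thresholds.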

The more creative you get, the better.
Use ideas from other projects, papers, and the domain itself.
Brainstorm. Write down all of the ideas, even if they are crazy.
I have some frameworks that will help with brainstorming the framing here:

How to Define Your Machine Learning Problem

I talk a little about changing the problem type in this post:

Difference Between Classification and Regression in Machine Learning

2. Collect More Data
Get more data than you need, even data that is only tangentially related to the outcome being predicted.
You cannot know in advance how much data will be needed.
Data is the currency spent during model development. It is the oxygen needed by the project to breathe. Each time you use some data, less data is available for other tasks.
You need to spend data on tasks like:

Model training.
Model evaluation.
Model tuning.
Model validation.
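
One way to picture data being "spent" is a simple split of the available rows into separate parts, one per task. The sketch below is a minimal example under assumed proportions (60% for training, 20% for tuning, 20% held back for the final evaluation); the ratios and the `split_data` helper are illustrative choices, not recommendations from the post.

```python
import random


def split_data(rows, train=0.6, val=0.2, seed=1):
    """Shuffle rows and split them into train, validation, and test parts."""
    if not 0 < train < 1 or not 0 < val < 1 or train + val >= 1:
        raise ValueError("train and val must be fractions that leave room for a test set")
    rows = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)      # fixed seed keeps the split repeatable
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


data = list(range(100))                    # stand-in for 100 observations
train_set, val_set, test_set = split_data(data)
print(len(train_set), len(val_set), len(test_set))
```

Every row lands in exactly one part, which is why collecting more data up front gives each task more to work with.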

Further, the project is new. No one has done your specific project before or modeled your specific data. You don’t really know what features will be useful yet. You might have ideas, but you don’t know. Collect them all; make them all available at this stage.

3. Study Your Data
Use every data visualization you can think of to look at your data from every angle.

Looking at raw data helps. You will notice things.
Looking at summary statistics helps. Again, you will notice things.
Data visualization is like a beautiful combination of these two ways of learning. You will notice a lot more things.

Spend a long time with your raw data and summary statistics. Then move on to the visualizations last as they can take more time to prepare.
Use every data visualization that you can think of, or glean from books and papers, on your data.

Review plots.
Save plots.
Annotate plots.
Show plots to domain experts.
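
Before any plotting library is involved, summary statistics and a crude text histogram already make odd values stand out. The sketch below uses only the Python standard library; the `ages` column is invented for illustration, and `summarize` and `bin_counts` are hypothetical helper names.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return basic summary statistics for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def bin_counts(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Hypothetical column of ages; 120 looks like an outlier worth questioning.
ages = [23, 25, 31, 35, 35, 38, 41, 44, 47, 52, 120]

for name, value in summarize(ages).items():
    print(f"{name:>6}: {value:.4g}")

# Crude text histogram: one row per non-empty bin, one '#' per value.
counts = bin_counts(ages, bin_width=10)
for start in sorted(counts):
    print(f"{start:>6} | {'#' * counts[start]}")
```

The gap between the mean and the median, and the lone bar far from the rest, are the kind of thing worth annotating and showing to a domain expert.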

You are seeking a little more insight into the data: ideas that you can use to better select, engineer, and prepare data for modeling. It will pay off.

4. Training Data Sample Size
Perform a sensitivity analysis with your data sample to see how much (or little) data you actually need.
You do not have all observations. If you did, you would not need to make predictions for new data.
Instead, you are working with a sample of the data. Therefore, there is an open question as to how much data will be needed to fit the model.
Don’t assume that more is better. Test.

Design experiments to see how model skill changes with sample size.
Use statistics to see how important trends and tendencies change with sample size.
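
A sample size experiment of this kind can be sketched as follows: fit the same model on random subsamples of increasing size, score each fit on a fixed held-out set, and look at both the average error and its spread. Everything here is an assumption made for illustration: the data is synthetic (a noisy straight line), the model is a one-variable least squares fit, and the sizes and repeat count are arbitrary.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return statistics.fmean(errors) ** 0.5


def sample_size_study(xs, ys, test_x, test_y, sizes, repeats=20, seed=1):
    """For each training size, return (mean, stdev) of test RMSE over random subsamples."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    results = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            chosen = rng.sample(indices, n)   # a fresh random subsample of size n
            model = fit_line([xs[i] for i in chosen], [ys[i] for i in chosen])
            scores.append(rmse(model, test_x, test_y))
        results[n] = (statistics.fmean(scores), statistics.stdev(scores))
    return results


# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(0)


def make_data(n):
    xs = [data_rng.uniform(0, 10) for _ in range(n)]
    return xs, [2 * x + 1 + data_rng.gauss(0, 2) for x in xs]


train_x, train_y = make_data(500)
test_x, test_y = make_data(200)
results = sample_size_study(train_x, train_y, test_x, test_y, sizes=[5, 20, 80, 320])
for n, (mean_rmse, spread) in results.items():
    print(f"n={n:>3}  rmse={mean_rmse:.3f} +/- {spread:.3f}")
```

If the error stops improving well before the largest size, the extra data is not buying much for this model; if it is still falling, more data may help.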

Without this knowledge, you won’t know enough about your test harness to comment on model skill sensibly.
Learn more about sample size in this post:

How Much Training Data is Required for Machine Learning?

5. Feature Selection
Create many different views of your input features and test each one.
You don’t know what variables will be helpful or most helpful in your predictive modeling problem.

You can guess.
You can use advice from domain experts.
You can even use suggestions from feature selection methods.

But they are all just guesses.
Each set of suggested input features is a “view” on your problem: an idea of which features might be useful for modeling and predicting the output variable.
Brainstorm, compute, and collect as many different views of your input data as you can.
Design experiments and carefully test and compare each view. Use data to inform you which features and which view are the most predictive.
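
Testing views against each other can be as simple as scoring the same model on different column subsets under the same resampling scheme. The sketch below is one minimal way to do that, under several assumptions: the dataset is synthetic (two informative columns and one pure-noise column), the model is a nearest-centroid classifier written from scratch, and the view names are invented.

```python
import random


def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        members = [row for row, target in zip(X, y) if target == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def cv_accuracy(X, y, columns, k=5):
    """k-fold cross-validated accuracy using only the given feature columns."""
    view = [[row[c] for c in columns] for row in X]
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(view), k))   # every k-th row forms one fold
        train_X = [view[i] for i in range(len(view)) if i not in test_idx]
        train_y = [y[i] for i in range(len(view)) if i not in test_idx]
        centroids = fit_centroids(train_X, train_y)
        correct += sum(predict(centroids, view[i]) == y[i] for i in test_idx)
    return correct / len(view)


# Synthetic data: columns 0 and 1 depend on the label, column 2 is noise.
rng = random.Random(7)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    X.append([rng.gauss(label * 2.0, 1.0), rng.gauss(label * 1.0, 1.0), rng.gauss(0.0, 5.0)])
    y.append(label)

# Each named view is just a list of column indices to keep.
views = {"all": [0, 1, 2], "signal only": [0, 1], "first feature": [0], "noise only": [2]}
scores = {name: cv_accuracy(X, y, cols) for name, cols in views.items()}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:>14}: {score:.3f}")
```

Because every view is scored with the same folds and the same model, differences in the scores can be attributed to the features rather than to the evaluation setup.
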
For more on feature selection, see this post:

An Introduction to Feature Selection

6. Feature Engineering
Use feature engineering to create additional features and views on your predictive modeling problem.
Sometimes you have all of the data you can get, but a given feature or set of features locks up knowledge that is too dense for the machine learning methods to learn and map to the outcome variable.
Examples include:

Date/Times.
Transactions.
Descriptions.

Break down these data into simpler additional component features, such as counts, flags, and other elements.
Make things as simple as you can for the modeling process.
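
As a small example of breaking dense fields into simpler components, the sketch below decomposes a timestamp and a free-text description into counts and flags. The chosen features (weekend flag, 9-to-17 business hours, word count, digit flag) and the helper names are assumptions picked for illustration; which components are useful depends on the problem.

```python
from datetime import datetime


def datetime_features(timestamp):
    """Break an ISO 8601 timestamp string into simple numeric features and flags."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),          # Monday is 0, Sunday is 6
        "hour": dt.hour,
        "is_weekend": int(dt.weekday() >= 5),
        "is_business_hours": int(dt.weekday() < 5 and 9 <= dt.hour < 17),
    }


def description_features(text):
    """Reduce free text to a count and a flag."""
    return {
        "n_words": len(text.split()),
        "has_digits": int(any(ch.isdigit() for ch in text)),
    }


print(datetime_features("2018-04-19 21:01:27"))
print(description_features("Refund for order 1234"))
```

Each derived column is another candidate input, so the output of this step feeds straight back into comparing views of the data.
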
For more on feature engineering, see the post:

Discover Feature Engineering, How to Engineer Features, and How to Get Good at It
