Original poster: oliyiyi

How to Get the Most From Your Machine Learning Data




The data that you use, and how you use it, will likely define the success of your predictive modeling problem.
Data and the framing of your problem may be the point of biggest leverage on your project.
Choosing the wrong data or the wrong framing for your problem may lead to a model with poor performance or, at worst, a model that cannot converge.
It is not possible to analytically calculate what data to use or how to use it, but it is possible to use a trial-and-error process to discover how to best use the data that you have.
In this post, you will discover how to get the most from your data on your machine learning project.
After reading this post, you will know:
  • The importance of exploring alternate framings of your predictive modeling problem.
  • The need to develop a suite of “views” on your input data and to systematically test each.
  • The notion that feature selection, engineering, and preparation are ways of creating more views on your problem.
Let’s get started.
How to Get the Most From Your Machine Learning Data
Photo by Jean-Marc Bolfing, some rights reserved.

Overview
This post is divided into 8 parts; they are:
  • Problem Framing
  • Collect More Data
  • Study Your Data
  • Training Data Sample Size
  • Feature Selection
  • Feature Engineering
  • Data Preparation
  • Go Further
1. Problem Framing
Brainstorm multiple ways to frame your predictive modeling problem.
The framing of the problem means the combination of:
  • Inputs
  • Outputs
  • Problem Type
For example:
  • Can you use more or less data as inputs to the model?
  • Can you predict something else instead?
  • Can you change the problem to be regression/classification/sequence/etc.?
The more creative you get, the better.
Use ideas from other projects, papers, and the domain itself.
Brainstorm. Write down all of the ideas, even if they are crazy.
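As one illustration of changing the problem type, a numeric regression target can be reframed as ordered classes by binning it. The sketch below is only an example of the idea: the function name `frame_as_classification`, the house price values, and the cut points are all hypothetical and not part of the original post.

```python
from bisect import bisect_right


def frame_as_classification(targets, thresholds, labels):
    """Map numeric targets to ordered class labels using cut points."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    cuts = sorted(thresholds)
    # bisect_right returns how many cut points are <= the target,
    # which is the index of the band the target falls into.
    return [labels[bisect_right(cuts, t)] for t in targets]


# Hypothetical house prices (in thousands): the original regression target.
prices = [120.0, 310.5, 95.0, 540.0, 250.0]

# Alternate framing: predict a price band instead of the exact price.
bands = frame_as_classification(
    prices, thresholds=[150, 400], labels=["low", "mid", "high"]
)
print(bands)
```

Each framing like this one can then be tested alongside the original regression framing to see which yields the more useful model.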
I have some frameworks that will help with brainstorming the framing here:
I talk a little about changing the problem type in this post:
2. Collect More Data
Get more data than you need, even data that is tangentially related to the outcome being predicted.
We cannot know how much data will be needed.
Data is the currency spent during model development. It is the oxygen needed by the project to breathe. Each time you use some data, there is less data available for other tasks.
You need to spend data on tasks like:
  • Model training.
  • Model evaluation.
  • Model tuning.
  • Model validation.
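One way to picture this "spending" is a plain three-way split, sketched below with the standard library only. The 60/20/20 proportions, the function name `split_data`, and the mapping of partitions to tasks (train for fitting, validation for tuning, test for the final evaluation) are assumptions made for illustration, not a recommendation from the original post.

```python
import random


def split_data(rows, train=0.6, val=0.2, seed=0):
    """Shuffle rows and split them into train / validation / test partitions."""
    if train <= 0 or val <= 0 or train + val >= 1:
        raise ValueError("train and val must be positive and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],                 # spent on model training
        shuffled[n_train:n_train + n_val],  # spent on model tuning
        shuffled[n_train + n_val:],         # held back for final evaluation
    )


# Hypothetical dataset of 100 row identifiers.
rows = list(range(100))
train_rows, val_rows, test_rows = split_data(rows)
print(len(train_rows), len(val_rows), len(test_rows))
```

Every row lands in exactly one partition, which is why each additional task leaves less data for the others.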
Further, the project is new. No one has done your specific project before or modeled your specific data. You don’t really know what features will be useful yet. You might have ideas, but you don’t know. Collect them all; make them all available at this stage.
3. Study Your Data
Use every data visualization you can think of to look at your data from every angle.
  • Looking at raw data helps. You will notice things.
  • Looking at summary statistics helps. Again, you will notice things.
  • Data visualization is like a beautiful combination of these two ways of learning. You will notice a lot more things.
Spend a long time with your raw data and summary statistics. Then move on to the visualizations last as they can take more time to prepare.
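A first pass over summary statistics might look like the sketch below. It assumes the data is a list of dictionaries with numeric values and `None` for missing entries; the column names and values are made up for illustration.

```python
import statistics


def summarize(rows):
    """Return per-column summary statistics for a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        stats = {"count": len(values), "missing": len(rows) - len(values)}
        if values:
            stats["min"] = min(values)
            stats["max"] = max(values)
            stats["mean"] = statistics.mean(values)
            # The sample standard deviation needs at least two values.
            stats["stdev"] = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[col] = stats
    return summary


# Hypothetical raw data with one missing income value.
rows = [
    {"age": 20, "income": 40000},
    {"age": 30, "income": None},
    {"age": 40, "income": 60000},
]
for column, stats in summarize(rows).items():
    print(column, stats)
```

Even a small table like this surfaces things worth noticing, such as missing values and very different scales between columns.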
Apply to your data every visualization that you can think of or glean from books and papers.
  • Review plots.
  • Save plots.
  • Annotate plots.
  • Show plots to domain experts.
You are seeking a little more insight into the data: ideas that you can use to better select, engineer, and prepare data for modeling. It will pay off.
4. Training Data Sample Size
Perform a sensitivity analysis with your data sample to see how much (or little) data you actually need.
You do not have all observations. If you did, you would not need to make predictions for new data.
Instead, you are working with a sample of the data. Therefore, there is an open question as to how much data will be needed to fit the model.
Don’t assume that more is better. Test.
  • Design experiments to see how model skill changes with sample size.
  • Use statistics to see how important trends and tendencies change with sample size.
Without this knowledge, you won’t know enough about your test harness to comment on model skill sensibly.
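A sample size sensitivity experiment can be sketched as follows. Everything here is an assumption chosen to keep the example self-contained: the synthetic data (a noisy line), the simple least-squares model, the sample sizes, and the number of repeats. With real data you would draw subsamples of your training set and use your own model instead.

```python
import random
import statistics


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root mean squared error of a fitted line on the given data."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def make_data(n, rng):
    """Synthetic stand-in for real data: a noisy linear relationship."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 2.0) for x in xs]
    return xs, ys


def sample_size_sensitivity(sizes, repeats=30, seed=1):
    """Return {size: (mean RMSE, stdev of RMSE)} on a fixed held-out set."""
    rng = random.Random(seed)
    test_x, test_y = make_data(500, rng)
    results = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            train_x, train_y = make_data(n, rng)
            scores.append(rmse(fit_line(train_x, train_y), test_x, test_y))
        results[n] = (statistics.fmean(scores), statistics.stdev(scores))
    return results


for n, (mean_rmse, spread) in sample_size_sensitivity([5, 20, 80, 320]).items():
    print(f"n={n:4d}  RMSE={mean_rmse:.3f} +/- {spread:.3f}")
```

Reporting the spread across repeats alongside the mean shows both how skill changes with sample size and how stable the estimate of skill is at each size.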
Learn more about sample size in this post:
5. Feature Selection
Create many different views of your input features and test each one.
You don’t know what variables will be helpful or most helpful in your predictive modeling problem.
  • You can guess.
  • You can use advice from domain experts.
  • You can even use suggestions from feature selection methods.
But they are all just guesses.
Each set of suggested input features is a “view” on your problem: an idea about which features might be useful for modeling and predicting the output variable.
Brainstorm, compute, and collect as many different views of your input data as you can.
Design experiments and carefully test and compare each view. Use data to inform you which features and which view are the most predictive.
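Comparing views can be as simple as scoring each feature subset with the same model and the same test harness. The sketch below uses a 1-nearest-neighbor classifier with leave-one-out evaluation purely because it fits in a few lines; the synthetic data, the feature names, and the view names are all hypothetical, and the features are left unscaled for brevity.

```python
import random


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on a feature subset."""
    correct = 0
    for i, row in enumerate(rows):
        # Find the closest other row using only the features in this view.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


# Synthetic data: one informative feature and two pure-noise features.
rng = random.Random(7)
labels = [i % 2 for i in range(60)]
rows = [
    {
        "signal": label + rng.gauss(0, 0.3),
        "noise_a": rng.gauss(0, 1),
        "noise_b": rng.gauss(0, 1),
    }
    for label in labels
]

# Each view is one idea about which inputs might be useful.
views = {
    "all features": ["signal", "noise_a", "noise_b"],
    "domain guess": ["signal", "noise_a"],
    "signal only": ["signal"],
    "noise only": ["noise_a", "noise_b"],
}

scores = {name: loo_accuracy(rows, labels, feats) for name, feats in views.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:13s} accuracy={score:.2f}")
```

The ranking that comes out is what tells you which view to keep; the guesses that produced the views only decide what gets tested.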
For more on feature selection, see this post:
6. Feature Engineering
Use feature engineering to create additional features and views on your predictive modeling problem.
Sometimes you have all of the data you can get, but a given feature or set of features locks up knowledge that is too dense for the machine learning methods to learn and map to the outcome variable.
Examples include:
  • Date/Times.
  • Transactions.
  • Descriptions.
Break down these data into simpler additional component features, such as counts, flags, and other elements.
Make things as simple as you can for the modeling process.
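For date/times, breaking the value down might look like the sketch below. The particular components, the timestamp format, and the definition of business hours (weekdays, 09:00 to 17:00) are assumptions for illustration; the useful components depend on the domain.

```python
from datetime import datetime


def datetime_features(timestamp):
    """Break a timestamp string into simple numeric and flag features."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    weekday = dt.weekday()  # Monday = 0, Sunday = 6
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": weekday,
        "hour": dt.hour,
        "is_weekend": int(weekday >= 5),
        "is_business_hours": int(weekday < 5 and 9 <= dt.hour < 17),
    }


# Hypothetical raw timestamp from a transaction log.
print(datetime_features("2018-01-01 09:30:00"))
```

Each key becomes its own input column, so the model no longer has to discover what a weekend or a working hour is from the raw timestamp.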
For more on feature engineering, see the post:


Reply 1
albertwishedu, posted 2018-4-20 08:35:42

Reply 2
minixi, posted 2018-4-20 11:56:24
Learned a lot. Thanks for sharing.