Suppose you have a population. You want to divide this population into relevant subgroups based on specific features characterizing each subgroup, so that you can accurately predict outcomes associated with each subgroup. For instance, you could :
- Use the list of the people on the Titanic, and by dividing them into subgroups depending on specific criteria (e.g. female vs male, passengers in 1st class vs 2nd and 3rd class, age class....) determines if they were (probably) going to survive or not.
- Look at the people who bought product on your e-commerce website, divide this population into segments depending on specific features (e.g. returning visitors vs new visitors, localization, ...) and determines for future visitors if they are (probably) going to buy your product or not.
In sum, you want to create a model that predicts the value of a target variable (e.g. survive/die; buy/not buy) based on simple decision rules inferred from the data features (e.g. female vs male, age, etc.).
The result is a decision tree that offers the great advantage to be easily vizualized and simple to understand. For instance, the picture below, fromwikipedia, shows the probability of passengers of the Titanic to survive depending on their sex, age and number of spouses or siblings aboard. Note how each branching is based on answering a question (the decision rule) and how the graph looks like an inverted tree.
本帖隐藏的内容