Table of Contents
Preface
Chapter 1: Data Understanding
Chapter 2: Data Preparation – Select
Chapter 3: Data Preparation – Clean
Chapter 4: Data Preparation – Construct
Chapter 5: Data Preparation – Integrate and Format
Chapter 6: Selecting and Building a Model
Chapter 7: Modeling – Assessment, Evaluation, Deployment, and Monitoring
Chapter 8: CLEM Scripting
Appendix: Business Understanding
Index
- Preface
- Chapter 1: Data Understanding
- Introduction
- Using an empty aggregate to evaluate sample size
- Evaluating the need to sample from the initial data
- Using CHAID stumps when interviewing an SME
- Using a single cluster K-means as an alternative to anomaly detection
- Using an @NULL multiple Derive to explore missing data
- Creating an Outlier report to give to SMEs
- Detecting potential model instability early using the Partition node and Feature Selection node
- Chapter 2: Data Preparation – Select
- Introduction
- Using the Feature Selection node creatively to remove or decapitate perfect predictors
- Running a Statistics node on anti-join to evaluate the potential missing data
- Evaluating the use of sampling for speed
- Removing redundant variables using correlation matrices
- Selecting variables using the CHAID Modeling node
- Selecting variables using the Means node
- Selecting variables using single-antecedent Association Rules
- Chapter 3: Data Preparation – Clean
- Introduction
- Binning scale variables to address missing data
- Using a full data model/partial data model approach to address missing data
- Imputing in-stream mean or median
- Imputing missing values randomly from uniform or normal distributions
- Using random imputation to match a variable's distribution
- Searching for similar records using a Neural Network for inexact matching
- Using neuro-fuzzy searching to find similar names
- Producing longer Soundex codes
- Chapter 4: Data Preparation – Construct
- Introduction
- Building transformations with multiple Derive nodes
- Calculating and comparing conversion rates
- Grouping categorical values
- Transforming high skew and kurtosis variables with a multiple Derive node
- Creating flag variables for aggregation
- Using Association Rules for interaction detection/feature creation
- Creating time-aligned cohorts
- Chapter 5: Data Preparation – Integrate and Format
- Introduction
- Speeding up merge with caching and optimization settings
- Merging a lookup table
- Shuffle-down (nonstandard aggregation)
- Cartesian product merge using key-less merge by key
- Multiplying out using Cartesian product merge, user source, and derive dummy
- Changing large numbers of variable names without scripting
- Parsing nonstandard dates
- Parsing and performing a conversion on a complex stream
- Sequence processing
- Chapter 6: Selecting and Building a Model
- Introduction
- Evaluating balancing with Auto Classifier
- Building models with and without outliers
- Using Neural Network for Feature Selection
- Creating a bootstrap sample
- Creating bagged logistic regression models
- Using KNN to match similar cases
- Using Auto Classifier to tune models
- Next-Best-Offer for large datasets
- Chapter 7: Modeling – Assessment, Evaluation, Deployment, and Monitoring
- Introduction
- How (and why) to validate as well as test
- Using classification trees to explore the predictions of a Neural Network
- Correcting a confusion matrix for an imbalanced target variable by incorporating priors
- Using aggregate to write cluster centers to Excel for conditional formatting
- Creating a classification tree financial summary using aggregate and an Excel Export node
- Reformatting data for reporting with a Transpose node
- Changing formatting of fields in a Table node
- Combining generated filters
- Chapter 8: CLEM Scripting
- Introduction
- Building iterative Neural Network forecasts
- Quantifying variable importance with Monte Carlo simulation
- Implementing champion/challenger model management
- Detecting Outliers with the jackknife method
- Optimizing K-means cluster solutions
- Automating time series forecasts
- Automating HTML reports and graphs
- Rolling your own modeling algorithm – Weibull analysis
- Appendix: Business Understanding
- Introduction
- Define business objectives by Tom Khabaza
- Assessing the situation by Meta Brown
- Translating your business objective into a data mining objective by Dean Abbott
- Produce a project plan – ensuring a realistic timeline by Keith McCormick