by Curtis Miller (Author)
About the Author
Curtis Miller is a doctoral candidate at the University of Utah studying mathematical statistics. He writes software for both research and personal interest, including the R package CPAT, available on the Comprehensive R Archive Network (CRAN). His publications include academic papers as well as books and video courses published by Packt Publishing. His video courses include Unpacking NumPy and Pandas, Data Acquisition and Manipulation with Python, Training Your Systems with Python Statistical Modelling, and Applications of Statistical Learning with Python. His books include Hands-On Data Analysis with NumPy and Pandas.
About this book
Leverage the power of Python and statistical modeling techniques for building accurate predictive models
Key Features
- Get introduced to Python's rich suite of libraries for statistical modeling
- Implement regression and clustering, and train neural networks from scratch
- Work through real-world examples of training end-to-end machine learning systems in Python
Book Description
Python's ease of use and multi-purpose nature have made it the tool of choice for many data scientists and machine learning developers today. Its rich libraries are widely used for data analysis and, more importantly, for building state-of-the-art predictive models. This book takes you on a journey through these libraries, showing you how to implement effective statistical models for predictive analytics.
You'll start by diving into classical statistical analysis, where you will learn to compute descriptive statistics using pandas. You will then move on to supervised learning, exploring the principles of machine learning and training different machine learning models from scratch. You will also work with binary prediction models, such as data classification using k-nearest neighbors, decision trees, and random forests. The book also covers algorithms for regression analysis, such as ridge and lasso regression, and their implementation in Python, as well as how neural networks can be trained and deployed for more accurate predictions, and which Python libraries can be used to implement them.
By the end of this book, you will have all the knowledge you need to design, build, and deploy enterprise-grade statistical models for machine learning using Python and its rich ecosystem of libraries for predictive analytics.
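As a taste of the workflow the description opens with (computing descriptive statistics using pandas), a minimal sketch might look like the following. The tiny height/weight dataset here is invented purely for illustration and is not from the book:

```python
import pandas as pd

# Hypothetical measurements, just to demonstrate the API
df = pd.DataFrame({
    "height": [162.0, 175.5, 168.2, 181.0, 158.7],
    "weight": [61.0, 78.4, 70.2, 85.5, 54.1],
})

summary = df.describe()          # count, mean, std, quartiles, min/max per column
mean_height = df["height"].mean()
```

`DataFrame.describe()` is the usual one-line starting point before any modeling.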
What you will learn
- Understand the importance of statistical modeling
- Learn about the various Python packages for statistical analysis
- Implement algorithms such as Naive Bayes, random forests, and more
- Build predictive models from scratch using Python's scikit-learn library
- Implement regression analysis and clustering
- Learn how to train a neural network in Python
Who this book is for
If you are a data scientist, statistician, or machine learning developer looking to train and deploy effective machine learning models using popular statistical techniques, then this book is for you. Knowledge of Python programming is required to get the most out of this book.
Brief contents
Classical Statistical Analysis
• Technical requirements
• Computing descriptive statistics
• Preprocessing the data
• Computing basic statistics
• Classical inference for proportions
• Computing confidence intervals for proportions
• Hypothesis testing for proportions
• Testing for common proportions
• Classical inference for means
• Computing confidence intervals for means
• Hypothesis testing for means
• Testing with two samples
• One-way analysis of variance (ANOVA)
• Diving into Bayesian analysis
• How Bayesian analysis works
• Using Bayesian analysis to solve a hit-and-run
• Bayesian analysis for proportions
• Conjugate priors for proportions
• Credible intervals for proportions
• Bayesian hypothesis testing for proportions
• Comparing two proportions
• Bayesian analysis for means
• Credible intervals for means
• Bayesian hypothesis testing for means
• Testing with two samples
• Finding correlations
• Testing for correlation
• Summary
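As a hedged illustration of the chapter's classical inference topics, here is a normal-approximation confidence interval and z-test for a proportion. The counts are invented for demonstration; the book's own examples may differ:

```python
import numpy as np
from scipy import stats

# Suppose 412 successes in 1,000 trials (illustrative numbers)
successes, n = 412, 1000
p_hat = successes / n

# 95% confidence interval via the normal approximation
z = stats.norm.ppf(0.975)                      # two-sided 95% critical value
se = np.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z * se, p_hat + z * se

# One-sample z-test of H0: p = 0.5
z_stat = (p_hat - 0.5) / np.sqrt(0.5 * 0.5 / n)
p_value = 2 * stats.norm.sf(abs(z_stat))       # two-sided p-value
```

With these numbers the interval excludes 0.5, so the test rejects H0 at the 5% level.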
Introduction to Supervised Learning
• Principles of machine learning
• Checking the variables using the iris dataset
• The goal of supervised learning
• Training models
• Issues in training supervised learning models
• Splitting data
• Cross-validation
• Evaluating models
• Accuracy
• Precision
• Recall
• F1 score
• Classification report
• Bayes factor
• Summary
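The splitting, cross-validation, and evaluation steps listed above can be sketched with scikit-learn. The choice of a decision tree and the 70/30 split are illustrative assumptions, not the book's code:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to evaluate the trained model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
test_acc = clf.score(X_test, y_test)

# 5-fold cross-validation gives a less optimistic performance estimate
cv_scores = cross_val_score(clf, X, y, cv=5)

# Precision, recall, and F1 per class in one call
print(classification_report(y_test, clf.predict(X_test)))
```

`classification_report` bundles the accuracy/precision/recall/F1 metrics this chapter covers.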
Binary Prediction Models
• K-nearest neighbors classifier
• Training a kNN classifier
• Hyperparameters in kNN classifiers
• Decision trees
• Fitting the decision tree
• Visualizing the tree
• Restricting tree depth
• Random forests
• Optimizing hyperparameters
• Naive Bayes classifier
• Preprocessing the data
• Training the classifier
• Support vector machines
• Training an SVM
• Logistic regression
• Fitting a logit model
• Extending beyond binary classifiers
• Multiple outcomes for decision trees
• Multiple outcomes for random forests
• Multiple outcomes for Naive Bayes
• One-versus-all and one-versus-one classification
• Summary
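A minimal sketch of training a kNN classifier and tuning its main hyperparameter, in the spirit of this chapter's first sections. The dataset and candidate values of `n_neighbors` are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# A built-in binary classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune the number of neighbours with 5-fold cross-validated grid search
grid = GridSearchCV(KNeighborsClassifier(),
                    {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
grid.fit(X_train, y_train)

best_k = grid.best_params_["n_neighbors"]
test_acc = grid.score(X_test, y_test)
```

The same `GridSearchCV` pattern carries over to tuning the decision trees, random forests, and SVMs listed above.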
Regression Analysis and How to Use It
• Linear models
• Fitting a linear model with OLS
• Performing cross-validation
• Evaluating linear models
• Using AIC to pick models
• Bayesian linear models
• Choosing a polynomial
• Performing Bayesian regression
• Ridge regression
• Finding the right alpha value
• LASSO regression
• Spline interpolation
• Using SciPy for interpolation
• 2D interpolation
• Summary
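The OLS, ridge, and lasso topics above can be sketched as follows; the synthetic dataset and the lasso penalty `alpha=1.0` are assumptions chosen for demonstration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, RidgeCV, Lasso

# Synthetic data: 20 features, only 5 of which actually matter
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=1)

# Ordinary least squares as the baseline linear model
ols = LinearRegression().fit(X, y)
r2_ols = ols.score(X, y)

# RidgeCV picks the penalty strength alpha by cross-validation
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

# Lasso drives uninformative coefficients exactly to zero
lasso = Lasso(alpha=1.0).fit(X, y)
n_nonzero = int(np.sum(lasso.coef_ != 0))
```

Ridge shrinks all coefficients smoothly, while the lasso's sparsity makes it double as a feature selector.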
Neural Networks
• An introduction to perceptrons
• Neural networks
• The structure of a neural network
• Types of neural networks
• The MLP model
• MLPs for classification
• Optimization techniques
• Training the network
• Fitting an MLP to the iris dataset
• Fitting an MLP to the digits dataset
• MLP for regression
• Summary
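Fitting an MLP classifier to the iris dataset, as the section list above describes, might look like this sketch; the hidden-layer size and iteration budget are illustrative choices, not the book's settings:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling the inputs helps the gradient-based optimizer converge
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0))
mlp.fit(X_train, y_train)

test_acc = mlp.score(X_test, y_test)
```

Swapping in `MLPRegressor` gives the regression variant the chapter closes with.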
Clustering Techniques
• Introduction to clustering
• Computing distances
• Exploring the k-means algorithm
• Clustering the iris dataset
• Compressing images with k-means
• Evaluating clusters
• The elbow method
• The silhouette method
• Hierarchical clustering
• Clustering the iris dataset
• Clustering the Headlines dataset
• Spectral clustering
• Clustering the Headlines dataset
• Summary
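A sketch of clustering the iris dataset with k-means and evaluating the result with the silhouette method, per the topics above. The candidate values of k are an illustrative assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)   # labels ignored: clustering is unsupervised

# Fit k-means for several k and compare silhouette scores
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette means tighter, better-separated clusters
best_k = max(scores, key=scores.get)
```

The elbow method would instead plot `KMeans(...).inertia_` against k and look for the bend.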
Dimensionality Reduction
• Introducing dimensionality reduction
• Uses of dimensionality reduction
• Principal component analysis
• Demonstration of PCA
• Choosing the number of components
• Singular value decomposition
• SVD for image compression
• Low-rank approximation
• Reconstructing the image using compact SVD
• Low-dimensional representation
• Example of MDS
• MDS in action
• How MDS comes into the picture
• Constructing distances
• Summary
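The PCA topics above, including choosing the number of components, can be sketched as follows; the 95% variance threshold is an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Passing a float in (0, 1) keeps the fewest components that
# explain at least that share of the total variance
pca = PCA(n_components=0.95).fit(X)
X_reduced = pca.transform(X)

n_kept = pca.n_components_
explained = pca.explained_variance_ratio_.sum()
```

Under the hood PCA relies on the singular value decomposition, the same tool the chapter applies to image compression and low-rank approximation.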
Pages: 290
Publisher: Packt Publishing (May 20, 2019)
Language: English
ISBN-10: 1838823735
ISBN-13: 978-1838823733