Recently I had to work on a machine learning problem for a class, and it turned out to be a good opportunity for a Spark tutorial. Using the Rossmann store sales data from Kaggle, we are going to set up a machine learning pipeline that covers everything from preprocessing all the way to making and saving predictions. In a sense, this is a "full-stack" machine learning project, probably fairly similar to something we might do in the real world. Spark's ML Pipelines API makes this easy to do. You can follow along with the code on my GitHub or below.