Elephas: Distributed Deep Learning with Keras & Spark
Elephas is an extension of Keras, which allows you to run distributed deep learning models at scale with Spark. Elephas currently supports a number of applications, including:
Data-parallel training of deep learning models
Distributed hyper-parameter optimization
Distributed training of ensemble models
Schematically, elephas works as follows.
[Schematic of the Elephas architecture]
Table of contents:
Elephas: Distributed Deep Learning with Keras & Spark
Introduction
Getting started
Installation
Basic example
Spark ML example
Usage of data-parallel models
Model updates (optimizers)
Update frequency
Update mode
Asynchronous updates with read and write locks (mode='asynchronous')
Asynchronous updates without locks (mode='hogwild')
Synchronous updates (mode='synchronous')
Degree of parallelization (number of workers)
Distributed hyper-parameter optimization
Distributed training of ensemble models
Discussion
Future work & contributions
Literature
Introduction
Elephas brings deep learning with Keras to Spark. Elephas intends to keep the simplicity and high usability of Keras, thereby allowing for fast prototyping of distributed models, which can be run on massive data sets. For an introductory example, see the following IPython notebook.
ἐλέφας is Greek for ivory, and an accompanying project to κέρας, meaning horn. If mentioning this seems weird, like a bad dream, you should confirm it actually is at the Keras documentation. Elephas also means elephant, as in stuffed yellow elephant.
Elephas implements a class of data-parallel algorithms on top of Keras, using Spark's RDDs and data frames. Keras models are initialized on the driver, then serialized and shipped to workers, along with data and broadcasted model parameters. Spark workers deserialize the model, train on their chunk of data and send their gradients back to the driver. The "master" model on the driver is updated by an optimizer, which applies gradients either synchronously or asynchronously.
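To make that cycle concrete, here is a toy, framework-free sketch of the master-worker loop just described. This is illustrative plain-NumPy pseudo-elephas, not elephas code: worker_gradient stands in for a worker's backward pass on its data shard, and the plain gradient step stands in for the driver-side optimizer.
import numpy as np

# Stand-in for a worker's training pass: the gradient of a squared
# error for a linear model, computed only on that worker's data shard.
def worker_gradient(params, x_shard, y_shard):
    residual = x_shard @ params - y_shard
    return 2 * x_shard.T @ residual / len(x_shard)

# "Master" parameters live on the driver.
params = np.zeros(3)
lr = 0.1

x = np.random.rand(100, 3)
y = x @ np.array([1.0, -2.0, 0.5])

# Spark would partition the data; here we split it into four shards.
shards = zip(np.array_split(x, 4), np.array_split(y, 4))

# Each "worker" sends its gradient back and the driver applies it to the
# master model (asynchronous mode applies gradients as they arrive;
# synchronous mode would aggregate them first).
for x_shard, y_shard in shards:
    params -= lr * worker_gradient(params, x_shard, y_shard)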
Getting started
Installation
Install elephas from PyPI with
pip install elephas
Depending on what OS you are using, you may need to install some prerequisite modules (LAPACK, BLAS, fortran compiler) first.
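On Debian or Ubuntu, for example, these prerequisites can be installed as follows (package names may differ on other distributions):
sudo apt-get install liblapack-dev libblas-dev gfortran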
A quick way to install Spark locally is to use Homebrew on macOS or Linuxbrew on Linux:
brew install apache-spark
The brew version of Spark may be outdated at times. To build from source, simply follow the instructions in the Spark download section.
After that, make sure to add these path variables to your shell profile (e.g. ~/.zshrc):
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Using Docker
Install Docker and get it running by following the instructions at https://www.docker.com/.
Building
The build takes quite a while the first time, since many packages need to be downloaded and installed. In the same directory as the Dockerfile, run the following command:
docker build . -t pyspark/elephas
Running
The following command starts a container with the Notebook server listening for HTTP connections on port 8899 (since local Jupyter notebooks use 8888) without authentication configured.
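Assuming the pyspark/elephas tag built above, a minimal invocation looks like this (the flags are illustrative; -p maps the container's 8888 to the host's 8899):
docker run -d -p 8899:8888 pyspark/elephas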
Basic example
A SparkModel is defined by passing a Spark context and a Keras model. Additionally, one has to choose an optimizer used for updating the elephas model, an update frequency, a parallelization mode and the degree of parallelism, i.e. the number of workers.
from elephas.spark_model import SparkModel
from elephas import optimizers as elephas_optimizers
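Putting the pieces together, a minimal training run might look like the sketch below. It assumes a compiled Keras model named model, a SparkContext named sc, and numpy training arrays x_train and y_train; the to_simple_rdd helper, the optimizer choice and the parameter values are illustrative, not required.
from elephas.utils.rdd_utils import to_simple_rdd

# Distribute the training data as an RDD of (features, label) pairs
rdd = to_simple_rdd(sc, x_train, y_train)

# Optimizer used by the driver to update the master model
adagrad = elephas_optimizers.Adagrad()

# Wrap the compiled Keras model for distributed training
spark_model = SparkModel(sc, model,
                         optimizer=adagrad,
                         frequency='epoch',
                         mode='asynchronous',
                         num_workers=2)

# Train on the RDD, much like a plain Keras fit
spark_model.train(rdd, nb_epoch=20, batch_size=32,
                  verbose=0, validation_split=0.1)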
Increasing the driver memory may be necessary, as the set of parameters in a network can be very large and collecting them on the driver eats up a lot of resources. See the examples folder for a few working examples.
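For example, when submitting your script with spark-submit, the driver memory can be raised via the --driver-memory flag (the value and script path here are just illustrations):
spark-submit --driver-memory 4G ./your_script.py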