Preface................................................................................................................................................ 5
Audience................................................................................................................................................5
How This Book is Organized.............................................................................................................. 6
Supporting Books.................................................................................................................................6
Code Examples..................................................................................................................................... 7
Early Release Status and Feedback................................................................................................... 7
Chapter 1. Introduction to Data Analysis with Spark......................................................8
What is Apache Spark?....................................................................................................................... 8
A Unified Stack.....................................................................................................................................8
Who Uses Spark, and For What?......................................................................................................11
A Brief History of Spark.................................................................................................................... 13
Spark Versions and Releases............................................................................................................ 13
Spark and Hadoop............................................................................................................................. 14
Chapter 2. Downloading and Getting Started...................................................................15
Downloading Spark............................................................................................................................15
Introduction to Spark’s Python and Scala Shells.......................................................................... 16
Introduction to Core Spark Concepts.............................................................................................20
Standalone Applications...................................................................................................................23
Conclusion.......................................................................................................................................... 25
Chapter 3. Programming with RDDs................................................................................... 26
RDD Basics......................................................................................................................................... 26
Creating RDDs................................................................................................................................... 28
RDD Operations................................................................................................................................ 28
Passing Functions to Spark.............................................................................................................. 32
Common Transformations and Actions......................................................................................... 36
Persistence (Caching)........................................................................................................................46
Conclusion.......................................................................................................................................... 48
Chapter 4. Working with Key-Value Pairs.........................................................................49
Motivation.......................................................................................................................................... 49
Creating Pair RDDs........................................................................................................................... 49
Transformations on Pair RDDs....................................................................................................... 50
Actions Available on Pair RDDs......................................................................................................60
Data Partitioning................................................................................................................................61
Conclusion.......................................................................................................................................... 70
Chapter 5. Loading and Saving Your Data.......................................................................... 71
Motivation........................................................................................................................................... 71
Choosing a Format............................................................................................................................. 71
Formats............................................................................................................................................... 72
File Systems........................................................................................................................................88
Compression.......................................................................................................................................89
Databases............................................................................................................................................ 91
Conclusion.......................................................................................................................................... 93
About the Authors.....................................................................................................................
Note: this is a pre-release version.