Book Description
When people want a way to process Big Data at speed, Spark is invariably the solution. With its ease of development (in comparison to the relative complexity of Hadoop), it’s unsurprising that it’s becoming popular with data analysts and engineers everywhere.
Beginning with the fundamentals, we’ll show you how to get set up with Spark with minimum fuss. You’ll then get to grips with some simple APIs before investigating machine learning and graph processing – throughout we’ll make sure you know exactly how to apply your knowledge.
You will also learn how to use the Spark shell and how to load data, before finding out how to build and run your own Spark applications. Discover how to manipulate your RDDs and get stuck into a range of DataFrame APIs. You'll also learn some useful machine learning algorithms with the help of Spark MLlib, and how to integrate Spark with R. Finally, we'll make sure you're confident and prepared for graph processing as you learn more about the GraphX API.
What You Will Learn
- Install and set up Spark in your cluster
- Prototype distributed applications with Spark's interactive shell
- Perform data wrangling using the new DataFrame APIs
- Get to know the different ways to interact with Spark's distributed representation of data (RDDs)
- Query Spark with a SQL-like query syntax
- See how Spark works with Big Data
- Implement machine learning systems with highly scalable algorithms
- Use R, the popular statistical language, to work with Spark
- Apply interesting graph algorithms and graph processing with GraphX
Author: Krishna Sankar
Krishna Sankar is a Senior Specialist—AI Data Scientist with Volvo Cars, focusing on autonomous vehicles. His earlier stints include Chief Data Scientist at http://cadenttech.tv/, Principal Architect/Data Scientist at Tata America Intl. Corp., Director of Data Science at a bioinformatics startup, and Distinguished Engineer at Cisco. He has spoken at various conferences, including ML tutorials at Strata SJC and London 2016, Spark Summit [goo.gl/ab30lD], Strata-Spark Camp, OSCON, PyCon, and PyData; has written about Robots Rules of Order [goo.gl/5yyRv6], Big Data Analytics—Best of the Worst [goo.gl/ImWCaz], predicting NFL games, Spark [http://goo.gl/E4kqMD], data science [http://goo.gl/9pyJMH], machine learning [http://goo.gl/SXF53n], and social media analysis [http://goo.gl/D9YpVQ]; and has been a guest lecturer at the Naval Postgraduate School. His occasional blogs can be found at https://doubleclix.wordpress.com/. His other passions are flying drones (he is working towards a Drone Pilot License, FAA UAS Pilot) and Lego Robotics; you will find him at the St. Louis FLL World Competition as a Robots Design Judge.
Table of Contents
1: INSTALLING SPARK AND SETTING UP YOUR CLUSTER
Directory organization and convention
Installing the prebuilt distribution
Building Spark from source
Spark topology
A single machine
Running Spark on EC2
Deploying Spark with Chef (Opscode)
Deploying Spark on Mesos
Spark on YARN
Spark standalone mode
References
Summary
2: USING THE SPARK SHELL
The Spark shell
Loading a simple text file
Interactively loading data from S3
Summary
3: BUILDING AND RUNNING A SPARK APPLICATION
Building Spark applications
Data wrangling with IPython
Developing Spark with Eclipse
Developing Spark with other IDEs
Building your Spark job with Maven
Building your Spark job with something else
References
Summary
4: CREATING A SPARKSESSION OBJECT
SparkSession versus SparkContext
Building a SparkSession object
SparkContext - metadata
Shared Java and Scala APIs
Python
IPython
Reference
Summary
5: LOADING AND SAVING DATA IN SPARK
Spark abstractions
Data modalities
Data modalities and Datasets/DataFrames/RDDs
Loading data into an RDD
Saving your data
References
Summary
6: MANIPULATING YOUR RDD
Manipulating your RDD in Scala and Java
Manipulating your RDD in Python
References
Summary
7: SPARK 2.0 CONCEPTS
Code and Datasets for the rest of the book
The data scientist and Spark features
Spark v2.0 and beyond
Apache Spark - evolution
Apache Spark - the full stack
The art of a big data store - Parquet
References
Summary
8: SPARK SQL
The Spark SQL architecture
Spark SQL how-to in a nutshell
Spark SQL programming
References
Summary
9: FOUNDATIONS OF DATASETS/DATAFRAMES – THE PROVERBIAL WORKHORSE FOR DATA SCIENTISTS
Datasets - a quick introduction
Dataset APIs - an overview
Dataset interfaces and functions
References
Summary
10: SPARK WITH BIG DATA
Parquet - an efficient and interoperable big data format
HBase
Reference
Summary
11: MACHINE LEARNING WITH SPARK ML PIPELINES
Spark's machine learning algorithm table
Spark machine learning APIs - ML pipelines and MLlib
ML pipelines
Spark ML examples
The API organization
Basic statistics
Linear regression
Classification
Clustering
Recommendation
Hyperparameters
The final thing
References
Summary
12: GRAPHX
Graphs and graph processing - an introduction
Spark GraphX
GraphX - computational model
The first example - graph
Building graphs
The GraphX API landscape
Structural APIs
Community, affiliation, and strengths
Algorithms
Partition strategy
Case study - AlphaGo tweets analytics
References
Summary