Original poster: neuroexplorer

[New Book, 2017] Fast Data Processing with Spark 2 - Third Edition


Book Description

When people want a way to process Big Data at speed, Spark is often the first tool they reach for. With its ease of development (compared with the relative complexity of Hadoop), it’s unsurprising that it’s becoming popular with data analysts and engineers everywhere.

Beginning with the fundamentals, we’ll show you how to get set up with Spark with minimum fuss. You’ll then get to grips with some simple APIs before investigating machine learning and graph processing. Throughout, we’ll make sure you know exactly how to apply your knowledge.

You will also learn how to use the Spark shell and how to load data, before finding out how to build and run your own Spark applications. Discover how to manipulate your RDD and get stuck into a range of DataFrame APIs. As if that’s not enough, you’ll also learn some useful machine learning algorithms with the help of Spark MLlib, and how to integrate Spark with R. We’ll also make sure you’re confident and prepared for graph processing as you learn more about the GraphX API.



https://bbs.pinggu.org/thread-5060262-1-1.html


What You Will Learn

  • Install and set up Spark in your cluster
  • Prototype distributed applications with Spark's interactive shell
  • Perform data wrangling using the new DataFrame APIs
  • Get to know the different ways to interact with Spark's distributed representation of data (RDDs)
  • Query Spark with a SQL-like query syntax
  • See how Spark works with Big Data
  • Implement machine learning systems with highly scalable algorithms
  • Use R, the popular statistical language, to work with Spark
  • Apply interesting graph algorithms and graph processing with GraphX

Author: Krishna Sankar

Krishna Sankar is a Senior Specialist - AI Data Scientist with Volvo Cars, focusing on Autonomous Vehicles. His earlier stints include Chief Data Scientist at http://cadenttech.tv/, Principal Architect/Data Scientist at Tata America Intl. Corp., Director of Data Science at a bioinformatics startup, and Distinguished Engineer at Cisco. He has spoken at various conferences, including ML tutorials at Strata SJC and London 2016, Spark Summit [goo.gl/ab30lD], Strata-Spark Camp, OSCON, PyCon, and PyData. He writes about Robots Rules of Order [goo.gl/5yyRv6], Big Data Analytics: Best of the Worst [goo.gl/ImWCaz], predicting NFL, Spark [http://goo.gl/E4kqMD], Data Science [http://goo.gl/9pyJMH], Machine Learning [http://goo.gl/SXF53n], and Social Media Analysis [http://goo.gl/D9YpVQ], and he has been a guest lecturer at the Naval Postgraduate School. His occasional blogs can be found at https://doubleclix.wordpress.com/. His other passions are flying drones (he is working towards a Drone Pilot License, FAA UAS Pilot) and Lego Robotics; you will find him at the St. Louis FLL World Competition as a Robots Design Judge.



Table of Contents

1: INSTALLING SPARK AND SETTING UP YOUR CLUSTER

  • Directory organization and convention
  • Installing the prebuilt distribution
  • Building Spark from source
  • Spark topology
  • A single machine
  • Running Spark on EC2
  • Deploying Spark with Chef (Opscode)
  • Deploying Spark on Mesos
  • Spark on YARN
  • Spark standalone mode
  • References
  • Summary

2: USING THE SPARK SHELL

  • The Spark shell
  • Loading a simple text file
  • Interactively loading data from S3
  • Summary

3: BUILDING AND RUNNING A SPARK APPLICATION

  • Building Spark applications
  • Data wrangling with iPython
  • Developing Spark with Eclipse
  • Developing Spark with other IDEs
  • Building your Spark job with Maven
  • Building your Spark job with something else
  • References
  • Summary

4: CREATING A SPARKSESSION OBJECT

  • SparkSession versus SparkContext
  • Building a SparkSession object
  • SparkContext - metadata
  • Shared Java and Scala APIs
  • Python
  • iPython
  • Reference
  • Summary

5: LOADING AND SAVING DATA IN SPARK

  • Spark abstractions
  • Data modalities
  • Data modalities and Datasets/DataFrames/RDDs
  • Loading data into an RDD
  • Saving your data
  • References
  • Summary

6: MANIPULATING YOUR RDD

  • Manipulating your RDD in Scala and Java
  • Manipulating your RDD in Python
  • References
  • Summary

7: SPARK 2.0 CONCEPTS

  • Code and Datasets for the rest of the book
  • The data scientist and Spark features
  • Spark v2.0 and beyond
  • Apache Spark - evolution
  • Apache Spark - the full stack
  • The art of a big data store - Parquet
  • References
  • Summary

8: SPARK SQL

  • The Spark SQL architecture
  • Spark SQL how-to in a nutshell
  • Spark SQL programming
  • References
  • Summary

9: FOUNDATIONS OF DATASETS/DATAFRAMES – THE PROVERBIAL WORKHORSE FOR DATASCIENTISTS

  • Datasets - a quick introduction
  • Dataset APIs - an overview
  • Dataset interfaces and functions
  • References
  • Summary

10: SPARK WITH BIG DATA

  • Parquet - an efficient and interoperable big data format
  • HBase
  • Reference
  • Summary

11: MACHINE LEARNING WITH SPARK ML PIPELINES

  • Spark's machine learning algorithm table
  • Spark machine learning APIs - ML pipelines and MLlib
  • ML pipelines
  • Spark ML examples
  • The API organization
  • Basic statistics
  • Linear regression
  • Classification
  • Clustering
  • Recommendation
  • Hyper parameters
  • The final thing
  • References
  • Summary

12: GRAPHX

  • Graphs and graph processing - an introduction
  • Spark GraphX
  • GraphX - computational model
  • The first example - graph
  • Building graphs
  • The GraphX API landscape
  • Structural APIs
  • Community, affiliation, and strengths
  • Algorithms
  • Partition strategy
  • Case study - AlphaGo tweets analytics
  • References
  • Summary






Keywords: Spark, Big Data, Data Analysis, Data Science

Reply 1
西门高, posted 2017-1-23 09:11:17
Thanks for sharing.

Reply 2
willwinn, posted 2017-1-23 14:37:09
Where can I download it?

Reply 3
西门高, posted 2017-1-25 10:04:00
Thanks for sharing.

Reply 4
neuroexplorer, posted 2017-1-25 10:15:15
Quoting willwinn (2017-1-23 14:37): Where can I download it?
https://bbs.pinggu.org/thread-5060262-1-1.html
