Original poster: ReneeBK

[GitHub] Agile Data Science 2.0

Original post
ReneeBK posted on 2017-8-22 21:59:35


Agile Data Science 2.0
Building Full-Stack Data Analytics Applications with Spark
By Russell Jurney
Publisher: O'Reilly Media
Release Date: June 2017
Pages: 352

Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they're to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.

Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn, and Apache Airflow. You'll learn an iterative approach that lets you quickly change the kind of analysis you're doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.

  • Build value from your data in a series of agile sprints, using the data-value pyramid
  • Extract features for statistical models from a single dataset
  • Visualize data with charts, and expose different aspects through interactive reports
  • Use historical data to predict the future via classification and regression
  • Translate predictions into actions
  • Get feedback from users after each sprint to keep your project on track

Hidden content in this post

Agile Data Science 2.0-master.zip (2.52 MB, requires 1 forum coin)


Keywords: Data Science Science GitHub Agile Data

2
ReneeBK (no verified transactions) posted on 2017-8-22 22:00:25
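# Get a SparkContext; in the pyspark shell, `sc` already exists and this simply returns it
from pyspark import SparkContext
sc = SparkContext.getOrCreate()

# Load the text file using the SparkContext
csv_lines = sc.textFile("../data/example.csv")

# Map the data to split each line into a list of fields
data = csv_lines.map(lambda line: line.split(","))

# Collect the dataset into local RAM (only safe for small datasets)
data.collect()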
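
A note on the split above: `line.split(",")` breaks apart any field that itself contains a comma inside quotes. The following is a minimal sketch, not taken from the book, of a per-line parser built on Python's standard `csv` module. The name `parse_csv_line` and the sample row are hypothetical and used only for illustration.

```python
import csv

def parse_csv_line(line):
    """Parse one CSV line into a list of fields, honoring double-quoted fields."""
    # csv.reader accepts any iterable of strings, so wrap the single line in a list
    return next(csv.reader([line]))

# Hypothetical sample row: the company name contains a comma
sample = 'Jane Doe,"Acme, Inc.",Engineer'

print(sample.split(","))       # the naive split cuts the quoted field in two
print(parse_csv_line(sample))  # the quoted field stays in one piece
```

Assuming the `csv_lines` RDD from the post above, this could be applied as `csv_lines.map(parse_csv_line)`. Because each line is parsed on its own, quoted fields that span multiple lines are still not handled.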

3
ReneeBK (no verified transactions) posted on 2017-8-22 22:01:05
Creating Objects from CSV
Using a function with a map operation to create objects (dicts) as records...

# Turn the CSV lines into objects
def csv_to_record(line):
    parts = line.split(",")
    record = {
        "name": parts[0],
        "company": parts[1],
        "title": parts[2]
    }
    return record

# Apply the function to every record
records = csv_lines.map(csv_to_record)

# Inspect the first item in the dataset
records.first()
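
`csv_to_record` raises an `IndexError` on any line with fewer than three fields, which would fail the Spark job. Here is a minimal sketch of a more defensive variant, assuming malformed lines should simply be skipped. The name `csv_to_record_safe`, its `expected_fields` parameter, and the sample lines are hypothetical and not from the book.

```python
def csv_to_record_safe(line, expected_fields=3):
    """Return a record dict, or None if the line lacks the expected number of fields."""
    # Trim stray whitespace around each field
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != expected_fields:
        return None
    return {"name": parts[0], "company": parts[1], "title": parts[2]}

# Hypothetical input: one well-formed line, one malformed line, one empty line
sample_lines = ["Jane Doe, Acme, Engineer", "incomplete line", ""]

# Keep only the lines that parsed successfully
clean_records = [r for r in map(csv_to_record_safe, sample_lines) if r is not None]
print(clean_records)
```

With the `csv_lines` RDD from the thread, the same idea would read `csv_lines.map(csv_to_record_safe).filter(lambda r: r is not None)`.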

4
ReneeBK (no verified transactions) posted on 2017-8-22 22:01:34
GroupBy
Using the groupBy operator to count the number of jobs per person...

# Group the records by the name of the person
grouped_records = records.groupBy(lambda x: x["name"])

# Show the first group
grouped_records.first()

# Count the groups: x[0] is the name, x[1] is the iterable of that person's records
job_counts = grouped_records.map(
    lambda x: {
        "name": x[0],
        "job_count": len(x[1])
    }
)

job_counts.first()

job_counts.collect()
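
To make clear what `job_counts` contains, here is a minimal pure-Python sketch of the same grouping and counting using `collections.Counter`. It runs without Spark. The sample records and the names `sample_records` and `job_counts_local` are hypothetical.

```python
from collections import Counter

# Hypothetical records in the same shape that csv_to_record produces
sample_records = [
    {"name": "Jane Doe", "company": "Acme", "title": "Engineer"},
    {"name": "Jane Doe", "company": "Initech", "title": "Manager"},
    {"name": "John Roe", "company": "Acme", "title": "Analyst"},
]

# Count how many records (jobs) each name has
name_counts = Counter(record["name"] for record in sample_records)

# Reshape into the same dict layout as job_counts in the post
job_counts_local = [
    {"name": name, "job_count": count} for name, count in name_counts.items()
]
print(job_counts_local)
```

On large datasets, `groupBy` shuffles every record for a key across the cluster just to count them. A common alternative, assuming the same `records` RDD, is `records.map(lambda r: (r["name"], 1)).reduceByKey(lambda a, b: a + b)`, which combines partial counts within each partition before the shuffle.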

5
军旗飞扬 (no verified transactions) posted on 2017-8-22 22:04:06
Thanks to the original poster for sharing!

6
duoduoduo (verified transactions, employment verified) posted on 2017-8-22 22:06:18
A good book!
A genuinely good book.

7
MouJack007 (no verified transactions) posted on 2017-8-22 23:48:14
Thanks to the original poster for sharing!

8
MouJack007 (no verified transactions) posted on 2017-8-22 23:48:51

9
clb_polaris (no verified transactions) posted on 2017-8-23 08:22:15
Thanks to the original poster for sharing!

10
钱学森64 (no verified transactions) posted on 2017-8-23 08:27:27
Thanks for sharing
