Original poster: ReneeBK

[GitHub] Agile Data Science 2.0


Agile Data Science 2.0
Building Full-Stack Data Analytics Applications with Spark
By Russell Jurney
Publisher: O'Reilly Media
Release Date: June 2017
Pages: 352

Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they're to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.

Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn, and Apache Airflow. You'll learn an iterative approach that lets you quickly change the kind of analysis you're doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.

  • Build value from your data in a series of agile sprints, using the data-value pyramid
  • Extract features for statistical models from a single dataset
  • Visualize data with charts, and expose different aspects through interactive reports
  • Use historical data to predict the future via classification and regression
  • Translate predictions into actions
  • Get feedback from users after each sprint to keep your project on track

Hidden content in this post

Agile Data Science 2.0-master.zip (2.52 MB, requires 1 forum coin)



2
ReneeBK posted on 2017-8-22 22:00:25
# Load the text file using the SparkContext
# (sc is assumed to exist already, as in the pyspark shell or a notebook)
csv_lines = sc.textFile("../data/example.csv")

# Map the data to split the lines into a list
data = csv_lines.map(lambda line: line.split(","))

# Collect the dataset into local RAM
data.collect()
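
For readers without a Spark session at hand, the shape of the data produced by the load-and-split step can be previewed in plain Python. This is only a local sketch: the sample lines are made up for illustration and are not the contents of the book's example.csv, `split_line` is a hypothetical helper name, and the list comprehension merely stands in for the RDD `map` followed by `collect()`.

```python
# Hypothetical sample lines standing in for ../data/example.csv
sample_lines = [
    "Alice Example,Acme,Engineer",
    "Bob Example,Initech,Analyst",
]


def split_line(line):
    """Split one CSV line on commas, as the lambda in the post does."""
    return line.split(",")


# Local stand-in for csv_lines.map(...).collect(): one list of fields per line
data = [split_line(line) for line in sample_lines]
print(data)
```

Each input line becomes a list of string fields, so the collected result is a list of lists.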

3
ReneeBK posted on 2017-8-22 22:01:05
Creating Objects from CSV
Using a function with a map operation to create objects (dicts) as records...

# Turn the CSV lines into objects
def csv_to_record(line):
    parts = line.split(",")
    record = {
        "name": parts[0],
        "company": parts[1],
        "title": parts[2]
    }
    return record

# Apply the function to every record
# (csv_lines is the RDD loaded with sc.textFile in the previous post)
records = csv_lines.map(csv_to_record)

# Inspect the first item in the dataset
records.first()
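
One caveat with splitting on "," is that a quoted field containing a comma would be broken into two parts. As a hedged alternative, the sketch below parses each line with Python's standard csv module instead. The name `csv_to_record_safe`, the whitespace stripping, and the choice to return None for short lines are assumptions made for illustration and are not from the book.

```python
import csv


def csv_to_record_safe(line):
    """Parse one CSV line into a dict, honoring quoted fields.

    Returns None when the line has fewer than three fields.
    """
    # csv.reader accepts any iterable of lines; wrap the single line in a list
    parts = next(csv.reader([line]))
    if len(parts) < 3:
        return None
    return {
        "name": parts[0].strip(),
        "company": parts[1].strip(),
        "title": parts[2].strip(),
    }


# A quoted name containing a comma stays in one field
print(csv_to_record_safe('"Example, Alice",Acme,Engineer'))
```

With an RDD, this function could be passed to map in place of csv_to_record, followed by a filter that drops the None results.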

4
ReneeBK posted on 2017-8-22 22:01:34
GroupBy
Using the groupBy operator to count the number of jobs per person...

# Group the records by the name of the person
grouped_records = records.groupBy(lambda x: x["name"])

# Show the first group
grouped_records.first()

# Count the records in each group
job_counts = grouped_records.map(
    lambda x: {
        "name": x[0],
        "job_count": len(x[1])
    }
)

# Inspect the first count
job_counts.first()

# Collect all counts into local RAM
job_counts.collect()
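
To sanity-check the per-person counts on a small sample without Spark, the same grouping can be reproduced locally. This is a sketch under stated assumptions: the sample records are invented, `count_jobs` is a hypothetical helper, and `collections.Counter` takes the place of groupBy plus len. The output is sorted by name only to make it deterministic; Spark gives no such ordering guarantee.

```python
from collections import Counter

# Hypothetical records shaped like the dicts built by csv_to_record
sample_records = [
    {"name": "Alice Example", "company": "Acme", "title": "Engineer"},
    {"name": "Alice Example", "company": "Initech", "title": "Manager"},
    {"name": "Bob Example", "company": "Initech", "title": "Analyst"},
]


def count_jobs(records):
    """Count how many records share each name, mirroring groupBy + len."""
    counts = Counter(record["name"] for record in records)
    return [
        {"name": name, "job_count": n}
        for name, n in sorted(counts.items())
    ]


job_counts = count_jobs(sample_records)
print(job_counts)
```

On a real cluster, mapping each record to a (name, 1) pair and combining the pairs with reduceByKey is a common alternative to groupBy, since it avoids building each full group before counting. For a file as small as the example it is unlikely to matter.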

5
军旗飞扬 posted on 2017-8-22 22:04:06
Thanks for sharing, OP!


6
duoduoduo posted on 2017-8-22 22:06:18
A great book!
Truly a great book.


7
MouJack007 posted on 2017-8-22 23:48:14
Thanks for sharing, OP!


8
MouJack007 posted on 2017-8-22 23:48:51


9
clb_polaris posted on 2017-8-23 08:22:15
Thanks for sharing, OP!


10
钱学森64 posted on 2017-8-23 08:27:27
Thanks for sharing
