Original poster: igs816

[Book Introduction] Big Data Analysis with Python

Packt | 2019 | ISBN: 978-1-78995-528-6 | 274 Pages | EPUB
Big Data Analysis with Python.epub (14.02 MB, requires 10 forum coins)


Keywords: Big Data, Analysis

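Rated by 2 users:
Nicolle: Experience +100, Academic level +1, Helpfulness +1, Credit rating +1 (Reason: excellent post)
yunnandlg: Academic level +1, Helpfulness +5 (Reason: excellent post)

Total rating: Experience +100, Academic level +2, Helpfulness +6, Credit rating +1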


2
zhou_yl posted on 2019-6-2 17:04:16 (from mobile)
Thanks for sharing

3
phipe posted on 2019-6-3 00:44:34
Thanks for sharing

4
yunnandlg (employment verified, student verified) posted on 2019-6-3 17:26:54
Thanks for sharing

5
heiyaodai posted on 2019-6-3 22:57:28
Thanks for sharing

6
cometwx posted on 2019-6-9 21:10:49
Thank you for sharing

7
cometwx posted on 2019-6-10 06:24:09
Thank you for sharing

8
jydcb003 (student verified) posted on 2019-6-10 07:54:31 (from mobile)
Thank you for sharing. It would be even better if there were a table of contents.

9
WFMZZ posted on 2019-6-20 14:47:37
Table of Contents
Preface
Chapter 1: The Python Data Science Stack
Introduction
Python Libraries and Packages
IPython: A Powerful Interactive Shell
Exercise 1: Interacting with the Python Shell Using the IPython Commands
The Jupyter Notebook
Exercise 2: Getting Started with the Jupyter Notebook
IPython or Jupyter?
Activity 1: IPython and Jupyter
NumPy
SciPy
Matplotlib
Pandas
Using Pandas
Reading Data
Exercise 3: Reading Data with Pandas
Data Manipulation
Selection and Filtering
Selecting Rows Using Slicing
Exercise 4: Data Selection and the .loc Method
Applying a Function to a Column
Activity 2: Working with Data Problems
Data Type Conversion
Exercise 5: Exploring Data Types
Aggregation and Grouping
Exercise 6: Aggregation and Grouping Data
NumPy on Pandas
Exporting Data from Pandas
Exercise 7: Exporting Data in Different Formats
Visualization with Pandas
Activity 3: Plotting Data with Pandas
Summary
Chapter 2: Statistical Visualizations
Introduction
Types of Graphs and When to Use Them
Exercise 8: Plotting an Analytical Function
Components of a Graph
Exercise 9: Creating a Graph
Exercise 10: Creating a Graph for a Mathematical Function
Seaborn
Which Tool Should Be Used?
Types of Graphs
Line Graphs
Time Series Plots
Exercise 11: Creating Line Graphs Using Different Libraries
Pandas DataFrames and Grouped Data
Activity 4: Line Graphs with the Object-Oriented API and Pandas DataFrames
Scatter Plots
Activity 5: Understanding Relationships of Variables Using Scatter Plots
Histograms
Exercise 12: Creating a Histogram of Horsepower Distribution
Boxplots
Exercise 13: Analyzing the Behavior of the Number of Cylinders and Horsepower Using a Boxplot
Changing Plot Design: Modifying Graph Components
Title and Label Configuration for Axis Objects
Exercise 14: Configuring a Title and Labels for Axis Objects
Line Styles and Color
Figure Size
Exercise 15: Working with Matplotlib Style Sheets
Exporting Graphs
Activity 6: Exporting a Graph to a File on Disk
Activity 7: Complete Plot Design
Summary
Chapter 3: Working with Big Data Frameworks
Introduction
Hadoop
Manipulating Data with the HDFS
Exercise 16: Manipulating Files in the HDFS
Spark
Spark SQL and Pandas DataFrames
Exercise 17: Performing DataFrame Operations in Spark
Exercise 18: Accessing Data with Spark
Exercise 19: Reading Data from the Local Filesystem and the HDFS
Exercise 20: Writing Data Back to the HDFS and PostgreSQL
Writing Parquet Files
Exercise 21: Writing Parquet Files
Increasing Analysis Performance with Parquet and Partitions
Exercise 22: Creating a Partitioned Dataset
Handling Unstructured Data
Exercise 23: Parsing Text and Cleaning
Activity 8: Removing Stop Words from Text
Summary
Chapter 4: Diving Deeper with Spark
Introduction
Getting Started with Spark DataFrames
Exercise 24: Specifying the Schema of a DataFrame
Exercise 25: Creating a DataFrame from an Existing RDD
Exercise 26: Creating a DataFrame Using a CSV File
Writing Output from Spark DataFrames
Exercise 27: Converting a Spark DataFrame to a Pandas DataFrame
Exploring Spark DataFrames
Exercise 28: Displaying Basic DataFrame Statistics
Activity 9: Getting Started with Spark DataFrames
Data Manipulation with Spark DataFrames
Exercise 29: Selecting and Renaming Columns from the DataFrame
Exercise 30: Adding and Removing a Column from the DataFrame
Exercise 31: Displaying and Counting Distinct Values in a DataFrame
Exercise 32: Removing Duplicate Rows and Filtering Rows of a DataFrame
Exercise 33: Ordering Rows in a DataFrame
Exercise 34: Aggregating Values in a DataFrame
Activity 10: Data Manipulation with Spark DataFrames
Graphs in Spark
Exercise 35: Creating a Bar Chart
Exercise 36: Creating a Linear Model Plot
Exercise 37: Creating a KDE Plot and a Boxplot
Activity 11: Graphs in Spark
Summary
Chapter 5: Handling Missing Values and Correlation Analysis
Introduction
Setting up the Jupyter Notebook
Missing Values
Exercise 38: Counting Missing Values in a DataFrame
Exercise 39: Counting Missing Values in All DataFrame Columns
Fetching Missing Value Records from the DataFrame
Handling Missing Values in Spark DataFrames
Exercise 40: Removing Records with Missing Values from a DataFrame
Exercise 41: Filling Missing Values with a Constant in a DataFrame Column
Correlation
Exercise 42: Computing Correlation
Activity 12: Missing Value Handling and Correlation Analysis with PySpark DataFrames
Summary
Chapter 6: Exploratory Data Analysis
Introduction
Defining a Business Problem
Problem Identification
Requirement Gathering
Data Pipeline and Workflow
Identifying Measurable Metrics
Documentation and Presentation
Translating a Business Problem into Measurable Metrics and Exploratory Data Analysis (EDA)
Data Gathering
Analysis of Data Generation
KPI Visualization
Feature Importance
Exercise 43: Identify the Target Variable and Related KPIs from the Given Data for the Business Problem
Exercise 44: Generate the Feature Importance of the Target Variable and Carry Out EDA
Structured Approach to the Data Science Project Life Cycle
Data Science Project Life Cycle Phases
Phase 1: Understanding and Defining the Business Problem
Phase 2: Data Access and Discovery
Phase 3: Data Engineering and Pre-processing
Activity 13: Carry Out Mapping to Gaussian Distribution of Numeric Features from the Given Data
Phase 4: Model Development
Summary
Chapter 7: Reproducibility in Big Data Analysis
Introduction
Reproducibility with Jupyter Notebooks
Introduction to the Business Problem
Documenting the Approach and Workflows
Explaining the Data Pipeline
Explain the Dependencies
Using Source Code Version Control
Modularizing the Process
Gathering Data in a Reproducible Way
Functionalities in Markdown and Code Cells
Explaining the Business Problem in the Markdown
Providing a Detailed Introduction to the Data Source
Explain the Data Attributes in the Markdown
Exercise 45: Performing Data Reproducibility
Code Practices and Standards
Environment Documentation
Writing Readable Code with Comments
Effective Segmentation of Workflows
Workflow Documentation
Exercise 46: Missing Value Preprocessing with High Reproducibility
Avoiding Repetition
Using Functions and Loops for Optimizing Code
Developing Libraries/Packages for Code/Algorithm Reuse
Activity 14: Carry Out Normalisation of Data
Summary
Chapter 8: Creating a Full Analysis Report
Introduction
Reading Data in Spark from Different Data Sources
Exercise 47: Reading Data from a CSV File Using the PySpark Object
Reading JSON Data Using the PySpark Object
SQL Operations on a Spark DataFrame
Exercise 48: Reading Data in PySpark and Carrying Out SQL Operations
Exercise 49: Creating and Merging Two DataFrames
Exercise 50: Subsetting the DataFrame
Generating Statistical Measurements
Activity 15: Generating Visualization Using Plotly
Summary
Appendix


10
jonck posted on 2019-6-22 15:55:56
Many thanks
