Original poster: kukenghuqian

Mastering Pandas

Table of Contents

Chapter 1: Introduction to pandas and Data Analysis 1
Motivation for data analysis 1
We live in a big data world 1
4 V's of big data 2
Volume of big data 2
Velocity of big data 3
Variety of big data 3
Veracity of big data 4
So much data, so little time for analysis 4
The move towards real-time analytics 5
How Python and pandas fit into the data analytics mix 5
What is pandas? 6
Benefits of using pandas 7
Summary 10
Chapter 2: Installation of pandas and the Supporting Software 11
Selecting a version of Python to use 11
Python installation 12
Linux 12
Installing Python from compressed tarball 13
Windows 14
Core Python installation 14
Third-party Python software installation 15
Mac OS X 15
Installation using a package manager 16
Installation of Python and pandas from a third-party vendor 16
Continuum Analytics Anaconda 17
Installing Anaconda 17
Linux 17
Mac OS X 18
Windows 18
Final step for all platforms 18
Other numeric or analytics-focused Python distributions 19
Downloading and installing pandas 19
Linux 20
Ubuntu/Debian 21
Red Hat 21
Ubuntu/Debian 21
Fedora 21
OpenSuse 21
Mac 21
Source installation 22
Binary installation 22
Windows 22
Binary Installation 22
Source installation 23
IPython 24
IPython Notebook 24
IPython installation 26
Linux 26
Windows 26
Mac OS X 26
Install via Anaconda (for Linux/Mac OS X) 27
Wakari by Continuum Analytics 27
Virtualenv 27
Virtualenv installation and usage 27
Summary 28
Chapter 3: The pandas Data Structures 29
NumPy ndarrays 29
NumPy array creation 30
NumPy arrays via numpy.array 30
NumPy array via numpy.arange 30
NumPy array via numpy.linspace 31
NumPy array via various other functions 31
NumPy datatypes 33
NumPy indexing and slicing 34
Array slicing 36
Array masking 38
Complex indexing 39
Copies and views 40
Operations 40
Basic operations 41
Reduction operations 44
Statistical operators 45
Logical operators 45
Broadcasting 46
Array shape manipulation 47
Flattening a multidimensional array 47
Reshaping 47
Resizing 48
Adding a dimension 49
Array sorting 49
Data structures in pandas 50
Series 50
Series creation 50
Operations on Series 53
DataFrame 56
DataFrame Creation 57
Operations 62
Panel 65
Using 3D NumPy array with axis labels 65
Using a Python dictionary of DataFrame objects 66
Using the DataFrame.to_panel method 67
Other operations 68
Summary 68
Chapter 4: Operations in pandas, Part I – Indexing and Selecting 69
Basic indexing 69
Accessing attributes using dot operator 71
Range slicing 73
Label, integer, and mixed indexing 75
Label-oriented indexing 75
Selection using a Boolean array 78
Integer-oriented indexing 79
The .iat and .at operators 81
Mixed indexing with the .ix operator 81
MultiIndexing 85
Swapping and reordering levels 89
Cross sections 90
Boolean indexing 91
The isin, any, and all methods 92
Using the where() method 95
Operations on indexes 97
Summary 98
Chapter 5: Operations in pandas, Part II – Grouping, Merging, and Reshaping of Data 99
Grouping of data 99
The groupby operation 99
Using groupby with a MultiIndex 108
Using the aggregate method 111
Applying multiple functions 111
The transform() method 112
Filtering 114
Merging and joining 114
The concat function 115
Using append 118
Appending a single row to a DataFrame 120
SQL-like merging/joining of DataFrame objects 120
The join function 124
Pivots and reshaping data 125
Stacking and unstacking 127
The stack() function 128
Other methods to reshape DataFrames 131
Using the melt function 131
Summary 133
Chapter 6: Missing Data, Time Series, and Plotting Using Matplotlib 135
Handling missing data 135
Handling missing values 141
Handling time series 143
Reading in time series data 144
DateOffset and TimeDelta objects 145
Time series-related instance methods 146
Shifting/lagging 147
Frequency conversion 147
Resampling of data 149
Aliases for Time Series frequencies 154
Time series concepts and datatypes 155
Period and PeriodIndex 155
Conversions between Time Series datatypes 157
A summary of Time Series-related objects 158
Plotting using matplotlib 158
Summary 161
Chapter 7: A Tour of Statistics – The Classical Approach 163
Descriptive statistics versus inferential statistics 164
Measures of central tendency and variability 164
Measures of central tendency 164
The mean 164
The median 165
The mode 165
Computing measures of central tendency of a dataset in Python 166
Measures of variability, dispersion, or spread 170
Range 171
Quartile 171
Deviation and variance 173
Hypothesis testing – the null and alternative hypotheses 174
The null and alternative hypotheses 175
The alpha and p-values 176
Type I and Type II errors 177
Statistical hypothesis tests 177
Background 177
The z-test 178
The t-test 182
A t-test example 185
Confidence intervals 188
An illustrative example 189
Correlation and linear regression 190
Correlation 190
Linear regression 191
An illustrative example 192
Summary 195
Chapter 8: A Brief Tour of Bayesian Statistics 197
Introduction to Bayesian statistics 197
Mathematical framework for Bayesian statistics 199
Bayes theory and odds 202
Applications of Bayesian statistics 202
Probability distributions 203
Fitting a distribution 203
Discrete probability distributions 204
Discrete uniform distributions 204
Continuous probability distributions 213
Bayesian statistics versus Frequentist statistics 221
What is probability? 221
How the model is defined 221
Confidence (Frequentist) versus Credible (Bayesian) intervals 222
Conducting Bayesian statistical analysis 222
Monte Carlo estimation of the likelihood function and PyMC 223
Bayesian analysis example – Switchpoint detection 224
References 237
Summary 238
Chapter 9: The pandas Library Architecture 239
Introduction to pandas' file hierarchy 239
Description of pandas' modules and files 240
pandas/core 240
pandas/io 243
pandas/tools 246
pandas/sparse 247
pandas/stats 247
pandas/util 248
pandas/rpy 249
pandas/tests 249
pandas/compat 250
pandas/computation 250
pandas/tseries 251
pandas/sandbox 253
Improving performance using Python extensions 253
Summary 256
Chapter 10: R and pandas Compared 257
R data types 257
R lists 258
R DataFrames 259
Slicing and selection 261
R-matrix and NumPy array compared 261
R lists and pandas series compared 262
Specifying column name in R 264
Specifying column name in pandas 264
R's DataFrames versus pandas' DataFrames 265
Multicolumn selection in R 265
Multicolumn selection in pandas 265
Arithmetic operations on columns 266
Aggregation and GroupBy 267
Aggregation in R 268
The pandas' GroupBy operator 270
Comparing matching operators in R and pandas 271
R %in% operator 271
The pandas isin() function 272
Logical subsetting 272
Logical subsetting in R 272
Logical subsetting in pandas 273
Split-apply-combine 273
Implementation in R 274
Implementation in pandas 275
Reshaping using melt 276
The R melt() function 277
The pandas melt() function 277
Factors/categorical data 278
An R example using cut() 278
The pandas solution 279
Summary 281
Chapter 11: Brief Tour of Machine Learning 283
Role of pandas in machine learning 284
Installation of scikit-learn 284
Installing via Anaconda 284
Installing on Unix (Linux/Mac OS X) 284
Installing on Windows 285
Introduction to machine learning 285
Supervised versus unsupervised learning 286
Illustration using document classification 286
Supervised learning 286
Unsupervised learning 286
How machine learning systems learn 287
Application of machine learning – Kaggle Titanic competition 287
The Titanic: machine learning from disaster problem 287
The problem of overfitting 288
Data analysis and preprocessing using pandas 289
Examining the data 289
Handling missing values 290
A naïve approach to Titanic problem 300
The scikit-learn ML/classifier interface 302
Supervised learning algorithms 305
Constructing a model using Patsy for scikit-learn 305
General boilerplate code explanation 306
Logistic regression 309
Support vector machine 311
Decision trees 313
Random forest 315
Unsupervised learning algorithms 316
Dimensionality reduction 316
K-means clustering 321
Summary


Keywords: Mastering, Master, pandas, panda, Aster

Attachment: Mastering Pandas.rar (2.66 MB, requires 4 forum coins)

This attachment contains:

  • Mastering Pandas.pdf


Reply 1
Glorevo posted on 2018-7-17 11:38:22
Learning pandas.pdf (8.59 MB, requires 5 forum coins)

Mastering Pandas for Finance.pdf (6.73 MB, requires 5 forum coins)

Reply 2
hifinecon posted on 2018-7-20 06:36:42
thanks

Reply 3
silver365 posted on 2018-7-20 11:00:58
Thanks
