|
第07天
1.主题:正则表达式
《正则表达式必知必会》——Ben Forta著
Python for Data Analysis 2nd Edition
2.摘要
正则表达式的两种用途:查找和替换;
对比:
array0将匹配array0
array[0]将匹配array0
array\[0\]将匹配array[0]
\d匹配0-9的数字,等价于[0-9]
\D匹配0-9之外的字符,等价于^[0-9]
\s匹配空白字符,等价于[\f\n\r\t\v]
\S匹配非空白字符,等价于[^\f\n\r\t\v]
\w匹配数字字母或下划线,等价于[a-zA-Z0-9_]
\W匹配非数字字母或非下划线,等价于[^a-zA-Z0-9_]
Preparation
Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis
%quickref or %magic. 在jupyter notebook中查询快捷键或魔法键
Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectoriza‐ tion.
While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by con‐ trast, is best suited for working with homogeneous numerical array data.
pandas slicing with labels behaves differently than normal Python slicing in that the end‐
point is inclusive:
In [125]: obj['b':'c']
Out[125]:
b 1.0
c 2.0
The re module functions fall into three categories: pattern matching, substitution, and splitting. Naturally these are all related; a regex describes a pattern to locate in the text, which can then be used for many purposes.
3.心得感悟
快速快速学习数据处理预备知识,飞速达到核心知识点:时间序列;
此后暂停,消化一下~~
4.时间统计
昨日阅读5小时,累计280小时
|