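I imagine many of my peers here use R and really enjoy working with its data frames. I used to have to clean up my SQL data in Python before I could import it into R, which was quite a hassle.
Only after reading this book did I learn that Python has the pandas library. Combined with libraries such as matplotlib and NumPy, plus IPython's interactive shell, it lets Python begin to match what R can do. Python for Data Analysis provides exactly the entry-level guidance for this.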
Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.
Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It’s ideal for analysts new to Python and for Python programmers new to scientific computing.
- Use the IPython interactive shell as your primary development environment
- Learn basic and advanced NumPy (Numerical Python) features
- Get started with data analysis tools in the pandas library
- Use high-performance tools to load, clean, transform, merge, and reshape data
- Create scatter plots and static or interactive visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Measure data by points in time, whether it’s specific instances, fixed periods, or intervals
- Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples
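
For readers coming from R, here is a rough sketch of what the clean, merge, and groupby steps in the list above might look like with pandas. It is not taken from the book: the tables, column names, and values are all made up for illustration.

```python
import pandas as pd

# Hypothetical page-view records; column names and values are illustrative only.
visits = pd.DataFrame({
    "user_id": [1, 2, 1, 3, 2],
    "page": ["home", "home", "docs", "docs", "docs"],
    "seconds": [12.0, 7.5, None, 30.0, 18.5],
})

# Hypothetical lookup table mapping each user to a country.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country": ["US", "DE", "US"],
})

# Clean: drop rows where the duration is missing.
clean = visits.dropna(subset=["seconds"])

# Merge: attach each user's country, much like merge() on R data frames.
merged = clean.merge(users, on="user_id", how="left")

# Summarize: total and mean seconds per country with groupby.
summary = merged.groupby("country")["seconds"].agg(["sum", "mean"])
print(summary)
```

The whole pipeline stays in one DataFrame workflow, which is what removes the need for a separate cleaning step before handing the data to R.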