by Thomas Mailund (Author)
About the Author
Thomas Mailund is an associate professor at Aarhus University, Denmark. He has a background in math and computer science. For the last decade, his main focus has been on genetics and evolutionary studies, particularly comparative genomics, speciation, and gene flow between emerging species. He has published Beginning Data Science in R, Functional Programming in R, and Metaprogramming in R with Apress as well as other books.
About this book
This handy, practical book covers each concept concisely, with many examples. It introduces several R data science packages and shows how to use each of them.
In this book, you’ll learn about the following APIs and packages that deal specifically with data science applications: readr, tibble, forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2, modelr, and more.
After using this handy quick reference guide, you'll have the code, APIs, and insights to write data science-based applications in the R programming language. You'll also be able to carry out data analysis.
What You Will Learn
- Import data with readr
- Work with categories using forcats, time and dates with lubridate, and strings with stringr
- Format data using tidyr and then transform that data using magrittr and dplyr
- Write functions with R for data science, data mining, and analytics-based applications
- Visualize data with ggplot2 and fit data to models using modelr
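As a rough illustration of the kind of code the packages listed above support, here is a minimal sketch. It is not taken from the book: the data, column names, and variable names are invented for this example, and it assumes readr 2.0 or later (for `I()` literal input) with dplyr, stringr, lubridate, and forcats installed.

```r
# Illustrative sketch only: data and names are invented, not from the book.
library(readr)
library(dplyr)
library(stringr)
library(lubridate)
library(forcats)

# readr: parse literal CSV text; I() marks it as data rather than a file path.
# col_types "ccDd" = character, character, date, double.
raw <- read_csv(
  I("name,group,date,height_cm\nalice,a,2019-09-30,170\nbob,b,2019-10-01,180\ncarol,a,2019-10-02,160\n"),
  col_types = "ccDd"
)

cleaned <- raw %>%
  mutate(
    name  = str_to_title(name),   # stringr: capitalize names
    month = month(date),          # lubridate: extract the month number
    group = fct_infreq(group)     # forcats: factor with levels ordered by frequency
  )

# dplyr: mean height per group.
by_group <- cleaned %>%
  group_by(group) %>%
  summarise(mean_height = mean(height_cm))

print(by_group)
```

The pipe operator `%>%` used here comes from magrittr and is re-exported by dplyr, so no separate `library(magrittr)` call is needed in this sketch.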
Who This Book Is For
Programmers new to R's data science, data mining, and analytics packages. Some prior coding experience with R in general is recommended.
Brief contents
Chapter 1: Introduction
Chapter 2: Importing Data: readr
Functions for Reading Data
File Headers
Column Types
String-based Column Type Specification
Function-based Column Type Specification
Parsing Time and Dates
Space-separated Columns
Functions for Writing Data
Chapter 3: Representing Tables: tibble
Creating Tibbles
Indexing Tibbles
Chapter 4: Reformatting Tables: tidyr
Tidy Data
Gather and Spread
Complex Column Encodings
Expanding, Crossing, and Completing
Missing Values
Nesting Data
Chapter 5: Pipelines: magrittr
The Problem with Pipelines
Pipeline Notation
Pipelines and Function Arguments
Function Composition
Other Pipe Operations
Chapter 6: Functional Programming: purrr
General Features of purrr Functions
Filtering
Mapping
Reduce and Accumulate
Partial Evaluation and Function Composition
Lambda Expressions
Chapter 7: Manipulating Data Frames: dplyr
Selecting Columns
Filter
Sorting
Modifying Data Frames
Grouping and Summarizing
Joining Tables
Income in Fictional Countries
Chapter 8: Working with Strings: stringr
Counting String Patterns
Splitting Strings
Capitalizing Strings
Wrapping, Padding, and Trimming
Detecting Substrings
Extracting Substrings
Transforming Strings
Chapter 9: Working with Factors: forcats
Creating Factors
Concatenation
Projection
Adding Levels
Reorder Levels
Chapter 10: Working with Dates: lubridate
Time Points
Time Zones
Time Intervals
Chapter 11: Working with Models: broom and modelr
broom
modelr
Chapter 12: Plotting: ggplot2
The Basic Plotting Components in ggplot2
Adding Components to Plot Objects
Adding Data
Adding Aesthetics
Adding Geometries
Facets
Adding Coordinates
Chapter 13: Conclusions
Index
Pages: 246
Publisher: Apress; 1st edition (September 30, 2019)
Language: English
ISBN-10: 1484248937
ISBN-13: 978-1484248935