The R procedures and datasets provided here correspond to many of the examples discussed in R.K. Pearson, Exploring Data in Engineering, the Sciences, and Medicine. The R procedures are provided as text files (.txt) that may be copied and pasted into an interactive R session, and the datasets are provided as comma-separated value (.csv) files. These files are easily read in R via the read.csv command, or they may be examined by opening them in Microsoft Excel.

Note that the R procedures described here are built on commands available in base R and the add-on packages designated as recommended, and do not require any other add-on packages. These commands were implemented in R version 2.11.1, installed as binary files in a Microsoft Windows environment.

Note that versions of a number of these datasets are available as built-in datasets in a variety of R packages (e.g., the von Bortkewitsch horsekick deaths data is available in the R add-on package vcd as the dataset VonBort). In addition, three of these datasets (federalist.csv, horsekick.csv, and bitterpit.csv) were constructed from datasets described in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985) and available from the following website: http://lib.stat.cmu.edu/datasets/Andrews/

Similarly, the datasets mushroom.csv and pima.csv were constructed from datasets available from the UCI Machine Learning Repository (Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science).

Chapter 1 – The Art of Analyzing Data
- The Ohm's law dataset (ohmdata.csv)
- The first industrial pressure dataset (pressure1.csv)
- The second industrial pressure dataset (pressure2.csv)
- The third industrial pressure dataset (pressure3.csv)
- The fourth industrial pressure dataset (pressure4.csv)
- R code to generate Fig. 1.8 boxplot (ch1fig8proc.txt)
- The physical property dataset (physprop.csv)
- R code to generate Fig. 1.9 with lowess smoother (ch1fig9proc.txt)
- The brain/body weight dataset (brainbody.csv)
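As a minimal sketch of how any of the .csv files listed on this page might be read with read.csv, consider the following. The helper name load_dataset is hypothetical and not part of the original materials, and the example assumes the downloaded file sits in the current working directory. The small temporary file only stands in for a downloaded dataset so that the example is self-contained; its column names and values are made up.

```r
# Hypothetical helper (not from the original materials):
# read a downloaded .csv file and report its dimensions.
load_dataset <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  dat <- read.csv(path)
  message(basename(path), ": ", nrow(dat), " rows, ", ncol(dat), " columns")
  dat
}

# Stand-in file with made-up values (NOT the contents of any real dataset).
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(1, 2, 3), y = c(2.1, 3.9, 6.2)),
          demo_path, row.names = FALSE)

demo <- load_dataset(demo_path)
str(demo)

# With a real download in the working directory, the call would be, e.g.:
# ohm <- load_dataset("ohmdata.csv")
```

Assuming a .txt procedure file contains only R code, it can probably also be loaded with source() rather than copied and pasted.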
Chapter 2 – Data: Types, Uncertainty, and Quality
- The UCI mushroom dataset (mushroom.csv)
- The UCI Pima Indians diabetes dataset (pima.csv)
- The helicopter dataset (helicopter.csv)
- The makeup flow rate dataset (makeup.csv)
Chapter 3 – Characterizing Categorical Variables
- The Federalist Papers dataset (federalist.csv) was generated from Table 4.1 in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
- The von Bortkewitsch horsekick deaths data (horsekick.csv) was generated from Table 69.1 in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
- R procedure for the normalized Shannon measure (shannonproc.txt)
- R procedure for the normalized Simpson measure (simpsonproc.txt)
- R procedure for the normalized Gini measure (giniproc.txt)
- R procedure for the normalized Bray measure (brayproc.txt)
- R procedure to generate Fig. 3.3 (ch3fig3proc.txt)
- R procedure to generate Fig. 3.4 (ch3fig4proc.txt)
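For orientation, here is a minimal sketch of one common way to normalize Shannon entropy for a categorical variable so that the result lies between 0 and 1. The function name shannon_normalized and the normalization chosen here (0 for a uniform distribution over the observed levels, 1 when all observations share a single level) are assumptions for illustration; the procedure in shannonproc.txt may follow a different convention.

```r
# Illustrative sketch, not the code from shannonproc.txt.
# Assumed convention: 0 = uniform over observed levels, 1 = one level only.
shannon_normalized <- function(x) {
  counts <- table(x)
  counts <- counts[counts > 0]      # ignore unused factor levels
  m <- length(counts)
  if (m < 2) return(1)              # a single level is fully concentrated
  p <- as.numeric(counts) / sum(counts)
  entropy <- -sum(p * log(p))       # Shannon entropy in nats
  1 - entropy / log(m)              # log(m) is the maximum possible entropy
}

shannon_normalized(c("a", "b", "c", "d"))  # uniform over four levels
shannon_normalized(c("a", "a", "a", "b"))  # mostly one level
```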
Chapter 4 – Uncertainty in Real Variables
Chapter 5 – Fitting Straight Lines
- R procedure to generate Fig. 5.1 (ch5fig1proc.txt)
Chapter 6 – A Brief Introduction to Estimation Theory
- R procedure to generate Fig. 6.7 (ch6fig7proc.txt)
- R procedure to generate Fig. 6.8 (ch6fig8proc.txt)
Chapter 7 – Outliers: Distributional Monsters (?) That Lurk in Data
- R code for the 3 sigma edit rule (threesigmaproc.txt)
- R code for the Hampel outlier identifier (hampelproc.txt)
- R code for moment-based skewness measure (skewnessproc.txt)
- R code for Galton's skewness measure (galtonskewproc.txt)
- R code for Hotelling's skewness measure (hotellingskewproc.txt)
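The following is a minimal sketch of the outlier rules and skewness measures named above, written from their commonly given definitions rather than taken from the listed .txt files. All function names, the default threshold t = 3, and the sample values are assumptions for illustration; the original procedures may differ in details such as the threshold, the quantile definition, or the moment normalization.

```r
# Illustrative sketches, not the code from the .txt files listed above.

# Three-sigma edit rule: flag points more than t standard deviations
# from the mean.
three_sigma_flags <- function(x, t = 3) {
  abs(x - mean(x)) > t * sd(x)
}

# Hampel identifier: same idea, with the median in place of the mean and
# the MAD scale estimate (R's mad(), scaled by 1.4826) in place of sd.
hampel_flags <- function(x, t = 3) {
  abs(x - median(x)) > t * mad(x)
}

# Moment-based skewness: third central moment over variance^(3/2),
# using divide-by-n moments (an assumed normalization).
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Galton's (quartile) skewness, using R's default quantile definition.
galton_skewness <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  (q[3] - 2 * q[2] + q[1]) / (q[3] - q[1])
}

# Hotelling's skewness: (mean - median) / standard deviation.
hotelling_skewness <- function(x) {
  (mean(x) - median(x)) / sd(x)
}

# Made-up sample with one gross outlier in the last position.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 50.0)
which(three_sigma_flags(x))
which(hampel_flags(x))
```

In this made-up sample the three-sigma rule flags nothing, because the extreme value inflates both the mean and the standard deviation, while the Hampel identifier flags only the tenth observation.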
Chapter 8 – Characterizing a Dataset
- R code to generate Poissonness plots (poissonnessplot.txt)
- The Old Faithful geyser dataset is available as faithful in base R (datasets package)
- R code to generate data comparison plots (dataqqplotproc.txt)
- R code to generate negative binomialness plots (negbinessplot.txt)
Chapter 9 – Confidence Intervals and Hypothesis Testing
- R code for modified Poissonness plots (modpoissonnessplot.txt)
- R code for binomial confidence intervals (binomCIproc.txt)
- R code for Beal's Method (BealsMethodproc.txt)
Chapter 10 – Relations among Variables
- The World Almanac election dataset (elections.csv)
Chapter 11 – Regression Models I: Real Data
Chapter 12 – Reexpression: Data Transformations
- R code for Box-Cox transformations
- R code for Aranda-Ordaz transformations
- R code for angular transformations
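As a rough sketch of two of the transformations named above, the code below implements the standard Box-Cox family and one common form of the angular (arcsine square root) transformation. The function names box_cox and angular are hypothetical, and the exact definitions used in the original procedures are not shown on this page; some texts, for example, scale the angular transformation by a factor of 2.

```r
# Illustrative sketches, not the original procedures.

# Box-Cox transformation for strictly positive data:
# (x^lambda - 1) / lambda for lambda != 0, and log(x) for lambda = 0.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) {
    log(x)
  } else {
    (x^lambda - 1) / lambda
  }
}

# Angular transformation for proportions in [0, 1] (assumed unscaled form).
angular <- function(p) {
  stopifnot(all(p >= 0 & p <= 1))
  asin(sqrt(p))
}

box_cox(c(1, 2, 4), lambda = 1)   # lambda = 1 only shifts the data by -1
box_cox(c(1, 2, 4), lambda = 0)   # lambda = 0 gives the natural log
angular(c(0, 0.5, 1))
```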
Chapter 13 – Regression Models II: Mixed Data Types
- R code for odds ratio characterizations
- The apple tree/bitter pit dataset (bitterpit.csv) was generated from Table 59.1 in the book Data by D.F. Andrews and A.M. Herzberg (Springer-Verlag, New York, 1985).
Chapter 14 – Characterizing Analysis Results
Chapter 15 – Regression Models III: Diagnostics and Refinements
Chapter 16 – Dealing with Missing Data

