- Statistical Analysis and Data Display An Intermediate Course with Examples in S-PLUS, R, and SAS.pdf
PDF, high-definition text version
1 Introduction and Motivation
1.1 Statistics in Context
1.2 Examples of Uses of Statistics
1.2.1 Investigation of Salary Discrimination
1.2.2 Measuring Body Fat
1.2.3 Minimizing Film Thickness
1.2.4 Surveys
1.2.5 Bringing Pharmaceutical Products to Market
1.3 The Rest of the Book
1.3.1 Fundamentals
1.3.2 Linear Models
1.3.3 Other Techniques
1.3.4 New Graphical Display Techniques
2 Data and Statistics
2.1 Types of Data
2.2 Data Display and Calculation
2.2.1 Presentation
2.2.2 Rounding
2.3 Importing Data
2.3.1 S-PLUS
2.3.2 SAS
2.3.3 Data Rearrangement
2.4 Analysis with Missing Data
2.4.1 Missing Data in S-PLUS
2.4.2 Missing Data in SAS
2.5 Tables and Graphs
2.6 Files for Statistical Analysis and Data Display (HH)
2.6.1 Datasets
2.6.2 Code, Transcripts, and Figures
2.6.3 Functions and Macros
2.6.4 Software
3 Statistics Concepts
3.1 A Brief Introduction to Probability
3.2 Random Variables and Probability Distributions
3.2.1 Discrete Versus Continuous Probability Distributions
3.2.2 Displaying Probability Distributions
3.3 Concepts That Are Used When Discussing Distributions
3.3.1 Expectation and Variance of Random Variables
3.3.2 Median of Random Variables
3.3.3 Symmetric and Skewed Distributions
3.3.4 Displays of Univariate Data
3.3.5 Multivariate Distributions-Covariance and Correlation
3.4 Three Probability Distributions
3.4.1 The Binomial Distribution
3.4.2 The Normal Distribution
3.4.3 The (Student's) t Distribution
3.5 Sampling Distributions
3.6 Estimation
3.6.1 Statistical Models
3.6.2 Point and Interval Estimators
3.6.3 Criteria for Point Estimators
3.6.4 Confidence Interval Estimation
3.6.5 Example-Confidence Interval on the Mean μ of a Population Having Known Standard Deviation
3.6.6 Example-One-Sided Confidence Intervals
3.7 Hypothesis Testing
3.8 Examples of Statistical Tests
3.9 Power and Operating Characteristic (O.C.) Curves
3.10 Sampling
3.10.1 Simple Random Sampling
3.10.2 Stratified Random Sampling
3.10.3 Cluster Random Sampling
3.10.4 Systematic Random Sampling
3.10.5 Standard Errors of Sample Means
3.10.6 Sources of Bias in Samples
3.11 Exercises
4 Graphs
4.1 Definition
4.2 Example-Ecological Correlation
4.3 Scatterplots
4.4 Scatterplot Matrix
4.5 Example-Life Expectancy
4.6 Scatterplot Matrices-Continued
4.7 Data Transformations
4.8 Life Expectancy Example-Continued
4.9 SAS Graphics
4.10 Exercises
5 Introductory Inference
5.1 Normal (z) Intervals and Tests
5.1.1 Test of a Hypothesis Concerning the Mean of a Population Having Known Standard Deviation
5.1.2 Confidence Intervals for Unknown Population Proportion p
5.1.3 Tests on an Unknown Population Proportion p
5.1.4 Example-One-Sided Hypothesis Test Concerning a Population Proportion
5.2 t-intervals and Tests for the Mean of a Population Having Unknown Standard Deviation
5.3 Confidence Interval on the Variance or Standard Deviation of a Normal Population
5.4 Comparisons of Two Populations Based on Independent Samples
5.4.1 Confidence Intervals on the Difference Between Two Population Proportions
5.4.2 Confidence Interval on the Difference Between Two Means
5.4.3 Tests Comparing Two Population Means When the Samples Are Independent
5.4.4 Comparing the Variances of Two Normal Populations
5.5 Paired Data
5.6 Sample Size Determination
5.6.1 Sample Size for Estimation
5.6.2 Sample Size for Hypothesis Testing
5.7 Goodness of Fit
5.7.1 Chi-square Goodness-of-Fit Test
5.7.2 Example-Test of Goodness-of-Fit to a Discrete Uniform Distribution
5.7.3 Example-Test of Goodness-of-Fit to a Binomial Distribution
5.8 Normal Probability Plots and Quantile Plots
5.8.1 Normal Probability Plots
5.8.2 Example-Comparing t-Distributions
5.9 Kolmogorov-Smirnov Goodness-of-Fit Tests
5.10 Maximum Likelihood
5.10.1 Maximum Likelihood Estimation
5.10.2 Likelihood Ratio Tests
5.11 Exercises
6 One-Way Analysis of Variance
6.1 Example-Catalyst Data
6.2 Fixed Effects
6.3 Multiple Comparisons-Tukey Procedure for Comparing All Pairs of Means
6.4 Random Effects
6.5 Expected Mean Squares (EMS)
6.6 Example-Catalyst Data-Continued
6.7 Example-Batch Data
6.8 Example-Turkey Data
6.8.1 Analysis
6.8.2 Interpretation
6.8.3 Specification of Analysis
6.9 Contrasts
6.9.1 Mathematics of Contrasts
6.9.2 Scaling
6.10 Tests of Homogeneity of Variance
6.11 Exercises
6.A Appendix: Computation for the Analysis of Variance
6.A.1 Computing Notes
6.A.2 Computation
7 Multiple Comparisons
7.1 Multiple Comparison Procedures
7.1.1 Bonferroni Method
7.1.2 Tukey Procedure for All Pairwise Comparisons
7.1.3 The Dunnett Procedure for Comparing One Mean with All Others
7.1.4 Simultaneously Comparing All Possible Contrasts-Scheffé and Extended Tukey
7.2 The Mean-Mean Multiple Comparisons Display (MMC Plot)
7.2.1 Difficulties with Standard Displays
7.2.2 Hsu and Peruggia's Mean-Mean Scatterplot
7.2.3 Extensions of the Mean-Mean Display to Arbitrary Contrasts
7.2.4 Display of an Orthogonal Basis Set of Contrasts
7.2.5 Hsu and Peruggia's Pulmonary Example
7.3 Exercises
8 Linear Regression by Least Squares
8.1 Introduction
8.2 Example-Body Fat Data
8.3 Simple Linear Regression
8.3.1 Algebra
8.3.2 Normal Distribution Theory
8.3.3 Calculations
8.3.4 Residual Mean Square in Regression Printout
8.3.5 New Observations
8.4 Diagnostics
8.5 Graphics
8.6 Exercises
8.A Appendix: Computation for Regression Analysis
8.A.1 S-PLUS Functions
8.A.2 SAS Macros and Procs
9 Multiple Regression-More Than One Predictor
9.1 Regression with Two Predictors-Least-Squares Geometry
9.2 Multiple Regression-Algebra
9.2.1 The Hat Matrix and Leverage
9.3 Multiple Regression-Two-X Analysis
9.4 Geometry of Multiple Regression
9.5 Programming
9.5.1 Model Specification
9.5.2 Printout Idiosyncrasies
9.6 Example-Albuquerque Home Price Data
9.7 Partial F-Tests
9.8 Polynomial Models
9.9 Models Without a Constant Term
9.10 Prediction
9.11 Example-Longley Data
9.12 Collinearity
9.13 Variable Selection
9.13.1 Manual Use of the Stepwise Philosophy
9.13.2 Automated Stepwise Regression
9.13.3 Automated Stepwise Modeling of the Longley Data
9.14 Residual Plots
9.14.1 Partial Residuals
9.14.2 Partial Residual Plots
9.14.3 Partial Correlation
9.14.4 Added Variable Plots
9.14.5 Interpretation of Residual Plots
9.15 Example-U.S. Air Pollution Data
9.16 Exercises
10 Multiple Regression-Dummy Variables and Contrasts
10.1 Dummy (Indicator) Variables
10.2 Example-Height and Weight
10.3 Equivalence of Linear Independent X-Variables for Regression
10.4 Polynomial Contrasts and Orthogonal Polynomials
10.4.1 Specification and Interpretation of Interaction Terms
10.5 Analysis Using a Concomitant Variable (Analysis of Covariance)
10.6 Example-Hot Dog Data
10.6.1 One-Way ANOVA
10.6.2 Concomitant Explanatory Variable
10.6.3 Tests of Equality of Regression Lines
10.7 ancova Function
10.8 Exercises
11 Multiple Regression-Regression Diagnostics
11.1 Example-Rent Data
11.1.1 Rent Levels
11.1.2 Alfalfa Rent Relative to Other Rent
11.2 Checks on Model Assumptions
11.2.1 Scatterplot Matrix
11.2.2 Residual Plots
11.3 Case Statistics
11.3.1 Leverage
11.3.2 Deleted Standard Deviation
11.3.3 Standardized and Studentized Deleted Residuals
11.3.4 Cook's Distance
11.3.5 DFFITS
11.3.6 DFBETAS
11.3.7 Calculation of Regression Diagnostics
11.4 Exercises
12 Two-Way Analysis of Variance
12.1 Example-Display Panel Data
12.2 Statistical Model
12.3 Main Effects and Interactions
12.4 Two-Way Interaction Plot
12.5 Sums of Squares in the Two-Way ANOVA Table
12.6 Treatment and Blocking Factors
12.7 Fixed and Random Effects
12.8 Randomized Complete Block Designs
12.9 Example-The Blood Plasma Data
12.10 Random Effects Models and Mixed Models
12.11 Introduction to Nesting
12.11.1 Example-Workstation Data
12.12 Example-Display Panel Data-Continued
12.13 Example-The Rhizobium Data
12.13.1 First Rhizobium Experiment: Alfalfa Plants
12.13.2 Second Rhizobium Experiment: Clover Plants
12.13.3 Initial Plots
12.13.4 Alfalfa Analysis
12.13.5 Clover Analysis
12.14 Models Without Interaction
12.15 Example-Animal Feed Data
12.16 Exercises
12.A Appendix: Computation for the Analysis of Variance
13 Design of Experiments-Factorial Designs
13.1 A Three-Way ANOVA-Muscle Data
13.3 Simple Effects for Interaction Analyses
13.4 Nested Factorial Experiment
13.5 Specification of Model Formulas
13.6 Sequential and Conditional Tests
13.7 Exercises
13.A Appendix: Orientation for Boxplots
14 Design of Experiments-Complex Designs
14.1 Confounding
14.2 Split Plot Designs
14.3 Example-Yates Oat Data
14.4 Introduction to Fractional Factorial Designs
14.5 Introduction to Crossover Designs
14.6 Example-Apple Tree Data
14.7 Example-testscore.dat
14.8 The Tukey One Degree of Freedom for Nonadditivity
14.9 Exercises
15 Bivariate Statistics-Discrete Data
15.1 Two-Dimensional Contingency Tables-Chi-Square Analysis
15.2 Two-Dimensional Contingency Tables-Fisher's Exact Test
15.3 Simpson's Paradox
15.4 Relative Risk and Odds Ratios
15.5 Retrospective and Prospective Studies
15.6 Mantel-Haenszel Test
15.7 Example-Salk Polio Vaccine
15.8 Exercises
16 Nonparametrics
16.1 Introduction
16.2 Sign Test for the Location of a Single Population
16.3 Comparing the Locations of Paired Populations
16.4 Mann-Whitney Test for Two Independent Samples
16.5 Kruskal-Wallis Test for Comparing the Locations of at Least Three Populations
16.6 Exercises
17 Logistic Regression
17.1 Example-The Space Shuttle Challenger Disaster
17.2 Estimation
17.3 Example-Budworm Data
17.4 Example-Lymph Nodes
17.4.1 Data
17.5 Numerical Printout
17.6 Graphics
17.7 Model Specification
17.8 Fitting Models When the Response Is a Sample Proportion
17.9 LogXact
17.10 Exercises
18 Time Series Analysis
18.1 Introduction
18.2 The ARIMA Approach to Time Series Modeling
18.3 Autocorrelation
18.4 Analysis Steps
18.5 Some Algebraic Development, Including Forecasting
18.6 Graphical Displays for Time Series Analysis
18.7 Models with Seasonal Components
18.8 Example of a Seasonal Model-The Monthly co2 Data
18.9 Exercises
18.A Appendix: Graphical Displays for Time Series Analysis
18.A.1 Characteristics of This Presentation of the Time Series Plot
18.A.2 Characteristics of This Presentation of the Sample ACF and PACF Plots
18.A.3 Construction of Graphical Displays
18.A.4 User Functions Written for S-PLUS