Data Preparation for Data Mining Using SAS

Posted by: arpanet | Category: SAS Software Training

CHAPTER 1 INTRODUCTION
1.1 The Data Mining Process
1.2 Methodologies of Data Mining
1.3 The Mining View
1.4 The Scoring View
1.5 Notes on Data Mining Software
CHAPTER 2 TASKS AND DATA FLOW
2.1 Data Mining Tasks
2.2 Data Mining Competencies
2.3 The Data Flow
2.4 Types of Variables
2.5 The Mining View and the Scoring View
2.6 Steps of Data Preparation
CHAPTER 3 REVIEW OF DATA MINING MODELING TECHNIQUES
3.1 Introduction
3.2 Regression Models
3.2.1 Linear Regression
3.2.2 Logistic Regression
3.3 Decision Trees
3.4 Neural Networks
3.5 Cluster Analysis
3.6 Association Rules
3.7 Time Series Analysis
3.8 Support Vector Machines
CHAPTER 4 SAS MACROS: A QUICK START
4.1 Introduction: Why Macros?
4.2 The Basics: The Macro and Its Variables
4.3 Doing Calculations
4.4 Programming Logic
4.5 Working with Strings
4.6 Macros That Call Other Macros
4.7 Common Macro Patterns and Caveats
4.7.1 Generating a List of Macro Variables
4.7.2 Double Coding
4.7.3 Using Local Variables
4.7.4 From a DATA Step to Macro Variables
4.8 Where to Go From Here
CHAPTER 5 DATA ACQUISITION AND INTEGRATION
5.1 Introduction
5.2 Sources of Data
5.2.1 Operational Systems
5.2.2 Data Warehouses and Data Marts
5.2.3 OLAP Applications
5.2.4 Surveys
5.2.5 Household and Demographic Databases
5.3 Variable Types
5.3.1 Nominal Variables
5.3.2 Ordinal Variables
5.3.3 Real Measures
5.4 Data Rollup
5.5 Rollup with Sums, Averages, and Counts
5.6 Calculation of the Mode
5.7 Data Integration
5.7.1 Merging
5.7.2 Concatenation
CHAPTER 6 INTEGRITY CHECKS
6.1 Introduction
6.2 Comparing Datasets
6.3 Dataset Schema Checks
6.3.1 Dataset Variables
6.3.2 Variable Types
6.4 Nominal Variables
6.4.1 Testing the Presence of All Categories
6.4.2 Testing the Similarity of Ratios
6.5 Continuous Variables
6.5.1 Comparing Measures from Two Datasets
6.5.2 Comparing the Means, Standard Deviations, and Variance
6.5.3 The Confidence-Level Calculations Assumptions
6.5.4 Comparison of Other Measures
CHAPTER 7 EXPLORATORY DATA ANALYSIS
7.1 Introduction
7.2 Common EDA Procedures
7.3 Univariate Statistics
7.4 Variable Distribution
7.5 Detection of Outliers
7.5.1 Identification of Outliers Using Ranges
7.5.2 Identification of Outliers Using Model Fitting
7.5.3 Identification of Outliers Using Clustering
7.5.4 Notes on Outliers
7.6 Testing Normality
7.7 Cross-tabulation
7.8 Investigating Data Structures
CHAPTER 8 SAMPLING AND PARTITIONING
8.1 Introduction
8.2 Contents of Samples
8.3 Random Sampling
8.3.1 Constraints on Sample Size
8.3.2 SAS Implementation
8.4 Balanced Sampling
8.4.1 Constraints on Sample Size
8.4.2 SAS Implementation
8.5 Minimum Sample Size
8.5.1 Continuous and Binary Variables
8.5.2 Sample Size for a Nominal Variable
8.6 Checking Validity of Sample
CHAPTER 9 DATA TRANSFORMATIONS
9.1 Raw and Analytical Variables
9.2 Scope of Data Transformations
9.3 Creation of New Variables
9.3.1 Renaming Variables
9.3.2 Automatic Generation of Simple Analytical Variables
9.4 Mapping of Nominal Variables
9.5 Normalization of Continuous Variables
9.6 Changing the Variable Distribution
9.6.1 Rank Transformations
9.6.2 Box–Cox Transformations
9.6.3 Spreading the Histogram
CHAPTER 10 BINNING AND REDUCTION OF CARDINALITY
10.1 Introduction
10.2 Cardinality Reduction
10.2.1 The Main Questions
10.2.2 Structured Grouping Methods
10.2.3 Splitting a Dataset
10.2.4 The Main Algorithm
10.2.5 Reduction of Cardinality Using Gini Measure
10.2.6 Limitations and Modifications
10.3 Binning of Continuous Variables
10.3.1 Equal-Width Binning
10.3.2 Equal-Height Binning
10.3.3 Optimal Binning
CHAPTER 11 TREATMENT OF MISSING VALUES
11.1 Introduction
11.2 Simple Replacement
11.2.1 Nominal Variables
11.2.2 Continuous and Ordinal Variables
11.3 Imputing Missing Values
11.3.1 Basic Issues in Multiple Imputation
11.3.2 Patterns of Missingness
11.4 Imputation Methods and Strategy
11.5 SAS Macros for Multiple Imputation
11.5.1 Extracting the Pattern of Missing Values
11.5.2 Reordering Variables
11.5.3 Checking Missing Pattern Status
11.5.4 Imputing to a Monotone Missing Pattern
11.5.5 Imputing Continuous Variables
11.5.6 Combining Imputed Values of Continuous Variables
11.5.7 Imputing Nominal and Ordinal Variables
11.5.8 Combining Imputed Values of Ordinal and Nominal Variables
11.6 Predicting Missing Values
CHAPTER 12 PREDICTIVE POWER AND VARIABLE REDUCTION I
12.1 Introduction
12.2 Metrics of Predictive Power
12.3 Methods of Variable Reduction
12.4 Variable Reduction: Before or During Modeling
CHAPTER 13 ANALYSIS OF NOMINAL AND ORDINAL VARIABLES
13.1 Introduction
13.2 Contingency Tables
13.3 Notation and Definitions
13.4 Contingency Tables for Binary Variables
13.4.1 Difference in Proportion
13.4.2 The Odds Ratio
13.4.3 The Pearson Statistic
13.4.4 The Likelihood Ratio Statistic
13.5 Contingency Tables for Multicategory Variables
13.6 Analysis of Ordinal Variables
13.7 Implementation Scenarios
CHAPTER 14 ANALYSIS OF CONTINUOUS VARIABLES
14.1 Introduction
14.2 When Is Binning Necessary?
14.3 Measures of Association
14.3.1 Notation
14.3.2 The F-Test
14.3.3 Gini and Entropy Variances
14.4 Correlation Coefficients
CHAPTER 15 PRINCIPAL COMPONENT ANALYSIS
15.1 Introduction
15.2 Mathematical Formulations
15.3 Implementing and Using PCA
15.4 Comments on Using PCA
15.4.1 Number of Principal Components
15.4.2 Success of PCA
15.4.3 Nominal Variables
15.4.4 Dataset Size and Performance
CHAPTER 16 FACTOR ANALYSIS
16.1 Introduction
16.1.1 Basic Model
16.1.2 Factor Rotation
16.1.3 Estimation Methods
16.1.4 Variable Standardization
16.1.5 Illustrative Example
16.2 Relationship Between PCA and FA
16.3 Implementation of Factor Analysis
16.3.1 Obtaining the Factors
16.3.2 Factor Scores
CHAPTER 17 PREDICTIVE POWER AND VARIABLE REDUCTION II
17.1 Introduction
17.2 Data with Binary Dependent Variables
17.2.1 Notation
17.2.2 Nominal Independent Variables
17.2.3 Numeric Nominal Independent Variables
17.2.4 Ordinal Independent Variables
17.2.5 Continuous Independent Variables
17.3 Data with Continuous Dependent Variables
17.3.1 Nominal Independent Variables
17.3.2 Ordinal Independent Variables
17.3.3 Continuous Independent Variables
17.4 Variable Reduction Strategies
CHAPTER 18 PUTTING IT ALL TOGETHER
18.1 Introduction
18.2 The Process of Data Preparation
18.3 Case Study: The Bookstore
18.3.1 The Business Problem
18.3.2 Project Tasks
18.3.3 The Data Preparation Code
APPENDIX LISTING OF SAS MACROS
A.1 Copyright and Software License
A.2 Dependencies between Macros
A.3 Data Acquisition and Integration
A.3.1 Macro TBRollup()
A.3.2 Macro ABRollup()
A.3.3 Macro VarMode()
A.3.4 Macro MergeDS()
A.3.5 Macro ContcatDS()
A.4 Integrity Checks
A.4.1 Macro SchCompare()
A.4.2 Macro CatCompare()
A.4.3 Macro ChiSample()
A.4.4 Macro VarUnivar1()
A.4.5 Macro CVLimits()
A.4.6 Macro CompareTwo()
A.5 Exploratory Data Analysis
A.5.1 Macro Extremes1()
A.5.2 Macro Extremes2()
A.5.3 Macro RobRegOL()
A.5.4 Macro ClustOL()
A.6 Sampling and Partitioning
A.6.1 Macro RandomSample()
A.6.2 Macro R2Samples()
A.6.3 Macro B2Samples()
A.7 Data Transformations
A.7.1 Macro NorList()
A.7.2 Macro NorVars()
A.7.3 Macro AutoInter()
A.7.4 Macro CalcCats()
A.7.5 Macro MappCats()
A.7.6 Macro CalcLL()
A.7.7 Macro BoxCox()
A.8 Binning and Reduction of Cardinality
A.8.1 Macro GRedCats()
A.8.2 Macro GSplit()
A.8.3 Macro AppCatRed()
A.8.4 Macro BinEqW()
A.8.5 Macro BinEqW2()
A.8.6 Macro BinEqW3()
A.8.7 Macro BinEqH()
A.8.8 Macro GBinBDV()
A.8.9 Macro AppBins()
A.9 Treatment of Missing Values
A.9.1 Macro ModeCat()
A.9.2 Macro SubCat()
A.9.3 Macro SubCont()
A.9.4 Macro MissPatt()
A.9.5 Macro ReMissPat()
A.9.6 Macro CheckMono()
A.9.7 Macro MakeMono()
A.9.8 Macro ImpReg()
A.9.9 Macro AvgImp()
A.9.10 Macro NORDImp()
A.10 Analysis of Nominal and Ordinal Variables
A.10.1 Macro ContinMat()
A.10.2 Macro PropDiff()
A.10.3 Macro OddsRatio()
A.10.4 Macro PearChi()
A.10.5 Macro LikeRatio()
A.10.6 Macro ContPear()
A.10.7 Macro ContSpear()
A.10.8 Macro ContnAna()
A.11 Analysis of Continuous Variables
A.11.1 Macro ContGrF()
A.11.2 Macro VarCorr()
A.12 Principal Component Analysis
A.12.1 Macro PrinComp1()
A.12.2 Macro PrinComp2()
A.13 Factor Analysis
A.13.1 Macro Factor()
A.13.2 Macro FactScore()
A.13.3 Macro FactRen()
A.14 Predictive Power and Variable Reduction II
A.14.1 Macro GiniCatBDV()
A.14.2 Macro EntCatBDV()
A.14.3 Macro PearSpear()
A.14.4 Macro PowerCatBDV()
A.14.5 Macro PowerOrdBDV()
A.14.6 Macro PowerCatNBDV()
A.15 Other Macros
A.15.1 ListToCol()
Bibliography
Index
About the Author

Forum thread for this post: https://bbs.pinggu.org/thread-928349-1-1.html
