Original poster: rjl248

Data Preparation for Data Mining Using SAS

243293.rar (2.35 MB, requires 10 forum coins). This attachment contains:
  • 0123735777.pdf
Rated by squarekiss: money +30, "Good article" (2008-9-3 13:26:16)

#2
rjl248, posted 2008-9-3 12:57:00
Contents

CHAPTER 1 INTRODUCTION
1.1 The Data Mining Process
1.2 Methodologies of Data Mining
1.3 The Mining View
1.4 The Scoring View
1.5 Notes on Data Mining Software

CHAPTER 2 TASKS AND DATA FLOW
2.1 Data Mining Tasks
2.2 Data Mining Competencies
2.3 The Data Flow
2.4 Types of Variables
2.5 The Mining View and the Scoring View
2.6 Steps of Data Preparation

CHAPTER 3 REVIEW OF DATA MINING MODELING TECHNIQUES
3.1 Introduction
3.2 Regression Models
3.2.1 Linear Regression
3.2.2 Logistic Regression
3.3 Decision Trees
3.4 Neural Networks
3.5 Cluster Analysis
3.6 Association Rules
3.7 Time Series Analysis
3.8 Support Vector Machines

CHAPTER 4 SAS MACROS: A QUICK START
4.1 Introduction: Why Macros?
4.2 The Basics: The Macro and Its Variables
4.3 Doing Calculations
4.4 Programming Logic
4.5 Working with Strings
4.6 Macros That Call Other Macros
4.7 Common Macro Patterns and Caveats
4.7.1 Generating a List of Macro Variables
4.7.2 Double Coding
4.7.3 Using Local Variables
4.7.4 From a DATA Step to Macro Variables
4.8 Where to Go From Here

CHAPTER 5 DATA ACQUISITION AND INTEGRATION
5.1 Introduction
5.2 Sources of Data
5.2.1 Operational Systems
5.2.2 Data Warehouses and Data Marts
5.2.3 OLAP Applications
5.2.4 Surveys
5.2.5 Household and Demographic Databases
5.3 Variable Types
5.3.1 Nominal Variables
5.3.2 Ordinal Variables
5.3.3 Real Measures
5.4 Data Rollup
5.5 Rollup with Sums, Averages, and Counts
5.6 Calculation of the Mode
5.7 Data Integration
5.7.1 Merging
5.7.2 Concatenation

CHAPTER 6 INTEGRITY CHECKS
6.1 Introduction
6.2 Comparing Datasets
6.3 Dataset Schema Checks
6.3.1 Dataset Variables
6.3.2 Variable Types
6.4 Nominal Variables
6.4.1 Testing the Presence of All Categories
6.4.2 Testing the Similarity of Ratios
6.5 Continuous Variables
6.5.1 Comparing Measure from Two Datasets
6.5.2 Comparing the Means, Standard Deviations, and Variance
6.5.3 The Confidence-Level Calculations Assumptions
6.5.4 Comparison of Other Measures

CHAPTER 7 EXPLORATORY DATA ANALYSIS
7.1 Introduction
7.2 Common EDA Procedures
7.3 Univariate Statistics
7.4 Variable Distribution
7.5 Detection of Outliers
7.5.1 Identification of Outliers Using Ranges
7.5.2 Identification of Outliers Using Model Fitting
7.5.3 Identification of Outliers Using Clustering
7.5.4 Notes on Outliers
7.6 Testing Normality
7.7 Cross-tabulation
7.8 Investigating Data Structures

CHAPTER 8 SAMPLING AND PARTITIONING
8.1 Introduction
8.2 Contents of Samples
8.3 Random Sampling
8.3.1 Constraints on Sample Size
8.3.2 SAS Implementation
8.4 Balanced Sampling
8.4.1 Constraints on Sample Size
8.4.2 SAS Implementation
8.5 Minimum Sample Size
8.5.1 Continuous and Binary Variables
8.5.2 Sample Size for a Nominal Variable
8.6 Checking Validity of Sample

CHAPTER 9 DATA TRANSFORMATIONS
9.1 Raw and Analytical Variables
9.2 Scope of Data Transformations
9.3 Creation of New Variables
9.3.1 Renaming Variables
9.3.2 Automatic Generation of Simple Analytical Variables
9.4 Mapping of Nominal Variables
9.5 Normalization of Continuous Variables
9.6 Changing the Variable Distribution
9.6.1 Rank Transformations
9.6.2 Box–Cox Transformations
9.6.3 Spreading the Histogram

CHAPTER 10 BINNING AND REDUCTION OF CARDINALITY
10.1 Introduction
10.2 Cardinality Reduction
10.2.1 The Main Questions
10.2.2 Structured Grouping Methods
10.2.3 Splitting a Dataset
10.2.4 The Main Algorithm
10.2.5 Reduction of Cardinality Using Gini Measure
10.2.6 Limitations and Modifications
10.3 Binning of Continuous Variables
10.3.1 Equal-Width Binning
10.3.2 Equal-Height Binning
10.3.3 Optimal Binning

CHAPTER 11 TREATMENT OF MISSING VALUES
11.1 Introduction
11.2 Simple Replacement
11.2.1 Nominal Variables
11.2.2 Continuous and Ordinal Variables
11.3 Imputing Missing Values
11.3.1 Basic Issues in Multiple Imputation
11.3.2 Patterns of Missingness
11.4 Imputation Methods and Strategy
11.5 SAS Macros for Multiple Imputation
11.5.1 Extracting the Pattern of Missing Values
11.5.2 Reordering Variables
11.5.3 Checking Missing Pattern Status
11.5.4 Imputing to a Monotone Missing Pattern
11.5.5 Imputing Continuous Variables
11.5.6 Combining Imputed Values of Continuous Variables
11.5.7 Imputing Nominal and Ordinal Variables
11.5.8 Combining Imputed Values of Ordinal and Nominal Variables
11.6 Predicting Missing Values

CHAPTER 12 PREDICTIVE POWER AND VARIABLE REDUCTION I
12.1 Introduction
12.2 Metrics of Predictive Power
12.3 Methods of Variable Reduction
12.4 Variable Reduction: Before or During Modeling

CHAPTER 13 ANALYSIS OF NOMINAL AND ORDINAL VARIABLES
13.1 Introduction
13.2 Contingency Tables
13.3 Notation and Definitions
13.4 Contingency Tables for Binary Variables
13.4.1 Difference in Proportion
13.4.2 The Odds Ratio
13.4.3 The Pearson Statistic
13.4.4 The Likelihood Ratio Statistic
13.5 Contingency Tables for Multicategory Variables
13.6 Analysis of Ordinal Variables
13.7 Implementation Scenarios

CHAPTER 14 ANALYSIS OF CONTINUOUS VARIABLES
14.1 Introduction
14.2 When Is Binning Necessary?
14.3 Measures of Association
14.3.1 Notation
14.3.2 The F-Test
14.3.3 Gini and Entropy Variances
14.4 Correlation Coefficients

CHAPTER 15 PRINCIPAL COMPONENT ANALYSIS
15.1 Introduction
15.2 Mathematical Formulations
15.3 Implementing and Using PCA
15.4 Comments on Using PCA
15.4.1 Number of Principal Components
15.4.2 Success of PCA
15.4.3 Nominal Variables
15.4.4 Dataset Size and Performance

CHAPTER 16 FACTOR ANALYSIS
16.1 Introduction
16.1.1 Basic Model
16.1.2 Factor Rotation
16.1.3 Estimation Methods
16.1.4 Variable Standardization
16.1.5 Illustrative Example
16.2 Relationship Between PCA and FA
16.3 Implementation of Factor Analysis
16.3.1 Obtaining the Factors
16.3.2 Factor Scores

CHAPTER 17 PREDICTIVE POWER AND VARIABLE REDUCTION II
17.1 Introduction
17.2 Data with Binary Dependent Variables
17.2.1 Notation
17.2.2 Nominal Independent Variables
17.2.3 Numeric Nominal Independent Variables
17.2.4 Ordinal Independent Variables
17.2.5 Continuous Independent Variables
17.3 Data with Continuous Dependent Variables
17.3.1 Nominal Independent Variables
17.3.2 Ordinal Independent Variables
17.3.3 Continuous Independent Variables
17.4 Variable Reduction Strategies

CHAPTER 18 PUTTING IT ALL TOGETHER
18.1 Introduction
18.2 The Process of Data Preparation
18.3 Case Study: The Bookstore
18.3.1 The Business Problem
18.3.2 Project Tasks
18.3.3 The Data Preparation Code

APPENDIX LISTING OF SAS MACROS
A.1 Copyright and Software License
A.2 Dependencies between Macros
A.3 Data Acquisition and Integration
A.3.1 Macro TBRollup()
A.3.2 Macro ABRollup()
A.3.3 Macro VarMode()
A.3.4 Macro MergeDS()
A.3.5 Macro ContcatDS()
A.4 Integrity Checks
A.4.1 Macro SchCompare()
A.4.2 Macro CatCompare()
A.4.3 Macro ChiSample()
A.4.4 Macro VarUnivar1()
A.4.5 Macro CVLimits()
A.4.6 Macro CompareTwo()
A.5 Exploratory Data Analysis
A.5.1 Macro Extremes1()
A.5.2 Macro Extremes2()
A.5.3 Macro RobRegOL()
A.5.4 Macro ClustOL()
A.6 Sampling and Partitioning
A.6.1 Macro RandomSample()
A.6.2 Macro R2Samples()
A.6.3 Macro B2Samples()
A.7 Data Transformations
A.7.1 Macro NorList()
A.7.2 Macro NorVars()
A.7.3 Macro AutoInter()
A.7.4 Macro CalcCats()
A.7.5 Macro MappCats()
A.7.6 Macro CalcLL()
A.7.7 Macro BoxCox()
A.8 Binning and Reduction of Cardinality
A.8.1 Macro GRedCats()
A.8.2 Macro GSplit()
A.8.3 Macro AppCatRed()
A.8.4 Macro BinEqW()
A.8.5 Macro BinEqW2()
A.8.6 Macro BinEqW3()
A.8.7 Macro BinEqH()
A.8.8 Macro GBinBDV()
A.8.9 Macro AppBins()
A.9 Treatment of Missing Values
A.9.1 Macro ModeCat()
A.9.2 Macro SubCat()
A.9.3 Macro SubCont()
A.9.4 Macro MissPatt()
A.9.5 Macro ReMissPat()
A.9.6 Macro CheckMono()
A.9.7 Macro MakeMono()
A.9.8 Macro ImpReg()
A.9.9 Macro AvgImp()
A.9.10 Macro NORDImp()
A.10 Analysis of Nominal and Ordinal Variables
A.10.1 Macro ContinMat()
A.10.2 Macro PropDiff()
A.10.3 Macro OddsRatio()
A.10.4 Macro PearChi()
A.10.5 Macro LikeRatio()
A.10.6 Macro ContPear()
A.10.7 Macro ContSpear()
A.10.8 Macro ContnAna()
A.11 Analysis of Continuous Variables
A.11.1 Macro ContGrF()
A.11.2 Macro VarCorr()
A.12 Principal Component Analysis
A.12.1 Macro PrinComp1()
A.12.2 Macro PrinComp2()
A.13 Factor Analysis
A.13.1 Macro Factor()
A.13.2 Macro FactScore()
A.13.3 Macro FactRen()
A.14 Predictive Power and Variable Reduction II
A.14.1 Macro GiniCatBDV()
A.14.2 Macro EntCatBDV()
A.14.3 Macro PearSpear()
A.14.4 Macro PowerCatBDV()
A.14.5 Macro PowerOrdBDV()
A.14.6 Macro PowerCatNBDV()
A.15 Other Macros
A.15.1 ListToCol()

Bibliography
Index
About the Author

#3
edragon1983, posted 2008-9-3 16:36:00
Great book! Worth buying, and well worth the price!

#4
eijuhz, posted 2008-9-3 17:50:00
Awarded a cash reward of 300.

#5
paul0223, posted 2008-9-3 18:05:00
So many macros; it looks quite useful. I bought it...

#6
rjl248, posted 2008-9-3 18:06:00
Thank you, moderator.

#7
paul0223, posted 2008-9-3 18:35:00
Unfortunately I only have 10; I'll come back once I've earned some money...

#8
myxixi, posted 2008-9-3 23:14:00
This should be a very good book. Thanks.

#9
firstknife, posted 2008-9-4 14:50:00
Looks very tempting, but my wallet is thin, so I don't quite dare to buy it!

#10
firstknife, posted 2008-9-4 15:14:00
I really couldn't resist; I bought it after all so I can read it right away!
