Education 231C
Applied Categorical & Nonnormal Data Analysis

Model Fit

--------------------------------------------------------------------------------

Note: Although we will be discussing and demonstrating model fit in the context of logistic regression, many of the concepts and indices apply to other categorical and non-normal models.

First Example

    use http://www.philender.com/courses/data/honors, clear

    logit honors lang math science female

    Iteration 0:   log likelihood = -115.64441
    Iteration 1:   log likelihood = -78.757483
    Iteration 2:   log likelihood =  -74.10976
    Iteration 3:   log likelihood = -73.650266
    Iteration 4:   log likelihood = -73.642805
    Iteration 5:   log likelihood = -73.642803

    Logit estimates                                   Number of obs   =        200
                                                      LR chi2(4)      =      84.00
                                                      Prob > chi2     =     0.0000
    Log likelihood = -73.642803                       Pseudo R2       =     0.3632

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            lang |   .0631137   .0281071     2.25   0.025     .0080248    .1182026
            math |   .1113485   .0337503     3.30   0.001      .045199    .1774979
         science |   .0568872   .0326402     1.74   0.081    -.0070864    .1208607
          female |   1.362197   .4605193     2.96   0.003     .4595958    2.264798
           _cons |  -14.57728   2.156767    -6.76   0.000    -18.80447    -10.3501
    ------------------------------------------------------------------------------

Note: The pseudo-R2 given above is McFadden's pseudo-R2, which we will discuss later.

There are several tools built into Stata that deal with fit.

    lfit

    Logistic model for honors, goodness-of-fit test

    number of covariate patterns =       199
               Pearson chi2(194) =    164.86
                     Prob > chi2 =    0.9365

Here there are 199 covariate patterns for 200 observations. Hosmer and Lemeshow suggest that when the number of covariate patterns is large relative to the number of observations, their index of fit is more appropriate.

    lfit, group(10)

    Logistic model for honors, goodness-of-fit test
    (Table collapsed on quantiles of estimated probabilities)

      number of observations =       200
            number of groups =        10
     Hosmer-Lemeshow chi2(8) =      8.25
                 Prob > chi2 =    0.4095

Another way to look at fit is to examine the classification table.

    lstat

    Logistic model for honors

                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |        31            10  |         41
         -     |        22           137  |        159
    -----------+--------------------------+-----------
       Total   |        53           147  |        200

    Classified + if predicted Pr(D) >= .5
    True D defined as honors ~= 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   58.49%
    Specificity                     Pr( -|~D)   93.20%
    Positive predictive value       Pr( D| +)   75.61%
    Negative predictive value       Pr(~D| -)   86.16%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)    6.80%
    False - rate for true D         Pr( -| D)   41.51%
    False + rate for classified +   Pr(~D| +)   24.39%
    False - rate for classified -   Pr( D| -)   13.84%
    --------------------------------------------------
    Correctly classified                        84.00%
    --------------------------------------------------

Sensitivity is the proportion of the 1's that are correctly identified: 31/53 = .58490566. Specificity is the proportion of the 0's that are correctly identified: 137/147 = .93197279. The proportion correctly classified, also known as the Count R2, is (31+137)/200 = .84.

Deviance

Deviance compares a given model to a fully saturated one. It reflects the error that remains after the predictors are included in the model, and so it concerns the significance of the unexplained variation in the response variable. One wants the deviance to be nonsignificant, that is, its p-value should be greater than .05. In many respects deviance in categorical models functions the way the residual sum of squares (SSresid) functions in OLS regression: the smaller the deviance, the better the model fits the data.
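
As a quick check, the rates in the classification table and the deviance can be recomputed by hand. The sketch below uses Stata's display command as a calculator. The scalar names are illustrative and not part of the original notes, and the deviance line assumes individual-level binary data, where the saturated model has a log likelihood of zero so that D = -2 ln L.

```stata
* Cell counts copied from the lstat classification table (names are illustrative)
scalar true_pos  = 31     // classified +, true D
scalar false_pos = 10     // classified +, true ~D
scalar false_neg = 22     // classified -, true D
scalar true_neg  = 137    // classified -, true ~D

display "Sensitivity          = " %6.4f (true_pos/(true_pos + false_neg))
display "Specificity          = " %6.4f (true_neg/(true_neg + false_pos))
display "Correctly classified = " %6.4f ((true_pos + true_neg)/(true_pos + false_pos + false_neg + true_neg))

* Deviance from the final log likelihood reported by logit
* (assumption: saturated log likelihood is 0 for individual-level data)
scalar ll_full = -73.642803
display "Deviance             = " %8.3f (-2*ll_full)
```

The first three lines should agree with the lstat output (58.49%, 93.20% and 84.00%), and the deviance should come out at about 147.286.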

Pseudo R2

As discussed in an earlier unit, the R2 in OLS regression can take on several different meanings: the proportion of variance accounted for, the squared correlation between observed and predicted values, and a transformation of the F-statistic. In categorical models there is no single index that fills all of these roles. Instead, a number of pseudo-R2s have been developed to help in assessing fit. Below, L(Mint) is the likelihood of the intercept-only model, L(Mk) is the likelihood of the model with the predictors, and N is the number of observations.

McFadden's R2

This is also known as the likelihood-ratio index. It compares the log likelihood of the intercept-only model to the log likelihood of the model with the predictors:

    McFadden's R2 = 1 - ln L(Mk) / ln L(Mint)

McFadden's R2 can be as low as zero but can never equal one.

Adjusted McFadden's R2

The adjusted version of McFadden's R2 subtracts the number of parameters in the model (P = K + 1, the K predictors plus the constant) from the log likelihood of the fitted model:

    Adjusted McFadden's R2 = 1 - (ln L(Mk) - P) / ln L(Mint)

Thus, the adjusted McFadden's R2 is to McFadden's R2 as the adjusted R2 is to R2 in OLS regression.

Maximum Likelihood R2

The maximum likelihood R2 expresses the model fit as a transformation of the likelihood-ratio chi-square, in a way analogous to R2 in OLS regression, which can be thought of as a transformation of the F-statistic:

    Maximum Likelihood R2 = 1 - [L(Mint) / L(Mk)]^(2/N)

The maximum likelihood R2 can reach a maximum of 1 - L(Mint)^(2/N), which is less than one.

Cragg & Uhler's R2

Because of the limitation on the maximum value of the maximum likelihood R2, Cragg and Uhler proposed a relative index that can reach one. It divides the maximum likelihood R2 by its maximum possible value, 1 - L(Mint)^(2/N).

McKelvey and Zavoina's R2

The McKelvey and Zavoina R2 is an attempt to measure model fit as the proportion of variance accounted for. In this case, we are attempting to explain the variance of the latent variable y*. The explained part is the variance of the predicted latent variable, β'Var(x)β. Adding the error variance, which is fixed at π²/3 ≈ 3.29 in the logit model, gives the total variance of y*.

Efron's R2

Efron's R2 is another model fit index based on the proportion of variance accounted for.

Count R2

The count R2, as discussed above, is the proportion of correctly classified observations.

Adjusted Count R2

The count R2 can give misleading values under certain circumstances.
In a binary model it is possible to correctly categorize at least 50% of the cases, without using any information from the predictors, by always choosing the outcome with the largest percentage. The count R2 therefore needs to be adjusted by the largest marginal total of the observed outcome, which is 147 in the classification table above. In our example, the adjusted count R2 = ((31+137) - 147)/(200 - 147) = 21/53 = .396. Thus, the adjusted count R2 is the proportion of correct guesses beyond the number that would be correct by always guessing the largest marginal.

Information Indices

The pseudo-R2s are limited in that they can only be used to compare nested models. Model fit can also be based on measures of information. Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) are two commonly used measures. One advantage of information criterion measures is that they can be used to compare non-nested models. For these information measures smaller is better.

AIC & AIC*n

    AIC = (-2 ln L(Mk) + 2P) / N

where L(Mk) is the likelihood of the model and P is the number of parameters (K+1). Some researchers use AIC multiplied by N, which fitstat calls AIC*n. Regardless, smaller is better.

BIC & BIC'

The BIC is based upon the deviance, while the BIC' uses the likelihood-ratio chi-square:

    BIC  = D(Mk) - dfk * ln(N)
    BIC' = -LR(Mk) + df'k * ln(N)

For BIC the term dfk is the degrees of freedom for the deviance, and in the BIC' equation df'k is the number of predictors in the model. In comparing two models, the difference between their BIC values is the same as the difference between their BIC' values. The table below can assist in interpreting the difference between two models. As above, the smaller BIC or BIC' is better.

Interpreting BIC and BIC'

    Absolute Difference    Evidence
    0-2                    Weak
    2-6                    Positive
    6-10                   Strong
    >10                    Very Strong

Another Example

In the example below, the likelihood ratios, deviances and pseudo-R2s can only be compared across nested models. The information indices can also be used with non-nested models.
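
Before running fitstat, the likelihood-based indices can be checked by hand from the two log likelihoods of the first example. The sketch below is only an arithmetic check, not fitstat's own code. The formulas are an assumption: they are the definitions commonly given for these indices (as in Long and Freese's fitstat), and the scalar names are illustrative.

```stata
* Inputs copied from the first logit example (scalar names are illustrative)
scalar ll_int  = -115.64441     // log likelihood, intercept only (iteration 0)
scalar ll_full = -73.642803     // log likelihood, model with 4 predictors
scalar n_obs   = 200            // number of observations
scalar k_pred  = 4              // predictors: lang math science female
scalar p_parm  = k_pred + 1     // parameters, counting the constant

scalar dev    = -2*ll_full              // deviance (individual-level data assumed)
scalar lr_chi = 2*(ll_full - ll_int)    // likelihood-ratio chi-square

* Pseudo-R2 measures (assumed definitions, see the note above)
scalar r2_mcf   = 1 - ll_full/ll_int
scalar r2_mcfad = 1 - (ll_full - p_parm)/ll_int
scalar r2_ml    = 1 - exp(-lr_chi/n_obs)
scalar r2_cu    = r2_ml/(1 - exp(2*ll_int/n_obs))
scalar r2_adjct = ((31 + 137) - 147)/(200 - 147)

* Information indices
scalar aic_n = dev + 2*p_parm
scalar bic   = dev - (n_obs - p_parm)*ln(n_obs)
scalar bic_p = -lr_chi + k_pred*ln(n_obs)

display "McFadden's R2         = " %9.3f r2_mcf
display "McFadden's Adj R2     = " %9.3f r2_mcfad
display "Maximum Likelihood R2 = " %9.3f r2_ml
display "Cragg & Uhler's R2    = " %9.3f r2_cu
display "Adj Count R2          = " %9.3f r2_adjct
display "AIC                   = " %9.3f (aic_n/n_obs)
display "AIC*n                 = " %9.3f aic_n
display "BIC                   = " %9.3f bic
display "BIC'                  = " %9.3f bic_p
```

If the assumed definitions are the ones fitstat uses, these values should agree with its output for the four-predictor model: 0.363, 0.320, 0.343, 0.500, 0.396, 0.786, 157.286, -885.886 and -62.810.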

    fitstat, saving(mod1)

    Measures of Fit for logit of honors

    Log-Lik Intercept Only:     -115.644     Log-Lik Full Model:      -73.643
    D(195):                      147.286     LR(4):                    84.003
                                             Prob > LR:                 0.000
    McFadden's R2:                 0.363     McFadden's Adj R2:         0.320
    Maximum Likelihood R2:         0.343     Cragg & Uhler's R2:        0.500
    McKelvey and Zavoina's R2:     0.560     Efron's R2:                0.388
    Variance of y*:                7.485     Variance of error:         3.290
    Count R2:                      0.840     Adj Count R2:              0.396
    AIC:                           0.786     AIC*n:                   157.286
    BIC:                        -885.886     BIC':                    -62.810

    (Indices saved in matrix fs_mod1)

    logit honors lang female

    Iteration 0:   log likelihood = -115.64441
    Iteration 1:   log likelihood = -87.936305
    Iteration 2:   log likelihood = -85.536982
    Iteration 3:   log likelihood = -85.443948
    Iteration 4:   log likelihood =  -85.44372

    Logit estimates                                   Number of obs   =        200
                                                      LR chi2(2)      =      60.40
                                                      Prob > chi2     =     0.0000
    Log likelihood =  -85.44372                       Pseudo R2       =     0.2612

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            lang |   .1443657   .0233337     6.19   0.000     .0986325    .1900989
          female |   1.120926   .4081028     2.75   0.006      .321059    1.920793
           _cons |  -9.603365   1.426404    -6.73   0.000    -12.39906   -6.807665
    ------------------------------------------------------------------------------

    fitstat, using(mod1)

    Measures of Fit for logit of honors

                                     Current           Saved      Difference
    Model:                             logit           logit
    N:                                   200             200               0
    Log-Lik Intercept Only:         -115.644        -115.644           0.000
    Log-Lik Full Model:              -85.444         -73.643         -11.801
    D:                          170.887(197)    147.286(195)       23.602(2)
    LR:                            60.401(2)       84.003(4)       23.602(2)
    Prob > LR:                         0.000           0.000           0.000
    McFadden's R2:                     0.261           0.363          -0.102
    McFadden's Adj R2:                 0.235           0.320          -0.085
    Maximum Likelihood R2:             0.261           0.343          -0.082
    Cragg & Uhler's R2:                0.380           0.500          -0.120
    McKelvey and Zavoina's R2:         0.423           0.560          -0.137
    Efron's R2:                        0.281           0.388          -0.108
    Variance of y*:                    5.706           7.485          -1.779
    Variance of error:                 3.290           3.290           0.000
    Count R2:                          0.785           0.840          -0.055
    Adj Count R2:                      0.189           0.396          -0.208
    AIC:                               0.884           0.786           0.098
    AIC*n:                           176.887         157.286          19.602
    BIC:                            -872.881        -885.886          13.005
    BIC':                            -49.805         -62.810          13.005

    Difference of 13.005 in BIC' provides very strong support for saved model.

    Note: p-value for difference in LR is only valid if models are nested.

--------------------------------------------------------------------------------

Categorical Data Analysis Course

Phil Ender