Original poster: fatboy

A Handbook of Statistical Analyses using Stata 2nd

11
hanszhu posted on 2005-2-14 14:34:00

Stata Learning Module on Regression Diagnostics: Heteroscedasticity

Please note: Stata's graph commands changed in version 8. This page was developed before version 8 was released and uses Stata 7 graph commands. Please see "How do I use version 7 graph commands in Stata version 8?" for information on how to either run these Stata 7 graph commands in Stata version 8 or convert them to Stata 8 syntax.

This module explores the regression diagnostics associated with data that are heteroscedastic, that is, data with non-constant variance across the predicted values of y. We will use a file called hetsc.dat to illustrate these problems. The file contains 100 observations and the variables case, y, x1, x2, x3, and x4. We will use x1, x2, and x3 as predictors and y as the dependent variable. Below we use the hetsc.dta file.

. use http://www.ats.ucla.edu/stat/stata/modules/reg/hetsc, clear

We start by running a regression predicting y from x1, x2, and x3.

. regress y x1 x2 x3
  Source |          SS    df          MS               Number of obs =     100
---------+------------------------------               F(  3,    96) =   65.68
   Model |  8933.72373     3  2977.90791               Prob > F      =  0.0000
Residual |  4352.46627    96  45.3381903               R-squared     =  0.6724
---------+------------------------------               Adj R-squared =  0.6622
   Total |    13286.19    99  134.203939               Root MSE      =  6.7334

------------------------------------------------------------------------------
       y |      Coef.  Std. Err.          t   P>|t|     [95% Conf.   Interval]
---------+--------------------------------------------------------------------
      x1 |   .2158539    .083724      2.578   0.011       .0496631    .3820447
      x2 |   .7559357    .086744      8.715   0.000       .5837503    .9281211
      x3 |   .3732164   .0591071      6.314   0.000       .2558898     .490543
   _cons |   33.23969   .6758811     49.180   0.000       31.89807     34.5813
------------------------------------------------------------------------------

We can use the hettest command to test for heteroscedasticity. The test rejects the null hypothesis of constant variance (chi2(1) = 21.30, p < 0.0001), so the residuals are heteroscedastic and we need to understand this problem further and try to address it.

. hettest
Cook-Weisberg test for heteroscedasticity using fitted values of y
     Ho: Constant variance
         chi2(1)      =    21.30
         Prob > chi2  =   0.0000
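
The statistic that hettest reports can also be rebuilt by hand, which may make it clearer what is being tested. The sketch below follows the usual textbook description of the Cook-Weisberg (Breusch-Pagan) score test: regress the scaled squared residuals on the fitted values and take half of the model sum of squares. It assumes the hetsc data are in memory; the variable names fit_y, res_y, res2 and g are invented here for illustration and are not part of the original module. If the hand computation matches Stata's implementation, the result should be close to the chi2(1) value reported above. In Stata 9 and later the built-in test is run as estat hettest.

```stata
* Sketch only: fit_y, res_y, res2 and g are illustrative names
quietly regress y x1 x2 x3
predict double fit_y, xb
predict double res_y, residuals
generate double res2 = res_y^2
quietly summarize res2
* r(mean) is the mean squared residual, i.e. RSS/n
generate double g = res2 / r(mean)
quietly regress g fit_y
* Score statistic: half the model sum of squares, compared with chi2 on 1 df
display "chi2(1)     = " %8.2f e(mss)/2
display "Prob > chi2 = " %8.4f chi2tail(1, e(mss)/2)
```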

The rvfplot below, which plots the residuals against the fitted (predicted) values, shows clear evidence of heteroscedasticity: the residuals vary much less at the left side of the graph than at the right side.

. rvfplot
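
For readers on Stata 8 or later, a residual-versus-fitted plot of the same kind can also be drawn by hand, which makes it easy to add a reference line at zero. This is a minimal sketch, assuming the hetsc data are in memory; rvf_fit and rvf_res are names made up for this example.

```stata
* Sketch for Stata 8 or later; rvf_fit and rvf_res are illustrative names
quietly regress y x1 x2 x3
predict double rvf_fit, xb
predict double rvf_res, residuals
twoway scatter rvf_res rvf_fit, yline(0) ytitle("Residuals") xtitle("Fitted values")
```

A fan shape that widens from left to right in this plot is the pattern described in the text.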

We will try to stabilize the variance by using a square root transformation, and then run the regression again.

. generate sqy = y^.5
. regress sqy x1 x2 x3
  Source |          SS    df          MS               Number of obs =     100
---------+------------------------------               F(  3,    96) =   69.37
   Model |  66.0040132     3  22.0013377               Prob > F      =  0.0000
Residual |  30.4489829    96  .317176905               R-squared     =  0.6843
---------+------------------------------               Adj R-squared =  0.6744
   Total |  96.4529961    99  .974272688               Root MSE      =  .56318

------------------------------------------------------------------------------
     sqy |      Coef.  Std. Err.          t   P>|t|     [95% Conf.   Interval]
---------+--------------------------------------------------------------------
      x1 |   .0170293   .0070027      2.432   0.017        .003129    .0309297
      x2 |   .0652379   .0072553      8.992   0.000       .0508362    .0796397
      x3 |   .0328274   .0049438      6.640   0.000       .0230141    .0426407
   _cons |   5.682593   .0565313    100.521   0.000       5.570379    5.794807
------------------------------------------------------------------------------

Running hettest again, we find that the chi-square value has dropped from 21.30 to 13.06, but the test is still highly significant (p = 0.0003). The square root transformation did not remove the heteroscedasticity.

. hettest
Cook-Weisberg test for heteroscedasticity using fitted values of sqy
     Ho: Constant variance
         chi2(1)      =    13.06
         Prob > chi2  =   0.0003

The rvfplot below confirms that the residuals are still heteroscedastic.

. rvfplot

We next try a natural log transformation, and run the regression.

. generate lny = ln(y)
. regress lny x1 x2 x3
  Source |          SS    df          MS               Number of obs =     100
---------+------------------------------               F(  3,    96) =   69.85
   Model |  8.17710164     3  2.72570055               Prob > F      =  0.0000
Residual |  3.74606877    96   .03902155               R-squared     =  0.6858
---------+------------------------------               Adj R-squared =  0.6760
   Total |  11.9231704    99  .120436065               Root MSE      =  .19754

------------------------------------------------------------------------------
     lny |      Coef.  Std. Err.          t   P>|t|     [95% Conf.   Interval]
---------+--------------------------------------------------------------------
      x1 |   .0054677   .0024562      2.226   0.028       .0005921    .0103432
      x2 |   .0230303   .0025448      9.050   0.000       .0179788    .0280817
      x3 |   .0118223    .001734      6.818   0.000       .0083803    .0152643
   _cons |   3.445503   .0198285    173.765   0.000       3.406144    3.484862
------------------------------------------------------------------------------

We run hettest again. The chi-square value drops to 5.60, a marked improvement, but the test is still significant at the 5% level (p = 0.0179).

. hettest
Cook-Weisberg test for heteroscedasticity using fitted values of lny
     Ho: Constant variance
         chi2(1)      =     5.60
         Prob > chi2  =   0.0179
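
The Prob > chi2 figures in the three tests so far are upper-tail probabilities of a chi-square distribution with 1 degree of freedom. As a quick check, they can be recomputed from the reported statistics with Stata's chi2tail() function. The statistics are typed in as printed, so the results should agree with the reported probabilities only up to the rounding of chi2(1) to two decimals.

```stata
* Upper-tail chi-square(1) probabilities for the three reported statistics
* Untransformed y
display %6.4f chi2tail(1, 21.30)
* Square root of y
display %6.4f chi2tail(1, 13.06)
* Natural log of y
display %6.4f chi2tail(1, 5.60)
```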

Below we see that the rvfplot does not look perfect, but it is much improved.

. rvfplot

Perhaps you might want to try a log (to the base 10) transformation. We show that below.

. generate log10y = log10(y)
. regress log10y x1 x2 x3
  Source |          SS    df          MS               Number of obs =     100
---------+------------------------------               F(  3,    96) =   69.85
   Model |  1.54229722     3  .514099074               Prob > F      =  0.0000
Residual |  .706552237    96  .007359919               R-squared     =  0.6858
---------+------------------------------               Adj R-squared =  0.6760
   Total |  2.24884946    99  .022715651               Root MSE      =  .08579

------------------------------------------------------------------------------
  log10y |      Coef.  Std. Err.          t   P>|t|     [95% Conf.   Interval]
---------+--------------------------------------------------------------------
      x1 |   .0023746   .0010667      2.226   0.028       .0002571     .004492
      x2 |   .0100019   .0011052      9.050   0.000       .0078081    .0121957
      x3 |   .0051344   .0007531      6.818   0.000       .0036395    .0066292
   _cons |   1.496363   .0086114    173.765   0.000       1.479269    1.513456
------------------------------------------------------------------------------

The results for the hettest are the same as before. Whether we chose a log to the base e or a log to the base 10, the effect in reducing heteroscedasticity (as measured by hettest) was the same.

. hettest
Cook-Weisberg test for heteroscedasticity using fitted values of log10y
     Ho: Constant variance
         chi2(1)      =     5.60
         Prob > chi2  =   0.0179
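
The identical test result is expected, because log10(y) = ln(y)/ln(10): the base 10 outcome is just the natural log outcome multiplied by a constant. Every coefficient and standard error is rescaled by the same factor, so the t statistics, F, R-squared and the heteroscedasticity test do not change. A small check using the coefficients printed in the two regression tables above: dividing each lny coefficient by ln(10) should reproduce the log10y column up to rounding.

```stata
* Each natural-log coefficient divided by ln(10) should match the log10 coefficient
* x1
display %9.7f .0054677 / ln(10)
* x2
display %9.7f .0230303 / ln(10)
* x3
display %9.7f .0118223 / ln(10)
* _cons
display %9.6f 3.445503 / ln(10)
```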

While these results are not perfect, we will be content for now that this has substantially reduced the heteroscedasticity as compared to the original data.

12
zhangzhi10 posted on 2005-2-15 17:52:00

Thanks!

13
zark posted on 2005-2-17 12:11:00
I was just looking for a study manual... thanks for sharing, bump

14
gloryfly posted on 2005-2-21 00:11:00
Is this for version 7 or version 8?
You worldly people all assume a hero must be tall and handsome. Can't a hero be short and fat?

15
helenchow posted on 2005-3-4 14:41:00
Not bad
I want to walk from south to north, and from day into night; I want everyone to see me yet not know who I am.

16
statax posted on 2005-3-5 16:49:00

Well done, OP!

Use it, or lose it!

17
szw1234 posted on 2005-3-7 11:47:00

SUPPORT!!!

18
qingzhouyangfan posted on 2010-2-22 20:20:35
Good person, good stuff!!!!

19
tk posted on 2010-2-23 00:50:01
Bump.

20
guozao posted on 2010-5-24 19:17:38
Thanks~~ just what I needed
