Original poster: jingleqq

[Regression analysis help] Why do different interaction commands give different results?

11
ywh19860616 posted on 2013-10-16 14:00:19
. clear

. use hh3

. regress logg1018 a2000 labor2 children retire hh_income house houseloan e2002 c7001 savings a2003 age a2012 a2015 a2024 f2021 work a3003 workyear worktime f1001 f2001 f3001 cinsurance region a2022rural [pweight=swgt]
(sum of wgt is   1.2329e+08)

Linear regression                                      Number of obs =     791
                                                       F( 26,   764) =    7.23
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2804
                                                       Root MSE      =  1.0826

------------------------------------------------------------------------------
             |               Robust
    logg1018 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       a2000 |  -.0883657   .0890625    -0.99   0.321    -.2632019    .0864706
      labor2 |   .0474589   .0907285     0.52   0.601    -.1306478    .2255656
    children |   .1295626   .1023531     1.27   0.206     -.071364    .3304892
      retire |   .1748803   .1194552     1.46   0.144     -.059619    .4093797
   hh_income |   1.38e-06   6.53e-07     2.11   0.035     9.84e-08    2.66e-06
       house |  -.4324905   .1565911    -2.76   0.006    -.7398905   -.1250906
   houseloan |  -.2520211   .1589961    -1.59   0.113    -.5641422       .0601
       e2002 |   .0779077   .1429656     0.54   0.586    -.2027443    .3585596
       c7001 |   .5523863   .1269436     4.35   0.000     .3031866     .801586
     savings |  -.0677573   .0622234    -1.09   0.277    -.1899065    .0543918
       a2003 |    .165831   .1305863     1.27   0.205    -.0905196    .4221816
         age |    .003721   .0075394     0.49   0.622    -.0110795    .0185215
       a2012 |   .1097514   .0450151     2.44   0.015     .0213834    .1981194
       a2015 |  -.0821773   .0582571    -1.41   0.159    -.1965404    .0321858
       a2024 |   .0510015     .05941     0.86   0.391    -.0656247    .1676278
       f2021 |  -.1628977   .0615866    -2.65   0.008    -.2837968   -.0419986
        work |   .1516389   .1904025     0.80   0.426    -.2221353    .5254131
       a3003 |  -.1163044    .218655    -0.53   0.595    -.5455404    .3129316
    workyear |    .016017   .0067794     2.36   0.018     .0027084    .0293255
    worktime |  -.0000854   .0006073    -0.14   0.888    -.0012775    .0011067
       f1001 |    .057384   .0754563     0.76   0.447    -.0907424    .2055103
       f2001 |  -.1228524   .1916797    -0.64   0.522    -.4991339     .253429
       f3001 |   .2420348   .1267654     1.91   0.057     -.006815    .4908846
  cinsurance |     .11353   .1491954     0.76   0.447    -.1793515    .4064116
      region |   .0349508   .0868177     0.40   0.687    -.1354787    .2053803
  a2022rural |  -.1901757   .1768726    -1.08   0.283    -.5373896    .1570382
       _cons |   7.041723   .7055126     9.98   0.000      5.65675    8.426697
------------------------------------------------------------------------------

. xi: regress logg1018 a2000 labor2 children retire hh_income house houseloan e2002 c7001 savings a2003 age a2012 a2015 a2024 f2021 work a3003 workyear worktime f1001 f2001 f3001 cinsurance region i.rural*a2022 [pweight=swgt]
i.rural           _Irural_0-1         (naturally coded; _Irural_0 omitted)
i.rural*a2022     _IrurXa2022_#       (coded as above)
(sum of wgt is   1.2329e+08)

Linear regression                                      Number of obs =     791
                                                       F( 28,   762) =    6.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2882
                                                       Root MSE      =  1.0782

-------------------------------------------------------------------------------
              |               Robust
     logg1018 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        a2000 |  -.0820923   .0845383    -0.97   0.332    -.2480478    .0838632
       labor2 |    .036054   .0886534     0.41   0.684    -.1379799    .2100879
     children |   .1264729   .0993262     1.27   0.203    -.0685125    .3214583
       retire |    .160383   .1172399     1.37   0.172    -.0697687    .3905346
    hh_income |   1.42e-06   6.71e-07     2.12   0.034     1.06e-07    2.74e-06
        house |  -.3882593   .1641245    -2.37   0.018    -.7104493   -.0660694
    houseloan |  -.2305242   .1524171    -1.51   0.131    -.5297315    .0686832
        e2002 |   .0854729    .140618     0.61   0.543    -.1905718    .3615176
        c7001 |   .5196565   .1277123     4.07   0.000     .2689468    .7703662
      savings |  -.0660838   .0594367    -1.11   0.267     -.182763    .0505954
        a2003 |   .1802882   .1279389     1.41   0.159    -.0708664    .4314428
          age |   .0047518   .0071083     0.67   0.504    -.0092025     .018706
        a2012 |   .1280514   .0537575     2.38   0.017      .022521    .2335818
        a2015 |  -.0654256   .0591491    -1.11   0.269    -.1815402     .050689
        a2024 |   .0365402   .0595249     0.61   0.539    -.0803121    .1533925
        f2021 |  -.1597722   .0606721    -2.63   0.009    -.2788764   -.0406679
         work |    .178329   .1886879     0.95   0.345    -.1920808    .5487387
        a3003 |  -.1254408   .2191348    -0.57   0.567    -.5556205    .3047389
     workyear |   .0171254   .0067552     2.54   0.011     .0038644    .0303863
     worktime |   -.000036   .0005968    -0.06   0.952    -.0012077    .0011356
        f1001 |   .0530592   .0734673     0.72   0.470    -.0911632    .1972816
        f2001 |  -.1327798   .1901254    -0.70   0.485    -.5060116     .240452
        f3001 |     .24055   .1261088     1.91   0.057     -.007012     .488112
   cinsurance |   .0904654   .1478294     0.61   0.541    -.1997359    .3806667
       region |   .0639742   .0800329     0.80   0.424    -.0931369    .2210854
    _Irural_1 |  -.4517703    .303656    -1.49   0.137    -1.047872    .1443315
        a2022 |   .0788755   .1641699     0.48   0.631    -.2434036    .4011545
_IrurXa2022_1 |   .1830922   .3539682     0.52   0.605    -.5117764    .8779608
        _cons |   6.752954   .7906833     8.54   0.000     5.200778     8.30513
-------------------------------------------------------------------------------

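Above are the results of your first and second commands. Just from the list of variables you can see that the two models do not contain the same number of regressors (26 in the first, 28 in the second), so their results are bound to differ. The first table has only a2022rural, whereas the second has _Irural_1, a2022 and _IrurXa2022_1. If you open the data file, you can see the actual values of _Irural_1 and _IrurXa2022_1, which xi added to the dataset.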
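To make the difference in regressors concrete, here is a small Python sketch (not Stata) that lists, for each of the four (a2022, rural) combinations, the columns each specification puts into the model. It assumes a2022 and rural are 0/1 dummies and that a2022rural was generated beforehand as the product a2022*rural; that assumption is not stated in the thread. The column names only mimic the Stata output.

```python
from itertools import product

# All (a2022, rural) combinations, assuming both are 0/1 dummies.
cells = list(product([0, 1], repeat=2))


def columns_product_only(a2022, rural):
    """First command: only a2022rural, ASSUMED to be the product a2022*rural."""
    return {"a2022rural": a2022 * rural}


def columns_xi(a2022, rural):
    """`xi: ... i.rural*a2022`: both main effects plus their product."""
    return {"_Irural_1": rural, "a2022": a2022, "_IrurXa2022_1": rural * a2022}


for a2022, rural in cells:
    print((a2022, rural), columns_product_only(a2022, rural), columns_xi(a2022, rural))
```

Under that assumption the product column is identical in both models. The first model simply leaves out the two main effects, which is the same as constraining their coefficients to zero, so its a2022rural coefficient is not comparable with _IrurXa2022_1.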

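The following two commands are equivalent, and both reproduce the fit of the xi command above (same F, R-squared and Root MSE):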
. regress logg1018 a2000 labor2 children retire hh_income house houseloan e2002 c7001 savings a2003 age a2012 a2015 a2024 f2021 work a3003 workyear worktime f1001 f2001 f3001 cinsurance region a2022#rural [pweight=swgt]
(sum of wgt is   1.2329e+08)

Linear regression                                      Number of obs =     791
                                                       F( 28,   762) =    6.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2882
                                                       Root MSE      =  1.0782

------------------------------------------------------------------------------
             |               Robust
    logg1018 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       a2000 |  -.0820923   .0845383    -0.97   0.332    -.2480478    .0838632
      labor2 |    .036054   .0886534     0.41   0.684    -.1379799    .2100879
    children |   .1264729   .0993262     1.27   0.203    -.0685125    .3214583
      retire |    .160383   .1172399     1.37   0.172    -.0697687    .3905346
   hh_income |   1.42e-06   6.71e-07     2.12   0.034     1.06e-07    2.74e-06
       house |  -.3882593   .1641245    -2.37   0.018    -.7104493   -.0660694
   houseloan |  -.2305242   .1524171    -1.51   0.131    -.5297315    .0686832
       e2002 |   .0854729    .140618     0.61   0.543    -.1905718    .3615176
       c7001 |   .5196565   .1277123     4.07   0.000     .2689468    .7703662
     savings |  -.0660838   .0594367    -1.11   0.267     -.182763    .0505954
       a2003 |   .1802882   .1279389     1.41   0.159    -.0708664    .4314428
         age |   .0047518   .0071083     0.67   0.504    -.0092025     .018706
       a2012 |   .1280514   .0537575     2.38   0.017      .022521    .2335818
       a2015 |  -.0654256   .0591491    -1.11   0.269    -.1815402     .050689
       a2024 |   .0365402   .0595249     0.61   0.539    -.0803121    .1533925
       f2021 |  -.1597722   .0606721    -2.63   0.009    -.2788764   -.0406679
        work |    .178329   .1886879     0.95   0.345    -.1920808    .5487387
       a3003 |  -.1254408   .2191348    -0.57   0.567    -.5556205    .3047389
    workyear |   .0171254   .0067552     2.54   0.011     .0038644    .0303863
    worktime |   -.000036   .0005968    -0.06   0.952    -.0012077    .0011356
       f1001 |   .0530592   .0734673     0.72   0.470    -.0911632    .1972816
       f2001 |  -.1327798   .1901254    -0.70   0.485    -.5060116     .240452
       f3001 |     .24055   .1261088     1.91   0.057     -.007012     .488112
  cinsurance |   .0904654   .1478294     0.61   0.541    -.1997359    .3806667
      region |   .0639742   .0800329     0.80   0.424    -.0931369    .2210854
             |
 a2022#rural |
        0 1  |  -.4517703    .303656    -1.49   0.137    -1.047872    .1443315
        1 0  |   .0788755   .1641699     0.48   0.631    -.2434036    .4011545
        1 1  |  -.1898026   .2035583    -0.93   0.351    -.5894043    .2097991
             |
       _cons |   6.752954   .7906833     8.54   0.000     5.200778     8.30513
------------------------------------------------------------------------------

. regress logg1018 a2000 labor2 children retire hh_income house houseloan e2002 c7001 savings a2003 age a2012 a2015 a2024 f2021 work a3003 workyear worktime f1001 f2001 f3001 cinsurance region i.a2022#i.rural [pweight=swgt]
(sum of wgt is   1.2329e+08)

Linear regression                                      Number of obs =     791
                                                       F( 28,   762) =    6.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2882
                                                       Root MSE      =  1.0782

------------------------------------------------------------------------------
             |               Robust
    logg1018 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       a2000 |  -.0820923   .0845383    -0.97   0.332    -.2480478    .0838632
      labor2 |    .036054   .0886534     0.41   0.684    -.1379799    .2100879
    children |   .1264729   .0993262     1.27   0.203    -.0685125    .3214583
      retire |    .160383   .1172399     1.37   0.172    -.0697687    .3905346
   hh_income |   1.42e-06   6.71e-07     2.12   0.034     1.06e-07    2.74e-06
       house |  -.3882593   .1641245    -2.37   0.018    -.7104493   -.0660694
   houseloan |  -.2305242   .1524171    -1.51   0.131    -.5297315    .0686832
       e2002 |   .0854729    .140618     0.61   0.543    -.1905718    .3615176
       c7001 |   .5196565   .1277123     4.07   0.000     .2689468    .7703662
     savings |  -.0660838   .0594367    -1.11   0.267     -.182763    .0505954
       a2003 |   .1802882   .1279389     1.41   0.159    -.0708664    .4314428
         age |   .0047518   .0071083     0.67   0.504    -.0092025     .018706
       a2012 |   .1280514   .0537575     2.38   0.017      .022521    .2335818
       a2015 |  -.0654256   .0591491    -1.11   0.269    -.1815402     .050689
       a2024 |   .0365402   .0595249     0.61   0.539    -.0803121    .1533925
       f2021 |  -.1597722   .0606721    -2.63   0.009    -.2788764   -.0406679
        work |    .178329   .1886879     0.95   0.345    -.1920808    .5487387
       a3003 |  -.1254408   .2191348    -0.57   0.567    -.5556205    .3047389
    workyear |   .0171254   .0067552     2.54   0.011     .0038644    .0303863
    worktime |   -.000036   .0005968    -0.06   0.952    -.0012077    .0011356
       f1001 |   .0530592   .0734673     0.72   0.470    -.0911632    .1972816
       f2001 |  -.1327798   .1901254    -0.70   0.485    -.5060116     .240452
       f3001 |     .24055   .1261088     1.91   0.057     -.007012     .488112
  cinsurance |   .0904654   .1478294     0.61   0.541    -.1997359    .3806667
      region |   .0639742   .0800329     0.80   0.424    -.0931369    .2210854
             |
 a2022#rural |
        0 1  |  -.4517703    .303656    -1.49   0.137    -1.047872    .1443315
        1 0  |   .0788755   .1641699     0.48   0.631    -.2434036    .4011545
        1 1  |  -.1898026   .2035583    -0.93   0.351    -.5894043    .2097991
             |
       _cons |   6.752954   .7906833     8.54   0.000     5.200778     8.30513
------------------------------------------------------------------------------
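The # output and the xi output report the same model in two parametrizations, which is why F, R-squared and Root MSE coincide. The Python sketch below (assuming a2022 and rural are 0/1 dummies) first checks, cell by cell, that every # indicator is a linear combination of the xi regressors, and then maps the coefficients: the "1 1" cell effect is the sum of the three xi coefficients.

```python
from itertools import product

# All (a2022, rural) combinations, assuming both are 0/1 dummies.
cells = list(product([0, 1], repeat=2))


def xi_columns(a2022, rural):
    """Regressors added by `xi: ... i.rural*a2022`."""
    return {"_Irural_1": rural, "a2022": a2022, "_IrurXa2022_1": rural * a2022}


def hash_columns(a2022, rural):
    """Indicators added by `a2022#rural`; the cell a2022=0, rural=0 is the base."""
    return {
        "0 1": int(a2022 == 0 and rural == 1),
        "1 0": int(a2022 == 1 and rural == 0),
        "1 1": int(a2022 == 1 and rural == 1),
    }


# Each # indicator is a linear combination of the xi columns (and vice versa),
# so both specifications span the same space and produce the same fit.
for a, r in cells:
    x, h = xi_columns(a, r), hash_columns(a, r)
    assert h["0 1"] == x["_Irural_1"] - x["_IrurXa2022_1"]
    assert h["1 0"] == x["a2022"] - x["_IrurXa2022_1"]
    assert h["1 1"] == x["_IrurXa2022_1"]

# Coefficients copied from the xi output; the effect of the cell
# a2022=1, rural=1 relative to the base cell is the sum of all three.
b_rural, b_a2022, b_inter = -0.4517703, 0.0788755, 0.1830922
cell_11 = b_rural + b_a2022 + b_inter
print(round(cell_11, 7))  # -0.1898026, the "1 1" row of the # output
```

The other two rows map one to one: "0 1" equals the _Irural_1 coefficient and "1 0" equals the a2022 coefficient, exactly as in the two tables.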
One share of effort, one share of harvest.

12
ywh19860616 posted on 2013-10-16 14:05:31
Title

    [U] 11.4.3 Factor variables


Description

    Factor variables are extensions of varlists of existing variables.  When a command allows factor variables, in addition to typing variable names from your data, you can
    type factor variables, which might look like

        i.varname

        i.varname#i.varname

        i.varname#i.varname#i.varname

        i.varname##i.varname

        i.varname##i.varname##i.varname

    Factor variables create indicator variables from categorical variables, interactions of indicators of categorical variables, interactions of categorical and continuous
    variables, and interactions of continuous variables (polynomials).  They are allowed with most estimation and postestimation commands, along with a few other commands.

    There are four factor-variable operators:

         Operator  Description
         ----------------------------------------------------------------------
         i.        unary operator to specify indicators
         c.        unary operator to treat as continuous
         #         binary operator to specify interactions
         ##        binary operator to specify factorial interactions
         ----------------------------------------------------------------------

    The indicators and interactions created by factor-variable operators are referred to as virtual variables.  They act like variables in varlists but do not exist in the
    dataset.

    Categorical variables to which factor-variable operators are applied must contain nonnegative integers with values in the range 0 to 32,740, inclusive.

    Factor variables may be combined with the L. and F. time-series operators.


Remarks

    Remarks are presented under the following headings:

        Basic examples
        Base levels
        Selecting levels
        Applying operators to a group of variables


Basic examples

    Here are some examples of use of the operators:

         Factor
         specification     Result
         ----------------------------------------------------------------------
         i.group           indicators for levels of group

         i.group#i.sex     indicators for each combination of levels of group and sex, a two-way interaction

         group#sex         same as i.group#i.sex

         group#sex#arm     indicators for each combination of levels of group, sex, and arm, a three-way interaction

         group##sex        same as i.group i.sex group#sex

         group##sex##arm   same as i.group i.sex i.arm group#sex group#arm sex#arm group#sex#arm

         sex#c.age         two variables -- age for males and 0 elsewhere, and age for females and 0 elsewhere; if age is also in the model, one of the two virtual variables
                             will be treated as a base

         sex##c.age        same as i.sex age sex#c.age

         c.age             same as age

         c.age#c.age       age squared

         c.age#c.age#c.age age cubed
         ----------------------------------------------------------------------


Base levels

    You can specify the base level of a factor variable by using the ib. operator.  The syntax is

           Base
           operator(*)    Description
           --------------------------------------------------------------------
           ib#.           use # as base, #=value of variable
           ib(##).        use the #th ordered value as base (**)
           ib(first).     use smallest value as base (the default)
           ib(last).      use largest value as base
           ib(freq).      use most frequent value as base
           ibn.           no base level
           --------------------------------------------------------------------
            (*) The i may be omitted.  For instance, you may type ib2.group or b2.group.
           (**) For example, ib(#2). means to use the second value as the base.

    If you want to use group==3 as the base in a regression, you can type,

        . regress y  i.sex ib3.group

    You can also permanently set the base levels of categorical variables by using the fvset command.


Selecting levels

    You can select a range of levels -- a range of virtual variables -- by using the i(numlist). operator.

         Examples          Description
         ----------------------------------------------------------------------
         i2.cat            a single indicator for cat==2

         2.cat             same as i2.cat

         i(2 3 4).cat      three indicators, cat==2, cat==3, and cat==4;
                             same as i2.cat i3.cat i4.cat

         i(2/4).cat        same as i(2 3 4).cat

         2.cat#1.sex       a single indicator that is 1 when cat==2 and sex==1, and is 0 otherwise

         i2.cat#i1.sex     same as 2.cat#1.sex
         ----------------------------------------------------------------------


Applying operators to a group of variables

    Factor-variable operators may be applied to groups of variables by using parentheses.

    In the examples that follow, variables group, sex, arm, and cat are categorical, and variables age, wt, and bp are continuous:

         Examples                  Expansion
         ----------------------------------------------------------------------
         i.(group sex arm)         i.group i.sex i.arm

         group#(sex arm cat)       group#sex group#arm group#cat

         group##(sex arm cat)      i.group i.sex i.arm i.cat group#sex group#arm group#cat

         group#(c.age c.wt c.bp)   i.group group#c.age group#c.wt group#c.bp

         group#c.(age wt bp)       same as group#(c.age c.wt c.bp)
For the detailed usage, see the help entry above.
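As a rough illustration of the "Base levels" rules and of what # enumerates, the Python sketch below lists which indicators i.group would add under ib(first)., ib(last)., ib3. and ib(freq)., and which level combinations a#b ranges over. The helper names and the group values are hypothetical; the sketch only mimics the documented rules and does not reproduce Stata's handling of ties under ib(freq)., empty cells, or collinearity.

```python
from collections import Counter
from itertools import product


def base_level(values, rule="first"):
    """Pick the base level following the ib() rules in the help entry.

    rule: "first" (smallest, the default), "last" (largest),
    "freq" (most frequent), or a level value, as in ib#.
    Ties under "freq" are not handled specially (assumption).
    """
    levels = sorted(set(values))
    if rule == "first":
        return levels[0]
    if rule == "last":
        return levels[-1]
    if rule == "freq":
        return Counter(values).most_common(1)[0][0]
    if rule in levels:
        return rule
    raise ValueError(f"unknown rule or level: {rule!r}")


def indicator_names(name, values, rule="first"):
    """Names of the non-base indicators that i.name (or ib<rule>.name) adds."""
    base = base_level(values, rule)
    return [f"{lvl}.{name}" for lvl in sorted(set(values)) if lvl != base]


def interaction_cells(values_a, values_b):
    """Every level combination that a#b ranges over, before any base cell is dropped."""
    return list(product(sorted(set(values_a)), sorted(set(values_b))))


group = [1, 2, 3, 3, 2, 3]  # made-up data: level 3 is the most frequent
print(indicator_names("group", group))          # ['2.group', '3.group']
print(indicator_names("group", group, rule=3))  # ['1.group', '2.group']
print(indicator_names("group", group, "freq"))  # ['1.group', '2.group']
print(interaction_cells([0, 1], [0, 1]))        # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With two 0/1 variables, # ranges over four cells; with a constant in the model one of them serves as the base, which is why a2022#rural shows three rows in the output earlier in the thread.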

13
jingleqq posted on 2013-10-16 14:19:38
Now I finally get it! Thanks a lot!! One more question: in published papers, which of these commands is more commonly used?

14
ywh19860616 posted on 2013-10-16 14:24:16
Quoting jingleqq (2013-10-16 14:19):
Now I finally get it! Thanks a lot!! One more question: in published papers, which of these commands is more commonly used?
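Which command you use doesn't matter; use whichever one achieves what you need.
For example, to create dummy variables you have two options:
1. Put xi: in front of the command:
xi: reg y x i.year
2. Generate the dummies first with tab, then add them to the regression:
tab year, gen(yeardum)
reg y x yeardum*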
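Option 2 relies on tab, gen() creating one 0/1 dummy for every level. The Python sketch below mimics that step on made-up year values (tab_gen is a hypothetical helper, not a Stata command) and shows why not all of the dummies can stay in reg y x yeardum*: they sum to 1 in every row, duplicating the constant, so regress omits one of them as collinear. xi instead leaves out the lowest level from the start (compare the "_Irural_0 omitted" note in the xi output earlier in the thread).

```python
def tab_gen(values, stub):
    """Mimic `tab var, gen(stub)`: one 0/1 dummy per distinct level,
    numbered 1..K in sorted level order (stub1 = smallest level)."""
    levels = sorted(set(values))
    return {
        f"{stub}{k}": [int(v == lvl) for v in values]
        for k, lvl in enumerate(levels, start=1)
    }


year = [2008, 2009, 2010, 2009, 2008]  # made-up values
dummies = tab_gen(year, "yeardum")
print(sorted(dummies))  # ['yeardum1', 'yeardum2', 'yeardum3']

# Each row has exactly one dummy equal to 1, so the K dummies add up to the
# constant term: all of them together with _cons are perfectly collinear.
row_sums = [sum(col[i] for col in dummies.values()) for i in range(len(year))]
print(row_sums)  # [1, 1, 1, 1, 1]
```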

15
jingleqq posted on 2013-10-16 14:43:31
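Yes, I have always used the xi approach.

I'm still a bit confused, though. In published papers, every regression table with an interaction term that I have seen reports only one row for the interaction (coef., standard error or t value). Did they all use a command like the one with "a2022rural"? If I use xi: i.rural*a2022 or rural#a2022, I get three rows (shown below). Should a paper's regression table list all of these rows, or only the last one?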

  a2022#rural |
          0 1 |  -.4517703    .303656    -1.49   0.137    -1.047872    .1443315
          1 0 |   .0788755   .1641699     0.48   0.631    -.2434036    .4011545
          1 1 |  -.1898026   .2035583    -0.93   0.351    -.5894043    .2097991

    _Irural_1 |  -.4517703    .303656    -1.49   0.137    -1.047872    .1443315
        a2022 |   .0788755   .1641699     0.48   0.631    -.2434036    .4011545
_IrurXa2022_1 |   .1830922   .3539682     0.52   0.605    -.5117764    .8779608

16
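ywh19860616 posted on 2013-10-16 14:46:35
Quoting jingleqq (2013-10-16 14:43):
Yes, I have always used the xi approach.

I'm still a bit confused, though. In published papers, every regression table with an interaction term that I have seen ...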
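That depends on your own research design.
In practice, what you see more often are interactions between two continuous variables.
If the paper's design includes all three terms, then the results for all of them must be reported, not just one.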
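Why one row is not enough on its own: in the main-effects parametrization, the single "interaction" coefficient (_IrurXa2022_1) is a difference-in-differences of cell effects, so it only makes sense together with the main effects. The Python sketch below recovers it from the # output's cell coefficients quoted above; keys are (a2022, rural). It reproduces the point estimate only; a standard error for such a combination needs the covariance of the estimates, which is what Stata's lincom command uses.

```python
# Cell effects relative to the base cell (a2022=0, rural=0), copied from the
# a2022#rural output; the base cell is 0 by construction.
cell = {(0, 0): 0.0, (0, 1): -0.4517703, (1, 0): 0.0788755, (1, 1): -0.1898026}


def interaction_contrast(c):
    """Difference-in-differences: change in the a2022 effect when rural goes from 0 to 1."""
    a2022_effect_rural0 = c[(1, 0)] - c[(0, 0)]
    a2022_effect_rural1 = c[(1, 1)] - c[(0, 1)]
    return a2022_effect_rural1 - a2022_effect_rural0


print(round(interaction_contrast(cell), 7))  # 0.1830922, the _IrurXa2022_1 coefficient
```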

17
jingleqq posted on 2013-10-16 15:02:37
OK, I understand now! You are really professional and patient.
