Original poster: deem

[Stata resource sharing] Reposting a detailed walkthrough of PSM + DID

#1 deem (original poster), posted 2017-8-31 15:35:56

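I've seen many people on the forum run into problems with PSM + DID, so I'm reposting an article that is one of the most carefully thought-out treatments I have come across and that can be used directly. A reminder: if assignment to the groups is random, matching is generally unnecessary and you can run DID directly; most papers in top journals now fall into this category. If assignment is not random, you need the method below. At present the better approach is Stata's teffects command. Its one drawback is that it cannot match without replacement. When I need matching without replacement, I usually do the matching step in SAS and then use Stata's teffects.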
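The article reposted below covers only the matching step. For the DID step mentioned above, a minimal sketch might look like the following. It assumes a panel with hypothetical variables y (outcome), treated (1 for the treatment group), post (1 for periods after the policy takes effect) and id (unit identifier); none of these names come from the article's example data.

```stata
* Minimal DID sketch under the stated assumptions (hypothetical variable names).
* The coefficient on 1.treated#1.post is the difference-in-differences estimate.
regress y i.treated##i.post, vce(cluster id)
```

Clustering on id is one common choice for panel data, not the only defensible one.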


Propensity Score Matching in Stata using teffects
Source: https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm
For many years, the standard tool for propensity score matching in Stata has been the psmatch2 command, written by Edwin Leuven and Barbara Sianesi. However, Stata 13 introduced a new teffects command for estimating treatment effects in a variety of ways, including propensity score matching. The teffects psmatch command has one very important advantage over psmatch2: it takes into account the fact that propensity scores are estimated rather than known when calculating standard errors. This often turns out to make a significant difference, and sometimes in surprising ways. We thus strongly recommend switching from psmatch2 to teffects psmatch, and this article will help you make the transition.

An Example of Propensity Score Matching

Run the following command in Stata to load an example data set:

    use http://ssc.wisc.edu/sscc/pubs/files/psm

It consists of four variables: a treatment indicator t, covariates x1 and x2, and an outcome y. This is constructed data, and the effect of the treatment is in fact a one unit increase in y. However, the probability of treatment is positively correlated with x1 and x2, and both x1 and x2 are positively correlated with y. Thus simply comparing the mean value of y for the treated and untreated groups badly overestimates the effect of treatment:

    ttest y, by(t)


(Regressing y on t, x1, and x2 will give you a pretty good picture of the situation.)
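To see where the bias in the raw comparison comes from, here is a small simulation with the same structure as the data described above. The data-generating process (standard normal covariates, a logit treatment model, unit coefficients, the seed) is an assumption for illustration and is not the code that produced the psm file.

```stata
* Hypothetical simulation: treatment probability and outcome both rise with x1 and x2
clear
set seed 12345
set obs 1000
gen x1 = rnormal()
gen x2 = rnormal()
gen t  = runiform() < invlogit(x1 + x2)   // treatment is more likely when x1, x2 are high
gen y  = t + x1 + x2 + rnormal()          // true treatment effect is 1

ttest y, by(t)        // raw difference in means mixes the effect of t with that of x1, x2
regress y t x1 x2     // controlling for the confounders should bring the estimate near 1
```

Run this in a separate session, since clear drops the example data set from memory.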

The psmatch2 command will give you a much better estimate of the treatment effect:

    psmatch2 t x1 x2, out(y)


----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .871388246   1.01968536   .173034999     5.89
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
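
psmatch2 is a user-written command and does not ship with Stata. If Stata reports that the command is unrecognized, it can usually be installed from SSC (this assumes an internet connection and permission to install packages):

```stata
ssc install psmatch2   // one-time installation from the SSC archive
which psmatch2         // confirm that Stata can now find the command
```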

The teffects Command

You can carry out the same estimation with teffects. The basic syntax of the teffects command when used for propensity score matching is:

teffects psmatch (outcome) (treatment covariates)

In this case the basic command would be:

    teffects psmatch (y) (t x1 x2)


However, the default behavior of teffects is not the same as psmatch2 so we'll need to use some options to get the same results. First, psmatch2 by default reports the average treatment effect on the treated (which it refers to as ATT). The teffects command by default reports the average treatment effect (ATE) but will calculate the average treatment effect on the treated (which it refers to as ATET) if given the atet option. Second, psmatch2 by default uses a probit model for the probability of treatment. The teffects command uses a logit model by default, but will use probit if the probit option is applied to the treatment equation. So to run the same model using teffects type:

    teffects psmatch (y) (t x1 x2, probit), atet


Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: probit                                        max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   1.019685   .1227801     8.30   0.000     .7790407     1.26033
------------------------------------------------------------------------------
The average treatment effect on the treated is identical, other than being rounded at a different place. But note that teffects reports a very different standard error (we'll discuss why that is shortly), plus a Z-statistic, p-value, and 95% confidence interval rather than just a T-statistic.

Running teffects with the default options gives the following:

    teffects psmatch (y) (t x1 x2)


Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           t |
   (1 vs 0)  |   1.019367   .1164694     8.75   0.000     .7910912    1.247643
------------------------------------------------------------------------------
This is equivalent to:

    psmatch2 t x1 x2, out(y) logit ate


----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .930722886   .960350715   .168252917     5.71
                        ATU |-.423243358   .625587554   1.04883091            .        .
                        ATE |                           1.01936701            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
The ATE from this model is very similar to the ATT/ATET from the previous model. But note that psmatch2 is reporting a somewhat different ATT in this model. The teffects command reports the same ATET if asked:

    teffects psmatch (y) (t x1 x2), atet


Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   .9603507   .1204748     7.97   0.000     .7242245    1.196477
------------------------------------------------------------------------------

Standard Errors

The output of psmatch2 includes the following caveat:

Note: S.E. does not take into account that the propensity score is estimated.

A recent paper by Abadie and Imbens (2012. Matching on the estimated propensity score. Harvard University and National Bureau of Economic Research) established how to take into account that propensity scores are estimated, and teffects psmatch relies on their work. Interestingly, the adjustment for ATE is always negative, leading to smaller standard errors: matching based on estimated propensity scores turns out to be more efficient than matching based on true propensity scores. However, for ATET the adjustment can be positive or negative, so the standard errors reported by psmatch2 may be too large or too small.

Handling Ties

Thus far we've used psmatch2 and teffects psmatch to do simple nearest-neighbor matching with one neighbor (and no caliper). However, this raises the question of what to do when two observations have the same propensity score and are thus tied for "nearest neighbor." Ties are common if the covariates in the treatment model are categorical or even integers.

The psmatch2 command by default matches with one of the tied observations, but with the ties option it matches with all tied observations. The teffects psmatch command always matches with all ties. If your data set has multiple observations with the same propensity score, you won't get exactly the same results from teffects psmatch as you were getting from psmatch2 unless you go back and add the ties option to your psmatch2 commands. (At this time we are not aware of any clear guidance as to whether it is better to match with ties or not.)
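
One way to check whether ties matter in a given data set is to look for repeated propensity scores and then ask psmatch2 to use all tied matches. This is a sketch on the article's example variables; storing the scores as double is a choice made here so that rounding does not create spurious ties.

```stata
teffects psmatch (y) (t x1 x2)
predict double ps0 ps1, ps                 // propensity scores from the logit model
duplicates report ps1                      // rows with copies > 1 indicate tied scores
psmatch2 t x1 x2, out(y) logit ate ties    // match with all tied observations, as teffects does
```

If duplicates report shows no repeated values, the ties option should not change the psmatch2 results.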

Matching With Multiple Neighbors

By default teffects psmatch matches each observation with one other observation. You can change this with the nneighbor() (or just nn()) option. For example, you could match each observation with its three nearest neighbors with:

    teffects psmatch (y) (t x1 x2), nn(3)


Postestimation

By default teffects psmatch does not add any new variables to the data set. However, there are a variety of useful variables that can be created with options and post-estimation predict commands. The following table lists the 1st and 467th observations of the example data set after some of these variables have been created. We'll refer to it as we explain the commands that created the new variables. Reviewing these variables is also a good way to make sure you understand exactly how propensity score matching works.


      +-------------------------------------------------------------------------------------------------------+
      |        x1          x2   t          y   match1        ps0        ps1          y0         y1         te |
      |-------------------------------------------------------------------------------------------------------|
   1. |  .0152526   -1.793022   0   -1.79457      467   .9081651   .0918349    -1.79457   2.231719   4.026289 |
 467. | -2.057838    .5360286   1   2.231719      781    .907606    .092394   -.6012772   2.231719   2.832996 |
      +-------------------------------------------------------------------------------------------------------+
Start with a clean slate by typing:

    use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

The gen() option tells teffects psmatch to create a new variable (or variables). For each observation, this new variable will contain the number of the observation that observation was matched with. If there are ties or you told teffects psmatch to use multiple neighbors, then gen() will need to create multiple variables. Thus you supply the stem of the variable name, and teffects psmatch will add suffixes as needed.

    teffects psmatch (y) (t x1 x2), gen(match)


In this case each observation is only matched with one other, so gen(match) only creates match1. Referring to the example output, the match of observation 1 is observation 467 (which is why those two are listed).
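
With several neighbors the same option creates one variable per match. A quick sketch (the stem nb is an arbitrary name chosen here):

```stata
teffects psmatch (y) (t x1 x2), nn(3) gen(nb)
describe nb*          // nb1, nb2, nb3 (more if there are ties)
list nb* in 1/5       // observation numbers of the matches for the first five observations
```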

Note that these observation numbers are only valid in the current sort order, so make sure you can recreate that order if needed. If necessary, run:

    gen ob=_n


and then:

    sort ob


to restore the current sort order.

The predict command with the ps option creates two variables containing the propensity scores, or that observation's predicted probability of being in either the control group or the treated group:

    predict ps0 ps1, ps


Here ps0 is the predicted probability of being in the control group (t=0) and ps1 is the predicted probability of being in the treated group (t=1). Observations 1 and 467 were matched because their propensity scores are very similar.
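
To check how close the matched propensity scores are across the whole sample, one option is to look up each observation's match by its observation number. This sketch assumes match1 and ps1 were created as above and that the sort order has not changed since the matching; ps1_match and ps_gap are names introduced here.

```stata
gen ps1_match = ps1[match1]           // propensity score of each observation's match
gen ps_gap    = abs(ps1 - ps1_match)  // distance between the matched pair
summarize ps_gap, detail              // large gaps point to poor matches
list ps1 ps1_match ps_gap if _n==1 | _n==467
```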

The po option creates variables containing the potential outcomes for each observation:

    predict y0 y1, po


Because observation 1 is in the control group, y0 contains its observed value of y. y1 is the observed value of y for observation 1's match, observation 467. The propensity score matching estimator assumes that if observation 1 had been in the treated group its value of y would have been that of the observation in the treated group most similar to it (where "similarity" is measured by the difference in their propensity scores).

Observation 467 is in the treated group, so its value for y1 is its observed value of y while its value for y0 is the observed value of y for its match, observation 781.

Running the predict command with no options gives the treatment effect itself:

    predict te


The treatment effect is simply the difference between y1 and y0. You could calculate the ATE yourself (but emphatically not its standard error) with:

    sum te


and the ATET with:

    sum te if t
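
The relationships described above can be checked directly. This is a sketch that assumes match1, y0, y1 and te exist from the commands above, that each observation has a single match (no ties), and that the sort order is unchanged; the tolerance of 1e-5 is an arbitrary allowance for float rounding.

```stata
* Controls: y0 is the observed outcome, y1 is borrowed from the matched treated observation
assert reldif(y0, y)         < 1e-5 if t == 0
assert reldif(y1, y[match1]) < 1e-5 if t == 0
* Treated: the roles are reversed
assert reldif(y1, y)         < 1e-5 if t == 1
assert reldif(y0, y[match1]) < 1e-5 if t == 1
* The predicted treatment effect is the difference between the potential outcomes
assert reldif(te, y1 - y0)   < 1e-5
```

A failed assert here would suggest that one of the assumptions (no ties, unchanged sort order) does not hold.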




#2 deem, posted 2017-8-31 15:36:34
Regression on the "Matched Sample"

Another way to conceptualize propensity score matching is to think of it as choosing a sample from the control group that "matches" the treatment group. Any differences between the treatment and matched control groups are then assumed to be a result of the treatment. Note that this gives the average treatment effect on the treated—to calculate the ATE you'd create a sample of the treated group that matches the controls. Mathematically this is all equivalent to using matching to estimate what an observation's outcome would have been if it had been in the other group, as described above.

Sometimes researchers then want to run regressions on the "matched sample," defined as the observations in the treated group plus the observations in the control group which were matched to them. The problem with this approach is that the matched sample is based on propensity scores which are estimated, not known. Thus the matching scheme is an estimate as well. Running regressions after matching is essentially a two stage regression model, and the standard errors from the second stage must take the first stage into account, something standard regression commands do not do. This is an area of ongoing research.

We will discuss how to run regressions on a matched sample because it remains a popular technique, but we cannot recommend it.

psmatch2 makes it easy by creating a _weight variable automatically. For observations in the treated group, _weight is 1. For observations in the control group it is the number of observations from the treated group for which the observation is a match. If the observation is not a match, _weight is missing. _weight thus acts as a frequency weight (fweight) and can be used with Stata's standard weighting syntax. For example (starting with a clean slate again):

    use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
    psmatch2 t x1 x2, out(y) logit
    reg y x1 x2 t [fweight=_weight]

Observations with a missing value for _weight are omitted from the regression, so it is automatically limited to the matched sample. Again, keep in mind that the standard errors given by the reg command are incorrect because they do not take into account the matching stage.

teffects psmatch does not create a _weight variable, but it is possible to create one based on the match1 variable. Here is example code, with comments:

    gen ob=_n // store the observation numbers for future use
    save fulldata, replace // save the complete data set
    teffects psmatch (y) (t x1 x2), gen(match) // create match1 (this step appears in the complete example code)
    keep if t // keep just the treated group
    keep match1 // keep just the match1 variable (the observation numbers of their matches)
    bysort match1: gen weight=_N // count how many times each control observation is a match
    by match1: keep if _n==1 // keep just one row per control observation
    ren match1 ob // rename for merging purposes

    merge 1:m ob using fulldata // merge back into the full data
    replace weight=1 if t // set weight to 1 for treated observations


The resulting weight variable will be identical to the _weight variable created by psmatch2, as can be verified with:

    assert weight==_weight

It is used in the same way and will give exactly the same results:

reg y x1 x2 t [fweight=weight]

Obviously this is a good bit more work than using psmatch2. If your propensity score matching model can be done using both teffects psmatch and psmatch2, you may want to run teffects psmatch to get the correct standard error and then psmatch2 if you need a _weight variable.

This regression has an N of 666, 333 from the treated group and 333 from the control group. However, it only uses 189 different observations from the control group. About 1/3 of them are the matches for more than one observation from the treated group and are thus duplicated in the regression (run tab weight if !t for details). Researchers sometimes use the norepl (no replacement) option in psmatch2 to ensure each observation is used just once, even though this generally makes the matching worse. To the best of our knowledge there is no equivalent with teffects psmatch.
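
For reference, matching without replacement in psmatch2 might look like the sketch below. It reuses the article's example data; noreplacement is the full name of the norepl option and applies to one-to-one nearest-neighbor matching.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t x1 x2, out(y) logit noreplacement
tab _weight if !t     // every matched control should now appear with weight 1
```

The regression discussed in the rest of this section still uses the matching with replacement from above.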

The results of this regression leave somewhat to be desired:

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    1.11891   .0440323    25.41   0.000      1.03245    1.205369
          x2 |    1.05594   .0417253    25.31   0.000       .97401     1.13787
           t |   .9563751   .0802273    11.92   0.000     .7988445    1.113906
       _cons |   .0180986   .0632538     0.29   0.775    -.1061036    .1423008
------------------------------------------------------------------------------
By construction all the coefficients should be 1. Regression using all the observations (reg y x1 x2 t rather than reg y x1 x2 t [fweight=weight]) does better in this case:

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.031167   .0346941    29.72   0.000     .9630853    1.099249
          x2 |   .9927759   .0333297    29.79   0.000     .9273715     1.05818
           t |   .9791484   .0769067    12.73   0.000     .8282306    1.130066
       _cons |   .0591595   .0416008     1.42   0.155    -.0224758    .1407948
------------------------------------------------------------------------------

Other Methods of Estimating Treatment Effects

While propensity score matching is the most common method of estimating treatment effects at the SSCC, teffects also implements Regression Adjustment (teffects ra), Inverse Probability Weighting (teffects ipw), Augmented Inverse Probability Weighting (teffects aipw), Inverse Probability Weighted Regression Adjustment (teffects ipwra), and Nearest Neighbor Matching (teffects nnmatch). The syntax is similar, though it varies whether you need to specify variables for the outcome model, the treatment model, or both:

teffects ra (y x1 x2) (t)
teffects ipw (y) (t x1 x2)
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
teffects nnmatch (y x1 x2) (t)
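
Whichever estimator is used, it may be worth checking covariate balance and overlap afterwards. A sketch on the example variables, assuming Stata 14 or later (tebalance is not available in Stata 13):

```stata
teffects psmatch (y) (t x1 x2)
tebalance summarize     // standardized differences and variance ratios, raw vs. matched
teffects overlap        // density plots of the estimated propensity scores by group
```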

Complete Example Code

The following is the complete code for the examples in this article.

    clear all
    use http://www.ssc.wisc.edu/sscc/pubs/files/psm

    ttest y, by(t)
    reg y x1 x2 t

    psmatch2 t x1 x2, out(y)
    teffects psmatch (y) (t x1 x2, probit), atet

    teffects psmatch (y) (t x1 x2)
    psmatch2 t x1 x2, out(y) logit ate
    teffects psmatch (y) (t x1 x2), atet

    use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear

    teffects psmatch (y) (t x1 x2), gen(match)

    predict ps0 ps1, ps
    predict y0 y1, po
    predict te
    l if _n==1 | _n==467

    use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear

    psmatch2 t x1 x2, out(y) logit
    reg y x1 x2 t [fweight=_weight]

    gen ob=_n
    save fulldata, replace

    teffects psmatch (y) (t x1 x2), gen(match)
    keep if t
    keep match1
    bysort match1: gen weight=_N
    by match1: keep if _n==1
    ren match1 ob

    merge 1:m ob using fulldata
    replace weight=1 if t

    assert weight==_weight

    reg y x1 x2 t [fweight=weight]
    reg y x1 x2 t

    teffects ra (y x1 x2) (t)
    teffects ipw (y) (t x1 x2)
    teffects aipw (y x1 x2) (t x1 x2)
    teffects ipwra (y x1 x2) (t x1 x2)
    teffects nnmatch (y x1 x2) (t)


#3 auirzxp, posted 2017-8-31 16:58:37
Quoting deem (posted 2017-8-31 15:36):
Regression on the "Matched Sample"

Another way to conceptualize propensity score matching is to t ...

#4 rosebaby6688, posted 2017-9-20 15:33:44

#5 梦离潇湘, posted 2017-10-5 17:02:46
This is really great!

#6 caichaoying211, posted 2017-10-6 15:10:49
Really excellent!

#7 养养眼1998, posted 2019-3-14 21:25:58
Thanks, OP, you're a good person.

#8 和如歌也, posted 2019-3-20 19:16:42
What does random assignment mean?

#9 芝华塔内欧, posted 2019-3-21 09:33:43
I still don't quite understand... Which of these commands actually implement the DID part?

#10 guyiyin2003, posted 2019-3-25 21:26:01
Thanks to the OP for sharing.
