Original poster: 阿袋

[Program sharing] The latest Stata command: RD

Regression discontinuity (RD) estimator

Syntax

      rd [varlist] [if] [in] [weight] [, options]

      where varlist has the form outcomevar [treatmentvar] assignmentvar

        +---------+
    ----+ Weights +------------------------------------------------------------

    aweights, fweights, and pweights are allowed; see help weights.  Under
    Stata versions 9.2 or before (using locpoly to construct local regression
    estimates) aweights and pweights will be converted to fweights
    automatically and the data expanded. If this would exceed system memory
    limits, error r(901) will be issued; in this case, the user is advised to
    round weights.  In any case, the validity of bootstrapped standard errors
    will depend on the expanded data correctly representing sampling
    variability, which may require rounding or replacing weight variables.
    Under Stata versions 10 or later (using lpoly to construct local regression
    estimates), all weights will be treated as aweights.

      bs [, options]: rd varlist [if] [in] [weight] [, options]

        +----------------------------+
    ----+  Table of Further Contents +-----------------------------------------

  General description of estimator
  Examples
  Detailed syntax
  Description of options
  Remarks and saved results
  References
  Acknowledgements
  Citation of rd
  Author information

        +-------------+
    ----+ Description +--------------------------------------------------------

rd implements a set of regression-discontinuity estimation methods that are
thought to have very good internal validity, for estimating the causal effect of
some explanatory variable (called the treatment variable) for a particular
subpopulation, under some often plausible assumptions.  In this sense, it is much
like an experimental design, except that levels of the treatment variable are not
assigned randomly by the researcher.  Instead, there is a jump in the conditional
mean of the treatment variable at a known cutoff in another variable, called the
assignment variable, which is perfectly observed, and this allows us to estimate
the effect of treatment as if it were randomly assigned in the neighborhood of
the known cutoff.

rd is an alternative to various regression techniques that purport to allow
causal inference (e.g. panel methods such as xtreg), instrumental variables (IV)
and other IV-type methods (see the ivreg2 help file and references therein), and
matching estimators (see the psmatch2 and nnmatch help files and references
therein).  The rd approach is in fact an IV model with one exogenous variable
excluded from the regression (excluded instrument), an indicator for the
assignment variable above the cutoff, and one endogenous regressor (the treatment
variable).

rd estimates local linear or kernel regression models on both sides of the
cutoff, using a triangle kernel.  Estimates are sensitive to the choice of
bandwidth, so by default several estimates are constructed using different
bandwidths. In practice, rd uses kernel-weighted suest (or ivreg if suest fails)
to estimate the local linear regressions and reports analytic SE based on the
regressions.

Further discussion of rd appears in Nichols (2007).

        +----------+
    ----+ Examples +-----------------------------------------------------------

In the simplest case, assignment to treatment depends on a variable Z being above
a cutoff Z0.  Frequently, Z is defined so that Z0=0. In this case, treatment is 1
for Z>=0 and 0 for Z<0, and we estimate local linear regressions on both sides of
the cutoff to obtain estimates of the outcome at Z=0.  The difference between the
two estimates (for the samples where Z>=0 and where Z<0) is the estimated effect
of treatment.
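The calculation in this paragraph can be sketched outside Stata. The following is a minimal Python illustration under stated assumptions, not rd's actual code: it fits a triangle-kernel-weighted linear regression on each side of the cutoff and differences the two fitted values at Z0. The function names (`triangle_weight`, `boundary_fit`, `sharp_rd`) and the toy data are assumptions made for this example.

```python
def triangle_weight(z, z0, h):
    """Triangle kernel weight: 1 at the cutoff, falling linearly to 0 at distance h."""
    return max(0.0, 1.0 - abs(z - z0) / h)


def boundary_fit(zs, ys, z0, h):
    """Weighted least squares of y on (z - z0); returns the fitted value at z0."""
    w = [triangle_weight(z, z0, h) for z in zs]
    x = [z - z0 for z in zs]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no observations within the bandwidth")
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, ys))
    slope = sxy / sxx
    return ybar - slope * xbar  # intercept = prediction at z == z0


def sharp_rd(zs, ys, z0=0.0, h=1.0):
    """Difference between the right-side (z >= z0) and left-side (z < z0) fits at z0."""
    right = [(z, y) for z, y in zip(zs, ys) if z >= z0]
    left = [(z, y) for z, y in zip(zs, ys) if z < z0]
    y_plus = boundary_fit([z for z, _ in right], [y for _, y in right], z0, h)
    y_minus = boundary_fit([z for z, _ in left], [y for _, y in left], z0, h)
    return y_plus - y_minus


# Toy data (assumed): the outcome is linear in z with a jump of 2 at z = 0.
zs = [i / 10 for i in range(-9, 10)]
ys = [1.0 + 0.5 * z + (2.0 if z >= 0 else 0.0) for z in zs]
estimate = sharp_rd(zs, ys, z0=0.0, h=1.0)
print(round(estimate, 6))
```

Because the toy outcome is exactly linear on each side, the two boundary fits recover the true intercepts and the estimate equals the jump built into the data. With real data the estimate depends on the bandwidth h.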

For example, having a Democratic representative in the US Congress may be
considered a treatment applied to a Congressional district, and the assignment
variable Z is the vote share garnered by the Democratic candidate.  At Z=50%, the
probability of treatment=1 jumps from zero to one. Suppose we are interested in
the effect a Democratic representative has on the federal spending within a
Congressional district.  rd estimates local linear regressions on both sides of
the cutoff like so:

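         * install rd and fetch the ancillary example data
         ssc inst rd, replace
         net get rd
         use votex
         * sharp RD of lne on d, with graphs, at the default bandwidth only
         rd lne d, gr mbw(100)
         * same, passing x-axis label options to the overlaid line plots
         rd lne d, gr mbw(100) line(`"xla(-.2 "Repub" 0 .3 "Democ", noticks)"')
         * also compute the discontinuity in the density of d
         rd lne d, gr ddens
         * graph estimates against bandwidth multiples from 25% to 300%
         rd lne d, mbw(25(25)300) bdep ox
         * estimate jumps in the control variables pop through vet
         rd lne d, x(pop-vet)
         * graph binned means across bins instead of a scatterplot
         rd lne d, mbw(100) bin binvar(bins) scopt(mcol(black))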

In a fuzzy RD design, the conditional mean of treatment jumps at the cutoff, and
that jump forms the denominator of a Local Wald Estimator. The numerator is the
jump in the outcome, and both are reported along with their ratio. The sharp RD
design is a special case of the fuzzy RD design, since the denominator in the
sharp case is just one.

         g byte ranwin=cond(uniform()<.1,1-win,win)
         rd lne ranwin d, mbw(100)
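
In symbols, the Local Wald Estimator described above can be written as follows. The notation is chosen here for illustration and is not taken from the rd help file: \hat{y}^{+} and \hat{y}^{-} denote the local linear estimates of the outcome just above and just below Z0, and \hat{t}^{+} and \hat{t}^{-} the corresponding estimates of mean treatment.

```latex
\hat{\tau}_{\text{Wald}}
  = \frac{\hat{y}^{+} - \hat{y}^{-}}{\hat{t}^{+} - \hat{t}^{-}},
\qquad
\text{sharp design: } \hat{t}^{+} - \hat{t}^{-} = 1
\;\Rightarrow\;
\hat{\tau}_{\text{Wald}} = \hat{y}^{+} - \hat{y}^{-}.
```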

The default bandwidth from Imbens and Kalyanaraman (2009) is designed to minimize
MSE, or squared bias plus variance, in a sharp RD design. Note that a smaller
bandwidth tends to produce lower bias and higher variance. The optimal bandwidth
will tend to be larger for a fuzzy design due to the additional variance arising
from the estimation of the jump in the conditional mean of treatment.
Unfortunately, a larger bandwidth also leads to additional bias, which will be
greater if the curvature of the response function is greater (meaning that a
linear regression over a larger range is a poorer approximation).  The increase
in squared bias due to dividing by the estimated jump in the conditional mean of
treatment (using observations away from the discontinuity) can easily dominate
the increase in variance and make the optimal bandwidth in a fuzzy design
smaller than in the sharp design.  No clear guidance is offered; conducting
simulations using plausible generating functions for your specific application
is highly recommended.  The rd option bdep facilitates visualizing the
dependence of the estimate on bandwidth.
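
The trade-off can be summarized with the standard large-sample orders for a local linear estimate at a boundary point. This is a textbook approximation added for illustration, not a formula taken from rd or from Imbens and Kalyanaraman (2009): h is the bandwidth, n the sample size, and B and V are unspecified constants that depend on the curvature of the response function and on the residual variance and density at the cutoff.

```latex
\mathrm{MSE}(h) = \mathrm{Bias}(h)^2 + \mathrm{Var}(h)
  \approx B^2 h^4 + \frac{V}{n h},
\qquad
h^{*} = \left( \frac{V}{4 B^2 n} \right)^{1/5} \propto n^{-1/5}.
```

Under this approximation, a larger V (extra variance, as in a fuzzy design) pushes the minimizing bandwidth up, while a larger B (more curvature, or extra bias from the estimated denominator) pushes it down, which is why the net effect in a fuzzy design is ambiguous.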

There are also a variety of alternative implementations on the website of
Matias Cattaneo (https://sites.google.com/a/umich.edu/cattaneo/software).

         rd lne ranwin d, mbw(25(25)300) bdep ox

        +-----------------------------+
    ----+ Detailed Syntax and Options +----------------------------------------

There should be two or three variables specified after the rd command; if two are
specified, a sharp RD design is assumed, where the treatment variable jumps from
zero to one at the cutoff.  If no variables are specified after the rd command,
the estimates table is displayed.

      rd outcomevar [treatmentvar] assignmentvar [if] [in] [weight] [, options]


        +-----------------+
    ----+ Options summary +----------------------------------------------------

mbw(numlist) specifies a list of multiples for bandwidths, in percentage terms.
    The default is "100 50 200" (i.e. half and twice the requested bandwidth) and
    100 is always included in the list, regardless of whether it is specified.

z0(real) specifies the cutoff Z0 in assignmentvar Z.

strineq specifies that mean treatment differs at Z0 from all Z>Z0 (e.g. treatment
    is 1 for Z>0 and 0 for Z<=0); the default assumption is that mean treatment
    differs at Z0 from all Z<Z0 (e.g. treatment is 1 for Z>=0 and 0 for Z<0).

x(varlist) requests estimates of jumps in control variables varlist.

ddens requests a computation of a discontinuity in the density of Z.  This is
    computed in a relatively ad hoc way, and should be redone using McCrary's
    test described at http://www.econ.berkeley.edu/~jmccrary/DCdensity/.

s(stubname) requests that estimates be saved as new variables beginning with
    stubname.

graph requests that local linear regression graphs for each bandwidth be
    produced.

noscatter suppresses the scatterplot on those graphs.

cluster(varlist) requests standard errors robust to clustering on distinct
    combinations of varlist (e.g. stratum psu).

scopt(string) supplies an option list to the scatter plot.

lineopt(string) supplies an option list to the overlaid line plots.

n(real) specifies the number of points at which to calculate local linear
    regressions.  The default is to calculate the regressions at 50 points above
    the cutoff, with equal steps in the grid, and to use equal steps below the
    cutoff, with the number of points determined by the step size.

bwidth(real) allows specification of a bandwidth for local linear regressions.
    The default is to use the estimated optimal bandwidth for a "sharp" design as
    given by Imbens and Kalyanaraman (2009).  The optimal bandwidth minimizes
    MSE, or squared bias plus variance, where a smaller bandwidth tends to
    produce lower bias and higher variance. Note that the optimal bandwidth will
    often tend to be larger for a fuzzy design, due to the additional variance
    that arises from the estimation of the jump in the conditional mean of
    treatment.

bdep requests a graph of estimates versus bandwidths.

bingraph requests a graph of binned means instead of a scatterplot, in bins
    defined by binvar.

binvar(varname) specifies the variable across which binned means should be
    calculated.

oxline adds a vertical line at the default bandwidth.

kernel(rectangle) requests the use of a rectangle (uniform) kernel. The default
    is a triangle (edge) kernel.

covar(varlist) adds covariates to Local Wald Estimation, which is generally a
    Very Bad Idea.  It is possible that covariates could reduce residual variance
    and improve efficiency, but estimation error in their coefficients could also
    reduce efficiency, and any violations of the assumptions that such covariates
    are exogenous and have a linear impact on mean treatment and outcomes could
    greatly increase bias.
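
As a rough illustration of how mbw(), z0(), and strineq are described above, the following Python sketch expands percentage multiples into bandwidths and applies the weak or strict inequality at the cutoff. It is not rd's code: `bandwidth_list` and `treated` are hypothetical helper names, and the position at which 100 is added to the list is an assumption.

```python
def bandwidth_list(base, multiples=(100, 50, 200)):
    """Map each percentage multiple to a bandwidth; 100 is always included."""
    pcts = list(multiples)
    if 100 not in pcts:
        pcts.insert(0, 100)  # assumed position; the help text only says 100 is included
    return {p: base * p / 100 for p in pcts}


def treated(z, z0=0.0, strineq=False):
    """Sharp-design treatment status: z >= z0 by default, z > z0 with strineq."""
    return int(z > z0) if strineq else int(z >= z0)


print(bandwidth_list(0.5))            # default multiples 100, 50, 200
print(bandwidth_list(0.5, [25, 75]))  # 100 is added even if not requested
print(treated(0.0), treated(0.0, strineq=True))
```

The only difference strineq makes is the treatment status of observations lying exactly at Z0.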


        +---------------------------+
    ----+ Remarks and saved results +------------------------------------------


To facilitate bootstrapping, rd saves the following results in e():

Scalars
   e(N)          Number of observations used in estimation
   e(w)          Bandwidth in base model; other bandwidths are reported in e.g.
                  e(w50) for the 50% multiple.

Macros
   e(cmd)        rd
   e(rdversion)  Version number of rd
   e(depvar)     Name of dependent variable

Matrices
   e(b)          Coefficient vector of estimated jumps in variables at
                  different percentage bandwidth multiples

Functions
   e(sample)     Marks estimation sample


References

Many references appear in

    Nichols, Austin. 2007.  Causal Inference with Observational Data. Stata
        Journal 7(4): 507-541.

but the interested reader is directed also to

    Imbens, Guido and Thomas Lemieux. 2007. "Regression Discontinuity Designs:
        A Guide to Practice." NBER Working Paper 13039.

    McCrary, Justin. 2007. "Manipulation of the Running Variable in the
        Regression Discontinuity Design:  A Density Test." NBER Technical
        Working Paper 334.

    Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002.
        Experimental and Quasi-Experimental Designs for Generalized Causal
        Inference.  Boston: Houghton Mifflin.

    Fuji, Daisuke, Guido Imbens, and Karthik Kalyanaraman. 2009. "Notes for
        Matlab and Stata Regression Discontinuity Software."
        http://www.economics.harvard.edu/faculty/imbens/software_imbens

    Imbens, Guido, and Karthik Kalyanaraman. 2009. "Optimal Bandwidth Choice
        for the Regression Discontinuity Estimator." NBER WP 14726.


Acknowledgements

I would like to thank Justin McCrary for helpful discussions.  Any errors are
my own.

The optimal bandwidth calculations are from Fuji, Imbens, and Kalyanaraman
(2009), available at
http://www.economics.harvard.edu/faculty/imbens/software_imbens.


Citation of rd

rd is not an official Stata command. It is a free contribution to the research
community, like a paper. Please cite it as such:

    Nichols, Austin. 2011.  rd 2.0: Revised Stata module for regression
        discontinuity estimation.
        http://ideas.repec.org/c/boc/bocode/s456888.html


Author

    Austin Nichols
    Principal Scientist, Abt Associates, Bethesda MD
    austinnichols@gmail.com


Also see

    Manual:  [U] 23 Estimation and post-estimation commands
             [R] bootstrap
             [R] lpoly in Stata 10, else locpoly (findit locpoly to install)
             [R] ivregress in Stata 10, else [R] ivreg
             [R] regress
             [XT] xtreg

    On-line: help for (if installed) rd_obs (prior version of rd), ivreg2,
             overid, ivendog, ivhettest, ivreset, xtivreg2, xtoverid, ranktest,
             condivreg; psmatch2, nnmatch.


Reply 1
newfei188, posted 2016-10-15 09:13:53
you are so great.

Reply 2
天道纵韬, posted 2016-10-22 10:19:07
thank u

Reply 3
723364867, posted 2016-10-30 18:20:53
xiexie (thank you)

Reply 4
jpingl1273, posted 2017-3-24 22:31:12
What is the point of posting this rd help file?
