[Regression analysis help] Regression discontinuity methods

Original poster
921125 posted on 2015-10-7 10:39:52

Can Stata do regression discontinuity (RD) estimation?

2
蓝色 posted on 2015-10-7 11:57:22
https://ideas.repec.org/c/boc/bocode/s456888.html


-----------------------------------------------------------------------------------------------------------------
help for rd
-----------------------------------------------------------------------------------------------------------------

Regression discontinuity (RD) estimator

Syntax

      rd [varlist] [if] [in] [weight] [, options]

      where varlist has the form outcomevar [treatmentvar] assignmentvar

        +---------+
    ----+ Weights +--------------------------------------------------------------------------------------------

    aweights, fweights, and pweights are allowed; see help weights.  Under Stata versions 9.2 or before (using
    locpoly to construct local regression estimates) aweights and pweights will be converted to fweights
    automatically and the data expanded. If this would exceed system memory limits, error r(901) will be
    issued; in this case, the user is advised to round weights.  In any case, the validity of bootstrapped
    standard errors will depend on the expanded data correctly representing sampling variability, which may
    require rounding or replacing weight variables.  Under Stata versions 10 or later (using lpoly to construct
    local regression estimates), all weights will be treated as aweights.

      bs [, options]: rd varlist [if] [in] [weight] [, options]

        +----------------------------+
    ----+  Table of Further Contents +-------------------------------------------------------------------------

  General description of estimator
  Examples
  Detailed syntax
  Description of options
  Remarks and saved results
  References
  Acknowledgements
  Citation of rd
  Author information

        +-------------+
    ----+ Description +----------------------------------------------------------------------------------------

rd implements a set of regression-discontinuity estimation methods that are thought to have very good internal
validity, for estimating the causal effect of some explanatory variable (called the treatment variable) for a
particular subpopulation, under some often plausible assumptions.  In this sense, it is much like an experimental
design, except that levels of the treatment variable are not assigned randomly by the researcher.  Instead, there
is a jump in the conditional mean of the treatment variable at a known cutoff in another variable, called the
assignment variable, which is perfectly observed, and this allows us to estimate the effect of treatment as if it
were randomly assigned in the neighborhood of the known cutoff.

rd is an alternative to various regression techniques that purport to allow causal inference (e.g. panel methods
such as xtreg), instrumental variables (IV) and other IV-type methods (see the ivreg2 help file and references
therein), and matching estimators (see the psmatch2 and nnmatch help files and references therein).  The rd
approach is in fact an IV model with one exogenous variable excluded from the regression (excluded instrument),
an indicator for the assignment variable above the cutoff, and one endogenous regressor (the treatment variable).

rd estimates local linear or kernel regression models on both sides of the cutoff, using a triangle kernel.
Estimates are sensitive to the choice of bandwidth, so by default several estimates are constructed using
different bandwidths. In practice, rd uses kernel-weighted suest (or ivreg if suest fails) to estimate the local
linear regressions and reports analytic SE based on the regressions.
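
To make the estimation step concrete, here is a minimal Stata sketch of a sharp RD estimate computed by hand as a triangle-kernel-weighted local linear regression. It is not rd's internal code: the simulated data, the variable names (z, d, y, w), and the bandwidth of 0.5 are assumptions made for illustration, and the bandwidth is not the optimal one rd would choose.

```stata
* Hypothetical simulated data: z is the assignment variable, the cutoff is 0,
* and the true jump in y at the cutoff is 2.
clear
set seed 12345
set obs 5000
generate z = runiform() * 2 - 1
generate byte d = (z >= 0)
generate y = 1 + 2*d + 0.5*z + rnormal()

* Triangle kernel weights for an illustrative (not optimal) bandwidth.
scalar bw = 0.5
generate w = max(0, 1 - abs(z)/bw)

* Separate intercepts and slopes on each side of the cutoff;
* the coefficient on 1.d is the estimated jump at z = 0.
regress y i.d##c.z [aweight = w] if w > 0, vce(robust)
display "Estimated jump at the cutoff: " _b[1.d]
```

With these assumed data the reported jump should land near the simulated value of 2, with sampling noise that depends on the bandwidth.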

Further discussion of rd appears in Nichols (2007).

        +----------+
    ----+ Examples +-------------------------------------------------------------------------------------------

In the simplest case, assignment to treatment depends on a variable Z being above a cutoff Z0.  Frequently, Z is
defined so that Z0=0. In this case, treatment is 1 for Z>=0 and 0 for Z<0, and we estimate local linear
regressions on both sides of the cutoff to obtain estimates of the outcome at Z=0.  The difference between the
two estimates (for the samples where Z>=0 and where Z<0) is the estimated effect of treatment.

For example, having a Democratic representative in the US Congress may be considered a treatment applied to a
Congressional district, and the assignment variable Z is the vote share garnered by the Democratic candidate.  At
Z=50%, the probability of treatment=1 jumps from zero to one. Suppose we are interested in the effect a
Democratic representative has on the federal spending within a Congressional district.  rd estimates local linear
regressions on both sides of the cutoff like so:

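         * Install rd from SSC and download its ancillary files
         ssc inst rd, replace
         net get rd
         use votex
         * Sharp RD at the 100% bandwidth multiple only, with a graph
         rd lne d, gr mbw(100)
         * Same estimate, passing axis-label options to the overlaid line plots
         rd lne d, gr mbw(100) line(`"xla(-.2 "Repub" 0 .3 "Democ", noticks)"')
         * Also compute a discontinuity in the density of the assignment variable
         rd lne d, gr ddens
         * Bandwidth multiples from 25% to 300% in steps of 25, graphed against the estimates
         rd lne d, mbw(25(25)300) bdep ox
         * Estimate jumps in the control variables pop through vet
         rd lne d, x(pop-vet)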

In a fuzzy RD design, the conditional mean of treatment jumps at the cutoff, and that jump forms the denominator
of a Local Wald Estimator. The numerator is the jump in the outcome, and both are reported along with their
ratio. The sharp RD design is a special case of the fuzzy RD design, since the denominator in the sharp case is
just one.

         g byte ranwin=cond(uniform()<.1,1-win,win)
         rd lne ranwin d, mbw(100)
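
The Local Wald Estimator described above can also be sketched by hand. The following is a rough illustration on simulated data, assuming a rectangle kernel and a fixed bandwidth of 0.5; the variable names (z, above, t, y, z_above) and the take-up probabilities are hypothetical and are not taken from the votex example.

```stata
* Hypothetical fuzzy design: the probability of treatment jumps from
* about 0.2 to about 0.8 at the cutoff z = 0; the true effect of t is 2.
clear
set seed 2024
set obs 10000
generate z = runiform() * 2 - 1
generate byte above = (z >= 0)
generate byte t = (runiform() < 0.2 + 0.6*above)
generate y = 1 + 2*t + 0.5*z + rnormal()
generate z_above = z * above

scalar bw = 0.5
* Numerator: jump in the outcome at the cutoff.
quietly regress y above z z_above if abs(z) < bw
scalar num = _b[above]
* Denominator: jump in the conditional mean of treatment.
quietly regress t above z z_above if abs(z) < bw
scalar den = _b[above]
display "Local Wald estimate (ratio of jumps): " num/den

* The same ratio from 2SLS, using 'above' as the excluded instrument.
ivregress 2sls y z z_above (t = above) if abs(z) < bw, vce(robust)
```

Because the model is just-identified, the 2SLS coefficient on t equals the ratio of the two jumps, which is the sense in which the help text calls RD an IV model.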

The default bandwidth from Imbens and Kalyanaraman (2009) is designed to minimize MSE, or squared bias plus
variance, in a sharp RD design. Note that a smaller bandwidth tends to produce lower bias and higher variance.
The optimal bandwidth will tend to be larger for a fuzzy design due to the additional variance arising from the
estimation of the jump in the conditional mean of treatment.  Unfortunately, a larger bandwidth also leads to
additional bias, which will be greater if the curvature of the response function is greater (meaning that a
linear regression over a larger range is a poorer approximation).  The increase in squared bias due to dividing
by the estimated jump in the conditional mean of treatment (using observations away from the discontinuity) can
easily dominate the increase in variance and cause the optimal bandwidth in a fuzzy design to be smaller than
in the sharp design.  No clear guidance is offered; conducting simulations using plausible generating functions
for your specific application is highly recommended.  The rd option bdep facilitates visualizing the dependence
of the estimate on bandwidth.

There are also a variety of alternative implementations on the website of Matias Cattaneo:
https://sites.google.com/a/umich.edu/cattaneo/software

         rd lne ranwin d, mbw(25(25)300) bdep ox
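
The kind of simulation recommended above might look like the following sketch, which checks by hand how a sharp RD estimate moves with the bandwidth. Everything here is assumed for illustration: the data-generating function (with curvature from sin(3*z)), the rectangle kernel, and the four bandwidths are hypothetical choices, not rd defaults.

```stata
* Hypothetical simulated data with curvature in z; the true jump at z = 0 is 2.
clear
set seed 777
set obs 5000
generate z = runiform() * 2 - 1
generate byte d = (z >= 0)
generate y = 1 + 2*d + sin(3*z) + rnormal()

* Re-estimate the jump with a rectangle kernel at several illustrative bandwidths.
foreach h of numlist 0.1 0.2 0.4 0.8 {
    quietly regress y i.d##c.z if abs(z) < `h', vce(robust)
    display "bandwidth = " %4.2f `h' "   jump = " %6.3f _b[1.d] "   se = " %6.3f _se[1.d]
}
```

In a run of this sketch one would expect the standard error to shrink as the bandwidth grows, while the curvature pulls the wide-bandwidth estimates away from 2, which is the bias-variance trade-off the help text describes.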

        +-----------------------------+
    ----+ Detailed Syntax and Options +------------------------------------------------------------------------

There should be two or three variables specified after the rd command; if two are specified, a sharp RD design is
assumed, where the treatment variable jumps from zero to one at the cutoff.  If no variables are specified after
the rd command, the estimates table is displayed.

      rd outcomevar [treatmentvar] assignmentvar [if] [in] [weight] [, options]


        +-----------------+
    ----+ Options summary +------------------------------------------------------------------------------------

mbw(numlist) specifies a list of multiples for bandwidths, in percentage terms.  The default is "100 50 200"
    (i.e. half and twice the requested bandwidth) and 100 is always included in the list, regardless of whether
    it is specified.

z0(real) specifies the cutoff Z0 in assignmentvar Z.

strineq specifies that mean treatment differs at Z0 from all Z>Z0 (e.g. treatment is 1 for Z>0 and 0 for Z<=0);
    the default assumption is that mean treatment differs at Z0 from all Z<Z0 (e.g. treatment is 1 for Z>=0 and 0
    for Z<0).

x(varlist) requests estimates of jumps in control variables varlist.

ddens requests a computation of a discontinuity in the density of Z.  This is computed in a relatively ad hoc
    way, and should be redone using McCrary's test described at
    http://www.econ.berkeley.edu/~jmccrary/DCdensity/.

s(stubname) requests that estimates be saved as new variables beginning with stubname.

graph requests that local linear regression graphs for each bandwidth be produced.

noscatter suppresses the scatterplot on those graphs.

cluster(varlist) requests standard errors robust to clustering on distinct combinations of varlist (e.g. stratum
    psu).

scopt(string) supplies an option list to the scatter plot.

lineopt(string) supplies an option list to the overlaid line plots.

n(real) specifies the number of points at which to calculate local linear regressions.  The default is to
    calculate the regressions at 50 points above the cutoff, with equal steps in the grid, and to use equal steps
    below the cutoff, with the number of points determined by the step size.

bwidth(real) allows specification of a bandwidth for local linear regressions.  The default is to use the
    estimated optimal bandwidth for a "sharp" design as given by Imbens and Kalyanaraman (2009).  The optimal
    bandwidth minimizes MSE, or squared bias plus variance, where a smaller bandwidth tends to produce lower bias
    and higher variance. Note that the optimal bandwidth will often tend to be larger for a fuzzy design, due to
    the additional variance that arises from the estimation of the jump in the conditional mean of treatment.

bdep requests a graph of estimates versus bandwidths.

oxline adds a vertical line at the default bandwidth.

kernel(rectangle) requests the use of a rectangle (uniform) kernel. The default is a triangle (edge) kernel.

covar(varlist) adds covariates to Local Wald Estimation, which is generally a Very Bad Idea.  It is possible that
    covariates could reduce residual variance and improve efficiency, but estimation error in their coefficients
    could also reduce efficiency, and any violations of the assumptions that such covariates are exogenous and
    have a linear impact on mean treatment and outcomes could greatly increase bias.


        +---------------------------+
    ----+ Remarks and saved results +--------------------------------------------------------------------------


To facilitate bootstrapping, rd saves the following results in e():

Scalars
   e(N)          Number of observations used in estimation
   e(w)          Bandwidth in base model; other bandwidths are reported in e.g. e(w50) for the 50% multiple.

Macros
   e(cmd)        rd
   e(rdversion)  Version number of rd
   e(depvar)     Name of dependent variable

Matrices
   e(b)          Coefficient vector of estimated jumps in variables at different percentage bandwidth multiples

Functions
   e(sample)     Marks estimation sample


References

Many references appear in

    Nichols, Austin. 2007.  Causal Inference with Observational Data. Stata Journal 7(4): 507-541.

but the interested reader is directed also to

    Imbens, Guido and Thomas Lemieux. 2007. "Regression Discontinuity Designs: A Guide to Practice." NBER
        Working Paper 13039.

    McCrary, Justin. 2007. "Manipulation of the Running Variable in the Regression Discontinuity Design:  A
        Density Test." NBER Technical Working Paper 334.

    Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002.  Experimental and Quasi-Experimental
        Designs for Generalized Causal Inference.  Boston: Houghton Mifflin.

    Fuji, Daisuke, Guido Imbens, and Karthik Kalyanaraman. 2009. "Notes for Matlab and Stata Regression
        Discontinuity Software." http://www.economics.harvard.edu/faculty/imbens/software_imbens

    Imbens, Guido, and Karthik Kalyanaraman. 2009. "Optimal Bandwidth Choice for the Regression Discontinuity
        Estimator." NBER WP 14726.

Acknowledgements

I would like to thank Justin McCrary for helpful discussions.  Any errors are my own.

The optimal bandwidth calculations are from Fuji, Imbens, and Kalyanaraman (2009), available at
http://www.economics.harvard.edu/faculty/imbens/software_imbens.

Citation of rd

rd is not an official Stata command. It is a free contribution to the research community, like a paper. Please
cite it as such:

    Nichols, Austin. 2011.  rd 2.0: Revised Stata module for regression discontinuity estimation.
        http://ideas.repec.org/c/boc/bocode/s456888.html

Author

    Austin Nichols
    Urban Institute
    Washington, DC, USA
    austinnichols@gmail.com

Also see

Manual:  [U] 23  Estimation and  post-estimation commands
          [R] bootstrap
          [R] lpoly in Stata 10, else locpoly (findit locpoly to install)
          [R] ivregress in Stata 10, else [R] ivreg
          [R] regress
          [XT] xtreg

On-line: help for (if installed) rd_obs (prior version of rd), ivreg2, overid, ivendog, ivhettest, ivreset,
          xtivreg2, xtoverid, ranktest, condivreg; psmatch2, nnmatch.


3
921125 posted on 2015-10-7 12:01:40
In reply to 蓝色 (2015-10-7 11:57):
Thanks for the answer! It looks quite complicated, though, so I will probably need some time to digest it. Thanks!

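4
dj1993jd (verified student) posted on 2016-6-5 21:28:38
Quoting 921125 (2015-10-7 12:01):
Thanks for the answer! It looks quite complicated, though, so I will probably need some time to digest it. Thanks!
Hello, I would like to ask you how to do regression discontinuity. I am quite confused at the moment.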

5
胖仔1 (verified student) posted on 2016-8-8 21:52:08
Can EViews do this?

6
黃河泉 (verified professional) posted on 2016-8-12 16:12:00
In fact, for ordinary RDD estimation, OLS (for sharp RDD) or IV (for fuzzy RDD) is all you need! More information can be found at: https://bbs.pinggu.org/thread-4705247-1-1.html

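7
碧草已满地3 posted on 2017-4-17 15:01:06
Quoting 黃河泉 (2016-8-12 16:12):
In fact, for ordinary RDD estimation, OLS (for sharp RDD) or IV (for fuzzy RDD) is all you need! More information can be foun ...
Professor, is it true that I only need two series, the dependent variable y and the treatment variable x, and that running the command rd y x, gr mbw(100) should then give the sharp RD results? Why do I get "no observation"?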

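8
黃河泉 (verified professional) posted on 2017-4-17 15:11:34
Quoting 碧草已满地3 (2017-4-17 15:01):
Professor, is it true that I only need two series, the dependent variable y and the treatment variable x, and that running the command rd y x, gr mbw(100) should then give the sharp ...
Please show the summary statistics and the exact command you ran!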
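
A minimal sketch of the checks requested here, on hypothetical data. It assumes, following the help text quoted earlier in the thread, that the last variable in the rd varlist is the assignment variable and that the cutoff is 0 when z0() is not given. The names y, x, and z are illustrative, and whether this is what caused the "no observation" message in the question is only a guess.

```stata
* Hypothetical data: y is the outcome, x is a 0/1 treatment dummy,
* and z is the assignment variable with its cutoff at 0.
clear
set seed 99
set obs 1000
generate z = rnormal()
generate byte x = (z >= 0)
generate y = 1 + 2*x + 0.5*z + rnormal()

* Basic summary statistics of the variables involved.
summarize y x z

* Observations on each side of a cutoff at 0.
* If the 0/1 dummy x is used as the assignment variable, nothing lies below 0.
count if x < 0
count if z < 0
count if z >= 0 & !missing(z)

* With rd installed, the varlist order in the help text would suggest
* passing the assignment variable last, for example: rd y z, mbw(100)
```

If the count below the cutoff is zero, no local regression can be fitted on that side, so comparing these counts for the variable actually passed to rd is a quick first check.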

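9
dxk614 posted on 2017-8-15 20:43:55
Quoting dj1993jd (2016-6-5 21:28):
Hello, I would like to ask you how to do regression discontinuity. I am quite confused at the moment.
May I ask, have you learned regression discontinuity yet? Are there any textbooks or other materials?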

10
金平糖 posted on 2017-12-5 17:27:34
Thank you very much. Bumping this thread.
