Thread starter: chwcy

Question: how do I run the Heckman command in Stata?


chwcy posted on 2005-9-5 09:46:00


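For example, I want to analyze whether farm households rent in land, treated as two decisions. The first is whether to rent in land at all, recorded as rentin; the second is how much land to rent in, recorded as landin. The explanatory variables are age, edu, land, and so on. How do I run this with the heckman command in Stata? In Stata 8.0 I opened Statistics, found Selection models, and chose "Heckman selection model (two-step)". I do not understand what "Selection DV" means, or whether the box in front of it has to be ticked. How do the variables entered under "Selection independent variables" differ from those under "Independent variables"? Should they be exactly the same? Should landin go in "Dependent variable"? Also, what is the difference between "Heckman selection model (ML)" and "Heckman selection model (two-step)"? I have only just started learning Stata, so I would appreciate any guidance from the experts. Thank you.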


蓝色 posted on 2005-9-5 12:57:00
Take a look at the help for heckman. It is very detailed and includes examples.




-------------------------------------------------------------------------------------------
help for heckman                             manual:  [R] heckman
                                             dialogs:  heckman ml  heckman 2-step  predict
-------------------------------------------------------------------------------------------

Heckman selection model

    Basic syntax:

        heckman depvar [varlist], select(varlist_s) [twostep]

      or

        heckman depvar [varlist], select(depvar_s = varlist_s) [twostep]


    Full syntax for maximum-likelihood estimates only:

        heckman depvar [varlist] [weight] [if exp] [in range], select([depvar_s =]
               varlist_s [, offset(varname) noconstant]) [ robust cluster(varname)
               score(newvarlist|stub*) nshazard(newvarname) mills(newvarname)
               offset(varname) noconstant constraints(numlist) first noskip level(#)
                iterate(0) nolog maximize_options ]


    Full syntax for Heckman's two-step consistent estimates only:

        heckman depvar [varlist] [if exp] [in range], twostep select([depvar_s =]
               varlist_s [, noconstant]) [ nshazard(newvarname) mills(newvarname)
               noconstant first level(#) [ rhosigma | rhotrunc | rholimited | rhoforce]
                ]


    by ... : may be used with heckman; see help by.

    pweights, aweights, fweights, and iweights are allowed; see help weights.  No weights
    are allowed if twostep is specified.

    heckman shares the features of all estimation commands; see help estcom.


    The syntax of predict following heckman is

        predict [type] newvarname [if exp] [in range] [, statistic nooffset]

    where statistic is

        xb              fitted values for regression equation; the default
        ycond           E(y | y observed)
        yexpected       E(y*), y taken to be 0 where unobserved
        nshazard        nonselection hazard or inverse Mills' ratio
        mills           nonselection hazard or inverse Mills' ratio
        psel            P(y observed)
        xbsel           linear prediction for selection equation
        stdpsel         standard error of selection linear pred.
        pr(a,b)         Pr(y | a<y<b)
        e(a,b)          E(y | a<y<b)
        ystar(a,b)      E(y*), y* = max(a,min(y,b))
        stdp            standard error of the prediction
        stdf            standard error of the forecast

    where a and b may be numbers or variables; a missing (a >= .) means -infinity; and b
    missing (b >= .) means infinity.

    These statistics are available both in and out of sample; type "predict ... if
    esample(), ..." if wanted only for the estimation sample.


Description

    heckman fits regression models with selection using either Heckman's two-step
    consistent estimator or full maximum-likelihood.

    Heckman estimates all of the parameters in the model:

        (regression equation: y is depvar, x is varlist)
        y = xb + u_1

        (selection equation: Z is varlist_s)
        y observed if Zg + u_2 > 0

        where:
                u_1 ~ N(0, sigma)
                u_2 ~ N(0, 1)
                corr(u_1, u_2) = rho

    In the syntax for heckman, depvar and varlist are the dependent variable and
    regressors for the underlying regression model (y = xb), and varlist_s are the
    variables (Z) thought to determine whether depvar is selected/observed or unobserved.
    By default, heckman will assume that missing values (see help missing) of depvar
    imply that the dependent variable is unobserved (not selected).  With some datasets
    it is more convenient to specify a binary variable (depvar_s) that identifies the
    observations for which the dependent is observed/selected (depvar_s!=0) or not
    observed (depvar_s==0); heckman will accommodate either type of data.
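
    Applied to the land-rental question at the top of this thread, the two ways of
    identifying selection might look like the sketch below.  The names rentin, landin,
    age, edu, and land come from the question; hhsize is a hypothetical extra variable,
    used here only to illustrate a selection regressor that is left out of the
    regression equation.  This is one possible setup, not the only one.

```stata
* Assumptions: rentin is a 0/1 indicator of renting in land, and landin is
* the rented-in area. hhsize is a hypothetical variable that is not part of
* the original question.

* Form 1: selection inferred from missing values of landin
* (only appropriate if landin is missing, not 0, for non-renters)
heckman landin age edu land, select(age edu land hhsize) twostep

* Form 2: selection stated explicitly through the binary variable rentin
* (depvar_s in the syntax diagram)
heckman landin age edu land, select(rentin = age edu land hhsize) twostep

* Dropping twostep gives the maximum-likelihood estimates, the default
heckman landin age edu land, select(rentin = age edu land hhsize)
```

    If landin is coded 0 rather than missing for households that do not rent in land,
    Form 1 would treat those households as observed, so Form 2 is the safer choice.
    The selection variable list may repeat the regressors, but the model is then
    identified only through the nonlinearity of the hazard term, which is why an extra
    selection variable such as the hypothetical hhsize is commonly added.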

    See help svyheckman for a survey version of heckman.


Options

    select(...) specifies the variables and options for the selection equation.  It is an
        integral part of specifying a Heckman model and is not optional.

    twostep specifies that Heckman's (1979) two-step efficient estimates of the
        parameters and covariance matrix (standard errors) of the model are to be
        produced.

    robust specifies that the Huber/White/sandwich estimator of variance is to be used in
        place of the traditional calculation.  robust combined with cluster() further
        allows observations which are not independent within cluster (although they must
        be independent between clusters).  See [U] 23.14 Obtaining robust variance
        estimates.

    cluster(varname) specifies that the observations are independent across groups
        (clusters) but not necessarily independent within groups.  varname specifies to
        which group each observation belongs; e.g., cluster(personid) in data with
        repeated observations on individuals.  cluster() can be used with pweights to
        produce estimates for unstratified cluster-sampled data.  Specifying cluster()
        implies robust.

    score(newvarlist|stub*) creates new variables containing the contributions to the
        scores for each equation and ancillary parameter in the model; see [U] 23.15
        Obtaining scores.

        If score(newvarlist) is specified, four new variables must be provided.  If
        score(stub*) is specified, then variables stub1, stub2, stub3, and stub4 will be
        created.

        The first new variable will contain d(ln L_j)/d(x_j beta)
        The second,   d(ln L_j)/d(z_j gamma)
        The third,    d(ln L_j)/d(atanh(rho))
        The fourth,   d(ln L_j)/d(ln(sigma))

    nshazard(varname) and mills(varname) are synonyms, and either creates a new variable
        containing the nonselection hazard (what is often referred to as the inverse of
        the Mills' ratio) from the selection equation.  With the options twostep or
        iterate(0), the nonselection hazard is derived from a probit regression of
        whether the dependent variable is selected/observed.  Under full
        maximum-likelihood, the nonselection hazard is derived from the parameter
        estimates of the selection equation.
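
    To see where the nonselection hazard comes from, the two steps can be carried out
    by hand, as in the sketch below.  It reuses the variable names from the question
    at the top of the thread plus the hypothetical hhsize; zg and imr are names chosen
    here for illustration.

```stata
* Step 1: probit for whether land is rented in
probit rentin age edu land hhsize

* Linear index Zg from the selection equation
predict double zg, xb

* Nonselection hazard (inverse Mills' ratio): phi(Zg) / Phi(Zg).
* normalden() and normal() are the function names from Stata 9 onward;
* Stata 8 calls them normden() and norm().
generate double imr = normalden(zg) / normal(zg)

* Step 2: OLS on the selected sample with the hazard as an extra regressor
regress landin age edu land imr if rentin == 1
```

    The coefficients should match those from heckman with the twostep option, and the
    coefficient on imr corresponds to rho*sigma.  The standard errors reported by
    regress are not corrected for the estimated hazard, so heckman itself should be
    used for inference.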

    offset(varname) is a rarely used option that specifies a variable to be added
        directly to xb.  This option may be specified on either the regression or
        selection equation.

    noconstant omits the constant term from the equations.  This option may be specified
        on either the regression equation or the selection equation.

    constraints(numlist) specifies the linear constraints to be applied during
        estimation.  Constraints are defined using the constraint command and are
        numbered; see help constraint.  The default is to perform unconstrained
        estimation.  constraints() may not be specified with twostep.

    first specifies that the first-step probit estimates of the selection equation be
        displayed prior to estimation.

    noskip specifies that a full maximum-likelihood model with only a constant for the
        regression equation be fitted.  This model is not displayed but is used as the
        base model to compute a likelihood-ratio test for the model test statistic
        displayed in the estimation header.  By default, the overall model test statistic
        is an asymptotically equivalent Wald test of all the parameters in the regression
        equation being zero (except the constant).  For many models, this option can
        significantly increase estimation time.

    level(#) specifies the confidence level in percent for confidence intervals of the
        coefficients; see help level.

    iterate(0) produces Heckman's (1979) two-step parameter estimates with standard
        errors computed from the inverse Hessian of the full information matrix at the
        two-step solution for the parameters.  As an alternative, the twostep option
        computes Heckman's two-step consistent estimates of the standard errors.
        iterate(#) can also be used to restrict the maximum number of iterations during
        optimization; see help maximize.
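
    One way to see the practical difference between the two-step and maximum-likelihood
    estimators is to fit both and compare them side by side.  The sketch below again
    uses the variable names from the question at the top of the thread, with hhsize as
    a hypothetical selection variable; est_2step and est_ml are arbitrary names.

```stata
* Two-step estimates with Heckman's two-step standard errors
heckman landin age edu land, select(rentin = age edu land hhsize) twostep
estimates store est_2step

* Full maximum likelihood (the default)
heckman landin age edu land, select(rentin = age edu land hhsize)
estimates store est_ml

* Coefficients and standard errors side by side
estimates table est_2step est_ml, se
```

    The two sets of results report the selection effect differently: the two-step
    output shows it as a coefficient on the hazard term, while the ML output reports
    rho and sigma through ancillary parameters, so those rows will not line up in the
    table.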

    rhosigma, rhotrunc, rholimited, and rhoforce are rarely used options to specify how
        the two-step estimator, option twostep, handles unusual cases where the two-step
        estimate of rho is outside the admissible range for a correlation, [-1,1].  When
        rho is outside this range it is possible for the two-step estimate of the
        coefficient variance-covariance matrix to not be positive definite and thus
        unusable for testing.  The default is rhosigma.

        rhotrunc specifies that rho be truncated to lie in the range [-1,1].  If the
        two-step estimate is below -1, rho is set to -1; if the two-step estimate is
        above 1, rho is set to 1.  This truncated value of rho is used in all
        computations to estimate the two-step covariance matrix.

        rhosigma specifies that rho be truncated, as with option rhotrunc, and that the
        estimate of sigma be made consistent with rho_hat, the truncated estimate of rho.
        So, sigma_hat = B_m * rho_hat; see the Methods and Formulas section of [R]
        heckman for the definition of B_m.  Both the truncated rho and the new estimate
        of sigma_hat are used in all computations to estimate the two-step covariance
        matrix.

        rholimited specifies that rho be truncated only in the computation of the
        diagonal matrix D as it enters V_twostep and Q; see [R] heckman Methods and
        Formulas.  In all other computations, the untruncated estimate of rho is used.

        rhoforce specifies that the two-step estimate of rho be retained even if it is
        outside the admissible range for a correlation.  This may, in rare cases, lead to
        a nonpositive-definite covariance matrix.

        These options have no effect when estimation is by maximum likelihood, the
        default.  They also have no effect when the two-step estimate of rho is in the
        range [-1,1].

    nolog suppresses the iteration log.

    maximize_options control the maximization process; see help maximize.  You will
        likely never need to specify any of the maximize options except for iterate(0)
        and possibly difficult.  If the iteration log shows many "not concave" messages
        and it is taking many iterations to converge, you may want to try using the
        difficult option and see if that helps it to converge in fewer steps.


Options for predict

    xb, the default, calculates the linear predictions from the underlying regression
        equation.

    ycond calculates the expected value of the dependent variable conditional on the
        dependent variable being observed/selected; E(y | y observed).

    yexpected calculates the expected value of the dependent variable (y*), where that
        value is taken to be 0 when it is expected to be unobserved; y* = P(y observed) *
        E(y | y observed).

        The assumption of 0 is valid for many cases where nonselection implies
        non-participation (e.g., unobserved wage levels, insurance claims from those who
        are uninsured, etc.) but may be inappropriate for some problems (e.g., unobserved
        disease incidence).
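
    In the notation of the Description section, the quantities behind psel, ycond, and
    yexpected can be written out as follows.  This is a sketch that assumes sigma
    denotes the standard deviation of u_1 and writes lambda for the nonselection
    hazard; phi and Phi are the standard normal density and distribution function.

```latex
\begin{aligned}
\lambda(Zg) &= \frac{\phi(Zg)}{\Phi(Zg)} \\
\text{psel:}\quad P(y \text{ observed}) &= \Phi(Zg) \\
\text{ycond:}\quad E(y \mid y \text{ observed}) &= xb + \rho\,\sigma\,\lambda(Zg) \\
\text{yexpected:}\quad E(y^{*}) &= \Phi(Zg)\,\bigl[\,xb + \rho\,\sigma\,\lambda(Zg)\,\bigr]
\end{aligned}
```

    The last line is the product P(y observed) * E(y | y observed) given above, with
    the unobserved cases contributing 0.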

    nshazard and mills are synonyms, either calculates the nonselection hazard -- what is
        often referred to as the inverse of the Mills' ratio.

    psel calculates the probability of selection (or being observed):  P(y observed) =
        Pr(z_j*g + u_2j > 0).

    xbsel calculates the linear prediction for the selection equation.

    stdpsel calculates the standard error of the linear prediction for the selection
        equation.

    pr(a,b) calculates the Pr(a < x*b+u_1 < b), the probability that y|x would be
        observed in the interval (a,b).

        a and b may be specified as numbers or variable names;
        pr(20,30) calculates Pr(20 < x*b+u_1 < 30);
        pr(lb,ub) calculates Pr(lb < x*b+u_1 < ub); and
        pr(20,ub) calculates Pr(20 < x*b+u_1 < ub).

        a missing (a >= .) means minus infinity; pr(.,30) calculates Pr(x*b+u_1 < 30) and
        pr(lb,30) calculates Pr(x*b+u_1 < 30) in observations for which lb >= . (and
        calculates Pr(lb < x*b+u_1 < 30) elsewhere).

        b missing (b >= .) means plus infinity; pr(20,.) calculates Pr(x*b+u_1 > 20) and
        pr(20,ub) calculates Pr(x*b+u_1 > 20) in observations for which ub >= . (and
        calculates Pr(20 < x*b+u_1 < ub) elsewhere).

    e(a,b) calculates E(x*b+u_1 | a < x*b+u_1 < b), the expected value of y|x conditional
        on y|x being in the interval (a,b), which is to say, y|x is truncated.  a and b
        are specified as they are for pr().

    ystar(a,b) calculates E(y*), where y* = a if x*b+u_1 < a, y* = b if x*b+u_1 > b, and
        y* = x*b+u_1 otherwise, which is to say, y* is censored.  a and b are specified as
        they are for pr().

    stdp calculates the standard error of the prediction from the underlying regression
        equation.

    stdf calculates the standard error of the forecast of the underlying regression
        equation.  This is often informally referred to as the standard error of the
        prediction.  By construction, the standard errors produced by stdf are always
        larger than those by stdp; see [R] regress.

    nooffset is relevant only if you specified offset() for heckman.  It modifies the
        calculations made by predict so that they ignore the offset variable; the linear
        prediction is treated as xb rather than xb + offset.


Examples

    To obtain full ML estimates:

        . heckman wage educ age, select(married children educ age)

    To obtain Heckman's two-step consistent estimates:

        . heckman wage educ age, select(married children educ age) twostep

    To define and use each equation separately:

        . global wage_eqn wage educ age
        . global seleqn married children age
        . heckman $wage_eqn, select($seleqn)

    To use a variable to identify selection:

        . heckman wage educ age, select(wageseen = married children educ age)

    To use options:

        . heckman wage educ age [pw=wgt], select(married children educ age)
        . heckman wage educ age, select(married children educ age) robust
        . heckman $wage_eqn, select($seleqn) cluster(county)
        . heckman $wage_eqn, select($seleqn) score(scr1 scr2 scr3 scr4)

        . heckman wage educ age, select(married children educ age) first

        . heckman $wage_eqn, select($seleqn) mills(mymills)

        . heckman wage educ age, noconstant select(married children educ age)
        . heckman wage educ age, select(married children educ age, noconstant)

    Prediction:

        . heckman wage educ age, select(married children educ age)

        . predict yhat
        . predict yhat, xb

        . predict mystdp, stdp
        . predict mystdf, stdf

        . predict ycond, ycond
        . predict ystar, yexpected
        . predict probseen, psel
        . predict selindex, xbsel

        . predict mymill, mills

        . predict p0to20, pr(0,20)
        . predict less15, pr(.,15)
        . predict ey0to20, e(0,20)
        . predict ys0to20, ystar(0,20)


Also see

    Manual:  [U] 23 Estimation and post-estimation commands,
             [U] 29 Overview of Stata estimation commands,
             [R] heckman

    Online:  help for constraint, estcom, postest, svyheckman; heckprob, regress,
             svyheckprob, tobit, treatreg

hanszhu posted on 2005-9-6 08:23:00

Hello,

I am trying to use the Heckman model. I found that many people mention that PROC QLIM can do it in SAS 9.1, but I checked the documentation for PROC QLIM and could not find any detailed information about Heckman model analysis. Can anyone give me more detailed information about using PROC QLIM for Heckman?

Thank you all for your help!

hanszhu posted on 2005-9-6 08:24:00

I don't know about PROC QLIM, but a colleague has some IML code that does it. I think she found it on the web; if no one helps with QLIM, and you'd like the IML code, let me know and I will try to find it. Tangentially, the Heckman model has gotten very mixed reviews: economists seem to like it, others seem less impressed.

Peter L. Flom, PhD Assistant Director, Statistics and Data Analysis Core Center for Drug Use and HIV Research National Development and Research Institutes 71 W. 23rd St www.peterflom.com New York, NY 10010 (212) 845-4485 (voice) (917) 438-0894 (fax)

hanszhu posted on 2005-9-6 08:25:00

If you go to

http://ftp.sas.com/techsup/download/stat/

You'll find a 'heckman' program. But it's not PROC QLIM. It uses Greene's correction to Heckman to get the adjusted standard errors. It uses PROC PROBIT, PROC REG, and *also* IML.

Peter mentioned that non-econ people tend not to be as thrilled with the Heckman model. So let me add another point. Heckman was *wrong*. He claimed that the OLS estimates would be smaller than the real standard errors. Greene (1981) showed that the OLS estimates could be larger or smaller or even the same size as the true errors.

What I find worrisome is the whole concept that you can really correct for an unknowable bias when using non-randomly selected samples. I simply don't believe this. You can make (possibly unwarranted) assumptions to model the bias, as Heckman did. But you can't evaluate those assumptions.

I also found this URL: http://www.stat.purdue.edu/~ywang/Introduction%20to%20Heckman%20Model.ppt In this is code which only uses PROC LOGISTIC, a data step, and PROC REG. In the PROC QLIM examples, there is an entry for Sample Selection Models. As you may guess by reading my whiny diatribe above, this shows how to fit a Heckman-like model using PROC QLIM. If you look under the "Details" section, the "Selection Models" part shows the *exact* model to use to get the classical Heckman model.

HTH, David -- David L. Cassell mathematical statistician Design Pathways 3115 NW Norwood Pl. Corvallis OR 97330

hanszhu posted on 2005-9-6 08:27:00

Hi,

You've already gotten some thought-provoking feedback about the Heckman two-step (and if I ever go to a barn dance, I'm going to shout out a request to dance the Heckman Two-Step) from two of the gurus. I have my own complaints about it (it seems that whenever the Mills Ratio variable is significant, it has a positive coefficient, although sometimes the higher propensity subjects have a smaller effect size).

If you do want to continue in your quest to dance Heckman with SAS, you have at least two alternatives at your disposal:

1) David Jaeger's macro (http://support.sas.com/ctx/samples/index.jsp? sid=476&re="s/y/PROCS/reg/reg/s/y/PROCS/reg/reg") This requires SAS/STAT and SAS/IML (and Base SAS, natch) to run the whole thing, but both the modeling steps are done with STAT PROCs (PROBIT and REG), with a data step in between to extract the Mills Ratios; the IML portion at the end is just to compute corrected standard errors.

2) PROC QLIM in SAS 9 (http://support.sas.com/onlinedoc/913/docMainpage.jsp). They consider the Heckman models part of the general class of selection models, which is true. So the details section of the PROC QLIM documentation on Selection Models has a bit of the Heckman theory, and there is an example of a Heckman-like model, "Example 22.4: Sample Selection Model" that you can look at. The code is pretty simple:

    proc qlim data=mroz;
       model inlf = nwifeinc educ exper expersq age kidslt6 kidsge6 / discrete;
       model lwage = educ exper expersq / select(inlf=1);
    run;

It's very straightforward to see what it's doing; the first model step is the binary discrete part, the second models the effect when the first dependent variable is 1. So give it a spin (and report back to us if you like, oh intrepid pioneer)!

hanszhu posted on 2006-4-13 11:58:00
1

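wavon posted on 2007-6-21 14:20:00
I have the same question as the original poster. The help file does explain how to use the Heckman two-stage procedure, but I still do not really understand it, and I have run into the same problem as the original poster. Could someone give a more detailed answer? Please answer in terms of Stata, not SAS. Many thanks.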

qiangli posted on 2007-6-21 14:24:00
The Stata manual is available on the forum and covers this in detail; you can download it and take a look.
