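Step 1: this procedure relies on the cmp command, so start by running help cmp and reading its description.
As for version requirements, it is best to use Stata 13 or later: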
Versions 8.0.0 and 8.2.0 of cmp, released in mid-2017 and early 2018,
include changes that can somewhat affect results in hierarchical models.
An older version, 7.1.0, is available as a Github archive, and can be
directly installed, in Stata 13 or later, via "net from
https://raw.github.com/droodman/cmp/v7.1.0".
Version 8.6.2, released in June 2021, requires Stata 13 or later. The
previous version works in Stata 11 and 12 too.
Step 2: install the commands. cmp depends on ghk2, so install both: ssc install cmp, replace and ssc install ghk2, replace
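The installation step can be sketched as the short sequence below. It assumes an internet connection and that both packages are fetched from SSC; the `which` line is only a convenience check that is not part of the original post.

```stata
* Install the ghk2 dependency and cmp itself from SSC
ssc install ghk2, replace
ssc install cmp, replace

* Show where cmp is installed and which version it reports
which cmp

* Define the $cmp_* global macros used later in indicators()
cmp setup
```

Running cmp setup once per Stata session is enough, because the macros it defines are global.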
Step 3: get a rough idea of how the equations in a cmp command line are specified. From the help file:
To inform cmp about the natures of the dependent variables and about which
equations apply to which observations, the user must include the
indicators() option after the comma in the cmp command line. This must
contain one expression for each equation. The expression can be a
constant, a variable name, or a formula. Formulas that contain spaces or
parentheses should be enclosed in quotes. For each observation, each
expression must evaluate to one of the following codes, with the meanings
shown:
0 = observation is not in this equation's sample
. = observation is in this equation's sample but dependent variable
unobserved for this observation
1 = equation is "continuous" for this observation, i.e., is linear
with Gaussian error or is an uncensored observation in a tobit
equation
2 = observation is left-censored for this (tobit) equation at the
value stored in the dependent variable
3 = observation is right-censored at the value stored in the dependent
variable
4 = equation is probit for this observation
5 = equation is ordered probit for this observation
6 = equation is multinomial probit for this observation
7 = equation is interval-censored for this observation
8 = equation is truncated on the left and/or right (obsolete because
truncation is now a general modeling feature)
9 = equation is rank-ordered probit for this observation
10 = equation is fractional probit for this observation
For clarity, users can execute the cmp setup subcommand, which defines
global macros that can then be used in cmp command lines:
$cmp_out = 0
$cmp_missing = .
$cmp_cont = 1
$cmp_left = 2
$cmp_right = 3
$cmp_probit = 4
$cmp_oprobit = 5
$cmp_mprobit = 6
$cmp_int = 7
$cmp_trunc = 8 (deprecated)
$cmp_roprobit = 9
$cmp_frac = 10
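As a small illustration of how these codes are used, the sketch below passes a formula to indicators() so that each observation is marked either continuous or left-censored. It uses Stata's bundled auto dataset; the censoring point of 17 and the rescaled variable wgt are assumptions chosen for illustration, not something taken from this post.

```stata
* Define the $cmp_* macros and confirm one of them
cmp setup
display "$cmp_oprobit"

* Tobit example: mpg treated as left-censored at 17
sysuse auto, clear
generate wgt = weight/1000
* cmp expects the censoring value to be stored in the dependent variable
replace mpg = 17 if mpg < 17

* Built-in tobit for comparison
tobit mpg wgt, ll(17)

* Same model in cmp: the indicator is a formula, so it is quoted
cmp (mpg = wgt), ind("cond(mpg>17, $cmp_cont, $cmp_left)") qui
```

The formula is quoted because it contains spaces and parentheses, as the help text above requires.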
Step 4: worked examples
. cmp setup
. webuse laborsup
Start with ordinary oprobit and mprobit models, each followed by its cmp equivalent:
. oprobit kids fem_inc male_educ
. margins, dydx(*) predict(outcome(#2))
. cmp (kids = fem_inc male_educ), ind($cmp_oprobit) qui
. margins, dydx(*) predict(eq(#1) outcome(#2) pr)
. webuse sysdsn3
. mprobit insure age male nonwhite site2 site3
. margins, dydx(nonwhite) predict(outcome(2))
. cmp (insure = age male nonwhite site2 site3, iia), nolr ind($cmp_mprobit) qui
. margins, dydx(nonwhite) predict(eq(#2) pr)
The help file has no direct IV-oprobit example, so borrow from its ivprobit example first:
. webuse laborsup
. ivprobit fem_work fem_educ kids (other_inc = male_educ), first
. version 13: margins, predict(pr) dydx(*)
. cmp (fem_work = other_inc fem_educ kids) (other_inc = fem_educ kids male_educ), ind($cmp_probit $cmp_cont)
. margins, predict(pr) dydx(*) force
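From this we can roughly infer the syntax.
Take y1 as an example: in the questionnaire it is a five-category variable coded 1-5, which has also been recoded into a binary variable y2 coded 1 and 0. The regressors are x1, x2, x3 and x4, where x1 is the explanatory variable of interest, mv is its instrument, and x2, x3, x4 are controls.
The ivprobit model is then: ivprobit y2 x2 x3 x4 (x1 = mv)
For IV-oprobit, this becomes:
cmp (y1 = x1 x2 x3 x4) (x1 = x2 x3 x4 mv), ind($cmp_oprobit $cmp_cont) technique(dfp) nolrtest
cmp is followed by two equations. The first, y1 = x1 x2 x3 x4, contains all the variables of the ordinary oprobit regression. The second, x1 = x2 x3 x4 mv, regresses the main explanatory variable x1 on its instrument plus every other regressor except x1 itself (that is, the controls).
The ind($cmp_oprobit $cmp_cont) part is fixed syntax; its two codes carry the meanings listed in Step 3 (ordered probit for the first equation, continuous for the second). I recommend copying it directly from the help file. When I typed it myself I kept getting errors, possibly because of the parentheses, and it only ran after I corrected it. For example: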
The indicators() option must contain 2 variables, one for each equation. Did you forget to type cmp setup?
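technique(dfp) nolrtest hardly appears in the help file; I saw it in another question and have added it for now without fully understanding it. As far as I can tell, technique(dfp) selects the Davidon-Fletcher-Powell maximization algorithm, and nolrtest skips the likelihood-ratio test of the model against a constants-only model.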
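Put together as a do-file, the IV-oprobit call might look like the sketch below. The variable names y1, x1 to x4 and mv are the hypothetical ones from the example above, and the assert line is an added safeguard (not part of the original workflow) against the "Did you forget to type cmp setup?" error, which appears when the $cmp_* macros are empty or mangled.

```stata
* Define the indicator macros for this session
cmp setup

* Stop early if the macros do not hold the expected codes
assert "$cmp_oprobit" == "5" & "$cmp_cont" == "1"

* Equation 1: ordered probit of y1 on the endogenous x1 and the controls
* Equation 2: linear first stage of x1 on the controls and the instrument mv
cmp (y1 = x1 x2 x3 x4) (x1 = x2 x3 x4 mv), ///
    indicators($cmp_oprobit $cmp_cont) technique(dfp) nolrtest
```

The `///` line continuation only works inside a do-file; when typing interactively, put the whole command on one line and make sure no space follows each `$`.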
Then wait for the results, which come in two parts.
Part one:
Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
For exact fits of each equation alone, run cmp separately on each.
Iteration 0: log likelihood = -14235.01
Iteration 1: log likelihood = -13216.218
Iteration 2: log likelihood = -13210.845
Iteration 3: log likelihood = -13210.842
Iteration 4: log likelihood = -13210.842
Ordered probit regression (the first model's results are printed here)
Then the following appears:
Warning: regressor matrix for _cmp_y1 equation appears ill-conditioned. (Condition number = 4140.8843.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly
collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.
Then comes another model, followed by:
Warning: regressor matrix for erzi equation appears ill-conditioned. (Condition number = 5012.1327.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly
collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.
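The two remedies named in the warning can be tried roughly as follows. This is only a sketch with the hypothetical variable names used earlier; which regressors are nearly collinear in the actual data, and whether relaxing the convergence criterion is appropriate, are not known from this post.

```stata
* Look for nearly collinear regressors before refitting
correlate x1 x2 x3 x4 mv

* If the fit fails to converge, the warning suggests relaxing the
* convergence criterion with nonrtolerance (or nrtolerance(#))
cmp (y1 = x1 x2 x3 x4) (x1 = x2 x3 x4 mv), ///
    indicators($cmp_oprobit $cmp_cont) nolrtest nonrtolerance
```

If the model already converges, as it eventually did here, the warning can be read as informational.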
Fitting full model.
Iteration 0: log likelihood = -18709.494
What follows is a long run of more than 260 iterations before it finally finishes:
Iteration 261: log likelihood = -18698.597
Iteration 262: log likelihood = -18698.593
Iteration 263: log likelihood = -18698.593
Iteration 264: log likelihood = -18698.593
Mixed-process regression
The model results finally appear. The last step is to compute the marginal effects:
margins, dydx(*)
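For an ordered outcome it is usually more informative to request the marginal effect on the probability of each category, following the predict(eq(#1) outcome(#2) pr) pattern of the oprobit example in Step 4. The loop below is a sketch under the assumption that y1 has five categories and that x1 is the regressor of interest; the force option is borrowed from the ivprobit example and may not be needed.

```stata
* Marginal effect of x1 on the probability of each of the 5 outcomes
* of the first (ordered probit) equation
forvalues k = 1/5 {
    margins, dydx(x1) predict(eq(#1) outcome(#`k') pr) force
}
```

Because the category probabilities sum to one, the five marginal effects for a given observation should sum to approximately zero, which is a quick consistency check.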
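The forum's automatic formatting caused a number of problems in this post, notably with the $ symbol, which broke $cmp_oprobit $cmp_cont, so I am attaching a Word file for reference.
That is my own experience with the procedure. I am honestly still a bit confused and not sure everything is correct, so corrections and criticism are welcome.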
IV-Oprobit.doc
(48.5 KB)









