synth_runner installation package and study materials: sharing and discussion - Stata board


Original poster

Lee_iris

Posted 2020-4-29 19:16:57


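I recently started learning synth_runner to analyze the effect of a policy that different regions adopted in stages, but study materials are scarce. I found only two WeChat articles:
https://mp.weixin.qq.com/s/DsO2Zkm_cggaE_s40Qpbvg
https://mp.weixin.qq.com/s/F_z6J2XGasSTW02L-2GyxA
The two are nearly identical, and most of their content comes from the English paper attached below.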


Keywords: synth_runner, synthetic control method, scm, synth, study materials

The Synth_Runner Package Utilities to Automate Synthetic ControlEstimation Usin.pdf
Download link: https://bbs.pinggu.org/a-3145337.html

336.88 KB

Introduction to synth_runner

Reply 1

Lee_iris

Posted 2020-4-29 20:03:18

Installing the software:
Run search synth_runner, then click the link
st0500 from http://www.stata-journal.com/software/sj17-4
and then click "click here to install".
The installed files are:
INSTALLATION FILES                               (click here to install)
   st0500/calc_RMSPE.ado
   st0500/effect_graphs.ado
   st0500/pval_graphs.ado
   st0500/single_treatment_graphs.ado
   st0500/_sr_add_keepfile_to_agg.ado
   st0500/_sr_do_work_do.ado
   st0500/_sr_do_work_tr.ado
   st0500/_sr_gen_time_locals.ado
   st0500/_sr_get_returns.ado
   st0500/_sr_print_dots.ado
   st0500/synth_runner.ado
   st0500/synth_wrapper.ado
   st0500/effect_graphs.sthlp
   st0500/pval_graphs.sthlp
   st0500/single_treatment_graphs.sthlp
   st0500/synth_runner.sthlp
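If you copy the files by hand instead, each one goes into the subfolder of the plus folder (under the ado folder) named after the file's first character: c, e, p, s, or _.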

See also this thread:
A question about placebo tests for the synthetic control method
https://bbs.pinggu.org/forum.php?mod=viewthread&tid=7866750&from^^uid=11374248

synthrunner.zip

28.65 KB

This attachment contains:

_sr_add_keepfile_to_agg.ado
_sr_do_work_do.ado
_sr_do_work_tr.ado
_sr_gen_time_locals.ado
_sr_get_returns.ado
_sr_print_dots.ado
calc_rmspe.ado
effect_graphs.ado
effect_graphs.sthlp
pval_graphs.ado
pval_graphs.sthlp
single_treatment_graphs.ado
single_treatment_graphs.sthlp
synth_runner.ado
synth_runner.sthlp
synth_wrapper.ado

Reply 2

Lee_iris

Posted 2020-4-30 12:42:46

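I. The p-value in synth

synth_runner can compute p-values for a synthetic control estimate. The attached paper gives the reasoning and the formula. The formula in the figure is the synth p-value, which applies to a single treatment period and a single treated unit: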
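The formula image is not available in this text, so here is a reconstruction in LaTeX based on the description in this post. The absolute values (a two-sided comparison) and the "at least as large" inequality are assumptions about how the paper writes it; please check them against the attached PDF.

```latex
p\text{-value}
  = \Pr\left( \left|\hat{\alpha}^{PL}_{1t}\right| \ge \left|\hat{\alpha}_{1t}\right| \right)
  = \frac{\sum_{j \ne 1} \mathbf{1}\left( \left|\hat{\alpha}_{jt}\right| \ge \left|\hat{\alpha}_{1t}\right| \right)}{J}
```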

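In the formula, J is the number of untreated units and α is the treatment effect, i.e. the gap between the actual value and the synthetic value. The superscript PL on α-hat stands for placebo: each of the J untreated units is handled in turn as if it had been treated in period t, and the gap between it and its own synthetic control is its placebo effect. These placebo effects are what the effect on the treated unit (indexed 1) is compared against. As I understand it, the p-value is the number of control units whose placebo effect is at least as large as the treated unit's effect, divided by the number of control units.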

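The 1(·) to the right of the summation sign is an indicator function: it equals 1 when the condition in parentheses is true and 0 otherwise, so the sum is a count. I did not find this spelled out in the paper, but it is the standard notation. Note also that the denominator is J rather than the J+1 now common in Chinese-language papers. A footnote addresses this: several of the related authors use J, and the explanation given is “With multiple treatments, there would be several approaches to adding the effects on the treated to the comparison distribution, so they are not dealt with here.” In the worked example, the authors note that if treatment were truly randomly assigned, the correct p-value would add 1 to both the numerator and the denominator.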
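To make the counting concrete, here is a minimal Python sketch of the calculation described above. It is not the package's code: the function name, the made-up effect values, and the use of absolute values (a two-sided comparison) are my own assumptions.

```python
def placebo_p_value(treated_effect, placebo_effects, add_one=False):
    """Share of placebo effects at least as large in magnitude as the treated effect."""
    hits = sum(1 for a in placebo_effects if abs(a) >= abs(treated_effect))
    n = len(placebo_effects)  # J, the number of untreated units
    if add_one:
        # Variant for truly random treatment: add 1 to numerator and denominator.
        return (hits + 1) / (n + 1)
    return hits / n


# Made-up effects for J = 10 untreated units and one treated unit.
placebo_effects = [0.5, -1.2, 3.1, 0.8, -2.6, 0.1, 1.9, -0.4, 2.2, 0.3]
treated_effect = -2.5

p_j = placebo_p_value(treated_effect, placebo_effects)
p_j_plus_1 = placebo_p_value(treated_effect, placebo_effects, add_one=True)
print(p_j, round(p_j_plus_1, 3))
```

Two placebo effects (3.1 and -2.6) are at least as large in magnitude as the treated effect, so the J version gives 2/10 and the J+1 version gives 3/11.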

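II. synth_runner: p-values with multiple events

Besides the synth p-value, synth_runner can handle multiple events: the shock may hit different units g∈{1,...,G} at different times. As in DID, one defines an indicator d that records whether a given unit is treated in a given period. A p-value can be computed in this setting as well:

[Figure: 多期p值.JPG (p-value with multiple events)]

In the figure above, α-bar is the mean of all treatment effects (see the formula in the next figure). The numerator should be the number of placebo averages that are larger than the average effect on the treated, and the denominator is the number of all possible placebo averages.
The denominator is a product. What is multiplied is, for each treated unit g, the number J_g of untreated units matched to it. My own understanding, which may not be correct, is that this number depends on match quality, i.e. that it counts the donors that carry weight in the synthetic control. The product arises because one unit is picked from each set J_g, which gives that many possible combinations in total. One unit is picked from each set because the numerator works with averages: every placebo average takes one representative from each J_g. The p-value is then the share of all possible combinations in which the placebo average exceeds the average effect on the treated, which gives the synth p-value for multiple events.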
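Because the two formula images are not available here, the following LaTeX is a reconstruction from the description above. The symbol N_{PL}, the index i over combinations, and the absolute values are my own notation and assumptions, and time subscripts are omitted; please check against the attached PDF.

```latex
\bar{\alpha} = \frac{1}{G} \sum_{g=1}^{G} \hat{\alpha}_{g},
\qquad
N_{PL} = \prod_{g=1}^{G} J_{g},
\qquad
p\text{-value} = \frac{\sum_{i=1}^{N_{PL}} \mathbf{1}\left( \left|\bar{\alpha}^{PL(i)}\right| \ge \left|\bar{\alpha}\right| \right)}{N_{PL}}
```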

[Figure: 累乘.JPG (the product in the denominator)]
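The "pick one placebo from each set, then average" logic can be sketched in Python as below. This only illustrates the counting argument; the function name and the numbers are made up, and the two-sided comparison is an assumption.

```python
from itertools import product
from statistics import mean


def multi_event_p_value(treated_effects, placebo_sets):
    """Compare the average treated effect with every possible placebo average."""
    avg_treated = mean(treated_effects)
    # One placebo effect is taken from each event's set; each combination gives one average.
    placebo_avgs = [mean(combo) for combo in product(*placebo_sets)]
    hits = sum(1 for a in placebo_avgs if abs(a) >= abs(avg_treated))
    return hits / len(placebo_avgs), len(placebo_avgs)


treated_effects = [-2.0, -3.0]                  # G = 2 treated events
placebo_sets = [[0.5, -1.0, 3.0], [-4.0, 1.0]]  # J_1 = 3, J_2 = 2
p, n_avgs = multi_event_p_value(treated_effects, placebo_sets)
print(n_avgs, p)
```

With J_1 = 3 and J_2 = 2 there are 3 × 2 = 6 possible placebo averages, and only one of them (-2.5) is as large in magnitude as the treated average, so the p-value is 1/6.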

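III. Validity checks in synth_runner

The checks look no different from those for synth. The paper cites two methods from Cavallo et al. (2013):
Cavallo et al. (2013) perform two basic checks to see whether the synthetic control serves as a valid counterfactual.
The first is to check whether a weighted average of donors can approximate the treated unit in the pretreatment. This should be satisfied if the treated unit lies within the convex hull of the control units. One can visually compare the difference in pretreatment outcomes between a unit and its synthetic control. Additionally, one could look at the distribution of pretreatment RMSPEs and see what proportion of control units have values at least as high as that of the treated unit. Cavallo et al. (2013) discard several events from their study because they cannot be matched appropriately.
The first check is whether a weighted average of the control units can fit the treated unit. This should hold if the treated unit lies inside the convex hull of the control units, and it can be judged visually from a plot. In addition, one can look at the distribution of pretreatment RMSPEs and see what share of control units have an RMSPE at least as large as the treated unit's.
Note that Cavallo et al. dropped several events from their study because they could not be matched well. So if our own research starts from an event and the data simply cannot be fit after repeated attempts, there is no point in forcing it; better to switch to another approach. The questions this raises about the reliability and applicability of SCM are another story.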
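The RMSPE comparison in the first check can be sketched in a few lines of Python. The data and function names are made up for illustration; in practice the RMSPEs would come from the synth output.

```python
from math import sqrt


def rmspe(actual, synthetic):
    """Root mean squared prediction error between a unit and its synthetic control."""
    return sqrt(sum((a - s) ** 2 for a, s in zip(actual, synthetic)) / len(actual))


def share_at_least_as_high(treated_rmspe, control_rmspes):
    """Proportion of control units whose pretreatment RMSPE is >= the treated unit's."""
    return sum(1 for r in control_rmspes if r >= treated_rmspe) / len(control_rmspes)


# Pretreatment outcomes of the treated unit and of its synthetic control (made up).
treated_pre_rmspe = rmspe([10.0, 12.0, 14.0, 16.0], [11.0, 11.0, 15.0, 15.0])
# Pretreatment RMSPEs of four control units from their own placebo runs (made up).
control_pre_rmspes = [0.4, 1.0, 2.5, 0.9]
share = share_at_least_as_high(treated_pre_rmspe, control_pre_rmspes)
print(treated_pre_rmspe, share)
```

A small share means the treated unit is fit worse than most controls, which is the situation in which Cavallo et al. would consider dropping the event.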
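Secondly, one can exclude some pretreatment outcomes from the list of predictors and see whether the synthetic control matches well the treated unit in these periods. Because this is still pretreatment, the synthetic control should match well. The initial section of the pretreatment period is often designated the “training” period, with the latter part being the “validation” period. Cavallo et al. (2013) set aside the first half of the pretreatment period as the training period.

The second check I did not fully understand at first; I initially read it as adjusting the predictors to improve the SCM fit. What it actually describes is holding some pretreatment outcomes out of the predictor list and then checking whether the synthetic control still tracks the treated unit in those held-out periods. Since those periods are still pretreatment, the fit there should be good. The early part of the pretreatment period is called the “training” period and the later part the “validation” period. Cavallo et al. (2013) use the first half of the pretreatment period for training.

(The last image was displaced from earlier in the post. I cannot delete it, so please ignore it.)
[Figure: p值计算.JPG (p-value calculation)]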

Reply 3

Lee_iris

Posted 2020-4-30 15:15:28

IV. The synth_runner package

(1) Syntax

synth_runner depvar predictorvars, {trunit(#) trperiod(#) | d(varname)} [trends pre_limit_mult(real) training_propr(real) gen_vars noenforce_const_pre_length ci max_lead(#) n_pl_avgs(string) pred_prog(string) deterministicoutput parallel pvals1s drop_units_prog(string) xperiod_prog(string) mspeperiod_prog(string) synthsettings]
Required settings:
• depvar specifies the outcome variable.
• predictorvars specifies the list of predictor variables. See help synth for more details.

The required inputs are depvar (the outcome variable) and predictorvars (a list of predictor variables).

(2) Options

1. trunit(#) and trperiod(#) or d(varname):
There are two methods for specifying the unit and time period of treatment: either trunit() and trperiod() or d(). Exactly one of these is required.
trunit(#) and trperiod(#), used by synth, can be used when there is a single unit entering treatment. Because synthetic control methods split time into pretreatment and treated periods, trperiod() is the first of the treated periods and, slightly confusingly, also called posttreatment.
d(varname) specifies a binary variable, which is 1 for treated units in treated periods and 0 everywhere else. This allows for multiple units to undergo treatment, possibly at different times.

There are two ways to say who is treated and when. One is trunit(#) and trperiod(#), the same as in synth, for the case of a single treated unit. The other is DID-like: d(varname) names a binary variable equal to 1 for treated units in treated periods and 0 otherwise, which accommodates units that are shocked in different periods.
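As an illustration of what the d() variable looks like with staggered adoption, here is a small Python sketch that builds the indicator for a toy panel. The unit names, years, and function name are hypothetical; in Stata one would generate the same 0/1 variable in the dataset.

```python
def build_d(units, periods, first_treated):
    """Return {(unit, period): 0 or 1}, with 1 from a unit's first treated period onward."""
    d = {}
    for u in units:
        start = first_treated.get(u)  # None means the unit is never treated
        for t in periods:
            d[(u, t)] = int(start is not None and t >= start)
    return d


units = ["A", "B", "C"]
periods = range(2000, 2006)
first_treated = {"A": 2003, "B": 2005}  # hypothetical staggered adoption; C is a pure control
d = build_d(units, periods, first_treated)

for u in units:
    print(u, [d[(u, t)] for t in periods])
```

Unit A switches to 1 in 2003, unit B in 2005, and unit C stays at 0 throughout, which is exactly the pattern trunit()/trperiod() cannot express.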

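2. trends:
trends will force synth to match on the trends in the outcome variable. It does this by scaling each unit’s outcome variable so that it is 1 in the last pretreatment period.

Google Translate renders this word for word, and at first I understood every word without grasping the point. The idea is that each unit's outcome is rescaled by its own value in the last pretreatment period, so every series equals 1 just before treatment. synth then matches on the shape of each unit's path relative to that point rather than on its level.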
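A tiny Python sketch of the scaling may help. It assumes "scaling" means dividing each unit's series by its value in the last pretreatment period; the data and function name are made up.

```python
def scale_to_last_pre(outcomes, periods, trperiod):
    """Divide a unit's outcomes by its value in the last period before trperiod."""
    last_pre = max(t for t in periods if t < trperiod)
    base = outcomes[periods.index(last_pre)]
    return [y / base for y in outcomes]


periods = [1986, 1987, 1988, 1989, 1990]
big_unit = [200.0, 220.0, 250.0, 240.0, 230.0]
small_unit = [20.0, 22.0, 25.0, 24.0, 23.0]  # same path at one tenth of the level

scaled_big = scale_to_last_pre(big_unit, periods, trperiod=1989)
scaled_small = scale_to_last_pre(small_unit, periods, trperiod=1989)
print(scaled_big)
print(scaled_small)
```

The two units differ by a factor of ten in levels but have identical scaled series, so after scaling one can serve as a good match for the other.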

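3. pre_limit_mult(real):
pre_limit_mult(real) will not include placebo effects in the pool for inference if the match quality of that control, namely, the pretreatment RMSPE, is greater than pre_limit_mult() times the match quality of the treated unit. real must be greater than or equal to 1.

pre_limit_mult(real) is in effect the robustness check familiar from synth applications: it drops control units whose pretreatment RMSPE is more than a given multiple of the treated unit's. Chinese-language papers usually use 10, 5, or 2, and sometimes 1.5; any real number greater than or equal to 1 is allowed here. It is a handy option, since doing the same thing by hand with synth is rather tedious.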
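The filtering rule is simple enough to write out. The following Python sketch applies it to made-up RMSPEs for the multiples mentioned above; it is only an illustration of the rule, not the package's implementation.

```python
def keep_placebos(pre_rmspe_by_unit, treated_pre_rmspe, pre_limit_mult):
    """Keep controls whose pretreatment RMSPE is at most pre_limit_mult times the treated unit's."""
    if pre_limit_mult < 1:
        raise ValueError("pre_limit_mult must be greater than or equal to 1")
    limit = pre_limit_mult * treated_pre_rmspe
    return [u for u, r in pre_rmspe_by_unit.items() if r <= limit]


# Made-up pretreatment RMSPEs; the treated unit's RMSPE is 1.0.
pre_rmspe_by_unit = {"A": 0.5, "B": 1.9, "C": 4.0, "D": 12.0}
kept = {m: keep_placebos(pre_rmspe_by_unit, 1.0, m) for m in (2, 5, 10)}
print(kept)
```

Unit D is fit more than ten times worse than the treated unit, so it is excluded under every multiple, while C survives only the looser cutoffs of 5 and 10.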

4. training_propr(real):
training_propr(real) instructs synth_runner to automatically generate the outcome predictors. The default is training_propr(0), which is to not generate any (the user then includes the desired ones in predictorvars). If the value is set to a number greater than 0, then that initial proportion of the pretreatment period is used as a training period, with the rest being the validation period. Outcome predictors for every time in the training period will be added to the synth commands. Diagnostics of the fit for the validation period will be outputted. If the value is between 0 and 1, there will be at least one training period and at least one validation period. If it is set to 1, then all the pretreatment period outcome variables will be used as predictors. This will make other covariate predictors redundant. real must be greater than or equal to 0 and less than or equal to 1.

As mentioned above, the pretreatment period can be split into a “training” part and a “validation” part. training_propr(real) sets the split: the number in parentheses is the share of the pretreatment period used for training. It must be between 0 and 1 inclusive, and the default is 0. Setting a training period does not create new variables in the dataset; its role is to add the outcome in every training period as a predictor in the synth commands and then to output fit diagnostics for the validation period (possibly the p-values, though I am not sure). One point to note: if it is set to 1, every pretreatment outcome is used as a predictor, which makes the other covariate predictors redundant. In practice, setting it to 1 may well give a better fit, because it puts more weight on the control units whose outcome values are closest to the treated unit's. Whether making the covariates redundant is a mistake, or causes problems, is worth discussing.
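Here is a rough Python sketch of how such a split could work. The rounding rule (floor, then clamped so that both parts have at least one period) is my assumption, since the quoted text does not say how the package rounds; the variable name cigsale is just an example. The depvar(period) form follows synth's predictor syntax.

```python
from math import floor


def split_pretreatment(pre_periods, training_propr):
    """Split pretreatment periods into (training, validation) lists."""
    if not 0 <= training_propr <= 1:
        raise ValueError("training_propr must be between 0 and 1")
    n = len(pre_periods)
    if training_propr == 0:
        n_train = 0          # default: no outcome predictors are generated
    elif training_propr == 1:
        n_train = n          # every pretreatment outcome becomes a predictor
    else:
        # Assumed rounding: floor, but keep at least one period on each side.
        n_train = min(max(floor(training_propr * n), 1), n - 1)
    return list(pre_periods[:n_train]), list(pre_periods[n_train:])


def outcome_predictors(depvar, training_periods):
    """Build synth-style outcome predictors such as 'cigsale(1981) cigsale(1982)'."""
    return " ".join(f"{depvar}({t})" for t in training_periods)


pre_periods = list(range(1981, 1989))  # 8 pretreatment years, treatment starts in 1989
training, validation = split_pretreatment(pre_periods, 0.5)
predictors = outcome_predictors("cigsale", training)
print(training, validation)
print(predictors)
```

With training_propr(0.5) and eight pretreatment years, the first four years supply outcome predictors and the last four are left for checking the fit.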

5. gen_vars
gen_vars generates variables in the dataset from estimation. This is allowed only if there is a single period in which units enter treatment. These variables are required for the following: single_treatment_graphs and effect_graphs. If gen_vars is specified, it will generate the following variables:
lead contains the respective time period relative to treatment. lead = 1 specifies the first period of treatment. This is to match Cavallo et al. (2013) and is effectively the offset from T0.
depvar_synth contains the unit’s synthetic control outcome for that time period.
effect contains the difference between the unit’s outcome and its synthetic control for that time period.
pre_rmspe contains the pretreatment match quality in terms of RMSPE. It is constant for a unit.
post_rmspe contains a measure of the posttreatment effect (jointly over all posttreatment time periods) in terms of RMSPE. It is constant for a unit.
depvar_scaled (if the match was done on trends) is the unit’s outcome variable normalized so that its last pretreatment period outcome is 1.
depvar_scaled_synth (if the match was done on trends) is the unit’s synthetic control (scaled) outcome variable.
effect_scaled (if the match was done on trends) is the difference between the unit’s scaled outcome and its synthetic control’s (scaled) outcome for that time period.

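gen_vars can be added only when there is a single treatment period (which presumably means there may still be more than one treated unit). It then generates the following variables:
lead: the time period relative to treatment, counted from T0, with lead = 1 marking the first treated period. This follows Cavallo et al. (2013). I have not read that paper, so my understanding is partial, but it appears to index how many periods into treatment an observation is.
depvar_synth: the synthetic control's outcome for that period.
effect: the gap between the unit's outcome and its synthetic control in that period.
pre_rmspe: the pretreatment match quality, as an RMSPE. It is constant within a unit.
post_rmspe: a measure of the effect after treatment. It is constant within a unit.
depvar_scaled (only if matching on trends): the unit's outcome normalized so that it equals 1 in the last pretreatment period.
depvar_scaled_synth (only if matching on trends): the scaled outcome of the unit's synthetic control.
effect_scaled (only if matching on trends): the gap between the unit's scaled outcome and its synthetic control's scaled outcome in that period.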
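To see how these variables relate to each other, here is a Python sketch that computes lead, effect, pre_rmspe and post_rmspe for one unit from made-up actual and synthetic series. It assumes consecutive yearly periods and follows the definitions quoted above; it is not the package's code.

```python
from math import sqrt


def root_mean_square(values):
    return sqrt(sum(v * v for v in values) / len(values))


def gen_vars_sketch(periods, actual, synthetic, trperiod):
    """Compute lead, effect, pre_rmspe and post_rmspe for a single unit."""
    lead = [t - trperiod + 1 for t in periods]          # lead = 1 in the first treated period
    effect = [a - s for a, s in zip(actual, synthetic)]
    pre = [e for e, l in zip(effect, lead) if l < 1]    # pretreatment gaps
    post = [e for e, l in zip(effect, lead) if l >= 1]  # posttreatment gaps
    return {
        "lead": lead,
        "effect": effect,
        "pre_rmspe": root_mean_square(pre),
        "post_rmspe": root_mean_square(post),
    }


periods = [1986, 1987, 1988, 1989, 1990, 1991]
actual = [10.0, 11.0, 12.0, 9.0, 8.0, 7.0]
synthetic = [10.0, 12.0, 12.0, 12.0, 12.0, 11.0]
result = gen_vars_sketch(periods, actual, synthetic, trperiod=1989)
print(result)
```

Here lead runs from -2 to 3, so lead = 0 is the last pretreatment year (T0) and lead = 1 is 1989, the first treated year.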

6. noenforce_const_pre_length
noenforce_const_pre_length specifies that maximal histories are desired at each estimation stage. When there are multiple periods, estimations at later treatment dates will have more pretreatment history available. By default, these histories are trimmed on the early side so that all estimations have the same amount of history.

noenforce_const_pre_length asks for the longest available history at each estimation. With multiple treatment periods, estimations for later treatment dates have more pretreatment data available. Without this option, the default is to trim the early part of those histories so that every estimation uses the same amount of history.
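The trimming can be illustrated with a short Python sketch. It assumes yearly data starting in a common first year; the function name and the years are made up.

```python
def pre_windows(first_period, treatment_periods, enforce_const_pre_length=True):
    """Return {treatment period: (first, last) pretreatment period used}."""
    lengths = {tp: tp - first_period for tp in treatment_periods}
    common = min(lengths.values())  # history available to the earliest event
    windows = {}
    for tp in treatment_periods:
        n = common if enforce_const_pre_length else lengths[tp]
        windows[tp] = (tp - n, tp - 1)
    return windows


# Panel starts in 1970; two events, in 1985 and 1989.
default_windows = pre_windows(1970, [1985, 1989])
maximal_windows = pre_windows(1970, [1985, 1989], enforce_const_pre_length=False)
print(default_windows)
print(maximal_windows)
```

By default the 1989 event is trimmed to 1974-1988 so that both events use 15 pretreatment years; with noenforce_const_pre_length it would use all 19 years from 1970.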

7. ci
ci outputs confidence intervals from randomization inference for raw effect estimates. These should be used only if the treatment is randomly assigned (conditional on covariates and interactive fixed effects). If treatment is not randomly assigned, then these confidence intervals do not have the standard interpretation (see above).

ci reports confidence intervals for the raw effect estimates, obtained by randomization inference. They should be used only if treatment is randomly assigned (conditional on covariates and interactive fixed effects). Otherwise they do not have the standard interpretation, as the paper discusses earlier.

8. max_lead(int)
max_lead(int) will limit the number of posttreatment periods analyzed. The default is the maximum number of leads that is available for all treatment periods.

max_lead(int) caps the number of posttreatment periods analyzed. The default is the largest number of leads that is available for every treatment period.

9. n_pl_avgs(string)
n_pl_avgs(string) controls the number of placebo averages to compute for inference. The possible total grows exponentially with the number of treated events. The default behavior is to cap the number of averages computed at 1,000,000 and, if the total is more than that, to sample (with replacement) the full distribution. The option n_pl_avgs(all) can be used to override this behavior and compute all the possible averages. The option n_pl_avgs(#) can be used to specify a specific number less than the total number of averages possible.

In other words, n_pl_avgs(string) controls how many placebo averages are computed for inference. The total number possible grows exponentially with the number of treated events, so by default at most 1,000,000 are computed; if the total exceeds that, the full distribution is sampled with replacement. n_pl_avgs(all) computes every possible average, and n_pl_avgs(#) requests a specific number below the total.
My reading is that this changes the p-value computation by fixing the denominator at a number no larger than the total number of possible averages.
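The "enumerate if small, otherwise sample with replacement" behavior can be sketched in Python as follows. The function name, the seed handling, and the demo cap of 1,000 are my own choices for illustration; only the default cap of 1,000,000 comes from the text above.

```python
import random
from itertools import product
from math import prod
from statistics import mean


def placebo_averages(placebo_sets, cap=1_000_000, seed=0):
    """Return (list of placebo averages, total number of possible averages)."""
    total = prod(len(s) for s in placebo_sets)
    if total <= cap:
        # Few enough combinations: compute every possible average.
        return [mean(combo) for combo in product(*placebo_sets)], total
    # Too many combinations: draw one placebo per event, with replacement across draws.
    rng = random.Random(seed)
    draws = [mean([rng.choice(s) for s in placebo_sets]) for _ in range(cap)]
    return draws, total


small_avgs, small_total = placebo_averages([[1.0, 2.0], [3.0]])

# 12 treated events with 30 placebo units each: 30**12 possible averages.
many_sets = [[float(i) for i in range(30)] for _ in range(12)]
sampled_avgs, big_total = placebo_averages(many_sets, cap=1000)
print(small_avgs, small_total)
print(len(sampled_avgs), big_total)
```

With 12 events and 30 placebos each, the number of possible averages is 30 to the 12th power, far beyond what can be enumerated, which is why the default falls back to sampling.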

10. pred_prog(string)
pred_prog(string) allows for time-contingent predictor sets. The user writes a program that takes as input a time period and outputs via r(predictors) a synth-style predictor string. If one is not using training_propr(), then pred_prog() could be used to dynamically include outcome predictors. See example 3 for usage details.

pred_prog(string) supports predictor sets that change with the treatment period. The user writes a program that takes a time period as input and returns a synth-style predictor string in r(predictors). If training_propr() is not used, pred_prog() can be used to include outcome predictors dynamically.
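pred_prog() itself expects a Stata program, so the Python function below is only an analogy showing the mapping such a program has to implement: treatment period in, predictor string out. The variable names (beer, lnincome, cigsale) and the four-year window are hypothetical choices; the var(start(1)end) form is synth's predictor syntax.

```python
def predictors_for(trperiod, window=4):
    """Build a synth-style predictor string for the `window` years before trperiod."""
    start, end = trperiod - window, trperiod - 1
    return (
        f"beer({start}(1){end}) "      # average of beer over the window
        f"lnincome({start}(1){end}) "  # average of lnincome over the window
        f"cigsale({end})"              # outcome in the last pretreatment year
    )


for trperiod in (1985, 1989):
    print(trperiod, predictors_for(trperiod))
```

Each treatment period gets predictors averaged over its own last four pretreatment years, which a fixed predictorvars list cannot do when events are staggered.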

11. deterministicoutput
deterministicoutput, when used with parallel, will eliminate displayed output that would vary depending on the machine (for example, timers and number of parallel clusters) so that log files can be easily compared across runs.

deterministicoutput, when used together with parallel, suppresses displayed output that differs from machine to machine (for example, timers and the number of parallel clusters), so that log files from different runs can be compared easily.

Reply 4

Lee_iris

Posted 2020-4-30 15:17:39

12. parallel

parallel will enable parallel processing if the parallel command is installed and configured. Version 1.18.2 is needed at a minimum. (Footnote 4: At the time of writing, Statistical Software Components does not contain a new enough version. Newer versions are available via the development website https://github.com/gvegayon/parallel/.)

parallel turns on parallel processing.

13. pvals1s

pvals1s outputs one-sided p-values in addition to the two-sided p-values.

pvals1s reports one-sided p-values in addition to the two-sided ones.

14. drop_units_prog(string)

drop_units_prog(string) specifies the name of a program that, when passed the unit to be considered treated, will drop other units that should not be considered when forming the synthetic control. This is usually because they are neighboring or interfering units. See example 3 for usage details.

drop_units_prog(string) names a program that, given the unit currently being considered as treated, drops other units that should not enter its synthetic control, usually because they are neighbors or otherwise interfere. This could help with spillover effects, for example by dropping geographically adjacent regions.

15. xperiod_prog(string)

xperiod_prog(string) allows for setting synth’s xperiod() option, which varies with the treatment period. The user-written program is passed the treatment period and should return, via r(xperiod), a numlist suitable for synth’s xperiod() (the period over which generic predictor variables are averaged). See synth for more details on the xperiod() option. See example 3 for usage details.

xperiod_prog(string) lets synth's xperiod() option vary with the treatment period. The user-written program receives the treatment period as its argument and should return, in r(xperiod), a numlist suitable for synth's xperiod() (the period over which generic predictor variables are averaged).

16. mspeperiod_prog(string)

mspeperiod_prog(string) allows for setting synth’s mspeperiod() option, which varies with the treatment period. The user-written program is passed the treatment period and should return, via r(mspeperiod), a numlist suitable for synth’s mspeperiod() (the period over which the prediction outcome is evaluated). See synth for more details on the mspeperiod() option. See example 3 for usage details.

mspeperiod_prog(string) lets synth's mspeperiod() option vary with the treatment period. The user-written program receives the treatment period as its argument and should return, in r(mspeperiod), a numlist suitable for synth's mspeperiod() (the period over which the prediction outcome is evaluated).

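17. synthsettings

synthsettings specifies pass-through options sent to synth. See help synth for more information. The following are disallowed: counit(), figure, resultsperiod().

synthsettings stands for options that are passed through to synth.

The above relies mainly on Google Translate combined with my own understanding. Corrections are welcome.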