[Study notes] SAS PROC PSMATCH: balancing treatment and control groups of very different sizes with propensity scores
Original poster: 今年,夏末 (verified student), posted 2019-8-26 09:55:38
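Learning SAS: PROC PSMATCH

In research we sometimes run into this situation: 50 subjects in the treatment group and 500 in the control group, or an even larger imbalance.

(The clinical study I am working on has fewer than 200 subjects in the treatment group out of more than 4,500 enrolled. Don't ask me how such an odd design came about. My supervisor published the main papers on the primary research question in NEJM and Lancet long ago; I am only using the study data to analyze one of its side questions.)

So when analyzing the data, the sizes of the two groups need to be balanced.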


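In 2006 the American Journal of Epidemiology (Am J Epidemiol) summarized five methods commonly used to control confounding in real-world studies [1]:

1. Multivariable regression adjustment for confounders

2. Propensity score matching (PSM) followed by a regression model

3. Regression adjustment for the propensity score (PS)

4. Regression with inverse probability of treatment weighting (IPTW)

5. Regression with SMR weighting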

WX20190826-115659.png


Propensity scores (PS) have particular advantages for controlling confounding, especially when the treatment and control groups differ greatly in size. There are several PS methods: propensity score matching (PSM), propensity score weighting (PSW), propensity score stratification (PSS), and covariate adjustment using the propensity score (CAPS). In SAS, the procedure for applying PS is PROC PSMATCH.
PSMATCH Procedure

This example illustrates the use of the PSMATCH procedure to match observations for individuals in a treatment group with observations for individuals in a control group that have similar propensity scores. The matched observations are saved in an output data set that, with the addition of the outcome variable, can be used to provide an unbiased estimate of the treatment effect.

A pharmaceutical company is conducting a nonrandomized clinical trial to demonstrate the efficacy of a new treatment (Drug_X) by comparing it to an existing treatment (Drug_A). Patients in the trial can choose the treatment that they prefer; otherwise, physicians assign each patient to a treatment. The possibility of treatment selection bias is a concern because it can lead to systematic differences in the distributions of the baseline variables in the two groups, resulting in a biased estimate of treatment effect.

The data set Drugs contains baseline variable measurements for individuals from both treated and control groups. PatientID is the patient identification number, Drug is the treatment group indicator, Gender provides the gender, Age provides the age, and BMI provides the body mass index (a measure of body fat based on height and weight). Typically, more variables are used in a propensity score analysis, but for simplicity only a few variables are used in this example.

Figure 98.2 lists the first 10 observations.

WX20190826-115750.png

Note that the Drugs data set does not contain a response variable, because the response variable is not used in a propensity score analysis. Instead, the response variable is added to the output data set that contains the matched observations, and the combined data set is then used for outcome analysis.

The following statements invoke the PSMATCH procedure and request optimal matching to match observations for patients in the treatment group with observations for patients in the control group:
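The statements were posted only as a screenshot, so here is a sketch of what the call probably looks like, pieced together from the options this example describes (REGION=CS, TREATED='Drug_X', METHOD=OPTIMAL(K=1), EXACT=GENDER, DISTANCE=LPS, CALIPER=0.25, LPS and ALLCOV, WEIGHT=NONE, PLOTS=, OUT(OBS=MATCH)=, LPS=, MATCHID=). The exact layout and the placement of ODS GRAPHICS are my assumptions, and older releases may need VAR=(Gender Age BMI) in place of ALLCOV.

```sas
/* Sketch reconstructed from the option descriptions in this post;    */
/* not a verbatim copy of the screenshot.                             */
ods graphics on;   /* assumed: needed for the requested plots */

proc psmatch data=Drugs region=cs;
   class Drug Gender;
   /* Logistic model for the probability of receiving Drug_X */
   psmodel Drug(Treated='Drug_X') = Gender Age BMI;
   /* 1:1 optimal matching on the logit of the PS, same Gender, caliper 0.25 */
   match method=optimal(k=1) exact=Gender distance=lps caliper=0.25;
   /* Balance diagnostics for the LPS and all model covariates */
   assess lps allcov / weight=none plots=(barchart boxplot);
   /* Keep only matched observations, plus the LPS and matched-set ID */
   output out(obs=match)=Outgs lps=_Lps matchid=_MatchID;
run;
```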

WX20190826-115951.png

The CLASS statement specifies the classification variables. The PSMODEL statement specifies the logistic regression model that creates the propensity score for each observation, which is the probability that the patient receives Drug_X. The Drug variable is the binary treatment indicator variable and TREATED='Drug_X' identifies Drug_X as the treated group. The Gender, Age, and BMI variables are included in the model because they are believed to be related to the assignment.

The REGION= option specifies which observations are used in stratification and matching. In this example, matching is requested by the MATCH statement, and the REGION=CS option requests that only those observations whose propensity scores (or equivalently, logits of propensity scores) lie in the common support region be used for matching. The common support region is defined as the largest interval that contains propensity scores for subjects in both groups. By default, the region is extended by 0.25 times a pooled estimate of the common standard deviation of the logit of the propensity score. For more information, see the description of the EXTEND= option.

The MATCH statement specifies the criteria for matching. The DISTANCE=LPS option (which is the default) requests that the logit of the propensity score be used to compute differences between pairs of observations. The METHOD=OPTIMAL(K=1) option (which is the default) requests optimal matching of one control unit to each unit in the treated group in order to minimize the total within-pair difference. The EXACT=GENDER option forces the treated unit and its matched control unit to have the same value of the Gender variable.

The CALIPER=0.25 option specifies the caliper requirement for matching. This means that for a match to be made, the difference in the logits of the propensity scores for pairs of individuals from the two groups must be less than or equal to 0.25 times the pooled estimate of the common standard deviation of the logits of the propensity scores.
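Written out, the caliper condition above looks like this. The notation is mine, not the documentation's: p_t and p_c are the propensity scores of a treated unit and a candidate control, and s_t, s_c are the group standard deviations of the logit. The equal-weight pooled form of the standard deviation is an assumption about what "pooled estimate" means here.

```latex
\left|\operatorname{logit}(p_t) - \operatorname{logit}(p_c)\right| \;\le\; 0.25\,\hat{\sigma},
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1-p},
\qquad
\hat{\sigma} = \sqrt{\frac{s_t^{2} + s_c^{2}}{2}}
```

For example, if the pooled standard deviation of the logit were 0.8, two units could be matched only if their logits differed by at most 0.25 × 0.8 = 0.2.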

The "Data Information" table in Figure 98.3 displays information about the input and output data sets, the numbers of observations in the treated and control groups, the lower and upper limits for the propensity score support region, and the numbers of observations in the treated and control groups that fall within the support region. Of the 373 observations in the control group, 351 fall within the support region.

WX20190826-120000.png

The "Propensity Score Information" table in Figure 98.4 displays summary statistics for propensity scores by treatment group on the basis of all observations, support region observations, and matched observations. When you specify the METHOD=OPTIMAL(K=1) option, all matched observations have the same weight; that is, each matched unit has a weight of 1. Therefore, all the propensity score summary statistics would remain unchanged if you applied these weights to the matched observations. In the example, the WEIGHT=NONE option suppresses the display of summary statistics for weighted matched observations.

WX20190826-120013.png

To be continued



Keywords: SAS, psmatch, confounder control, Propensity score

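Attachments:

基于相对危险度:患病率比的模型及参数估计方法研究进展.pdf (407.7 KB)
(Research progress on models and parameter estimation methods based on relative risk / prevalence ratio)
Download link: https://bbs.pinggu.org/a-2905325.html

倾向评分法在SAS软件中的实现.pdf (859.38 KB)
(Implementing the propensity score method in SAS)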


2
今年,夏末 (verified student), posted 2019-8-26 10:11:51

The "Matching Information" table in Figure 98.5 displays the matching criteria, the number of matched sets, the numbers of matched observations in the treated and control groups, and the total absolute difference in the logit of the propensity score for all matches.

WX20190826-120450.png

The ASSESS statement produces a table and plots that summarize differences in specified variables between treated and control groups. As specified by the LPS and ALLCOV options, these variables are the logit of the propensity score (LPS) and all the covariates in the PSMODEL statement: Gender, Age, and BMI. For a binary classification variable (Gender), the difference is in the proportion of the first ordered level (Female).

The "Standardized Mean Differences" table, shown in Figure 98.6, displays standardized mean differences for all observations, observations in the support region, and matched observations. The WEIGHT=NONE option suppresses the display of differences for weighted matched observations. Note that when one control unit is matched to each treated unit, the weights are all 1 for matched treated and control units and the results are identical for weighted matched observations and matched observations.

WX20190826-120502.png

By default, the standard deviations of the variables, pooled across the treated and control groups, are computed based on all observations. The pooled standard deviations are then used to compute standardized mean differences based on all observations, observations in the support region, and matched observations. You can request a different standard deviation with the STDDEV= option. In Figure 98.6 the standardized mean differences are significantly reduced in the matched observations. The largest of these differences in absolute value is 0.0646, which is less than the upper limit of 0.25 recommended by Rubin. All differences for the matched observations are within the recommended limits of -0.25 and 0.25, which are indicated by the shaded area in the accompanying plot. Note that many authors use limits of -0.10 and 0.10 (Normand et al. 2001; Mamdani et al. 2005; Austin 2009). You can use the PLOTS=STDDIFFPLOT(REF=) option to specify the limits for the shaded area.
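As a reminder of what is being compared against 0.25 (or 0.10), the standardized mean difference for a variable x can be written as below. The symbols are my own, and I am assuming the pooled standard deviation takes the common equal-weight form, computed by default from all observations as the paragraph above says.

```latex
d = \frac{\bar{x}_{t} - \bar{x}_{c}}{\hat{\sigma}_{p}},
\qquad
\hat{\sigma}_{p} = \sqrt{\frac{s_{t}^{2} + s_{c}^{2}}{2}},
\qquad
\text{balance is judged acceptable when } |d| \le 0.25 \ \ (\text{or } |d| \le 0.10)
```

For a binary variable such as Gender, the two means are the proportions of the first ordered level (Female) in each group. Because the denominator is fixed across the three comparisons, a drop in d after matching reflects a real narrowing of the gap between group means and not a change in scale.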

The PLOTS=BARCHART option requests bar charts that compare the treated and control group distributions of binary classification variables that are specified in the ASSESS statement. The bar chart that is created for Gender is shown in Figure 98.8. The chart displays proportions by default, and it provides comparisons based on all observations, observations in the support region, and matched observations. The distributions of Gender are identical for matched observations because EXACT=GENDER is specified in the MATCH statement.

WX20190826-120525.png

The PLOTS=BOXPLOT option requests box plots for the logit of the propensity score (LPS) and for the continuous variables that are specified in the ASSESS statement, as shown in Figure 98.9, Figure 98.10, and Figure 98.11. The box plots show good variable balance for the matched observations.

WX20190826-120546.png

WX20190826-120538.png

WX20190826-120607.png

Because the matched observations in this example exhibit good balance, you can output them for subsequent outcome analysis. In situations where you are not satisfied with the balance, you can do one or more of the following to improve the balance: you can select another set of variables for the propensity score model, you can modify the specification of the propensity score model (for example, by introducing nonlinear terms for the continuous variables or by adding interactions), you can modify the matching criteria, or you can choose another matching method.

The OUT(OBS=MATCH)= option in the OUTPUT statement creates an output data set named Outgs that contains the matched observations. By default, this data set includes the variable _PS_ (which provides the propensity score) and the variable _MATCHWGT_ (which provides matched observation weights). The weight for each treated unit is 1. The weight for each matched control unit is also 1 because one control unit is matched to each treated unit. The LPS=_LPS option adds a variable named _LPS that provides the logit of the propensity score, and the MATCHID=_MatchID option adds a variable named _MatchID that identifies the matched sets of observations.

The following statements list the observations in the first five matched sets, as shown in Figure 98.12.
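These statements also survive only as a screenshot. A minimal sketch that would produce such a listing, assuming the output data set and variable names described above (Outgs, _PS_, _LPS, _MATCHWGT_, _MatchID); the intermediate data set name Outgs1 is my own choice:

```sas
/* Sort so that the two members of each matched set sit next to each other */
proc sort data=Outgs out=Outgs1;
   by _MatchID;
run;

/* With 1:1 matching, the first 10 rows are the first five matched sets */
proc print data=Outgs1(obs=10);
   var PatientID Drug Gender Age BMI _PS_ _LPS _MATCHWGT_ _MatchID;
run;
```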

WX20190826-120616.png

WX20190826-120623.png

After the responses for the trial are observed and added to the matched data set Outgs, you can estimate the treatment effect by carrying out the same type of outcome analysis on Outgs that you would have used with the original data set Drugs (augmented with responses) as if it were a randomized trial (Ho et al. 2007, p. 223). This assumes that no other confounding variables are associated with both the response variable and the treatment group indicator Drug.
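To make the last step concrete, here is one way the outcome analysis could go. Everything about the outcome is hypothetical: I assume a data set Responses keyed by PatientID with a continuous variable Response, and I pick a two-sample t test only as the simplest stand-in for whatever analysis you would have run on a randomized trial.

```sas
/* Hypothetical: Responses has one row per patient (PatientID, Response) */
proc sort data=Outgs out=OutgsSorted;
   by PatientID;
run;

proc sort data=Responses out=RespSorted;
   by PatientID;
run;

/* Attach the outcome to the matched observations only */
data OutEx;
   merge OutgsSorted(in=inMatch) RespSorted;
   by PatientID;
   if inMatch;
run;

/* Compare mean response between Drug_X and Drug_A in the matched sample */
proc ttest data=OutEx;
   class Drug;
   var Response;
run;
```

Since every matched unit has weight 1 under 1:1 matching, no WEIGHT statement is needed here; with other matching methods the _MATCHWGT_ variable would have to be taken into account.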


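Besides PROC PSMATCH, there are other ways to implement this in SAS. I won't go into them here; I have posted the material I found so we can all study it together. (The attachments are uploaded in the first post.)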


3
stata18, posted 2019-9-1 10:55:04
Thanks for sharing.

4
sj_seu, posted 2020-1-8 18:07:22
Thanks for sharing. I happen to be working on an RWE study and need to use PSM for adjustment.

5
Cecilia_Xi (verified professional), posted 2020-3-12 16:17:43
Thanks for sharing! Which version are you using? My SAS 9.4 reports "proc psmatch not found".

6
Cecilia_Xi (verified professional), posted 2020-3-12 16:20:17
Quoting Cecilia_Xi (2020-3-12 16:17): Thanks for sharing! Which version are you using? My SAS 9.4 reports "proc psmatch not found".
SAS/STAT 14.2 (November 2016) runs on SAS 9.4M4 and later releases

7
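今年,夏末 (verified student), posted 2020-3-29 16:35:32
Quoting Cecilia_Xi (2020-3-12 16:17): Thanks for sharing! Which version are you using? My SAS 9.4 reports "proc psmatch not found".
Mine is the SAS enterprise edition (企业版) 7.100.5.6214, the latest version.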

8
yuyan1 (verified student), posted 2020-4-19 22:43:48
Quoting Cecilia_Xi (2020-3-12 16:20): SAS/STAT 14.2 (November 2016) runs on SAS 9.4M4 and later releases
Did you manage to solve this? My SAS throws an error at the same step.

9
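yuyan1 (verified student), posted 2020-4-19 22:44:28
Quoting Cecilia_Xi (2020-3-12 16:17): Thanks for sharing! Which version are you using? My SAS 9.4 reports "proc psmatch not found".
Did you manage to solve this? My SAS throws an error at the same step.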

10
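Cecilia_Xi (verified professional), posted 2020-4-19 22:47:25
Quoting yuyan1 (2020-4-19 22:44): Did you manage to solve this? My SAS throws an error at the same step.
It is only supported on SAS 9.4 M4 and later. On an earlier release you would need a fairly complicated macro, or you could use other, more convenient software such as Stata.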
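For anyone hitting the same "not found" error, a quick way to see which release you are on is sketched below. Both statements write to the log; how to read the output against the 9.4M4 / SAS/STAT 14.2 requirement quoted in this thread is left to you, since the exact log wording varies by installation.

```sas
/* Print the full SAS release string, including the maintenance level (e.g. M4) */
%put &=sysvlong;

/* List installed SAS products and their versions (look for the SAS/STAT entry) */
proc product_status;
run;
```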
