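When doing research we sometimes run into this situation: 50 people in the treatment group and 500 in the control group, or an even larger imbalance between the two groups.
(The clinical study I (the OP) am working on right now has fewer than 200 people in the treatment group out of more than 4,500 enrolled. Don't ask me why the study design is so odd. My boss published the main paper on the primary research question in NEJM and Lancet long ago; I am just using the study data to analyze one of the secondary questions.)
So, when analyzing the data, the two groups need to be balanced.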
In 2006, the American Journal of Epidemiology (Am J Epidemiol) summarized five methods commonly used to control confounding in real-world studies [1]:
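1. Multivariable regression adjusting for confounders
2. Propensity score matching (PSM), followed by a regression model
3. Regression model adjusted for the propensity score (PS)
4. Regression model with inverse probability of treatment weighting (IPTW)
5. Regression model with standardized mortality ratio (SMR) weighting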
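To make the difference between methods 4 and 5 concrete, here is a minimal Python sketch (not SAS, and not taken from the cited paper) of how the two kinds of weights are usually derived from an estimated propensity score. The function names and the sample subjects are hypothetical.

```python
def iptw_weight(treated: bool, ps: float) -> float:
    """IPTW: weight each subject by the inverse probability of the
    treatment actually received (targets the whole study population)."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def smr_weight(treated: bool, ps: float) -> float:
    """SMR weighting: treated subjects keep weight 1, controls are
    weighted by the odds ps / (1 - ps) (targets the treated population)."""
    return 1.0 if treated else ps / (1.0 - ps)


# Hypothetical subjects: (treated?, estimated propensity score)
subjects = [(True, 0.20), (False, 0.20), (True, 0.80), (False, 0.80)]
for treated, ps in subjects:
    print(treated, ps,
          round(iptw_weight(treated, ps), 2),
          round(smr_weight(treated, ps), 2))
```

A treated subject who was unlikely to be treated (PS = 0.20) gets a large IPTW weight, while under SMR weighting only the controls are reweighted.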
The propensity score (PS) has a distinct advantage in controlling confounding, especially when the treatment and control groups differ greatly in size. There are several ways to use a propensity score: propensity score matching (PSM), propensity score weighting (PSW), propensity score stratification (PSS), and covariate adjustment using the propensity score (CAPS). In SAS, the procedure for propensity score analysis is PROC PSMATCH.
PSMATCH Procedure
This example illustrates the use of the PSMATCH procedure to match observations for individuals in a treatment group with observations for individuals in a control group that have similar propensity scores. The matched observations are saved in an output data set that, with the addition of the outcome variable, can be used to provide an unbiased estimate of the treatment effect.
A pharmaceutical company is conducting a nonrandomized clinical trial to demonstrate the efficacy of a new treatment (Drug_X) by comparing it to an existing treatment (Drug_A). Patients in the trial can choose the treatment that they prefer; otherwise, physicians assign each patient to a treatment. The possibility of treatment selection bias is a concern because it can lead to systematic differences in the distributions of the baseline variables in the two groups, resulting in a biased estimate of treatment effect.
The data set Drugs contains baseline variable measurements for individuals from both treated and control groups. PatientID is the patient identification number, Drug is the treatment group indicator, Gender provides the gender, Age provides the age, and BMI provides the body mass index (a measure of body fat based on height and weight). Typically, more variables are used in a propensity score analysis, but for simplicity only a few variables are used in this example.
Figure 98.2 lists the first 10 observations.
Note that the Drugs data set does not contain a response variable, because the response variable is not used in a propensity score analysis. Instead, the response variable is added to the output data set that contains the matched observations, and the combined data set is then used for outcome analysis.
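As a rough illustration of that workflow, the Python sketch below (not SAS) attaches an outcome to the matched observations by PatientID only after matching is done. The `match_id` and `Outcome` fields and all values are hypothetical, made up for the sketch.

```python
# Hypothetical matched output: one treated and one control row per match_id.
matched = [
    {"PatientID": 101, "Drug": "Drug_X", "match_id": 1},
    {"PatientID": 205, "Drug": "Drug_A", "match_id": 1},
    {"PatientID": 117, "Drug": "Drug_X", "match_id": 2},
    {"PatientID": 342, "Drug": "Drug_A", "match_id": 2},
]

# Hypothetical outcomes for every enrolled patient, kept separate until now.
outcomes = {101: 4.2, 205: 3.9, 117: 5.0, 342: 4.4, 999: 6.1}

# Join the outcome onto the matched rows only; unmatched patient 999 is dropped.
analysis = [dict(row, Outcome=outcomes[row["PatientID"]]) for row in matched]

for row in analysis:
    print(row)
```

Keeping the outcome out of the data until this step means the matching cannot be tuned, even unintentionally, toward a desired result.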
The following statements invoke the PSMATCH procedure and request optimal matching to match observations for patients in the treatment group with observations for patients in the control group:
The CLASS statement specifies the classification variables. The PSMODEL statement specifies the logistic regression model that creates the propensity score for each observation, which is the probability that the patient receives Drug_X. The Drug variable is the binary treatment indicator variable and TREATED='Drug_X' identifies Drug_X as the treated group. The Gender, Age, and BMI variables are included in the model because they are believed to be related to the assignment.
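To show what "a logistic regression model that creates the propensity score" means, here is a minimal pure-Python sketch. It is not how PROC PSMATCH fits the model: it uses plain gradient ascent, a single covariate (age) instead of Gender, Age, and BMI, and made-up data. All names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(xs, ys, lr=0.5, steps=5000):
    """Fit P(treated | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid          # gradient with respect to the intercept
            g1 += resid * x      # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Made-up data: 1 = received Drug_X, 0 = received Drug_A.
ages = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
treated = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
xs = [(a - 50) / 10 for a in ages]   # center and scale age for stable fitting

b0, b1 = fit_propensity(xs, treated)
ps = [sigmoid(b0 + b1 * x) for x in xs]        # propensity scores
lps = [math.log(p / (1 - p)) for p in ps]      # logits of the propensity scores

for age, p in zip(ages, ps):
    print(age, round(p, 3))
```

The logit of the propensity score computed in the last step is the scale on which the matching distance and the caliper in this example are defined.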
The REGION= option specifies which observations are used in stratification and matching. In this example, matching is requested by the MATCH statement, and the REGION=CS option requests that only those observations whose propensity scores (or equivalently, logits of propensity scores) lie in the common support region be used for matching. The common support region is defined as the largest interval that contains propensity scores for subjects in both groups. By default, the region is extended by 0.25 times a pooled estimate of the common standard deviation of the logit of the propensity score. For more information, see the description of the EXTEND= option.
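The common support rule can be sketched in a few lines of Python (not SAS). This sketch assumes the pooled standard deviation is the square root of the average of the two group variances; the text above does not state the formula, so treat that as an assumption. The function names and logit values are hypothetical.

```python
import math
import statistics


def pooled_sd(a, b):
    """Assumed pooled estimate: sqrt of the mean of the two sample variances."""
    return math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)


def support_region(lps_t, lps_c, extend=0.25):
    """Overlap of the two groups' logit ranges, widened on both sides
    by extend * pooled SD of the logit of the propensity score."""
    lower = max(min(lps_t), min(lps_c))
    upper = min(max(lps_t), max(lps_c))
    margin = extend * pooled_sd(lps_t, lps_c)
    return lower - margin, upper + margin


# Hypothetical logits of the propensity score for each group.
lps_t = [-1.0, 0.0, 1.0, 2.0]
lps_c = [-3.0, -2.0, -1.0, 0.0, 1.0]

lower, upper = support_region(lps_t, lps_c)
kept_t = [v for v in lps_t if lower <= v <= upper]
kept_c = [v for v in lps_c if lower <= v <= upper]
print(round(lower, 3), round(upper, 3), kept_t, kept_c)
```

Observations outside the region (the treated subject at 2.0 and the controls at -3.0 and -2.0 here) are simply not offered to the matching step.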
The MATCH statement specifies the criteria for matching. The DISTANCE=LPS option (which is the default) requests that the logit of the propensity score be used to compute differences between pairs of observations. The METHOD=OPTIMAL(K=1) option (which is the default) requests optimal matching of one control unit to each unit in the treated group in order to minimize the total within-pair difference. The EXACT=GENDER option forces the treated unit and its matched control unit to have the same value of the Gender variable.
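To see why "optimal" matching is different from simply giving each treated unit its nearest available control, here is a toy Python sketch (not SAS). The optimal matcher is a brute-force search that is only feasible for tiny inputs and is not the algorithm PROC PSMATCH uses. Exact matching on Gender is left out for brevity, and all names and values are hypothetical.

```python
from itertools import permutations


def optimal_match(treated, controls):
    """Try every 1:1 assignment of distinct controls to treated units and
    keep the one with the smallest total within-pair difference."""
    best_total, best_pairs = float("inf"), None
    for chosen in permutations(range(len(controls)), len(treated)):
        total = sum(abs(t - controls[j]) for t, j in zip(treated, chosen))
        if total < best_total:
            best_total, best_pairs = total, list(enumerate(chosen))
    return best_total, best_pairs


def greedy_match(treated, controls):
    """Give each treated unit, in order, its nearest still-unused control."""
    free = set(range(len(controls)))
    total, pairs = 0.0, []
    for i, t in enumerate(treated):
        j = min(free, key=lambda k: abs(t - controls[k]))
        free.remove(j)
        total += abs(t - controls[j])
        pairs.append((i, j))
    return total, pairs


# Hypothetical logits of the propensity score.
treated = [0.5, 1.0]
controls = [0.9, 0.0, 3.0]

print("greedy :", greedy_match(treated, controls))
print("optimal:", optimal_match(treated, controls))
```

Greedy matching lets the first treated unit take the control at 0.9, which leaves the second treated unit with a poor match; the optimal assignment accepts a slightly worse first pair to get a much smaller total.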
The CALIPER=0.25 option specifies the caliper requirement for matching. This means that for a match to be made, the difference in the logits of the propensity scores for pairs of individuals from the two groups must be less than or equal to 0.25 times the pooled estimate of the common standard deviation of the logits of the propensity scores.
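The caliper rule amounts to a simple threshold check on the logit scale. The Python sketch below (not SAS) assumes the pooled standard deviation is the square root of the average of the two group variances, which the text above does not spell out. The function names and values are hypothetical.

```python
import math
import statistics


def caliper_width(lps_t, lps_c, caliper=0.25):
    """Caliper in logit units: caliper * pooled SD of the logit of the PS.
    The pooled SD formula used here is an assumption of this sketch."""
    pooled = math.sqrt(
        (statistics.variance(lps_t) + statistics.variance(lps_c)) / 2
    )
    return caliper * pooled


def within_caliper(lps_treated, lps_control, width):
    """A pair is eligible only if its logit difference is <= the width."""
    return abs(lps_treated - lps_control) <= width


# Hypothetical logits of the propensity score for each group.
lps_t = [-1.0, 0.0, 1.0, 2.0]
lps_c = [-3.0, -2.0, -1.0, 0.0, 1.0]

width = caliper_width(lps_t, lps_c)
print(round(width, 3))
print(within_caliper(0.0, 0.3, width))   # difference 0.3: inside the caliper
print(within_caliper(0.0, 0.5, width))   # difference 0.5: outside the caliper
```

A treated unit with no control inside its caliper stays unmatched, so a tighter caliper trades sample size for closer pairs.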
The "Data Information" table in Figure 98.3 displays information about the input and output data sets, the numbers of observations in the treated and control groups, the lower and upper limits for the propensity score support region, and the numbers of observations in the treated and control groups that fall within the support region. Of the 373 observations in the control group, 351 fall within the support region.
The "Propensity Score Information" table in Figure 98.4 displays summary statistics for propensity scores by treatment group on the basis of all observations, support region observations, and matched observations. When you specify the METHOD=OPTIMAL(K=1) option, all matched observations have the same weight; that is, each matched unit has a weight of 1. Therefore, all the propensity score summary statistics would remain unchanged if you applied these weights to the matched observations. In the example, the WEIGHT=NONE option suppresses the display of summary statistics for weighted matched observations.










