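I've recently been trying to use Stata to match cases to controls on four variables: var1 var2 var3 var4. On small samples, the method from this thread (https://www.statalist.org/forums/forum/general-stata-discussion/general/1009802-1-n-matching-on-age-and-gender) does 1:4 matching quickly.
The trouble starts when there are few cases and many controls. With, for example, 10,000 cases and 3 million controls, I need 4 controls per case, and no control may be used more than once. At that scale the code runs extremely slowly: every run freezes the machine and I never get a result.
Does anyone have a good way around this?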
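A likely cause is the `joinby` step in the linked method. It first builds every case-control pair that shares the matching values, and only then discards duplicates. The row count therefore grows with (cases × controls) in each cell. For example, 10,000 cases that each share a cell with 10,000 controls already give 100 million rows. The sketch below estimates that size before you run `joinby`. The fake data, the 300,000-row size, and the value ranges of var1-var4 are hypothetical. It also assumes `case == 1` marks a case.

```stata
* Hypothetical fake data: 1,000 cases among 300,000 rows, matched on var1-var4
clear
set seed 1971
set obs 300000
gen long id = _n
gen byte case = (_n <= 1000)
gen byte var1 = floor(runiform()*10)
gen byte var2 = floor(runiform()*2)
gen byte var3 = floor(runiform()*5)
gen byte var4 = floor(runiform()*3)

* Estimate how many rows joinby on var1-var4 would create
preserve
gen byte iscase = (case == 1)
collapse (sum) ncase = iscase (count) nall = id, by(var1 var2 var3 var4)
* pairs in each stratum = cases x controls in that stratum
gen double pairs = ncase * (nall - ncase)
summarize pairs, meanonly
scalar npairs = r(sum)
restore
display "joinby would create about " %18.0fc npairs " case-control rows"
```

If the result runs into the hundreds of millions or more, the freeze probably comes from the pair explosion rather than from the later `keep` steps.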
```stata
*============MAKE FAKE DATA
clear
set seed 1971
set obs 20250
gen id=_n
gen age = int(uniform()*75)
gen gender=round(uniform())
gen case=1 if _n<251
*=======STASH AWAY CASES, THEN GET CONTROLS
*=======NEED TO RENAME SUBSTANTIVE VARIABLES AS WELL
preserve
keep if case==1
rename id case_id
save temp_cases, replace
restore
keep if case!=1
rename id control_id
drop case
*===sort by random variable in case there were ordering effects
gen trash=uniform()
sort trash
drop trash
*=====NOW MERGE THEM
joinby age gender using temp_cases
*======GETTING RID OF DUPLICATE MATCHES
bysort control_id: keep if _n==1
*===========KEEPING ONLY FIRST FOUR
bysort case_id: keep if _n<5
```
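One way to avoid `joinby` entirely is to shuffle cases and controls within each var1-var4 stratum. Number each group separately, then assign controls 1-4 to case 1, controls 5-8 to case 2, and so on. Memory then stays at one row per observation, and the work reduces to two sorts. This is a swapped-in technique, not the method from the linked thread. It is also only a sketch: it has not been tested on data of your size.

The sketch makes these assumptions:
- Matching is exact on var1-var4.
- Cases are coded `case == 1`, and any other value counts as a control.
- The fake-data section is hypothetical and should be replaced by your own data.
- The helper variables `u`, `rank`, `slot` and `iscontrol` are illustrative names.

```stata
* Hypothetical fake data: replace with your own dataset (id, case, var1-var4)
clear
set seed 1971
set obs 200000
gen long id = _n
gen byte case = (_n <= 1000)
gen byte var1 = floor(runiform()*10)
gen byte var2 = floor(runiform()*2)
gen byte var3 = floor(runiform()*5)
gen byte var4 = floor(runiform()*3)

* Random tie-breaker so cases and controls are ordered randomly within a stratum
gen double u = runiform()

* Number cases and controls separately within each stratum, in random order
bysort var1 var2 var3 var4 case (u): gen long rank = _n

* Case k in a stratum owns slot k; controls 4k-3 to 4k go to slot k
gen long slot = cond(case == 1, rank, ceil(rank/4))

* Within each stratum and slot, sort the case row first (iscontrol == 0)
gen byte iscontrol = (case != 1)
bysort var1 var2 var3 var4 slot (iscontrol): gen long case_id = id[1] if iscontrol[1] == 0

* Controls in slots with no case are unmatched; each control appears once, so there is no reuse
keep if !missing(case_id)
drop u rank slot iscontrol
```

After this, each `case_id` group holds one case row and up to four controls. A case receives four controls whenever its stratum has at least four times as many controls as cases. When a stratum is short of controls, the later-ranked cases get fewer or none. By contrast, the greedy "first case per control, then first four per case" approach in the linked thread can leave some cases short even when the stratum has enough controls.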