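Once the within-year matched data files m`i'.dta have been generated for each year, the next step is to match observations between adjacent years. The goal is a panel dataset with a firm identifier and a year. In the Chinese Industrial Enterprises Database, however, both the legal person code and the business registration number contain duplicate values, so neither can serve as a unique firm identifier. Firm restructuring, renaming, and relocation across regions make the problem more complicated still. In the absence of intelligent fuzzy-matching techniques, Brandt et al. use a sequential matching approach: first identify the same firm by its legal person code, then by firm name, and finally by information such as the legal representative, the administrative region code, and the township. The program for matching adjacent years is as follows: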
forval i =1998/2007{
use m`i'.dta,clear
*Convert all letters in the ID to uppercase:
replace id`i' = strupper(id`i')
compress
saveold m`i'.10.dta,replace
}
forval i =1998/2007{
use m`i'.10.dta,clear
des,short
}
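
Every step in the main loop below follows the same pattern: set aside observations whose key is duplicated within a year, then run a one-to-one merge on the remaining unique keys. The following is a minimal sketch of that pattern on toy data. The firm IDs and names are invented for illustration and do not come from the database.

```stata
* Toy year-j data (hypothetical firms, not the actual database)
clear
input str4 id1999 str20 name1999
"A001" "Alpha Steel"
"A003" "Gamma Foods Ltd"
"A004" "Delta Chem"
end
* Keep only IDs that occur exactly once in this year
bysort id1999: drop if _N > 1
rename id1999 id
tempfile next
save `next'

* Toy year-i data; ID A002 is duplicated and is deferred to later steps
clear
input str4 id1998 str20 name1998
"A001" "Alpha Steel"
"A002" "Beta Textile"
"A002" "Beta Textile No.2"
"A003" "Gamma Foods"
end
bysort id1998: drop if _N > 1
rename id1998 id

* One-to-one merge on the unique key; _merge==3 marks firms found in both years
merge 1:1 id using `next'
tab _merge
keep if _merge == 3
list id name1998 name1999
```

In this sketch A001 and A003 are linked by ID even though the name of A003 changes between years, while the duplicated A002 and the unmatched A004 would be passed on to the name-based step.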
forval i = 1998/2006{
*Let i be the current year and j the following year:
local j = `i'+1
**Step 10: first match on the legal person code (firm_id/id), setting aside observations with duplicated IDs:
disp "Step 10 "
use m`i'.10.dta,clear
*Keep observations with duplicated IDs:
bysort id`i': keep if _N>1
compress
*Save the duplicated observations as duplicates_ID`i'.dta:
saveold duplicates_ID`i'.dta,replace
use m`i'.10.dta,clear
bysort id`i': drop if _N>1
rename id`i' id
sort id
keep *`i' id
compress
*Save the observations with unique IDs (the candidates for ID matching) as match`i'.1.dta:
saveold match`i'.1.dta,replace
*Process the following year's data in the same way:
use m`j'.10.dta,clear
bysort id`j': keep if _N>1
compress
*Save observations with duplicated IDs:
saveold duplicates_ID`j'.dta,replace
use m`j'.10.dta,clear
bysort id`j': drop if _N>1
rename id`j' id
keep *`j' id
sort id
compress
*Save the observations with unique IDs (the candidates for ID matching):
saveold match`j'.1.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==3
gen id`i' = id
rename id id`j'
drop _merge
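*To simplify later identification, generate two variables for the matched observations, the match method and the match status (1 = year-i observation unmatched; 2 = year-j observation unmatched; 3 = matched):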
gen match_method_`i'_`j'="ID"
gen match_status_`i'_`j'="3"
compress
*Save the observations matched by ID across the two adjacent years as matched_by_ID`i'_`j'.dta:
saveold matched_by_ID`i'_`j'.dta,replace
**Step 20: match the observations not matched by ID on firm name (firm_name/name):
disp "Step 20 "
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
*Keep the year-i observations that were not matched:
keep if _m==1
rename id id`i'
*Append the observations with duplicated IDs:
append using duplicates_ID`i'.dta
bysort name`i': keep if _N>1
keep *`i'
compress
*Save as duplicates_name`i'.dta:
saveold duplicates_name`i'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==1
rename id id`i'
append using duplicates_ID`i'.dta
bysort name`i': drop if _N>1
rename name`i' name
sort name
keep *`i' name
compress
saveold unmatched_by_ID`i'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==2
rename id id`j'
append using duplicates_ID`j'.dta
bysort name`j': keep if _N>1
keep *`j'
compress
saveold duplicates_name`j'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==2
rename id id`j'
append using duplicates_ID`j'.dta
bysort name`j': drop if _N>1
rename name`j' name
sort name
keep *`j' name
compress
saveold unmatched_by_ID`j'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m==3
gen name`i' = name
rename name name`j'
drop _m
gen match_method_`i'_`j'="firm name"
gen match_status_`i'_`j'="3"
compress
saveold matched_by_name`i'_`j'.dta,replace
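**Step 30: match the observations not matched by firm name on legal representative (legal_person) + region code (region_code/dq); other matching variables, such as postal code or fax number, can be substituted: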
disp "Step 30 "
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 1
rename name name`i'
append using duplicates_name`i'.dta
replace legal_person`i' = "." if legal_person`i' == ""
gen code1 = legal_person`i' + substr(dq`i',1,4)
bysort code1: keep if _N>1
keep *`i'
compress
saveold duplicates_code1_`i'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 1
rename name name`i'
append using duplicates_name`i'.dta
replace legal_person`i' = "." if legal_person`i' == ""
gen code1 = legal_person`i' + substr(dq`i',1,4)
bysort code1: drop if _N>1
sort code1
keep code1 *`i'
compress
saveold unmatched_by_ID_and_name`i'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 2
rename name name`j'
append using duplicates_name`j'.dta
* replace legal_person`j' = "." if legal_person`j' == ""
gen code1 = legal_person`j' + substr(dq`j',1,4)
bysort code1: keep if _N>1
keep *`j'
compress
saveold duplicates_code1_`j'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 2
rename name name`j'
append using duplicates_name`j'.dta
* replace legal_person`j' = "." if legal_person`j' == ""
gen code1 = legal_person`j' + substr(dq`j',1,4)
bysort code1: drop if _N>1
sort code1
keep code1 *`j'
compress
saveold unmatched_by_ID_and_name`j'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
disp _N
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==3
drop _m code1
gen match_method_`i'_`j' = "legal_person"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_legalperson`i'_`j'.dta,replace
**Step 40: match the observations left unmatched by the previous steps on phone number (phone) + region code (dq) + industry code (cic):
disp "Step 40 "
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==1
drop code1
append using duplicates_code1_`i'.dta
replace phone`i' = "." if phone`i' == ""
gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
bysort code2 : keep if _N>1
keep *`i'
compress
saveold duplicates_code2_`i'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==1
drop code1
append using duplicates_code1_`i'.dta
replace phone`i' = "." if phone`i' == ""
gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
bysort code2 : drop if _N>1
keep code2 *`i'
sort code2
compress
saveold unmatched_by_ID_and_name_and_legalperson`i'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==2
drop code1
append using duplicates_code1_`j'.dta
* replace phone`j' = "." if phone`j' == ""
gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
bysort code2 : keep if _N>1
keep *`j'
compress
saveold duplicates_code2_`j'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==2
drop code1
append using duplicates_code1_`j'.dta
* replace phone`j' = "." if phone`j' == ""
gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
bysort code2 : drop if _N>1
sort code2
keep code2 *`j'
compress
saveold unmatched_by_ID_and_name_and_legalperson`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==3
drop _m code2
gen match_method_`i'_`j' = "phone number"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_phone`i'_`j'.dta,replace
**Step 50: match the observations still unmatched on founding year (bdat) + region code (dq) + industry code (cic) + township (town) + product 1 (product1):
disp "Step 50 "
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==1
drop code2
append using duplicates_code2_`i'.dta
replace town`i' = "." if town`i' == ""
replace product1_`i' = "." if product1_`i' == ""
gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
bysort code3: keep if _N>1
keep *`i'
compress
saveold duplicates_code3_`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==1
drop code2
append using duplicates_code2_`i'.dta
replace town`i' = "." if town`i' == ""
replace product1_`i' = "." if product1_`i' == ""
gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
bysort code3: drop if _N>1
sort code3
keep code3 *`i'
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==2
drop code2
append using duplicates_code2_`j'.dta
* replace town`j' = "." if town`j' == ""
replace product1_`j' = "." if product1_`j' == ""
gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
bysort code3: keep if _N>1
keep *`j'
compress
saveold duplicates_code3_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==2
drop code2
append using duplicates_code2_`j'.dta
* replace town`j' = "." if town`j' == ""
replace product1_`j' = "." if product1_`j' == ""
gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
bysort code3: drop if _N>1
sort code3
keep code3 *`j'
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
disp _N
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m==3
drop _m code3
gen match_method_`i'_`j' = "code 3"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_code3_`i'_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m == 1
drop _m code3
append using duplicates_code3_`i'.dta
gen match_method_`i'_`j' = ""
gen match_status_`i'_`j' = "1"
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m == 2
drop _m code3
append using duplicates_code3_`j'.dta
gen match_method_`i'_`j' = ""
gen match_status_`i'_`j' = "2"
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta,replace
**Step 60: append the matched observations and those that remain unmatched into a single file m`i'-m`j'.dta for the next stage of matching:
disp "Step 60 "
use matched_by_ID`i'_`j'.dta,clear
append using matched_by_name`i'_`j'.dta
append using matched_by_legalperson`i'_`j'.dta
append using matched_by_phone`i'_`j'.dta
append using matched_by_code3_`i'_`j'.dta
append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta
append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta
compress
saveold m`i'-m`j'.dta,replace
}
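
The composite keys built in Steps 30 and 40 (code1 and code2) are plain string concatenations. The sketch below rebuilds them on toy data to show what the resulting keys look like. The values are invented, the year suffixes are dropped from the variable names for brevity, and empty strings are replaced with "." as in the year-i branch of the loop.

```stata
* Toy data (hypothetical values); variable names mirror the loop without year suffixes
clear
input str12 legal_person str6 dq str4 cic str12 phone
"Zhang Wei" "110105" "1711" "64001234"
""          "110105" "1711" "64005678"
"Li Na"     "310115" "3511" ""
end

* Replace empty strings with "." before concatenating
replace legal_person = "." if legal_person == ""
replace phone = "." if phone == ""

* code1: legal representative + first 4 digits of the region code
gen code1 = legal_person + substr(dq, 1, 4)
* code2: 4-digit region prefix + 3-digit industry prefix + phone number
gen code2 = substr(dq, 1, 4) + substr(cic, 1, 3) + phone

list code1 code2
```

Truncating dq to four digits and cic to three digits makes the keys tolerant of changes in the finer digits of those codes, at the cost of more within-year duplicates, which the loop sets aside before each merge.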
forval i = 1998/2006{
local j = `i'+1
use m`i'-m`j'.dta,clear
*The most important outputs of adjacent-year matching are the two generated variables, match method (match_method_`i'_`j') and match status (match_status_`i'_`j'):
tab match_method_`i'_`j'
tab match_status_`i'_`j'
}
These two variables, match method and match status, are the key inputs to the next step: matching across three years.
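
As a complement to the tabulations above, the two variables can also be used to compute a simple linkage rate for a year pair. The following sketch does so on a toy dataset with invented rows. It assumes the string coding used in the loop ("1", "2", "3") and uses 1998/1999 as an example pair.

```stata
* Toy stand-in for m1998-m1999.dta (hypothetical rows)
clear
input str12 match_method_1998_1999 str1 match_status_1998_1999
"ID"        "3"
"ID"        "3"
"firm name" "3"
""          "1"
""          "2"
end

* Firms present in 1998 have status "1" (unmatched) or "3" (matched)
count if inlist(match_status_1998_1999, "1", "3")
local n_i = r(N)

* Firms linked to a 1999 observation
count if match_status_1998_1999 == "3"
local n_matched = r(N)

display "Share of 1998 firms linked to 1999: " %5.3f `n_matched'/`n_i'
```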