Beginner vs. the China Industrial Enterprise Database (4): Matching Between Adjacent Years

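Once the within-year matched data files m`i'.dta have been generated for each year, the records of adjacent years can be matched. The goal is a panel data set indexed by a firm identifier and year. In the China Industrial Enterprise Database, however, both the legal-entity code and the business registration number contain duplicate values, so neither can serve as a unique firm identifier. Restructuring, renaming, and relocation across regions complicate the problem further. Without intelligent fuzzy-matching technology, Brandt therefore uses a sequential matching method: firms are first linked by identical legal-entity code, then by firm name, and finally by other information such as the legal representative, the administrative region code, and the township. The Stata program that matches adjacent years follows.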
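
To illustrate the sequential logic described above in isolation, here is a minimal Python sketch. It is not the author's program: the function names (`split_unique`, `sequential_match`, `legal_person_key`), the sample records, and the choice of only three stages are assumptions made for illustration. The sketch also assumes that a record with a missing legal representative simply gets no key at that stage, which is not how the Stata program treats empty values.

```python
from collections import Counter


def split_unique(records, key_fn):
    """Split one year's records into uniquely keyed ones and the rest."""
    keys = [key_fn(r) for r in records]
    counts = Counter(k for k in keys if k is not None)
    unique = {k: r for k, r in zip(keys, records)
              if k is not None and counts[k] == 1}
    # Records with a duplicated or missing key are carried to the next stage.
    carried = [r for k, r in zip(keys, records)
               if k is None or counts[k] > 1]
    return unique, carried


def sequential_match(year_a, year_b, stages):
    """Try each (method, key_fn) stage in order on what is still unmatched."""
    matches = []
    pool_a, pool_b = list(year_a), list(year_b)
    for method, key_fn in stages:
        uniq_a, carried_a = split_unique(pool_a, key_fn)
        uniq_b, carried_b = split_unique(pool_b, key_fn)
        shared = uniq_a.keys() & uniq_b.keys()
        matches += [(uniq_a[k], uniq_b[k], method) for k in sorted(shared)]
        # Everything not matched here goes on to the next stage.
        pool_a = [r for k, r in uniq_a.items() if k not in shared] + carried_a
        pool_b = [r for k, r in uniq_b.items() if k not in shared] + carried_b
    return matches, pool_a, pool_b


def legal_person_key(r):
    # Assumption of this sketch: a missing legal_person yields no key at all.
    if not r["legal_person"]:
        return None
    return r["legal_person"] + r["dq"][:4]


STAGES = [
    ("ID", lambda r: r["id"].upper()),
    ("firm name", lambda r: r["name"]),
    ("legal_person", legal_person_key),
]

# Hypothetical records, invented purely to exercise the three stages.
year_1998 = [
    {"id": "a001", "name": "Alpha Textile", "legal_person": "Li", "dq": "110101"},
    {"id": "B002", "name": "Beta Steel", "legal_person": "Wang", "dq": "310104"},
    {"id": "C003", "name": "Gamma Foods", "legal_person": "Zhao", "dq": "440305"},
    {"id": "D004", "name": "Delta Paper", "legal_person": "Sun", "dq": "510107"},
]
year_1999 = [
    {"id": "A001", "name": "Alpha Textile Co", "legal_person": "Li", "dq": "110101"},
    {"id": "X902", "name": "Beta Steel", "legal_person": "Wang", "dq": "310104"},
    {"id": "X903", "name": "Gamma Foods Group", "legal_person": "Zhao", "dq": "440399"},
    {"id": "E005", "name": "Epsilon Glass", "legal_person": "Qian", "dq": "320102"},
]

matches, left_1998, left_1999 = sequential_match(year_1998, year_1999, STAGES)
for rec_a, rec_b, method in matches:
    print(rec_a["id"], "->", rec_b["id"], "via", method)
print("unmatched 1998:", [r["id"] for r in left_1998])
print("unmatched 1999:", [r["id"] for r in left_1999])
```

The Stata program that comes next applies the same idea with files on disk: at each stage it saves the records with duplicated keys separately, merges the uniquely keyed records 1:1, and passes the leftovers plus the duplicates to the next stage.
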
forval i =1998/2007{
use m`i'.dta,clear
* Convert all letters in the ID to uppercase:
replace id`i' = strupper(id`i')
compress
saveold m`i'.10.dta,replace
}
forval i =1998/2007{
use m`i'.10.dta,clear
des,short
}
forval i = 1998/2006{
* Let i be the current year and j the following year:
local j = `i'+1
** Step 10: first match on the legal-entity code (firm_id/id), setting aside the records whose id is duplicated:
disp "Step 10 "
use m`i'.10.dta,clear
* Keep the records whose ID is duplicated:
bysort id`i': keep if _N>1
compress
* Save the duplicated records as duplicates_ID`i'.dta:
saveold duplicates_ID`i'.dta,replace
use m`i'.10.dta,clear
bysort id`i': drop if _N>1
rename id`i' id
sort id
keep *`i' id
compress
* Save the records with a unique ID, the candidates for ID matching, as match`i'.1.dta:
saveold match`i'.1.dta,replace
* Process the following year's data in the same way:
use m`j'.10.dta,clear
bysort id`j': keep if _N>1
compress
* Save the records with duplicated IDs:
saveold duplicates_ID`j'.dta,replace
use m`j'.10.dta,clear
bysort id`j': drop if _N>1
rename id`j' id
keep *`j' id
sort id
compress
* Save the records with a unique ID, the candidates for ID matching:
saveold match`j'.1.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==3
gen id`i' = id
rename id id`j'
drop _merge
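* To make later identification easier, generate two variables for the matched records, the matching method and the matching status (1 = year-i record left unmatched; 2 = year-j record left unmatched; 3 = matched):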
gen match_method_`i'_`j'="ID"
gen match_status_`i'_`j'="3"
compress
* Save the records matched by ID across the two adjacent years as matched_by_ID`i'_`j'.dta:
saveold matched_by_ID`i'_`j'.dta,replace
** Step 20: match the records that could not be matched by ID on firm name (firm_name/name):
disp "Step 20 "
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
* Keep the year-i records that were not matched:
keep if _m==1
rename id id`i'
* Append the records with duplicated IDs:
append using duplicates_ID`i'.dta
bysort name`i': keep if _N>1
keep *`i'
compress
* Save as duplicates_name`i'.dta:
saveold duplicates_name`i'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==1
rename id id`i'
append using duplicates_ID`i'.dta
bysort name`i': drop if _N>1
rename name`i' name
sort name
keep *`i' name
compress
saveold unmatched_by_ID`i'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==2
rename id id`j'
append using duplicates_ID`j'.dta
bysort name`j': keep if _N>1
keep *`j'
compress
saveold duplicates_name`j'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==2
rename id id`j'
append using duplicates_ID`j'.dta
bysort name`j': drop if _N>1
rename name`j' name
sort name
keep *`j' name
compress
saveold unmatched_by_ID`j'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m==3
gen name`i' = name
rename name name`j'
drop _m
gen match_method_`i'_`j'="firm name"
gen match_status_`i'_`j'="3"
compress
saveold matched_by_name`i'_`j'.dta,replace
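** Step 30: records not matched by firm name are matched on legal representative (legal_person) + region code (region_code/dq); other matching variables, such as postal code or fax number, can be substituted: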
disp "Step 30 "
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 1
rename name name`i'
append using duplicates_name`i'.dta
replace legal_person`i' = "." if legal_person`i' == ""
gen code1 = legal_person`i' + substr(dq`i',1,4)
bysort code1: keep if _N>1
keep *`i'
compress
saveold duplicates_code1_`i'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 1
rename name name`i'
append using duplicates_name`i'.dta
replace legal_person`i' = "." if legal_person`i' == ""
gen code1 = legal_person`i' + substr(dq`i',1,4)
bysort code1: drop if _N>1
sort code1
keep code1 *`i'
compress
saveold unmatched_by_ID_and_name`i'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 2
rename name name`j'
append using duplicates_name`j'.dta
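* Note: the next line is commented out in the source, unlike the year-i block above, so a year-j record with an empty legal_person keeps "" rather than "." and its code1 will generally not equal that of any year-i record.
* replace legal_person`j' = "." if legal_person`j' == ""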
gen code1 = legal_person`j' + substr(dq`j',1,4)
bysort code1: keep if _N>1
keep *`j'
compress
saveold duplicates_code1_`j'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 2
rename name name`j'
append using duplicates_name`j'.dta
* replace legal_person`j' = "." if legal_person`j' == ""
gen code1 = legal_person`j' + substr(dq`j',1,4)
bysort code1: drop if _N>1
sort code1
keep code1 *`j'
compress
saveold unmatched_by_ID_and_name`j'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
disp _N
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==3
drop _m code1
gen match_method_`i'_`j' = "legal_person"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_legalperson`i'_`j'.dta,replace
** Step 40: records left unmatched by the steps above are matched on phone number (phone) + region code (dq) + industry code (cic):
disp "Step 40 "
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==1
drop code1
append using duplicates_code1_`i'.dta
replace phone`i' = "." if phone`i' == ""
gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
bysort code2 : keep if _N>1
keep *`i'
compress
saveold duplicates_code2_`i'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==1
drop code1
append using duplicates_code1_`i'.dta
replace phone`i' = "." if phone`i' == ""
gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
bysort code2 : drop if _N>1
keep code2 *`i'
sort code2
compress
saveold unmatched_by_ID_and_name_and_legalperson`i'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==2
drop code1
append using duplicates_code1_`j'.dta
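* Note: the next line is commented out in the source, unlike the year-i block above, so a year-j record with an empty phone keeps "" rather than "." and its code2 will generally not equal that of a year-i record with an empty phone.
* replace phone`j' = "." if phone`j' == ""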
gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
bysort code2 : keep if _N>1
keep *`j'
compress
saveold duplicates_code2_`j'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==2
drop code1
append using duplicates_code1_`j'.dta
* replace phone`j' = "." if phone`j' == ""
gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
bysort code2 : drop if _N>1
sort code2
keep code2 *`j'
compress
saveold unmatched_by_ID_and_name_and_legalperson`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==3
drop _m code2
gen match_method_`i'_`j' = "phone number"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_phone`i'_`j'.dta,replace
** Step 50: records still unmatched are matched on founding year (bdat) + region code (dq) + industry code (cic) + township (town) + main product (product1):
disp "Step 50 "
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==1
drop code2
append using duplicates_code2_`i'.dta
replace town`i' = "." if town`i' == ""
replace product1_`i' = "." if product1_`i' == ""
gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
bysort code3: keep if _N>1
keep *`i'
compress
saveold duplicates_code3_`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==1
drop code2
append using duplicates_code2_`i'.dta
replace town`i' = "." if town`i' == ""
replace product1_`i' = "." if product1_`i' == ""
gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
bysort code3: drop if _N>1
sort code3
keep code3 *`i'
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==2
drop code2
append using duplicates_code2_`j'.dta
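* Note: the next line is commented out in the source, while the product1 replacement below it is active, so empty town values are treated differently in year i and year j.
* replace town`j' = "." if town`j' == ""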
replace product1_`j' = "." if product1_`j' == ""
gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
bysort code3: keep if _N>1
keep *`j'
compress
saveold duplicates_code3_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==2
drop code2
append using duplicates_code2_`j'.dta
* replace town`j' = "." if town`j' == ""
replace product1_`j' = "." if product1_`j' == ""
gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
bysort code3: drop if _N>1
sort code3
keep code3 *`j'
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
disp _N
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m==3
drop _m code3
gen match_method_`i'_`j' = "code 3"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_code3_`i'_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m == 1
drop _m code3
append using duplicates_code3_`i'.dta
gen match_method_`i'_`j' = ""
gen match_status_`i'_`j' = "1"
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m == 2
drop _m code3
append using duplicates_code3_`j'.dta
gen match_method_`i'_`j' = ""
gen match_status_`i'_`j' = "2"
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta,replace
** Step 60: recombine the matched records and the records finally left unmatched into a single file m`i'-m`j'.dta for the next stage of matching:
disp "Step 60 "
use matched_by_ID`i'_`j'.dta,clear
append using matched_by_name`i'_`j'.dta
append using matched_by_legalperson`i'_`j'.dta
append using matched_by_phone`i'_`j'.dta
append using matched_by_code3_`i'_`j'.dta
append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta
append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta
compress
saveold m`i'-m`j'.dta,replace
}
forval i = 1998/2006{
local j = `i'+1
use m`i'-m`j'.dta,clear
* The key outputs of the adjacent-year matching are the two generated variables, matching method (match_method_`i'_`j') and matching status (match_status_`i'_`j'):
tab match_method_`i'_`j'
tab match_status_`i'_`j'
}
The matching-method and matching-status variables are the key to the next step, matching across three years.
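
For readers who want to check these two variables outside Stata, the tabulation at the end of the program can be mimicked with a short Python sketch. The rows below are hypothetical stand-ins for a combined file such as m1998-m1999.dta, the counts are made up, and the helper name `tabulate` is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical rows standing in for a combined file such as m1998-m1999.dta;
# only the two outcome variables are shown, and the counts are made up.
rows = (
    [{"match_method": "ID", "match_status": "3"}] * 5
    + [{"match_method": "firm name", "match_status": "3"}] * 2
    + [{"match_method": "legal_person", "match_status": "3"}]
    + [{"match_method": "", "match_status": "1"}]
    + [{"match_method": "", "match_status": "2"}]
)


def tabulate(rows, column):
    """Return {value: (count, percent)} for one column, sorted by value."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {v: (n, round(100 * n / total, 1)) for v, n in sorted(counts.items())}


method_table = tabulate(rows, "match_method")
status_table = tabulate(rows, "match_status")
print(method_table)
print(status_table)
```

One difference worth keeping in mind: Stata's tab leaves out empty-string values unless the missing option is specified, so the unmatched rows do not appear in the match_method table of the program above, whereas this sketch counts them under the empty string.
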

Forum thread for this article: https://bbs.pinggu.org/thread-5913071-1-1.html
