A Beginner vs. the Chinese Industrial Enterprise Database (4): Matching Across Two Adjacent Years - Jingguan Zhijia (经管之家)


Posted by: liuyangclick | Category: Accounting Library

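Once the within-year matching is finished and the yearly m`i'.dta files have been generated, the records can be matched across two adjacent years. The goal is a panel data set with a firm identifier and a year. In the Chinese Industrial Enterprise Database, however, both the legal-person code and the business registration number contain duplicate values, so neither can serve as a unique firm identifier. Restructuring, renaming, and relocation across regions make the problem harder still. Without intelligent fuzzy-matching techniques, Brandt uses a sequential matching approach: first identify the same firm by an identical legal-person code, then by firm name, and finally by information such as the legal representative, the administrative region code, and the township. The program for matching two adjacent years is: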
forval i =1998/2007{
use m`i'.dta,clear
* Convert all letters in the ID to upper case:
replace id`i' = strupper(id`i')
compress
saveold m`i'.10.dta,replace
}
forval i =1998/2007{
use m`i'.10.dta,clear
des,short
}
forval i = 1998/2006{
* Let i be the current year and j the following year:
local j = `i'+1
** Step 10: first match on the legal-person code (firm_id/id), setting aside the records whose id is duplicated:
disp "Step 10 "
use m`i'.10.dta,clear
* Keep the records whose ID is duplicated:
bysort id`i': keep if _N>1
compress
* Save the duplicated records as duplicates_ID`i'.dta:
saveold duplicates_ID`i'.dta,replace
use m`i'.10.dta,clear
bysort id`i': drop if _N>1
rename id`i' id
sort id
keep *`i' id
compress
* Save the records with a unique ID (the candidates for ID matching) as match`i'.1.dta:
saveold match`i'.1.dta,replace
* Process the following year's data in the same way:
use m`j'.10.dta,clear
bysort id`j': keep if _N>1
compress
* Save the records with duplicated IDs:
saveold duplicates_ID`j'.dta,replace
use m`j'.10.dta,clear
bysort id`j': drop if _N>1
rename id`j' id
keep *`j' id
sort id
compress
* Save the records with a unique ID (the candidates for ID matching):
saveold match`j'.1.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==3
gen id`i' = id
rename id id`j'
drop _merge
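* To make later identification easier, every record gets two variables, the match method and the match status (1 = year-i record not matched; 2 = year-j record not matched; 3 = matched):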
gen match_method_`i'_`j'="ID"
gen match_status_`i'_`j'="3"
compress
* Save the records matched by ID across the two adjacent years as matched_by_ID`i'_`j'.dta:
saveold matched_by_ID`i'_`j'.dta,replace
** Step 20: match the records that could not be matched by ID on firm name (firm_name/name):
disp "Step 20 "
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
* Keep the year-i records that were not matched:
keep if _m==1
rename id id`i'
* Add back the records with duplicated IDs:
append using duplicates_ID`i'.dta
bysort name`i': keep if _N>1
keep *`i'
compress
* Save as duplicates_name`i'.dta:
saveold duplicates_name`i'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==1
rename id id`i'
append using duplicates_ID`i'.dta
bysort name`i': drop if _N>1
rename name`i' name
sort name
keep *`i' name
compress
saveold unmatched_by_ID`i'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==2
rename id id`j'
append using duplicates_ID`j'.dta
bysort name`j': keep if _N>1
keep *`j'
compress
saveold duplicates_name`j'.dta,replace
use match`i'.1.dta,clear
merge 1:1 id using match`j'.1.dta
keep if _m==2
rename id id`j'
append using duplicates_ID`j'.dta
bysort name`j': drop if _N>1
rename name`j' name
sort name
keep *`j' name
compress
saveold unmatched_by_ID`j'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m==3
gen name`i' = name
rename name name`j'
drop _m
gen match_method_`i'_`j'="firm name"
gen match_status_`i'_`j'="3"
compress
saveold matched_by_name`i'_`j'.dta,replace
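** Step 30: records not matched by firm name are matched on legal representative (legal_person) + region code (region_code/dq); other matching variables, such as postal code or fax number, can be substituted: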
disp "Step 30 "
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 1
rename name name`i'
append using duplicates_name`i'.dta
replace legal_person`i' = "." if legal_person`i' == ""
gen code1 = legal_person`i' + substr(dq`i',1,4)
bysort code1: keep if _N>1
keep *`i'
compress
saveold duplicates_code1_`i'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 1
rename name name`i'
append using duplicates_name`i'.dta
replace legal_person`i' = "." if legal_person`i' == ""
gen code1 = legal_person`i' + substr(dq`i',1,4)
bysort code1: drop if _N>1
sort code1
keep code1 *`i'
compress
saveold unmatched_by_ID_and_name`i'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 2
rename name name`j'
append using duplicates_name`j'.dta
* replace legal_person`j' = "." if legal_person`j' == ""
gen code1 = legal_person`j' + substr(dq`j',1,4)
bysort code1: keep if _N>1
keep *`j'
compress
saveold duplicates_code1_`j'.dta,replace
use unmatched_by_ID`i'.dta,clear
merge 1:1 name using unmatched_by_ID`j'.dta
keep if _m == 2
rename name name`j'
append using duplicates_name`j'.dta
* replace legal_person`j' = "." if legal_person`j' == ""
gen code1 = legal_person`j' + substr(dq`j',1,4)
bysort code1: drop if _N>1
sort code1
keep code1 *`j'
compress
saveold unmatched_by_ID_and_name`j'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
disp _N
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==3
drop _m code1
gen match_method_`i'_`j' = "legal_person"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_legalperson`i'_`j'.dta,replace
** Step 40: records still unmatched after the previous steps are matched on phone number (phone) + region code (dq) + industry code (cic):
disp "Step 40 "
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==1
drop code1
append using duplicates_code1_`i'.dta
replace phone`i' = "." if phone`i' == ""
gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
bysort code2 : keep if _N>1
keep *`i'
compress
saveold duplicates_code2_`i'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==1
drop code1
append using duplicates_code1_`i'.dta
replace phone`i' = "." if phone`i' == ""
gen code2 = substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i'
bysort code2 : drop if _N>1
keep code2 *`i'
sort code2
compress
saveold unmatched_by_ID_and_name_and_legalperson`i'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==2
drop code1
append using duplicates_code1_`j'.dta
* replace phone`j' = "." if phone`j' == ""
gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
bysort code2 : keep if _N>1
keep *`j'
compress
saveold duplicates_code2_`j'.dta,replace
use unmatched_by_ID_and_name`i'.dta,clear
merge 1:1 code1 using unmatched_by_ID_and_name`j'.dta
keep if _m==2
drop code1
append using duplicates_code1_`j'.dta
* replace phone`j' = "." if phone`j' == ""
gen code2 = substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j'
bysort code2 : drop if _N>1
sort code2
keep code2 *`j'
compress
saveold unmatched_by_ID_and_name_and_legalperson`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==3
drop _m code2
gen match_method_`i'_`j' = "phone number"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_phone`i'_`j'.dta,replace
** Step 50: records still unmatched are matched on founding year (bdat) + region code (dq) + industry code (cic) + township (town) + product 1 (product1):
disp "Step 50 "
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==1
drop code2
append using duplicates_code2_`i'.dta
replace town`i' = "." if town`i' == ""
replace product1_`i' = "." if product1_`i' == ""
gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
bysort code3: keep if _N>1
keep *`i'
compress
saveold duplicates_code3_`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==1
drop code2
append using duplicates_code2_`i'.dta
replace town`i' = "." if town`i' == ""
replace product1_`i' = "." if product1_`i' == ""
gen code3 = string(bdat`i')+substr(dq`i',1,4)+substr(cic`i',1,3)+town`i'+product1_`i'
bysort code3: drop if _N>1
sort code3
keep code3 *`i'
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==2
drop code2
append using duplicates_code2_`j'.dta
* replace town`j' = "." if town`j' == ""
replace product1_`j' = "." if product1_`j' == ""
gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
bysort code3: keep if _N>1
keep *`j'
compress
saveold duplicates_code3_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson`i'.dta,clear
merge 1:1 code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
keep if _m==2
drop code2
append using duplicates_code2_`j'.dta
* replace town`j' = "." if town`j' == ""
replace product1_`j' = "." if product1_`j' == ""
gen code3 = string(bdat`j')+substr(dq`j',1,4)+substr(cic`j',1,3)+town`j'+product1_`j'
bysort code3: drop if _N>1
sort code3
keep code3 *`j'
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
disp _N
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m==3
drop _m code3
gen match_method_`i'_`j' = "code 3"
gen match_status_`i'_`j' = "3"
compress
saveold matched_by_code3_`i'_`j'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m == 1
drop _m code3
append using duplicates_code3_`i'.dta
gen match_method_`i'_`j' = ""
gen match_status_`i'_`j' = "1"
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta,replace
use unmatched_by_ID_and_name_and_legalperson_and_phone_`i'.dta,clear
merge 1:1 code3 using unmatched_by_ID_and_name_and_legalperson_and_phone_`j'.dta
keep if _m == 2
drop _m code3
append using duplicates_code3_`j'.dta
gen match_method_`i'_`j' = ""
gen match_status_`i'_`j' = "2"
compress
saveold unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta,replace
** Step 60: append the matched records and the records left unmatched at the end into one m`i'-m`j'.dta file for the next stage of matching:
disp "Step 60 "
use matched_by_ID`i'_`j'.dta,clear
append using matched_by_name`i'_`j'.dta
append using matched_by_legalperson`i'_`j'.dta
append using matched_by_phone`i'_`j'.dta
append using matched_by_code3_`i'_`j'.dta
append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`i'.dta
append using unmatched_by_ID_and_name_and_legalperson_and_phone_and_code2`j'.dta
compress
saveold m`i'-m`j'.dta,replace
}
forval i = 1998/2006{
local j = `i'+1
use m`i'-m`j'.dta,clear
* The most important outputs of the adjacent-year matching are two generated variables, the match method (match_method_`i'_`j') and the match status (match_status_`i'_`j'):
tab match_method_`i'_`j'
tab match_status_`i'_`j'
}
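Every step in the loop above repeats one pattern: set aside the records whose key is duplicated within a year, merge the remaining records 1:1 on that key, and pass whatever is left to the next key. The following is a minimal sketch of a single stage on made-up data, not the author's code. The firm records, the variable names name_y1 and name_y2, and the tempfile names are all hypothetical, and it uses a flag with preserve/restore where the original reloads the file twice.

```stata
* Toy data only: firm records and variable names are made up for illustration.
clear all

* Year-2 file: every id is unique here.
input str4 id str12 name_y2
"A1" "Alpha Mill"
"A3" "Delta Works"
"A4" "Epsilon Co"
end
tempfile y2_unique
save `y2_unique'

* Year-1 file: id "A2" appears twice, so it cannot identify a single firm.
clear
input str4 id str12 name_y1
"A1" "Alpha Mill"
"A2" "Beta Steel"
"A2" "Gamma Steel"
"A3" "Delta Works"
end

* Flag ids that occur more than once within the year.
bysort id: gen byte dup = _N > 1

* Duplicated ids are set aside; they get another chance at the next key.
preserve
keep if dup
drop dup
tempfile y1_dups
save `y1_dups'
restore

* Only unique ids are safe to use in a 1:1 merge.
drop if dup
drop dup
merge 1:1 id using `y2_unique'

* _merge == 3: linked by id; 1: year 1 only; 2: year 2 only.
tab _merge
```

In this toy case A1 and A3 are linked by id and A4 appears in year 2 only. The two "A2" records saved in the y1_dups tempfile would rejoin the year-1 leftovers before the next key (firm name) is tried, which mirrors what the append using duplicates_ID`i'.dta lines do in Step 20.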
The match method and match status variables are the key to the next step: matching across three years.
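To show what these two variables look like in practice, here is a small sketch on four made-up rows shaped like the output of Step 60 for 1998 and 1999. The ids are invented, and only the id, method, and status columns are reproduced; the real file carries every year-suffixed variable. The status codes follow the comment in Step 10 (1 = year-i record not matched, 2 = year-j record not matched, 3 = matched).

```stata
* Toy data only: four made-up rows shaped like m1998-m1999.dta.
clear
input str4 id1998 str4 id1999 str12 match_method_1998_1999 str1 match_status_1998_1999
"A1" "A1" "ID"        "3"
"B7" "B9" "firm name" "3"
"C3" ""   ""          "1"
""   "D5" ""          "2"
end

* The status is stored as a string, so compare against "1", "2", "3".
* Rows with status 1 or 3 are the records that exist in 1998.
count if inlist(match_status_1998_1999, "1", "3")
local n_1998 = r(N)

* Rows with status 3 are the 1998 records linked to a 1999 record.
count if match_status_1998_1999 == "3"
local n_linked = r(N)
display "1998 records linked to 1999: `n_linked' of `n_1998'"

* Which key produced each link.
tab match_method_1998_1999 if match_status_1998_1999 == "3"
```

Because the program stores match_status as a string rather than a number, conditions such as match_status_1998_1999 == 3 would raise a type mismatch error; the quoted form used here is the one that works with the files as saved.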
Forum URL for this article: https://bbs.pinggu.org/thread-5913071-1-1.html
