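The code that generates the balanced panel data is given below; the do-file is named balance match.do.
Firms are linked across years separately on four keys: legal-entity code, firm name, phone number and legal representative. The four sets of matches are then pooled. After matching, we obtain:
- balanced panels over two consecutive years (1998-1999, 1999-2000, ..., 2005-2006);
- balanced panels over three consecutive years (1998-2000, 1999-2001, ..., 2004-2006);
- and so on, up to a single nine-year balanced panel (1998-2006).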
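The window arithmetic used by the loops in the do-file can be checked with a small sketch. The Python below is not part of balance match.do. It only mimics the do-file's logic on hypothetical toy data:
- `windows` lists every run of consecutive years. The last start year is last - length + 1, which is `2007-j` in the do-file.
- `balanced` chains inner joins year by year, the way each `merge 1:1 ... keep if _merge==3` step does.

The sketch assumes each year has already been reduced to one record per key. In Stata, `merge 1:1` stops with an error when the key is not unique.

```python
# Minimal sketch (not the author's code): mimics the chained
# `merge 1:1 ... keep if _merge==3` steps of balance match.do on toy data.
# Assumption: each year is a dict mapping a match key (e.g. match_id) to
# one record, i.e. keys are already unique within a year.
FIRST, LAST = 1998, 2006


def windows(first: int, last: int, length: int) -> list[tuple[int, int]]:
    """All (start, end) runs of `length` consecutive years in [first, last].

    The last start year is last - length + 1, i.e. 2007 - j in the do-file.
    """
    return [(i, i + length - 1) for i in range(first, last - length + 2)]


def balanced(years: dict[int, dict[str, dict]], start: int, end: int) -> dict[str, dict[int, dict]]:
    """Keep only keys present in every year start..end (an inner join per year)."""
    panel = {key: {start: rec} for key, rec in years[start].items()}
    for t in range(start + 1, end + 1):
        # Like keep if _merge==3: drop keys missing from year t, attach year t's record.
        panel = {key: {**recs, t: years[t][key]}
                 for key, recs in panel.items() if key in years[t]}
    return panel


if __name__ == "__main__":
    # Hypothetical toy data: firms "A" and "B" in every year, except B in 2001.
    toy = {y: {"A": {"firm": "A"}, "B": {"firm": "B"}} for y in range(FIRST, LAST + 1)}
    del toy[2001]["B"]
    for j in range(2, LAST - FIRST + 2):
        for start, end in windows(FIRST, LAST, j):
            print(f"{j}-year panel {start}-{end}: {sorted(balanced(toy, start, end))}")
```

With 1998-2006 there are 10 - j windows of length j. That gives 8 + 7 + ... + 1 = 36 balanced panels per matching key, one for each `matched<j>year.<start>-<end>` file the do-file saves.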
*Generate balanced panels of 2, 3, ..., 9 consecutive years
cls
clear
set more off
cd "/Users/youwang/Desktop/FIRM"
qui do "my preprocess"
**------------------------------------------------------------------------------
* PART1: Match on the legal-entity code (ID)
**------------------------------------------------------------------------------
*Match the yearly datasets using the legal-entity code as the key; this part is independent of the matching in the other parts
forval i = 1998/2006{
use "m`i'.dta",clear
gen match_id=id`i'
gen match_name=name`i'
gen match_phone=substr(dq`i',1,4)+substr(phone`i',-7,7)
gen match_rep=substr(dq`i',1,4)+substr(nic`i',1,3)+corp_representive`i'
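bysort match_id : drop if _N>1 //drop observations whose match_id is duplicated (merge 1:1 requires a unique key)
save "m`i'.ID.dta",replace //save the sample used for matching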
}
forval i = 1998/2006{
use "m`i'.ID.dta",clear
des,short //brief overview of the dataset about to be matched
}
*Match over two consecutive years
forval i = 1998/2005{
local j=`i'+1
use "m`i'.ID.dta",clear
merge 1:1 match_id using "m`j'.ID.dta"
keep if _merge==3
gen status_ID`i'_`j' = _merge
drop _merge
save "matched2year.`i'-`j'.ID.dta",replace //save the firms matched in both years
}
*Match over j consecutive years, j=3,4,...,9
forval j = 3/9{
local y=2007-`j'
forval i = 1998/`y'{
local j0=`j'-1
local k=`i'+`j0'-1
local t=`k'+1
use "matched`j0'year.`i'-`k'.ID.dta",clear
merge 1:1 match_id using "m`t'.ID.dta"
keep if _merge==3 //keep the firms matched in all j consecutive years (j=3,4,...,9)
gen status_ID`i'_`t' = _merge
drop _merge
save "matched`j'year.`i'-`t'.ID.dta",replace
}
}
**------------------------------------------------------------------------------
* PART2: Match on the firm name (NAME)
**------------------------------------------------------------------------------
*Match the yearly datasets using the firm name as the key; this part is independent of the matching in the other parts
forval i = 1998/2006{
use "m`i'.dta",clear
gen match_id=id`i'
gen match_name=name`i'
gen match_phone=substr(dq`i',1,4)+substr(phone`i',-7,7)
gen match_rep=substr(dq`i',1,4)+substr(nic`i',1,3)+corp_representive`i'
bysort match_name : drop if _N>1 //drop observations whose match_name is duplicated
save "m`i'.NAME.dta",replace //save the sample used for matching
}
forval i = 1998/2006{
use "m`i'.NAME.dta",clear
des,short //brief overview of the dataset about to be matched
}
*Match over two consecutive years
forval i = 1998/2005{
local j=`i'+1
use "m`i'.NAME.dta",clear
merge 1:1 match_name using "m`j'.NAME.dta"
keep if _merge==3
gen status_NAME`i'_`j' = _merge
drop _merge
save "matched2year.`i'-`j'.NAME.dta",replace //save the firms matched in both years
}
*Match over j consecutive years, j=3,4,...,9
forval j = 3/9{
local y=2007-`j'
forval i = 1998/`y'{
local j0=`j'-1
local k=`i'+`j0'-1
local t=`k'+1
use "matched`j0'year.`i'-`k'.NAME.dta",clear
merge 1:1 match_name using "m`t'.NAME.dta"
keep if _merge==3 //keep the firms matched in all j consecutive years (j=3,4,...,9)
gen status_NAME`i'_`t' = _merge
drop _merge
save "matched`j'year.`i'-`t'.NAME.dta",replace
}
}
**------------------------------------------------------------------------------
* PART3: Match on the phone number (PHONE)
**------------------------------------------------------------------------------
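*Match the yearly datasets using the phone number as the key; this part is independent of the matching in the other parts
*Note: the phone-number format differs across years. Some years record the number together
*with the long-distance area code, while others record the two separately. The number of digits
*also varies: some firms report a mobile number instead of a landline, and landline numbers
*(excluding the area code) have 7 digits for some firms and 8 for others. To make the key
*comparable across years and keep it one-to-one with firms, match_phone is built as "first
*four digits of the region code + last seven digits of the phone number". Taking the last
*seven digits strips any area code that was stored with the number. In 2000-2003 the region
*code is the six-digit province-prefecture-county code. In 2004-2012 it is the twelve-digit
*administrative-division code, which is the province-prefecture-county code (six digits)
*followed by the township-village code (six digits). Both formats therefore share the same
*first four digits (province and prefecture).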
forval i = 1998/2006{
use "m`i'.dta",clear
gen match_id=id`i'
gen match_name=name`i'
gen match_phone=substr(dq`i',1,4)+substr(phone`i',-7,7)
gen match_rep=substr(dq`i',1,4)+substr(nic`i',1,3)+corp_representive`i'
bysort match_phone : drop if _N>1 //drop observations whose match_phone is duplicated
save "m`i'.PHONE.dta",replace //save the sample used for matching
}
forval i = 1998/2006{
use "m`i'.PHONE.dta",clear
des,short //brief overview of the dataset about to be matched
}
*Match over two consecutive years
forval i = 1998/2005{
local j=`i'+1
use "m`i'.PHONE.dta",clear
merge 1:1 match_phone using "m`j'.PHONE.dta"
keep if _merge==3
gen status_PHONE`i'_`j' = _merge
drop _merge
save "matched2year.`i'-`j'.PHONE.dta",replace //save the firms matched in both years
}
*Match over j consecutive years, j=3,4,...,9
forval j = 3/9{
local y=2007-`j'
forval i = 1998/`y'{
local j0=`j'-1
local k=`i'+`j0'-1
local t=`k'+1
use "matched`j0'year.`i'-`k'.PHONE.dta",clear
merge 1:1 match_phone using "m`t'.PHONE.dta"
keep if _merge==3
gen status_PHONE`i'_`t' = _merge
drop _merge
save "matched`j'year.`i'-`t'.PHONE.dta",replace
}
}
**------------------------------------------------------------------------------
* PART4: Match on the legal representative (REP)
**------------------------------------------------------------------------------
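*Match the yearly datasets using the legal representative as the key; this part is independent of the matching in the other parts
*Note: different firms may have legal representatives with the same name. To reduce such collisions,
*match_rep prefixes the representative's name with the first four digits of the region code (the
*firm's location) and the first three digits of the industry code (the firm's medium-level industry category).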
forval i = 1998/2006{
use "m`i'.dta",clear
gen match_id=id`i'
gen match_name=name`i'
gen match_phone=substr(dq`i',1,4)+substr(phone`i',-7,7)
gen match_rep=substr(dq`i',1,4)+substr(nic`i',1,3)+corp_representive`i'
bysort match_rep : drop if _N>1 //drop observations whose match_rep is duplicated
save "m`i'.REP.dta",replace //save the sample used for matching
}
forval i = 1998/2006{
use "m`i'.REP.dta",clear
des,short //brief overview of the dataset about to be matched
}
*Match over two consecutive years
forval i = 1998/2005{
local j=`i'+1
use "m`i'.REP.dta",clear
merge 1:1 match_rep using "m`j'.REP.dta"
keep if _merge==3
gen status_REP`i'_`j' = _merge
drop _merge
save "matched2year.`i'-`j'.REP.dta",replace //save the firms matched in both years
}
*Match over j consecutive years, j=3,4,...,9
forval j = 3/9{
local y=2007-`j'
forval i = 1998/`y'{
local j0=`j'-1
local k=`i'+`j0'-1
local t=`k'+1
use "matched`j0'year.`i'-`k'.REP.dta",clear
merge 1:1 match_rep using "m`t'.REP.dta"
keep if _merge==3
gen status_REP`i'_`t' = _merge
drop _merge
save "matched`j'year.`i'-`t'.REP.dta",replace
}
}
**------------------------------------------------------------------------------
* PART5: Build the balanced panels (2, 3, ..., 9 consecutive years)
**------------------------------------------------------------------------------
forval j = 2/9{
local y=2007-`j'
forval i = 1998/`y'{
local k=`i'+`j'-1
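*Pool the four independently matched samples for this window
use "matched`j'year.`i'-`k'.ID.dta",clear
append using "matched`j'year.`i'-`k'.NAME.dta"
append using "matched`j'year.`i'-`k'.PHONE.dta"
append using "matched`j'year.`i'-`k'.REP.dta"
*A firm found by several keys appears more than once: keep one row per value of each key
bysort match_id : drop if _n>1
bysort match_name : drop if _n>1
bysort match_phone : drop if _n>1
bysort match_rep : drop if _n>1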
save "balanced`j'year.`i'-`k'.ALL.dta",replace
}
}
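PART5 pools the four matches for each window and then thins them key by key. Below is a minimal Python sketch of that step. It assumes rows are plain dicts carrying the four match_* keys and uses hypothetical toy data, not the do-file's output.

One difference to keep in mind: this sketch keeps the first row appended (ID before NAME before PHONE before REP). Stata's `bysort key : drop if _n>1` does not guarantee which of several tied rows comes first, so the surviving row may differ.

```python
# Minimal sketch (not the author's code) of PART5 for a single window:
# stack the four matched samples (like -append-), then for each key in turn
# keep only the first row carrying each key value (like bysort key : drop if _n>1).
KEYS = ("match_id", "match_name", "match_phone", "match_rep")


def combine(*matched: list[dict]) -> list[dict]:
    # Rows are stacked in call order, e.g. ID, NAME, PHONE, REP.
    rows = [row for sample in matched for row in sample]
    for key in KEYS:
        seen: set = set()
        kept = []
        for row in rows:
            if row[key] not in seen:  # first occurrence of this key value survives
                seen.add(row[key])
                kept.append(row)
        rows = kept
    return rows


if __name__ == "__main__":
    # Hypothetical rows: firm 1 is found by both ID and NAME matching,
    # and a different firm (3) shares firm 2's phone key.
    def row(fid, name, phone, rep, src):
        return {"match_id": fid, "match_name": name, "match_phone": phone,
                "match_rep": rep, "source": src}

    id_rows = [row("1", "X", "p1", "r1", "ID")]
    name_rows = [row("1", "X", "p1", "r1", "NAME"), row("2", "Y", "p2", "r2", "NAME")]
    phone_rows = [row("3", "Z", "p2", "r3", "PHONE")]
    for r in combine(id_rows, name_rows, phone_rows, []):
        print(r["match_id"], r["source"])
```

The order of the keys matters: a row removed as a duplicate on match_id is gone before match_name is checked. An empty key value, such as a missing phone number, is treated as an ordinary value, so every row sharing it collapses to a single row.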