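RegisterAddress is a messy, free-form address field; var1 holds the fuzzy (short) names of all prefecture-level cities, and var2 holds their standard names.
My idea is to use var1 with the indexnot function to pull the useful information out of RegisterAddress, and then replace the address with the standard prefecture-level city name from var2.
As the title says, I want to run the following three commands for every value in var1 and var2: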
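gen 判断变量=indexnot("var1",RegisterAddress) // use var1 with indexnot to extract the useful information from RegisterAddress
replace RegisterAddress="var2" if 判断变量==0 // set observations where information could be extracted to the corresponding standard city name
drop 判断变量 // drop the helper variable used for the check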
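One thing worth checking before looping: indexnot(s1,s2) returns 0 when every character of s1 occurs somewhere in s2, so it is a character-set test rather than a substring test, and it is documented for ASCII strings. A minimal sketch of the same three steps for a single pair, swapping in ustrpos() as a substring test (this assumes Stata 14 or later; `matched` is a hypothetical helper name, not something from my data):

```stata
* Single-pair sketch: does RegisterAddress contain the substring "北京"?
* ustrpos(s1, s2) gives the position of s2 inside s1, or 0 if s2 is not found.
gen byte matched = ustrpos(RegisterAddress, "北京") > 0   // 1 if found, 0 otherwise
replace RegisterAddress = "北京市" if matched             // write the standard name
drop matched                                              // remove the helper variable
```

On older Stata versions strpos() could play the same role, since a byte-level substring match is enough for a found/not-found check.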
Part of my data is shown below:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 Symbol str4 year str532 RegisterAddress str33(var1 var2)
"000089" "2010" "深圳市" "黑河" "黑河市"
"000089" "2010" "深圳市" "绥化" "绥化市"
"000089" "2010" "深圳市" "大兴安岭" "大兴安岭地区"
"000089" "2010" "深圳市" "上海" "上海市"
"000089" "2010" "深圳市" "南京" "南京市"
"000089" "2010" "深圳市" "无锡" "无锡市"
"000089" "2010" "深圳市" "徐州" "徐州市"
"000089" "2010" "深圳市" "常州" "常州市"
"000089" "2010" "深圳市" "苏州" "苏州市"
"000089" "2010" "深圳市" "南通" "南通市"
"000089" "2010" "深圳市" "连云港" "连云港市"
"000099" "2010" "北京" "淮安" "淮安市"
"000099" "2010" "深圳" "盐城" "盐城市"
"000155" "2010" "四川省成都市青白江区" "扬州" "扬州市"
"000155" "2010" "成都市" "镇江" "镇江市"
"000155" "2010" "四川省泸州市合江县榕山镇" "泰州" "泰州市"
"000155" "2010" "四川省成都市青白江区" "宿迁" "宿迁市"
"000155" "2010" "四川省成都市青白江区" "杭州" "杭州市"
"000155" "2010" "成都市" "宁波" "宁波市"
"000159" "2010" "乌鲁木齐市团结路45号" "温州" "温州市"
"000159" "2010" "乌鲁木齐市北京南路22号龙岭大厦412号" "嘉兴" "嘉兴市"
"000159" "2010" "博尔塔拉蒙古自治州" "湖州" "湖州市"
"000159" "2010" "奎屯市" "绍兴" "绍兴市"
"000159" "2010" "拜城县红旗路16号" "金华" "金华市"
"000159" "2010" "哈密市爱国北路21号" "衢州" "衢州市"
"000159" "2010" "乌鲁木齐市四十户路189号" "舟山" "舟山市"
"000159" "2010" "北京" "台州" "台州市"
"000159" "2010" "香港Rooms 407-10,4th Floor,Tower Two,Lippo Centre,89 Queensway,Hong Kong." "丽水" "丽水市"
"000159" "2010" "乌鲁木齐市头屯河区王家沟工业园区" "合肥" "合肥市"
"000401" "2010" "河北省三河市" "芜湖" "芜湖市"
"000401" "2010" "河北省唐山市古冶区" "蚌埠" "蚌埠市"
end
For example, given:
var1: 北京 天津 石家庄
var2: 北京市 天津市 石家庄市
I want the equivalent of:
gen 判断变量=indexnot("北京",RegisterAddress)
replace RegisterAddress="北京市" if 判断变量==0
drop 判断变量
gen 判断变量=indexnot("天津",RegisterAddress)
replace RegisterAddress="天津市" if 判断变量==0
drop 判断变量
gen 判断变量=indexnot("石家庄",RegisterAddress)
replace RegisterAddress="石家庄市" if 判断变量==0
drop 判断变量
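The repetition above could probably be written as a loop over the rows that hold the (var1, var2) pairs. The following is only a rough sketch under a few assumptions: var1 and var2 sit in the same dataset as RegisterAddress, as in the dataex listing; Stata 14 or later is available for ustrpos(); a substring test is used in place of indexnot; and `city_std`, `fuzzy`, and `std` are hypothetical names introduced for illustration.

```stata
* Sketch: apply the contain-and-replace step for every (var1, var2) pair.
gen str1 city_std = ""                     // hypothetical helper; replace widens it as needed
forvalues i = 1/`=_N' {
    local fuzzy = var1[`i']                // fuzzy city name stored in row i
    local std   = var2[`i']                // standard city name stored in row i
    if "`fuzzy'" != "" {                   // skip rows with no lookup entry
        * keep the first pair (in row order) whose fuzzy name occurs in the address
        replace city_std = "`std'" if city_std == "" & ustrpos(RegisterAddress, "`fuzzy'") > 0
    }
}
replace RegisterAddress = city_std if city_std != ""   // unmatched addresses stay unchanged
drop city_std
```

Collecting the result in a separate variable first means an address that has already been matched is not overwritten by a later pair. A substring test also matches city names that appear inside street names: an address such as 乌鲁木齐市北京南路22号龙岭大厦412号 would match 北京 once 北京 is among the var1 values, so cases like that may need a manual check.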