- *- Import raw example data
- cls
- clear
- input strL(file)
- "CA294156;CA297738B;CA294156;UB101920C"
- "CA298876;CA297738B;CA294156;UB451920C;UB451920C"
- "CA2865372C;BP20189789;CA2865372C;BP20189789"
- "TY345728N;TY345728N"
- end
- list
- *- Split long strings into columns
- split file, parse(";") gen(unit)
-
- *- Reshape to long data structure
- gen i = _n
- reshape long unit, i(i) j(j)
- *- Drop repeated character units
- drop if missing(unit)
- drop j
- duplicates drop
-
- *- Reshape to wide data structure
- bysort i: gen j = _n
- reshape wide unit, i(i) j(j)
- *- Append a semicolon to each character unit
- foreach v of varlist unit* {
- replace `v' = `v' + ";" if !missing(`v')
- }
-
- *- Put all character units together by row and remove the last semicolon
- egen result = concat(unit*)
- replace result = substr(result, 1, length(result)-1)
-
- keep file result // Keep the original string and the de-duplicated result
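
For comparison, here is a minimal sketch of a different technique that avoids the two reshapes: it peels one token at a time off the front of the string and appends it to the result only if it has not been seen yet. It assumes the string variable is named file, as in the example above; the helper variables rest, token and cut and the local macro more are hypothetical names introduced only for this illustration. It is one possible approach, not a replacement for the method above.

```stata
* Assumes a string variable named file; rest, token and cut are hypothetical helpers
gen strL result = ""
gen strL rest   = file
gen strL token  = ""
gen long cut    = 0

local more 1
while `more' {
    * Position of the next semicolon (0 if none is left)
    replace cut   = strpos(rest, ";")
    * Take the text before the semicolon, or the whole remainder
    replace token = cond(cut > 0, substr(rest, 1, cut - 1), rest)
    replace rest  = cond(cut > 0, substr(rest, cut + 1, .), "")
    * Append the token only if it is non-empty and not already in result
    replace result = result + ";" + token ///
        if token != "" & strpos(";" + result + ";", ";" + token + ";") == 0
    * Stop once every row has been fully consumed
    count if rest != ""
    local more = r(N) > 0
}

* Remove the leading semicolon and clean up
replace result = substr(result, 2, .)
drop rest token cut
```

Run on the example data, the first row should give CA294156;CA297738B;UB101920C. Because each token is compared against the units already kept, first-occurrence order is preserved without relying on sort order.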










