Original poster: 姜先生

[Data management help] Stata: key variable data types do not match when merging datasets


姜先生, posted 2019-3-31 13:04:17

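Hi all. When I use merge to join two datasets horizontally, I get the following error:

"key variable Code is long in master but str19 in using data
Each key variable -- the variables on which observations are matched -- must be of the same generic type in the master and using datasets.  Same generic type means both numeric or both string."

The code I ran was:

    merge 1:1 Code year using "C:\Users\jzz\OneDrive\小论文\小论文\Dt1.dta", update

The cause is that the key variable Code (the stock code) in the target dataset became numeric when the data was imported, so the leading zeros of every value were dropped; for example, 000001 became 1. In the using dataset Dt1, Code is a string, which is why the merge fails. How can I pad Code in the target dataset with leading zeros so that it becomes a six-digit stock code stored as a string? Or is there a better approach? Thanks!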
target data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long Code int year byte Character
     1 2012 0
     1 2013 0
     1 2014 0
     1 2015 0
     1 2016 0
     2 2012 0
     2 2013 0
     2 2014 0
     2 2015 0
     2 2016 0
     6 2012 1
     6 2013 1
     6 2014 1
     6 2015 1
     6 2016 1
     9 2012 0
     9 2013 0
     9 2014 0
     9 2015 0
     9 2016 0
    21 2012 1
    21 2013 1
    21 2014 1
    21 2015 1
    21 2016 1
    26 2012 1
    26 2013 1
    26 2014 1
    26 2015 1
    26 2016 1
    27 2012 1
    27 2013 1
    27 2014 1
    27 2015 1
    27 2016 1
    28 2012 1
    28 2013 1
    28 2014 1
    28 2015 1
    28 2016 1
    31 2012 1
    31 2013 1
    31 2014 1
    31 2015 1
    31 2016 1
    39 2012 1
    39 2013 1
    39 2014 1
    39 2015 1
    39 2016 1
    60 2012 1
    60 2013 1
    60 2014 1
    60 2015 1
    60 2016 1
    61 2012 1
    61 2013 1
    61 2014 1
    61 2015 1
    61 2016 1
    63 2012 0
    63 2013 0
    63 2014 0
    63 2015 0
    63 2016 0
    66 2012 1
    66 2013 1
    66 2014 1
    66 2015 1
    66 2016 1
    69 2012 1
    69 2013 1
    69 2014 1
    69 2015 1
    69 2016 1
    88 2012 1
    88 2013 1
    88 2014 1
    88 2015 1
    88 2016 1
   100 2012 0
   100 2013 0
   100 2014 0
   100 2015 0
   100 2016 0
   333 2012 0
   333 2013 0
   333 2014 0
   333 2015 0
   333 2016 0
   539 2012 1
   539 2013 1
   539 2014 1
   539 2015 1
   539 2016 1
   651 2012 1
   651 2013 1
   651 2014 1
   651 2015 1
   651 2016 1
   776 2012 0
   776 2013 0
   776 2014 0
   776 2015 0
   776 2016 0
   861 2012 0
   861 2013 0
   861 2014 0
   861 2015 0
   861 2016 0
   999 2012 1
   999 2013 1
   999 2014 1
   999 2015 1
   999 2016 1
  2008 2012 0
  2008 2013 0
  2008 2014 0
  2008 2015 0
  2008 2016 0
  2054 2012 0
  2054 2013 0
  2054 2014 0
  2054 2015 0
  2054 2016 0
  2063 2012 0
  2063 2013 0
  2063 2014 0
  2063 2015 0
  2063 2016 0
  2084 2012 0
  2084 2013 0
  2084 2014 0
  2084 2015 0
  2084 2016 0
  2106 2012 1
  2106 2013 1
  2106 2014 1
  2106 2015 1
  2106 2016 1
  2121 2012 0
  2121 2013 0
  2121 2014 0
  2121 2015 0
  2121 2016 0
  2161 2012 0
  2161 2013 0
  2161 2014 0
  2161 2015 0
  2161 2016 0
  2249 2012 0
  2249 2013 0
  2249 2014 0
  2249 2015 0
  2249 2016 0
  2340 2012 0
  2340 2013 0
  2340 2014 0
  2340 2015 0
  2340 2016 0
  2419 2012 1
  2419 2013 1
  2419 2014 1
  2419 2015 1
  2419 2016 1
  2431 2012 0
  2431 2013 0
  2431 2014 0
  2431 2015 0
  2431 2016 0
300004 2012 0
300004 2013 0
300004 2014 0
300004 2015 0
300004 2016 0
300047 2012 0
300047 2013 0
300047 2014 0
300047 2015 0
300047 2016 0
300077 2012 0
300077 2013 0
300077 2014 0
300077 2015 0
300077 2016 0
300124 2012 0
300124 2013 0
300124 2014 0
300124 2015 0
300124 2016 0
600004 2012 1
600004 2013 1
600004 2014 1
600004 2015 1
600004 2016 1
600029 2012 1
600029 2013 1
600029 2014 1
600029 2015 1
600029 2016 1
600030 2012 1
600030 2013 1
600030 2014 1
600030 2015 1
600030 2016 1
600036 2012 1
600036 2013 1
600036 2014 1
600036 2015 1
600036 2016 1
600048 2012 1
600048 2013 1
600048 2014 1
600048 2015 1
600048 2016 1
600098 2012 1
600098 2013 1
600098 2014 1
600098 2015 1
600098 2016 1
600183 2012 0
600183 2013 0
600183 2014 0
600183 2015 0
600183 2016 0
600323 2012 1
600323 2013 1
600323 2014 1
600323 2015 1
600323 2016 1
600325 2012 1
600325 2013 1
600325 2014 1
600325 2015 1
600325 2016 1
600383 2012 0
600383 2013 0
600383 2014 0
600383 2015 0
600383 2016 0
end
label var Code "股票代码"
label var year "年份"
label var Character "企业性质"
dt1 data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str19 Code int year str20 Stock
"000002" 2012 "万科A"
"000002" 2013 "万科A"
"000002" 2014 "万科A"
"000002" 2015 "万科A"
"000002" 2016 "万科A"
"000002" 2017 "万科A"
"000006" 2012 "深振业A"
"000006" 2013 "深振业A"
"000006" 2014 "深振业A"
"000006" 2015 "深振业A"
"000006" 2016 "深振业A"
"000006" 2017 "深振业A"
"000009" 2012 "中国宝安"
"000009" 2013 "中国宝安"
"000009" 2014 "中国宝安"
"000009" 2015 "中国宝安"
"000009" 2016 "中国宝安"
"000009" 2017 "中国宝安"
"000012" 2012 "南玻A"
"000012" 2013 "南玻A"
"000012" 2014 "南玻A"
"000012" 2015 "南玻A"
"000012" 2016 "南玻A"
"000012" 2017 "南玻A"
"000021" 2012 "深科技"
"000021" 2013 "深科技"
"000021" 2014 "深科技"
"000021" 2015 "深科技"
"000021" 2016 "深科技"
"000021" 2017 "深科技"
"000026" 2012 "飞亚达A"
"000026" 2013 "飞亚达A"
"000026" 2014 "飞亚达A"
"000026" 2015 "飞亚达A"
"000026" 2016 "飞亚达A"
"000026" 2017 "飞亚达A"
"000027" 2012 "深圳能源"
"000027" 2013 "深圳能源"
"000027" 2014 "深圳能源"
"000027" 2015 "深圳能源"
"000027" 2016 "深圳能源"
"000027" 2017 "深圳能源"
"000028" 2012 "国药一致"
"000028" 2013 "国药一致"
"000028" 2014 "国药一致"
"000028" 2015 "国药一致"
"000028" 2016 "国药一致"
"000028" 2017 "国药一致"
"000031" 2012 "中粮地产"
"000031" 2013 "中粮地产"
"000031" 2014 "中粮地产"
"000031" 2015 "中粮地产"
"000031" 2016 "中粮地产"
"000031" 2017 "中粮地产"
"000039" 2012 "中集集团"
"000039" 2013 "中集集团"
"000039" 2014 "中集集团"
"000039" 2015 "中集集团"
"000039" 2016 "中集集团"
"000039" 2017 "中集集团"
"000046" 2012 "泛海控股"
"000046" 2013 "泛海控股"
"000046" 2014 "泛海控股"
"000046" 2015 "泛海控股"
"000046" 2016 "泛海控股"
"000046" 2017 "泛海控股"
"000050" 2012 "深天马A"
"000050" 2013 "深天马A"
"000050" 2014 "深天马A"
"000050" 2015 "深天马A"
"000050" 2016 "深天马A"
"000050" 2017 "深天马A"
"000059" 2012 "华锦股份"
"000059" 2013 "华锦股份"
"000059" 2014 "华锦股份"
"000059" 2015 "华锦股份"
"000059" 2016 "华锦股份"
"000059" 2017 "华锦股份"
"000060" 2012 "中金岭南"
"000060" 2013 "中金岭南"
"000060" 2014 "中金岭南"
"000060" 2015 "中金岭南"
"000060" 2016 "中金岭南"
"000060" 2017 "中金岭南"
"000061" 2012 "农产品"
"000061" 2013 "农产品"
"000061" 2014 "农产品"
"000061" 2015 "农产品"
"000061" 2016 "农产品"
"000061" 2017 "农产品"
"000063" 2012 "中兴通讯"
"000063" 2013 "中兴通讯"
"000063" 2014 "中兴通讯"
"000063" 2015 "中兴通讯"
"000063" 2016 "中兴通讯"
"000063" 2017 "中兴通讯"
"000065" 2012 "北方国际"
"000065" 2013 "北方国际"
"000065" 2014 "北方国际"
"000065" 2015 "北方国际"
"000065" 2016 "北方国际"
"000065" 2017 "北方国际"
"000066" 2012 "中国长城"
"000066" 2013 "中国长城"
"000066" 2014 "中国长城"
"000066" 2015 "中国长城"
"000066" 2016 "中国长城"
"000066" 2017 "中国长城"
"000069" 2012 "华侨城A"
"000069" 2013 "华侨城A"
"000069" 2014 "华侨城A"
"000069" 2015 "华侨城A"
"000069" 2016 "华侨城A"
"000069" 2017 "华侨城A"
"000088" 2012 "盐田港"
"000088" 2013 "盐田港"
"000088" 2014 "盐田港"
"000088" 2015 "盐田港"
"000088" 2016 "盐田港"
"000088" 2017 "盐田港"
"000089" 2012 "深圳机场"
"000089" 2013 "深圳机场"
"000089" 2014 "深圳机场"
"000089" 2015 "深圳机场"
"000089" 2016 "深圳机场"
"000089" 2017 "深圳机场"
"000100" 2012 "TCL集团"
"000100" 2013 "TCL集团"
"000100" 2014 "TCL集团"
"000100" 2015 "TCL集团"
"000100" 2016 "TCL集团"
"000100" 2017 "TCL集团"
"000156" 2012 "华数传媒"
"000156" 2013 "华数传媒"
"000156" 2014 "华数传媒"
"000156" 2015 "华数传媒"
"000156" 2016 "华数传媒"
"000156" 2017 "华数传媒"
"000157" 2012 "中联重科"
"000157" 2013 "中联重科"
"000157" 2014 "中联重科"
"000157" 2015 "中联重科"
"000157" 2016 "中联重科"
"000157" 2017 "中联重科"
"000301" 2012 "东方市场"
"000301" 2013 "东方市场"
"000301" 2014 "东方市场"
"000301" 2015 "东方市场"
"000301" 2016 "东方市场"
"000301" 2017 "东方市场"
"000333" 2012 "美的集团"
"000333" 2013 "美的集团"
"000333" 2014 "美的集团"
"000333" 2015 "美的集团"
"000333" 2016 "美的集团"
"000333" 2017 "美的集团"
"000338" 2012 "潍柴动力"
"000338" 2013 "潍柴动力"
"000338" 2014 "潍柴动力"
"000338" 2015 "潍柴动力"
"000338" 2016 "潍柴动力"
"000338" 2017 "潍柴动力"
"000400" 2012 "许继电气"
"000400" 2013 "许继电气"
"000400" 2014 "许继电气"
"000400" 2015 "许继电气"
"000400" 2016 "许继电气"
"000400" 2017 "许继电气"
"000401" 2012 "冀东水泥"
"000401" 2013 "冀东水泥"
"000401" 2014 "冀东水泥"
"000401" 2015 "冀东水泥"
"000401" 2016 "冀东水泥"
"000401" 2017 "冀东水泥"
"000402" 2012 "金融街"
"000402" 2013 "金融街"
"000402" 2014 "金融街"
"000402" 2015 "金融街"
"000402" 2016 "金融街"
"000402" 2017 "金融街"
"000407" 2012 "胜利股份"
"000407" 2013 "胜利股份"
"000407" 2014 "胜利股份"
"000407" 2015 "胜利股份"
"000407" 2016 "胜利股份"
"000407" 2017 "胜利股份"
"000422" 2012 "*ST宜化"
"000422" 2013 "*ST宜化"
"000422" 2014 "*ST宜化"
"000422" 2015 "*ST宜化"
"000422" 2016 "*ST宜化"
"000422" 2017 "*ST宜化"
"000423" 2012 "东阿阿胶"
"000423" 2013 "东阿阿胶"
"000423" 2014 "东阿阿胶"
"000423" 2015 "东阿阿胶"
"000423" 2016 "东阿阿胶"
"000423" 2017 "东阿阿胶"
"000425" 2012 "徐工机械"
"000425" 2013 "徐工机械"
"000425" 2014 "徐工机械"
"000425" 2015 "徐工机械"
"000425" 2016 "徐工机械"
"000425" 2017 "徐工机械"
"000488" 2012 "晨鸣纸业"
"000488" 2013 "晨鸣纸业"
"000488" 2014 "晨鸣纸业"
"000488" 2015 "晨鸣纸业"
"000488" 2016 "晨鸣纸业"
"000488" 2017 "晨鸣纸业"
"000498" 2012 "山东路桥"
"000498" 2013 "山东路桥"
"000498" 2014 "山东路桥"
"000498" 2015 "山东路桥"
"000498" 2016 "山东路桥"
"000498" 2017 "山东路桥"
"000503" 2012 "国新健康"
"000503" 2013 "国新健康"
"000503" 2014 "国新健康"
"000503" 2015 "国新健康"
"000503" 2016 "国新健康"
"000503" 2017 "国新健康"
"000511" 2012 "烯碳退"
"000511" 2013 "烯碳退"
"000511" 2014 "烯碳退"
"000511" 2015 "烯碳退"
"000511" 2016 "烯碳退"
"000511" 2017 "烯碳退"
"000516" 2012 "国际医学"
"000516" 2013 "国际医学"
"000516" 2014 "国际医学"
"000516" 2015 "国际医学"
"000516" 2016 "国际医学"
"000516" 2017 "国际医学"
"000517" 2012 "荣安地产"
"000517" 2013 "荣安地产"
"000517" 2014 "荣安地产"
"000517" 2015 "荣安地产"
"000517" 2016 "荣安地产"
"000517" 2017 "荣安地产"
end
label var Code "股票代码"
label var year "年份"
label var Stock "股票名称"








Reply #1
黃河泉 (verified professional), posted 2019-3-31 15:58:06
You need to run this on your dt1 data first:

    destring Code, replace
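
The suggestion above makes both keys numeric. As a minimal sketch of the whole workflow, assuming a few toy rows copied from the two listings in the question (the tempfile name `dt1num` is hypothetical; with real data you would `use` and `save` the actual files instead):

```stata
* Sketch: convert the string key in dt1 to numeric, then merge on Code and year
clear
input str19 Code int year str20 Stock
"000002" 2012 "万科A"
"000006" 2012 "深振业A"
"000009" 2012 "中国宝安"
end
* "000002" becomes 2, so Code is now numeric like the key in the target data
destring Code, replace
* hypothetical temporary copy standing in for the converted Dt1.dta
tempfile dt1num
save `dt1num'

clear
input long Code int year byte Character
2 2012 0
6 2012 1
9 2012 0
end
merge 1:1 Code year using `dt1num', update
* inspect how many observations matched
tabulate _merge
```

Both keys only need to be numeric; they do not have to share the same storage type (`byte`, `long`, and so on).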

Reply #2
姜先生, posted 2019-4-7 16:45:55
Quoting 黃河泉 (2019-3-31 15:58): You need to run this on your dt1 data first
That cleared everything up. Thank you, Professor Huang!

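Reply #3
黃河泉 (verified professional), posted 2019-4-7 17:08:22
Quoting 姜先生 (2019-4-7 16:45): That cleared everything up. Thank you, Professor Huang!
It's not that big a deal! Haha!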

Reply #4
jimy1, posted 2023-6-21 11:37:01
https://www.statalist.org/forums ... ble-types-different
The contents of a variable can be stored as either a numeric variable, one where Stata knows to interpret that content as a number, or as simply a "string" of alphabetic or numeric characters, without paying attention to whether it is numeric or not. This is true even for strings of numerals, such as "19432165." In a CSV file, if that string of numbers is enclosed in quotes, Stata will read and interpret it as simply a string of characters *even if those characters are numerals,* and store it as a string variable and not as a numeric variable. The way in which a variable is stored and interpreted is known in computer programming as its "data type." Variables with different data types cannot be matched or compared unless they are made to be of the same type. This issue is not unique to Stata; it is present in all programs, though possibly hidden from the user.

Stata is telling you that in one file, Stata has stored the key variable as a string, and in the other as a numeric variable of type long. (long is a numeric type variable that holds only integers, but which can hold very large "long" ones. See -help datatype-.) It might be that before now, the variable rifisid had always been put into the CSV file as a string, but someone now prepared that file with rifisid as a numeric variable. That sort of mistake is pretty common in the data processing world.
Stata provides functions to create a numeric variable from a string variable containing numerals. See -help destring-. You can take your "using" file, and convert the string version of rifisid to a long variable, save the file, and then do the merge.
Code:
use "Tidy675_2021.dta"
destring rifisid, gen(temp)
rename rifisid rifisid_as_string // keep copy just in case
rename temp rifisid
recast long rifisid
save "Tidy675_2021.dta", replace // replace is needed because the file already exists
clear
use "YourMasterFile.dta"
merge m:1 rifisid ....
Dates are a specific and complicated case here, where there are many different ways to use a string to represent a date, and many different ways to store the numeric information of a date. That's a difficult topic, best left aside for the moment.
Last edited by Mike Lacy; 27 May 2021, 15:20. Reason: Forgot to illustrate "clear"
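
Before converting, it can help to confirm how the key is stored and whether it is safe to `destring`. A minimal sketch, assuming a toy string key named `Code` as in the question above (the three values are taken from the thread's listings):

```stata
* Sketch: inspect the key's storage type and check that it holds only digits
clear
input str19 Code
"000002"
"000006"
"300004"
end
* shows the storage type (str19 here)
describe Code
* exits with an error if Code is not a string variable
confirm string variable Code
* counts values containing anything other than the digits 0-9
count if !regexm(Code, "^[0-9]+$")
```

A count of zero suggests `destring` will convert every value; anything else points to stray characters that need cleaning first.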

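Reply #5
赵安豆, posted 2024-9-11 10:38:14
To deal with this kind of type mismatch in Stata, first convert your `Code` variable so that its type matches the `Code` variable in the dataset you want to merge with.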

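If the goal is to make `Code` a string (stock codes usually carry leading zeros), `tostring` with a zero-padded format converts the numeric `Code` into a new string variable. (`encode` is not the right tool here: it goes the other way, turning a string into a labeled numeric variable.)

```stata
tostring Code, gen(strCode) format(%06.0f)
```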

An equivalent one-liner uses the `string()` function with the same zero-padded format, which keeps the leading zeros. Assuming stock codes are at most six digits (for example 000001):

```stata
gen strCode = string(Code, "%06.0f")
```

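Alternatively, convert the numeric `Code` to a string in place. Note that `destring` does the opposite (string to numeric); the command for this direction is `tostring`:

```stata
tostring Code, replace format(%06.0f)
```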

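A display format by itself is not a fix: it changes how a numeric variable is shown but leaves it numeric, and it cannot be applied to a string variable. If you keep `Code` numeric instead (for example, after running `destring` on the other dataset), a zero-padded display format shows the leading zeros without changing the type:

```stata
format Code %06.0f
```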
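
To make the difference between display format and storage type concrete, here is a small sketch on toy values (the three codes are illustrative, taken from the listing in the question):

```stata
* Sketch: a display format changes what is shown, not what is stored
clear
input long Code
1
21
600004
end
format Code %06.0f
* the listing shows 000001, 000021, 600004
list Code
* Code is still numeric, so this passes without error
confirm numeric variable Code
```

Because the stored value is still the number 1, merging this variable against a string key would fail with the same error as before.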

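Better still, read the variable as a string when the data is first imported, using the appropriate option of the import command for CSV or Excel files.

For example, when importing from a CSV file, `stringcols()` takes column numbers (here assuming `Code` is the first column of the file):

```stata
import delimited "path_to_file.csv", varnames(1) stringcols(1) clear
```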

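That avoids the conversion step later on. You can then run the merge, making sure `Code` and the other key variables have the same type in both datasets (if you created a separate `strCode`, drop the numeric `Code` and rename `strCode` to `Code` first):

```stata
merge 1:1 Code year using "C:\Users\jzz\OneDrive\小论文\小论文\Dt1.dta", update
```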
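
As a minimal end-to-end sketch of the string-key route, assuming toy rows copied from the listings in the question (the tempfile name `dt1str` is hypothetical and stands in for Dt1.dta):

```stata
* Sketch: keep dt1 as is, convert the target key to a six-character string, merge
clear
input str19 Code int year str20 Stock
"000002" 2012 "万科A"
"000006" 2012 "深振业A"
"000009" 2012 "中国宝安"
end
* hypothetical temporary copy standing in for Dt1.dta
tempfile dt1str
save `dt1str'

clear
input long Code int year byte Character
2 2012 0
6 2012 1
300004 2012 0
end
* 2 becomes "000002"; Code is now a string like the key in dt1
tostring Code, replace format(%06.0f)
merge 1:1 Code year using `dt1str'
* two matches, one master-only row (300004), one using-only row (000009)
tabulate _merge
```

The two string keys may have different lengths (`str6` and `str19`); they only need to share the same generic type.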

The steps above should resolve the problem.

This text was generated by the CAIE academic large language model.


