File name: crosssection.rar
Attachment download link: https://bbs.pinggu.org/a-1670531.html
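I have recently run into a confusing and rather thorny problem while organizing my data. I want to use the append command to combine several years of cross-section data into a panel, but the appended result shows missing values, even though every one of these supposedly missing observations can be found in the raw data. The reason is that the id name values do not match up in some years (I had already used the rencode command to convert the id into a numeric code with text value labels). I don't know what causes this, because the id names are the same in every year. I wondered whether using Chinese for the id names might be the problem, but in most cases the observations are combined successfully. Below is the question I posted on Statalist (it is fairly long, so I have not translated it; everyone should be able to follow it). There are plenty of experts on the Stata board here too, so I would be grateful for any advice. I have been stuck on this problem for a whole weekend.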
---------------------------------------

Hi,

I have several cross-section data sets, each with an id variable (here named region) and a year variable, and I want to append them together to form a panel. When I try to do this with the -append- command, I encounter a serious problem: some observations are lost for certain year(s). They are not really missing observations, because I can see them in the raw data.

So I checked the cross-section data sets separately and found that in these particular years the value of the int-type variable region differs from the other years (I used the -rencode- command to convert "region" from string to int). For example, the value for the region name "哈尔滨" is 65 in 5 of 6 data sets, but in year 2012 it is 63, so there is an inconsistency. I don't know what causes the inconsistency: the variable type is uniform for all region names, and I didn't give them any (value or variable) labels before appending.

Using the -xtdes- command, I found that about 80% of my observations are balanced, with the missing id names concentrated in two years, 2012 and 2006; see below:

         Freq.  Percent    Cum. |  Pattern
     ---------------------------+----------
          262     80.62   80.62 |  11111111
           19      5.85   86.46 |  1111111.
           10      3.08   89.54 |  1.111111
            6      1.85   91.38 |  .1111111
            5      1.54   92.92 |  ..111111
            3      0.92   93.85 |  11111.11
            2      0.62   94.46 |  ..1.111.
            2      0.62   95.08 |  11.1...1
            1      0.31   95.38 |  ......1.
           15      4.62  100.00 | (other patterns)
     ---------------------------+----------
          325    100.00         |  XXXXXXXX

I strongly suspect that this problem is due to some kind of difference in the id names (region in this case) between the cross-section data sets. In principle, the -encode- command attaches a unique code to an id name, regardless of its relative position. But even when I copied the id names from the master data set to replace the corresponding ones in the problem years, the problem remained. That really confuses me.

Could the problem be that my id names are in Chinese characters (because these are Chinese data)? I don't know, but it worked just fine in most cases, and I have too many of them, so giving each id an English name would be extremely cumbersome.

I have been stuck on this problem for 2 days and still have no hope. I feel it may be due to some stupid reason that I cannot find. The worst thing is that I have no one nearby to ask for help. So, specialists on Statalist, please check it and give me some clue. Thank you in advance.

Here I post the raw data, the cross-section data, and my appended panel data.
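One possible explanation for the 65 versus 63 mismatch can be sketched outside Stata. The Python illustration below is a minimal sketch, not the poster's actual workflow. It assumes the string-to-integer step was run separately on each yearly file, and that the encoding numbers the distinct names 1, 2, 3, ... in sorted order within whichever data set is in memory, which is how Stata's -encode- assigns codes. The helper name `encode`, the region lists, and the years are all hypothetical. If one year's file lacks some regions that sort earlier, every later region shifts to a smaller code in that year, even though the names themselves are identical.

```python
def encode(names):
    """Assign codes 1..K to the distinct names in sorted order.

    Hypothetical stand-in for a per-dataset string-to-integer encoding.
    """
    return {name: code for code, name in enumerate(sorted(set(names)), start=1)}


# Hypothetical yearly files: 北京 is absent from the 2012 file.
regions_2011 = ["北京", "哈尔滨", "上海", "天津"]
regions_2012 = ["哈尔滨", "上海", "天津"]

# Encoding each year on its own: the same name can get different codes.
codes_2011 = encode(regions_2011)
codes_2012 = encode(regions_2012)
mismatch = codes_2011["哈尔滨"] != codes_2012["哈尔滨"]

# Encoding once over the pooled names: one code per name in every year.
pooled = encode(regions_2011 + regions_2012)
panel = [(pooled[r], 2011) for r in regions_2011] + \
        [(pooled[r], 2012) for r in regions_2012]

print("per-year codes differ for 哈尔滨:", mismatch)
print("pooled code for 哈尔滨:", pooled["哈尔滨"])
```

If this is indeed the cause, the Stata equivalent would be to keep region as a string in every yearly file, append the files, and only then run -encode- once on the combined data, so that each name receives a single code across all years.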