Original poster: novice07

[Study notes] [Repost] Resolving the "repeated time values within panel" problem

Original post
novice07, posted 2010-4-19 11:28:22

Title: Dealing with reports of repeated time values within panel
Authors: Nicholas J. Cox, Durham University, UK; Michael Mulcahy, University of Connecticut
Date: December 2005


Question

I have panel data. I want to exploit the power of tsset (see [TS] tsset), but when I type

            . tsset id time

I get a report of

            repeated time values within panel
            r(451);

What should I do next?
Answer

Panel data are defined by an identifier variable and a time variable. Each combination of identifier and time should occur, at most, once. That is, any such combination might appear either once or not at all, as gaps are allowed in panel data. The report of "repeated time values within panel" is thus serious, as Stata is unable to proceed with any commands that depend upon your data being accepted as panel data.
Two common reactions to this report are to suppose that it cannot be true, as you know you have panel data, or that there must be a bug or at least a misunderstanding here. In our experience, the misunderstanding will, on closer inspection, be found embedded in the dataset. Here we discuss various methods for approaching the problem. The underlying idea is that knowing several ways of going further is much better than knowing none. All the methods discussed are also applicable to other problems.

1. Do identifier and time uniquely identify the data?

Observations in panel data are uniquely identified by the combination of identifier and time. Thus isid may be used to check for this, for example,

     . isid id time

With isid, no news is good news. However, if the variables specified do not jointly identify the data, an error message will appear.
The logic of isid may be implemented in other ways. At its heart is an operation
     . bysort id time: assert _N == 1
asserting that each combination of identifier and time is unique. Again, with assert no news is good news. If the statement asserted is not true everywhere that it is tested, an error message will ensue.
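
The two checks above can be tried on a toy dataset. This is only a sketch: the data, the variable x, and the helper variable n_in_cell are made up for illustration and are not part of the original FAQ.

```stata
* Toy panel with one deliberately repeated (id, time) pair
clear
input id time x
1 2000 10
1 2001 11
2 2000 20
2 2000 21
end

* capture suppresses the error so that a do-file can branch on the return code
capture isid id time
if _rc {
    display as error "id and time do not uniquely identify the observations"
}

* The same logic by hand: count the observations in each (id, time) cell
bysort id time: generate n_in_cell = _N
list id time x if n_in_cell > 1, sepby(id)
```

Using capture is a design choice for do-files: the check fails quietly and the script can decide what to do next, whereas a bare isid or assert would stop execution at the first problem.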

2. Check for duplicates

If you have received confirmation of a problem, the next step is to track it down. With a very small dataset, a list or edit of the data may be sufficient, but even then, a more systematic approach is preferable. Here is what we did in a specific example using the duplicates command, which is a small bundle of tools for investigating possible problems arising from duplicated observations.
The dataset consists of several variables for various cities and years, with identifier id and time variable year. The number of observations is 7,813, large enough for a visual scan of the data to be a poor solution. The subcommand duplicates report quantifies the extent of the problem: 26 observations, forming 13 duplicated pairs of values of id and year. The subcommand duplicates list finds that they all involve id 467. The subcommand duplicates tag is used to tag the observations to examine more closely. An edit then gives all the details.

. duplicates report id year

Duplicates in terms of id year

--------------------------------------
    copies | observations       surplus
----------+---------------------------
         1 |         7787             0
         2 |           26            13
--------------------------------------

. duplicates list id year

Duplicates in terms of id year

   +----------------------------+
   | group:   obs:    id   year |
   |----------------------------|
   |      1   6059   467   1990 |
   |      1   6060   467   1990 |
   |      2   6061   467   1991 |
   |      2   6062   467   1991 |
   |      3   6063   467   1992 |
   |----------------------------|
   |      3   6064   467   1992 |
   |      4   6065   467   1993 |
   |      4   6066   467   1993 |
   |      5   6067   467   1994 |
   |      5   6068   467   1994 |
   |----------------------------|
   |      6   6069   467   1995 |
   |      6   6070   467   1995 |
   |      7   6071   467   1996 |
   |      7   6072   467   1996 |
   |      8   6073   467   1997 |
   |----------------------------|
   |      8   6074   467   1997 |
   |      9   6075   467   1998 |
   |      9   6076   467   1998 |
   |     10   6077   467   1999 |
   |     10   6078   467   1999 |
   |----------------------------|
   |     11   6079   467   2000 |
   |     11   6080   467   2000 |
   |     12   6081   467   2001 |
   |     12   6082   467   2001 |
   |     13   6083   467   2002 |
   |----------------------------|
   |     13   6084   467   2002 |
   +----------------------------+

. duplicates tag id year, gen(isdup)

Duplicates in terms of id year

. edit if isdup

. drop isdup


The final edit command reveals the precise problem: two cities, Royal Oak, MI, and Bristol, CT, have been assigned the same identifier. We need to fix that by changing the identifier of one city to something else.
Not all these steps are essential. Some users omit the report. On the other hand, in a large dataset, the list could be lengthy. Either way, duplicates offers various handles for the problem.
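
The FAQ does not show the fix itself, so what follows is only one possible sketch. It assumes a string variable named city that holds the city name (a hypothetical name), and it uses a small made-up dataset in place of the real one. The new identifier is simply one more than the largest id in use, which is an arbitrary choice.

```stata
* Toy stand-in for the real data: two cities sharing id 467
* (the variable name city and every value here are assumptions for illustration)
clear
input id year str20 city
467 1990 "Royal Oak, MI"
467 1990 "Bristol, CT"
467 1991 "Royal Oak, MI"
467 1991 "Bristol, CT"
12  1990 "Some Other City"
end

* Pick an identifier that is not already in use
summarize id, meanonly
local newid = r(max) + 1

* Reassign one of the two cities to the new identifier
replace id = `newid' if id == 467 & city == "Bristol, CT"

* Confirm the fix: isid is silent on success, and tsset should now go through
isid id year
tsset id year
```

If other files refer to these cities by id, the same recoding would have to be applied there before any merge.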


2
novice07, posted 2010-5-16 15:19:56
Nobody seems interested in this at all, sigh...

3
蓝色, posted 2010-5-16 15:54:09
Haha, maybe everyone has already solved it.

4
lahraf, posted 2010-5-18 18:42:16
Many thanks for sharing this information.

5
武陵溪yn, posted 2010-12-9 12:43:41
Thank you so much! I was in a real hurry and this post solved my problem.

6
cooking0830, posted 2010-12-14 17:42:04
Thanks to the original poster. This is exactly the problem I have run into!

7
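tianxiaoxiao1, posted 2010-12-15 00:31:15
But once the duplicates are identified, do they have to be renamed with a new identifier? If the repeated id is actually needed, how can it be linked with the other variables?
For example, how would one set up a new variable to represent a dynamic variable?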

8
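rucchenqiong, posted 2010-12-18 21:20:17
Bump! I ran into this too and solved it. The cause was missing values of id. When declaring the time-series structure, id must not be missing.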
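
A quick way to look for the situation described in this reply is to count the missing identifiers. This is a sketch on made-up data, not something taken from the FAQ. Observations whose id is missing all share the same missing value, so duplicates report groups them together.

```stata
* Toy data in which two observations have a missing id
clear
input id time
1 2000
1 2001
. 2000
. 2000
end

* How many observations have no identifier at all?
count if missing(id)

* duplicates treats missing like any other value, so these show up as copies
duplicates report id time
```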

9
wdlhong888, posted 2011-1-8 16:28:25
Thanks, this is just what a beginner like me needed.

10
carawang, posted 2011-5-20 14:50:13
Thank you very much!
