OP: 欣冉

[Data Management Help] Data processing in Stata

OP
欣冉 posted 2008-6-17 00:35:00

My data has three dimensions: industry, year, and region. Can data like this be handled as panel data in Stata? If not, how should it be handled? I'm working on my thesis and in a hurry... waiting for an expert to point the way.

Reply #1
laserwto posted 2008-6-17 01:04:00
Yes, you can. Use xtset to define the individual and time variables, then run the regression with xtreg.
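A minimal sketch of that suggestion (the variable names id, year, y, and x are assumptions, not from the thread):

```stata
* Declare the panel structure: id is the cross-sectional unit,
* year the time variable (both names are assumptions).
xtset id year

* Fixed-effects regression of y on x within the declared panel.
xtreg y x, fe
```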

Reply #2
欣冉 posted 2008-6-17 01:31:00
Quoting laserwto (2008-6-17 1:04:00):
Yes, you can. Use xtset to define the individual and time variables, then run the regression with xtreg.

Could you be more specific? My difficulty is that I don't know how to set the data up in practice.

My data covers five industries, thirty regions, and eight years. What I have done is to take industry and time as the two panel dimensions (intending to declare them as the cross-sectional and time variables) and then, ordered by industry and time, stack in the observations for each region, so that for a given industry there are 30 observations in each year. But when I use tsset to declare the cross-sectional and time variables, Stata returns the error "repeated time values within panel". How should I handle this?
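The usual fix for three-dimensional data of this kind is to collapse two of the dimensions into a single panel identifier, so that each identifier-time pair occurs at most once. A sketch, assuming the variables are named industry, region, and year:

```stata
* Build one panel identifier from each industry-region combination,
* giving 5 x 30 = 150 panels observed over 8 years.
egen panelid = group(industry region)

* Each (panelid, year) pair is now unique, so the declaration succeeds.
xtset panelid year
```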

Reply #3
whgyu posted 2008-6-17 08:07:00
You should use -xtmixed-; -xtreg- can only handle one grouping dimension.
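-xtmixed- (renamed mixed in later Stata releases) can fit random intercepts at more than one grouping level. A sketch under the assumption that region is nested within industry and that the variables are named y, x, industry, and region:

```stata
* Mixed model: fixed effect of x, with random intercepts for industry
* and for region within industry (the nesting is an assumption).
xtmixed y x || industry: || region:
```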

Reply #4
欣冉 posted 2008-6-18 12:52:00

The problem still isn't solved...

Reply #5
pfp7748 posted 2008-7-25 13:42:00

How do I deal with a report of repeated time values within panel?

Title:   Dealing with reports of repeated time values within panel
Authors: Nicholas J. Cox, Durham University, UK; Michael Mulcahy, University of Connecticut
Date:    December 2005

Question

I have panel data. I want to exploit the power of tsset (see [TS] tsset), but when I type

 . tsset id time 

I get a report of

 repeated time values within panel
 r(451);

What should I do next?

Answer

Panel data are defined by an identifier variable and a time variable. Each combination of identifier and time should occur, at most, once. That is, any such combination might appear either once or not at all, as gaps are allowed in panel data. The report of "repeated time values within panel" is thus serious, as Stata is unable to proceed with any commands that depend upon your data being accepted as panel data.

Two common reactions to this report are to suppose that it cannot be true, as you know you have panel data, or that there must be a bug or at least a misunderstanding here. In our experience, the misunderstanding will, on closer inspection, be found embedded in the dataset. Here we discuss various methods for approaching the problem. The underlying idea is that knowing several ways of going further is much better than knowing none. All the methods discussed are also applicable to other problems.

1. Do identifier and time uniquely identify the data?

Observations in panel data are uniquely identified by the combination of identifier and time. Thus isid may be used to check for this, for example,

 . isid id time 

With isid, no news is good news. However, if the variables specified do not jointly identify the data, an error message will appear.

The logic of isid may be implemented in other ways. At its heart is an operation

 . bysort id time: assert _N == 1 

asserting that each combination of identifier and time is unique. Again, with assert no news is good news. If the statement asserted is not true everywhere that it is tested, an error message will ensue.

2. Check for duplicates

If you have received confirmation of a problem, the next step is to track it down. With a very small dataset, a list or edit of the data may be sufficient, but even then, a more systematic approach is preferable. Here is what we did in a specific example using the duplicates command, which is a small bundle of tools for investigating possible problems arising from duplicated observations.

The dataset consists of several variables for various cities and years, with identifier id and time variable year. The number of observations is 7,813, large enough for a visual scan of the data to be a poor solution. The subcommand duplicates report quantifies the extent of the problem: 26 observations with duplicated values of id and year. The subcommand duplicates list finds that they all involve id 467. The subcommand duplicates tag is used to tag the observations to examine more closely. An edit then gives all the details.

 . duplicates report id year

 Duplicates in terms of id year

 --------------------------------------
    copies | observations       surplus
 ----------+---------------------------
         1 |         7787             0
         2 |           26            13
 --------------------------------------

 . duplicates list id year

 Duplicates in terms of id year

   +----------------------------+
   | group:   obs:   id    year |
   |----------------------------|
   |      1   6059   467   1990 |
   |      1   6060   467   1990 |
   |      2   6061   467   1991 |
   |      2   6062   467   1991 |
   |      3   6063   467   1992 |
   |----------------------------|
   |      3   6064   467   1992 |
   |      4   6065   467   1993 |
   |      4   6066   467   1993 |
   |      5   6067   467   1994 |
   |      5   6068   467   1994 |
   |----------------------------|
   |      6   6069   467   1995 |
   |      6   6070   467   1995 |
   |      7   6071   467   1996 |
   |      7   6072   467   1996 |
   |      8   6073   467   1997 |
   |----------------------------|
   |      8   6074   467   1997 |
   |      9   6075   467   1998 |
   |      9   6076   467   1998 |
   |     10   6077   467   1999 |
   |     10   6078   467   1999 |
   |----------------------------|
   |     11   6079   467   2000 |
   |     11   6080   467   2000 |
   |     12   6081   467   2001 |
   |     12   6082   467   2001 |
   |     13   6083   467   2002 |
   |----------------------------|
   |     13   6084   467   2002 |
   +----------------------------+

 . duplicates tag id year, gen(isdup)

 Duplicates in terms of id year

 . edit if isdup
 . drop isdup

The final edit command reveals the precise problem: two cities, Royal Oak, MI, and Bristol, CT, have been assigned the same identifier. We need to fix that by changing the identifier of one city to something else.
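A sketch of that fix, assuming the dataset has a string variable named city and that 9999 is an unused identifier value (both are assumptions):

```stata
* Give one of the two cities a fresh, unused id (9999 is an assumption).
replace id = 9999 if id == 467 & city == "Royal Oak, MI"

* Re-check: isid is silent when id and year now uniquely identify rows.
isid id year
```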

Not all these steps are essential. Some users omit the report. On the other hand, in a large dataset, the list could be lengthy. Either way, duplicates offers various handles for the problem.

