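I have read several posts on working with panel data and learned a lot from them.
However, many people seem not to distinguish pool data from panel data (although panel data are themselves one kind of pooled data). In fact, EViews treats panel data and pool data differently at every stage: creating the workfile, entering the data, specifying the equation, and presenting the results.
I took some time to organize my notes and am posting them here for discussion and to learn from everyone.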
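The notes are mainly based on Wooldridge's *Introductory Econometrics*, Gujarati's *Basic Econometrics*, the EViews 7 guide, and the EViews guidebooks by Zhang Xiaotong and Gao Tiemei.
The content has four parts: (1) definitions of pool and panel data; (2) the definition of a balanced panel; (3) how several textbooks explain the theory of fixed effects and random effects; (4) my walkthrough of pool and panel operations in EViews.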
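Because organizing this took a lot of effort, I am charging 5 forum coins. If you buy it and feel it was not worth it, post an advanced reply in this thread with any attachment priced at 5 forum coins, and I will buy it to give your coins back. I am busy lately, though, so the buy-back may not be prompt.
If you find anything here useful, please also respect the intellectual property (modest as it is) and do not resell what you download here elsewhere.
Finally, one note: I have not been using EViews for long and there are still things I do not understand, so errors are unavoidable. This is only a summary for mutual discussion, so that we can learn from each other and keep improving together. If you find any mistakes, please point them out; I will be very grateful.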
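Below is the first part (figures omitted):
(I) Concepts
1. Gujarati treats panel data as a combination of cross-section and time-series data, with the qualification that the units observed in panel data are fixed: the same units are followed over time. *Basic Econometrics*, 4th ed., English edition, Chapter 17, p. 636.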
In Chapter 1 we discussed briefly the types of data that are generally available for empirical analysis, namely, time series, cross section, and panel.
In time series data we observe the values of one or more variables over a period of time (e.g., GDP for several quarters or years). In cross-section data, values of one or more variables are collected for several sample units, or entities, at the same point in time (e.g., crime rates for 50 states in the United States for a given year).
In panel data the same cross-sectional unit (say a family or a firm or a state) is surveyed over time. In short, panel data have space as well as time dimensions.
There are other names for panel data, such as pooled data (pooling of time series and cross-sectional observations), combination of time series and cross-section data, micropanel data, longitudinal data (a study over time of a variable or group of subjects), event history analysis (e.g., studying the movement over time of subjects through successive states or conditions), cohort analysis (e.g., following the career path of 1965 graduates of a business school). Although there are subtle variations, all these names essentially connote movement over time of cross-sectional units. We will therefore use the term panel data in a generic sense to include one or more of these terms. And we will call regression models based on such data panel data regression models.
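2. Wooldridge distinguishes panel data from independently pooled cross sections. He likewise holds that panel data observe the same units in every period, whereas an independently pooled cross section draws a new random sample in each period. For the latter, the method he derives is to let the intercept, and sometimes the slopes, change over time. *Introductory Econometrics*, 2nd ed., English edition, Chapter 13, p. 408.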
We will analyze two kinds of data sets in this chapter. An independently pooled cross section is obtained by sampling randomly from a large population at different points in time (usually, but not necessarily, different years). For instance, in each year, we can draw a random sample on hourly wages, education, experience, and so on, from the population of working people in the United States. Or, in every other year, we draw a random sample on the selling price, square footage, number of bathrooms, and so on, of houses sold in a particular metropolitan area. From a statistical standpoint, these datasets have an important feature: they consist of independently sampled observations. This was also a key aspect in our analysis of cross-sectional data: among other things, it rules out correlation in the error terms for different observations.
An independently pooled cross section differs from a single random sample in that sampling from the population at different points in time likely leads to observations that are not identically distributed. For example, distributions of wages and education have changed over time in most countries. As we will see, this is easy to deal with in practice by allowing the intercept in a multiple regression model, and in some cases the slopes, to change over time. We cover such models in Section 13.1. In Section 13.2, we discuss how pooling cross sections over time can be used to evaluate policy changes.
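To make the "let the intercept change over time" idea concrete, here is a minimal Python sketch (my own illustration, not taken from Wooldridge) of OLS on an independently pooled cross section with a year dummy. The data are hypothetical and noise-free, generated from an assumed model y = 1.0 + 0.5·year2 + 2.0·x, so OLS should recover exactly those coefficients. The helpers `solve` and `ols` are hypothetical names; in real work you would run this regression in EViews or a statistics package rather than hand-coding the normal equations.

```python
# Sketch: OLS on an independently pooled cross section, letting the intercept
# shift between two years through a year dummy (hypothetical, noise-free data).

def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical samples: a fresh random draw each year, not the same people.
year1_x = [1.0, 2.0, 3.0]
year2_x = [1.5, 2.5, 4.0]

# Assumed true model: y = 1.0 + 0.5 * year2 + 2.0 * x (no error term, for clarity).
rows, y = [], []
for year2, xs in ((0, year1_x), (1, year2_x)):
    for x in xs:
        rows.append([1.0, float(year2), x])  # constant, year-2 dummy, regressor
        y.append(1.0 + 0.5 * year2 + 2.0 * x)

b0, delta, b1 = ols(rows, y)
print(round(b0, 6), round(delta, 6), round(b1, 6))
```

The dummy coefficient `delta` is the shift in the intercept between the two years. Letting a slope change over time as well would mean adding an interaction column such as `year2 * x` to each row.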
A panel data set, while having both a cross-sectional and a time series dimension, differs in some important respects from an independently pooled cross section. To collect panel data—sometimes called longitudinal data—we follow (or attempt to follow) the same individuals, families, firms, cities, states, or whatever, across time. For example, a panel data set on individual wages, hours, education, and other factors is collected by randomly selecting people from a population at a given point in time. Then, these same people are reinterviewed at several subsequent points in time. This gives us data on wages, hours, education, and so on, for the same group of people in different years.
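The practical difference between the two kinds of data is who gets observed in each period. The Python sketch below (a hypothetical helper, not part of EViews or any package) checks whether every period contains exactly the same set of unit ids. It is a strict test: a genuine panel with attrition, where some people drop out, would return False even though it is still panel data, so treat it only as an illustration of the definition.

```python
from collections import defaultdict

def same_units_every_period(records):
    """Return True if every period observes exactly the same set of unit ids.

    records: iterable of (unit_id, period) pairs (a hypothetical layout).
    """
    units_by_period = defaultdict(set)
    for unit, period in records:
        units_by_period[period].add(unit)
    unit_sets = list(units_by_period.values())
    return all(s == unit_sets[0] for s in unit_sets)

# Panel: the same three families are reinterviewed in 2000 and 2002.
panel = [("A", 2000), ("B", 2000), ("C", 2000),
         ("A", 2002), ("B", 2002), ("C", 2002)]

# Independently pooled cross section: a fresh random sample in each year.
pooled = [("A", 2000), ("B", 2000), ("C", 2000),
          ("D", 2002), ("E", 2002), ("F", 2002)]

print(same_units_every_period(panel))
print(same_units_every_period(pooled))
```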
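3. EViews (EViews 7 guide, p. 564)
EViews distinguishes panel data from pooled time-series, cross-section data. Pooled time-series, cross-section data have relatively few cross-sections (typically with a longer time dimension), and each variable is held in separate cross-section-specific series; panel data have many cross-sections (typically with a short time dimension), and each variable is held in a single series in stacked form.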
Generally speaking, we distinguish between the two by noting that pooled time-series, cross-section data refer to data with relatively few cross-sections, where variables are held in cross-section specific individual series, while panel data correspond to data with large numbers of cross-sections, with variables held in single series in stacked form.
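The storage difference described in the EViews guide can be illustrated outside EViews. The Python sketch below assumes pool-style series named by a base name plus a cross-section identifier (e.g. `GDP_US`, `GDP_UK`, loosely mimicking EViews pool naming; the names and values are hypothetical) and converts them into the stacked layout used for panel data, one row per (cross-section, period) pair, and back again. It only illustrates the two data layouts; it is not how EViews itself stores or converts series.

```python
def stack(wide, base, crossids, periods):
    """Turn cross-section-specific series into one stacked list of rows.

    wide: dict mapping series name -> values over periods (hypothetical names).
    Returns (crossid, period, value) rows, stacked by cross-section.
    """
    rows = []
    for cid in crossids:
        series = wide[base + cid]            # one series per cross-section
        for t, value in zip(periods, series):
            rows.append((cid, t, value))     # stacked: one row per (unit, period)
    return rows

def unstack(rows, base):
    """Rebuild cross-section-specific series from stacked rows.

    Assumes rows are ordered by period within each cross-section.
    """
    wide = {}
    for cid, _period, value in rows:
        wide.setdefault(base + cid, []).append(value)
    return wide

periods = [2001, 2002, 2003]
wide = {"GDP_US": [1.0, 1.1, 1.2], "GDP_UK": [0.5, 0.52, 0.55]}  # hypothetical values

stacked = stack(wide, "GDP", ["_US", "_UK"], periods)
for row in stacked:
    print(row)
```

With few cross-sections the separate-series layout is easy to read and edit by hand; with hundreds of cross-sections, one stacked series per variable scales far better, which matches the distinction the guide draws.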