Programming Advice for Stata


Posted by: galt | Category: Stata software training

Programming Advice

Mitchell A. Petersen

My purpose in writing this paper was to make sure researchers (myself included) understood what each of the methods for estimating standard errors was actually doing. These pages are meant to help researchers use the correct techniques. Code which is easily available is more likely to be used. Since I program in Stata, most of the instructions below are for Stata. I have also included SAS code (contributed by Tanguy Brachet). If you know how to do this in other languages, please let me know. I am happy to post links to the instructions. In all of the instructions, the programming instructions are in bold. The variable names which the user must specify are in italics. I have also included a sample of the Stata program which I used to run the simulations (i.e. the program that simulated the data sets and then estimated the coefficients and standard errors).

Stata Programming Instructions

The standard command for running a regression in Stata is:

regress dependent_variable independent_variables, options

Rogers or Clustered Standard Errors

To obtain Rogers/Clustered standard errors (and OLS coefficients), use the command:

regress dependent_variable independent_variables, robust cluster(cluster_variable)

This produces White standard errors which are robust to within-cluster correlation (Rogers or clustered standard errors). If you wanted to cluster by year, then the cluster variable would be the year variable. If you wanted to cluster by industry and year, you would need to create a variable which had a unique value for each industry-year pair. For most estimation commands, such as logit and probit, the previous form of the command will also work. For example, to run a logit with Rogers/Clustered standard errors you would use the command:

logit dependent_variable independent_variables, robust cluster(cluster_variable)
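To make concrete what the cluster option changes, here is a minimal Python sketch (standard library only) of the clustered variance for a regression with one regressor and an intercept. The function name and the toy data are hypothetical, and no finite-sample adjustment is applied, so the numbers will not match Stata's output exactly; the point is only that the scores are summed within each cluster before being squared.

```python
from math import sqrt

def slope_se_clustered(x, y, cluster):
    """Return (slope, clustered SE) for y = a + b*x + e, without small-sample adjustment."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Sum the scores (x - x_bar) * residual within each cluster before squaring,
    # so that within-cluster cross products enter the variance estimate.
    score_sum = {}
    for xi, yi, g in zip(x, y, cluster):
        resid = yi - intercept - slope * xi
        score_sum[g] = score_sum.get(g, 0.0) + (xi - x_bar) * resid

    meat = sum(s * s for s in score_sum.values())
    return slope, sqrt(meat) / sxx

# Hypothetical toy data: six observations on three firms.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.9, 5.3, 5.8]
firm = ["a", "a", "b", "b", "c", "c"]
industry = ["tech", "tech", "tech", "tech", "oil", "oil"]
year = [1991, 1992, 1991, 1992, 1991, 1992]

slope, se_firm = slope_se_clustered(x, y, firm)
# One cluster per observation reduces to the White standard error.
_, se_white = slope_se_clustered(x, y, list(range(len(x))))
# A cluster variable with a unique value for each industry-year pair.
_, se_pair = slope_se_clustered(x, y, list(zip(industry, year)))
print(slope, se_white, se_firm, se_pair)
```

Passing a list of (industry, year) tuples plays the role of the single variable with a unique value for each industry-year pair.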

Fama-MacBeth Standard Errors

Stata does not contain a routine for estimating the coefficients and standard errors by Fama-MacBeth (that I know of), but I have written two routines (ado files) which you can download. The ado file fm.ado runs a cross-sectional regression for each year in the data set.

The form of the command is:

fm dependent_variable independent_variables

Prior to running the fm program, you need to use the tsset command. This tells Stata the name of the firm identifier and the time variable. The form of this command is:

tsset firm_identifier time_identifier

The program will accept the Stata in and if qualifiers, if you want to run the regression for only certain observations. The file fmw.ado does the weighted-average version of Fama-MacBeth, where each year’s coefficient is weighted by the number of observations in that year. Justin Caskey, who showed me how to use the tsset command in the FM program, has also modified the program. His version reports the number of positive and negative coefficients and the number which are significant (and positive or negative).
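The Fama-MacBeth procedure itself is short enough to sketch in Python (standard library only). This is not a port of fm.ado or fmw.ado, whose internals are not shown here; it is a one-regressor illustration with hypothetical names. It runs one cross-sectional regression per year, averages the yearly slopes, and uses the standard deviation of those slopes divided by the square root of the number of years as the standard error. The observation-weighted coefficient is included for comparison; a standard error for the weighted version is not attempted.

```python
from math import sqrt

def cross_section_slope(x, y):
    """OLS slope of y on x (with intercept) for one year's cross-section."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def fama_macbeth(rows):
    """rows: iterable of (year, x, y). Needs at least two years.

    Returns (coefficient, standard error, observation-weighted coefficient).
    """
    by_year = {}
    for yr, x, y in rows:
        xs, ys = by_year.setdefault(yr, ([], []))
        xs.append(x)
        ys.append(y)

    slopes = [cross_section_slope(xs, ys) for xs, ys in by_year.values()]
    counts = [len(xs) for xs, _ in by_year.values()]

    t = len(slopes)
    coef = sum(slopes) / t
    var = sum((b - coef) ** 2 for b in slopes) / (t - 1)
    weighted = sum(n * b for n, b in zip(counts, slopes)) / sum(counts)
    return coef, sqrt(var / t), weighted

# Hypothetical data: the slope is exactly 1 in year 1 and exactly 3 in year 2.
rows = [(1, 0.0, 0.0), (1, 1.0, 1.0), (1, 2.0, 2.0),
        (2, 0.0, 0.0), (2, 1.0, 3.0), (2, 2.0, 6.0), (2, 3.0, 9.0)]
coef, se, weighted = fama_macbeth(rows)
print(coef, se, weighted)
```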

Newey West for Panel Data Sets

The Stata command newey will estimate the coefficients of a regression using OLS and generate Newey-West standard errors. If you want to use this in a panel data set (so that only observations within a cluster may be correlated), you need to use the tsset command.

tsset firm_identifier time_identifier

newey dependent_variable independent_variables, lag(lag_length) force

Where firm_identifier is the variable which denotes each firm (e.g. cusip, permno, or gvkey) and time_identifier is the variable that identifies the time dimension, such as year. This specification will allow for observations on the same firm in different years to be correlated (i.e. a firm effect). If you want to allow for observations on different firms but in the same year to be correlated, you need to reverse the firm and time identifiers. If you are clustering on some other dimension besides firm (e.g. industry or country), you would use that variable instead. You can specify any lag length up to T-1, where T is the number of years per firm.
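As a rough illustration of what the panel version computes, the Python sketch below (standard library only) builds a Newey-West variance for a single regressor, using Bartlett weights 1 - j/(lag+1) and adding cross products only between observations on the same firm. The function names and data are hypothetical, time is assumed to be an integer index, and any degrees-of-freedom adjustment that Stata applies is omitted, so treat it as intuition rather than a replica of newey.

```python
from math import sqrt

def bartlett_weight(j, lag):
    """Newey-West (Bartlett) weight on the j-th lag for a given maximum lag."""
    return 1.0 - j / (lag + 1.0)

def panel_newey_west_se(rows, lag):
    """rows: list of (firm, time, x, y) with integer time. Returns the slope SE."""
    xs = [r[2] for r in rows]
    ys = [r[3] for r in rows]
    n = len(rows)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    # Score for each observation, indexed by (firm, time).
    score = {(f, t): (x - x_bar) * (y - intercept - slope * x) for f, t, x, y in rows}

    meat = sum(u * u for u in score.values())
    for j in range(1, lag + 1):
        w = bartlett_weight(j, lag)
        # Pair each observation with the same firm's observation j periods earlier.
        cross = sum(u * score[(f, t - j)] for (f, t), u in score.items() if (f, t - j) in score)
        meat += 2.0 * w * cross
    return sqrt(max(meat, 0.0)) / sxx

# Hypothetical panel: two firms observed for three years each.
rows = [("a", 1, 1.0, 1.1), ("a", 2, 2.0, 2.3), ("a", 3, 3.0, 2.8),
        ("b", 1, 1.5, 2.0), ("b", 2, 2.5, 2.2), ("b", 3, 3.5, 4.1)]
se_lag0 = panel_newey_west_se(rows, 0)
se_lag2 = panel_newey_west_se(rows, 2)
print(se_lag0, se_lag2)
```

With lag set to 0 no cross products are added, so the result is the White standard error.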

Fixed Effects

Stata can automatically include a set of dummy variables, one for each value of a specified variable.

The form of the command is:

areg dependent_variable independent_variables, absorb(identifier_variable)

Where identifier_variable is a firm identifier (e.g. cusip, permno, or gvkey) if you want firm dummies or a time identifier (e.g. year) if you want year dummies. If you want to include both firm and time dummies, only one set can be included with the absorb option. The other must be included manually (e.g. by manually including a full set of time dummies among the independent variables, and then using the absorb option for the firm dummies).

To create a full set of dummy variables from an indexed variable such as year you can use the following command:

tabulate index_variable, gen(dummy_variable)

This will create a set of dummy variables (e.g. dummy_variable1, dummy_variable2, etc.), one for each distinct value of index_variable. For example, dummy_variable1 is equal to one if index_variable takes on its first value and zero otherwise.
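The naming and coding of these dummies can be mimicked in a few lines of Python (standard library only). The helper below is hypothetical and assumes the dummies are numbered in ascending order of the index variable's values, which is how tabulate lists them.

```python
def make_dummies(values, stem="dummy_variable"):
    """Return {stem1: [...], stem2: [...], ...}, one 0/1 list per distinct value."""
    levels = sorted(set(values))
    return {
        f"{stem}{k}": [1 if v == level else 0 for v in values]
        for k, level in enumerate(levels, start=1)
    }

# Hypothetical year variable with three distinct values.
year = [1991, 1992, 1991, 1993]
dummies = make_dummies(year, stem="year")
for name, column in dummies.items():
    print(name, column)
```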

A more elegant way to do this is to use the xi command (as recommended by Prof Nandy). This allows you to include a set of dummy variables for any categorical variable (e.g. year or firm), including multiple categorical variables. To include both year and firm dummies, the command is:

xi: areg dependent_variable independent_variables i.year, absorb(firm_identifier)

where year is the categorical variable for year and firm_identifier is the categorical variable for firm. The coefficients on T-1 of the year variables will be reported, but the coefficients on the firm dummy variables will not. To see the coefficients on both sets of dummy variables you would use the command:

xi: reg dependent_variable independent_variables i.year i.firm_identifier
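Absorbing a set of dummies gives the same slope as subtracting each firm's mean from the dependent and independent variables and regressing the demeaned values on each other. The Python sketch below (standard library only, hypothetical names and data) shows this within transformation for one regressor; it covers only the slope, not the standard errors or the year dummies.

```python
def within_slope(firm, x, y):
    """Slope of y on x after removing firm means (equivalent to firm dummies)."""
    totals = {}
    for f, xi, yi in zip(firm, x, y):
        t = totals.setdefault(f, [0, 0.0, 0.0])
        t[0] += 1
        t[1] += xi
        t[2] += yi

    # Demean x and y within each firm.
    x_dm = [xi - totals[f][1] / totals[f][0] for f, xi in zip(firm, x)]
    y_dm = [yi - totals[f][2] / totals[f][0] for f, yi in zip(firm, y)]
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)

# Hypothetical data: y = 10 + 2x for firm a and y = -5 + 2x for firm b.
firm = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 3.0, 5.0, 7.0]
print(within_slope(firm, x, y))
```

In this constructed example the pooled OLS slope is negative, because the firm with the larger x values has the lower intercept, while the within slope recovers the common coefficient of 2.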

Bootstrapped Standard Errors

The Stata command bootstrap will allow you to estimate the standard errors using the bootstrap method. This will run the regression multiple times and use the variability in the slope coefficients as an estimate of their standard deviation (intuitively, like I did with my simulations).

The form of this command is:

bootstrap "regress dependent_variable independent_variables" _b, reps(number_of_repetitions)

Where number_of_repetitions samples will be drawn with replacement from the original sample. Each time the regression will be run and the slope coefficients will be saved, since _b is specified. Both the average slope and its standard deviation will be reported. As specified, the bootstrapped samples will be drawn a single observation at a time. If the observations within a cluster (year or firm) are correlated, then these bootstrapped standard errors will be biased. To account for the correlation within a cluster, it is necessary to draw clusters with replacement, as opposed to observations with replacement. To do this in Stata, you need to add the cluster option. In this case, the command is:

bootstrap "regress dependent_variable independent_variables" _b, reps(number_of_repetitions) cluster(cluster_variable)
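The difference between drawing observations and drawing clusters can be sketched in Python (standard library only). The function below is a hypothetical one-regressor illustration, not Stata's implementation: it draws whole clusters with replacement, re-estimates the slope on each resample, and reports the mean and standard deviation of the slopes. Leaving cluster unset treats every observation as its own cluster, which corresponds to drawing a single observation at a time. The seed and the default number of repetitions are arbitrary choices.

```python
import random
from math import sqrt

def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def bootstrap_slope(x, y, cluster=None, reps=200, seed=12345):
    """Return (mean slope, bootstrap SE), resampling whole clusters with replacement."""
    rng = random.Random(seed)
    if cluster is None:
        cluster = list(range(len(x)))  # each observation is its own cluster

    members = {}
    for i, g in enumerate(cluster):
        members.setdefault(g, []).append(i)
    ids = list(members)

    slopes = []
    while len(slopes) < reps:
        # Draw as many clusters as the original sample has, keeping each cluster intact.
        draw = [i for g in rng.choices(ids, k=len(ids)) for i in members[g]]
        xs = [x[i] for i in draw]
        ys = [y[i] for i in draw]
        if max(xs) == min(xs):
            continue  # skip degenerate resamples with no variation in x
        slopes.append(ols_slope(xs, ys))

    mean = sum(slopes) / reps
    sd = sqrt(sum((b - mean) ** 2 for b in slopes) / (reps - 1))
    return mean, sd

# Hypothetical data: three firms with two observations each.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.9, 5.3, 5.8]
firm = ["a", "a", "b", "b", "c", "c"]
print(bootstrap_slope(x, y))                # observations drawn one at a time
print(bootstrap_slope(x, y, cluster=firm))  # firms drawn as blocks
```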


SAS Programming Instructions

Although I did not do the empirical work in SAS, Tanguy Brachet was kind enough to explain how to do some of the estimation in SAS. I am responsible for errors. A brief description follows.

Rogers or Clustered Standard Errors

The standard command for running an OLS regression in SAS and getting the Clustered/Rogers standard errors is:

proc surveyreg data=mydata;
cluster cluster_variable;
model dependent_variable = independent_variables;
run;

This produces White standard errors which are robust to within cluster correlation (Rogers or clustered standard errors), when cluster_variable is the variable by which you want to cluster. If you clustered by firm it could be cusip or gvkey. If you clustered by time it could be year.

Fixed Effects

If you want to include dummy variables for one dimension (time) and cluster by another dimension, you need to create the dummy variables. A simple way (there are more elegant ways) is as follows.

data new;
set old;
year1 = (year=1991);
year2 = (year=1992);
year3 = (year=1993);
year4 = (year=1994);
year5 = (year=1995);
run;

As SAS is not my traditional language, this code is provided just as information. I have used both the SAS and Stata code to verify that the results produced by both sets of instructions (SAS and Stata) are the same based on a test data set.


Simulated Data Sets

Many of the results in the paper are based on simulating data sets with a specified dependence (firm and/or time effect). For those who are interested in seeing how this was done or trying different data structures, I have posted a stripped down version of the simulation program. This program simulates a data set with a firm effect and then estimates the coefficients using OLS and Fama-MacBeth. The standard errors are estimated by OLS, Rogers/Clustering, and Fama-MacBeth. The results are saved for each iteration, and the means and standard deviations are calculated and displayed. To run the program, simulation.do, you need to type

do simulation firm_effect_x firm_effect_r number_of_years

where firm_effect_x is the percent of the independent variable’s variance which is due to the firm effect [i.e. rho(x)], firm_effect_r is the percent of the residual’s variance which is due to the firm effect [i.e. rho(r)], and number_of_years is the number of time periods per firm in the data set. The data set will have 5,000 observations (although this can be changed), so the number of firms is 5,000/number_of_years. Other parameters can be changed by editing the program. This example is just meant to provide intuition for how I did the simulations.

If you have questions about this page, you are welcome to e-mail me. I cannot promise an immediate response, but I will try to get back to you. Unfortunately, I can’t help you debug your Stata (or non-Stata) programs. However, by posting these instructions I hope to make it easier to use the methods discussed in my paper.
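For readers without Stata, the data-generating step of such a simulation can be sketched in Python (standard library only). This is not simulation.do: the function name, the true slope of 1, the standard deviations of 1 for x and 2 for the residual, and the treatment of the firm-effect arguments as fractions between 0 and 1 are all assumptions made for illustration. Each variable is the sum of a firm component that is fixed across years and an idiosyncratic component, scaled so that the firm component accounts for the requested share of the variance.

```python
import random
from math import sqrt

def simulate_panel(firm_effect_x, firm_effect_r, number_of_years,
                   n_obs=5000, beta=1.0, sd_x=1.0, sd_r=2.0, seed=1):
    """Return a list of (firm, year, x, y) rows with a firm effect in x and in the residual.

    firm_effect_x and firm_effect_r are variance shares between 0 and 1 (assumed).
    """
    rng = random.Random(seed)
    n_firms = n_obs // number_of_years
    rows = []
    for firm in range(n_firms):
        mu = rng.gauss(0.0, 1.0)     # firm component of x, constant over years
        gamma = rng.gauss(0.0, 1.0)  # firm component of the residual
        for year in range(number_of_years):
            x = sd_x * (sqrt(firm_effect_x) * mu
                        + sqrt(1.0 - firm_effect_x) * rng.gauss(0.0, 1.0))
            r = sd_r * (sqrt(firm_effect_r) * gamma
                        + sqrt(1.0 - firm_effect_r) * rng.gauss(0.0, 1.0))
            rows.append((firm, year, x, beta * x + r))
    return rows

# 25 percent of each variance due to the firm effect, 10 years per firm.
panel = simulate_panel(0.25, 0.25, 10)
print(len(panel), "observations on", len({row[0] for row in panel}), "firms")
```

Repeating this draw many times, estimating the slope and its standard errors on each data set, and comparing the average estimated standard error with the standard deviation of the slope estimates is the kind of check the simulation program performs.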



Forum thread for this post: https://bbs.pinggu.org/thread-261067-1-1.html
