1) There will be omitted variable bias in the OLS estimator unless the included regressor,
X, is uncorrelated with the omitted variable or the omitted variable is not a determinant of
the dependent variable, Y.
(a) In the case of omitted variable bias, write down a “true” regression equation and an
“estimated” regression equation, where the latter might suffer from omitted variable bias.
What are the population covariance conditions that must hold so that there is in fact an
omitted variable bias problem?
(b) Give a common example in applied econometrics (i.e., an application) of two regression
equations (true and estimated) where omitted variable bias is indeed present. Write down
the two equations and clearly indicate all of the variables in your example.
(c) For the example in (b), indicate the expected direction of the bias. Discuss and justify
your expectation by referring to the standard formula for omitted variable bias. Discuss the
implications of basing individual or public policy decisions on the biased coefficient.
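
As a rough illustration of the formula referred to in (c), the sketch below simulates a case where the true model is Y = b0 + b1*X + b2*W + u but only X is included. All names and numbers here (W as the omitted variable, b1 = 2, b2 = 3, the 0.5 loading of X on W) are assumptions chosen for the illustration and are not part of the question. Under these assumptions the slope from the short regression should be close to b1 + b2*cov(X, W)/var(X).

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_ovb(n=20000, beta1=2.0, beta2=3.0, seed=0):
    """Return (short-regression slope, slope predicted by the OVB formula).

    Assumed data-generating process (illustrative only):
        W ~ N(0, 1)                      omitted variable
        X = 0.5 * W + noise              included regressor, correlated with W
        Y = 1 + beta1 * X + beta2 * W + u
    """
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * wi + rng.gauss(0, 1) for wi in w]
    y = [1.0 + beta1 * xi + beta2 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

    # Simple OLS slope of Y on X alone (W is omitted).
    short_slope = cov(x, y) / cov(x, x)
    # OVB formula evaluated with sample moments: beta1 + beta2 * cov(X, W) / var(X).
    predicted = beta1 + beta2 * cov(x, w) / cov(x, x)
    return short_slope, predicted


if __name__ == "__main__":
    slope, predicted = simulate_ovb()
    print("short-regression slope:", round(slope, 3))
    print("OVB formula prediction:", round(predicted, 3))
```

In this assumed setup b2 > 0 and cov(X, W) > 0, so the short regression overstates the effect of X; flipping the sign of either term would reverse the direction of the bias.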
2) You want to find the determinants of suicide rates in the United States. To investigate
the issue, you collect state-level data for ten years. Your first idea is that the annual amount
of sunshine must be important. Stacking the data and using no fixed effects, you find no
significant relationship between suicide rates and this variable. However, sorting the suicide
rate data from highest to lowest, you notice that the states with the lowest population
density dominate the highest suicide rate category. You run another regression,
without fixed effects, and find a highly significant relationship between suicide rates and
population density. Adding some economic variables, such as state per capita income or
the state unemployment rate, does not lower the t-statistic on population density by
much. However, adding fixed entity and time effects results in an insignificant coefficient for
population density.
(a) What do you think is the cause of this change in significance? That is, which fixed
effect (state or time) do you think is primarily responsible? Does this result imply that
population density is not related to suicide rates? To answer correctly, you should think
about (and discuss) the set of omitted variables the fixed effects are capturing.
(b) Speculate on what might happen to the coefficients of the economic variables when
entity and time fixed effects are included in the regression. Which economic variables are
likely to remain significant? Why? What might happen to the standard errors?
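
One way to think about question 2 is to ask how much of a regressor's variation survives once state means are removed, since that is the only variation a state fixed effects regression can use. The sketch below computes the within-state share of total variation for a made-up panel in which each state has a persistent level plus small year-to-year noise. The 50 states, 10 years, and the noise scale of 0.05 are assumptions for illustration only, not data from the question.

```python
import random


def within_share(panel):
    """Share of total variation that remains after removing entity means.

    panel: dict mapping entity -> list of values over time.
    """
    all_values = [v for series in panel.values() for v in series]
    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)

    within_ss = 0.0
    for series in panel.values():
        entity_mean = sum(series) / len(series)
        within_ss += sum((v - entity_mean) ** 2 for v in series)
    return within_ss / total_ss


def simulate_slow_moving_regressor(n_states=50, n_years=10, seed=0):
    """Hypothetical slow-moving regressor: large cross-state differences, tiny changes over time."""
    rng = random.Random(seed)
    panel = {}
    for state in range(n_states):
        level = rng.gauss(0, 1.0)  # persistent state-specific level (assumed)
        panel[state] = [level + rng.gauss(0, 0.05) for _ in range(n_years)]
    return panel


if __name__ == "__main__":
    share = within_share(simulate_slow_moving_regressor())
    print("within-state share of variation:", round(share, 4))
```

When this share is very small, the coefficient is identified from very little variation, so its standard error tends to be large even if the underlying relationship is real.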
3) Earnings functions, whereby the log of earnings is regressed on years of education,
years of on-the-job training, and individual characteristics, have been studied for a variety
of reasons. Some studies have focused on the returns to education, others on
discrimination, union and non-union differentials, etc. For all these studies, a major concern
has been the fact that ability should enter as a determinant of earnings, but that it is close
to impossible to measure. Assume that the coefficient on years of education is the parameter
of interest. To overcome omitted variable bias, various authors have used
instrumental variables estimation techniques. For each of the potential
instruments listed below, discuss instrument validity. That is, why (or why not) might the
proposed instruments below be relevant and/or exogenous?
(a) The individual’s post code.
(b) The individual’s IQ or test-score on a work-related exam.
(c) Years of education for the individual’s mother or father.
(d) Number of siblings the individual has.
4) In order to overcome omitted variable bias, researchers often turn to instrumental variable
techniques and panel data fixed effects regressions. But there are also
additional alternatives in the econometrician’s toolkit that might be fruitfully implemented.
For example, one might be able to perform a randomized controlled experiment, or use a
differences-in-differences approach.
(a) Discuss the logic behind a randomized controlled experiment for solving the omitted
variable bias problem. If there were no problems with the experimental design and
implementation, would OLS yield unbiased and consistent coefficients? What
are some of the problems that might arise in randomized controlled experiments?
Discuss.
(b) Write down the panel-data regression representation of the differences-in-differences
approach, with a dummy for the treatment group, a dummy for the time period (before
or after program intervention), and the interaction between these two dummy
variables. Prove that the coefficient on the interaction measures differences-in-differences.
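
A numerical check can accompany (though not replace) the proof requested in (b). The sketch below assumes the regression Y = b0 + b1*Treat + b2*Post + b3*(Treat*Post) + u and computes the difference of differences in the four group means, which is what OLS on this saturated specification returns for the interaction coefficient. The variable names Treat and Post, the coefficient values, and the cell size are hypothetical choices for the illustration.

```python
import random


def did_from_means(rows):
    """Differences-in-differences from the four group means.

    rows: iterable of (treat, post, y) with treat and post in {0, 1}.
    Returns (mean[T, post] - mean[T, pre]) - (mean[C, post] - mean[C, pre]).
    """
    sums, counts = {}, {}
    for treat, post, y in rows:
        key = (treat, post)
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    m = {key: sums[key] / counts[key] for key in sums}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def simulate_did(n_per_cell=5000, b0=1.0, b1=0.5, b2=0.3, b3=2.0, seed=0):
    """Hypothetical data from Y = b0 + b1*Treat + b2*Post + b3*Treat*Post + u."""
    rng = random.Random(seed)
    rows = []
    for treat in (0, 1):
        for post in (0, 1):
            for _ in range(n_per_cell):
                y = b0 + b1 * treat + b2 * post + b3 * treat * post + rng.gauss(0, 1)
                rows.append((treat, post, y))
    return rows


if __name__ == "__main__":
    estimate = did_from_means(simulate_did())
    print("difference-in-differences estimate:", round(estimate, 3))
```

Taking conditional expectations of the assumed regression in each of the four cells gives b0, b0 + b2, b0 + b1, and b0 + b1 + b2 + b3, so the double difference isolates b3; the simulation simply confirms this with sample means.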