I'm trying to figure out how to set up a multilevel model in Stata to predict monthly health outcomes with a time-varying predictor of interest.
The model is specified as follows:
xtmixed nopregrate i.naics1 i.naics1#c.rollingunemp i.disabyear i.disabmonth || naics1: || unique: ,res(ar 1,t(yearmonth2))
Variables and data structure
- nopregrate is a monthly disability outcome
- yearmonth2 is a time variable running from 1 to 60, representing each month over 5 years
- i.disabyear and i.disabmonth represent years 1 to 5 and months 1 to 12, respectively.
- (the i. prefix tells Stata to treat the variable as categorical and enter it as a set of dummy variables)
- unique is the employer id.
- i.naics1 is a series of dummy variables for 12 industry sectors
- yearmonth2 and naics1 together identify a unique monthly unemployment rate (60 time points by 12 NAICS codes).
- i.naics1#c.rollingunemp interacts the sector dummies with their corresponding continuous measure of rolling unemployment rate for each of the 60 time periods.
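To make the structure described in the list above concrete, here is a minimal sketch of how it could be checked. It assumes the dataset is already in memory with the variable names listed above; cell_tag is a hypothetical helper variable that I create and drop, not something in my data.

```stata
* Sketch only: assumes the data described above are loaded.
* rollingunemp should take a single value within each sector-by-month cell.
* The assert stops with an error if any cell holds more than one value
* (including a mix of missing and nonmissing values).
bysort naics1 yearmonth2 (rollingunemp): assert rollingunemp[1] == rollingunemp[_N]

* Count the distinct sector-by-month cells that actually appear in the data.
* With 12 sectors and 60 months there can be at most 12 * 60 cells.
egen byte cell_tag = tag(naics1 yearmonth2)   // cell_tag is a hypothetical helper
count if cell_tag
display "maximum possible cells: " 12*60
drop cell_tag
```

If the assert passes, the unemployment rate varies only across sector and month, never across employers within the same sector and month.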
What's tripping me up is the time-varying aspect of the unemployment rate, and the fact that these time-varying rates are nested within industry (NAICS is the industry id code). It seems there should be a time-varying slope specified across the industry and time levels, so that the relationship between the monthly unemployment rate and the monthly disability outcome can differ by industry sector over the 60 time points.
The levels in the model are 12 industry sectors, 10,520 employers (unique is the employer id), and 225,468 observations at the observation level. The panel is unbalanced because of missing values over time; otherwise we would expect 631,200 observations (60 time points x 10,520 employers = 631,200).
I'm having trouble following the Stata syntax and have a sense that the time variation is not being handled correctly in the model, but I don't have any alternative Stata syntax to offer.
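In case it helps anyone reproduce the counts above, this is a rough sketch of how the unbalanced structure could be inspected. It assumes unique is numeric and that each employer has at most one row per value of yearmonth2; if either assumption fails, xtset will report an error.

```stata
* Sketch only: assumes unique is a numeric employer id and that
* (unique, yearmonth2) identifies a single observation.
xtset unique yearmonth2

* Summarize how many of the 60 months each employer contributes
* and the most common participation patterns.
xtdescribe

* Compare the observed number of rows with the fully balanced count.
count
display "balanced panel would have: " 60*10520
```

This only describes the data layout; it does not change how the model itself treats the time variation.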