Hi, I am doing an analysis of academic outcomes for children in private and state schools, and am trying to use the propensity score approach.
I want to match on the exact propensity score, dropping unmatched cases from the sample. I did try the binning approach, but since my dataset is large (more than 10,000 cases), it was impossible to balance the bins.
I have calculated the propensity score by saving the predicted probabilities ('save predicted values - probabilities') from a binary logistic regression, with the 'treatment' (state/private school) as the dependent variable and a set of predictors (social class, etc.), as follows:
LOGISTIC REGRESSION private
  /METHOD = ENTER region3s faclas7m educatio famtrad kidno mobooks moint Zabilit11 teacha_1 teachmiss abilmiss
  /CONTRAST (region3s)=Indicator
  /CONTRAST (faclas7m)=Indicator
  /CONTRAST (educatio)=Indicator
  /CONTRAST (famtrad)=Indicator
  /CONTRAST (kidno)=Indicator
  /CONTRAST (mobooks)=Indicator
  /CONTRAST (moint)=Indicator
  /CONTRAST (abilmiss)=Indicator
  /CONTRAST (teachmiss)=Indicator
  /SAVE = PRED
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
My problem is that the saved propensity score takes a huge number of distinct values: more than 1000, so I can't even run a crosstabs on it. I can run a frequency table, but it is too long to print out. My questions are:
- Am I doing something wrong?
- Is it acceptable to group the propensity scores together - e.g. into percentiles or deciles, before dropping unmatched cases, or would this defeat the object?
- Has anyone written syntax to identify/drop unmatched cases? (Doing it by hand is a daunting task with so many values!)
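To make the last two questions concrete, here is a minimal sketch of the kind of logic I have in mind, written in Python rather than SPSS purely for illustration. The function name drop_unmatched, the n_bands parameter and the toy cases are all invented for this example and are not part of my actual data or syntax. It groups scores into ten equal-width bands (not true deciles) and drops any band that does not contain both state and private pupils; I am not claiming this is the right way to do the matching.

```python
from collections import defaultdict

def drop_unmatched(cases, n_bands=10):
    """Keep only cases whose propensity-score band contains both groups.

    cases:   list of (case_id, private, pscore) tuples, with private 0 or 1.
    n_bands: number of equal-width bands on [0, 1] (hypothetical parameter).
    """
    bands = defaultdict(list)
    for case_id, private, pscore in cases:
        # min() keeps a score of exactly 1.0 inside the top band
        band = min(int(pscore * n_bands), n_bands - 1)
        bands[band].append((case_id, private, pscore))

    kept = []
    for band in sorted(bands):
        members = bands[band]
        groups = {private for _, private, _ in members}
        if groups == {0, 1}:  # band holds both state and private pupils
            kept.extend(members)
    return kept

# Toy cases, invented purely for illustration
toy = [
    (1, 0, 0.03), (2, 0, 0.07),  # band 0: state only, dropped
    (3, 0, 0.12), (4, 1, 0.18),  # band 1: both groups, kept
    (5, 1, 0.55), (6, 0, 0.51),  # band 5: both groups, kept
    (7, 1, 0.93),                # band 9: private only, dropped
]
kept_ids = [case_id for case_id, _, _ in drop_unmatched(toy)]
print(kept_ids)  # [3, 4, 5, 6]
```

On the toy cases this keeps cases 3 to 6 and discards the bands that contain only state or only private pupils. Coarser or finer bands can be tried by changing n_bands.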
Many Thanks,
Alice
Dr. Alice Sullivan,
Centre for Longitudinal Studies, Institute of Education, 20 Bedford Way, LONDON WC1H 0AL