Using Gee Method to analyze whether Church Life
Affects Life Expectation
Kaiqing Fan
Introduction
The Iowa 65+ Rural Health Study (Cornoni-Huntley et al., 1986) followed a cohort of elderly males and females over a six-year period (at years 0, 3, and 6). One question asked at each survey was “Do you regularly (at least once a month) attend religious meetings or services?” At each of three surveys, the response to one of the variables of interest, church attendance, was classified as “yes” if the subject attends church regularly, or “no” if the subject does not attend church regularly. Table 7.6 displays the data from the 1311 females and 662 males who responded to all three surveys. Interest focuses on determining whether church attendance rates are changing over time, whether the attendance rates differ between females and males, and whether the observed patterns of change over time are the same for females and males.
Because of the binary nature of the response variable, this project used the GEE approach and the independence working correlation model to fit a logistic model for predicting church attendance as a function of sex or survey year or the interaction. Unstructured, exchangeable and independence working correlation models were used to fit the data. Also to compare this logistic model with a reduced model based on the tests of nonlinearity and parallelism, and a marginal logit model based on Weighted Least Squares to choose more appropriate model for the data.
Statistical Analysis
1) Used GEE approach to fit a logistic model
Initial analysis
Sex=0 , for female; 1, for male;
For the year effect we suppose:
year0=1 if year=0, 0 if year≠0;
year3=1 if year=3, 0 if year≠3;
year6=1 if year=6, 0 if year≠6.
To analyze the data, firstly, interaction between sex and survey year should be considered. And secondly, a working correlation matrix should be specified for GEE approach. For purposes of comparison, the independent, exchangeable and unstructured working correlation structures will be considered.
Fit a logistic model
For model 1, that is, full model, covariates sex , year and their interaction are included. SAS output gives us the same Wald statistics Chi-Square test result for independent, unstructured and exchangeable working correlation structure. Three of the Wald statistics for type 3 GEE analysis give us the same result to test the main effects of sex and year and their interaction.
Wald Statistics For Type 3 GEE Analysis
Chi-
Source DF Square Pr > ChiSq
sex 1 32.42 <.0001
year 2 38.75 <.0001
sex*year 2 2.47 0.2909
The p-value of the interaction is 0.2909, greater than 0.05, there is no evidence that sex*year has effect on dependent variable yij. i.e. there are no year*sex effect. So we do not need to keep the interaction in our model. This is also called test of parallelism.
For model 2, that is, the reduced model, unstructured working correlation structure gives the Wald Chi-Square test for the main effects of sex and year.
Wald Statistics For Type 3 GEE Analysis
Chi-
Source DF Square Pr > ChiSq
sex 1 31.55 <.0001
year 2 41.82 <.0001
Both the p-values of sex and year are significant, there are significant evidence that covariates sex and year have effects on dependent variable yij.
In the independent and exchangeable working correlation structure, SAS output gives us very similar results, showed as following:
For independent structure,
Wald Statistics For Type 3 GEE Analysis
Chi-
Source DF Square Pr > ChiSq
sex 1 32.17 <.0001
year 2 41.78 <.0001
For exchangeable structure,
Wald Statistics For Type 3 GEE Analysis
Chi-
Source DF Square Pr > ChiSq
sex 1 31.39 <.0001
year 2 41.84 <.0001
All of the p-values of sex and year in these three of them are strongly significant. All of the Wald Chi-square tests obtained from independent, exchangeable and unstructured working correlation structures give us the same result.
Parameter estimated
Actually, the three working correlation structures give us very similar parameter estimates for model 2. Also the standard error, confidence limits and Z-values are very similar. I’d like to choose the parameter estimates of the unstructured structure to do my further analysis.
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept 0.6217 0.0780 0.4688 0.7746 7.97 <.0001
sex 0 0.5338 0.0950 0.3475 0.7200 5.62 <.0001
sex 1 0.0000 0.0000 0.0000 0.0000 . .
year 0 0.2886 0.0470 0.1964 0.3808 6.14 <.0001
year 3 0.2216 0.0427 0.1379 0.3053 5.19 <.0001
year 6 0.0000 0.0000 0.0000 0.0000 . .
For the exchangeable structure:
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept 0.6221 0.0781 0.4691 0.7751 7.97 <.0001
sex 0 0.5328 0.0951 0.3464 0.7192 5.60 <.0001
sex 1 0.0000 0.0000 0.0000 0.0000 . .
year 0 0.2886 0.0470 0.1965 0.3807 6.14 <.0001
year 3 0.2217 0.0427 0.1380 0.3054 5.19 <.0001
year 6 0.0000 0.0000 0.0000 0.0000 . .
For the independent structure:
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z| odds ratios
Intercept 0.6178 0.0782 0.4645 0.7712 7.90 <.0001
sex 0 0.5412 0.0954 0.3542 0.7282 5.67 <.0001 1.72
sex 1 0.0000 0.0000 0.0000 0.0000 . .
year 0 0.2887 0.0471 0.1965 0.3810 6.13 <.0001 1.33
year 3 0.2218 0.0427 0.1381 0.3056 5.19 <.0001 1.25
year 6 0.0000 0.0000 0.0000 0.0000 . .
From the independent structure table above, the odds ratios of attending church regularly versus not attending are greater for females than for males, and decreases as survey year increases. The odds ratios of attending churched regularly are estimated to be 1.72 times as high for females as for males. The odds ratios of attending church regularly are 1.33 times as high for year0 as for year3, 1.25 times as high for year3 as for year6.
2, weighted least square approach to fit a marginal logit model
For the notation of the linear model using WLS approach, let y=(y1,y2,….,yn)ˈ be a nx1 vector of observations. Suppose that y~Nn(Xβ,V), then the linear model is:
Y=Xβ+ε , where x is an nxp design model matrix with p=<n, β is a px1 vector of parameters,
ε~Nn(On,V).
Our data belong to repeated measures, 2 response levels and 2 populations.
For model 1, it includes sex (i.e. male, female), trial (i.e. year) and sex*trial, we use proc catmod procedure, get the following output:
Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Intercept 1 6154.13 <.0001
sex 1 30.04 <.0001
Trial 2 34.83 <.0001
sex*Trial 2 0.87 0.6476
Residual 0 . .
The ANOVA Chi-square test gives all of their p-values, only the p-value of sex*trial is non-significant, so we should drop sex*trial in our model.
The parameter estimates and Chi-Square tests based on WLS is shown as following:
Saturated Model
The CATMOD Procedure
Analysis of Weighted Least Squares Estimates
Standard Chi-
Effect Parameter Estimate Error Square Pr > ChiSq
Intercept 1 0.7384 0.00941 6154.13 <.0001
sex 2 0.0516 0.00941 30.04 <.0001
Trial 3 0.0201 0.00488 17.04 <.0001
4 0.0106 0.00454 5.47 0.0194
sex*Trial 5 0.00453 0.00488 0.86 0.3534
6 -0.00197 0.00454 0.19 0.6641
It is very clear that the sex*trial parameter are replaced by trial(sex=1), trial(sex=2), both of them are non-significant, we do not need to retain them in the final model.
For final model,
Response Functions and Design Matrix
Function Response Design Matrix
Sample Number Function 1 2 3 4
1 1 0.81465 1 1 0 0
2 0.79863 1 1 0 0
3 0.75667 1 1 0 0
2 1 0.70242 1 -1 1 0
2 0.69940 1 -1 0 1
3 0.65861 1 -1 -1 -1
This output displays the design matrix resulting from the nested effect.
Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Intercept 1 6207.26 <.0001
sex 1 32.71 <.0001
Trial(sex=2) 2 11.00 0.0041
Residual 2 30.46 <.0001
The ANOVA table in the above shows that the final model does not fit well, that there is a significant sex effect, and that there is a significant trial(sex=2) effect, since the goodness-of-fit statistic (the residual Chi-square) is significant.
Conclusion
1, The reduced logistic model obtained from GEE approach for the our dataset is
Log(μij/(1- μij))=0.6217+0.5338*sex+0.2886*year0+0.2216*year3.
In this model, the odds ratios of attending church regularly versus not attending are higher for females than for males, and goes down as survey year goes up.
2, Compared the reduced logistic model obtained from GEE approach with the marginal logit model obtained from WLS approach, it is clearly that we should choose the model produced by GEE, drop the model chosen by WLS since the goodness-of-fit test of ANOVA from WLS already gave us the result: the marginal logit model does not fit the data well.
Reference
1, Code of GEE:
data fan;
input sex year0 year3 year6 count;
datalines;
0 1 1 1 158
0 1 1 0 30
0 1 0 1 22
.....
1 0 1 1 26
1 0 1 0 12
1 0 0 1 36
1 0 0 0 391
;
run;
data fan1;
set fan;
retain sex year0 year3 year6;
if _N_=1 then
do i=1 to 158;output;end;
else if _N_=2 then
do i=159 to 188;output;end;
else if _N_=3 then
do i=189 to 210;output;end;
else if _N_=4 then
do i=211 to 243;output;end;
else if _N_=5 then
do i=244 to 294;output;end;
else if _N_=6 then
do i=295 to 319;output;end;
else if _N_=7 then
do i=320 to 407;output;end;
else if _N_=8 then
do i=408 to 1311;output;end;
else if _N_=9 then
do i=1312 to 1454;output;end;
else if _N_=10 then
do i=1455 to 1472;output;end;
else if _N_=11 then
do i=1473 to 1493;output;end;
else if _N_=12 then
do i=1494 to 1508;output;end;
else if _N_=13 then
do i=1509 to 1534;output;end;
else if _N_=14 then
do i=1535 to 1546;output;end;
else if _N_=15 then
do i=1547 to 1582;output;end;
else if _N_=16 then
do i=1583 to 1973;output;end;
run;
data fan2(drop=count year0 year3 year6);
set fan1(rename=(i=id));
year=0; outcome=year0; output;
year=3; outcome=year3; output;
year=6; outcome=year6; output;
run;
proc genmod data=fan2;
class id sex year;
title 'model1 for problem9.6(indepence structure)';
model outcome=sex year sex*year/ d=bin type3 wald;
repeated subject=id / type=ind;
run;
proc genmod data=fan2;
class id sex year;
title 'model1 for problem9.6(unstructured)';
model outcome=sex year sex*year/ d=bin type3 wald;
repeated subject=id / type=unstr;
run;
proc genmod data=fan2;
class id sex year;
title 'model1 for problem9.6(exchangeable structure)';
model outcome=sex year sex*year/ d=bin type3 wald;
repeated subject=id / type=exch;
run;
/*
proc genmod data=fan2;
class id sex year;
title 'model1 for problem9.6(exchangeable structure)';
model outcome=sex year sex*year/ d=bin type3 wald;
repeated subject=id / type=toep;
run; */
proc genmod data=fan2;
class id sex year;
title 'model2 for problem9.6(independence structure)';
model outcome=sex year/ d=bin type3 wald;
repeated subject=id / type=ind;
run;
proc genmod data=fan2;
class id sex year;
title 'model2 for problem9.6(unstructure)';
model outcome=sex year/ d=bin type3 wald;
repeated subject=id / type=unstr;
run;
proc genmod data=fan2;
class id sex year;
title 'model2 for problem9.6(exchangeable structure)';
model outcome=sex year/ d=bin type3 wald;
repeated subject=id / type=exch;
run;
2, CODE of WLS
data fan;
input sex year0 year3 year6 count;
datalines;
1 1 1 1 158
1 1 1 0 30
1 1 0 1 22
......
2 0 1 1 26
2 0 1 0 12
2 0 0 1 36
2 0 0 0 391
;
run;
proc catmod data=fan;
weight count;
response marginals;
model year0*year3*year6=sex _response_ sex*_response_
/ freq;
repeated Trial 3;
title2 'Saturated Model';
run;
model year0*year3*year6=sex _response_(sex=2)
/ noprofile noparm design;
title2 'Trial Nested within Group 3';
quit;
3, code of WLS from Dr Li
options ls=85 ps=65 nodate;
data dat;
input sex $ year0 $ year3 $ year6 $ count;
datalines;
F N N N 158
F N N Y 30
F N Y N 22
......
M Y N N 26
M Y N Y 12
M Y Y N 36
M Y Y Y 391
;
proc catmod data=dat order=data;
weight count;
population sex;
response marginals;
model year0*year3*year6=sex/averaged;
repeated year 3;
title'No year effect';
run;
proc catmod data=dat order=data;
weight count;
population sex;
response marginals;
model year0*year3*year6=(1 -1 1 0 0 0,
1 0 -2 0 0 0,
1 1 1 0 0 0,
0 0 0 1 -1 1,
0 0 0 1 0 -2,
0 0 0 1 1 1);
title "fan";
contrast 'no year effect' all_parms 0 1 0 0 0 0,
all_parms 0 0 1 0 0 0,
all_parms 0 0 0 0 1 0,
all_parms 0 0 0 0 0 1;
contrast 'No year effect in females' all_parms 0 1 0 0 0 0,
all_parms 0 0 1 0 0 0;
contrast 'No year effect in males' all_parms 0 0 0 0 1 0,
all_parms 0 0 0 0 0 1;
contrast 'No linear year effect' all_parms 0 1 0 0 0 0,
all_parms 0 0 0 0 1 0;
contrast 'No linear year effect in females' all_parms 0 1 0 0 0 0;
contrast 'No linear year effect in males' all_parms 0 0 0 0 1 0;
contrast 'No quadratic year effect' all_parms 0 0 1 0 0 0,
all_parms 0 0 0 0 0 1;
contrast 'No quadratic year effect in females' all_parms 0 0 1 0 0 0;
contrast 'No quadratic year effect in males' all_parms 0 0 0 0 0 1;
contrast 'Equality of intercepts' all_parms 1 0 0 -1 0 0;
contrast 'Equality of linear year effect' all_parms 0 1 0 0 -1 0;
contrast 'Equality of quadratic year effect' all_parms 0 0 1 0 0 -1;
run;
proc catmod data=dat;
weight count;
response marginals;
model year0*year3*year6=sex;
repeated year 3;
run;
quit;
proc catmod data=dat;
weight count;
response marginals;
model year0*year3*year6=_response_;
repeated year 3;
run;
quit;