Experts, please help, this is extremely urgent. If you can make sense of these, please contact aaronluo921@hotmail.com or phone: 15692402707.
Question 1: In 2004, the state of North Carolina released a large data set containing information on
births recorded in this state. A random sample of observations from this data set is available
at . We have observations on 13 different variables, some categorical and some numerical. The
meaning of each variable is as follows:
fage: father's age in years.
mage: mother's age in years.
mature: maturity status of mother.
weeks: length of pregnancy in weeks.
premie: whether the birth was classified as premature (premie) or full-term.
visits: number of hospital visits during pregnancy.
marital: whether mother is married or not married at birth.
gained: weight gained by mother during pregnancy in pounds.
weight: weight of the baby at birth in pounds.
lowbirthweight: whether baby was classified as low birthweight (low) or not (not low).
gender: gender of the baby, female or male.
habit: status of the mother as a nonsmoker or a smoker.
whitemom: whether mom is white or not white.
Pick a pair of numerical and categorical variables and come up with a research question
evaluating the relationship between these variables. Formulate the question in a way that it
can be answered using a hypothesis test and/or a confidence interval.
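
One possible research question for this setup: does mean birth weight (weight) differ between smokers and nonsmokers (habit)? Below is a minimal Python sketch of a two-sample comparison with unequal variances. It is not a worked answer: the data are synthetic stand-ins (the real sample is not reproduced here), the helper name `welch_summary` is made up for illustration, and a normal approximation replaces the t distribution, which is only reasonable for fairly large groups.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_summary(x, y, conf_level=0.95):
    """Compare two group means (x minus y) with an unequal-variance standard error.

    Uses a normal approximation for the p-value and the interval,
    so it is a large-sample sketch rather than an exact t procedure.
    """
    diff = mean(x) - mean(y)
    # Standard error of the difference, allowing different group variances
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = diff / se
    # Two-sided p-value for H0: the two population means are equal
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return {
        "diff": diff,
        "se": se,
        "z": z,
        "p_value": p_value,
        "ci": (diff - crit * se, diff + crit * se),
    }


# Synthetic stand-in data (NOT the North Carolina sample): weights in pounds
random.seed(0)
nonsmoker = [random.gauss(7.1, 1.5) for _ in range(400)]
smoker = [random.gauss(6.8, 1.4) for _ in range(100)]

result = welch_summary(nonsmoker, smoker)
print(result)
```

If the confidence interval for the difference excludes 0, the same data would reject the null hypothesis of equal means at the matching significance level, so the test and the interval answer the same question.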
Question 2: The dataset contains the results of a study on gambling amongst
teenagers in the UK.
(a) Estimate a linear regression model with gambled amount (gamble) as the response, and
socioeconomic status, income (given in pounds per week), sex (with 0 being males) and
verbal score as explanatory variables. Present the output from the estimation.
(b) Which variables are statistically significant at the 0.05 significance level?
(c) Using the same significance level, test the hypothesis that the effect of income is equal to
3.
(d) Construct a 90% confidence interval for the verbal score variable. What can you conclude
about the relationship between the verbal score and gambling?
(e) Holding all other predictors constant, what would be the difference in predicted expenditure
on gambling for a male compared to a female?
(f) Using the predict() function, predict the amount that a male with average status, income
and verbal score would gamble.
(g) What percentage of variation in the response is explained by the given predictors?
(h) Which observation has the largest positive residual?
(i) Plot a density of the residuals; describe the distribution in terms of skew.
(j) Compute the correlation of the residuals and the income variable. What did you expect to
find, and on which regression assumption did you base this expectation?
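
The quantities behind parts (a) to (j) can be sketched as follows. The question itself asks for predict(), which suggests the course uses R; this Python/NumPy version only illustrates the arithmetic. Everything here is an assumption for illustration: the data are randomly generated, not the study's data, and `fit_ols` is a hypothetical helper name.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    xtx_inv = np.linalg.inv(X1.T @ X1)
    beta = xtx_inv @ X1.T @ y                      # coefficient estimates
    resid = y - X1 @ beta                          # residuals
    dof = len(y) - X1.shape[1]                     # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))        # coefficient standard errors
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, resid, r2, dof


# Synthetic stand-in data (NOT the gambling study); sex is coded 0 = male
rng = np.random.default_rng(0)
n = 60
status = rng.uniform(20, 80, n)
income = rng.uniform(0, 15, n)
sex = rng.integers(0, 2, n).astype(float)
verbal = rng.integers(1, 11, n).astype(float)
gamble = 5 + 4 * income - 20 * sex - 2 * verbal + rng.normal(0, 15, n)

X = np.column_stack([status, income, sex, verbal])
beta, se, resid, r2, dof = fit_ols(X, gamble)      # order: intercept, status, income, sex, verbal

t_income = (beta[2] - 3) / se[2]                   # (c) t statistic for H0: income effect = 3
male_minus_female = -beta[3]                       # (e) sex = 0 minus sex = 1
x_new = np.array([1.0, status.mean(), income.mean(), 0.0, verbal.mean()])
pred_male = x_new @ beta                           # (f) male at average status, income, verbal
largest = int(np.argmax(resid))                    # (h) index of largest positive residual
corr = np.corrcoef(resid, income)[0, 1]            # (j) residual/income correlation

print(f"R^2 = {r2:.3f}, t(income = 3) = {t_income:.2f}, corr = {corr:.2e}")
```

For (c) and (d), the t statistic and the interval `beta[4] ± t* × se[4]` need a critical value from the t distribution with `dof` degrees of freedom, which this sketch does not compute. For (j), least squares with an intercept makes the residuals orthogonal to every predictor, so the computed correlation is zero up to rounding error.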
There are two more questions!