Education 161 Winter 2000
Assignment 3 Due Feb 22, 2000
Note: data files are available in either of two locations:
path: /usr/class/ed161/[data file]
or using web-services at URL
http://www.stanford.edu/class/ed161/hw/[data file]
1. For the NELS data (see file description in Course Examples Index)
obtain the correlation between 10th grade science achievement scores
and 8th grade science scores. Does this correlation change
when it is computed for males and females separately?
[Note: you will likely want to use the Minitab
copy command along with the "use" subcommand.]
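Outside Minitab, the overall-versus-subgroup comparison can be sketched in Python. The function names and the toy scores below are hypothetical illustrations, not the NELS variables. The point is only that each within-group correlation is computed on its own subset of cases, which is what the copy/use approach does in Minitab.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_by_group(x, y, group):
    """Correlation computed separately within each level of `group` (hypothetical helper)."""
    result = {}
    for level in sorted(set(group)):
        xs = [a for a, g in zip(x, group) if g == level]
        ys = [b for b, g in zip(y, group) if g == level]
        result[level] = pearson_r(xs, ys)
    return result


# Hypothetical toy data (not NELS): 8th- and 10th-grade scores plus sex.
sci8 = [48, 52, 55, 60, 47, 53, 58, 62]
sci10 = [50, 55, 54, 63, 49, 58, 57, 66]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(round(pearson_r(sci8, sci10), 3))
print({k: round(v, 3) for k, v in r_by_group(sci8, sci10, sex).items()})
```

Subgroup correlations can differ from the pooled one in either direction. Part of the reason is that the pooled value also reflects differences between the group means.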
--------------------------------------------------------------------
2. The file 'hw3p2.dat' contains X (in C1) and Y (in C2).
Estimate E(Y|X) and give
interval estimates for the intercept and slope parameters.
Examine the effects of anomalous and/or influential
observations on the fits and the parameter estimates.
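The quantities behind this problem can be sketched in a few lines of Python: the least-squares line, t intervals b ± t·se(b) for intercept and slope, and two common influence measures (leverage and Cook's distance). The function name is hypothetical. The t critical value must be supplied by the caller from a t table, since the Python standard library has no t quantiles.

```python
import math


def simple_ols(x, y, t_crit):
    """Least-squares line y = b0 + b1*x with t intervals and influence measures.

    t_crit is the two-sided t critical value on n - 2 df, supplied by the caller.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)  # MSE
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    # Cook's distance with p = 2 estimated parameters.
    cooks = [e * e / (2 * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return {
        "b0": b0,
        "b1": b1,
        "ci_b0": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "ci_b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "leverage": leverage,
        "cooks_d": cooks,
    }


# Hypothetical toy data; 3.182 is the .975 t quantile on 3 df.
fit = simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], t_crit=3.182)
print(fit["b0"], fit["b1"], fit["ci_b1"])
```

One way to see an observation's effect is to refit with a high-leverage or large-Cook's-D point deleted, then compare the coefficients and intervals.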
-----------------------------------------------------------
3. The file 'prognosis.dat'
contains data on days hospitalized (X in C1) and
a prognosis index (Y in C2) for 15 severely injured patients.
A hospital administrator wants to develop a prediction equation
for the long-term prognosis using the length of the hospital stay.
(a) Develop a prediction equation by straightening the
scatterplot and using a straight-line fit.
Give the fit and an interval estimate for
a patient hospitalized 10 days.
Repeat for 60 days of hospitalization.
(b) For this same problem, develop a prediction equation for the
long-term prognosis by fitting a polynomial.
Compare the fits, and the interval estimates for the expected prognosis
of a patient hospitalized 10 days, from the
two approaches: the polynomial fit vs. straightening the scatterplot
and using a straight-line fit (part a). Repeat the comparison for
60 days of hospitalization.
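For part (a), one possible straightening is to regress log(Y) on X. This is an assumption made for illustration; use whatever re-expression your scatterplot suggests. Because part (a) asks about an individual patient, the sketch below gives a prediction interval: the extra "1 +" term under the square root accounts for that patient's own error. Exponentiating the fit and the endpoints returns to the prognosis scale. The back-transformed fit estimates a typical (median) prognosis rather than a mean if the log-scale errors are symmetric. The function name is hypothetical, and t_crit comes from a t table.

```python
import math


def log_y_prediction(x, y, x0, t_crit):
    """Fit log(y) = b0 + b1*x and give a prediction interval at x0.

    Assumes all y > 0. t_crit is the two-sided t critical value on n - 2 df.
    Returns the back-transformed fit and interval on the original y scale.
    """
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, ly)) / sxx
    b0 = my - b1 * mx
    s2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, ly)) / (n - 2)
    fit = b0 + b1 * x0
    # Prediction (not confidence) interval: "1 +" is the new patient's own error.
    se_pred = math.sqrt(s2 * (1 + 1 / n + (x0 - mx) ** 2 / sxx))
    lo, hi = fit - t_crit * se_pred, fit + t_crit * se_pred
    return math.exp(fit), (math.exp(lo), math.exp(hi))
```

If 60 days lies beyond the observed stays, any interval there is an extrapolation and depends heavily on the chosen transformation.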
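For part (b), a polynomial fit is an ordinary multiple regression on X and X squared. In Minitab you could build the squared column with let, as is done elsewhere in this assignment. The sketch below fits a quadratic, which is an assumed degree, by solving the normal equations. It returns a confidence interval for the expected prognosis at x0, matching the wording of part (b), using se(fit)² = s²·z′(X′X)⁻¹z with z = (1, x0, x0²). The helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a @ v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v


def quad_fit_ci(x, y, x0, t_crit):
    """Quadratic least-squares fit; CI for the mean response at x0 (t on n - 3 df)."""
    rows = [[1.0, a, a * a] for a in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * b for r, b in zip(rows, y)) for i in range(3)]
    beta = solve(xtx, xty)
    resid = [b - sum(bi * ri for bi, ri in zip(beta, r)) for r, b in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (len(x) - 3)
    z = [1.0, x0, x0 * x0]
    fit = sum(bi * zi for bi, zi in zip(beta, z))
    # z' (X'X)^{-1} z computed by solving (X'X) v = z rather than inverting.
    se_fit = math.sqrt(s2 * sum(zi * vi for zi, vi in zip(z, solve(xtx, z))))
    return beta, fit, (fit - t_crit * se_fit, fit + t_crit * se_fit)
```

To compare with part (a) on equal terms, drop the "1 +" from the part (a) sketch, which gives the confidence interval for the mean there as well.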
--------------------------------------------
4. Bodyfat data revisited
By referring to Course Example file bodyfat.out
or by redoing the analyses, use this
example to once again illustrate the vagaries of multiple
regression coefficients (and of improper attempts to interpret
them).
Which of the three predictors (triceps X1, thigh X2, or
midarm X3) is the best single predictor of bodyfat?
What is the regression coefficient for that predictor in
a single-predictor equation? What is the corresponding
t-statistic for that coefficient?
Now consider the regression using both triceps and thigh as
predictors. Compare the coefficients (and their t-statistics)
from this multiple regression with the corresponding single
predictor equations.
Now consider the multiple regression using all three predictors.
For triceps and thigh, compare the coefficients (and their
t-statistics) from this multiple regression with the results from
the previous regression equations. To decrease bodyfat does one puff
up one's thighs?
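One way to see why coefficients move is to look at the two-predictor case. There the standardized coefficients depend only on the three correlations: β1 = (r_y1 − r_y2·r_12)/(1 − r_12²), and symmetrically for β2. The sketch below uses made-up correlations, not the bodyfat values. It shows a predictor with a positive simple correlation receiving a negative coefficient once a highly correlated partner is in the equation. The function name is hypothetical.

```python
def std_coefs_two_predictors(r_y1, r_y2, r_12):
    """Standardized (beta-weight) coefficients for y on x1 and x2 from correlations.

    Raw coefficients follow as b_j = beta_j * s_y / s_j.
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical correlations: both predictors correlate positively with y,
# but they are highly correlated with each other (r_12 = 0.90).
print(std_coefs_two_predictors(0.80, 0.70, 0.90))
```

As r_12 approaches 1, the denominator shrinks, and both the coefficients and their standard errors become unstable.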
------------------------------------------------------------
5. Patient Satisfaction Data. The data reside in file patient.dat.
A hospital administrator wished to study the relation
between patient satisfaction Y (in C1) and X1, the patient's age (in C2),
X2, an index of severity of illness (in C3), and X3,
anxiety level (in C4), where larger values of Y, X2, and X3 indicate
more satisfaction, more severe illness, and more anxiety, respectively.
a. Prepare a stem-and-leaf plot for each of the predictor
variables. Are any noteworthy features revealed by these plots?
b. Fit a multiple regression model (flat plane) with the three predictor
variables and state the estimated regression function. How
is the coefficient for X2 interpreted here?
c. Obtain the residuals and prepare a box plot of the residuals. Do
there appear to be any outliers?
d. Using the three-predictor regression model from part b, test
whether there is a regression relation; use a Type I
error rate of .10. State the alternatives, decision rule, and
conclusion.
e. For the fit in part b, verify that the regression coefficients can
be obtained from straight-line fits to the corresponding partial
regression plots. Use the coefficient for X2 as your example.
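For part (c), a box plot flags points beyond the fences Q1 − 1.5·IQR and Q3 + 1.5·IQR. The sketch below applies that rule to a list of residuals using Python's `statistics.quantiles` with its default ('exclusive') method. Quartile conventions vary between packages, so a borderline point may be classified differently than in Minitab's plot. The function name and toy residuals are hypothetical.

```python
import statistics


def boxplot_outliers(resid):
    """Residuals beyond the 1.5 * IQR fences used by a standard box plot."""
    q1, _, q3 = statistics.quantiles(resid, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [e for e in resid if e < lower or e > upper]


# Hypothetical residuals with one obvious stray value.
print(boxplot_outliers([-1, -0.5, 0, 0.2, 0.4, 0.5, 6.0]))
```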
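Part (d) is the overall F test. The alternatives are H0: β1 = β2 = β3 = 0 versus Ha: not all βk equal zero. The test statistic is F* = MSR/MSE, and the decision rule is: conclude Ha if F* > F(.90; 3, n − 4). A minimal sketch follows, with the critical value supplied from an F table. The function name and sums of squares are hypothetical, not the patient.dat results.

```python
def overall_f_test(ssr, sse, n, p, f_crit):
    """Overall F test of H0: beta1 = ... = beta_p = 0 for a model with p predictors.

    f_crit is F(1 - alpha; p, n - p - 1), supplied by the caller from an F table.
    Returns F* and whether H0 is rejected (conclude Ha) at that level.
    """
    msr = ssr / p
    mse = sse / (n - p - 1)
    f_star = msr / mse
    return f_star, f_star > f_crit


# Hypothetical sums of squares; 2.38 is the tabled F(.90; 3, 20) value.
print(overall_f_test(ssr=300.0, sse=200.0, n=24, p=3, f_crit=2.38))
```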
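Part (e) rests on a general identity: the X2 coefficient in the full model equals the slope obtained by regressing the residuals of Y (on X1 and X3) against the residuals of X2 (on X1 and X3). This is the straight line through the partial regression plot. The sketch below checks the identity numerically on hypothetical data standing in for Y, X1, X2, X3. All helper names are illustrative.

```python
def solve(a, b):
    """Solve the square system a @ v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v


def ols(cols, y):
    """Coefficients [b0, b1, ...] for y on an intercept plus the predictor columns."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(cols, y):
    """Residuals of y after regressing it on an intercept plus cols."""
    b = ols(cols, y)
    return [yi - b[0] - sum(bj * c[i] for bj, c in zip(b[1:], cols))
            for i, yi in enumerate(y)]


def added_variable_slope(y, x_focus, others):
    """Slope of the partial regression plot: residuals of y vs residuals of x_focus."""
    ey = residuals(others, y)
    ex = residuals(others, x_focus)
    # Both residual sets have mean zero, so the through-origin slope equals the OLS slope.
    return sum(a * b for a, b in zip(ex, ey)) / sum(a * a for a in ex)


# Hypothetical data standing in for Y, X1, X2, X3.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
x3 = [5, 3, 4, 1, 2, 6, 8, 7]
y = [3, 5, 4, 7, 8, 9, 12, 14]
print(ols([x1, x2, x3], y)[2], added_variable_slope(y, x2, [x1, x3]))
```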
---------------------------------------------------
6. Consider a one-way classification with four levels (I = 4).
We are given the population cell means (mu(1) through mu(4))
as: 7, 9, 6, 15.
Consider the general linear model setup (with 3 group membership
indicators)
E(Y|G1,G2,G3) = beta0 + beta1*G1 + beta2*G2 + beta3*G3
where
G1 = 1 if treatment 2 G1 = 0 otherwise
G2 = 1 if treatment 3 G2 = 0 otherwise
G3 = 1 if treatment 4 G3 = 0 otherwise
a. Determine the values for the 4 betas in the regression model
b. Express mu(3) - mu(2) in terms of the betas. Check by numerical
substitution.
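Each treatment switches on at most one indicator. So E(Y) is β0 for treatment 1 and β0 + βj for treatment j+1: β0 is the treatment-1 mean, and each other β is a difference from it. The sketch below encodes that mapping and rebuilds the cell means from the betas, which is the kind of numerical substitution part (b) asks for. The cell means used are hypothetical, deliberately not the ones in the problem, and the function names are illustrative.

```python
def reference_cell_betas(mu):
    """Betas for E(Y) = beta0 + beta1*G1 + ... with treatment 1 as the all-zero reference.

    mu lists the cell means for treatments 1..I in order.
    """
    return [mu[0]] + [m - mu[0] for m in mu[1:]]


def cell_mean(betas, treatment):
    """Rebuild E(Y | treatment) from the betas (treatments numbered from 1)."""
    return betas[0] + (betas[treatment - 1] if treatment > 1 else 0)


# Hypothetical cell means (not the ones in the problem).
betas = reference_cell_betas([10, 12, 8, 11])
print(betas)  # [10, 2, -2, 1]
```

Any contrast of cell means can be checked the same way, by computing it from `cell_mean` values.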
---------------------------------------------------------------
7. File salary.dat contains data from a salary survey discussed
in lecture: C1 is experience,
C2 is education level (1 for HS, 2 for BS, 3 for advanced degree),
C3 indicates a management position (=1) or not,
and C4 is the outcome measure, salary.
First, code the 3 levels of education using 2 group membership
indicators (so that education is not treated as an interval scale).
In the solutions we use HS as the base (the 0 0 code).
What is the single best predictor of salary?
Predict salary using experience, education, and management.
Add to the model two management-education interaction terms. Do
these terms add significantly to the prediction?
Give an interval estimate of the value of an additional year of
experience.
Repeat for an advanced degree, in addition to the BS.
(The comparison asked for here is between an advanced degree and
H.S.; the wording is *not* meant to indicate that I want the
differential between an advanced degree and a B.S. That is harder
to do with this coding, although it can be done.)
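The coding described above (HS as the 0 0 base) and the two management-by-education interaction columns can be sketched as follows. The function name and the toy rows are hypothetical. In the model without interactions, the coefficient on the advanced-degree indicator is itself the advanced-vs-H.S. contrast. Its interval estimate is therefore just b ± t·se(b) read from the regression output, in the same way as the experience coefficient.

```python
def code_education(educ, mgmt):
    """Indicators for education coded 1 = HS, 2 = BS, 3 = advanced, with HS as 0 0.

    Also returns the two management-by-education interaction columns.
    """
    bs = [1 if e == 2 else 0 for e in educ]
    adv = [1 if e == 3 else 0 for e in educ]
    bs_x_mgmt = [a * m for a, m in zip(bs, mgmt)]
    adv_x_mgmt = [a * m for a, m in zip(adv, mgmt)]
    return bs, adv, bs_x_mgmt, adv_x_mgmt


# Hypothetical rows given as (education, management) columns.
print(code_education([1, 2, 3, 2], [1, 0, 1, 1]))
```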
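Whether the two interaction terms add significantly can be judged with a partial F test comparing the error sums of squares with and without them: F* = [(SSE_reduced − SSE_full)/q] / [SSE_full/df_full], with q = 2 here. The sketch below takes the critical value from an F table. The function name and the sums of squares are hypothetical, not the salary.dat results.

```python
def partial_f_test(sse_reduced, sse_full, q, df_full, f_crit):
    """Partial F test for adding q terms (e.g. two interaction columns) to a model.

    df_full is the error df of the larger model; f_crit is F(1 - alpha; q, df_full)
    taken from an F table.
    """
    f_star = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
    return f_star, f_star > f_crit


# Hypothetical error sums of squares; 3.23 is the tabled F(.95; 2, 40) value.
print(partial_f_test(sse_reduced=120.0, sse_full=100.0, q=2, df_full=40, f_crit=3.23))
```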
--------------------------------------------------------------------
8. (former quiz question)
A study of several hundred professors' salaries in a large
American university in 1969 (AER, 1973, p.469) yielded the following
prediction equation: S = 1900 + 230*B + 18*A + 100*E + 490*D + 190*Y
+ 50*T - 2400*X where S is annual salary, B is number of books
written, A number of ordinary articles, E number of excellent
articles, D number of Ph.D.'s supervised, Y years experience, T = 1
if student evaluations above median, 0 otherwise, X = 1 if female, 0
otherwise.
For a professor with B = A = E = D = X = 1 and Y = 5, what is the
expected change in salary if she goes from very good to poor student
evaluations?
Mean salaries were $16,100 for males and $11,200 for females.
What is the value of the slope from a simple S on X regression?
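A useful fact for the last question: when the only predictor is a 0/1 indicator, the least-squares slope equals the difference between the two group means (mean at X = 1 minus mean at X = 0). The intercept equals the X = 0 mean. The sketch below verifies this on hypothetical toy salaries. The function name is illustrative.

```python
def simple_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical salaries: X = 0 for three people, X = 1 for two.
x = [0, 0, 0, 1, 1]
s = [10, 12, 14, 7, 9]
print(simple_slope(x, s))  # group means are 12 and 8
```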
-------------------------------------------------------------------
9. A researcher is studying the effect of an incentive on the
retention of subject matter and is also interested in the role of
time devoted to study. Subjects are randomly assigned to two groups,
one receiving (C3 = 1) and the other not receiving (C3 = 0) an
incentive. Within these groups, subjects are randomly assigned to 5,
10, 15, or 20 minutes of study (C2) of a passage specifically
prepared for the experiment. At the end of the study period, a test
of retention (C1) is administered. We treat the study time as a
covariate for investigating the differential effects of the
incentive.
Part I: ANCOVA
Use the Minitab output below to answer the following questions.
(This is a quiz question from a prior year.)
(For reference, the raw data are in file retention.dat.)
What is the slope of the C1 on C2
regression line for the 12 subjects in the incentive group?
What is the correlation between C1 and C2 for the incentive group?
Construct a 99% confidence interval for the analysis of covariance treatment
effect.
MTB > ancova c1 = c3;
SUBC> covariates c2;
SUBC> means c3.
Analysis of Covariance for C1

Source        DF     ADJ SS        MS
Covariates     1     42.008    42.008
C3             1    100.042   100.042
Error         21     30.575     1.456
Total         23    172.625

Covariate      Coeff     Stdev   t-value
C2            0.2367    0.0441     5.371

ADJUSTED MEANS
C3     N        C1
0     12    5.8333
1     12    9.9167

MTB > describe c1-c2;
SUBC> by c3.

           C3     N     MEAN   MEDIAN    STDEV
C1          0    12    5.833    5.500    1.850
            1    12    9.917   10.000    1.782
C2          0    12    12.50    12.50     5.84
            1    12    12.50    12.50     5.84

MTB > let c4 = c2*c3
MTB > regress c1 3 c3 c2 c4

The regression equation is
C1 = 2.50 + 4.83 C3 + 0.267 C2 - 0.0600 C4

Predictor       Coef      Stdev
Constant      2.5000     0.8646
C3             4.833      1.223
C2           0.26667    0.06314
C4          -0.06000    0.08929

MTB > regress c1 2 c3 c2

The regression equation is
C1 = 2.87 + ???? C3 + ????? C2

Predictor       Coef      Stdev
Constant      2.8750     0.6517
C3            ??????     0.4926
C2           ???????    0.04406
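The relationships needed to read the output above can be sketched as small helpers. In the interaction model C1 = b0 + b1·C3 + b2·C2 + b3·(C2·C3), the C3 = 0 group has slope b2 and the C3 = 1 group has slope b2 + b3. Within one group, the correlation follows from that group's simple-regression slope as r = b·s_x/s_y, using the group SDs from the describe output. The ANCOVA treatment effect is the C3 coefficient in the common-slope model, and its interval is estimate ± t·Stdev with t taken on the error df. The helper names and the inputs below are hypothetical, not the quiz values.

```python
def group_slope(b_covariate, b_interaction, in_incentive_group):
    """Slope of C1 on C2 implied by C1 = b0 + b1*C3 + b2*C2 + b3*(C2*C3).

    For C3 = 0 the slope is b2; for C3 = 1 it is b2 + b3.
    """
    return b_covariate + (b_interaction if in_incentive_group else 0.0)


def r_from_slope(slope, sd_x, sd_y):
    """Within one group, r = b * s_x / s_y for the simple regression of y on x."""
    return slope * sd_x / sd_y


def t_interval(estimate, se, t_crit):
    """Two-sided interval estimate +/- t_crit * se."""
    return estimate - t_crit * se, estimate + t_crit * se


# Hypothetical inputs, not the quiz values.
b = group_slope(0.5, -0.1, in_incentive_group=True)
print(b, r_from_slope(b, sd_x=3.0, sd_y=2.0), t_interval(4.0, 0.5, t_crit=2.0))
```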
----------------------------------------
END HW3