Education 161 Winter 2000
Assignment 1 Solutions Jan 18, 2000
1. MTB > read 'cartoon.dat' c1-c9
179 ROWS READ
ROW C1 C2 C3 C4 C5 C6 C7 C8 C9
1 1 0 0 1 107 4 4 * *
2 2 0 0 2 106 9 9 6 5
3 3 0 0 2 94 4 2 3 0
4 4 0 0 2 121 8 8 6 8
. . .
Create a difference variable in the direction of real - cartoon
MTB > let c10 = c7 - c6
MTB > describe c10
N MEAN MEDIAN TRMEAN STDEV SEMEAN
C10 179 -0.821 -1.000 -0.789 1.415 0.106
MIN MAX Q1 Q3
C10 -5.000 3.000 -2.000 0.000
MTB > ttest 0 c10
TEST OF MU = 0.000 VS MU N.E. 0.000
N MEAN STDEV SE MEAN T P VALUE
C10 179 -0.821 1.415 0.106 -7.77 0.0000
Clearly the test statistic for zero difference is large enough in magnitude
to reject the null hypothesis of zero change at any reasonable Type I error
rate. For the .01 level the critical value is about 2.62.
MTB > tinterval 90 c10
N MEAN STDEV SE MEAN 90.0 PERCENT C.I.
C10 179 -0.821 1.415 0.106 ( -0.996, -0.646)
MTB > stop
===========================================================================
2.
Use Welch procedure
MTB > twosamplet 95 'low dose' 'control';
SUBC> .
TWOSAMPLE T FOR low dose VS control
N MEAN STDEV SE MEAN
low dose 6 22.0 50.0 20
control 6 31.1 33.2 14
95 PCT CI FOR MU low dose - MU control: (-66, 47)
TTEST MU low dose = MU control (VS NE): T= -0.37 P=0.72 DF= 8
MTB > stop
(You may want to try to recreate the interval by hand, taking
Minitab's calculation of nu = 8.)
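For anyone who wants to check the hand calculation, here is a minimal sketch in Python (not part of the original Minitab session). It uses only the summary statistics printed above; the t critical value 2.306 for 8 df is taken from a standard t table rather than computed, and the variable names are just illustrative.

```python
import math

# Summary statistics copied from the Minitab output above.
n1, mean1, sd1 = 6, 22.0, 50.0   # low dose
n2, mean2, sd2 = 6, 31.1, 33.2   # control

# Squared standard errors of the two sample means.
v1 = sd1 ** 2 / n1
v2 = sd2 ** 2 / n2
se_diff = math.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom; the integer part matches DF= 8.
nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
nu_int = math.floor(nu)

# Assumption: t(0.975, 8) = 2.306, read from a t table.
T_CRIT = 2.306

diff = mean1 - mean2
lower = diff - T_CRIT * se_diff
upper = diff + T_CRIT * se_diff
t_stat = diff / se_diff

print(f"nu = {nu:.2f} (use {nu_int})")
print(f"95% CI: ({lower:.1f}, {upper:.1f}), T = {t_stat:.2f}")
```

The unrounded nu is a little under 9, which is why the output reports 8 degrees of freedom.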
=======================================================================
3.
The ANOVA table is as follows, with calculations below.
SOURCE SS df MS
Between 80 4 20
Within 400 40 10
Total 480 44
SSW = SST - SSB = 480 - 80 = 400
df(within) = total n - (# of groups) = (44 + 1) - (4 + 1) = 40
or df(total) - df(between)
MS = SS/df so that MSB = 80/4 = 20 and MSW = 400/40 = 10.
The omnibus null hypothesis is Ho: mu(1) = mu(2) = ... = mu(5)
i.e. that all 5 population means are equal, versus an alternative hypothesis
that not all are equal. The test statistic is the ratio of the mean squares
= 20/10 = 2.
The critical value for Type I error rate .10 is F(0.90, 4,40) = 2.09,
(rough interpolation)
Or use Minitab:
MTB >invcdf .90;
SUBC>f 4 40.
0.9000 2.0909
Since 2<2.09, we do not reject the null hypothesis.
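The table arithmetic above can be sketched in a few lines of Python (an illustration only; the critical value 2.0909 is simply copied from the Minitab invcdf output, not recomputed, and the variable names are made up for this sketch).

```python
# Given quantities from the problem.
ss_total, ss_between = 480, 80
df_total, df_between = 44, 4

# Fill in the rest of the ANOVA table by subtraction and division.
ss_within = ss_total - ss_between
df_within = df_total - df_between
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Assumption: F(0.90, 4, 40) = 2.0909, copied from the invcdf output above.
F_CRIT = 2.0909
reject = f_stat > F_CRIT

print(f"SSW = {ss_within}, df(within) = {df_within}")
print(f"MSB = {ms_between}, MSW = {ms_within}, F = {f_stat}")
print("reject Ho" if reject else "do not reject Ho")
```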
NOTE: since subscripts cannot be displayed in this text mode we will usually
employ parens to indicate subscripts etc., e.g. mu(1).

4.
a) The model for this problem is as follows:
Y(ij) = mu + alpha(i) + epsilon(ij)
where
i = 1,2,3 (3 groups)
j = 1,2, ... n(i) where n(1)=12, n(2)=14, n(3)=11
Y(ij) = jth employee's response in the ith group
mu = overall mean
alpha(i) = effect of the ith group
epsilon(ij) = random error (individual differences) associated with
the jth employee in the ith group
(An alternative model in terms of the cell means rather
than main effects could be written:
Y(ij) = mu(i) + epsilon(ij)
where
i = 1,2,3
j = 1,2,...,n(i)
Y(ij) = jth employee's response in the ith group
mu(i) = mean of the ith group
epsilon(ij) = random error associated with the jth employee in the
ith group
)
b) We are given
G(1) G(2) G(3)
n(i) 12 14 11
y(i)bar 25.2 32.6 28.1 (sample means)
s(i)^2 3.6 4.8 5.3 (sample variances)
n = 12+14+11= 37
Textbook guide to calcs below: GH sections 15.5, 15.6
First calculate the grand mean: ybar = 28.862
To calculate the grand mean, weight each group mean by its sample size,
add, and divide by total n:
ybar = [25.2(12) + 32.6(14) + 28.1(11)]/(12+14+11) = 28.86
Degrees of freedom between is 2, and within is 34.
Form SSB by deviating the group means from the grand mean (28.86),
squaring the deviations, multiplying by the group size, and summing
over the three groups:
SSB = 12(25.2-28.86)^2 + 14(32.6-28.86)^2 + 11(28.1-28.86)^2 = 362.93
MSB is 181.46 (362.93/2).
Now, SSW = (n(1)-1)s(1)^2 + (n(2)-1)s(2)^2 + (n(3)-1)s(3)^2
= 11(3.6) + 13(4.8) + 10(5.3)
= 155
MSW is the weighted average (weights n(i)-1) of the
within-group variances, found by dividing SSW by dfw: 155/34 = 4.559.
Hence the ANOVA table is
SOURCE SS df MS
Between 362.93 2 181.46
Within 155 34 4.559
Total 517.93 36
Test statistic = MSB/MSW = 39.80
The 99th percentile point of F(2,34) is approx. 5.30:
by simple interpolation since F(0.99,2,30)=5.39 and F(0.99,2,40)=5.18
Or better yet use Minitab to get critical value:
MTB > invcdf .99;
SUBC> f 2 34.
Inverse Cumulative Distribution Function
F distribution with 2 DF in numerator and 34 DF in denominator
P( X <= x) x
0.9900 5.2893
Since 39.80 > 5.29 we reject the null hypothesis of equal means in all
groups.
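As a cross-check on the hand arithmetic, here is a small Python sketch that rebuilds the ANOVA quantities from the three group summaries (sample sizes, means, variances). It is only an illustration; the critical value is the one Minitab printed above, and the variable names are invented for the sketch.

```python
# Group summaries given in the problem.
ns = [12, 14, 11]
means = [25.2, 32.6, 28.1]
variances = [3.6, 4.8, 5.3]

n_total = sum(ns)
k = len(ns)

# Grand mean: group means weighted by sample size.
grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total

# Between-groups SS: weighted squared deviations of group means.
ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
# Within-groups SS: each sample variance times its own (n - 1).
ssw = sum((n - 1) * v for n, v in zip(ns, variances))

df_between = k - 1
df_within = n_total - k
msb = ssb / df_between
msw = ssw / df_within
f_stat = msb / msw

# Assumption: F(0.99, 2, 34) = 5.2893, copied from the Minitab output.
F_CRIT = 5.2893

print(f"grand mean = {grand_mean:.3f}")
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}")
print(f"MSB = {msb:.2f}, MSW = {msw:.3f}, F = {f_stat:.2f}")
print("reject Ho" if f_stat > F_CRIT else "do not reject Ho")
```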

5.
a) MTB > read 'knee.dat' c1 c2
24 ROWS READ
ROW C1 C2
1 29 1
2 42 1
3 38 1
4 40 1
. . .
MTB > describe c1;
SUBC> by c2.
C2 N MEAN MEDIAN TRMEAN STDEV SEMEAN
C1 1 8 38.00 40.00 38.00 5.48 1.94
2 10 32.00 31.00 31.62 3.46 1.10
3 6 24.00 22.50 24.00 4.43 1.81
C2 MIN MAX Q1 Q3
C1 1 29.00 43.00 32.00 42.00
2 28.00 39.00 29.00 35.00
3 20.00 32.00 20.75 27.50
The group means are 38, 32, and 24 for the below average, average, and
above average groups, respectively. Variances are 30.03, 11.97, and 19.62.
(Note: The group means and SDs are also displayed under the ANOVA table)
b)
MTB > dotplot c1;
SUBC> by c2.
[Character dotplots of C1, one panel for each level of C2:
1 (below average), 2 (average), 3 (above average);
common C1 axis from 20.0 to 45.0]
These plots illustrate the clustering of the observations in each
group about the group means. The small sample sizes make it difficult
to detect outliers or heteroskedasticity (unequal group variances),
although the observations in the below average group appear to be
somewhat more spread out than are those in the other groups.
c)
MTB > oneway c1 c2 resids in c3 fits in c4;
SUBC> tukey.
(Note: the above command tells Minitab to store the residuals in C3
and the fitted values (which are just the group means) in C4. The
words "resids in" and "fits in" are unnecessary; could just write
MTB >oneway c1 c2 c3 c4)
ANALYSIS OF VARIANCE ON C1
SOURCE DF SS MS F p
C2 2 672.0 336.0 16.96 0.000
ERROR 21 416.0 19.8
TOTAL 23 1088.0
INDIVIDUAL 95 PCT CI'S FOR MEAN BASED ON POOLED STDEV
LEVEL N MEAN STDEV
1 8 38.000 5.477
2 10 32.000 3.464
3 6 24.000 4.427
POOLED STDEV = 4.451
[Character plot of the three interval estimates; axis marked at 24.0, 30.0, 36.0]
The omnibus null hypothesis is
Ho: mu(1)=mu(2)=mu(3)
We test this against the alternative
Ha: not all mu(i) are equal
Test statistic is MSB/MSW = 336/19.8 = 16.96.
Find critical value F(.95,2,21):
MTB > invcdf .95;
SUBC> f 2 21.
0.9500 3.4668
Since 16.96 > 3.4668, we reject the omnibus null hypothesis and
conclude that there are differences among the three groups.
d) Resids are stored in C3 & fits in C4, from oneway command above.
MTB > plot c3 c4
[Character plot of residuals (C3) against fitted values (C4);
C3 axis marked at -6.0, 0.0, 6.0 and C4 axis from 25.0 to 37.5]
We could also plot C3 against C2, or produce aligned dotplots of the
residuals for each group.
Here's how to obtain residuals the long way (remember residuals are
just the differences between each observation and the group mean):
MTB > unstack c1 c3-c5;
SUBC> subscripts c2.
MTB > let c6=c3-mean(c3)
MTB > let c7=c4-mean(c4)
MTB > let c8=c5-mean(c5)
MTB > stack c6-c8 c9
MTB > plot c9 c2.
The plots suggest that the variability of the observations in the
below average group is greater than that for the other groups (the
dotplots and a quick look at the descriptive statistics support this).
Since the sample sizes are also a bit unequal, if one wanted to be very
careful, the best analysis here would be a oneway anova method that does
not require the equal variance assumption, using something like BMDP7D
(which we illustrated with the IBS data).
e)
From the tukey subcommand in the main analysis:
MTB > oneway c1 c2 resids in c3 fits in c4;
SUBC> tukey.

TUKEY'S multiple comparison procedure
Nominal level = 0.0500
Family error rate = 0.0500
Individual error rate = 0.0199
Critical value = 3.56
Intervals for (mean of column group) - (mean of row group)
          1         2
2     0.680
     11.320
3     7.943     2.208
     20.057    13.792
Just presenting the output above is not really a complete answer. It is
much better to say: the confidence interval for mu(1) - mu(2) has
endpoints (0.68, 11.32), and so on for the other pairs.
None of these intervals includes zero, so we conclude that all three group
means differ from one another.
================================================================
6. SMSG :
After you read in the data....
Here are the results of the parametric ANOVA:
MTB > oneway c2 c1
ANALYSIS OF VARIANCE ON C2
SOURCE DF SS MS F
C1 1 367.2 367.2 11.98
ERROR 41 1256.7 30.7
TOTAL 42 1623.8
LEVEL N MEAN STDEV
1 21 11.947 6.035
2 22 17.793 5.016
POOLED STDEV = 5.536
MTB > twot c2 c1;
SUBC> pooled.
TWOSAMPLE T FOR Y
C1 N MEAN STDEV SE MEAN
1 21 11.95 6.03 1.3
2 22 17.79 5.02 1.1
95 PCT CI FOR MU 1 - MU 2: (-9.3, -2.4)
TTEST MU 1 = MU 2 (VS NE): T= -3.46 P=0.0013 DF=41.0
You will find (-3.46)^2 = 11.98 (to the accuracy provided).
Fact: the square of a t-variate with m df is distributed as F(1,m).
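The equality of t^2 and F for two groups can be checked directly from the summary statistics in the output. The Python sketch below is only an illustration (variable names are invented); it computes the pooled-variance t statistic and the one-way ANOVA F from the same group sizes, means, and standard deviations.

```python
import math

# Group summaries copied from the oneway output above.
n1, mean1, sd1 = 21, 11.947, 6.035
n2, mean2, sd2 = 22, 17.793, 5.016

# Pooled variance (this is the ANOVA mean square error).
df_error = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_error

# Pooled two-sample t statistic.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff

# One-way ANOVA F for the same two groups (df between = 1).
grand_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
ss_between = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
f_stat = ss_between / pooled_var

print(f"pooled stdev = {math.sqrt(pooled_var):.3f}")
print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, F = {f_stat:.2f}")
```

Working from unrounded values, t^2 and F agree exactly (up to floating point), which is the algebraic fact stated above.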

7.
Group A B C D E
Sample size is 10 for all groups
The first thing is to take a look at some descriptives:
MTB > describe c1-c5
N MEAN MEDIAN TRMEAN STDEV SEMEAN
C1 10 12.050 12.350 12.087 0.829 0.262
C2 10 11.020 11.200 10.988 1.121 0.355
C3 10 10.270 10.350 10.325 1.026 0.325
C4 10 9.270 9.100 9.037 1.159 0.366
C5 10 12.170 11.850 12.100 0.792 0.250
MIN MAX Q1 Q3
C1 10.700 13.100 11.150 12.625
C2 9.100 13.200 10.375 11.550
C3 8.500 11.600 9.450 11.225
C4 8.200 12.200 8.450 9.525
C5 11.200 13.700 11.650 12.825
MTB > aovoneway c1-c5
ANALYSIS OF VARIANCE
SOURCE DF SS MS F p
FACTOR 4 59.879 14.970 15.07 0.000
ERROR 45 44.704 0.993
TOTAL 49 104.583
LEVEL N MEAN STDEV
C1 10 12.050 0.829
C2 10 11.020 1.121
C3 10 10.270 1.026
C4 10 9.270 1.159
C5 10 12.170 0.792
POOLED STDEV = 0.997
Test statistic = 14.9698/0.9932= 15.072
Compare with F(0.95,4,45) = (approx) 2.6
Since 15.072 > 2.6 we reject the null hypothesis of no difference between
the group means.
Pairwise comparisons of population means using Tukey's W at familywise
error rate = 0.05
We have 5 groups, 45 degrees of freedom within, and n=10 observations
in each group.
Thus (from tables) q(0.95, 5, 45) = (approx) 4.025
(by interpolation between
q(0.95, 5, 40) = 4.04 and q(0.95, 5, 60) = 3.98)
and we construct interval estimates from the sample mean difference
and W (which stays the same for all comparisons since the sample sizes
are equal):
W = q(0.95, 5, 45)*Sqrt(MSW / n)
= 4.025 * Sqrt(0.9932/10)
= 1.2685 (the honest significant difference)
Interval estimates of the difference between any two
population means are Xbar(i.) - Xbar(i'.) +/- W.
Minitab will give these to us using the tukey subcommand (or do 'em by hand).
The first step is to read and then stack the data.....
MTB > read '[data from file]' c1-c5
10 ROWS READ
ROW C1 C2 C3 C4 C5
1 12.4 9.1 8.5 8.7 12.7
2 10.7 11.5 11.6 9.3 13.2
3 11.9 11.3 10.2 8.2 11.8
4 11.0 9.7 10.9 8.3 11.9
. . .
MTB > stack (c1) (c2) (c3) (c4) (c5) (c6);
SUBC>subscripts c7.
MTB >
MTB > describe c6;
SUBC>by c7.
C7 N MEAN MEDIAN TRMEAN STDEV SEMEAN
C6 1 10 12.050 12.350 12.087 0.829 0.262
2 10 11.020 11.200 10.988 1.121 0.355
3 10 10.270 10.350 10.325 1.026 0.325
4 10 9.270 9.100 9.037 1.159 0.366
5 10 12.170 11.850 12.100 0.792 0.250
C7 MIN MAX Q1 Q3
C6 1 10.700 13.100 11.150 12.625
2 9.100 13.200 10.375 11.550
3 8.500 11.600 9.450 11.225
4 8.200 12.200 8.450 9.525
5 11.200 13.700 11.650 12.825
Here we can do a oneway and get the Tukey intervals
MTB > oneway c6 c7;
SUBC>tukey.
ANALYSIS OF VARIANCE ON C6
SOURCE DF SS MS F p
C7 4 59.879 14.970 15.07 0.000
ERROR 45 44.704 0.993
TOTAL 49 104.583
TUKEY'S multiple comparison procedure
Nominal level = 0.0500
Family error rate = 0.0500
Individual error rate = 0.00672
Critical value = 4.02
Intervals for (mean of column group) - (mean of row group)
          1         2         3         4
2   -0.2359
     2.2959
3    0.5141   -0.5159
     3.0459    2.0159
4    1.5141    0.4841   -0.2659
     4.0459    3.0159    2.2659
5   -1.3859   -2.4159   -3.1659   -4.1659
     1.1459    0.1159   -0.6341   -1.6341
If instead you want 90% familywise confidence, you say
MTB > oneway c6 c7;
SUBC>tukey 10.
TUKEY'S multiple comparison procedure
Nominal level = 0.100
Family error rate = 0.100
Individual error rate = 0.0147
Critical value = 3.59
Intervals for (mean of column group) - (mean of row group)
          1         2         3         4
2   -0.1022
     2.1622
3    0.6478   -0.3822
     2.9122    1.8822
4    1.6478    0.6178   -0.1322
     3.9122    2.8822    2.1322
5   -1.2522   -2.2822   -3.0322   -4.0322
     1.0122   -0.0178   -0.7678   -1.7678
****************************************************
If instead the problem had asked for tests of the pairwise differences:
the population means for groups i and j differ significantly if
|y(i)bar - y(j)bar| >= W
(please note that the symbols |...| are used to indicate "absolute value"),
where W = 1.2685.
For this problem, the differences are
|y(1)bar - y(2)bar| = 1.03 < W ===> not significantly different
|y(1)bar - y(3)bar| = 1.78 > W ===> significantly different population means
|y(1)bar - y(4)bar| = 2.78 > W ===> significantly different
|y(1)bar - y(5)bar| = 0.12 < W ===> not significantly different
|y(2)bar - y(3)bar| = 0.75 < W ===> not significantly different
|y(2)bar - y(4)bar| = 1.75 > W ===> significantly different
|y(2)bar - y(5)bar| = 1.15 < W ===> not significantly different
|y(3)bar - y(4)bar| = 1.00 < W ===> not significantly different
|y(3)bar - y(5)bar| = 1.90 > W ===> significantly different
|y(4)bar - y(5)bar| = 2.90 > W ===> significantly different
Note that the pairs of means above that are significantly different
according to these tests are the same pairs whose intervals using
the same .05 error rate do not contain zero.
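For those doing it by hand, the W calculation and the ten pairwise comparisons can be sketched in Python as follows. This is an illustration only: the studentized range value 4.025 is the table interpolation used in the text (not computed here), the group means and MSW are copied from the output, and the variable names are made up.

```python
import math
from itertools import combinations

# Group means (groups 1-5 = A-E) and MSW copied from the output above.
means = {1: 12.05, 2: 11.02, 3: 10.27, 4: 9.27, 5: 12.17}
msw = 0.9932
n = 10  # observations per group

# Assumption: q(0.95, 5, 45) = 4.025, interpolated from tables as in the text.
Q_CRIT = 4.025

# Tukey's honest significant difference.
w = Q_CRIT * math.sqrt(msw / n)

# Interval for each pairwise difference: (mean_i - mean_j) +/- W.
intervals = {}
for i, j in combinations(means, 2):
    diff = means[i] - means[j]
    intervals[(i, j)] = (diff - w, diff + w)

# A pair differs significantly when its interval excludes zero.
significant = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

print(f"W = {w:.4f}")
for pair, (lo, hi) in intervals.items():
    flag = "significant" if pair in significant else "not significant"
    print(f"mu({pair[0]}) - mu({pair[1]}): ({lo:.4f}, {hi:.4f})  {flag}")
```

The intervals differ slightly in the third decimal from Minitab's, since Minitab uses its own value of the critical point (printed as 4.02) rather than the interpolated 4.025.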

8.
Don't Sweat
a. Since there are three groups, dfbetween = 3 - 1 = 2
Subtract SS to get SSBetween:
SSB = SST - SSE = 4390 - 3813 = 577
Divide to get MSbetween:
MSB = SSB/dfbetween = 577/2 = 288.5
b. Ho: mu(1)=mu(2)=mu(3)
Ha: not! (or more formally not all mu(i) equal)
calculate test statistic:
MSB/MSE = 288.5/141= 2.05
look up critical F value (the 90th percentile
puts area .10 in the tail)
F(2,24; .90) = 2.54
F(2,30; .90) = 2.49
so the 2,27 critical value is somewhere in between...
by linear interpolation (or just estimating)
F(2,27; .90) = 2.515
compare the test stat with the F critical value:
2.05 < 2.515
Do not reject H0!
**Note: as this was originally a quiz question, the solutions were
properly given in terms of what you get from the book tables.
As a check, let's use the computer as follows:
MTB> invcdf .90;
SUBC> f 2 27.
which gives F(2,27; .90)=2.5107.
So the rough interpolation is actually pretty decent.**
c. We're looking for the CI for mu(2) - mu(3)...
point estimate = X(2)bar - X(3)bar = 48.00 - 57.50 = -9.5
For the Tukey procedure where all groups have equal n's, the interval
widths are all the same... so use one of the provided CIs to find the
answers...
width = 11.79 - (-10.99) = 2.29 - (-20.49) = 22.78
point estimate +/- half the width gives the answer...
-9.5 +/- (22.78/2) --> (-20.89, 1.89)
dddddd = -20.89
eeeeee = 1.89
The above is the minimum arithmetic (thinking) way. You could also do it
from scratch.
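The minimum-arithmetic route for part c can be written out as a short Python sketch (illustrative only; the endpoints of the group 1 vs group 2 interval are copied from the Tukey output, and the variable names are invented).

```python
# Provided Tukey interval for mu(1) - mu(2), from the output.
lower_12, upper_12 = -10.99, 11.79

# With equal n's every Tukey interval has this same width.
width = upper_12 - lower_12
half_width = width / 2

# Point estimate for mu(2) - mu(3) from the two sample means.
point_23 = 48.00 - 57.50

lower_23 = point_23 - half_width
upper_23 = point_23 + half_width

print(f"width = {width:.2f}")
print(f"CI for mu(2) - mu(3): ({lower_23:.2f}, {upper_23:.2f})")
```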

For reference here is the unedited output (you could reproduce
it from the data)
ANALYSIS OF VARIANCE ON C10
SOURCE DF SS MS F p
C11 2 577 289 2.04 0.149
ERROR 27 3813 141
TOTAL 29 4390
LEVEL N MEAN STDEV
1 10 48.40 13.33
2 10 48.00 12.26
3 10 57.50 9.79
POOLED STDEV = 11.88
Tukey's pairwise comparisons
Family error rate = 0.100
Individual error rate = 0.0413
Critical value = 3.03
Intervals for (column level mean) - (row level mean)
          1         2
2    -10.99
      11.79
3    -20.49    -20.89
       2.29      1.89

END