ED161 Winter 2000
Startup Problem Solutions

note: Alex drafted these solutions, and his method of obtaining
the data for these two schools is elegant. A more basic method
is to access the data via a browser or from a command-line display on
leland, and then cut-and-paste the data sets to your desktop.
Some key things to note:
1. The use of the "read" command, which you will
also see in the various course examples. If you
had only cut-and-pasted the data for the two schools,
you would be reading in much smaller data sets.
2. The use of the Manip...Subset Worksheet menu item
to select out these schools is quite elegant (and
something I hadn't thought of).
drr

These problems were done on Minitab 12.1. After enabling the command
language from the 'Editor' menu, the code and output presented below
were either typed directly or created through the dialogue boxes. The
commands are included so you can reproduce the output and learn the
simplicity of the command language.
After FTPing the data files to my computer, I used the File...Other
Files...Special text menu options to import the data into Minitab.
The program translates this into the command language below. Some may
find it easier just to type the commands at the prompt.
MTB > Read "C:\My Documents\ED161\Hsb1.dat" c1-c5;
SUBC> Decimal ".".
Entering data from file: C:\My Documents\ED161\Hsb1.dat
7185 rows read.
MTB > Read "C:\My Documents\ED161\Hsb2.dat" c1-c6;
SUBC> Decimal ".".
Entering data from file: C:\My Documents\ED161\Hsb2.dat
160 rows read.
Problem 1)
First, look at HSB2.dat to determine which school is the first public
school (coded 0) and which is the first Catholic school (coded 1).
These turn out to be the schools with ID# 1224 and 1308, respectively.
There are many ways to just get the appropriate subsets of data to work with.
I used the handy Manip...Subset Worksheet menu item that produced
separate datasets for each of the schools. For example,
this is the code to get the public school 1224 worksheet:
Current worksheet: hsbstudent.MTW
MTB > Subset;
SUBC> Where "ID=1224";
SUBC> Name "Subset of hsbstudent.MTW";
SUBC> NoMatrices;
SUBC> NoConstants;
SUBC> Include.
Subset worksheet 'Subset of hsbstudent.MTW' created.
Now we are ready to do the problems. With the public school subset as the
active worksheet, simply type the following command to get a
stem-and-leaf display of math achievement:
MTB > Stem-and-Leaf 'mathach'.
Character Stem-and-Leaf Display
Stem-and-leaf of mathach  N  = 47
Leaf Unit = 1.0
    1   -0 2
    2   -0 1
    7    0 00001
   13    0 222233
   18    0 44455
   23    0 66666
   (5)   0 88999
   19    1 01
   17    1 33
   15    1 4
   14    1 6667
   10    1 99
    8    2 0000111
    1    2 3
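The first column of the display is a depth count: the number of observations in that row or beyond it, counted from the nearer end of the data, with the row holding the median showing its own leaf count in parentheses. The Python sketch below is not part of the Minitab session; it is only a cross-check that rebuilds that column from the number of leaves in each row. The function name `depths` is made up for this illustration.

```python
def depths(leaf_counts):
    """Return depth labels for stem-and-leaf rows, given leaves per row."""
    n = sum(leaf_counts)
    labels = []
    from_top = 0
    for count in leaf_counts:
        from_top += count                    # observations in this row and above
        from_bottom = n - from_top + count   # observations in this row and below
        if from_top > n / 2 and from_bottom > n / 2:
            labels.append(f"({count})")      # row that contains the median
        else:
            labels.append(str(min(from_top, from_bottom)))
    return labels

# Number of leaves in each row of the school 1224 display, top to bottom.
counts_1224 = [1, 1, 5, 6, 5, 5, 5, 2, 2, 1, 4, 2, 7, 1]
print(depths(counts_1224))
```

The printed labels should match the first column of the display, including the (5) on the median row.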
You could also do a boxplot with this command:
MTB > GStd.
* NOTE * Character graphs are obsolete.
MTB > BoxPlot 'mathach'.
Boxplot
[Character boxplot of mathach; axis from 0.0 to 20.0]
Now we do the same for the Catholic School.
Current worksheet: Subset of hsbstudent.MTW[W3]
MTB > Stem-and-Leaf 'mathach'.
Character Stem-and-Leaf Display
Stem-and-leaf of mathach  N  = 20
Leaf Unit = 1.0
    1    0 2
    1    0
    2    0 6
    3    0 9
    4    1 0
    8    1 3333
   10    1 55
   10    1 667
    7    1
    7    2 11
    5    2 2233
    1    2 4
MTB > GStd.
MTB > BoxPlot 'mathach'.
Boxplot
[Character boxplot of mathach; axis from 0.0 to 25.0]
What can we say about these plots? First, school 1224 is slightly
positively skewed, with a median of about 9. School 1308 has a median
of about 16 and appears to be less variable than school 1224. Both are
somewhat bimodal.
Problem 2)
To create a numerical descriptive summary of our two subsets, the
'describe' command is handy.
For school 1224, we get this:
MTB > Describe 'mathach'.
Descriptive Statistics
Variable        N     Mean   Median   TrMean    StDev  SE Mean
mathach        47     9.72     8.30     9.67     7.59     1.11
Variable  Minimum  Maximum       Q1       Q3
mathach     -2.83    23.58     3.15    16.41
And for school 1308, we get this:
MTB > Describe 'mathach'.
Descriptive Statistics
Variable        N     Mean   Median   TrMean    StDev  SE Mean
mathach        20    16.26    16.02    16.53     6.11     1.37
Variable  Minimum  Maximum       Q1       Q3
mathach      2.51    24.99    13.36    22.17
You can also get this with the Stat...Basic Stats...Display Descriptive
Statistics menu option. What do these summaries tell us? It looks like
school 1224 has lower measures of central tendency and slightly more
variability than school 1308. It is interesting to note that some of the
students in 1224 had negative math achievement scores (possible typos
[or odd coding]??). Because there were over twice as many observations
in school 1224 as in 1308, the SE of the mean is smaller in school 1224.
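The point about the standard errors can be checked by hand, since SE Mean = StDev / sqrt(N). The short Python sketch below is a cross-check outside Minitab, using only the N and StDev values reported in the two Describe outputs; the helper name `se_mean` is made up for this illustration.

```python
import math

def se_mean(stdev, n):
    """Standard error of the mean from a sample SD and sample size."""
    return stdev / math.sqrt(n)

se_1224 = se_mean(7.59, 47)   # school 1224: StDev 7.59, N 47
se_1308 = se_mean(6.11, 20)   # school 1308: StDev 6.11, N 20
print(round(se_1224, 2), round(se_1308, 2))   # 1.11 1.37
```

School 1224 has the larger standard deviation, but dividing by the square root of its larger N still leaves it with the smaller standard error.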
Problem 3)
For this problem, we want the data for the 2 schools to be in the same
data set. This can be done by cutting and pasting, or by creating
another subset from the big dataset.

note: you can use the "stack" command (or the corresponding menu item) to
create one column containing mathach and another containing the school
indicator:
MTB > Stack C5 C15 c25;
SUBC> Subscripts c26.

We want to do a two-sample t-test and get a 95% confidence interval for
the difference in means between the two schools. This can be done with
the following commands:
MTB > TwoT 95.0 'mathach' 'ID';
SUBC> Alternative 0.
Two Sample T-Test and Confidence Interval
Two sample T for mathach
ID      N    Mean   StDev  SE Mean
1224   47    9.72    7.59      1.1
1308   20   16.26    6.11      1.4
95% CI for mu (1224) - mu (1308): ( -10.1, -3.0)
T-Test mu (1224) = mu (1308) (vs not =): T = -3.72  P = 0.0006  DF = 44
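The DF = 44 in this output (rather than 47 + 20 - 2 = 65) indicates that the test does not pool the two variances. The Python sketch below is a cross-check outside Minitab that recomputes the unpooled (Welch) statistic from the summary table alone. Two assumptions to flag: the function name `welch_summary` is made up for this illustration, and the critical value 2.015 is taken from a t table for 44 degrees of freedom rather than computed.

```python
import math

def welch_summary(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Unpooled two-sample t statistic, truncated df, and CI from summary stats."""
    v1 = sd1 ** 2 / n1                 # squared SE of the first mean
    v2 = sd2 ** 2 / n2                 # squared SE of the second mean
    se_diff = math.sqrt(v1 + v2)
    diff = mean1 - mean2
    t_stat = diff / se_diff
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    half_width = t_crit * se_diff
    return t_stat, math.floor(df), (diff - half_width, diff + half_width)

# School 1224 first, then school 1308; t_crit = 2.015 is a table value (assumption).
t_stat, df, (ci_low, ci_high) = welch_summary(9.72, 7.59, 47, 16.26, 6.11, 20, 2.015)
print(round(t_stat, 2), df, round(ci_low, 1), round(ci_high, 1))
```

Because the difference is taken as school 1224 minus school 1308, both the t statistic and the interval come out negative.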
Problem 4)
Again, separate data sets are useful for this problem. For each school,
we want a simple plot of mathach against SES, a sample correlation
coefficient, and a value for beta, the regression coefficient.
For school 1224:
MTB > GStd.
MTB > Plot 'mathach' 'ses';
SUBC> Symbol 'x'.
Plot
[Character scatterplot of mathach (vertical axis, 0 to 20) against ses (horizontal axis, -1.50 to 1.00), plotted with the symbol x]
MTB > Correlation 'ses' 'mathach';
SUBC> NoPValues.
Correlations (Pearson)
Correlation of ses and mathach = 0.207
Stop.
Worksheet size: 100000 cells
Retrieving project from file: C:\My Documents\ED161\ed160introexcercise.MPJ
MTB > Regress 'mathach' 1 'ses';
SUBC> Constant;
SUBC> Brief 2.
Regression Analysis
The regression equation is
mathach = 10.8 + 2.51 ses
Predictor      Coef    StDev      T      P
Constant     10.805    1.337   8.08  0.000
ses           2.509    1.765   1.42  0.162
S = 7.510   R-Sq = 4.3%   R-Sq(adj) = 2.2%
Analysis of Variance
Source           DF       SS      MS     F      P
Regression        1   113.90  113.90  2.02  0.162
Residual Error   45  2538.01   56.40
Total            46  2651.92
Unusual Observations
Obs   ses  mathach    Fit  StDev Fit  Residual  St Resid
 24  0.97     2.06  13.24       2.71    -11.18    -1.60 X
X denotes an observation whose X value gives it large influence.
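With a single predictor, several numbers in this output are tied together: R-Sq is the squared correlation (and also SS Regression / SS Total), the T for ses is Coef / StDev, and F is the square of that T. The Python sketch below is a cross-check outside Minitab that confirms these links using only values printed above for school 1224; the variable names are chosen for this illustration.

```python
# Values copied from the school 1224 correlation and regression output.
r = 0.207                             # Pearson correlation of ses and mathach
coef, se_coef = 2.509, 1.765          # slope for ses and its standard error
ss_reg, ss_total = 113.90, 2651.92    # sums of squares from the ANOVA table
ms_reg, ms_err = 113.90, 56.40        # mean squares from the ANOVA table

r_sq_from_r = r ** 2                  # R-Sq as the squared correlation
r_sq_from_ss = ss_reg / ss_total      # R-Sq as the share of explained variation
t_stat = coef / se_coef               # T for the slope
f_stat = ms_reg / ms_err              # F for the regression

print(round(100 * r_sq_from_r, 1), round(100 * r_sq_from_ss, 1))
print(round(t_stat, 2), round(f_stat, 2), round(t_stat ** 2, 2))
```

Small differences in the last digit between t squared and F come from rounding in the printed output.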
For school 1308, we get the following:
Current worksheet: Subset of hsbstudent.MTW[W3]
MTB > GStd.
* NOTE * Character graphs are obsolete.
MTB > Plot 'mathach' 'ses';
SUBC> Symbol 'x'.
Plot
[Character scatterplot of mathach (vertical axis, 0.0 to 24.0) against ses (horizontal axis, -0.40 to 1.20), plotted with the symbol x]
MTB > GPro.
MTB > Correlation 'ses' 'mathach';
SUBC> NoPValues.
Correlations (Pearson)
Correlation of ses and mathach = 0.010
MTB > Regress 'mathach' 1 'ses';
SUBC> Constant;
SUBC> Brief 2.
Regression Analysis
The regression equation is
mathach = 16.2 + 0.13 ses
Predictor      Coef    StDev      T      P
Constant     16.189    2.118   7.64  0.000
ses           0.126    3.003   0.04  0.967
S = 6.281   R-Sq = 0.0%   R-Sq(adj) = 0.0%
Analysis of Variance
Source           DF      SS     MS     F      P
Regression        1    0.07   0.07  0.00  0.967
Residual Error   18  710.23  39.46
Total            19  710.30
Unusual Observations
Obs    ses  mathach    Fit  StDev Fit  Residual  St Resid
 13   0.10     2.51  16.20       1.90    -13.69    -2.29R
 20  -0.57    21.12  16.12       3.58      5.00     0.97 X
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
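The standardized residuals in the Unusual Observations table can be reproduced from the other columns. The leverage of an observation is h = (StDev Fit / S)^2, and the standardized residual is the raw residual divided by S * sqrt(1 - h). The Python sketch below is a cross-check outside Minitab using the printed values for school 1308 (S = 6.281); the function name `standardized_residual` is made up for this illustration.

```python
import math

def standardized_residual(observed, fit, sd_fit, s):
    """Return (standardized residual, leverage) from printed regression columns."""
    leverage = (sd_fit / s) ** 2      # h = Var(fit) / s^2
    residual = observed - fit         # raw residual
    return residual / (s * math.sqrt(1.0 - leverage)), leverage

S_1308 = 6.281
st13, h13 = standardized_residual(2.51, 16.20, 1.90, S_1308)    # observation 13
st20, h20 = standardized_residual(21.12, 16.12, 3.58, S_1308)   # observation 20
print(round(st13, 2), round(h13, 3))
print(round(st20, 2), round(h20, 3))
```

Observation 13 has a small leverage but a standardized residual beyond 2 in absolute value, which goes with the R flag. Observation 20 has an unremarkable residual but a much larger leverage, which goes with the X flag.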
School 1224 (public) has a greater slope value (2.51 vs. 0.13)
for the regression of math on SES than school 1308 (Catholic).

Do Catholic schools do a better job of reducing class inequality?
drr
