--------------------------------------------------------------------------------------------------------

      name:  <unnamed>
       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class5.log
  log type:  text
 opened on:   5 Oct 2010, 14:05:17

. table sex if age>24 & age<35, contents (freq mean yrsed sd yrsed p25 yrsed p75 yrsed)

---------------------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   p25(yrsed)   p75(yrsed)
----------+----------------------------------------------------------------
     Male |       9,027     13.31212     2.967666           12           17
   Female |       9,511     13.55657     2.854472           12           17
---------------------------------------------------------------------------

* These are the summary statistics of education by gender that we have seen before; see also my Excel file.

. gen random=runiform()

* This generates a uniform(0,1) random variable, which I named random.

. summarize random

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      random |    133710    .5006203    .2884588   .0000219   .9999971

* Max close to 1, min close to 0, mean close to 0.5, as expected for a uniform(0,1) variable.
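For a uniform(0,1) variable the theoretical mean is 1/2 and the theoretical standard deviation is 1/sqrt(12), about .2887, which is close to the .2884588 reported by summarize. A minimal Python sketch of the same check follows. It uses Python's own generator (random.random(), which returns values in [0, 1)) rather than Stata's runiform(), so the individual draws will not match the log; the function name simulate_uniform and the seed 2010 are hypothetical choices made only for reproducibility.

```python
import math
import random
import statistics

def simulate_uniform(n, seed=2010):
    """Return n uniform draws on [0, 1) from Python's generator (not Stata's runiform())."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    return [rng.random() for _ in range(n)]

draws = simulate_uniform(133710)  # same number of observations as in the log
print(f"mean = {statistics.fmean(draws):.4f}   theory: 0.5")
print(f"sd   = {statistics.stdev(draws):.4f}   theory: 1/sqrt(12) = {1 / math.sqrt(12):.4f}")
print(f"min  = {min(draws):.7f}   max = {max(draws):.7f}")
```

With 133,710 draws the standard error of the mean is only about .2887/sqrt(133710), roughly .0008, so a sample mean within a few thousandths of 0.5 is what we should expect.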

. histogram random

(bin=51, start=.00002188, width=.01960736)

* The histogram shows that our new variable is nice and flat, but not perfectly so.
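The note (bin=51, start=.00002188, width=.01960736) is consistent with splitting the range from the minimum to the maximum into 51 equal-width bins: (.9999971 - .0000219)/51 is about .01960736. As I understand Stata's histogram documentation, the default number of bins is min(sqrt(N), 10*ln(N)/ln(10)), which gives 51 for N = 133,710; truncating that value to an integer is an assumption in the sketch below. The Python sketch bins simulated uniform draws the same way; default_bins, bin_counts, and the seed are hypothetical.

```python
import math
import random

def default_bins(n):
    """Assumed Stata default: min(sqrt(N), 10*ln(N)/ln(10)), truncated to an integer."""
    return int(min(math.sqrt(n), 10 * math.log(n) / math.log(10)))

def bin_counts(values, k):
    """Count values falling in k equal-width bins from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        i = int((v - lo) / width)
        counts[min(i, k - 1)] += 1  # the maximum itself goes in the last bin
    return counts, lo, width

rng = random.Random(2010)  # hypothetical seed
values = [rng.random() for _ in range(133710)]
k = default_bins(len(values))
counts, start, width = bin_counts(values, k)
print(f"(bin={k}, start={start:.8f}, width={width:.8f})")
print(f"bin counts range from {min(counts)} to {max(counts)}; "
      f"a perfectly flat histogram would have {len(values) / k:.0f} per bin")
```

Each bin is expected to hold about 2,622 cases, with a binomial standard deviation of roughly 50, so bars that differ by a few percent from one another are exactly the "flat, but not perfectly so" pattern in the histogram.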

. table sex if age>24 & age<35 & random<.25, contents (freq mean yrsed sd yrsed p25 yrsed p75 yrsed)

---------------------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   p25(yrsed)   p75(yrsed)
----------+----------------------------------------------------------------
     Male |       2,249     13.36261     2.907726           12           17
   Female |       2,366     13.61855     2.829585           12           17
---------------------------------------------------------------------------

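* The random sub-sample of one fourth of our data has a similar mean and sd, but not exactly the same ones. The 25th and 75th percentiles are exactly the same (12 and 17), which is not surprising for a variable like years of education that takes only a limited set of whole-number values. The sample size is roughly one fourth of what it is above.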
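"Roughly one fourth" can be checked against what random selection predicts. Each case independently has a .25 chance of having random<.25, so the number of men (or women) kept is binomial with mean .25*n and standard deviation sqrt(n*.25*.75). The short Python check below uses only the frequencies from the two tables above; the function name quarter_check is hypothetical.

```python
import math

def quarter_check(full, sub, p=0.25):
    """For each group, return (share kept, z), where z measures how far the
    sub-sample count is from its binomial expectation p * n."""
    out = {}
    for group, n in full.items():
        expected = p * n
        sd = math.sqrt(n * p * (1 - p))  # binomial SD of the number selected
        out[group] = (sub[group] / n, (sub[group] - expected) / sd)
    return out

# Frequencies copied from the two -table- outputs above (ages 25-34)
full = {"Male": 9027, "Female": 9511}
quarter = {"Male": 2249, "Female": 2366}  # cases with random < .25

for group, (share, z) in quarter_check(full, quarter).items():
    print(f"{group:6s}: share kept = {share:.4f}, z = {z:+.2f}")
```

Both z values come out small (between -0.3 and 0), so the sub-sample sizes are just what selecting one fourth of the cases at random should produce.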

. ttest yrsed if age>24 & age<35, by(sex)  unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
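The t-statistic and Satterthwaite's degrees of freedom above can be reproduced from the group sizes, means, and standard deviations alone. With vi = si^2/ni, the standard error of the difference is sqrt(v1 + v2), t is the difference in means divided by that standard error, and the degrees of freedom are (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1)). The Python sketch below (the function name welch_from_summary is hypothetical) uses the rounded summary statistics printed in the log, so its t can differ from -5.7106 in the fourth decimal place.

```python
import math

def welch_from_summary(n1, m1, s1, n2, m2, s2):
    """Unequal-variance two-sample t test from group sizes, means and SDs.
    Returns (diff, se, t, df) using Satterthwaite's degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each group mean
    se = math.sqrt(v1 + v2)          # standard error of the difference in means
    diff = m1 - m2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return diff, se, diff / se, df

# Male and Female rows copied from the full-sample output above (ages 25-34)
diff, se, t, df = welch_from_summary(9027, 13.31212, 2.967666,
                                     9511, 13.55657, 2.854472)
print(f"diff = {diff:.5f}, se = {se:.7f}, t = {t:.4f}, df = {df:.1f}")
```

Nothing in the calculation uses the individual observations, which is why a t-test can be checked (or redone) from a summary table such as the one in the Excel file.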

. ttest yrsed if age>24 & age<35 & random<.25, by(sex)  unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    2249    13.36261    .0613139    2.907726    13.24237    13.48284
  Female |    2366    13.61855    .0581722    2.829585    13.50448    13.73263
---------+--------------------------------------------------------------------
combined |    4615    13.49382     .042254    2.870473    13.41099    13.57666
---------+--------------------------------------------------------------------
    diff |           -.2559489    .0845186               -.4216461   -.0902518
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -3.0283
Ho: diff = 0                     Satterthwaite's degrees of freedom =  4585.15

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0012         Pr(|T| > |t|) = 0.0025          Pr(T > t) = 0.9988

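* Given that the t-statistic is proportional to the square root of N (holding the means and standard deviations constant), and the sub-sample has about one fourth of the cases, we would expect the second t-statistic to be about half as large as the full-sample one, since the square root of 1/4 is 1/2. It is in the neighborhood of half as large (-3.03 versus -5.71), but not exactly half, because the random sub-sample introduces random variation.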
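The square-root-of-N argument can be made concrete. If the means and standard deviations stay the same and every group size is multiplied by f, each si^2/ni is multiplied by 1/f, so the standard error is multiplied by 1/sqrt(f) and t is multiplied by sqrt(f). The small Python calculation below applies that scaling to the full-sample t using the Ns from the two ttest outputs; the helper name expected_t is hypothetical.

```python
import math

def expected_t(t_full, n_full, n_sub):
    """Rescale a full-sample t-statistic to a sub-sample of size n_sub,
    assuming the sub-sample has the same means, SDs and sex ratio."""
    return t_full * math.sqrt(n_sub / n_full)

# Values copied from the two -ttest- outputs above
t_full, n_full = -5.7106, 18538   # all 25-34 year olds
t_sub, n_sub = -3.0283, 4615      # random < .25

scale = math.sqrt(n_sub / n_full)
t_expected = expected_t(t_full, n_full, n_sub)
print(f"scale = {scale:.3f}, expected t = {t_expected:.2f}, observed t = {t_sub:.2f}")
```

The expected value is about -2.85; the observed -3.03 is larger in magnitude mostly because this sub-sample happens to show a slightly bigger gender gap (-.256 versus -.244).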

. ttest yrsed if age>24 & age<35 & random>=.25 & random<.5, by(sex)  unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    2312    13.20631    .0637414    3.064897    13.08132    13.33131
  Female |    2366    13.49239    .0590895    2.874203    13.37652    13.60826
---------+--------------------------------------------------------------------
combined |    4678      13.351     .043469    2.973105    13.26578    13.43622
---------+--------------------------------------------------------------------
    diff |           -.2860773    .0869168               -.4564757    -.115679
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -3.2914
Ho: diff = 0                     Satterthwaite's degrees of freedom =  4640.72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0005         Pr(|T| > |t|) = 0.0010          Pr(T > t) = 0.9995
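This second quarter of the data (.25 <= random < .5) shares no observations with the first quarter, so the two estimated gender gaps, -.256 and -.286, can be compared directly. A minimal Python sketch, treating the two disjoint sub-samples as approximately independent samples, computes a z-statistic for the difference between the two estimates from their standard errors; the function name diff_between_estimates is hypothetical.

```python
import math

def diff_between_estimates(d1, se1, d2, se2):
    """z-statistic for the gap between two estimates from independent samples."""
    return (d1 - d2) / math.sqrt(se1**2 + se2**2)

# diff and Std. Err. copied from the two sub-sample -ttest- outputs above
d1, se1 = -0.2559489, 0.0845186   # random < .25
d2, se2 = -0.2860773, 0.0869168   # .25 <= random < .5

z = diff_between_estimates(d1, se1, d2, se2)
print(f"gap between quarters = {d1 - d2:.4f}, z = {z:.2f}")
```

A z of about 0.25 means the two quarters disagree by far less than their sampling error, which is consistent with both estimating the same underlying gender gap.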

. graph hbox yrsed if age>24 & age<35, over(sex)

.

. graph hbox yrsed if age>24 & age<35 & random<.25, over(sex)

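* These two box plots were identical, and within each plot the men's and women's boxes were identical. (The edges of each box are the 25th and 75th percentiles, which are 12 and 17 for both men and women in the tables above.) This means either that the box plot is not a good way of comparing variables that take only a few distinct values, or else that the difference between men's and women's educational attainment is not large enough to matter.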
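One way to see why the boxes can coincide even though the means differ: when a variable takes only a few values, a modest shift of people between those values moves the mean but can leave the quartiles, and therefore the box, unchanged. The Python sketch below uses made-up data for two hypothetical groups of 100 (not the CPS data). Python's statistics.quantiles rule is not necessarily the same as Stata's percentile rule, so this illustrates the idea rather than reproducing Stata's box plot.

```python
import statistics

# Hypothetical years-of-education data (NOT the CPS data): 100 people per group,
# with values only at 12, 14 and 17 years; group B has 4 more people at 17.
group_a = [12] * 30 + [14] * 40 + [17] * 30
group_b = [12] * 26 + [14] * 40 + [17] * 34

for name, data in [("A", group_a), ("B", group_b)]:
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    print(f"group {name}: mean = {statistics.fmean(data):.2f}, "
          f"quartiles = {q1}, {med}, {q3}")
```

The means differ by 0.2 years, about the size of the male-female gap in the log, yet both groups have quartiles of 12, 14, and 17, so their boxes would be drawn identically.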

. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace

file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved

. log close

      name:  <unnamed>
       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class5.log
  log type:  text
 closed on:   5 Oct 2010, 16:01:36

--------------------------------------------------------------------------------------------------------