-----------------------------------------------------------------------------------------------

name:  <unnamed>

> g

log type:  text

opened on:   5 Oct 2016, 10:03:59

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

* How to make a box plot: see the commands below, and consult the Stata documentation for graph box if necessary.

. graph box age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

. graph hbox age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)
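
* The repeated occ1990== conditions above can be shortened. A minimal sketch, assuming the same dataset is in memory; it was not run in this session, so no output is shown. inlist() is a built-in Stata function that is true when its first argument equals any of the later ones.

```stata
* Same vertical box plot as above, with the three occupation codes in one inlist() call
graph box age if inlist(occ1990, 95, 125, 178), over(occ1990)

* Horizontal version
graph hbox age if inlist(occ1990, 95, 125, 178), over(occ1990)
```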

* Several ways to get percentiles of distributions:

. summarize age if occ1990==178, detail

                             Age
-------------------------------------------------------------
      Percentiles      Smallest
 1%           24             24
 5%           27             24
10%           29             24       Obs                 441
25%           35             24       Sum of Wgt.         441

50%           43                      Mean           44.38549
                        Largest       Std. Dev.      12.48585
75%           52             84
90%           61             86       Variance       155.8965
95%           66             87       Skewness       .7190904
99%           83             90       Kurtosis       3.549932

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, contents(freq p25 age p50 age p75 age)

----------------------------------------------------------------------
Occupation, 1990      |
basis                 |      Freq.    p25(age)    med(age)    p75(age)
----------------------+-----------------------------------------------
    Registered nurses |        966          36          43          51
Sociology instructors |          6          50          53          54
              Lawyers |        441          35          43          52
----------------------------------------------------------------------
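
* Yet another way to get the same quartiles is tabstat. A sketch only, not run in this session, so no output is shown. It may be the more portable choice: to the best of my knowledge the contents() option of table belongs to the older table syntax (Stata 16 and earlier), and newer releases use a different table syntax.

```stata
* Count, 25th percentile, median, and 75th percentile of age for each occupation
* nototal suppresses the extra row that pools all three occupations
tabstat age if inlist(occ1990, 95, 125, 178), by(occ1990) statistics(n p25 p50 p75) nototal
```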

* One of several ways to remind yourself which occupation code is which: tabulate with and without value labels.

. tabulate occ1990 if occ1990==178| occ1990==95 | occ1990==125

                 Occupation, 1990 basis |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                      Registered nurses |        966       68.37       68.37
                  Sociology instructors |          6        0.42       68.79
                                Lawyers |        441       31.21      100.00
----------------------------------------+-----------------------------------
                                  Total |      1,413      100.00

. tabulate occ1990 if occ1990==178| occ1990==95 | occ1990==125, nolab

Occupation, |
 1990 basis |      Freq.     Percent        Cum.
------------+-----------------------------------
         95 |        966       68.37       68.37
        125 |          6        0.42       68.79
        178 |        441       31.21      100.00
------------+-----------------------------------
      Total |      1,413      100.00
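
* Instead of tabulating twice, the numeric code can be shown next to each label in a single table. A sketch only, not run in this session. It assumes you are willing to modify the value labels in memory temporarily; numlabel is a standard Stata command and _all applies it to every value label.

```stata
* Prefix every value label with its numeric code, e.g. "95. Registered nurses"
numlabel _all, add

* One table now shows both the code and the label
tabulate occ1990 if inlist(occ1990, 95, 125, 178)

* Restore the original labels
numlabel _all, remove
```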

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

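* What if we rescale years of education to months of education? Would the t test give the same result? It should, because the t statistic is unit-free: multiplying yrsed by 12 multiplies both the difference in means and its standard error by 12, so their ratio is unchanged. Notice what changes (means, standard errors, standard deviations, confidence limits) and what doesn't (t, degrees of freedom, p-values):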

. gen months_ed=yrsed*12

(30484 missing values generated)

. ttest months_ed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802
  Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673
---------+--------------------------------------------------------------------
combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536
---------+--------------------------------------------------------------------
    diff |           -2.933363    .5131471               -3.939178   -1.927547
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
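
* To check the unit-free claim by hand, divide each reported difference in means by its standard error. A sketch using only the numbers printed in the two diff rows of the t tests in this log; it was not run in this session. Both ratios should come out at roughly -5.716, matching the reported t up to rounding.

```stata
* Years of education: diff / Std. Err.
display -.2444469/.0427623

* Months of education: both numerator and denominator are 12 times larger
display -2.933363/.5131471

* The scale factor itself: the months diff divided by the years diff, about 12
display -2.933363/-.2444469
```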

. exit, clear