-----------------------------------------------------------------------------------------------------

name:  <unnamed>

log type:  text

opened on:   8 Oct 2018, 09:45:16

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta", clear

. *class starts here

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

Two-sample t test with unequal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0428057                 -.32835   -.1605438

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7106

Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

·        Note the subtle difference in the standard error of the difference between the equal-variance t-test (.0427623) and the unequal-variance t-test (.0428057). The difference is small because the variance of men’s education and the variance of women’s education are almost the same (standard deviations of 2.97 and 2.85), so it hardly matters whether we assume equal variances or not. The difference between the means is of course identical in both cases.
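
·        A minimal sketch of where the two standard errors come from, using only the group sizes, means, standard errors, and standard deviations printed above. The immediate command ttesti needs no dataset in memory; its results should match the tables above up to rounding of the inputs.

```stata
* Unequal variances: combine the two group standard errors directly
display sqrt(.0312351^2 + .0292693^2)

* Equal variances: pool the two standard deviations first
display sqrt(((9027-1)*2.967666^2 + (9511-1)*2.854472^2)/(9027+9511-2)) * sqrt(1/9027 + 1/9511)

* Immediate form of ttest: #obs1 #mean1 #sd1 #obs2 #mean2 #sd2
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```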

. display -.2444469/.0428057

-5.7106156

. display ttail(18536,-5.7164)

.99999999

. display (1-ttail(18356,-5.7164))

5.525e-09

. display 2*(1-ttail(18356,-5.7164))

1.105e-08
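
·        ttail(df,t) returns the upper-tail probability Pr(T > t), which is why the first display above returns a number near 1. The last two commands were typed with 18356 degrees of freedom rather than 18536; with samples this large the result barely changes. A sketch of the same p-values computed from the upper tail of |t|, which also avoids subtracting a number very close to 1 from 1:

```stata
* By symmetry, Pr(T < -5.7164) = Pr(T > 5.7164)
display ttail(18536, 5.7164)

* Two-sided p-value: Pr(|T| > |t|)
display 2*ttail(18536, abs(-5.7164))
```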

. codebook sex

-----------------------------------------------------------------------------------------------------

sex                                                                                               Sex

-----------------------------------------------------------------------------------------------------

type:  numeric (byte)

label:  sexlbl

range:  [1,2]                        units:  1

unique values:  2                        missing .:  0/133710

tabulation:  Freq.   Numeric  Label

64791         1  Male

68919         2  Female

. gen byte female=0

. replace female=1 if sex==2

. tabulate sex female

|        female

Sex |         0          1 |     Total

-----------+----------------------+----------

Male |    64,791          0 |    64,791

Female |         0     68,919 |    68,919

-----------+----------------------+----------

Total |    64,791     68,919 |   133,710

·        To use sex in a regression, we construct a 0-1 dummy variable to take the place of the 1-2 coding of sex. With female coded 0-1, the constant is the mean for men and the coefficient on female is the female-minus-male difference in means.
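
·        A one-line alternative for building the dummy, sketched here under the assumption that the dataset above is in memory and female has already been created. The name female2 is hypothetical, chosen so it does not collide with female. Because sex has no missing values in this file, the if qualifier is only a safeguard.

```stata
* A logical expression evaluates to 1 when true and 0 when false
generate byte female2 = (sex == 2) if !missing(sex)

* Confirm the two versions agree in every observation
assert female2 == female
```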

. regress yrsed female if age>=25 & age<=34

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  276.742433     1  276.742433           Prob > F      =  0.0000

Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649

_cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

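·        The coefficient on female (.2444469) and its standard error (.0427623) match the equal-variance t-test above; only the sign flips, because ttest reports male minus female. Note that the constant term also has a t-test associated with it, but not a very interesting one. The null hypothesis for the constant term is that men’s average educational attainment is zero, which cannot be true.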

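·        If we want the SD, SE, t-stat or other results to more precision than the output table shows, Stata stores them and we can call them up. Here e(V) is the variance-covariance matrix of the coefficients, so the square root of its first diagonal element is the standard error of female.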

. matrix var_covar_regress=e(V)

. matrix list var_covar_regress

symmetric var_covar_regress[2,2]

            female       _cons

female   .00182861

 _cons  -.00093818   .00093818

. display var_covar_regress[1,1]^0.5

.04276226

. display 0.2444469/0.04276226

5.7164168
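
·        The same quantities can be pulled straight from the stored results without copying numbers by hand. This sketch assumes the regression above is still the most recent estimation command.

```stata
* Coefficient and standard error at full precision
display _b[female]
display _se[female]

* t-statistic and its two-sided p-value
display _b[female]/_se[female]
display 2*ttail(e(df_r), abs(_b[female]/_se[female]))

* The constant's SE uses the pooled Root MSE, not the male SD:
* Root MSE / sqrt(number of men)
display e(rmse)/sqrt(9027)
```

·        The last line should be close to the .0306297 reported for _cons. It is slightly below the .0312351 that ttest reports for men, because ttest uses the male standard deviation alone.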

·        Fun with box plots! Look up the definition of a box plot in the Stata documentation (help graph box) to see what the parts of the boxes mean. You can ignore the outliers.

. graph box age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

. graph hbox age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

·        To get the percentiles of each distribution, you can use summarize with the detail option, or table.

. summarize age if occ1990==178, detail

                             Age

-------------------------------------------------------------

      Percentiles      Smallest

 1%           24             24

 5%           27             24

10%           29             24       Obs                 441

25%           35             24       Sum of Wgt.         441

50%           43                      Mean           44.38549

                        Largest       Std. Dev.      12.48585

75%           52             84

90%           61             86       Variance       155.8965

95%           66             87       Skewness       .7190904

99%           83             90       Kurtosis       3.549932
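
·        The quartiles above are enough to reconstruct the box for lawyers (occ1990==178). In Stata's graph box, the box runs from the 25th to the 75th percentile with a line at the median, and the whiskers stop at the most extreme observations within 1.5 interquartile ranges of the box. A sketch of that arithmetic, using the numbers printed above:

```stata
* Interquartile range: p75 - p25
display 52 - 35

* Limits for the whiskers: 1.5 IQRs beyond each quartile
display 52 + 1.5*(52 - 35)
display 35 - 1.5*(52 - 35)
```

·        The upper limit works out to 77.5, so the four largest ages listed above (84, 86, 87, 90) are drawn as outside values. The lower limit of 9.5 is below the youngest lawyer (24), so the lower whisker ends at 24.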

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, contents(freq p25 age p50 age p75 age)

----------------------------------------------------------------------

Occupation, 1990      |

basis                 |      Freq.    p25(age)    med(age)    p75(age)

----------------------+-----------------------------------------------

Registered nurses |        966          36          43          51

Sociology instructors |          6          50          53          54

Lawyers |        441          35          43          52

----------------------------------------------------------------------
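
·        An alternative that reports the same statistics is tabstat, sketched here assuming the dataset above is in memory. inlist() shortens the chain of | conditions. In Stata 17 and later the table command was redesigned, so the contents() syntax above may need a version prefix there; tabstat does not have that issue.

```stata
* Count, quartiles, and median of age by occupation
tabstat age if inlist(occ1990, 95, 125, 178), by(occ1990) statistics(n p25 p50 p75)
```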

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. gen months_ed=yrsed*12

(30484 missing values generated)

·        Rescaling the variable from years to months multiplies the means, SDs, SEs, and the difference by 12, but leaves the t-stat, degrees of freedom, and p-values unchanged.

. ttest months_ed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802

Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673

---------+--------------------------------------------------------------------

combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536

---------+--------------------------------------------------------------------

diff |           -2.933363    .5131471               -3.939178   -1.927547

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
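
·        A quick check of the rescaling, using the numbers printed in the two tables. Multiplying every observation by 12 multiplies both the difference and its standard error by 12, so the factor cancels in the ratio. The products should agree with the months_ed table up to rounding.

```stata
* Difference and SE in months are 12 times the values in years
display 12*(-.2444469)
display 12*.0427623

* The t-statistic is the same ratio either way
display -.2444469/.0427623
display -2.933363/.5131471
```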

. log close

name:  <unnamed>