------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class4.log

  log type:  text

 opened on:  29 Sep 2021, 10:10:42

 

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta"

 

 

. *class starts here

 

. drop female nurses lawyers

* Dropping some variables I created before class so that I can re-create them during class.

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
Combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
H0: diff = 0                                     Degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
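As an arithmetic check, the equal-variance t statistic can be rebuilt from the summary rows alone. This is a minimal sketch in Python rather than Stata, assuming the textbook pooled-variance formula; the function name `pooled_t_test` is hypothetical, and because it uses the rounded values printed above, it agrees with Stata only to about four or five significant digits.

```python
import math

def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t test from summary statistics.

    Returns (diff, se, t, df) with diff = mean1 - mean2.
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two group variances
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Obs, Mean, and Std. dev. copied from the ttest output (Male = group 1, Female = group 2)
diff, se, t, df = pooled_t_test(9027, 13.31212, 2.967666, 9511, 13.55657, 2.854472)
print(f"diff = {diff:.5f}  se = {se:.7f}  t = {t:.4f}  df = {df}")
```

The pooled variance computed this way is the same quantity a regression reports as the residual mean square, which is why the two approaches line up.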

 

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

 

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
Combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
H0: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
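The unequal-variance version differs in two places: the standard error is not pooled, and the degrees of freedom come from Satterthwaite's approximation. A minimal Python sketch (not Stata) reproducing those two numbers from the group standard errors above; `welch_t_test` is a hypothetical name, and the inputs are the rounded values from the output.

```python
import math

def welch_t_test(n1, mean1, se1, n2, mean2, se2):
    """Unequal-variance two-sample t test from group sizes, means, and standard errors.

    Degrees of freedom use Satterthwaite's approximation.
    """
    v1, v2 = se1 ** 2, se2 ** 2          # squared standard errors, i.e. s^2 / n
    se = math.sqrt(v1 + v2)              # no pooling of the two variances
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Obs, Mean, and Std. err. copied from the ttest ..., unequal output
diff, se, t, df = welch_t_test(9027, 13.31212, 0.0312351, 9511, 13.55657, 0.0292693)
print(f"diff = {diff:.5f}  se = {se:.7f}  t = {t:.4f}  Satterthwaite df = {df:.1f}")
```

With group sizes this large and similar standard deviations, the two tests give nearly identical answers; only the equal-variance one is algebraically the same as OLS.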

 

* We now have both an equal-variance and an unequal-variance t-test. Which one will match the regression results exactly?

 

 

. codebook sex

 

-----------------------------------------------------------------------------------
sex                                                                             Sex
-----------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: sexlbl

                 Range: [1,2]                         Units: 1
         Unique values: 2                         Missing .: 0/133,710

            Tabulation: Freq.   Numeric  Label
                       64,791         1  Male
                       68,919         2  Female

 

* Sex is arbitrarily coded (1 = Male, 2 = Female). If we mistakenly told Stata to treat those numbers as amounts, the coding would imply that women have twice as much of whatever is being measured as men do. So we need a dummy variable for gender rather than the existing 1-2 coded variable. To handle categorical variables like sex, we either make dummy variables ourselves or have Stata make them for us (more on that later!).

 

. gen byte female=0

 

. replace female=1 if sex==2

(68,919 real changes made)

 

. tabulate sex female, miss

 

           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710
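The gen/replace pair above starts every observation at 0 and switches only sex==2 to 1, and the cross-tab confirms nothing landed in the wrong cell. A minimal Python sketch of the same logic (not Stata); `make_female_dummy` is a hypothetical helper, and the input list is synthetic data built only to have the same counts as the codebook output.

```python
from collections import Counter

def make_female_dummy(sex_codes):
    """Recode a 1/2 sex variable (1 = Male, 2 = Female) into a 0/1 female dummy.

    Like `gen byte female=0` then `replace female=1 if sex==2`: everything starts
    at 0, and only code 2 is switched to 1 (so any other code, including missing,
    would stay 0).
    """
    return [1 if code == 2 else 0 for code in sex_codes]

# Synthetic data with the same marginal counts as the codebook output
sex = [1] * 64_791 + [2] * 68_919
female = make_female_dummy(sex)

# Cross-tabulation check, in the spirit of `tabulate sex female, miss`
crosstab = Counter(zip(sex, female))
print(crosstab)
```

The off-diagonal cells, (1, 1) and (2, 0), should be empty; that is the same check the tabulate output makes by eye.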

 

. regress yrsed female if age>=25 & age<=34

 

      Source |       SS           df       MS      Number of obs   =    18,538
-------------+----------------------------------   F(1, 18536)     =     32.68
       Model |  276.742433         1  276.742433   Prob > F        =    0.0000
    Residual |  156979.922    18,536  8.46892111   R-squared       =    0.0018
-------------+----------------------------------   Adj R-squared   =    0.0017
       Total |  157256.664    18,537  8.48339343   Root MSE        =    2.9101

------------------------------------------------------------------------------
       yrsed | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216
------------------------------------------------------------------------------

 

. display 0.2444469/0.04276226

5.7164168

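* The regression gives exactly the same results as the equal-variance t-test (compare with the output above). The female coefficient (.2444469) is the Female minus Male difference in means, so it has the opposite sign of the t-test's diff (Male minus Female). Its standard error (.0427623) and t statistic (5.7164 in absolute value, as the display command confirms) are identical, and _cons (13.31212) is the male mean.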
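Why the match is exact: with a single 0/1 regressor, the fitted values are just the two group means, so the residual mean square equals the pooled variance. A minimal Python sketch (not Stata) on hypothetical toy data, not the CPS, computing both the OLS estimates and the pooled t-test by hand; the function names are made up for illustration.

```python
import math
from statistics import mean

def ols_one_regressor(y, x):
    """OLS of y on a constant and one regressor x; returns (b0, b1, se_b1)."""
    n = len(y)
    ybar, xbar = mean(y), mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = rss / (n - 2)                      # residual mean square
    return b0, b1, math.sqrt(mse / sxx)

def pooled_t(group0, group1):
    """Equal-variance t test; returns (diff, se, t) with diff = mean(group0) - mean(group1)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = mean(group0), mean(group1)
    ss_within = sum((v - m0) ** 2 for v in group0) + sum((v - m1) ** 2 for v in group1)
    sp2 = ss_within / (n0 + n1 - 2)          # pooled variance
    se = math.sqrt(sp2 * (1 / n0 + 1 / n1))
    return m0 - m1, se, (m0 - m1) / se

# Hypothetical toy data (not from the CPS): years of schooling
men = [12, 12, 14, 16, 11, 13]
women = [12, 14, 16, 16, 13]
yrsed = men + women
female = [0] * len(men) + [1] * len(women)

b0, b1, se_b1 = ols_one_regressor(yrsed, female)
diff, se_t, t = pooled_t(men, women)
print(b0, b1, se_b1)   # intercept = men's mean, slope = women's mean - men's mean
print(diff, se_t, t)   # diff has the opposite sign; the standard error is the same
```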

 

 

. graph box age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

 

. graph hbox age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

 

* A box plot draws a box from the 25th to the 75th percentile, with a line at the 50th percentile (the median). You can ignore the whiskers and the dots that represent outliers. If you want the exact values of the 25th, 50th, and 75th percentiles, the table command can report them.
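To see what the box boundaries are, here is a minimal Python sketch (not Stata) that computes quartiles and flags outside values. Two assumptions to label loudly: the percentile rule below is the one documented for Stata's `summarize, detail`, and we assume `table ..., stat(p25 ...)` uses the same rule; the outlier cutoff is the usual 1.5 times the IQR beyond the nearer quartile, which we believe is what `graph box` uses for its dots. The ages are hypothetical.

```python
def stata_percentile(values, p):
    """p-th percentile (0 < p < 100), assuming Stata's summarize rule.

    With n sorted values and n*p/100 = i + h (i an integer, 0 <= h < 1), using
    1-based positions: h > 0 gives x(i+1); h == 0 gives (x(i) + x(i+1)) / 2.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    x = sorted(values)
    i, rem = divmod(len(x) * p, 100)    # integer arithmetic avoids rounding problems
    if rem > 0:
        return x[i]                     # 0-based x[i] is the (i+1)-th value
    return (x[i - 1] + x[i]) / 2

def box_summary(values):
    """Quartiles plus values more than 1.5 * IQR beyond the nearer quartile."""
    q1, med, q3 = (stata_percentile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outside = [v for v in values if v < lo or v > hi]
    return q1, med, q3, outside

ages = [23, 29, 31, 35, 38, 41, 44, 47, 52, 58, 61, 95]   # hypothetical ages
print(box_summary(ages))
```

Other software (including Python's `statistics.quantiles`) uses different interpolation rules, so percentiles for small groups, like the 6 sociology instructors, can differ slightly across programs.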

 

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, stat(freq)  stat(p25 age) stat(p50 age) stat(p75 age)

 

------------------------------------------------------------------------------------------
                        |  Frequency   25th percentile   50th percentile   75th percentile
------------------------+-----------------------------------------------------------------
Occupation, 1990 basis  |
  Registered nurses     |        966                36                43                51
  Sociology instructors |          6                50                53                54
  Lawyers               |        441                35                43                52
  Total                 |      1,413                35                43                51
------------------------------------------------------------------------------------------

 

. graph box inctot if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

 

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, stat(freq)  stat(p25 inctot) stat(p50 inctot) stat(p75 inctot)

 

------------------------------------------------------------------------------------------
                        |  Frequency   25th percentile   50th percentile   75th percentile
------------------------+-----------------------------------------------------------------
Occupation, 1990 basis  |
  Registered nurses     |        966             27194             39144             50100
  Sociology instructors |          6             39360             45326             49162
  Lawyers               |        441             47133             82515            125725
  Total                 |      1,413             30081             44840             67639
------------------------------------------------------------------------------------------

 

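* Incomes overlap much less than ages across these 3 occupations. For age, the nurses' and lawyers' 25th to 75th percentile ranges (36 to 51 and 35 to 52) nearly coincide; for income, the lawyers' median (82,515) is far above the 75th percentile for nurses (50,100) and for sociology instructors (49,162).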

 

. gen byte nurses=0

 

. replace nurses=1 if occ1990==95

(966 real changes made)

 

. gen byte lawyers=0

 

. replace lawyers=1 if occ1990==178

(441 real changes made)

 

. regress inctot nurses lawyers

 

      Source |       SS           df       MS      Number of obs   =   103,226
-------------+----------------------------------   F(2, 103223)    =   1294.98
       Model |  2.5972e+12         2  1.2986e+12   Prob > F        =    0.0000
    Residual |  1.0351e+14   103,223  1.0028e+09   R-squared       =    0.0245
-------------+----------------------------------   Adj R-squared   =    0.0245
       Total |  1.0611e+14   103,225  1.0279e+09   Root MSE        =     31667

------------------------------------------------------------------------------
      inctot | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      nurses |   15233.13    1023.69    14.88   0.000     13226.71    17239.55
     lawyers |   73688.55   1511.213    48.76   0.000     70726.59    76650.51
       _cons |   25554.04   99.24125   257.49   0.000     25359.52    25748.55
------------------------------------------------------------------------------

* In the regression above, we compare nurses and lawyers each with all other persons in the CPS who have nonmissing total income for 1999. That omitted group's mean income is the constant (25,554.04).
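Because the nurse and lawyer dummies are mutually exclusive, each group's mean income is the constant plus that group's coefficient. A minimal Python sketch (not Stata) doing that addition with the coefficients copied from the output above; the results should match, up to rounding, the occupation means that the table command reports later in this log.

```python
# Coefficients copied from `regress inctot nurses lawyers`
# (reference group: everyone who is neither a nurse nor a lawyer)
cons = 25554.04       # mean inctot of the reference group
b_nurses = 15233.13   # nurses' mean minus the reference mean
b_lawyers = 73688.55  # lawyers' mean minus the reference mean

# With mutually exclusive 0/1 dummies, each group's mean is _cons + its coefficient
mean_nurses = cons + b_nurses
mean_lawyers = cons + b_lawyers
print(f"nurses: {mean_nurses:.2f}  lawyers: {mean_lawyers:.2f}")
```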

 

 

. regress inctot nurses lawyers if occ1990==178| occ1990==95| occ1990==125

 

      Source |       SS           df       MS      Number of obs   =     1,413
-------------+----------------------------------   F(2, 1410)      =    262.68
       Model |  1.0359e+12         2  5.1795e+11   Prob > F        =    0.0000
    Residual |  2.7802e+12     1,410  1.9718e+09   R-squared       =    0.2715
-------------+----------------------------------   Adj R-squared   =    0.2704
       Total |  3.8161e+12     1,412  2.7026e+09   Root MSE        =     44405

------------------------------------------------------------------------------
      inctot | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      nurses |  -3576.166   18184.46    -0.20   0.844    -39247.68    32095.35
     lawyers |   54879.25   18251.16     3.01   0.003     19076.91    90681.59
       _cons |   44363.33   18128.25     2.45   0.015     8802.086    79924.58
------------------------------------------------------------------------------

 

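* In the regression above, we compare nurses and lawyers with sociology instructors, the omitted reference group (everyone else is excluded by the "if" condition), so _cons (44,363.33) is the sociology instructors' mean income. Because there are only 6 sociology instructors, both comparisons are imprecise: the standard errors are around 18,000, and the nurse coefficient is far from statistically significant (p = 0.844).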
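The large standard errors follow from the group sizes. With dummy variables for all groups but one, a coefficient's standard error is Root MSE times sqrt(1/n_group + 1/n_reference), and the constant's is Root MSE times sqrt(1/n_reference). A minimal Python sketch (not Stata) plugging in the rounded Root MSE and the group sizes from the table output; expect agreement with Stata only to within a unit or so.

```python
import math

root_mse = 44405                             # Root MSE from the output (rounded)
n_soc, n_nurses, n_lawyers = 6, 966, 441     # group sizes from the table output

# Standard errors implied by the group sizes and the residual spread
se_cons = root_mse * math.sqrt(1 / n_soc)
se_nurses = root_mse * math.sqrt(1 / n_nurses + 1 / n_soc)
se_lawyers = root_mse * math.sqrt(1 / n_lawyers + 1 / n_soc)
print(f"_cons: {se_cons:.0f}  nurses: {se_nurses:.0f}  lawyers: {se_lawyers:.0f}")

# 1/6 is about 160 times larger than 1/966, so the tiny reference group dominates
```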

 

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, stat(freq)  stat(p25 inctot) stat(p50 inctot) stat(p75 inctot) stat(mean inctot)

 

-----------------------------------------------------------------------------------------------------
                        |  Frequency   25th percentile   50th percentile   75th percentile       Mean
------------------------+----------------------------------------------------------------------------
Occupation, 1990 basis  |
  Registered nurses     |        966             27194             39144             50100   40787.17
  Sociology instructors |          6             39360             45326             49162   44363.33
  Lawyers               |        441             47133             82515            125725   99242.58
  Total                 |      1,413             30081             44840             67639    59046.4
-----------------------------------------------------------------------------------------------------

 

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, stat(mean inctot) stat(freq)  stat(p25 inctot) stat(p50 inctot) stat(p75 inctot)

 

-----------------------------------------------------------------------------------------------------
                        |      Mean   Frequency   25th percentile   50th percentile   75th percentile
------------------------+----------------------------------------------------------------------------
Occupation, 1990 basis  |
  Registered nurses     |  40787.17         966             27194             39144             50100
  Sociology instructors |  44363.33           6             39360             45326             49162
  Lawyers               |  99242.58         441             47133             82515            125725
  Total                 |   59046.4       1,413             30081             44840             67639
-----------------------------------------------------------------------------------------------------

 

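* For reference, the tables above give the mean of inctot for the 3 occupations (in the homework you will use incwage, which is a little different). The coefficients in the last regression are differences from the sociology instructors' mean: 40,787.17 - 44,363.33 = -3,576.16 for nurses and 99,242.58 - 44,363.33 = 54,879.25 for lawyers.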

 

. log close

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class4.log

  log type:  text

 closed on:   4 Oct 2021, 14:05:13

--------------------------------------------------------------------------------------------