* The first thing you need to do, always always always, is start a log at the beginning of every Stata session. I generally save the log as a .log file rather than as a .smcl file, because a .log file is plain text and can be read by any other program (in this case MS Word). Use the menu commands under File to start a log and to open the dataset.
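
* A minimal sketch of the same steps typed as commands rather than chosen from the File menu. The file names below are placeholders, not the paths used in class; point them at your own folders.

```stata
* start a plain-text log; replace overwrites an existing file of the same name
log using "class1.log", text replace

* open the dataset (the path is an assumption; use the location of your own copy)
use "cps_mar_2000_new.dta", clear

* ... the commands for the session go here ...

* close the log when the session is done
log close
```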

 

. log close

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class1.log

  log type:  text

 closed on:  20 Sep 2021, 11:36:14

------------------------------------------------------------------------------

------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021

> _logs\class1.log

  log type:  text

 opened on:  20 Sep 2021, 11:38:07

 

. *class actually starts here.

 

. clear all

 

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta"

 

. describe

 

Contains data from C:\Users\mexmi\Desktop\cps_mar_2000_new.dta

  obs:       133,710                         

 vars:            55                          1 Feb 2009 13:36

------------------------------------------------------------------------------------------

              storage   display    value

variable name   type    format     label      variable label

------------------------------------------------------------------------------------------

year            int     %8.0g      yearlbl    Survey year

serial          long    %12.0g     seriallbl

                                              Household serial number

hhwt            float   %9.0g      hhwtlbl    Household weight

region          byte    %27.0g     regionlbl

                                              Region and division

statefip        byte    %57.0g     statefiplbl

                                              State (FIPS code)

metro           byte    %27.0g     metrolbl   Metropolitan central city status

metarea         int     %50.0g     metarealbl

                                              Metropolitan area

ownershp        byte    %21.0g     ownershplbl

                                              Ownership of dwelling

hhincome        long    %12.0g     hhincomelbl

                                              Total household income

pubhous         byte    %8.0g      pubhouslbl

                                              Living in public housing

foodstmp        byte    %8.0g      foodstmplbl

                                              Food stamp recipiency

pernum          byte    %8.0g      pernumlbl

                                              Person number in sample unit

perwt           float   %9.0g      perwtlbl   Person weight

momloc          byte    %8.0g      momloclbl

                                              Mother's location in the household

poploc          byte    %8.0g      poploclbl

                                              Father's location in the household

sploc           byte    %8.0g      sploclbl   Spouse's location in household

famsize         byte    %25.0g     famsizelbl

                                              Number of own family members in hh

nchild          byte    %18.0g     nchildlbl

                                              Number of own children in household

nchlt5          byte    %23.0g     nchlt5lbl

                                              Number of own children under age 5 in hh

nsibs           byte    %18.0g     nsibslbl   Number of own siblings in household

relate          int     %34.0g     relatelbl

                                              Relationship to household head

age             byte    %19.0g     agelbl     Age

sex             byte    %8.0g      sexlbl     Sex

race            int     %37.0g     racelbl    Race

marst           byte    %23.0g     marstlbl   Marital status

popstat         byte    %14.0g     popstatlbl

                                              Adult civilian, armed forces, or child

bpl             long    %27.0g     bpllbl     Birthplace

yrimmig         int     %11.0g     yrimmiglbl

                                              Year of immigration

citizen         byte    %31.0g     citizenlbl

                                              Citizenship status

mbpl            long    %27.0g     mbpllbl    Mother's birthplace

fbpl            long    %27.0g     fbpllbl    Father's birthplace

hispan          int     %29.0g     hispanlbl

                                              Hispanic origin

educ99          byte    %38.0g     educ99lbl

                                              Educational attainment, 1990

educrec         byte    %23.0g     educreclbl

                                              Educational attainment recode

schlcoll        byte    %45.0g     schlcolllbl

                                              School or college attendance

empstat         byte    %30.0g     empstatlbl

                                              Employment status

occ1990         int     %78.0g     occ1990lbl

                                              Occupation, 1990 basis

wkswork1        byte    %8.0g      wkswork1lbl

                                              Weeks worked last year

hrswork         byte    %8.0g      hrsworklbl

                                              Hours worked last week

uhrswork        byte    %13.0g     uhrsworklbl

                                              Usual hours worked per week (last yr)

hourwage        int     %8.0g      hourwagelbl

                                              Hourly wage

union           byte    %33.0g     unionlbl   Union membership

inctot          long    %12.0g                Total personal income

incwage         long    %12.0g                Wage and salary income

incss           long    %12.0g                Social Security income

incwelfr        long    %12.0g                Welfare (public assistance) income

vetstat         byte    %10.0g     vetstatlbl

                                              Veteran status

vetlast         byte    %26.0g     vetlastlbl

                                              Veteran's most recent period of service

disabwrk        byte    %34.0g     disabwrklbl

                                              Work disability

health          byte    %9.0g      healthlbl

                                              Health status

inclugh         byte    %8.0g      inclughlbl

                                              Included in employer group health plan last

                                                year

himcaid         byte    %8.0g      himcaidlbl

                                              Covered by Medicaid last year

ftotval         double  %10.0g     ftotvallbl

                                              Total family income

perwt_rounded   float   %9.0g                 integer perwt, negative values recoded to 0

yrsed           float   %9.0g                 based on educrec

------------------------------------------------------------------------------------------

Sorted by: race

 

. tabulate race

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |    113,475       84.87       84.87

                          Black/Negro |     13,626       10.19       95.06

         American Indian/Aleut/Eskimo |      1,894        1.42       96.47

            Asian or Pacific Islander |      4,715        3.53      100.00

--------------------------------------+-----------------------------------

                                Total |    133,710      100.00

 

. tabulate race, miss

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |    113,475       84.87       84.87

                          Black/Negro |     13,626       10.19       95.06

         American Indian/Aleut/Eskimo |      1,894        1.42       96.47

            Asian or Pacific Islander |      4,715        3.53      100.00

--------------------------------------+-----------------------------------

                                Total |    133,710      100.00

 

* There are no missing values for the race variable because missing values are imputed in the CPS. This is true for lots of other variables like age and sex as well.

 

. tabulate race [fweight=perwt_rounded]

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |224,806,952       82.02       82.02

                          Black/Negro | 35,508,668       12.96       94.98

         American Indian/Aleut/Eskimo |  2,847,473        1.04       96.01

            Asian or Pacific Islander | 10,924,728        3.99      100.00

--------------------------------------+-----------------------------------

                                Total |274,087,821      100.00

 

* In the above command we use the variable perwt_rounded as a frequency weight, or in Stata language an fweight. Note the square brackets around the weight specification.
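
* A sketch of the weight syntax. Frequency weights must be whole numbers, which is why an integer version of perwt is used. The first two commands are only a guess at how such a variable could be built, based on its label ("integer perwt, negative values recoded to 0"); the name perwt_int is hypothetical, and the dataset already contains perwt_rounded.

```stata
* hypothetical construction of an integer weight from perwt
generate long perwt_int = round(perwt)
replace perwt_int = 0 if perwt_int < 0

* weights go in square brackets after the variable list
tabulate race [fweight=perwt_rounded]

* fw is an accepted abbreviation of fweight
tabulate race [fw=perwt_rounded]
```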

 

 

. summarize perwt_rounded

 

    Variable |        Obs        Mean    Std. Dev.       Min        Max

-------------+---------------------------------------------------------

perwt_roun~d |    133,710    2049.868    1083.244         93      14281

 

* Weights average about 2000. The CPS is a 1-in-2000 survey of the US non-institutional population.
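
* A quick check of the arithmetic behind that statement: the number of observations times the mean weight should land close to the weighted total of 274,087,821 reported in the weighted race table. The figures in the first command are copied from the output above.

```stata
* sample size times the mean weight approximates the weighted population total
display %15.0fc 133710 * 2049.868

* the same total taken from the data: summarize leaves the sum in r(sum)
quietly summarize perwt_rounded
display %15.0fc r(sum)
```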

 

. tabulate race, nolab

 

       Race |      Freq.     Percent        Cum.

------------+-----------------------------------

        100 |    113,475       84.87       84.87

        200 |     13,626       10.19       95.06

        300 |      1,894        1.42       96.47

        650 |      4,715        3.53      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. tabulate gender if race==100

variable gender not found

r(111);

 

* I make my share of syntax errors also! In this case I asked to tabulate the variable gender but there is no such variable.
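
* When you are not sure what a variable is called, a sketch like the following can help. It searches the variable names and labels for a keyword, and shows one way to check for a variable without stopping on an error.

```stata
* search variable names and variable labels for a keyword
lookfor sex

* capture suppresses the error message; _rc then holds the return code
* (111 is the "not found" code seen in the error above)
capture confirm variable gender
display _rc
```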

 

. tabulate sex if race==100

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     55,457       48.87       48.87

     Female |     58,018       51.13      100.00

------------+-----------------------------------

      Total |    113,475      100.00

 

* Note that even though race is a categorical, non-numeric concept, Stata stores it as a number, where White=100. If you want to refer to the White respondents, the syntax is "if race==100". Note the double equal sign after the "if": a double equal sign tests for equality, while a single equal sign assigns a value.
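
* A short sketch of how the stored numbers and their labels relate, using the label name racelbl from the describe output. The variable name white is hypothetical and is there only to show a single equal sign used for assignment.

```stata
* show the number behind each label
label list racelbl

* == tests equality inside an if qualifier
tabulate sex if race == 100

* a single = assigns a value; the expression in parentheses is 1 if true, 0 if false
generate byte white = (race == 100)

* the label text can stand in for the number
tabulate sex if race == "White":racelbl
```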

 


 

. codebook race

 

------------------------------------------------------------------------------------------

race                                                                                  Race

------------------------------------------------------------------------------------------

 

                  type:  numeric (int)

                 label:  racelbl

 

                 range:  [100,650]                    units:  10

         unique values:  4                        missing .:  0/133,710

 

            tabulation:  Freq.   Numeric  Label

                       113,475       100  White

                        13,626       200  Black/Negro

                         1,894       300  American Indian/Aleut/Eskimo

                         4,715       650  Asian or Pacific Islander

 

. summarize age

 

    Variable |        Obs        Mean    Std. Dev.       Min        Max

-------------+---------------------------------------------------------

         age |    133,710    35.17964    22.21722          0         90

 

* Age is topcoded at 90, which you can also see in the IPUMS documentation at https://cps.ipums.org/cps-action/variables/AGE#codes_section

 

 

. summarize incwage

 

    Variable |        Obs        Mean    Std. Dev.       Min        Max

-------------+---------------------------------------------------------

     incwage |    103,226    19462.59    28843.38          0     364302

 

* Wage income is topcoded, and not all persons are in the universe to be asked about income: children are excluded. Note that the number of observations, 103,226, is less than the full 133,710.

 

. summarize yrsed

 

    Variable |        Obs        Mean    Std. Dev.       Min        Max

-------------+---------------------------------------------------------

       yrsed |    103,226    12.77328    3.156011          0         17

 

. tabulate age if yrsed==.

 

                Age |      Freq.     Percent        Cum.

--------------------+-----------------------------------

       Under 1 year |      1,713        5.62        5.62

                  1 |      1,932        6.34       11.96

                  2 |      1,950        6.40       18.35

                  3 |      1,939        6.36       24.71

                  4 |      1,965        6.45       31.16

                  5 |      1,998        6.55       37.71

                  6 |      2,059        6.75       44.47

                  7 |      2,176        7.14       51.61

                  8 |      2,163        7.10       58.70

                  9 |      2,243        7.36       66.06

                 10 |      2,202        7.22       73.28

                 11 |      2,083        6.83       80.12

                 12 |      2,035        6.68       86.79

                 13 |      2,047        6.71       93.51

                 14 |      1,979        6.49      100.00

--------------------+-----------------------------------

              Total |     30,484      100.00

 

* The period, or ".", is Stata's code for a missing value in a numeric variable. The table above shows that everyone with a missing value of yrsed is under age 15.
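
* A sketch of how missing values can be counted and guarded against. If yrsed uses only the plain "." code, the first two counts should both come to 30,484, the total in the table above (133,710 - 103,226). The cutoff of 16 in the last command is arbitrary and only for illustration.

```stata
* count observations with a missing value of yrsed, two equivalent ways
count if yrsed == .
count if missing(yrsed)

* numeric missing is treated as larger than every real number,
* so exclude it explicitly when using > or >=
count if yrsed > 16 & !missing(yrsed)
```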

 

. sort sex

 

. by sex: summarize yrsed if age>=25 & age<=34

 

------------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |        Obs        Mean    Std. Dev.       Min        Max

-------------+---------------------------------------------------------

       yrsed |      9,027    13.31212    2.967666          0         17

 

------------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |        Obs        Mean    Std. Dev.       Min        Max

-------------+---------------------------------------------------------

       yrsed |      9,511    13.55657    2.854472          0         17
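
* The sort and by steps above can be combined. A sketch of one equivalent form, which assumes age has no missing values (true here, since age is imputed in the CPS):

```stata
* bysort sorts on sex and runs the command within each group in one step;
* inrange(age, 25, 34) is the same as age>=25 & age<=34
bysort sex: summarize yrsed if inrange(age, 25, 34)
```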

 

* In the CPS, women ages 25 to 34 have slightly more educational attainment than men of the same ages. Is this difference significant, or could it be due to chance? In other words, if we went back in time (an expensive proposition, admittedly) and re-ran the CPS a hundred different times with different samples of 133K people, would we get a male advantage just as often? Could the difference we see here be due to chance?

 

 

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

 

* The t-test answers this question very strongly in the negative. The middle of the three probabilities, 0.0000, represents the chance that a difference as large as -.244 (in either direction) would be observed in samples this size if the actual average educational attainment of men and women were equal. It turns out the sample size of the CPS is large enough to allow for strong conclusions about even small differences. Sample size is power! I will explain why. Also note: the t-distribution probability associated with a statistic of -5.72 and 18K degrees of freedom is a tiny number, but it is not zero; Stata only displays four decimal places. We will put an exact value on this soon.
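
* A sketch of how the t statistic and its probability can be rebuilt from the group sizes, means, and standard deviations printed in the table. The scalar names are hypothetical. This is the equal-variance (pooled) formula, which is what the output header says the command used.

```stata
* group sizes, means, and standard deviations copied from the ttest output
scalar n1 = 9027
scalar m1 = 13.31212
scalar s1 = 2.967666
scalar n2 = 9511
scalar m2 = 13.55657
scalar s2 = 2.854472

* pooled standard deviation and the standard error of the difference
scalar sp = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2))
scalar se = sp * sqrt(1/n1 + 1/n2)

* t statistic and the two-sided probability, shown without rounding to 4 places
scalar t = (m1 - m2) / se
display "t = " t
display "two-sided p = " 2*ttail(n1 + n2 - 2, abs(t))

* the same test run from summary statistics alone
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
```

* ttail(df, t) returns the upper-tail probability of the t distribution, so doubling it at |t| gives the two-sided value that the table rounds to 0.0000.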

 

 

. log close

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class1

> .log

  log type:  text

 closed on:  20 Sep 2021, 13:14:38

------------------------------------------------------------------------------------------