* The first thing to do, always, is start a log at the beginning of every Stata session. I save the log as a .log file rather than a .smcl file, because a .log file is plain text and can be read by any other program (in this case MS Word). Use the menu commands under File to start a log and to open the dataset.
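
* For anyone who prefers typed commands to the menus, here is a minimal sketch of the same step. The file name class1.log is only a placeholder (Stata writes it to the current working directory). The text option asks for plain-text format, and replace overwrites an existing log of the same name.

```stata
* close any log left open earlier; capture suppresses the error if none is open
capture log close
* start a plain-text log (placeholder file name, written to the working directory)
log using "class1.log", text replace
* ... the session's commands go here ...
log close
```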

. log close

log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class1.log

log type: text

closed on: 20 Sep 2021, 11:36:14

------------------------------------------------------------------------------

log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021

> _logs\class1.log

log type: text

opened on: 20 Sep 2021, 11:38:07

. *class actually starts here.

. clear all

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta"

. describe

Contains data from C:\Users\mexmi\Desktop\cps_mar_2000_new.dta
  obs:       133,710
 vars:            55                          1 Feb 2009 13:36
------------------------------------------------------------------------------------------
                storage display  value
variable name   type    format   label         variable label
------------------------------------------------------------------------------------------
year            int     %8.0g    yearlbl       Survey year
serial          long    %12.0g   seriallbl     Household serial number
hhwt            float   %9.0g    hhwtlbl       Household weight
region          byte    %27.0g   regionlbl     Region and division
statefip        byte    %57.0g   statefiplbl   State (FIPS code)
metro           byte    %27.0g   metrolbl      Metropolitan central city status
metarea         int     %50.0g   metarealbl    Metropolitan area
ownershp        byte    %21.0g   ownershplbl   Ownership of dwelling
hhincome        long    %12.0g   hhincomelbl   Total household income
pubhous         byte    %8.0g    pubhouslbl    Living in public housing
foodstmp        byte    %8.0g    foodstmplbl   Food stamp recipiency
pernum          byte    %8.0g    pernumlbl     Person number in sample unit
perwt           float   %9.0g    perwtlbl      Person weight
momloc          byte    %8.0g    momloclbl     Mother's location in the household
poploc          byte    %8.0g    poploclbl     Father's location in the household
sploc           byte    %8.0g    sploclbl      Spouse's location in household
famsize         byte    %25.0g   famsizelbl    Number of own family members in hh
nchild          byte    %18.0g   nchildlbl     Number of own children in household
nchlt5          byte    %23.0g   nchlt5lbl     Number of own children under age 5 in hh
nsibs           byte    %18.0g   nsibslbl      Number of own siblings in household
relate          int     %34.0g   relatelbl     Relationship to household head
age             byte    %19.0g   agelbl        Age
sex             byte    %8.0g    sexlbl        Sex
race            int     %37.0g   racelbl       Race
marst           byte    %23.0g   marstlbl      Marital status
popstat         byte    %14.0g   popstatlbl    Adult civilian, armed forces, or child
bpl             long    %27.0g   bpllbl        Birthplace
yrimmig         int     %11.0g   yrimmiglbl    Year of immigration
citizen         byte    %31.0g   citizenlbl    Citizenship status
mbpl            long    %27.0g   mbpllbl       Mother's birthplace
fbpl            long    %27.0g   fbpllbl       Father's birthplace
hispan          int     %29.0g   hispanlbl     Hispanic origin
educ99          byte    %38.0g   educ99lbl     Educational attainment, 1990
educrec         byte    %23.0g   educreclbl    Educational attainment recode
schlcoll        byte    %45.0g   schlcolllbl   School or college attendance
empstat         byte    %30.0g   empstatlbl    Employment status
occ1990         int     %78.0g   occ1990lbl    Occupation, 1990 basis
wkswork1        byte    %8.0g    wkswork1lbl   Weeks worked last year
hrswork         byte    %8.0g    hrsworklbl    Hours worked last week
uhrswork        byte    %13.0g   uhrsworklbl   Usual hours worked per week (last yr)
hourwage        int     %8.0g    hourwagelbl   Hourly wage
union           byte    %33.0g   unionlbl      Union membership
inctot          long    %12.0g                 Total personal income
incwage         long    %12.0g                 Wage and salary income
incss           long    %12.0g                 Social Security income
incwelfr        long    %12.0g                 Welfare (public assistance) income
vetstat         byte    %10.0g   vetstatlbl    Veteran status
vetlast         byte    %26.0g   vetlastlbl    Veteran's most recent period of service
disabwrk        byte    %34.0g   disabwrklbl   Work disability
health          byte    %9.0g    healthlbl     Health status
inclugh         byte    %8.0g    inclughlbl    Included in employer group health plan last year
himcaid         byte    %8.0g    himcaidlbl    Covered by Medicaid last year
ftotval         double  %10.0g   ftotvallbl    Total family income
perwt_rounded   float   %9.0g                  integer perwt, negative values recoded to 0
yrsed           float   %9.0g                  based on educrec
------------------------------------------------------------------------------------------

Sorted by: race

. tabulate race

Race | Freq. Percent Cum.

--------------------------------------+-----------------------------------

White | 113,475 84.87 84.87

Black/Negro | 13,626 10.19 95.06

American Indian/Aleut/Eskimo | 1,894 1.42 96.47

Asian or Pacific Islander | 4,715 3.53 100.00

--------------------------------------+-----------------------------------

Total | 133,710 100.00

. tabulate race, miss

Race | Freq. Percent Cum.

--------------------------------------+-----------------------------------

White | 113,475 84.87 84.87

Black/Negro | 13,626 10.19 95.06

American Indian/Aleut/Eskimo | 1,894 1.42 96.47

Asian or Pacific Islander | 4,715 3.53 100.00

--------------------------------------+-----------------------------------

Total | 133,710 100.00

* There are no missing values for the race variable because missing values are imputed in the CPS. This is true for lots of other variables like age and sex as well.
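
* One way to double-check this for several variables at once, assuming the CPS file is still in memory, is to count the observations with a missing value. Each count should come back as 0 if imputation has filled every gap.

```stata
* assumes cps_mar_2000_new.dta is in memory
count if missing(race)
count if missing(age)
count if missing(sex)
* missing() also accepts several variables and is true if any one of them is missing
count if missing(race, age, sex)
```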

. tabulate race [fweight=perwt_rounded]

Race | Freq. Percent Cum.

--------------------------------------+-----------------------------------

White |224,806,952 82.02 82.02

Black/Negro | 35,508,668 12.96 94.98

American Indian/Aleut/Eskimo | 2,847,473 1.04 96.01

Asian or Pacific Islander | 10,924,728 3.99 100.00

--------------------------------------+-----------------------------------

Total |274,087,821 100.00

* In the command above, the variable perwt_rounded serves as a frequency weight, or in Stata's language an fweight. Note the square brackets around the weight expression.
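
* Frequency weights must be non-negative integers, which is why a rounded copy of perwt is needed. The variable label for perwt_rounded reads "integer perwt, negative values recoded to 0". The sketch below is only a guess at how such a variable could be built, not necessarily how this file was prepared, and perwt_int is a made-up name so that nothing in the dataset is overwritten.

```stata
* assumes cps_mar_2000_new.dta is in memory; perwt_int is a hypothetical name
generate long perwt_int = round(perwt)
* recode any negative weights to 0, following the variable label
replace perwt_int = 0 if perwt_int < 0
* if the guess is right, this matches the weighted table for perwt_rounded
tabulate race [fweight=perwt_int]
```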

. summarize perwt_rounded

Variable | Obs Mean Std. Dev. Min Max

-------------+---------------------------------------------------------

perwt_roun~d | 133,710 2049.868 1083.244 93 14281

* Weights average about 2000. The CPS is a 1-in-2000 survey of the US non-institutional population.
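
* A quick check on this: the number of observations times the average weight should land close to the weighted total in the fweight table, roughly 274 million people. The first line uses numbers copied from the summarize output. The last two lines assume the CPS file is in memory.

```stata
* observations x mean weight, displayed with commas
display %15.0fc 133710 * 2049.868
* the same total taken directly from the data; summarize stores it in r(sum)
quietly summarize perwt_rounded
display %15.0fc r(sum)
```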

. tabulate race, nolab

Race | Freq. Percent Cum.

------------+-----------------------------------

100 | 113,475 84.87 84.87

200 | 13,626 10.19 95.06

300 | 1,894 1.42 96.47

650 | 4,715 3.53 100.00

------------+-----------------------------------

Total | 133,710 100.00

. tabulate gender if race==100

variable gender not found

r(111);

* I make my share of errors too! In this case I asked Stata to tabulate the variable gender, but the dataset has no such variable (it is called sex), so Stata returned error r(111).
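
* When Stata reports that a variable is not found, searching the variable names and labels is a quick way to find the right name. A small sketch, assuming the CPS file is in memory:

```stata
* lookfor searches variable names and variable labels for the keyword
lookfor sex
* describe accepts wildcards: list every variable whose name starts with s
describe s*
* capture traps the error instead of stopping; _rc is nonzero if the variable does not exist
capture confirm variable gender
display _rc
```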

. tabulate sex if race==100

Sex | Freq. Percent Cum.

------------+-----------------------------------

Male | 55,457 48.87 48.87

Female | 58,018 51.13 100.00

------------+-----------------------------------

Total | 113,475 100.00

* Note that even though race is a categorical variable rather than a quantity, Stata stores it as a number with a value label attached; White is stored as 100. To refer to White respondents, the syntax is “if race==100”. Note the double equal sign: == tests for equality, while a single = assigns a value.
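
* The link between the numeric codes and their labels can be inspected directly. The sketch below assumes the CPS file is in memory; white is a made-up variable name used only to show the difference between = and ==.

```stata
* show the mapping from numeric codes to labels for the race variable
label list racelbl
* == tests equality inside an if condition
count if race == 100
* a single = assigns; the expression in parentheses is 1 if true and 0 if false
generate byte white = (race == 100)
tabulate white
```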

. codebook race

------------------------------------------------------------------------------------------

race Race

------------------------------------------------------------------------------------------

type: numeric (int)

label: racelbl

range: [100,650] units: 10

unique values: 4 missing .: 0/133,710

tabulation: Freq. Numeric Label

113,475 100 White

13,626 200 Black/Negro

1,894 300 American Indian/Aleut/Eskimo

4,715 650 Asian or Pacific Islander

. summarize age

Variable | Obs Mean Std. Dev. Min Max

-------------+---------------------------------------------------------

age | 133,710 35.17964 22.21722 0 90

* Age is topcoded at 90, which you can also see in the IPUMS documentation at https://cps.ipums.org/cps-action/variables/AGE#codes_section
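
* To see how many people sit at the topcode, something like the following would work, assuming the CPS file is in memory:

```stata
* number of respondents recorded at the top code
count if age == 90
* nolabel shows the numeric codes instead of the value labels
tabulate age if age >= 85, nolabel
```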

. summarize incwage

Variable | Obs Mean Std. Dev. Min Max

-------------+---------------------------------------------------------

incwage | 103,226 19462.59 28843.38 0 364302

* Wage income is topcoded, and not all persons are in the universe to be asked about income: children are excluded. Note that the number of observations, 103,226, is less than the full 133,710.
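
* The size of the excluded group follows from the two observation counts. The first line is plain arithmetic on numbers from the log. The other lines assume the CPS file is in memory.

```stata
* total observations minus observations with a wage value
display 133710 - 103226
* the same number, counted from the data
count if missing(incwage)
* age range of the people with no wage value
summarize age if missing(incwage)
```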

. summarize yrsed

Variable | Obs Mean Std. Dev. Min Max

-------------+---------------------------------------------------------

yrsed | 103,226 12.77328 3.156011 0 17

. tabulate age if yrsed==.

Age | Freq. Percent Cum.

--------------------+-----------------------------------

Under 1 year | 1,713 5.62 5.62

1 | 1,932 6.34 11.96

2 | 1,950 6.40 18.35

3 | 1,939 6.36 24.71

4 | 1,965 6.45 31.16

5 | 1,998 6.55 37.71

6 | 2,059 6.75 44.47

7 | 2,176 7.14 51.61

8 | 2,163 7.10 58.70

9 | 2,243 7.36 66.06

10 | 2,202 7.22 73.28

11 | 2,083 6.83 80.12

12 | 2,035 6.68 86.79

13 | 2,047 6.71 93.51

14 | 1,979 6.49 100.00

--------------------+-----------------------------------

Total | 30,484 100.00

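* The period, or “.”, is Stata's code for a missing value in a numeric variable. The table shows that everyone with a missing yrsed is under age 15, and the total of 30,484 matches the number of observations missing from incwage (133,710 - 103,226).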
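
* One caution about missing values: Stata treats “.” as larger than any number, so a condition like yrsed > 12 is true for missing values too. The toy dataset below is made up purely for illustration and is not CPS data.

```stata
clear
* three made-up observations; the first has a missing yrsed
input byte age float yrsed
 8  .
30 12
40 16
end
* two equivalent ways to count missing values: each returns 1
count if yrsed == .
count if missing(yrsed)
* returns 2, because the missing value counts as greater than 12
count if yrsed > 12
* returns 1 once missing values are excluded explicitly
count if yrsed > 12 & !missing(yrsed)
```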

. sort sex

. by sex: summarize yrsed if age>=25 & age<=34

------------------------------------------------------------------------------------------

-> sex = Male

Variable | Obs Mean Std. Dev. Min Max

-------------+---------------------------------------------------------

yrsed | 9,027 13.31212 2.967666 0 17

------------------------------------------------------------------------------------------

-> sex = Female

Variable | Obs Mean Std. Dev. Min Max

-------------+---------------------------------------------------------

yrsed | 9,511 13.55657 2.854472 0 17
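
* The separate sort step and the by prefix used above can be combined with bysort. A sketch, assuming the CPS file is in memory; inrange(age, 25, 34) is shorthand for age>=25 & age<=34.

```stata
* bysort sorts by sex first, then runs summarize within each group
bysort sex: summarize yrsed if inrange(age, 25, 34)
* a more compact alternative: group means, standard deviations, and counts in one table
tabulate sex if inrange(age, 25, 34), summarize(yrsed)
```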

* In the CPS, women aged 25 to 34 have slightly more educational attainment than men of the same age. Is this difference significant, or could it be due to chance? In other words, if we went back in time (an expensive proposition, admittedly) and re-ran the CPS a hundred different times with different samples of 133K people, would we get a male advantage just as often as a female advantage?

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* The t-test answers this question very strongly in the negative. The middle of the three probabilities, Pr(|T| > |t|) = 0.0000, is the chance that a difference at least as large in absolute value as the one observed, -.244, would turn up in a sample this size if the actual average educational attainment of men and women were equal. The sample size of the CPS is large enough to allow strong conclusions about even small differences. Sample size is power! I will explain why. Also note: the t-distribution probability associated with a statistic of -5.72 and 18,536 degrees of freedom is a tiny number, but it is not zero; Stata shows 0.0000 only because it rounds to four decimal places. We will put an exact value on this soon.
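
* As a preview, here is a sketch of how the pieces of the t-test fit together, using numbers copied from the output. The t statistic is the difference in means divided by its standard error, and the two-sided probability is twice the upper-tail area of the t distribution. The last two lines assume the CPS file is in memory.

```stata
* t statistic = difference in means / standard error of the difference
display -.2444469 / .0427623
* two-sided p-value: ttail(df, t) is the upper-tail probability of Student's t
display 2 * ttail(18536, 5.7164)
* after ttest, Stata keeps the unrounded two-sided p-value in r(p)
quietly ttest yrsed if age>=25 & age<=34, by(sex)
display r(p)
```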

. log close

log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class1

> .log

log type: text

closed on: 20 Sep 2021, 13:14:38

------------------------------------------------------------------------------------------