* The first thing you need to do, always always always, is start a log at the beginning of every Stata session. I always save the log as a .log file rather than a .smcl file, because a .log file is plain text and can be read by any other program (in this case MS Word). Use the menu commands under File to start a log and to open the dataset.
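* For reference, here is a sketch of the same steps typed as commands in a do-file instead of chosen from the menus. The file names below are placeholders (assumptions, not the paths used in class), so point them at your own folders.

```stata
* Close any log that is still open; capture suppresses the error if none is
capture log close
* "text" writes a plain-text .log file instead of a .smcl file
log using "class1.log", text replace
clear all
* Assumes the .dta file is in the current working directory
use "cps_mar_2000_new.dta"
* Quick check of the number of observations and variables
describe, short
```

* Starting the log before anything else means every later command and its output ends up in the file.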
. log close
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class1.log
log type: text
closed on: 20 Sep 2021, 11:36:14
------------------------------------------------------------------------------
------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class1.log
log type: text
opened on: 20 Sep 2021, 11:38:07
. *class actually starts here.
. clear all
. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta"
. describe
Contains data from C:\Users\mexmi\Desktop\cps_mar_2000_new.dta
obs: 133,710
vars: 55 1 Feb 2009 13:36
------------------------------------------------------------------------------------------
                storage display    value
variable name   type    format     label        variable label
------------------------------------------------------------------------------------------
year            int     %8.0g      yearlbl      Survey year
serial          long    %12.0g     seriallbl    Household serial number
hhwt            float   %9.0g      hhwtlbl      Household weight
region          byte    %27.0g     regionlbl    Region and division
statefip        byte    %57.0g     statefiplbl  State (FIPS code)
metro           byte    %27.0g     metrolbl     Metropolitan central city status
metarea         int     %50.0g     metarealbl   Metropolitan area
ownershp        byte    %21.0g     ownershplbl  Ownership of dwelling
hhincome        long    %12.0g     hhincomelbl  Total household income
pubhous         byte    %8.0g      pubhouslbl   Living in public housing
foodstmp        byte    %8.0g      foodstmplbl  Food stamp recipiency
pernum          byte    %8.0g      pernumlbl    Person number in sample unit
perwt           float   %9.0g      perwtlbl     Person weight
momloc          byte    %8.0g      momloclbl    Mother's location in the household
poploc          byte    %8.0g      poploclbl    Father's location in the household
sploc           byte    %8.0g      sploclbl     Spouse's location in household
famsize         byte    %25.0g     famsizelbl   Number of own family members in hh
nchild          byte    %18.0g     nchildlbl    Number of own children in household
nchlt5          byte    %23.0g     nchlt5lbl    Number of own children under age 5 in hh
nsibs           byte    %18.0g     nsibslbl     Number of own siblings in household
relate          int     %34.0g     relatelbl    Relationship to household head
age             byte    %19.0g     agelbl       Age
sex             byte    %8.0g      sexlbl       Sex
race            int     %37.0g     racelbl      Race
marst           byte    %23.0g     marstlbl     Marital status
popstat         byte    %14.0g     popstatlbl   Adult civilian, armed forces, or child
bpl             long    %27.0g     bpllbl       Birthplace
yrimmig         int     %11.0g     yrimmiglbl   Year of immigration
citizen         byte    %31.0g     citizenlbl   Citizenship status
mbpl            long    %27.0g     mbpllbl      Mother's birthplace
fbpl            long    %27.0g     fbpllbl      Father's birthplace
hispan          int     %29.0g     hispanlbl    Hispanic origin
educ99          byte    %38.0g     educ99lbl    Educational attainment, 1990
educrec         byte    %23.0g     educreclbl   Educational attainment recode
schlcoll        byte    %45.0g     schlcolllbl  School or college attendance
empstat         byte    %30.0g     empstatlbl   Employment status
occ1990         int     %78.0g     occ1990lbl   Occupation, 1990 basis
wkswork1        byte    %8.0g      wkswork1lbl  Weeks worked last year
hrswork         byte    %8.0g      hrsworklbl   Hours worked last week
uhrswork        byte    %13.0g     uhrsworklbl  Usual hours worked per week (last yr)
hourwage        int     %8.0g      hourwagelbl  Hourly wage
union           byte    %33.0g     unionlbl     Union membership
inctot          long    %12.0g                  Total personal income
incwage         long    %12.0g                  Wage and salary income
incss           long    %12.0g                  Social Security income
incwelfr        long    %12.0g                  Welfare (public assistance) income
vetstat         byte    %10.0g     vetstatlbl   Veteran status
vetlast         byte    %26.0g     vetlastlbl   Veteran's most recent period of service
disabwrk        byte    %34.0g     disabwrklbl  Work disability
health          byte    %9.0g      healthlbl    Health status
inclugh         byte    %8.0g      inclughlbl   Included in employer group health plan last year
himcaid         byte    %8.0g      himcaidlbl   Covered by Medicaid last year
ftotval         double  %10.0g     ftotvallbl   Total family income
perwt_rounded   float   %9.0g                   integer perwt, negative values recoded to 0
yrsed           float   %9.0g                   based on educrec
------------------------------------------------------------------------------------------
Sorted by: race
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate race, miss
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
* There are no missing values for the race variable because missing values are imputed in the CPS. This is true for lots of other variables like age and sex as well.
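* A quick way to confirm this yourself, assuming the CPS extract is already in memory: count the missing values directly rather than reading them off a table.

```stata
* Number of observations with a missing race code
count if missing(race)
* missing() with several arguments is true if any one of them is missing
count if missing(race, age, sex)
* misstable summarize lists only the variables that do have missing values
misstable summarize
```

* Variables such as incwage and yrsed should show up in the misstable listing, because they are not asked of everyone.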
. tabulate race [fweight=perwt_rounded]
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White |224,806,952 82.02 82.02
Black/Negro | 35,508,668 12.96 94.98
American Indian/Aleut/Eskimo | 2,847,473 1.04 96.01
Asian or Pacific Islander | 10,924,728 3.99 100.00
--------------------------------------+-----------------------------------
Total |274,087,821 100.00
* In the command above we use the variable perwt_rounded as a frequency weight, or in Stata's terms an fweight. Note the square brackets around the weight specification.
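* Frequency weights must be whole numbers, which is presumably why the dataset carries a rounded copy of perwt. The sketch below is a hypothetical reconstruction based only on the variable label "integer perwt, negative values recoded to 0"; the actual commands used to build perwt_rounded are not shown, and perwt_int is an illustrative name.

```stata
* Hypothetical: round the person weight to an integer
generate perwt_int = round(perwt)
* Recode any negative weights to 0, as the perwt_rounded label describes
replace perwt_int = 0 if perwt_int < 0
* The weighted table now counts people in the population, not respondents
tabulate race [fweight=perwt_int]
```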
. summarize perwt_rounded
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
perwt_roun~d | 133,710 2049.868 1083.244 93 14281
* The weights average about 2,000 because the CPS is roughly a 1-in-2,000 sample of the US non-institutional population: each respondent stands in for about 2,000 people.
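* The arithmetic behind this, as a small sketch: 133,710 respondents times an average weight of about 2,050 should give the same total as the weighted tabulation of race (274,087,821). summarize stores the number of observations, the mean, and the sum in r().

```stata
quietly summarize perwt_rounded
* Observations times mean weight
display %15.0fc r(N)*r(mean)
* The sum of the weights, stored directly by summarize
display %15.0fc r(sum)
```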
. tabulate race, nolab
Race | Freq. Percent Cum.
------------+-----------------------------------
100 | 113,475 84.87 84.87
200 | 13,626 10.19 95.06
300 | 1,894 1.42 96.47
650 | 4,715 3.53 100.00
------------+-----------------------------------
Total | 133,710 100.00
. tabulate gender if race==100
variable gender not found
r(111);
* I make my share of syntax errors also! In this case I asked to tabulate the variable gender but there is no such variable.
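* One way to avoid this kind of error is to search for the variable first. lookfor searches variable names and variable labels for a word; confirm variable stops with an error if a name does not exist, and capture lets you inspect that error code instead of halting.

```stata
* Search names and labels for a keyword
lookfor gender
lookfor sex
* Check whether a variable exists; _rc holds the return code
capture confirm variable gender
display _rc
```

* A nonzero _rc means the variable is not in the dataset, the same situation as the r(111) error above.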
. tabulate sex if race==100
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 55,457 48.87 48.87
Female | 58,018 51.13 100.00
------------+-----------------------------------
Total | 113,475 100.00
* Note that even though race is a categorical, non-numeric concept, Stata stores it as a number, with White=100. So to refer to White respondents, the syntax is “if race==100”; note the double equals sign after the ‘if’: == tests whether two things are equal, while a single = assigns a value.
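* If you do not remember the codes, Stata can show them, and it can also translate a label back into its code. This sketch assumes the value label is named racelbl, as describe and codebook report.

```stata
* Show which number goes with which label
label list racelbl
* "White":racelbl evaluates to the code behind the label White (100)
tabulate sex if race == "White":racelbl
* Writing if race=100 with a single = is rejected with an error
```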
*
. codebook race
------------------------------------------------------------------------------------------
race Race
------------------------------------------------------------------------------------------
type: numeric (int)
label: racelbl
range: [100,650] units: 10
unique values: 4 missing .: 0/133,710
tabulation:   Freq.   Numeric  Label
            113,475       100  White
             13,626       200  Black/Negro
              1,894       300  American Indian/Aleut/Eskimo
              4,715       650  Asian or Pacific Islander
. summarize age
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
age | 133,710 35.17964 22.21722 0 90
* Age is topcoded at 90, which you can also see in the IPUMS documentation at https://cps.ipums.org/cps-action/variables/AGE#codes_section
. summarize incwage
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
incwage | 103,226 19462.59 28843.38 0 364302
* Wage income is topcoded, and not everyone is in the universe for the income questions: children are excluded, which is why there are only 103,226 observations rather than 133,710.
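* A sketch of checking who falls outside the income universe: the people with missing incwage should number 133,710 - 103,226 = 30,484, and their ages show who was excluded.

```stata
* Respondents not asked about wage income
count if missing(incwage)
* Their age range
summarize age if missing(incwage)
```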
. summarize yrsed
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
yrsed | 103,226 12.77328 3.156011 0 17
. tabulate age if yrsed==.
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 5.62 5.62
1 | 1,932 6.34 11.96
2 | 1,950 6.40 18.35
3 | 1,939 6.36 24.71
4 | 1,965 6.45 31.16
5 | 1,998 6.55 37.71
6 | 2,059 6.75 44.47
7 | 2,176 7.14 51.61
8 | 2,163 7.10 58.70
9 | 2,243 7.36 66.06
10 | 2,202 7.22 73.28
11 | 2,083 6.83 80.12
12 | 2,035 6.68 86.79
13 | 2,047 6.71 93.51
14 | 1,979 6.49 100.00
--------------------+-----------------------------------
Total | 30,484 100.00
* The period, “.”, is Stata's code for a missing value in a numeric variable. As the table shows, yrsed is missing only for children under 15.
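* One consequence worth knowing: Stata treats "." as larger than any number, so a "greater than" condition quietly includes the missing values unless you exclude them. A short sketch using yrsed:

```stata
* This count includes every missing yrsed, because . > 16 is true
count if yrsed > 16
* Excluding missing values gives the count of real values above 16
count if yrsed > 16 & !missing(yrsed)
* missing() catches . and the extended codes .a through .z;
* yrsed == . catches only the plain period
count if missing(yrsed)
```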
. sort sex
. by sex: summarize yrsed if age>=25 & age<=34
------------------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
yrsed | 9,027 13.31212 2.967666 0 17
------------------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
yrsed | 9,511 13.55657 2.854472 0 17
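* The same comparison can be written more compactly. bysort does the sort and the by in one step, inrange(age, 25, 34) means 25 <= age <= 34, and tabstat puts both groups in a single table. These are alternatives, not the commands used in class.

```stata
bysort sex: summarize yrsed if inrange(age, 25, 34)
tabstat yrsed if inrange(age, 25, 34), by(sex) statistics(n mean sd)
```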
* Among 25-to-34-year-olds in the CPS, women have slightly more educational attainment than men (13.56 vs. 13.31 years). Is this difference significant, or could it be due to chance? In other words, if we went back in time (an expensive proposition, admittedly) and re-ran the CPS a hundred different times with different samples of 133K people, would women come out ahead nearly every time, or would men come out ahead just as often?
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9,027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9,511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18,538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* The t-test answers this question firmly in the negative. The middle of the three probabilities, 0.0000, is the two-sided p-value: the chance of observing a difference at least as large as -.244 (in either direction) in a sample this size if the true average educational attainment of men and women were equal. The CPS sample is large enough to allow strong conclusions about even small differences. Sample size is power! I will explain why… Also note: the t-distribution probability associated with a statistic of -5.72 and 18,536 degrees of freedom is a tiny number, but it is not zero; 0.0000 only means it rounds to zero at four decimal places. We will put an exact value on this soon.
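* A sketch for checking this t-test by hand. ttesti is Stata's immediate form of ttest: it takes n, mean, and standard deviation for each group, copied here from the output above, so it needs no data. The scalar names t_full and p_exact are illustrative. The last line reruns the test with only 100 people per group to show how much the sample size matters.

```stata
* Same test from summary statistics: n1 mean1 sd1 n2 mean2 sd2
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
scalar t_full = r(t)
* Exact two-sided p-value: twice the upper-tail area beyond |t|
scalar p_exact = 2*ttail(r(df_t), abs(r(t)))
display scalar(p_exact)
* Pooled standard error by hand, matching the diff row (about .04276):
* sp^2 = [(n1-1)s1^2 + (n2-1)s2^2]/(n1+n2-2), se = sp*sqrt(1/n1 + 1/n2)
display sqrt((9026*2.967666^2 + 9510*2.854472^2)/18536) * sqrt(1/9027 + 1/9511)
* Same means and SDs, but only 100 people per group
ttesti 100 13.31212 2.967666 100 13.55657 2.854472
```

* With 100 per group the standard error is about ten times larger, so t shrinks to roughly -0.6 and the difference is no longer distinguishable from chance.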
. log close
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class1.log
log type: text
closed on: 20 Sep 2021, 13:14:38
------------------------------------------------------------------------------------------