* My comments appear after asterisks.
* The first thing you always want to do is open a log.
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_logs\class1.log
log type: text
opened on: 22 Sep 2014, 11:25:24
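* A minimal sketch of opening and closing a text log at the start of a session. The file name class1_example.log is a placeholder, not the path used for this class, and the replace option overwrites any existing file of that name.

```stata
* Close any log left open by an earlier session (no error if none is open)
capture log close
* Open a plain-text log; "class1_example.log" is a hypothetical file name
log using "class1_example.log", text replace
* ... analysis commands go here ...
display "logging is on"
* Close the log when the session is done
log close
```

* Everything typed between log using and log close, along with its output, is saved to the file.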
. describe
Contains data from C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta
obs: 133,710
vars: 55                          1 Feb 2009 13:36
size: 14,574,390
---------------------------------------------------------------------------------
               storage display  value
variable name  type    format   label        variable label
---------------------------------------------------------------------------------
year           int     %8.0g    yearlbl      Survey year
serial         long    %12.0g   seriallbl    Household serial number
hhwt           float   %9.0g    hhwtlbl      Household weight
region         byte    %27.0g   regionlbl    Region and division
statefip       byte    %57.0g   statefiplbl  State (FIPS code)
metro          byte    %27.0g   metrolbl     Metropolitan central city status
metarea        int     %50.0g   metarealbl   Metropolitan area
ownershp       byte    %21.0g   ownershplbl  Ownership of dwelling
hhincome       long    %12.0g   hhincomelbl  Total household income
pubhous        byte    %8.0g    pubhouslbl   Living in public housing
foodstmp       byte    %8.0g    foodstmplbl  Food stamp recipiency
pernum         byte    %8.0g    pernumlbl    Person number in sample unit
perwt          float   %9.0g    perwtlbl     Person weight
momloc         byte    %8.0g    momloclbl    Mother's location in the household
poploc         byte    %8.0g    poploclbl    Father's location in the household
sploc          byte    %8.0g    sploclbl     Spouse's location in household
famsize        byte    %25.0g   famsizelbl   Number of own family members in hh
nchild         byte    %18.0g   nchildlbl    Number of own children in household
nchlt5         byte    %23.0g   nchlt5lbl    Number of own children under age 5
                                             in hh
nsibs          byte    %18.0g   nsibslbl     Number of own siblings in household
relate         int     %34.0g   relatelbl    Relationship to household head
age            byte    %19.0g   agelbl       Age
sex            byte    %8.0g    sexlbl       Sex
race           int     %37.0g   racelbl      Race
marst          byte    %23.0g   marstlbl     Marital status
popstat        byte    %14.0g   popstatlbl   Adult civilian, armed forces, or
                                             child
bpl            long    %27.0g   bpllbl       Birthplace
yrimmig        int     %11.0g   yrimmiglbl   Year of immigration
citizen        byte    %31.0g   citizenlbl   Citizenship status
mbpl           long    %27.0g   mbpllbl      Mother's birthplace
fbpl           long    %27.0g   fbpllbl      Father's birthplace
hispan         int     %29.0g   hispanlbl    Hispanic origin
educ99         byte    %38.0g   educ99lbl    Educational attainment, 1990
educrec        byte    %23.0g   educreclbl   Educational attainment recode
schlcoll       byte    %45.0g   schlcolllbl  School or college attendance
empstat        byte    %30.0g   empstatlbl   Employment status
occ1990        int     %78.0g   occ1990lbl   Occupation, 1990 basis
wkswork1       byte    %8.0g    wkswork1lbl  Weeks worked last year
hrswork        byte    %8.0g    hrsworklbl   Hours worked last week
uhrswork       byte    %13.0g   uhrsworklbl  Usual hours worked per week (last
                                             yr)
hourwage       int     %8.0g    hourwagelbl  Hourly wage
union          byte    %33.0g   unionlbl     Union membership
inctot         long    %12.0g                Total personal income
incwage        long    %12.0g                Wage and salary income
incss          long    %12.0g                Social Security income
incwelfr       long    %12.0g                Welfare (public assistance) income
vetstat        byte    %10.0g   vetstatlbl   Veteran status
vetlast        byte    %26.0g   vetlastlbl   Veteran's most recent period of
                                             service
disabwrk       byte    %34.0g   disabwrklbl  Work disability
health         byte    %9.0g    healthlbl    Health status
inclugh        byte    %8.0g    inclughlbl   Included in employer group health
                                             plan last year
himcaid        byte    %8.0g    himcaidlbl   Covered by Medicaid last year
ftotval        double  %10.0g   ftotvallbl   Total family income
perwt_rounded  float   %9.0g                 integer perwt, negative values
                                             recoded to 0
yrsed          float   %9.0g                 based on educrec
---------------------------------------------------------------------------------
Sorted by: sex
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
* We noticed that there appear to be no missing values of race. This is because the missing values have been imputed.
. tabulate race, missing
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate race [fweight= perwt_rounded]
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White |224,806,952 82.02 82.02
Black/Negro | 35,508,668 12.96 94.98
American Indian/Aleut/Eskimo | 2,847,473 1.04 96.01
Asian or Pacific Islander | 10,924,728 3.99 100.00
--------------------------------------+-----------------------------------
Total |274,087,821 100.00
* Note that there are 133K individuals in the CPS sample, but 274 million individuals in the sample universe, which was the non-institutional population of the US in March 2000. It is vital to keep in mind the difference between the sample and the entire US population.
. summarize perwt_rounded
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
perwt_roun~d | 133710 2049.868 1083.244 93 14281
* The mean person weight is about 2,050, so the CPS is roughly a 1-in-2,000 sample of the US.
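* The sampling ratio can be checked with simple arithmetic from the totals printed above: dividing the weighted total by the unweighted count gives the average number of people each respondent represents. The scalar names below are made up for illustration.

```stata
* Totals copied from the tabulate output above
* weighted_total is the sum of perwt_rounded (the fweight total)
scalar weighted_total = 274087821
* sample_n is the unweighted number of CPS respondents
scalar sample_n = 133710
* Average number of people represented by each respondent
scalar people_per_obs = weighted_total / sample_n
display "People represented per respondent: " %9.3f people_per_obs
```

* The result agrees with the mean of perwt_rounded reported by summarize (2049.868), as it must, since the weighted total is just the sum of the weights.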
. tabulate race, nolab
Race | Freq. Percent Cum.
------------+-----------------------------------
100 | 113,475 84.87 84.87
200 | 13,626 10.19 95.06
300 | 1,894 1.42 96.47
650 | 4,715 3.53 100.00
------------+-----------------------------------
Total | 133,710 100.00
* The variable race is actually coded numerically, and the numbers have labels attached to them for convenience and legibility. Keep in mind that if you want to tell Stata to look specifically at, let’s say, the white population, you would ask for this numerically, i.e. if race==100 (notice the double equal signs after the “if”).
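* A small sketch of how value labels and numeric codes interact, using a four-row toy dataset rather than the CPS file. The label name racelbl_demo and the data rows are invented for illustration; only the codes 100, 200, 300 and 650 come from the output above.

```stata
clear
input int race
100
100
200
650
end
* Attach text labels to the numeric codes (racelbl_demo is a hypothetical name)
label define racelbl_demo 100 "White" 200 "Black/Negro" 300 "American Indian/Aleut/Eskimo" 650 "Asian or Pacific Islander"
label values race racelbl_demo
* With labels shown, then with the underlying numbers shown
tabulate race
tabulate race, nolabel
* Conditions must use the numeric code and the double equal sign
count if race == 100
```

* A single equal sign is used for assignment (as in generate), so writing if race=100 produces an error rather than a filter.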
. codebook race
---------------------------------------------------------------------------------
race Race
---------------------------------------------------------------------------------
type: numeric (int)
label: racelbl
range: [100,650] units: 10
unique values: 4 missing .: 0/133710
tabulation: Freq. Numeric Label
1.1e+05 100 White
13626 200 Black/Negro
1894 300 American Indian/Aleut/Eskimo
4715 650 Asian or Pacific Islander
. summarize age
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
age | 133710 35.17964 22.21722 0 90
. tabulate age
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 1.28 1.28
1 | 1,932 1.44 2.73
2 | 1,950 1.46 4.18
3 | 1,939 1.45 5.63
4 | 1,965 1.47 7.10
5 | 1,998 1.49 8.60
6 | 2,059 1.54 10.14
7 | 2,176 1.63 11.77
8 | 2,163 1.62 13.38
9 | 2,243 1.68 15.06
10 | 2,202 1.65 16.71
11 | 2,083 1.56 18.27
12 | 2,035 1.52 19.79
13 | 2,047 1.53 21.32
14 | 1,979 1.48 22.80
15 | 2,046 1.53 24.33
16 | 1,965 1.47 25.80
17 | 1,998 1.49 27.29
18 | 1,847 1.38 28.67
19 | 1,826 1.37 30.04
20 | 1,722 1.29 31.33
21 | 1,687 1.26 32.59
22 | 1,638 1.23 33.81
23 | 1,622 1.21 35.03
24 | 1,662 1.24 36.27
25 | 1,666 1.25 37.52
26 | 1,640 1.23 38.74
27 | 1,726 1.29 40.03
28 | 1,801 1.35 41.38
29 | 1,995 1.49 42.87
30 | 1,907 1.43 44.30
31 | 1,991 1.49 45.79
32 | 1,890 1.41 47.20
33 | 1,898 1.42 48.62
34 | 2,024 1.51 50.13
35 | 2,134 1.60 51.73
36 | 2,123 1.59 53.32
37 | 2,099 1.57 54.89
38 | 2,064 1.54 56.43
39 | 2,228 1.67 58.10
40 | 2,190 1.64 59.74
41 | 2,115 1.58 61.32
42 | 2,137 1.60 62.92
43 | 2,091 1.56 64.48
44 | 2,114 1.58 66.06
45 | 2,118 1.58 67.64
46 | 1,939 1.45 69.10
47 | 1,957 1.46 70.56
48 | 1,827 1.37 71.93
49 | 1,767 1.32 73.25
50 | 1,865 1.39 74.64
51 | 1,802 1.35 75.99
52 | 1,825 1.36 77.35
53 | 1,695 1.27 78.62
54 | 1,301 0.97 79.59
55 | 1,323 0.99 80.58
56 | 1,324 0.99 81.57
57 | 1,304 0.98 82.55
58 | 1,128 0.84 83.39
59 | 1,129 0.84 84.24
60 | 1,154 0.86 85.10
61 | 1,051 0.79 85.89
62 | 1,073 0.80 86.69
63 | 938 0.70 87.39
64 | 952 0.71 88.10
65 | 1,014 0.76 88.86
66 | 869 0.65 89.51
67 | 926 0.69 90.20
68 | 908 0.68 90.88
69 | 904 0.68 91.56
70 | 913 0.68 92.24
71 | 885 0.66 92.90
72 | 770 0.58 93.48
73 | 797 0.60 94.08
74 | 814 0.61 94.68
75 | 796 0.60 95.28
76 | 704 0.53 95.81
77 | 646 0.48 96.29
78 | 687 0.51 96.80
79 | 602 0.45 97.25
80 | 514 0.38 97.64
81 | 476 0.36 97.99
82 | 425 0.32 98.31
83 | 427 0.32 98.63
84 | 325 0.24 98.87
85 | 306 0.23 99.10
86 | 248 0.19 99.29
87 | 209 0.16 99.44
88 | 172 0.13 99.57
89 | 155 0.12 99.69
90 (90+, 1988-2002) | 416 0.31 100.00
--------------------+-----------------------------------
Total | 133,710 100.00
* Age is topcoded to 90. You need to get used to looking up variables in the IPUMS CPS documentation to verify topcodes, sample universes and the like.
. summarize yrsed
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 103226 12.77328 3.156011 0 17
* yrsed is a variable I created from the categorical education variable educrec. Note that yrsed has missing values: only 103,226 of the 133,710 observations are non-missing.
. tabulate age if yrsed==.
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 5.62 5.62
1 | 1,932 6.34 11.96
2 | 1,950 6.40 18.35
3 | 1,939 6.36 24.71
4 | 1,965 6.45 31.16
5 | 1,998 6.55 37.71
6 | 2,059 6.75 44.47
7 | 2,176 7.14 51.61
8 | 2,163 7.10 58.70
9 | 2,243 7.36 66.06
10 | 2,202 7.22 73.28
11 | 2,083 6.83 80.12
12 | 2,035 6.68 86.79
13 | 2,047 6.71 93.51
14 | 1,979 6.49 100.00
--------------------+-----------------------------------
Total | 30,484 100.00
* If you look up the educational variable educrec in the IPUMS documentation, it should say that the sample universe is persons 15 years old and older. Everyone younger than 15 has missing values.
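* A minimal sketch of checking a pattern of missing values, using invented toy data in place of the CPS file. The same count and assert lines could be run on the real data, where they would be expected to pass given the table above.

```stata
clear
* Hypothetical toy data: yrsed is missing (.) for the two children under 15
input byte age float yrsed
10 .
14 .
15 9
30 16
end
* Count observations with missing yrsed
count if missing(yrsed)
* Stop with an error unless every missing yrsed belongs to someone under 15
assert age < 15 if missing(yrsed)
* Caution: Stata treats missing as larger than any number, so a condition
* like yrsed > 12 also selects the missing cases unless they are excluded
count if yrsed > 12 & !missing(yrsed)
```

* The last line matters in practice: without the !missing(yrsed) clause, the children with no recorded education would be counted among the most educated.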
. sort sex
. by sex: summarize yrsed
---------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 49353 12.79632 3.217925 0 17
---------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 53873 12.75218 3.098084 0 17
* If we want to compare educational attainment, we probably ought to narrow the age range down.
. by sex: summarize yrsed if age>=25 & age<=34
---------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9027 13.31212 2.967666 0 17
---------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9511 13.55657 2.854472 0 17
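* A sketch of the same by-group pattern on invented toy data, using bysort (which sorts and groups in one step) and inrange() as a compact alternative to age>=25 & age<=34. The coding 1 = male, 2 = female is assumed here for illustration only; check the codebook for the real coding of sex.

```stata
clear
* Hypothetical toy data; sex coded 1 = male, 2 = female for illustration
input byte sex byte age float yrsed
1 24 12
1 25 14
1 34 16
2 30 12
2 34 16
2 35 17
end
* bysort sorts by sex and then runs summarize within each group
bysort sex: summarize yrsed if inrange(age, 25, 34)
* The same restriction for a single group: toy females aged 25 to 34
summarize yrsed if sex == 2 & inrange(age, 25, 34)
```

* inrange(age, 25, 34) includes both endpoints, so it selects exactly the same observations as age>=25 & age<=34.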
* So, are young women in the CPS more educated than young men? Within the sample, yes: the mean is 13.56 years for women and 13.31 years for men. But this is not the kind of statistical question we usually want to answer, because the educational attainments of all subjects in the CPS are known (leaving aside measurement error and imputation). The real question is whether the difference observed in the CPS is consistent with equal educational attainment between young men and young women in the whole US. The answer to that question is *No*, for reasons we will explore as the class goes forward.
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
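* The test above can be reproduced from the summary statistics alone with ttesti, the immediate form of ttest, which needs no dataset in memory. The numbers are copied from the output above; the scalar name t_equal is made up for illustration.

```stata
* ttesti takes n, mean, and sd for group 1 (men), then for group 2 (women)
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
* Keep the equal-variance t statistic before the next command overwrites r()
scalar t_equal = r(t)
* Same comparison without assuming equal variances in the two groups
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
display "Equal-variance t statistic: " %8.4f t_equal
```

* The equal-variance statistic matches the t of about -5.72 shown above, up to rounding of the inputs. Note that neither version uses the person weights, so both treat the CPS as a simple random sample.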
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_
> logs\class1.log
log type: text
closed on: 22 Sep 2014, 12:42:45
---------------------------------------------------------------------------------