-------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class1.log
log type: text
opened on: 10 Jan 2013, 14:20:48
* The first thing you should always do when starting a Stata session is open a log in .log (simple text) format. The log continually records your typed commands and Stata's output, so once you have opened the log you don’t need to worry about saving it. The dataset is a separate file, which you need to save yourself if you make changes to it, such as adding variables.
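* A minimal sketch of the logging workflow described above. The file names class1.log and my_cps_copy.dta are placeholders (assumptions), not the paths used in this session, and the save step assumes a dataset is already loaded.

```stata
* open a plain-text log; replace overwrites an existing log of the same name
log using "class1.log", text replace

* ... the commands for your session go here ...

* the dataset is saved separately from the log, and only if you changed it
* (my_cps_copy.dta is a hypothetical name, chosen so the original file is not overwritten)
save "my_cps_copy.dta", replace

* close the log at the end of the session
log close
```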
* describe tells you what is in your dataset.
. describe
Contains data from C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta
obs: 133,710
vars: 55 1 Feb 2009 13:36
size: 15,109,230 (71.2% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
year int %8.0g yearlbl Survey year
serial long %12.0g seriallbl
Household serial number
hhwt float %9.0g hhwtlbl Household weight
region byte %27.0g regionlbl
Region and division
statefip byte %57.0g statefiplbl
State (FIPS code)
metro byte %27.0g metrolbl Metropolitan central city status
metarea int %50.0g metarealbl
Metropolitan area
ownershp byte %21.0g ownershplbl
Ownership of dwelling
hhincome long %12.0g hhincomelbl
Total household income
pubhous byte %8.0g pubhouslbl
Living in public housing
foodstmp byte %8.0g foodstmplbl
Food stamp recipiency
pernum byte %8.0g pernumlbl
Person number in sample unit
perwt float %9.0g perwtlbl Person weight
momloc byte %8.0g momloclbl
Mother's location in the
household
poploc byte %8.0g poploclbl
Father's location in the
household
sploc byte %8.0g sploclbl Spouse's location in household
famsize byte %25.0g famsizelbl
Number of own family members in
hh
nchild byte %18.0g nchildlbl
Number of own children in
household
nchlt5 byte %23.0g nchlt5lbl
Number of own children under age
5 in hh
nsibs byte %18.0g nsibslbl Number of own siblings in
household
relate int %34.0g relatelbl
Relationship to household head
age byte %19.0g agelbl Age
sex byte %8.0g sexlbl Sex
race int %37.0g racelbl Race
marst byte %23.0g marstlbl Marital status
popstat byte %14.0g popstatlbl
Adult civilian, armed forces, or
child
bpl long %27.0g bpllbl Birthplace
yrimmig int %11.0g yrimmiglbl
Year of immigration
citizen byte %31.0g citizenlbl
Citizenship status
mbpl long %27.0g mbpllbl Mother's birthplace
fbpl long %27.0g fbpllbl Father's birthplace
hispan int %29.0g hispanlbl
Hispanic origin
educ99 byte %38.0g educ99lbl
Educational attainment, 1990
educrec byte %23.0g educreclbl
Educational attainment recode
schlcoll byte %45.0g schlcolllbl
School or college attendance
empstat byte %30.0g empstatlbl
Employment status
occ1990 int %78.0g occ1990lbl
Occupation, 1990 basis
wkswork1 byte %8.0g wkswork1lbl
Weeks worked last year
hrswork byte %8.0g hrsworklbl
Hours worked last week
uhrswork byte %13.0g uhrsworklbl
Usual hours worked per week (last
yr)
hourwage int %8.0g hourwagelbl
Hourly wage
union byte %33.0g unionlbl Union membership
inctot long %12.0g Total personal income
incwage long %12.0g Wage and salary income
incss long %12.0g Social Security income
incwelfr long %12.0g Welfare (public assistance)
income
vetstat byte %10.0g vetstatlbl
Veteran status
vetlast byte %26.0g vetlastlbl
Veteran's most recent period of
service
disabwrk byte %34.0g disabwrklbl
Work disability
health byte %9.0g healthlbl
Health status
inclugh byte %8.0g inclughlbl
Included in employer group health
plan last year
himcaid byte %8.0g himcaidlbl
Covered by Medicaid last year
ftotval double %10.0g ftotvallbl
Total family income
perwt_rounded float %9.0g integer perwt, negative values
recoded to 0
yrsed float %9.0g based on educrec
-------------------------------------------------------------------------------
Sorted by: sex
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 64,791 48.46 48.46
Female | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
. tabulate sex [fweight= perwt_rounded]
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male |133,932,994 48.86 48.86
Female |140,154,827 51.14 100.00
------------+-----------------------------------
Total |274,087,821 100.00
* One key thing to keep in mind is that there are 133,710 subjects in our CPS dataset, but when we apply the weights we can see that the population they represent (the noninstitutional population of the US) had 274 million people.
. summarize perwt_rounded
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
perwt_roun~d | 133710 2049.868 1083.244 93 14281
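* The average weight in the dataset is about 2,050, meaning each person in the sample stands for about 2,050 people in the population; put another way, the average person had roughly a 1-in-2,050 chance of being sampled in the CPS.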
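* The link between the average weight and the 274 million total can be checked from the results that summarize stores: the number of observations times the mean weight is the sum of the weights (133,710 x 2,049.868 is about 274.09 million). A short sketch, assuming the dataset is in memory:

```stata
quietly summarize perwt_rounded

* number of observations times the mean weight equals the sum of the weights
display %15.0fc r(N) * r(mean)

* the same total, taken directly from the stored sum
display %15.0fc r(sum)
```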
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
* Also note that there are no missing values for race. We know this because there are exactly 133,710 subjects in the dataset, and every one of them has a race. How can this be? The answer is that the Census Bureau imputes values for some variables when respondents leave those variables blank. It is possible to figure out whose values have been imputed by looking at the data allocation flags, but we won’t be worrying about those in this class.
. tabulate race [aweight= perwt_rounded]
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 109,669 82.02 82.02
Black/Negro | 17,322.419 12.96 94.98
American Indian/Aleut/Eskimo | 1,389.1008 1.04 96.01
Asian or Pacific Islander | 5,329.4793 3.99 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
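* In the weighted data, the black percentage of the population is higher than in the unweighted sample (12.96% versus 10.19%), because blacks are more urban and have lower response rates to the CPS, so blacks have higher average weight. With aweights, Stata rescales the weights to sum to the number of observations, which is why the total above is still 133,710. With fweights the table shows population counts: there were 35.5 million blacks in the (noninstitutional) US in March 2000.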
. tabulate race [fweight= perwt_rounded]
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White |224,806,952 82.02 82.02
Black/Negro | 35,508,668 12.96 94.98
American Indian/Aleut/Eskimo | 2,847,473 1.04 96.01
Asian or Pacific Islander | 10,924,728 3.99 100.00
--------------------------------------+-----------------------------------
Total |274,087,821 100.00
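* The claim that blacks have a higher average weight can be checked by dividing the fweighted totals by the unweighted counts: 35,508,668 / 13,626 is about 2,606 for blacks, versus 224,806,952 / 113,475, about 1,981, for whites. A sketch of the same check in Stata:

```stata
* average weight by hand, from the two tables above
display 35508668 / 13626
display 224806952 / 113475

* or let Stata report the mean weight within each race category
tabulate race, summarize(perwt_rounded)
```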
. tabulate race, nolabel
Race | Freq. Percent Cum.
------------+-----------------------------------
100 | 113,475 84.87 84.87
200 | 13,626 10.19 95.06
300 | 1,894 1.42 96.47
650 | 4,715 3.53 100.00
------------+-----------------------------------
Total | 133,710 100.00
* Another thing to keep in mind is that all the variables, including categorical variables like race, are stored as numbers. The “White” and “Black/Negro” labels are just attached to those numbers after the fact. Because race is stored as a number you COULD summarize race, but you SHOULD NOT, because the numbers don’t mean anything. Please be careful to distinguish between variables that have units you can take the average of (like years of education, or dollars of income) and variables where the numbers are just placeholders for categories.
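* To see which number goes with which category, you can list the value label attached to the variable. The label name racelbl comes from the describe output at the top of this log. A short sketch:

```stata
* show the number-to-label mapping stored in the value label for race
label list racelbl

* codebook reports a variable's storage type, range, and labels in one place
codebook race
```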
* Summarize is for continuous variables like income, not categorical variables like race. So please do NOT do this:
. summarize race
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
race | 133710 132.4183 105.8387 100 650
* For income, however, summarize makes sense.
. summarize incwelfr
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 103226 40.62242 478.8231 0 25000
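* Here we are averaging 1999 welfare income over the 103,226 persons with a non-missing value, and we get an answer of $40.62. (The other 30,484 persons have missing welfare income; that number matches the count of people under age 15 in the age tabulation further down in this log.) Why is the average so low?
* Well, if we look at the detail, we see that more than 95 percent of these people report zero welfare income for 1999.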
. summarize incwelfr, detail
Welfare (public assistance) income
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 0 0 Obs 103226
25% 0 0 Sum of Wgt. 103226
50% 0 Mean 40.62242
Largest Std. Dev. 478.8231
75% 0 15600
90% 0 19999 Variance 229271.5
95% 0 23292 Skewness 16.98146
99% 804 25000 Kurtosis 403.6187
* The average welfare income among people who actually received welfare was $3,253, which sounds more reasonable.
. summarize incwelfr if incwelfr>0
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1289 3253.134 2813.505 1 25000
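* A few checks on these numbers. The 1,289 recipients plus the 101,937 zeros (see the tabulate incwelfr output later in this log) account for all 103,226 non-missing observations, and the overall mean of $40.62 is just the recipients' mean diluted by the zeros. One caution: summarize drops missing values on its own, but in an if condition Stata treats missing as larger than any number, so commands like count need the missing values excluded explicitly. A sketch:

```stata
* zeros plus positive values = all non-missing observations
display 101937 + 1289

* the overall mean is the recipients' mean spread over everyone with a value
display 1289 * 3253.134 / 103226

* the first count includes people with missing incwelfr; the second does not
count if incwelfr > 0
count if incwelfr > 0 & !missing(incwelfr)
```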
. tabulate age, nolab
Age | Freq. Percent Cum.
------------+-----------------------------------
0 | 1,713 1.28 1.28
1 | 1,932 1.44 2.73
2 | 1,950 1.46 4.18
3 | 1,939 1.45 5.63
4 | 1,965 1.47 7.10
5 | 1,998 1.49 8.60
6 | 2,059 1.54 10.14
7 | 2,176 1.63 11.77
8 | 2,163 1.62 13.38
9 | 2,243 1.68 15.06
10 | 2,202 1.65 16.71
11 | 2,083 1.56 18.27
12 | 2,035 1.52 19.79
13 | 2,047 1.53 21.32
14 | 1,979 1.48 22.80
15 | 2,046 1.53 24.33
16 | 1,965 1.47 25.80
17 | 1,998 1.49 27.29
18 | 1,847 1.38 28.67
19 | 1,826 1.37 30.04
20 | 1,722 1.29 31.33
21 | 1,687 1.26 32.59
22 | 1,638 1.23 33.81
23 | 1,622 1.21 35.03
24 | 1,662 1.24 36.27
25 | 1,666 1.25 37.52
26 | 1,640 1.23 38.74
27 | 1,726 1.29 40.03
28 | 1,801 1.35 41.38
29 | 1,995 1.49 42.87
30 | 1,907 1.43 44.30
31 | 1,991 1.49 45.79
32 | 1,890 1.41 47.20
33 | 1,898 1.42 48.62
34 | 2,024 1.51 50.13
35 | 2,134 1.60 51.73
36 | 2,123 1.59 53.32
37 | 2,099 1.57 54.89
38 | 2,064 1.54 56.43
39 | 2,228 1.67 58.10
40 | 2,190 1.64 59.74
41 | 2,115 1.58 61.32
42 | 2,137 1.60 62.92
43 | 2,091 1.56 64.48
44 | 2,114 1.58 66.06
45 | 2,118 1.58 67.64
46 | 1,939 1.45 69.10
47 | 1,957 1.46 70.56
48 | 1,827 1.37 71.93
49 | 1,767 1.32 73.25
50 | 1,865 1.39 74.64
51 | 1,802 1.35 75.99
52 | 1,825 1.36 77.35
53 | 1,695 1.27 78.62
54 | 1,301 0.97 79.59
55 | 1,323 0.99 80.58
56 | 1,324 0.99 81.57
57 | 1,304 0.98 82.55
58 | 1,128 0.84 83.39
59 | 1,129 0.84 84.24
60 | 1,154 0.86 85.10
61 | 1,051 0.79 85.89
62 | 1,073 0.80 86.69
63 | 938 0.70 87.39
64 | 952 0.71 88.10
65 | 1,014 0.76 88.86
66 | 869 0.65 89.51
67 | 926 0.69 90.20
68 | 908 0.68 90.88
69 | 904 0.68 91.56
70 | 913 0.68 92.24
71 | 885 0.66 92.90
72 | 770 0.58 93.48
73 | 797 0.60 94.08
74 | 814 0.61 94.68
75 | 796 0.60 95.28
76 | 704 0.53 95.81
77 | 646 0.48 96.29
78 | 687 0.51 96.80
79 | 602 0.45 97.25
80 | 514 0.38 97.64
81 | 476 0.36 97.99
82 | 425 0.32 98.31
83 | 427 0.32 98.63
84 | 325 0.24 98.87
85 | 306 0.23 99.10
86 | 248 0.19 99.29
87 | 209 0.16 99.44
88 | 172 0.13 99.57
89 | 155 0.12 99.69
90 | 416 0.31 100.00
------------+-----------------------------------
Total | 133,710 100.00
* When we tabulate age, we find that the highest value is 90. Where did the really old people go? The answer, if you look at the IPUMS documentation for the variable age, is that 90 is the top code. Everyone who was 90 or older was recoded to 90, to help preserve respondent confidentiality.
. tabulate age
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 1.28 1.28
1 | 1,932 1.44 2.73
2 | 1,950 1.46 4.18
3 | 1,939 1.45 5.63
4 | 1,965 1.47 7.10
5 | 1,998 1.49 8.60
6 | 2,059 1.54 10.14
7 | 2,176 1.63 11.77
8 | 2,163 1.62 13.38
9 | 2,243 1.68 15.06
10 | 2,202 1.65 16.71
11 | 2,083 1.56 18.27
12 | 2,035 1.52 19.79
13 | 2,047 1.53 21.32
14 | 1,979 1.48 22.80
15 | 2,046 1.53 24.33
16 | 1,965 1.47 25.80
17 | 1,998 1.49 27.29
18 | 1,847 1.38 28.67
19 | 1,826 1.37 30.04
20 | 1,722 1.29 31.33
21 | 1,687 1.26 32.59
22 | 1,638 1.23 33.81
23 | 1,622 1.21 35.03
24 | 1,662 1.24 36.27
25 | 1,666 1.25 37.52
26 | 1,640 1.23 38.74
27 | 1,726 1.29 40.03
28 | 1,801 1.35 41.38
29 | 1,995 1.49 42.87
30 | 1,907 1.43 44.30
31 | 1,991 1.49 45.79
32 | 1,890 1.41 47.20
33 | 1,898 1.42 48.62
34 | 2,024 1.51 50.13
35 | 2,134 1.60 51.73
36 | 2,123 1.59 53.32
37 | 2,099 1.57 54.89
38 | 2,064 1.54 56.43
39 | 2,228 1.67 58.10
40 | 2,190 1.64 59.74
41 | 2,115 1.58 61.32
42 | 2,137 1.60 62.92
43 | 2,091 1.56 64.48
44 | 2,114 1.58 66.06
45 | 2,118 1.58 67.64
46 | 1,939 1.45 69.10
47 | 1,957 1.46 70.56
48 | 1,827 1.37 71.93
49 | 1,767 1.32 73.25
50 | 1,865 1.39 74.64
51 | 1,802 1.35 75.99
52 | 1,825 1.36 77.35
53 | 1,695 1.27 78.62
54 | 1,301 0.97 79.59
55 | 1,323 0.99 80.58
56 | 1,324 0.99 81.57
57 | 1,304 0.98 82.55
58 | 1,128 0.84 83.39
59 | 1,129 0.84 84.24
60 | 1,154 0.86 85.10
61 | 1,051 0.79 85.89
62 | 1,073 0.80 86.69
63 | 938 0.70 87.39
64 | 952 0.71 88.10
65 | 1,014 0.76 88.86
66 | 869 0.65 89.51
67 | 926 0.69 90.20
68 | 908 0.68 90.88
69 | 904 0.68 91.56
70 | 913 0.68 92.24
71 | 885 0.66 92.90
72 | 770 0.58 93.48
73 | 797 0.60 94.08
74 | 814 0.61 94.68
75 | 796 0.60 95.28
76 | 704 0.53 95.81
77 | 646 0.48 96.29
78 | 687 0.51 96.80
79 | 602 0.45 97.25
80 | 514 0.38 97.64
81 | 476 0.36 97.99
82 | 425 0.32 98.31
83 | 427 0.32 98.63
84 | 325 0.24 98.87
85 | 306 0.23 99.10
86 | 248 0.19 99.29
87 | 209 0.16 99.44
88 | 172 0.13 99.57
89 | 155 0.12 99.69
90 (90+, 1988-2002) | 416 0.31 100.00
--------------------+-----------------------------------
Total | 133,710 100.00
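* Because 90 stands for "90 or older", any average of age treats the oldest people as if they were exactly 90, so it slightly understates the true average. A small sketch for seeing how many cases sit at the top code:

```stata
* number of people at the top-coded value
count if age == 90

* the mean here treats everyone aged 90 or older as exactly 90
summarize age
```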
* Just as summarize is for continuous variables (variables it makes sense to take the average of), tabulate is for categorical variables, that is, variables with just a few categories. If you tabulate a continuous variable, you will get one row for every distinct value, and that is almost certainly not what you want, so hit the little red stop sign at the top of your Stata window to interrupt the process.
. tabulate incwelfr
Welfare |
(public |
assistance) |
income | Freq. Percent Cum.
------------+-----------------------------------
0 | 101,937 98.75 98.75
1 | 4 0.00 98.76
4 | 1 0.00 98.76
12 | 5 0.00 98.76
26 | 5 0.00 98.77
30 | 1 0.00 98.77
36 | 1 0.00 98.77
40 | 1 0.00 98.77
45 | 1 0.00 98.77
48 | 1 0.00 98.77
52 | 1 0.00 98.77
53 | 1 0.00 98.77
71 | 2 0.00 98.77
75 | 2 0.00 98.78
88 | 1 0.00 98.78
98 | 1 0.00 98.78
100 | 7 0.01 98.79
105 | 1 0.00 98.79
106 | 1 0.00 98.79
113 | 1 0.00 98.79
117 | 1 0.00 98.79
120 | 3 0.00 98.79
129 | 1 0.00 98.79
--Break--
r(1);
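* If you do want a table for a continuous variable like welfare income, one option is to collapse it into a few categories first. A minimal sketch; the variable name anywelf is made up for this example:

```stata
* 1 if the person reported any welfare income, 0 if none,
* and missing if incwelfr itself is missing
generate byte anywelf = incwelfr > 0 if !missing(incwelfr)

* now tabulate produces a two-row table instead of one row per dollar amount
tabulate anywelf
```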
* How does welfare income look by race? First sort by race, and then summarize by race.
. sort race
. by race: summarize incwelfr
-------------------------------------------------------------------------------
-> race = White
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 88334 31.34195 425.1302 0 25000
-------------------------------------------------------------------------------
-> race = Black/Negro
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 9916 108.1088 754.8848 0 23292
-------------------------------------------------------------------------------
-> race = American Indian/Aleut/Eskimo
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1320 120.6235 748.9787 0 12816
-------------------------------------------------------------------------------
-> race = Asian or Pacific Islander
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 3656 52.9267 584.7742 0 13200
* The above shows welfare income by race, including the zeros. Below excludes the zeros.
. by race: summarize incwelfr if incwelfr>0
-------------------------------------------------------------------------------
-> race = White
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 839 3299.833 2872.782 1 25000
-------------------------------------------------------------------------------
-> race = Black/Negro
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 348 3080.48 2664.567 26 23292
-------------------------------------------------------------------------------
-> race = American Indian/Aleut/Eskimo
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 57 2793.386 2369.272 1 12816
-------------------------------------------------------------------------------
-> race = Asian or Pacific Islander
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 45 4300 3119.071 1 13200
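* The two-step pattern used above (sort race, then by race:) can be combined into one command with bysort, which sorts and then runs the command within each group. A sketch:

```stata
* equivalent to: sort race, followed by: by race: summarize ...
bysort race: summarize incwelfr if incwelfr > 0
```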
. by race: summarize incwelfr if incwelfr>0 [fweight= perwt_rounded]
-------------------------------------------------------------------------------
-> race = White
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1488839 3110.566 2899.053 1 25000
-------------------------------------------------------------------------------
-> race = Black/Negro
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 901251 2896.298 2592.647 26 23292
-------------------------------------------------------------------------------
-> race = American Indian/Aleut/Eskimo
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 74860 3202.088 3061.035 1 12816
-------------------------------------------------------------------------------
-> race = Asian or Pacific Islander
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 86296 4131.57 2745.261 1 13200
* With the fweights applied, the Obs column above becomes a population count: it tells us that there were 1.49 million white people and 901 thousand black people with welfare income in the US in 1999.
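* The same weighted counts can be read from a single table, and adding the four Obs values above gives about 2.55 million welfare recipients in all. A sketch, with the missing values excluded explicitly because tabulate does not use incwelfr itself:

```stata
* weighted number of welfare recipients in each race category
tabulate race if incwelfr > 0 & !missing(incwelfr) [fweight = perwt_rounded]

* total across the four groups, from the Obs column above
display 1488839 + 901251 + 74860 + 86296
```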
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B
> _win2013\class1.log
log type: text
closed on: 10 Jan 2013, 15:32:49
-------------------------------------------------------------------------------