-------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_381_logs\class1.log
log type: text
opened on: 24 Sep 2013, 14:42:49
* Start every session with a log!
. * Add a comment by starting the line with an asterisk
. describe
Contains data from C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta
obs: 133,710
vars: 55 1 Feb 2009 13:36
size: 14,574,390
-------------------------------------------------------------------
                storage display     value
variable name   type    format      label          variable label
-------------------------------------------------------------------
year            int     %8.0g       yearlbl        Survey year
serial          long    %12.0g      seriallbl      Household serial number
hhwt            float   %9.0g       hhwtlbl        Household weight
region          byte    %27.0g      regionlbl      Region and division
statefip        byte    %57.0g      statefiplbl    State (FIPS code)
metro           byte    %27.0g      metrolbl       Metropolitan central city status
metarea         int     %50.0g      metarealbl     Metropolitan area
ownershp        byte    %21.0g      ownershplbl    Ownership of dwelling
hhincome        long    %12.0g      hhincomelbl    Total household income
pubhous         byte    %8.0g       pubhouslbl     Living in public housing
foodstmp        byte    %8.0g       foodstmplbl    Food stamp recipiency
pernum          byte    %8.0g       pernumlbl      Person number in sample unit
perwt           float   %9.0g       perwtlbl       Person weight
momloc          byte    %8.0g       momloclbl      Mother's location in the household
poploc          byte    %8.0g       poploclbl      Father's location in the household
sploc           byte    %8.0g       sploclbl       Spouse's location in household
famsize         byte    %25.0g      famsizelbl     Number of own family members in hh
nchild          byte    %18.0g      nchildlbl      Number of own children in household
nchlt5          byte    %23.0g      nchlt5lbl      Number of own children under age 5 in hh
nsibs           byte    %18.0g      nsibslbl       Number of own siblings in household
relate          int     %34.0g      relatelbl      Relationship to household head
age             byte    %19.0g      agelbl         Age
sex             byte    %8.0g       sexlbl         Sex
race            int     %37.0g      racelbl        Race
marst           byte    %23.0g      marstlbl       Marital status
popstat         byte    %14.0g      popstatlbl     Adult civilian, armed forces, or child
bpl             long    %27.0g      bpllbl         Birthplace
yrimmig         int     %11.0g      yrimmiglbl     Year of immigration
citizen         byte    %31.0g      citizenlbl     Citizenship status
mbpl            long    %27.0g      mbpllbl        Mother's birthplace
fbpl            long    %27.0g      fbpllbl        Father's birthplace
hispan          int     %29.0g      hispanlbl      Hispanic origin
educ99          byte    %38.0g      educ99lbl      Educational attainment, 1990
educrec         byte    %23.0g      educreclbl     Educational attainment recode
schlcoll        byte    %45.0g      schlcolllbl    School or college attendance
empstat         byte    %30.0g      empstatlbl     Employment status
occ1990         int     %78.0g      occ1990lbl     Occupation, 1990 basis
wkswork1        byte    %8.0g       wkswork1lbl    Weeks worked last year
hrswork         byte    %8.0g       hrsworklbl     Hours worked last week
uhrswork        byte    %13.0g      uhrsworklbl    Usual hours worked per week (last yr)
hourwage        int     %8.0g       hourwagelbl    Hourly wage
union           byte    %33.0g      unionlbl       Union membership
inctot          long    %12.0g                     Total personal income
incwage         long    %12.0g                     Wage and salary income
incss           long    %12.0g                     Social Security income
incwelfr        long    %12.0g                     Welfare (public assistance) income
vetstat         byte    %10.0g      vetstatlbl     Veteran status
vetlast         byte    %26.0g      vetlastlbl     Veteran's most recent period of service
disabwrk        byte    %34.0g      disabwrklbl    Work disability
health          byte    %9.0g       healthlbl      Health status
inclugh         byte    %8.0g       inclughlbl     Included in employer group health plan last year
himcaid         byte    %8.0g       himcaidlbl     Covered by Medicaid last year
ftotval         double  %10.0g      ftotvallbl     Total family income
perwt_rounded   float   %9.0g                      integer perwt, negative values recoded to 0
yrsed           float   %9.0g                      based on educrec
-------------------------------------------------------------------
Sorted by: sex
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate race, missing
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
* Notice there are no missing values for race. How can this be? Answer: the Census Bureau imputes the missing values.
. tabulate race [fweight= perwt_rounded]
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White |224,806,952 82.02 82.02
Black/Negro | 35,508,668 12.96 94.98
American Indian/Aleut/Eskimo | 2,847,473 1.04 96.01
Asian or Pacific Islander | 10,924,728 3.99 100.00
--------------------------------------+-----------------------------------
Total |274,087,821 100.00
* There were about 274 million people in the non-institutional US population in March 2000. Note that applying the weights yields a slightly different percentage breakdown than the unweighted data: groups that are under-represented among respondents carry higher weights to compensate. For example, the Black share rises from 10.19% unweighted to 12.96% weighted.
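* A minimal Python sketch (not Stata) that reproduces the unweighted and weighted percentages and the average weight each group carries. The frequencies are copied from the two tabulate race tables above.

```python
# Counts copied from "tabulate race" and "tabulate race [fweight=perwt_rounded]"
unweighted = {"White": 113_475, "Black/Negro": 13_626,
              "American Indian/Aleut/Eskimo": 1_894,
              "Asian or Pacific Islander": 4_715}
weighted = {"White": 224_806_952, "Black/Negro": 35_508_668,
            "American Indian/Aleut/Eskimo": 2_847_473,
            "Asian or Pacific Islander": 10_924_728}

n_total = sum(unweighted.values())  # 133,710 respondents
w_total = sum(weighted.values())    # 274,087,821 people represented

for race in unweighted:
    pct_unweighted = 100 * unweighted[race] / n_total
    pct_weighted = 100 * weighted[race] / w_total
    avg_weight = weighted[race] / unweighted[race]  # people represented per respondent
    print(f"{race:30s} {pct_unweighted:6.2f} {pct_weighted:6.2f} {avg_weight:8.1f}")
```

Black respondents carry an average weight of about 2,606, compared with about 1,981 for White respondents. That difference is what moves the Black share up once the weights are applied.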
. summarize perwt_rounded
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
perwt_roun~d | 133710 2049.868 1083.244 93 14281
* The average weight is about 2050; the CPS was about a 1-in-2000 sample.
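* The mean weight is the weighted total divided by the number of respondents. This works because race has no missing values, so the fweighted total above is the sum of perwt_rounded over every observation. A Python sketch of that arithmetic:

```python
population = 274_087_821  # total from tabulate race [fweight=perwt_rounded]
respondents = 133_710     # unweighted number of observations

mean_weight = population / respondents
print(round(mean_weight, 3))             # 2049.868, as in summarize perwt_rounded
print(f"about 1 in {mean_weight:,.0f}")  # about 1 in 2,050
```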
. tabulate race, nolab
Race | Freq. Percent Cum.
------------+-----------------------------------
100 | 113,475 84.87 84.87
200 | 13,626 10.19 95.06
300 | 1,894 1.42 96.47
650 | 4,715 3.53 100.00
------------+-----------------------------------
Total | 133,710 100.00
* Race, like every variable in this file, is stored as a number. Text such as "White" and "Black/Negro" is a value label attached to the numeric code.
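* A value label can be pictured as a lookup table from numeric codes to text. In this Python sketch, the dictionary name racelbl mirrors the Stata label name, and the list of stored codes is hypothetical.

```python
# Value label racelbl, as reported by codebook race
racelbl = {
    100: "White",
    200: "Black/Negro",
    300: "American Indian/Aleut/Eskimo",
    650: "Asian or Pacific Islander",
}

stored_codes = [100, 650, 200, 100]  # hypothetical values as stored in the data
# Codes without a label fall back to the number itself, as Stata displays them
shown = [racelbl.get(code, str(code)) for code in stored_codes]
print(shown)  # what tabulate displays; tabulate, nolab shows the codes instead
```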
. codebook race
-------------------------------------------------------------------------------
race Race
-------------------------------------------------------------------------------
type: numeric (int)
label: racelbl
range: [100,650] units: 10
unique values: 4 missing .: 0/133710
tabulation: Freq. Numeric Label
1.1e+05 100 White
13626 200 Black/Negro
1894 300 American Indian/Aleut/Eskimo
4715 650 Asian or Pacific Islander
* To see which labels correspond to which numeric values, use the codebook command. Alternatively, run tabulate and then tabulate, nolab, and compare the two tables.
. summarize yrsed if race==100
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 88334 12.81067 3.154691 0 17
* yrsed has units (years of education), so it makes sense to summarize it and interpret its mean.
* Don't ever do this:
. summarize race
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
race | 133710 132.4183 105.8387 100 650
* Race is a categorical variable. Its numeric codes (100, 200, 300, 650) are arbitrary, so their average makes no sense.
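* To see why, this Python sketch reproduces the mean of 132.4183 from the tabulate race, nolab frequencies. It then recodes the same four categories as 1-4, a hypothetical but equally valid coding. The "average race" changes with the arbitrary coding.

```python
# Frequencies from "tabulate race, nolab": code -> count
counts = {100: 113_475, 200: 13_626, 300: 1_894, 650: 4_715}

def mean_of_codes(freq):
    """Frequency-weighted mean of the numeric codes."""
    return sum(code * n for code, n in freq.items()) / sum(freq.values())

original = mean_of_codes(counts)           # matches summarize race
recode = {100: 1, 200: 2, 300: 3, 650: 4}  # hypothetical alternative coding
recoded = mean_of_codes({recode[c]: n for c, n in counts.items()})
print(round(original, 4), round(recoded, 4))  # 132.4183 1.236
```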
. sort sex
. by sex: summarize yrsed
-------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 49353 12.79632 3.217925 0 17
-------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 53873 12.75218 3.098084 0 17
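* The Male and Female counts add up to fewer than the 133,710 observations in the file, so yrsed has missing values. The best way to find out why a variable is missing is to look it up in the IPUMS-CPS documentation, but we can also check who is missing directly: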
. tabulate age if yrsed==.
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 5.62 5.62
1 | 1,932 6.34 11.96
2 | 1,950 6.40 18.35
3 | 1,939 6.36 24.71
4 | 1,965 6.45 31.16
5 | 1,998 6.55 37.71
6 | 2,059 6.75 44.47
7 | 2,176 7.14 51.61
8 | 2,163 7.10 58.70
9 | 2,243 7.36 66.06
10 | 2,202 7.22 73.28
11 | 2,083 6.83 80.12
12 | 2,035 6.68 86.79
13 | 2,047 6.71 93.51
14 | 1,979 6.49 100.00
--------------------+-----------------------------------
Total | 30,484 100.00
* yrsed is missing only for respondents under 15 years of age.
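* A quick consistency check in Python, using counts copied from the output above. It assumes sex itself has no missing values, which matches the by sex: output showing only Male and Female. Under that assumption, the observations dropped from by sex: summarize yrsed should equal the 30,484 cases in this table.

```python
total_obs = 133_710       # obs from describe
male_with_yrsed = 49_353  # by sex: summarize yrsed
female_with_yrsed = 53_873
missing_yrsed = total_obs - (male_with_yrsed + female_with_yrsed)
print(missing_yrsed)      # 30484, the Total of tabulate age if yrsed==.
```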
. table sex, contents (mean yrsed sd yrsed min yrsed max yrsed)
--------------------------------------------------------------
Sex | mean(yrsed) sd(yrsed) min(yrsed) max(yrsed)
----------+---------------------------------------------------
Male | 12.79632 3.217926 0 17
Female | 12.75218 3.098084 0 17
--------------------------------------------------------------
* Much of the output you can get with summarize, you can also get with the table command.
. by sex: summarize yrsed if age>=25 & age<=34
-------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9027 13.31212 2.967666 0 17
-------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9511 13.55657 2.854472 0 17
* Suppose we ask: what is the probability that women aged 25-34 in the CPS have more education, on average, than men of the same age in the CPS? The answer is 100%. That is not a statistics question, because the sample means are known and there are no unknowns. Now suppose we ask instead: is this CPS distribution of men's and women's education consistent with the null hypothesis that men and women aged 25-34 in the US as a whole have equal mean education? The answer (slightly surprisingly) is no. The 0.24-year difference in mean education between women and men in this age group of the CPS is statistically significant. We will talk more about what this means.
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
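* The t statistic can be reproduced by hand from the group sizes, means, and standard deviations in the table above. This Python sketch implements the equal-variance (pooled) two-sample t test that this ttest output reports. Using 1.96 as the critical value is an approximation for large degrees of freedom.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t test assuming equal variances.

    pooled variance sp^2 = ((n1-1)*sd1^2 + (n2-1)*sd2^2) / (n1+n2-2)
    t = (mean1 - mean2) / sqrt(sp^2 * (1/n1 + 1/n2))
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se_diff, diff / se_diff, df

# Male vs. Female, ages 25-34, copied from the ttest output
diff, se, t, df = pooled_t(9027, 13.31212, 2.967666, 9511, 13.55657, 2.854472)
low, high = diff - 1.96 * se, diff + 1.96 * se  # approximate 95% CI
print(f"diff={diff:.4f} se={se:.5f} t={t:.3f} df={df} CI=({low:.4f}, {high:.4f})")
```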
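* After-class comment: Stata's ttail(df, t) function gives the probability associated with this t statistic. It returns the one-tailed probability Pr(T > t). Below, 18000 approximates the 18,536 degrees of freedom, and 5.716 is |t| rounded. The two-sided p-value is about twice the result, roughly 1.1e-08: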
. display ttail(18000, 5.716)
5.539e-09
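* Outside Stata, the same tail probability can be approximated by integrating the Student's t density numerically. This is a stdlib-only Python sketch using Simpson's rule. The upper cutoff of 60 and the 20,000 intervals are assumptions chosen so that the truncation and discretization errors are negligible at these degrees of freedom.

```python
import math

def t_density(x, df):
    """Student's t probability density with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, upper=60.0, n=20000):
    """Approximate Pr(T > t) by Simpson's rule on [t, upper]; n must be even.

    Assumes the density beyond `upper` is negligible (true for large df).
    """
    h = (upper - t) / n
    total = t_density(t, df) + t_density(upper, df)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * t_density(t + i * h, df)
    return total * h / 3

p_one = t_upper_tail(5.716, 18000)        # analogue of ttail(18000, 5.716)
p_two = 2 * t_upper_tail(5.7164, 18536)   # two-sided, exact t and df
print(f"{p_one:.3e} {p_two:.3e}")
```

p_one should come out close to Stata's 5.539e-09. p_two is the two-sided p-value computed with the exact t and degrees of freedom from the ttest output.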
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_381_logs\class1.log
log type: text
closed on: 24 Sep 2013, 16:26:50
--------------------------------------------------------------------------------