-------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2015_381_logs\class1.log
log type: text
opened on: 21 Sep 2015, 10:08:54
. use "C:\Users\Michael\Desktop\cps_mar_2000.dta", clear
* Start every Stata session by opening a log! Then open your dataset.
. *class starts here
. describe
Contains data from C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta
obs: 133,710
vars: 55 1 Feb 2009 13:36
size: 14,574,390
--------------------------------------------------------------------------------
                storage  display   value
variable name   type     format    label        variable label
--------------------------------------------------------------------------------
year            int      %8.0g     yearlbl      Survey year
serial          long     %12.0g    seriallbl    Household serial number
hhwt            float    %9.0g     hhwtlbl      Household weight
region          byte     %27.0g    regionlbl    Region and division
statefip        byte     %57.0g    statefiplbl  State (FIPS code)
metro           byte     %27.0g    metrolbl     Metropolitan central city status
metarea         int      %50.0g    metarealbl   Metropolitan area
ownershp        byte     %21.0g    ownershplbl  Ownership of dwelling
hhincome        long     %12.0g    hhincomelbl  Total household income
pubhous         byte     %8.0g     pubhouslbl   Living in public housing
foodstmp        byte     %8.0g     foodstmplbl  Food stamp recipiency
pernum          byte     %8.0g     pernumlbl    Person number in sample unit
perwt           float    %9.0g     perwtlbl     Person weight
momloc          byte     %8.0g     momloclbl    Mother's location in the household
poploc          byte     %8.0g     poploclbl    Father's location in the household
sploc           byte     %8.0g     sploclbl     Spouse's location in household
famsize         byte     %25.0g    famsizelbl   Number of own family members in hh
nchild          byte     %18.0g    nchildlbl    Number of own children in household
nchlt5          byte     %23.0g    nchlt5lbl    Number of own children under age 5 in hh
nsibs           byte     %18.0g    nsibslbl     Number of own siblings in household
relate          int      %34.0g    relatelbl    Relationship to household head
age             byte     %19.0g    agelbl       Age
sex             byte     %8.0g     sexlbl       Sex
race            int      %37.0g    racelbl      Race
marst           byte     %23.0g    marstlbl     Marital status
popstat         byte     %14.0g    popstatlbl   Adult civilian, armed forces, or child
bpl             long     %27.0g    bpllbl       Birthplace
yrimmig         int      %11.0g    yrimmiglbl   Year of immigration
citizen         byte     %31.0g    citizenlbl   Citizenship status
mbpl            long     %27.0g    mbpllbl      Mother's birthplace
fbpl            long     %27.0g    fbpllbl      Father's birthplace
hispan          int      %29.0g    hispanlbl    Hispanic origin
educ99          byte     %38.0g    educ99lbl    Educational attainment, 1990
educrec         byte     %23.0g    educreclbl   Educational attainment recode
schlcoll        byte     %45.0g    schlcolllbl  School or college attendance
empstat         byte     %30.0g    empstatlbl   Employment status
occ1990         int      %78.0g    occ1990lbl   Occupation, 1990 basis
wkswork1        byte     %8.0g     wkswork1lbl  Weeks worked last year
hrswork         byte     %8.0g     hrsworklbl   Hours worked last week
uhrswork        byte     %13.0g    uhrsworklbl  Usual hours worked per week (last yr)
hourwage        int      %8.0g     hourwagelbl  Hourly wage
union           byte     %33.0g    unionlbl     Union membership
inctot          long     %12.0g                 Total personal income
incwage         long     %12.0g                 Wage and salary income
incss           long     %12.0g                 Social Security income
incwelfr        long     %12.0g                 Welfare (public assistance) income
vetstat         byte     %10.0g    vetstatlbl   Veteran status
vetlast         byte     %26.0g    vetlastlbl   Veteran's most recent period of service
disabwrk        byte     %34.0g    disabwrklbl  Work disability
health          byte     %9.0g     healthlbl    Health status
inclugh         byte     %8.0g     inclughlbl   Included in employer group health plan last year
himcaid         byte     %8.0g     himcaidlbl   Covered by Medicaid last year
ftotval         double   %10.0g    ftotvallbl   Total family income
perwt_rounded   float    %9.0g                  integer perwt, negative values recoded to 0
yrsed           float    %9.0g                  based on educrec
--------------------------------------------------------------------------------
Sorted by: sex
Note: dataset has changed since last saved
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate race, missing
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
* Note that there are no missing values for race: the missing option would add a row for missing values if there were any. Why are there none? Because the Census Bureau imputes missing values.
. tabulate race [fweight= perwt_rounded]
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White |224,806,952 82.02 82.02
Black/Negro | 35,508,668 12.96 94.98
American Indian/Aleut/Eskimo | 2,847,473 1.04 96.01
Asian or Pacific Islander | 10,924,728 3.99 100.00
--------------------------------------+-----------------------------------
Total |274,087,821 100.00
* The universe of the CPS (people living in non-institutional settings in the US) had 274 million people in March 2000. Also note that the percentage of people who are Black is a little higher in the weighted estimate for the US (12.96%) than in the unweighted CPS sample (10.19%). Why? Because the response rate for Black respondents was a little lower than average, so they were assigned higher weights.
. summarize perwt_rounded
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
perwt_roun~d | 133710 2049.868 1083.244 93 14281
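* If you divide 274 million by 133.7 thousand you get about 2,050, which matches the mean of perwt_rounded (2,049.868). That is no coincidence: the weighted count is simply the sum of the weights, so the mean weight equals the population total divided by the number of sample records. In other words, each CPS respondent stands in for about 2,000 people, so the CPS is roughly a 1-in-2,000 sample of the population it represents.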
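* A small cross-check of that arithmetic, written in Python rather than Stata purely to make the sums explicit. The counts are copied from the unweighted and [fweight= perwt_rounded] tabulations of race above; nothing here reads the dataset.

```python
# Counts copied from "tabulate race" and "tabulate race [fweight= perwt_rounded]".
unweighted = {
    "White": 113_475,
    "Black/Negro": 13_626,
    "American Indian/Aleut/Eskimo": 1_894,
    "Asian or Pacific Islander": 4_715,
}
weighted = {
    "White": 224_806_952,
    "Black/Negro": 35_508_668,
    "American Indian/Aleut/Eskimo": 2_847_473,
    "Asian or Pacific Islander": 10_924_728,
}

n = sum(unweighted.values())           # number of sample records
total_weight = sum(weighted.values())  # sum of perwt_rounded = estimated population

# With frequency weights the weighted count is the sum of the weights,
# so the mean weight is just total_weight / n.
mean_weight = total_weight / n


def percent(counts, key):
    """Share of one category, in percent, within a table of counts."""
    return 100 * counts[key] / sum(counts.values())


if __name__ == "__main__":
    print(n, total_weight, round(mean_weight, 3))
    for race in unweighted:
        print(f"{race:30s} {percent(unweighted, race):6.2f} {percent(weighted, race):6.2f}")
```

* The last column reproduces the Percent column of the weighted tabulation, and round(mean_weight, 3) reproduces the mean printed by summarize perwt_rounded.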
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate race, nolab
Race | Freq. Percent Cum.
------------+-----------------------------------
100 | 113,475 84.87 84.87
200 | 13,626 10.19 95.06
300 | 1,894 1.42 96.47
650 | 4,715 3.53 100.00
------------+-----------------------------------
Total | 133,710 100.00
. codebook race
--------------------------------------------------------------------------------
race Race
--------------------------------------------------------------------------------
type: numeric (int)
label: racelbl
range: [100,650] units: 10
unique values: 4 missing .: 0/133710
tabulation: Freq. Numeric Label
1.1e+05 100 White
13626 200 Black/Negro
1894 300 American Indian/Aleut/Eskimo
4715 650 Asian or Pacific Islander
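* Race, like the other nominal categorical variables, is stored as numeric codes (100, 200, 300, 650). The text you see in tables comes from a value label (racelbl) attached to those codes: the nolab option shows the underlying numbers, and codebook shows both.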
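* To make the code/label split concrete, here is a hypothetical Python analogue of a value label. It is not a Stata API: the "variable" is a list of integer codes, and racelbl is a separate dictionary from code to text. The codes in the list are made up.

```python
# Hypothetical analogue of Stata's value labels (not a Stata API).
# The data column holds integer codes; racelbl is a separate code -> text map.
racelbl = {
    100: "White",
    200: "Black/Negro",
    300: "American Indian/Aleut/Eskimo",
    650: "Asian or Pacific Islander",
}

race = [100, 200, 100, 650, 300, 100]  # made-up codes, not CPS records


def show(codes, labels=None):
    """Return labelled text by default, or the raw codes when labels is None (like nolab)."""
    if labels is None:
        return [str(c) for c in codes]
    # A code that has no label is displayed as the number itself.
    return [labels.get(c, str(c)) for c in codes]


if __name__ == "__main__":
    print(show(race, racelbl))
    print(show(race))
```

* Keeping the labels separate from the codes is what lets one value label (for example sex_lbl in the do-file later in this log) be reused across variables and datasets.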
. summarize age
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
age | 133710 35.17964 22.21722 0 90
* What happened to the people older than 90? It turns out that 90 is the topcode for age: everyone aged 90 or older is recorded as 90. Topcodes reduce the risk that individual respondents can be identified. Crucially, you need to read the IPUMS documentation for each variable to learn its topcodes, its missing-value codes, the question wording, and so on.
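* A minimal Python sketch of what a topcode does to the values. The cap of 90 comes from the age_lbl label "90 (90+, 1988-2002)" in the do-file later in this log. The ages in the list are hypothetical.

```python
from statistics import mean

TOPCODE = 90  # age cap in this extract, per the age_lbl label "90 (90+, 1988-2002)"


def apply_topcode(values, cap=TOPCODE):
    """Replace every value above the cap with the cap itself."""
    return [min(v, cap) for v in values]


true_ages = [34, 91, 97, 102, 45]  # hypothetical ages, not CPS data
public_ages = apply_topcode(true_ages)

if __name__ == "__main__":
    print(public_ages, mean(true_ages), mean(public_ages))
```

* Because a topcode can only lower values, means computed from topcoded data understate the true mean whenever anyone exceeds the cap.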
. summarize yrsed
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 103226 12.77328 3.156011 0 17
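* If a summarize or tabulate reports fewer than 133,710 cases, ask yourself where the other cases went. Here yrsed has 103,226 observations, so 133,710 - 103,226 = 30,484 cases are missing. It turns out that children (under age 15) are not asked about educational attainment, as the next tabulation shows.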
. tabulate age if yrsed==.
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 5.62 5.62
1 | 1,932 6.34 11.96
2 | 1,950 6.40 18.35
3 | 1,939 6.36 24.71
4 | 1,965 6.45 31.16
5 | 1,998 6.55 37.71
6 | 2,059 6.75 44.47
7 | 2,176 7.14 51.61
8 | 2,163 7.10 58.70
9 | 2,243 7.36 66.06
10 | 2,202 7.22 73.28
11 | 2,083 6.83 80.12
12 | 2,035 6.68 86.79
13 | 2,047 6.71 93.51
14 | 1,979 6.49 100.00
--------------------+-----------------------------------
Total | 30,484 100.00
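* A quick Python check that the missing yrsed cases are exactly the children in the table above. The frequencies are copied from the output, and the two totals come from describe and summarize yrsed.

```python
# Frequencies copied from "tabulate age if yrsed==." (ages 0 through 14).
freq_by_age = [1713, 1932, 1950, 1939, 1965, 1998, 2059, 2176,
               2163, 2243, 2202, 2083, 2035, 2047, 1979]

total_records = 133_710     # observations in the dataset
yrsed_nonmissing = 103_226  # Obs reported by "summarize yrsed"

missing_yrsed = total_records - yrsed_nonmissing

if __name__ == "__main__":
    # Both numbers should agree: every missing yrsed is a child aged 0-14.
    print(missing_yrsed, sum(freq_by_age))
```

* In Stata itself, the condition missing(yrsed) is a safer filter than yrsed==., because it also catches extended missing values such as .a.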
. sort sex
. by sex: summarize yrsed if age>=25 & age<=34
--------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9027 13.31212 2.967666 0 17
--------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9511 13.55657 2.854472 0 17
* Do young women (ages 25-34) have higher educational attainment than young men? In the CPS sample, the answer is clearly yes: 13.56 > 13.31 years. But what about in the US population? Is this difference consistent with the null hypothesis that young women and young men in the US have the same mean educational attainment? Another way of asking this: if young men and young women in the US had the same mean years of schooling, what would be the probability of finding a difference as large as 0.24 years in a representative sample of 18,538 young people? That is the key statistical question we are going to focus on.
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
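* So we run the t-test, and Stata reports a t-statistic of -5.7164. Is that large or small? The display command below turns it into a probability: ttail(df, t) is the upper-tail probability Pr(T > t), so 1 - ttail(df, t) is the lower-tail probability Pr(T < t).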
. display 1- ttail(18536, -5.7164)
5.524e-09
* A t-statistic of -5.7 is very far from zero, much further than we would expect by chance. If the null hypothesis were true, the probability of getting a difference this large in this direction would be about 5.5 in a billion (5.524e-09). We might instead ask how likely a difference this large would be in either direction, that is, with either the men or the women in the sample having an advantage of 0.24 years or more. Doubling the p-value answers that question: about 1.1e-08, or roughly 1 in 90 million. In other words, very unlikely. When the data are this inconsistent with the null hypothesis, we reject the null hypothesis. So we are reasonably sure that young women in the US are more educated, on average, than young men.
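* For readers who want to see where t = -5.7164 comes from, here is a Python sketch of the equal-variance (pooled) two-sample t test, computed from the summary statistics in the ttest output. Python's standard library has no t distribution, so the p-value uses a normal approximation. That substitution is an assumption of this sketch, not what Stata does. With 18,536 degrees of freedom it should land slightly below Stata's t-based 5.524e-09.

```python
import math


def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t test of mean1 - mean2 from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, se, df


def normal_lower_tail(z):
    """P(Z < z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Male vs. female, ages 25-34, copied from the summarize/ttest output.
t, se, df = pooled_t_test(9027, 13.31212, 2.967666, 9511, 13.55657, 2.854472)

# Normal tail standing in for Stata's 1 - ttail(df, t); nearly identical at this df.
p_one = normal_lower_tail(t)
p_two = 2 * p_one

if __name__ == "__main__":
    print(f"t = {t:.4f}, se = {se:.7f}, df = {df}")
    print(f"one-sided p ~ {p_one:.3e}, two-sided p ~ {p_two:.3e}")
```

* The standard error matches the diff row's Std. Err. (.0427623), and t is just the difference in means divided by that standard error.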
. summarize incwelfr
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 103226 40.62242 478.8231 0 25000
. summarize incwelfr if age>=15 & incwelfr>0 & incwelfr~=.
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1289 3253.134 2813.505 1 25000
. summarize incwelfr if age>=15 & incwelfr>0 & incwelfr~=. [fweight= perwt_rounded]
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 2551246 3072.095 2803.442 1 25000
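* With [fweight=...], summarize treats each record as if it appeared w times. Obs therefore becomes the sum of the weights (2,551,246 here), and the mean becomes a weighted mean. Stata requires frequency weights to be integers, which fits the "integer perwt" label on perwt_rounded. A minimal Python sketch of that computation, using a hypothetical three-person sample:

```python
def fweight_summary(values, weights):
    """Obs and mean the way summarize reports them under [fweight=...]."""
    obs = sum(weights)  # with frequency weights, Obs is the sum of the weights
    wmean = sum(v * w for v, w in zip(values, weights)) / obs
    return obs, wmean


# Hypothetical mini-sample: three welfare recipients and their person weights.
incomes = [1000, 3000, 6000]
weights = [3000, 2000, 1000]
obs, wmean = fweight_summary(incomes, weights)

if __name__ == "__main__":
    print(obs, wmean, sum(incomes) / len(incomes))  # weighted vs. unweighted mean
```

* In this toy example the low-income person carries the largest weight, so the weighted mean falls below the unweighted one, the same direction as the drop from 3,253 to 3,072 above.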
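* Here I generate a new variable, replace the values that need to change, and add value and variable labels; after that, you should save the dataset. The condition incwelfr~=. is needed because Stata treats missing values as larger than any number, so incwelfr>0 alone would be true for missing values. The 30,484 people with missing incwelfr (133,710 - 103,226) therefore stay at 0, which counts them as "no" below.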
. gen byte receives_welfare=0
. replace receives_welfare=1 if incwelfr>0 & incwelfr~=.
(1289 real changes made)
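* Why the extra incwelfr~=. condition matters: in Stata a missing value compares as larger than every number. This Python sketch mimics that rule by representing "." as +infinity, which is an illustrative assumption, not how Stata stores missing values. The incomes are made up.

```python
import math

# Assumption for illustration: Stata's system missing "." is modelled as +infinity,
# because Stata compares missing values as larger than every number.
MISSING = math.inf

incwelfr = [0, 2400, MISSING, 0, 5000, MISSING]  # made-up values, not CPS records

# Like: replace receives_welfare=1 if incwelfr>0   (missing counts as > 0!)
naive = [1 if x > 0 else 0 for x in incwelfr]

# Like: replace receives_welfare=1 if incwelfr>0 & incwelfr~=.
correct = [1 if x > 0 and x != MISSING else 0 for x in incwelfr]

if __name__ == "__main__":
    print(sum(naive), sum(correct))  # naive also flags the two missing values
```

* Both versions still code missing incomes as 0 rather than missing. If you want "unknown" kept separate from "no", the indicator itself has to be set to missing for those cases.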
. tabulate receives_welfare
receives_we |
lfare | Freq. Percent Cum.
------------+-----------------------------------
0 | 132,421 99.04 99.04
1 | 1,289 0.96 100.00
------------+-----------------------------------
Total | 133,710 100.00
. label define receives_welfare_lbl 0 "no" 1 "yes"
. label val receives_welfare receives_welfare_lbl
. tabulate receives_welfare
receives_we |
lfare | Freq. Percent Cum.
------------+-----------------------------------
no | 132,421 99.04 99.04
yes | 1,289 0.96 100.00
------------+-----------------------------------
Total | 133,710 100.00
. label var receives_welfare "does respondent receive welfare"
. tabulate receives_welfare
does |
respondent |
receive |
welfare | Freq. Percent Cum.
------------+-----------------------------------
no | 132,421 99.04 99.04
yes | 1,289 0.96 100.00
------------+-----------------------------------
Total | 133,710 100.00
. sort receives_welfare
. by receives_welfare: summarize yrsed
--------------------------------------------------------------------------------
-> receives_welfare = no
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 101937 12.79583 3.153618 0 17
--------------------------------------------------------------------------------
-> receives_welfare = yes
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 1289 10.9903 2.817995 0 17
. table receives_welfare sex, contents(freq mean yrsed)
------------------------------
does |
responden |
t receive | Sex
welfare | Male Female
----------+-------------------
no | 64,603 67,818
| 12.80415 12.78808
|
yes | 188 1,101
| 10.75 11.03133
------------------------------
. table receives_welfare sex, contents(freq mean yrsed) row col
----------------------------------------
does |
responden |
t receive | Sex
welfare | Male Female Total
----------+-----------------------------
no | 64,603 67,818 132,421
| 12.80415 12.78808 12.79583
|
yes | 188 1,101 1,289
| 10.75 11.03133 10.9903
|
Total | 64,791 68,919 133,710
| 12.79632 12.75218 12.77328
----------------------------------------
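* Take care when reading the Total row. The freq counts in the table include everyone, but mean yrsed is computed only over non-missing yrsed. The "no" group has 132,421 people but only 101,937 yrsed values, and the gap is the 30,484 children. The Python sketch below, with means and non-missing counts copied from the by receives_welfare: summarize yrsed output, reproduces the Total mean (12.77328) only when the non-missing counts are used as weights.

```python
# yrsed means with their non-missing counts, from "by receives_welfare: summarize yrsed".
groups = {"no": (101_937, 12.79583), "yes": (1_289, 10.9903)}

# The same means paired with the table's freq counts (which include missing yrsed).
groups_freq = {"no": (132_421, 12.79583), "yes": (1_289, 10.9903)}


def pooled_mean(groups):
    """Overall mean as the count-weighted average of the group means."""
    total = sum(count for count, _ in groups.values())
    return sum(count * m for count, m in groups.values()) / total


overall = pooled_mean(groups)     # matches the table's Total
wrong = pooled_mean(groups_freq)  # weights by people who have no yrsed value

if __name__ == "__main__":
    print(round(overall, 5), round(wrong, 5))
```
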
. table receives_welfare sex, contents(freq mean yrsed mean incwage) row col
-------------------------------------------------
does |
responden |
t receive | Sex
welfare | Male Female Total
----------+--------------------------------------
no | 64,603 67,818 132,421
| 12.80415 12.78808 12.79583
| 26027.96038 13735.28356 19664.13624
|
yes | 188 1,101 1,289
| 10.75 11.03133 10.9903
| 3934.324468 3453.988193 3524.044996
|
Total | 64,791 68,919 133,710
| 12.79632 12.75218 12.77328
| 25943.79926 13525.1652 19462.59227
-------------------------------------------------
. table receives_welfare sex if age>=20 & age<=35, contents(freq mean yrsed mean incwage) row col
-------------------------------------------------
does |
responden |
t receive | Sex
welfare | Male Female Total
----------+--------------------------------------
no | 14,121 14,207 28,328
| 13.14121 13.50852 13.32542
| 25300.65888 15972.03836 20622.1884
|
yes | 55 620 675
| 11.51818 11.42016 11.42815
| 4802.290909 3887.974194 3962.474074
|
Total | 14,176 14,827 29,003
| 13.13491 13.42119 13.28126
| 25221.12937 15466.73589 20234.4593
-------------------------------------------------
. tabulate incwage
Wage and |
salary |
income | Freq. Percent Cum.
------------+-----------------------------------
0 | 35,825 34.71 34.71
1 | 7 0.01 34.71
5 | 15 0.01 34.73
7 | 1 0.00 34.73
8 | 1 0.00 34.73
10 | 1 0.00 34.73
12 | 2 0.00 34.73
18 | 1 0.00 34.73
20 | 10 0.01 34.74
21 | 2 0.00 34.74
28 | 2 0.00 34.75
30 | 5 0.00 34.75
31 | 1 0.00 34.75
34 | 4 0.00 34.76
35 | 5 0.00 34.76
36 | 1 0.00 34.76
40 | 8 0.01 34.77
44 | 1 0.00 34.77
45 | 4 0.00 34.77
46 | 3 0.00 34.78
--Break--
r(1);
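* You almost never want to tabulate a continuous variable like incwage, unless you want to print the phone book; use summarize (or summarize, detail for percentiles) instead. To interrupt a command that turns out to be a mistake, click the Break button (the red X on the toolbar); Stata then prints --Break-- and r(1).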
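* For a continuous variable, a few summary numbers say more than one row per distinct value. Here is a Python sketch of that idea on hypothetical wage values. Python's statistics.quantiles uses its own percentile definition, so its quartiles may differ slightly from the percentiles Stata prints with summarize, detail.

```python
from statistics import mean, median, quantiles


def brief_summary(values):
    """A handful of summary numbers instead of one row per distinct value."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (Python's default method)
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "p25": q1,
        "median": median(values),
        "p75": q3,
        "max": max(values),
    }


wages = [0, 0, 5, 12000, 18000, 25000, 31000, 47000, 60000, 120000]  # hypothetical

if __name__ == "__main__":
    print(brief_summary(wages))
```
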
* To read in the new IPUMS extract, first put the uncompressed data file and the do file in one folder. Then use the cd command to set Stata's working directory to that folder, and run the do file.
. cd "C:\Users\Michael\Documents\current class files\intro soc methods\2005 data again"
C:\Users\Michael\Documents\current class files\intro soc methods\2005 data again
. clear all
. do "C:\Users\Michael\Documents\current class files\intro soc methods\2005 data again\cps_00010.do"
. * NOTE: You need to set the Stata working directory to the path
. * where the data file is located.
.
. set more off
.
. clear
. quietly infix ///
> int year 1-4 ///
> long serial 5-9 ///
> float hwtsupp 10-19 ///
> byte month 20-21 ///
> float wtsupp 22-31 ///
> float wtfinl 32-41 ///
> byte age 42-43 ///
> byte sex 44-44 ///
> double inctot 45-52 ///
> using `"cps_00010.dat"'
.
. replace hwtsupp = hwtsupp / 10000
(210648 real changes made)
. replace wtsupp = wtsupp / 10000
(210648 real changes made)
. replace wtfinl = wtfinl / 10000
(0 real changes made)
.
. format hwtsupp %10.4f
. format wtsupp %10.4f
. format wtfinl %10.4f
. format inctot %8.0f
.
. label var year `"Survey year"'
. label var serial `"Household serial number"'
. label var hwtsupp `"Household weight, Supplement"'
. label var month `"Month"'
. label var wtsupp `"Supplement Weight"'
. label var wtfinl `"Final Basic Weight"'
. label var age `"Age"'
. label var sex `"Sex"'
. label var inctot `"Total personal income"'
.
. label define hwtsupp_lbl 0000000000 `"0000000000"'
. label values hwtsupp hwtsupp_lbl
.
. label define month_lbl 01 `"January"'
. label define month_lbl 02 `"February"', add
. label define month_lbl 03 `"March"', add
. label define month_lbl 04 `"April"', add
. label define month_lbl 05 `"May"', add
. label define month_lbl 06 `"June"', add
. label define month_lbl 07 `"July"', add
. label define month_lbl 08 `"August"', add
. label define month_lbl 09 `"September"', add
. label define month_lbl 10 `"October"', add
. label define month_lbl 11 `"November"', add
. label define month_lbl 12 `"December"', add
. label values month month_lbl
.
. label define wtfinl_lbl 0000000000 `"0"'
. label values wtfinl wtfinl_lbl
.
. label define age_lbl 00 `"Under 1 year"'
. label define age_lbl 01 `"1"', add
. label define age_lbl 02 `"2"', add
. label define age_lbl 03 `"3"', add
. label define age_lbl 04 `"4"', add
. label define age_lbl 05 `"5"', add
. label define age_lbl 06 `"6"', add
. label define age_lbl 07 `"7"', add
. label define age_lbl 08 `"8"', add
. label define age_lbl 09 `"9"', add
. label define age_lbl 10 `"10"', add
. label define age_lbl 11 `"11"', add
. label define age_lbl 12 `"12"', add
. label define age_lbl 13 `"13"', add
. label define age_lbl 14 `"14"', add
. label define age_lbl 15 `"15"', add
. label define age_lbl 16 `"16"', add
. label define age_lbl 17 `"17"', add
. label define age_lbl 18 `"18"', add
. label define age_lbl 19 `"19"', add
. label define age_lbl 20 `"20"', add
. label define age_lbl 21 `"21"', add
. label define age_lbl 22 `"22"', add
. label define age_lbl 23 `"23"', add
. label define age_lbl 24 `"24"', add
. label define age_lbl 25 `"25"', add
. label define age_lbl 26 `"26"', add
. label define age_lbl 27 `"27"', add
. label define age_lbl 28 `"28"', add
. label define age_lbl 29 `"29"', add
. label define age_lbl 30 `"30"', add
. label define age_lbl 31 `"31"', add
. label define age_lbl 32 `"32"', add
. label define age_lbl 33 `"33"', add
. label define age_lbl 34 `"34"', add
. label define age_lbl 35 `"35"', add
. label define age_lbl 36 `"36"', add
. label define age_lbl 37 `"37"', add
. label define age_lbl 38 `"38"', add
. label define age_lbl 39 `"39"', add
. label define age_lbl 40 `"40"', add
. label define age_lbl 41 `"41"', add
. label define age_lbl 42 `"42"', add
. label define age_lbl 43 `"43"', add
. label define age_lbl 44 `"44"', add
. label define age_lbl 45 `"45"', add
. label define age_lbl 46 `"46"', add
. label define age_lbl 47 `"47"', add
. label define age_lbl 48 `"48"', add
. label define age_lbl 49 `"49"', add
. label define age_lbl 50 `"50"', add
. label define age_lbl 51 `"51"', add
. label define age_lbl 52 `"52"', add
. label define age_lbl 53 `"53"', add
. label define age_lbl 54 `"54"', add
. label define age_lbl 55 `"55"', add
. label define age_lbl 56 `"56"', add
. label define age_lbl 57 `"57"', add
. label define age_lbl 58 `"58"', add
. label define age_lbl 59 `"59"', add
. label define age_lbl 60 `"60"', add
. label define age_lbl 61 `"61"', add
. label define age_lbl 62 `"62"', add
. label define age_lbl 63 `"63"', add
. label define age_lbl 64 `"64"', add
. label define age_lbl 65 `"65"', add
. label define age_lbl 66 `"66"', add
. label define age_lbl 67 `"67"', add
. label define age_lbl 68 `"68"', add
. label define age_lbl 69 `"69"', add
. label define age_lbl 70 `"70"', add
. label define age_lbl 71 `"71"', add
. label define age_lbl 72 `"72"', add
. label define age_lbl 73 `"73"', add
. label define age_lbl 74 `"74"', add
. label define age_lbl 75 `"75"', add
. label define age_lbl 76 `"76"', add
. label define age_lbl 77 `"77"', add
. label define age_lbl 78 `"78"', add
. label define age_lbl 79 `"79"', add
. label define age_lbl 80 `"80"', add
. label define age_lbl 81 `"81"', add
. label define age_lbl 82 `"82"', add
. label define age_lbl 83 `"83"', add
. label define age_lbl 84 `"84"', add
. label define age_lbl 85 `"85"', add
. label define age_lbl 86 `"86"', add
. label define age_lbl 87 `"87"', add
. label define age_lbl 88 `"88"', add
. label define age_lbl 89 `"89"', add
. label define age_lbl 90 `"90 (90+, 1988-2002)"', add
. label define age_lbl 91 `"91"', add
. label define age_lbl 92 `"92"', add
. label define age_lbl 93 `"93"', add
. label define age_lbl 94 `"94"', add
. label define age_lbl 95 `"95"', add
. label define age_lbl 96 `"96"', add
. label define age_lbl 97 `"97"', add
. label define age_lbl 98 `"98"', add
. label define age_lbl 99 `"99+"', add
. label values age age_lbl
.
. label define sex_lbl 1 `"Male"'
. label define sex_lbl 2 `"Female"', add
. label define sex_lbl 9 `"NIU"', add
. label values sex sex_lbl
.
. label define inctot_lbl 00999997 `"00999997"'
. label define inctot_lbl 99999997 `"99999997"', add
. label define inctot_lbl 99999999 `"99999999"', add
. label values inctot inctot_lbl
.
.
.
end of do-file
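* What the do-file's infix step does, sketched in Python: each record in the .dat file is a fixed-width line, and each variable occupies the columns listed in the infix statement. The weights are stored with four implied decimal places, which is why the do-file divides them by 10000. The record string below is made up. Treating the codes that inctot_lbl labels (00999997, 99999997, 99999999) as missing is an assumption to check against the IPUMS documentation before relying on it.

```python
# Column layout copied from the infix statement in cps_00010.do (1-based, inclusive).
LAYOUT = [
    ("year", 1, 4, int),
    ("serial", 5, 9, int),
    ("hwtsupp", 10, 19, float),
    ("month", 20, 21, int),
    ("wtsupp", 22, 31, float),
    ("wtfinl", 32, 41, float),
    ("age", 42, 43, int),
    ("sex", 44, 44, int),
    ("inctot", 45, 52, int),
]
# Weights carry four implied decimals, matching the "replace ... / 10000" lines.
IMPLIED_DECIMALS = {"hwtsupp": 4, "wtsupp": 4, "wtfinl": 4}
# Assumption: the values labelled by inctot_lbl are special codes, not incomes.
SPECIAL_CODES = {"inctot": {999997, 99999997, 99999999}}


def parse_record(line):
    """Split one fixed-width record into named, typed fields."""
    rec = {}
    for name, start, end, cast in LAYOUT:
        value = cast(line[start - 1:end])  # 1-based inclusive -> Python slice
        if name in IMPLIED_DECIMALS:
            value = value / 10 ** IMPLIED_DECIMALS[name]
        if value in SPECIAL_CODES.get(name, set()):
            value = None  # hypothetical choice: treat special codes as missing
        rec[name] = value
    return rec


# Made-up record: 2005, serial 1, weights 1234.5678, March, age 34, female, $25,000.
example = "2005" + "00001" + "0012345678" + "03" + "0012345678" + "0000000000" + "34" + "2" + "00025000"

if __name__ == "__main__":
    print(parse_record(example))
```
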
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2015_381_logs\class1.log
log type: text
closed on: 21 Sep 2015, 12:56:38
-----------------------------------------------------------------------------------------------------------------------