---------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class2.log
log type: text
opened on: 23 Sep 2010, 14:17:39
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear
. summarize incwelfr
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 103226 40.62242 478.8231 0 25000
* If you average a variable that is mostly zeros, the mean tells you little about the people who actually have a nonzero value.
. summarize incwelfr if incwelfr>0
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1289 3253.134 2813.505 1 25000
* What we really want to know is the average welfare income among those who receive welfare. About $3,250 makes more sense than $40.
. sort sex
. by sex: summarize incwelfr if incwelfr>0
--------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 188 2979.622 2644.509 1 13800
--------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1101 3299.837 2839.866 1 25000
* Most welfare recipients in the sample are female (1,101 of 1,289).
. by sex: summarize incwelfr if incwelfr>0 [fweight= perwt_rounded]
--------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 357702 2897.24 2577.316 1 13800
--------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 2193544 3100.608 2837.588 1 25000
* Weighted up to the population, that is about 2.2 million women and about 360,000 men on welfare.
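* Frequency weights in Stata must be integers, which is presumably why the weight used here is a rounded variable. This log does not show how perwt_rounded was created; the lines below are only a sketch, assuming the raw person weight is called perwt (a hypothetical name) and using perwt_rounded2 as an illustrative new variable. They were not run in this session.

```stata
* Assumption: perwt is the raw, possibly non-integer person weight.
generate long perwt_rounded2 = round(perwt)

* Weighted count and mean of welfare income among recipients,
* excluding missing values explicitly.
summarize incwelfr if incwelfr>0 & incwelfr<. [fweight=perwt_rounded2]
```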
. by sex: summarize incwelfr if incwelfr>0 &age>20 [fweight= perwt_rounded]
--------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 256209 2972.657 2636.861 1 13800
--------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1886278 3237.094 2906.139 1 25000
* Are some of the welfare recipients younger than 21? Apparently yes (note the number of observations is lower here than above).
. by sex: summarize incwelfr if incwelfr>0 &age>12 [fweight= perwt_rounded]
--------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 357702 2897.24 2577.316 1 13800
--------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 2193544 3100.608 2837.588 1 25000
* All welfare recipients are over 12: the weighted counts here are identical to the counts with no age restriction.
. table sex if incwelfr>0, contents (freq mean incwelfr)
------------------------------------------
Sex | Freq. mean(incwelfr)
----------+-------------------------------
Male | 15,626 2979.62234
Female | 16,147 3299.837421
------------------------------------------
* You can use the table command to generate counts and means, just like the summarize command. Here, however, the unweighted count of men and women with welfare income greater than zero is much larger than the unweighted count we got from summarize above, even though the means are exactly the same. What is the problem? Stata treats a missing value (incwelfr==.) as larger than any number, so people with missing incwelfr satisfy incwelfr>0 and are included in the frequency count (30,484 missing values plus 1,289 recipients gives the 31,773 counted here). The means are unaffected because missing values are dropped when the mean is computed.
. table sex if incwelfr>0 , contents (freq mean incwelfr)
------------------------------------------
Sex | Freq. mean(incwelfr)
----------+-------------------------------
Male | 15,626 2979.62234
Female | 16,147 3299.837421
------------------------------------------
. table sex if incwelfr>0 & incwelfr~=. , contents (freq mean incwelfr)
------------------------------------------
Sex | Freq. mean(incwelfr)
----------+-------------------------------
Male | 188 2979.62234
Female | 1,101 3299.837421
------------------------------------------
* So if we exclude the missing values by hand, we get exactly the same unweighted count as we got with summarize.
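* The same guard can be written two other ways. This is a sketch, not run in this session; both counts should agree with the 1,289 observations that summarize reported.

```stata
* Missing (.) sorts above every number, so "incwelfr<." keeps only real values.
count if incwelfr>0 & incwelfr<.

* The missing() function expresses the same condition more readably.
count if incwelfr>0 & !missing(incwelfr)
```

* Both forms also exclude the extended missing codes (.a, .b, and so on), which sort above the plain missing value. The condition incwelfr~=. excludes only the plain missing value.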
. table sex if incwelfr>0 & incwelfr~=. [fweight= perwt_rounded] , contents (freq mean incwelfr)
------------------------------------------
Sex | Freq. mean(incwelfr)
----------+-------------------------------
Male | 357,702 2897.240312
Female | 2193544 3100.608278
------------------------------------------
* Again, exactly like the weighted summarize command.
*Now let's look at wage income, which is more broadly relevant.
. by sex: summarize incwage if age>25 & age<35
---------------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 8229 30226.95 27174.71 0 362302
---------------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 8643 18039.8 20632.34 0 333564
. by sex: summarize incwage if age>24 & age<35
---------------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 9027 29510.62 26619.54 0 362302
---------------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 9511 17728.95 20249.23 0 333564
. table sex if age>24 & age<35, contents(freq mean incwage mean yrsed)
-------------------------------------------------------
Sex | Freq. mean(incwage) mean(yrsed)
----------+--------------------------------------------
Male | 9,027 29510.61781 13.31212
Female | 9,511 17728.94764 13.55657
-------------------------------------------------------
* Women have more education, but earn less...
. table sex if age>24 & age<35& occ1990==178, contents(freq mean incwage mean yrsed)
-------------------------------------------------------
Sex | Freq. mean(incwage) mean(yrsed)
----------+--------------------------------------------
Male | 60 56928.93333 17
Female | 41 59430.68293 16.92683
-------------------------------------------------------
* occ1990==178 are the lawyers (you can look up the codes on the IPUMS website, or tabulate the variable, or codebook the variable, or list the value label attached to the variable). In this sample, young women lawyers appear to make a bit more money than young male lawyers, though the cells are small (60 men and 41 women).
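* The lookup options mentioned above can be sketched as follows (not run in this session). The local macro name lbl is arbitrary.

```stata
* Find the name of the value label attached to occ1990, then list its codes.
local lbl : value label occ1990
label list `lbl'

* Or describe the variable and its labels in general.
codebook occ1990

* Or tabulate only the category of interest to see its label.
tabulate occ1990 if occ1990==178
```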
. table sex if age>24 & age<35& occ1990==178 [fweight= perwt_rounded], contents(freq mean incwage mean yrsed)
-------------------------------------------------------
Sex | Freq. mean(incwage) mean(yrsed)
----------+--------------------------------------------
Male | 137,314 58326.18129 17
Female | 110,119 62426.92046 16.92127
-------------------------------------------------------
. summarize age
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
age | 133710 35.17964 22.21722 0 90
* Why is age==90 the highest age in this dataset? The answer is that age is topcoded, to protect the identity of the people who happen to be old-age outliers.
. table sex if age>24 & age<35& occ1990==178 [fweight= perwt_rounded], contents(freq mean incwage max incwage mean yrsed)
----------------------------------------------------------------------
Sex | Freq. mean(incwage) max(incwage) mean(yrsed)
----------+-----------------------------------------------------------
Male | 137,314 58326.18129 229339 17
Female | 110,119 62426.92046 150000 16.92127
----------------------------------------------------------------------
* Is it possible that the similarity of earnings for young male and young female lawyers is because the highest-earning men are topcoded, which would skew our comparison? Actually, it turns out not to be the case. All of the young lawyers in our sample are below the topcode income of $362,302: the highest wage income is $229,339 among the men and $150,000 among the women.
. summarize incwage, detail
                   Wage and salary income
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs              103226
25%            0              0       Sum of Wgt.      103226

50%        10000                      Mean           19462.59
                        Largest       Std. Dev.      28843.38
75%        30000         362302
90%        50000         362302       Variance       8.32e+08
95%        66500         362302       Skewness       3.583439
99%       125000         364302       Kurtosis       24.50639
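* A direct way to check whether any young lawyer sits at the topcode is to count them. This is a sketch, not run in this session, and it assumes that 362302 is the topcode value, as the note above states.

```stata
* Assumption: 362302 is the wage-income topcode in this extract.
* Count young lawyers at or above it, ignoring missing values.
count if incwage>=362302 & incwage<. & age>24 & age<35 & occ1990==178
```

* A count of zero would confirm that topcoding does not affect this comparison.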
. table sex if age>24 & age<35, contents(freq mean yrsed)
------------------------------------
Sex | Freq. mean(yrsed)
----------+-------------------------
Male | 9,027 13.31212
Female | 9,511 13.55657
------------------------------------
. display 13.55657-13.31212
.24445
* You can use the display command as a calculator.
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* This repeats our t-test of yrsed, but with an age group consistent with my Excel file.
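* The t-statistic is just the difference in means divided by its standard error, so display can reproduce it from the numbers in the table. This is a sketch, not run in this session; the result should land close to the reported t, up to rounding of the inputs.

```stata
* t = (mean(Male) - mean(Female)) / Std. Err. of the difference
display (13.31212 - 13.55657) / .0427623
```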
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 64,791 48.46 48.46
Female | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
. tabulate sex, nolabel
Sex | Freq. Percent Cum.
------------+-----------------------------------
1 | 64,791 48.46 48.46
2 | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
* Now we generate a new dummy variable, which we will use in regression.
. generate male=0
. replace male=1 if sex==1
(64791 real changes made)
. label define male_lbl 0 "female" 1 "male"
* making a value label
. label val male male_lbl
* attaching that value label to the variable.
. tabulate sex male
| male
Sex | female male | Total
-----------+----------------------+----------
Male | 0 64,791 | 64,791
Female | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
. tabulate sex male, nolab
| male
Sex | 0 1 | Total
-----------+----------------------+----------
1 | 0 64,791 | 64,791
2 | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
. tabulate sex male, nolab miss
| male
Sex | 0 1 | Total
-----------+----------------------+----------
1 | 0 64,791 | 64,791
2 | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
* There are no missing values for sex in this dataset. The reason is that if the respondent left it missing, the Census Bureau imputed it. Imputation flags are available from IPUMS.
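* The two-step generate/replace used above works here because sex is never missing. A one-line alternative that would leave the dummy missing wherever sex is missing can be sketched as follows (not run in this session; male2 is a hypothetical name used only for the comparison).

```stata
* (sex==1) evaluates to 1 when true and 0 when false.
generate byte male2 = (sex==1) if !missing(sex)

* Check that the shortcut agrees with the variable built above, then drop it.
assert male2 == male
drop male2
```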
. regress yrsed male if age>24 & ager<35
ager not found
r(111);
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
* Note how this gives you the same coefficient (the difference in means), the same standard error, and the same t-statistic, and therefore the same answer as the equal-variances t-test above.
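* With a single dummy regressor, the constant is the mean for the group coded 0 and the constant plus the coefficient is the mean for the group coded 1. This is a sketch, not run in this session; the two values should match the Female and Male means in the t-test table.

```stata
regress yrsed male if age>24 & age<35

* Mean years of education for women (male==0)
display _b[_cons]

* Mean years of education for men (male==1)
display _b[_cons] + _b[male]
```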
. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace
file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved
* I added a new variable (male) and I want to keep it, so I saved the dataset.
. clear
* Then I clear the dataset to make way for the new one.
. cd "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps"
C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps
* You have to execute the cd command so that Stata knows where to find your uncompressed dataset.
* The following is what it looks like in Stata when you import data. Stata lists the do file as it runs it, step by step. You can pick the do file to run from the menus: File > Do.
. do "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps\cps_00008.do"
. /* Important: you need to put the .dat and .do files in one folder/
> directory and then set the working folder to that folder. */
.
. set more off
.
. clear
. infix ///
> int year 1-4 ///
> byte age 5-6 ///
> byte sex 7 ///
> using cps_00008.dat
(210648 observations read)
.
. label var year `"Survey year"'
. label var age `"Age"'
. label var sex `"Sex"'
.
. label define agelbl 00 `"Under 1 year"'
. label define agelbl 01 `"1"', add
. label define agelbl 02 `"2"', add
. label define agelbl 03 `"3"', add
. label define agelbl 04 `"4"', add
. label define agelbl 05 `"5"', add
. label define agelbl 06 `"6"', add
. label define agelbl 07 `"7"', add
. label define agelbl 08 `"8"', add
. label define agelbl 09 `"9"', add
. label define agelbl 10 `"10"', add
. label define agelbl 11 `"11"', add
. label define agelbl 12 `"12"', add
. label define agelbl 13 `"13"', add
. label define agelbl 14 `"14"', add
. label define agelbl 15 `"15"', add
. label define agelbl 16 `"16"', add
. label define agelbl 17 `"17"', add
. label define agelbl 18 `"18"', add
. label define agelbl 19 `"19"', add
. label define agelbl 20 `"20"', add
. label define agelbl 21 `"21"', add
. label define agelbl 22 `"22"', add
. label define agelbl 23 `"23"', add
. label define agelbl 24 `"24"', add
. label define agelbl 25 `"25"', add
. label define agelbl 26 `"26"', add
. label define agelbl 27 `"27"', add
. label define agelbl 28 `"28"', add
. label define agelbl 29 `"29"', add
. label define agelbl 30 `"30"', add
. label define agelbl 31 `"31"', add
. label define agelbl 32 `"32"', add
. label define agelbl 33 `"33"', add
. label define agelbl 34 `"34"', add
. label define agelbl 35 `"35"', add
. label define agelbl 36 `"36"', add
. label define agelbl 37 `"37"', add
. label define agelbl 38 `"38"', add
. label define agelbl 39 `"39"', add
. label define agelbl 40 `"40"', add
. label define agelbl 41 `"41"', add
. label define agelbl 42 `"42"', add
. label define agelbl 43 `"43"', add
. label define agelbl 44 `"44"', add
. label define agelbl 45 `"45"', add
. label define agelbl 46 `"46"', add
. label define agelbl 47 `"47"', add
. label define agelbl 48 `"48"', add
. label define agelbl 49 `"49"', add
. label define agelbl 50 `"50"', add
. label define agelbl 51 `"51"', add
. label define agelbl 52 `"52"', add
. label define agelbl 53 `"53"', add
. label define agelbl 54 `"54"', add
. label define agelbl 55 `"55"', add
. label define agelbl 56 `"56"', add
. label define agelbl 57 `"57"', add
. label define agelbl 58 `"58"', add
. label define agelbl 59 `"59"', add
. label define agelbl 60 `"60"', add
. label define agelbl 61 `"61"', add
. label define agelbl 62 `"62"', add
. label define agelbl 63 `"63"', add
. label define agelbl 64 `"64"', add
. label define agelbl 65 `"65"', add
. label define agelbl 66 `"66"', add
. label define agelbl 67 `"67"', add
. label define agelbl 68 `"68"', add
. label define agelbl 69 `"69"', add
. label define agelbl 70 `"70"', add
. label define agelbl 71 `"71"', add
. label define agelbl 72 `"72"', add
. label define agelbl 73 `"73"', add
. label define agelbl 74 `"74"', add
. label define agelbl 75 `"75"', add
. label define agelbl 76 `"76"', add
. label define agelbl 77 `"77"', add
. label define agelbl 78 `"78"', add
. label define agelbl 79 `"79"', add
. label define agelbl 80 `"80"', add
. label define agelbl 81 `"81"', add
. label define agelbl 82 `"82"', add
. label define agelbl 83 `"83"', add
. label define agelbl 84 `"84"', add
. label define agelbl 85 `"85"', add
. label define agelbl 86 `"86"', add
. label define agelbl 87 `"87"', add
. label define agelbl 88 `"88"', add
. label define agelbl 89 `"89"', add
. label define agelbl 90 `"90 (90+, 1988-2002)"', add
. label define agelbl 91 `"91"', add
. label define agelbl 92 `"92"', add
. label define agelbl 93 `"93"', add
. label define agelbl 94 `"94"', add
. label define agelbl 95 `"95"', add
. label define agelbl 96 `"96"', add
. label define agelbl 97 `"97"', add
. label define agelbl 98 `"98"', add
. label define agelbl 99 `"99+"', add
. label values age agelbl
.
. label define sexlbl 1 `"Male"'
. label define sexlbl 2 `"Female"', add
. label values sex sexlbl
.
.
end of do-file
. save "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps\my fun cps dataset trial.dta"
file C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps\my fun cps dataset trial.dta saved
* We have just created a new Stata dataset, so we have to save it.
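* Before clearing, it can be worth confirming that the import produced what the do file describes. This is a sketch, not run in this session; it relies only on the three variables (year, age, sex) read by the infix command above.

```stata
* List variable names, storage types, and attached value labels.
describe

* Confirm that the sex codes carry the Male/Female labels.
tabulate sex

* Inspect the upper end of the age distribution for topcoding.
tabulate age if age>=90 & age<.
```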
. clear all
. exit, clear