--------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013
> _381_logs\class2.log
log type: text
opened on: 26 Sep 2013, 14:06:30
* Going back to last class: I ran a t-test comparing the educational attainment (years of education) of young women and young men, ages 25 to 34.
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
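* I said that the t-statistic of -5.7164 was enormously far from zero, but how far is it? How likely would we be to get a value this far from zero just by chance if the true difference were zero? The function ttail(df, t) returns the upper-tail probability Pr(T > t), which by the symmetry of the t distribution equals Pr(T < -t), so this is the one-tailed answer: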
. display ttail(18536, 5.716)
5.537e-09
* If we are doing a 2 tail test, which is more appropriate, we double the one-tail probability and end up with a P value of about 1.1e-08, or roughly 1 in 90,000,000, which is a really small probability. That means we can be very confident that this data did not come from a population (i.e. the whole US) in which men and women ages 25 to 34 had equal average educational attainments.
. display 2*ttail(18536, 5.716)
1.107e-08
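* A minimal sketch, not part of the original session: the same test can be reproduced from the summary statistics alone with ttesti (the arguments are obs, mean, and std. dev. for each group, copied from the table above). The two-tailed P value can then be computed from the stored results r(t) and r(df_t) instead of retyping the numbers.

```stata
* immediate two-sample t test from the summary statistics printed above
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472

* two-tailed P value from the stored t-statistic and degrees of freedom
display 2*ttail(r(df_t), abs(r(t)))
```

* Using abs() means the sign of the t-statistic does not matter, which is what a two-tailed test requires.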
. summarize age
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |    133710    35.17964    22.21722          0         90
* Do you notice that the top age is 90? Does that make sense in a sample of 133,710 people? No. The reason is that the variable age is top-coded: ages above the top code are recorded as the top code itself, to protect the identities of the oldest respondents. See the ipums.org documentation.
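* One way to see the top-coding in the data, as a sketch that was not run in class: count how many records sit exactly at the maximum age and compare with the ages just below it. A pile-up at the maximum would be consistent with top-coding. The cutoff of 85 is an arbitrary choice for illustration.

```stata
* store the maximum recorded age in r(max), then count records at that value
summarize age, meanonly
count if age == r(max)

* compare with the frequencies of the ages just below the top code
tabulate age if age >= 85
```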
. summarize incwelfr
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |    103226    40.62242    478.8231          0      25000
* The average welfare income over all respondents with a non-missing value is only $40.62, because most respondents received zero welfare income.
. summarize incwelfr if age>=15 & incwelfr>0
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |      1289    3253.134    2813.505          1      25000
* If we want to know the average welfare income of people who actually receive welfare, we restrict to positive amounts. We get a much smaller sample (1,289 respondents) and a more meaningful average ($3,253).
. summarize incwelfr if age>=15 & incwelfr>0, detail
             Welfare (public assistance) income
-------------------------------------------------------------
      Percentiles      Smallest
 1%           26              1
 5%          214              1
10%          450              1       Obs                1289
25%         1026              1       Sum of Wgt.        1289

50%         2664                      Mean           3253.134
                        Largest       Std. Dev.      2813.505
75%         4668          15600
90%         7000          19999       Variance        7915809
95%         8400          23292       Skewness        1.79416
99%        12648          25000       Kurtosis       9.428488
* If you want to know the median, ask for summarize, detail. The median is the 50% percentile, here $2,664.
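* If only the median is wanted, the full detail output is not needed. A sketch, assuming the same data are in memory: summarize, detail stores the median in r(p50), and tabstat can print a compact row of chosen statistics.

```stata
* run summarize, detail without printing, then show only the stored median
quietly summarize incwelfr if age>=15 & incwelfr>0, detail
display "median welfare income among recipients = " r(p50)

* the same information as a compact one-row table
tabstat incwelfr if age>=15 & incwelfr>0, statistics(n mean median)
```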
* Let’s create a new variable that is dichotomous for whether the respondent received welfare or not.
. gen byte receives_welfare=0
. replace receives_welfare=1 if incwelfr>0 & incwelfr~=.
(1289 real changes made)
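* Set the new variable equal to 1 if the welfare income is greater than zero. The condition incwelfr~=. is needed because Stata treats missing values as larger than any number, so incwelfr>0 alone would also be true for respondents with missing welfare income.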
. label define receives_welfare_lbl 0 "no" 1 "yes"
. label val receives_welfare receives_welfare_lbl
* We created a value label associating 0 with “no” and 1 with “yes”, then attached that label to the variable receives_welfare.
. label var receives_welfare "does respondent receive welfare"
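* A variant that was not used in class, shown as a sketch: the indicator can be built in one line from a logical expression. Note the difference from the version above: here respondents with missing incwelfr get a missing value rather than 0. The name receives_welfare_alt is made up for this illustration.

```stata
* (incwelfr > 0) evaluates to 1 or 0; the if qualifier leaves missings as missing
gen byte receives_welfare_alt = (incwelfr > 0) if !missing(incwelfr)
label val receives_welfare_alt receives_welfare_lbl

* cross-check the two versions, showing missing values as their own category
tabulate receives_welfare receives_welfare_alt, missing
```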
. tabulate receives_welfare [fweight= perwt_rounded] if age>=15
       does |
 respondent |
    receive |
    welfare |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |211,222,605       98.81       98.81
        yes |  2,551,246        1.19      100.00
------------+-----------------------------------
      Total |213,773,851      100.00
* In the US, among people age 15 or older (the people who are in the universe for the question about welfare income), there were about 2.55 million welfare recipients, or 1.19% of that population.
. tabulate receives_welfare [pweight= perwt_rounded] if age>=15
pweight not allowed
* Someone asked me why I used fweight and not pweight. One reason is that pweight is not allowed for tabulate. The second reason is that fweight, or frequency weight, is the kind of weight I want to apply to this table, so that the table reflects the frequencies in the US population. (Frequency weights must be whole numbers, which is why the weight variable here is a rounded one.)
* And here is a table using the new “receives_welfare” variable. Each cell shows three statistics: the weighted frequency, the mean welfare income, and (third) the proportion of individuals in that cell who receive welfare:
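* If you do want to treat the weights as sampling weights, estimation commands such as mean and proportion accept pweight even though tabulate does not. This is a sketch that was not run in class. The estimated share should agree with the weighted table above, while the standard errors are computed for sampling weights.

```stata
* share of people age 15+ who receive welfare, with sampling weights
mean receives_welfare if age>=15 [pweight=perwt_rounded]

* the same estimate reported separately for each category (no / yes)
proportion receives_welfare if age>=15 [pweight=perwt_rounded]
```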
. table educrec sex if age>20 [fweight= perwt_rounded], contents(freq mean incwelfr mean receives_welfare) row col
---------------------------------------------------------------
Educational attainment  |                 Sex
recode                  |        Male       Female        Total
------------------------+--------------------------------------
      None or preschool |     409,822      463,962      873,784
                        |           0  201.8166229  107.1606301
                        |           0       .04848      .025742
                        |
   Grades 1, 2, 3, or 4 |     988,458      959,869      1948327
                        |  21.2051377  186.4335592  102.6070993
                        |     .011155      .039831      .025283
                        |
   Grades 5, 6, 7, or 8 |     4792742      5028804      9821546
                        | 10.72959028  119.6578288  66.50276097
                        |     .005356      .032857      .019437
                        |
                Grade 9 |     1926372      2028431      3954803
                        | 20.88420617  134.0259969  78.91498944
                        |     .007086      .046607      .027357
                        |
               Grade 10 |     2498378      2892776      5391154
                        | 22.49344775   214.192737  125.3551177
                        |     .008675        .0635      .038093
                        |
               Grade 11 |     2607008      3013104      5620112
                        | 23.15243145  216.6690639  126.9022747
                        |     .007129      .073434      .042677
                        |
               Grade 12 |    3.01e+07     3.47e+07     6.48e+07
                        | 11.72673341  67.85343211   41.8001018
                        |     .003832      .021749      .013432
                        |
1 to 3 years of college |    2.35e+07     2.70e+07     5.05e+07
                        | 7.269855825  44.67187372  27.25585651
                        |     .002034       .01304      .007915
                        |
    4+ years of college |    2.40e+07     2.28e+07     4.68e+07
                        | .3599692853   5.49143018  2.858781299
                        |     .000103      .002347      .001196
                        |
                  Total |    9.09e+07     9.89e+07     1.90e+08
                        | 8.382322858  61.73025686  36.18842854
                        |      .00282       .01907       .01129
---------------------------------------------------------------
. clear all
* After you have downloaded the *.gz data file and the *.do file, unzipped the *.gz data file, and put the resulting data file in the same folder as the do-file, you have two steps left to ingest the data:
* First, copy the path of the folder that the files are in and set Stata's working directory to that folder using the cd command, thus (note the double quotes):
. cd "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data"
C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data
* Then run the do-file from this data directory. The easiest way may be to go to the menus, select File>Do, and choose the do-file from your data directory.
. do "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\cps_00012.do"
* Then the do-file should run, read in all your data, and add labels and so on.
. * NOTE: You need to set the Stata working directory to the path
. * where the data file is located.
.
. set more off
.
. clear
. quietly infix ///
> int year 1-4 ///
> long serial 5-9 ///
> float hwtsupp 10-19 ///
> byte month 20-21 ///
> float wtsupp 22-31 ///
> byte age 32-33 ///
> byte sex 34-34 ///
> double inctot 35-42 ///
> using `"cps_00012.dat"'
.
. replace hwtsupp = hwtsupp / 10000
(149642 real changes made)
. replace wtsupp = wtsupp / 10000
(149642 real changes made)
.
. format hwtsupp %10.4f
. format wtsupp %10.4f
. format inctot %8.0f
.
. label var year `"Survey year"'
. label var serial `"Household serial number"'
. label var hwtsupp `"Household weight, Supplement"'
. label var month `"Month"'
. label var wtsupp `"Supplement Weight"'
. label var age `"Age"'
. label var sex `"Sex"'
. label var inctot `"Total personal income"'
.
. label define hwtsupp_lbl 0000000000 `"0000000000"'
. label values hwtsupp hwtsupp_lbl
.
. label define month_lbl 01 `"January"'
. label define month_lbl 02 `"February"', add
. label define month_lbl 03 `"March"', add
. label define month_lbl 04 `"April"', add
. label define month_lbl 05 `"May"', add
. label define month_lbl 06 `"June"', add
. label define month_lbl 07 `"July"', add
. label define month_lbl 08 `"August"', add
. label define month_lbl 09 `"September"', add
. label define month_lbl 10 `"October"', add
. label define month_lbl 11 `"November"', add
. label define month_lbl 12 `"December"', add
. label values month month_lbl
.
. label define age_lbl 00 `"Under 1 year"'
. label define age_lbl 01 `"1"', add
. label define age_lbl 02 `"2"', add
. label define age_lbl 03 `"3"', add
. label define age_lbl 04 `"4"', add
. label define age_lbl 05 `"5"', add
. label define age_lbl 06 `"6"', add
. label define age_lbl 07 `"7"', add
. label define age_lbl 08 `"8"', add
. label define age_lbl 09 `"9"', add
. label define age_lbl 10 `"10"', add
. label define age_lbl 11 `"11"', add
. label define age_lbl 12 `"12"', add
. label define age_lbl 13 `"13"', add
. label define age_lbl 14 `"14"', add
. label define age_lbl 15 `"15"', add
. label define age_lbl 16 `"16"', add
. label define age_lbl 17 `"17"', add
. label define age_lbl 18 `"18"', add
. label define age_lbl 19 `"19"', add
. label define age_lbl 20 `"20"', add
. label define age_lbl 21 `"21"', add
. label define age_lbl 22 `"22"', add
. label define age_lbl 23 `"23"', add
. label define age_lbl 24 `"24"', add
. label define age_lbl 25 `"25"', add
. label define age_lbl 26 `"26"', add
. label define age_lbl 27 `"27"', add
. label define age_lbl 28 `"28"', add
. label define age_lbl 29 `"29"', add
. label define age_lbl 30 `"30"', add
. label define age_lbl 31 `"31"', add
. label define age_lbl 32 `"32"', add
. label define age_lbl 33 `"33"', add
. label define age_lbl 34 `"34"', add
. label define age_lbl 35 `"35"', add
. label define age_lbl 36 `"36"', add
. label define age_lbl 37 `"37"', add
. label define age_lbl 38 `"38"', add
. label define age_lbl 39 `"39"', add
. label define age_lbl 40 `"40"', add
. label define age_lbl 41 `"41"', add
. label define age_lbl 42 `"42"', add
. label define age_lbl 43 `"43"', add
. label define age_lbl 44 `"44"', add
. label define age_lbl 45 `"45"', add
. label define age_lbl 46 `"46"', add
. label define age_lbl 47 `"47"', add
. label define age_lbl 48 `"48"', add
. label define age_lbl 49 `"49"', add
. label define age_lbl 50 `"50"', add
. label define age_lbl 51 `"51"', add
. label define age_lbl 52 `"52"', add
. label define age_lbl 53 `"53"', add
. label define age_lbl 54 `"54"', add
. label define age_lbl 55 `"55"', add
. label define age_lbl 56 `"56"', add
. label define age_lbl 57 `"57"', add
. label define age_lbl 58 `"58"', add
. label define age_lbl 59 `"59"', add
. label define age_lbl 60 `"60"', add
. label define age_lbl 61 `"61"', add
. label define age_lbl 62 `"62"', add
. label define age_lbl 63 `"63"', add
. label define age_lbl 64 `"64"', add
. label define age_lbl 65 `"65"', add
. label define age_lbl 66 `"66"', add
. label define age_lbl 67 `"67"', add
. label define age_lbl 68 `"68"', add
. label define age_lbl 69 `"69"', add
. label define age_lbl 70 `"70"', add
. label define age_lbl 71 `"71"', add
. label define age_lbl 72 `"72"', add
. label define age_lbl 73 `"73"', add
. label define age_lbl 74 `"74"', add
. label define age_lbl 75 `"75"', add
. label define age_lbl 76 `"76"', add
. label define age_lbl 77 `"77"', add
. label define age_lbl 78 `"78"', add
. label define age_lbl 79 `"79"', add
. label define age_lbl 80 `"80"', add
. label define age_lbl 81 `"81"', add
. label define age_lbl 82 `"82"', add
. label define age_lbl 83 `"83"', add
. label define age_lbl 84 `"84"', add
. label define age_lbl 85 `"85"', add
. label define age_lbl 86 `"86"', add
. label define age_lbl 87 `"87"', add
. label define age_lbl 88 `"88"', add
. label define age_lbl 89 `"89"', add
. label define age_lbl 90 `"90 (90+, 1988-2002)"', add
. label define age_lbl 91 `"91"', add
. label define age_lbl 92 `"92"', add
. label define age_lbl 93 `"93"', add
. label define age_lbl 94 `"94"', add
. label define age_lbl 95 `"95"', add
. label define age_lbl 96 `"96"', add
. label define age_lbl 97 `"97"', add
. label define age_lbl 98 `"98"', add
. label define age_lbl 99 `"99+"', add
. label values age age_lbl
.
. label define sex_lbl 1 `"Male"'
. label define sex_lbl 2 `"Female"', add
. label define sex_lbl 9 `"NIU"', add
. label values sex sex_lbl
.
. label define inctot_lbl 00999997 `"00999997"'
. label define inctot_lbl 99999997 `"99999997"', add
. label define inctot_lbl 99999999 `"99999999"', add
. label values inctot inctot_lbl
.
.
.
end of do-file
. save "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\1995 March CPS.dta", replace
file C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\1995 March CPS.dta saved
* Then, by all means, save the result (File>Save, or the save command as above), because you have a brand new Stata file and you don’t want to have to create it again.
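* The whole ingest can also be typed as commands. A sketch using the folder and file names from this session (substitute your own folder): once cd has set the working directory, the do-file and the saved data file can be referred to by name alone, and in a later session you can reload the saved file instead of re-running the do-file.

```stata
* set the working directory to the folder holding the data and do-file
cd "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data"

* read the raw data, apply labels, and save the result as a Stata file
do cps_00012.do
save "1995 March CPS.dta", replace

* in a later session: reload the saved file and list its variables
use "1995 March CPS.dta", clear
describe
```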
* Also note, if you execute a command that is taking too long or is not what you really wanted, you can interrupt it by hitting the break button (looks like a red stop sign with an X through it) in Stata.
. tabulate age
                Age |      Freq.     Percent        Cum.
--------------------+-----------------------------------
       Under 1 year |      2,029        1.36        1.36
                  1 |      2,249        1.50        2.86
                  2 |      2,400        1.60        4.46
                  3 |      2,384        1.59        6.06
                  4 |      2,527        1.69        7.74
                  5 |      2,500        1.67        9.42
                  6 |      2,403        1.61       11.02
                  7 |      2,416        1.61       12.64
                  8 |      2,371        1.58       14.22
                  9 |      2,358        1.58       15.80
                 10 |      2,303        1.54       17.33
                 11 |      2,370        1.58       18.92
                 12 |      2,342        1.57       20.48
                 13 |      2,306        1.54       22.02
                 14 |      2,283        1.53       23.55
                 15 |      2,237        1.49       25.05
                 16 |      2,154        1.44       26.48
                 17 |      2,115        1.41       27.90
                 18 |      1,962        1.31       29.21
                 19 |      1,789        1.20       30.40
                 20 |      1,799        1.20       31.61
                 21 |      1,775        1.19       32.79
                 22 |      1,876        1.25       34.05
--Break--
r(1);
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\f
> all_2013_381_logs\class2.log
log type: text
closed on: 26 Sep 2013, 16:00:54
------------------------------------------------------------------------