--------------------------------------------------------------------------------

name:  <unnamed>

> _381_logs\class2.log

log type:  text

opened on:  26 Sep 2013, 14:06:30

* Going back to last class: I ran a t-test comparing the educational attainment (yrsed, years of education) of young men and young women, aged 25 to 34.

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
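* The t statistic in the table can be reproduced by hand from the group summaries. This is a minimal sketch, not the internals of ttest: it copies the reported Ns, standard deviations, and difference, and applies the equal-variances (pooled) formulas. The scalar names are made up for the example; scalar() is used so Stata cannot confuse them with variables in the dataset.

```stata
* Figures copied from the ttest output above
scalar n_m     = 9027
scalar n_f     = 9511
scalar sd_m    = 2.967666
scalar sd_f    = 2.854472
scalar diff_mf = -.2444469      // mean(Male) - mean(Female)

* Degrees of freedom and pooled standard deviation (equal-variances t test)
scalar dfree     = scalar(n_m) + scalar(n_f) - 2
scalar sd_pooled = sqrt(((scalar(n_m)-1)*scalar(sd_m)^2 + (scalar(n_f)-1)*scalar(sd_f)^2) / scalar(dfree))

* Standard error of the difference and the t statistic
scalar se_diff = scalar(sd_pooled) * sqrt(1/scalar(n_m) + 1/scalar(n_f))
scalar tstat   = scalar(diff_mf) / scalar(se_diff)

* 95% confidence interval: difference plus or minus the critical t times the SE
scalar crit  = invttail(scalar(dfree), .025)
scalar ci_lo = scalar(diff_mf) - scalar(crit)*scalar(se_diff)
scalar ci_hi = scalar(diff_mf) + scalar(crit)*scalar(se_diff)

display "df = " scalar(dfree) "   SE = " scalar(se_diff) "   t = " scalar(tstat)
display "95% CI: " scalar(ci_lo) " to " scalar(ci_hi)
```

* The results should match the table up to rounding of the copied inputs.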

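* I said that the t-statistic of -5.7164 was enormously far from zero, but how far is it? If men and women really had equal average education, how likely would we be to get a value this far from zero just by chance? Answer: ttail(df, t) gives the probability that a t statistic with df degrees of freedom is greater than t. Because the t distribution is symmetric, the chance of falling below -5.716 equals the chance of rising above +5.716: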

. display ttail(18536, 5.716)

5.537e-09

* That is only one tail. For a two-tailed test, which is more appropriate here, we double it and get a p-value of about 1.1 × 10^-8, or roughly 1 in 90 million, which is a really small probability. That means we can reject, with great confidence, the hypothesis that these data came from a population (the whole US) in which men and women aged 25 to 34 had equal educational attainment.

. display 2*ttail(18536, 5.716)

1.107e-08
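* The same calculation works for any t statistic with Stata's built-in ttail() and invttail() functions. In this small sketch, abs() makes it work whether t is negative or positive, and the last line gives the two-sided 5% critical value for comparison (the scalar names are just for illustration).

```stata
* t statistic and degrees of freedom from the ttest output above
scalar tval  = -5.7164
scalar dfree = 18536

* One tail: Pr(T > |t|)
display ttail(scalar(dfree), abs(scalar(tval)))

* Two tails: Pr(|T| > |t|), by the symmetry of the t distribution
display 2*ttail(scalar(dfree), abs(scalar(tval)))

* Critical value for a two-sided test at the 5% level (close to 1.96 with this many df)
display invttail(scalar(dfree), .025)
```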

. summarize age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |    133710    35.17964    22.21722          0         90

* Do you notice that the top age is 90? Does that make sense in a population of 133,710 people? No: surely some of them are older than 90. The reason is that age is topcoded: to protect the identities of the oldest respondents, everyone aged 90 or older is recorded as 90. See the documentation at ipums.org.
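* One way to keep topcoding in view is to flag the topcoded cases. Below is a minimal sketch on a made-up four-person dataset (the data and the name age_topcoded are hypothetical, and clear drops whatever data are in memory). The flag marks people whose true age is 90 or more, so their recorded age understates it.

```stata
* toy data: clear drops whatever is in memory
clear
input age
34
90
71
90
end

* 1 if age sits at the topcode (true age 90 or older), 0 otherwise
generate byte age_topcoded = (age == 90) if !missing(age)
count if age_topcoded == 1
```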

. summarize incwelfr

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |    103226    40.62242    478.8231          0      25000

* The average welfare income is only $40.62, because it is averaged over all 103,226 respondents with a nonmissing value of incwelfr, and most of them received no welfare income at all.

. summarize incwelfr if age>=15 & incwelfr>0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |      1289    3253.134    2813.505          1      25000

* If we want the average welfare income of people who actually receive welfare, we restrict to incwelfr>0: the sample shrinks to 1,289 recipients, and the average rises to a more meaningful $3,253.
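* The two averages are linked: non-recipients contribute zeros, so the overall mean equals the share of respondents who receive welfare times the mean among recipients. The arithmetic below uses the numbers from the two summarize commands; it assumes, as the matching result suggests, that every positive incwelfr value belongs to someone aged 15 or older.

```stata
* Numbers copied from the two summarize outputs above
scalar n_all      = 103226      // respondents with a nonmissing incwelfr
scalar n_recip    = 1289        // respondents with incwelfr > 0
scalar mean_recip = 3253.134    // mean welfare income among recipients

* overall mean = (share receiving welfare) x (mean among recipients)
display scalar(n_recip)/scalar(n_all) * scalar(mean_recip)
```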

. summarize incwelfr if age>=15 & incwelfr>0, detail

                Welfare (public assistance) income
-------------------------------------------------------------
      Percentiles      Smallest
 1%           26              1
 5%          214              1
10%          450              1       Obs                1289
25%         1026              1       Sum of Wgt.        1289

50%         2664                      Mean           3253.134
                        Largest       Std. Dev.      2813.505
75%         4668          15600
90%         7000          19999       Variance        7915809
95%         8400          23292       Skewness        1.79416
99%        12648          25000       Kurtosis       9.428488

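* If you want the median, ask for summarize, detail: the 50th percentile here is $2,664, below the mean of $3,253, because the distribution is skewed to the right (skewness 1.79).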
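* summarize, detail also stores its results, so the median can be pulled out as r(p50) and the mean as r(mean). A toy sketch with five made-up incomes (hypothetical variable name income) shows why the two can differ so much: one large value pulls the mean up but leaves the median alone, the same right skew seen in the welfare figures above.

```stata
* toy data: clear drops whatever is in memory
clear
input income
100
200
300
400
10000
end

* detail stores percentiles such as r(p50) alongside r(mean)
quietly summarize income, detail
display "mean = " r(mean) "   median = " r(p50)
```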

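* Let’s create a new dichotomous variable for whether the respondent received welfare or not. (The variable has to exist before replace can change it, so it is first created with generate, for example as 0 for everyone with a nonmissing incwelfr.)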

. replace receives_welfare=1 if incwelfr>0 & incwelfr~=.

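* This sets the new variable to 1 if welfare income is greater than zero. The incwelfr~=. condition matters: Stata treats missing values as larger than any number, so incwelfr>0 on its own would also be true for missing values.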

. label define receives_welfare_lbl 0 "no" 1 "yes"

* This created a label associating 0 with “no” and 1 with “yes”; we then attached that label to the variable receives_welfare with label values receives_welfare receives_welfare_lbl.
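* Putting the steps together, a complete version of the dummy-variable recipe might look like this. It runs on a made-up five-person dataset (hypothetical values, including one missing): generate creates the variable, replace recodes it, and label values attaches the label. !missing(incwelfr) does the same job as incwelfr~=. in the command above.

```stata
* toy data: clear drops whatever is in memory (and any value labels)
clear
input incwelfr
0
0
1500
.
3200
end

* start everyone with a nonmissing welfare value at 0 ("no")
generate receives_welfare = 0 if !missing(incwelfr)
* recode to 1 ("yes") if any welfare income was received; missing stays missing
replace receives_welfare = 1 if incwelfr > 0 & !missing(incwelfr)

* define the value label, then attach it to the variable
label define receives_welfare_lbl 0 "no" 1 "yes"
label values receives_welfare receives_welfare_lbl
tabulate receives_welfare, missing
```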

. tabulate receives_welfare [fweight= perwt_rounded] if age>=15

       does |
 respondent |
    welfare |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |211,222,605       98.81       98.81
        yes |  2,551,246        1.19      100.00
------------+-----------------------------------
      Total |213,773,851      100.00

* In the US, among people aged 15 or older (the people who are in the universe for the question about welfare income), there were about 2.55 million welfare recipients, or 1.19% of that population.

. tabulate receives_welfare [pweight= perwt_rounded] if age>=15

pweight not allowed

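* Someone asked me why I used fweight and not pweight. One reason is that tabulate does not allow pweights. The second reason is that a frequency weight (fweight) is the kind of weight I want for this table: each person weight says how many people in the population a respondent stands for, so the table reflects the frequencies in the US population. Stata requires frequency weights to be whole numbers, which is why the rounded weight perwt_rounded is used.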
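* A toy illustration of the difference, on four made-up rows with hypothetical weights. With fweights, Stata treats each row as perwt_rounded identical people. For pweights, which tabulate does not accept, the mean of a 0/1 variable gives the weighted proportion; the point estimate is the same, but the standard error is computed as for sampling weights. (The proportion command also accepts pweights.)

```stata
* toy data: clear drops whatever is in memory
clear
input receives_welfare perwt_rounded
0 120
0  80
1  50
0 150
end

* fweight: each row counts as perwt_rounded people (weights must be integers)
tabulate receives_welfare [fweight = perwt_rounded]
summarize receives_welfare [fweight = perwt_rounded]

* pweight: same point estimate (50/400 = .125), sampling-weight standard error
mean receives_welfare [pweight = perwt_rounded]
```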

* And here is a table using the new receives_welfare variable. Each cell shows three statistics: the weighted number of people, their mean welfare income (incwelfr), and, third, the proportion of individuals in that cell who receive welfare:

. table  educrec sex if age>20  [fweight= perwt_rounded], contents(freq mean  incwelfr mean  receives_welfare) row col

---------------------------------------------------------------
Educational attainment  |                  Sex
recode                  |        Male       Female        Total
------------------------+--------------------------------------
      None or preschool |     409,822      463,962      873,784
                        |           0  201.8166229  107.1606301
                        |           0       .04848      .025742
                        |
   Grades 1, 2, 3, or 4 |     988,458      959,869      1948327
                        |  21.2051377  186.4335592  102.6070993
                        |     .011155      .039831      .025283
                        |
   Grades 5, 6, 7, or 8 |     4792742      5028804      9821546
                        | 10.72959028  119.6578288  66.50276097
                        |     .005356      .032857      .019437
                        |
                Grade 9 |     1926372      2028431      3954803
                        | 20.88420617  134.0259969  78.91498944
                        |     .007086      .046607      .027357
                        |
               Grade 10 |     2498378      2892776      5391154
                        | 22.49344775   214.192737  125.3551177
                        |     .008675        .0635      .038093
                        |
               Grade 11 |     2607008      3013104      5620112
                        | 23.15243145  216.6690639  126.9022747
                        |     .007129      .073434      .042677
                        |
               Grade 12 |    3.01e+07     3.47e+07     6.48e+07
                        | 11.72673341  67.85343211   41.8001018
                        |     .003832      .021749      .013432
                        |
1 to 3 years of college |    2.35e+07     2.70e+07     5.05e+07
                        | 7.269855825  44.67187372  27.25585651
                        |     .002034       .01304      .007915
                        |
    4+ years of college |    2.40e+07     2.28e+07     4.68e+07
                        | .3599692853   5.49143018  2.858781299
                        |     .000103      .002347      .001196
                        |
                  Total |    9.09e+07     9.89e+07     1.90e+08
                        | 8.382322858  61.73025686  36.18842854
                        |      .00282       .01907       .01129
---------------------------------------------------------------
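* The Total column is a weighted average of the Male and Female columns, weighted by the frequencies in the first line of each cell. A quick check for the None or preschool row (the scalar names are just for illustration):

```stata
* Weighted counts copied from the "None or preschool" row above
scalar n_male   = 409822
scalar n_female = 463962
scalar n_total  = scalar(n_male) + scalar(n_female)

* mean welfare income in the Total column (male mean is 0, female mean is 201.8166229)
display (scalar(n_male)*0 + scalar(n_female)*201.8166229) / scalar(n_total)

* proportion receiving welfare in the Total column (male 0, female .04848)
display (scalar(n_male)*0 + scalar(n_female)*.04848) / scalar(n_total)
```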

. clear all

* After you have downloaded the .gz data file and the .do file, unzipped the .gz file, and put the resulting data file in the same folder as the .do file, you have two steps left to ingest the data:

* First, copy the path of the folder the files are in and make it Stata's working directory with the cd command (note the double quotes, which are needed because the path contains spaces):

. cd "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data"

C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data

* Then run the do-file from this data directory. The easiest way may be to use the menus: select File>Do and choose the do-file in your data directory.

. do "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\cps_00012.do"

* The do-file should then run, read in all your data, and add variable and value labels.

. * NOTE: You need to set the Stata working directory to the path

. * where the data file is located.

.

. set more off

.

. clear

. quietly infix             ///
>   int     year     1-4    ///
>   long    serial   5-9    ///
>   float   hwtsupp  10-19  ///
>   byte    month    20-21  ///
>   float   wtsupp   22-31  ///
>   byte    age      32-33  ///
>   byte    sex      34-34  ///
>   double  inctot   35-42  ///
>   using `"cps_00012.dat"'

.

. replace hwtsupp = hwtsupp / 10000

. replace wtsupp  = wtsupp  / 10000

.

. format hwtsupp %10.4f

. format wtsupp  %10.4f

. format inctot  %8.0f

.

. label var year    `"Survey year"'

. label var serial  `"Household serial number"'

. label var hwtsupp `"Household weight, Supplement"'

. label var month   `"Month"'

. label var wtsupp  `"Supplement Weight"'

. label var age     `"Age"'

. label var sex     `"Sex"'

. label var inctot  `"Total personal income"'

.

. label define hwtsupp_lbl 0000000000 `"0000000000"'

. label values hwtsupp hwtsupp_lbl

.

. label define month_lbl 01 `"January"'

. label define month_lbl 02 `"February"', add

. label define month_lbl 03 `"March"', add

. label define month_lbl 04 `"April"', add

. label define month_lbl 05 `"May"', add

. label define month_lbl 06 `"June"', add

. label define month_lbl 07 `"July"', add

. label define month_lbl 08 `"August"', add

. label define month_lbl 09 `"September"', add

. label define month_lbl 10 `"October"', add

. label define month_lbl 11 `"November"', add

. label define month_lbl 12 `"December"', add

. label values month month_lbl

.

. label define age_lbl 00 `"Under 1 year"'

. label define age_lbl 01 `"1"', add

. label define age_lbl 02 `"2"', add

. label define age_lbl 03 `"3"', add

. label define age_lbl 04 `"4"', add

. label define age_lbl 05 `"5"', add

. label define age_lbl 06 `"6"', add

. label define age_lbl 07 `"7"', add

. label define age_lbl 08 `"8"', add

. label define age_lbl 09 `"9"', add

. label define age_lbl 10 `"10"', add

. label define age_lbl 11 `"11"', add

. label define age_lbl 12 `"12"', add

. label define age_lbl 13 `"13"', add

. label define age_lbl 14 `"14"', add

. label define age_lbl 15 `"15"', add

. label define age_lbl 16 `"16"', add

. label define age_lbl 17 `"17"', add

. label define age_lbl 18 `"18"', add

. label define age_lbl 19 `"19"', add

. label define age_lbl 20 `"20"', add

. label define age_lbl 21 `"21"', add

. label define age_lbl 22 `"22"', add

. label define age_lbl 23 `"23"', add

. label define age_lbl 24 `"24"', add

. label define age_lbl 25 `"25"', add

. label define age_lbl 26 `"26"', add

. label define age_lbl 27 `"27"', add

. label define age_lbl 28 `"28"', add

. label define age_lbl 29 `"29"', add

. label define age_lbl 30 `"30"', add

. label define age_lbl 31 `"31"', add

. label define age_lbl 32 `"32"', add

. label define age_lbl 33 `"33"', add

. label define age_lbl 34 `"34"', add

. label define age_lbl 35 `"35"', add

. label define age_lbl 36 `"36"', add

. label define age_lbl 37 `"37"', add

. label define age_lbl 38 `"38"', add

. label define age_lbl 39 `"39"', add

. label define age_lbl 40 `"40"', add

. label define age_lbl 41 `"41"', add

. label define age_lbl 42 `"42"', add

. label define age_lbl 43 `"43"', add

. label define age_lbl 44 `"44"', add

. label define age_lbl 45 `"45"', add

. label define age_lbl 46 `"46"', add

. label define age_lbl 47 `"47"', add

. label define age_lbl 48 `"48"', add

. label define age_lbl 49 `"49"', add

. label define age_lbl 50 `"50"', add

. label define age_lbl 51 `"51"', add

. label define age_lbl 52 `"52"', add

. label define age_lbl 53 `"53"', add

. label define age_lbl 54 `"54"', add

. label define age_lbl 55 `"55"', add

. label define age_lbl 56 `"56"', add

. label define age_lbl 57 `"57"', add

. label define age_lbl 58 `"58"', add

. label define age_lbl 59 `"59"', add

. label define age_lbl 60 `"60"', add

. label define age_lbl 61 `"61"', add

. label define age_lbl 62 `"62"', add

. label define age_lbl 63 `"63"', add

. label define age_lbl 64 `"64"', add

. label define age_lbl 65 `"65"', add

. label define age_lbl 66 `"66"', add

. label define age_lbl 67 `"67"', add

. label define age_lbl 68 `"68"', add

. label define age_lbl 69 `"69"', add

. label define age_lbl 70 `"70"', add

. label define age_lbl 71 `"71"', add

. label define age_lbl 72 `"72"', add

. label define age_lbl 73 `"73"', add

. label define age_lbl 74 `"74"', add

. label define age_lbl 75 `"75"', add

. label define age_lbl 76 `"76"', add

. label define age_lbl 77 `"77"', add

. label define age_lbl 78 `"78"', add

. label define age_lbl 79 `"79"', add

. label define age_lbl 80 `"80"', add

. label define age_lbl 81 `"81"', add

. label define age_lbl 82 `"82"', add

. label define age_lbl 83 `"83"', add

. label define age_lbl 84 `"84"', add

. label define age_lbl 85 `"85"', add

. label define age_lbl 86 `"86"', add

. label define age_lbl 87 `"87"', add

. label define age_lbl 88 `"88"', add

. label define age_lbl 89 `"89"', add

. label define age_lbl 90 `"90 (90+, 1988-2002)"', add

. label define age_lbl 91 `"91"', add

. label define age_lbl 92 `"92"', add

. label define age_lbl 93 `"93"', add

. label define age_lbl 94 `"94"', add

. label define age_lbl 95 `"95"', add

. label define age_lbl 96 `"96"', add

. label define age_lbl 97 `"97"', add

. label define age_lbl 98 `"98"', add

. label define age_lbl 99 `"99+"', add

. label values age age_lbl

.

. label define sex_lbl 1 `"Male"'

. label define sex_lbl 2 `"Female"', add

. label define sex_lbl 9 `"NIU"', add

. label values sex sex_lbl

.

. label define inctot_lbl 00999997 `"00999997"'

. label define inctot_lbl 99999997 `"99999997"', add

. label define inctot_lbl 99999999 `"99999999"', add

. label values inctot inctot_lbl

.

.

.

end of do-file
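* What the do-file does can be reproduced in miniature. This sketch writes a two-line fixed-width file in a hypothetical layout (year in columns 1-4, serial in 5-9, a weight with four implied decimals in 10-19, sex in column 20; not the real cps_00012.dat layout), reads it with infix, rescales the weight, and attaches value labels. Because infix ..., clear replaces the data in memory, run it in a fresh session. The forvalues loop at the end is one possible shorter way to build a label like age_lbl than writing a separate label define line for every age.

```stata
* write a tiny fixed-width file to a temporary location
tempfile raw
file open fh using "`raw'", write text replace
file write fh "1995" "00001" "0012345678" "1" _n
file write fh "1995" "00002" "0001500000" "2" _n
file close fh

* read each field from its column positions (clear replaces the data in memory)
infix int year 1-4 long serial 5-9 float wtsupp 10-19 byte sex 20-20 using "`raw'", clear

* the weight is stored with four implied decimal places
replace wtsupp = wtsupp / 10000
format wtsupp %10.4f

* value labels: define the codes once, then attach them to the variable
label define sex_lbl 1 "Male" 2 "Female", replace
label values sex sex_lbl

* a loop builds the same kind of age label without one line per age
label define age_lbl 0 "Under 1 year", replace
forvalues a = 1/89 {
    label define age_lbl `a' "`a'", add
}
label define age_lbl 90 "90 (90+, 1988-2002)", add

list, noobs
```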

. save "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\1995 March CPS.dta", replace

file C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\1995 March CPS.dta saved

* Then, by all means, save the data (File>Save, or the save command above), because you now have a brand-new Stata dataset and you don’t want to have to create it again.
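* Why cd comes first: the do-file opens its data with a bare file name (using "cps_00012.dat"), and Stata looks for bare file names in the working directory; the same is true for save and use. The sketch below shows this with a made-up three-row dataset. Stata's temporary directory, c(tmpdir), stands in for your data folder only so that it runs on any machine, and "demo data.dta" is a hypothetical file name.

```stata
* NOTE: this changes Stata's working directory to the temporary folder
cd "`c(tmpdir)'"
display "working directory: `c(pwd)'"

* toy data: clear drops whatever is in memory
clear
set obs 3
generate id = _n

* a bare file name is written to, and later read from, the working directory
save "demo data.dta", replace
clear
use "demo data.dta", clear
```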

* Also note: if a command is taking too long or is not what you really wanted, you can interrupt it with the Break button in Stata (it looks like a red stop sign with an X through it). The tabulate below was interrupted this way, which is why its output ends with --Break-- and r(1).

. tabulate age

                Age |      Freq.     Percent        Cum.
--------------------+-----------------------------------
       Under 1 year |      2,029        1.36        1.36
                  1 |      2,249        1.50        2.86
                  2 |      2,400        1.60        4.46
                  3 |      2,384        1.59        6.06
                  4 |      2,527        1.69        7.74
                  5 |      2,500        1.67        9.42
                  6 |      2,403        1.61       11.02
                  7 |      2,416        1.61       12.64
                  8 |      2,371        1.58       14.22
                  9 |      2,358        1.58       15.80
                 10 |      2,303        1.54       17.33
                 11 |      2,370        1.58       18.92
                 12 |      2,342        1.57       20.48
                 13 |      2,306        1.54       22.02
                 14 |      2,283        1.53       23.55
                 15 |      2,237        1.49       25.05
                 16 |      2,154        1.44       26.48
                 17 |      2,115        1.41       27.90
                 18 |      1,962        1.31       29.21
                 19 |      1,789        1.20       30.40
                 20 |      1,799        1.20       31.61
                 21 |      1,775        1.19       32.79
                 22 |      1,876        1.25       34.05
--Break--
r(1);

. log close

name:  <unnamed>