--------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013

> _381_logs\class2.log

  log type:  text

 opened on:  26 Sep 2013, 14:06:30

 

* Going back to last class, I ran a t-test comparing the educational attainment (years of education) of young women and young men, ages 25 to 34.

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* I said that the t-statistic of -5.7164 was enormously far from zero, but how far is it? How likely would we be to get a value this far away from zero just by chance? Answer:

 

. display ttail(18536, 5.716)

5.537e-09

 

* For a 2-tailed test, which is more appropriate here, we double that probability and end up with a P value of about 1 in 100,000,000, which is a really small probability. That means it is extremely unlikely that this sample came from a population (i.e. the whole US) in which young men and women had equal average educational attainment.

 

. display 2*ttail(18536, 5.716)

1.107e-08
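
* A minimal sketch of the same calculation using the results that ttest leaves behind in r(), so nothing has to be retyped from the output. It assumes the same data are still in memory, and it reruns the test quietly first.

```stata
* Rerun the test without printing the table; results are stored in r()
quietly ttest yrsed if age>=25 & age<=34, by(sex)

* The t-statistic is the difference in means divided by its standard error
display "t by hand        = " (r(mu_1) - r(mu_2)) / r(se)
display "t stored         = " r(t)

* Tail probabilities from the t distribution with r(df_t) degrees of freedom
display "one-tailed p     = " ttail(r(df_t), abs(r(t)))
display "two-tailed p     = " 2*ttail(r(df_t), abs(r(t)))

* ttest also stores its own two-tailed p-value
display "stored 2-tail p  = " r(p)
```

* Using abs(r(t)) keeps the calculation correct whichever group happens to have the larger mean.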

 

. summarize age

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

         age |    133710    35.17964    22.21722          0         90

 

* Do you notice that the top age is 90? Does that make sense in a sample of 133,710 people? No. The reason is that the variable age is topcoded at 90 to protect the identities of the oldest respondents. See the ipums.org documentation.
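
* One way to see topcoding in the data, sketched here on the assumption that 90 is the top code in this extract: count the cases sitting at the maximum and compare them with the ages just below it. A pile-up at 90 relative to 88 and 89 is what a top code looks like.

```stata
* How many respondents are recorded at the maximum age?
count if age==90

* Frequencies at the top of the age range
tabulate age if age>=85
```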

 

. summarize incwelfr

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |    103226    40.62242    478.8231          0      25000

 

* The average welfare income over all 103,226 respondents with a non-missing value is $40.62. It is that low because most respondents received zero welfare income.

 

. summarize incwelfr if age>=15 & incwelfr>0

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |      1289    3253.134    2813.505          1      25000

 

* If we want to know what the average welfare income is for people who actually receive welfare, we get a much smaller sample (1,289 recipients) and a more meaningful average of $3,253.

 

. summarize incwelfr if age>=15 & incwelfr>0, detail

 

             Welfare (public assistance) income

-------------------------------------------------------------

      Percentiles      Smallest

 1%           26              1

 5%          214              1

10%          450              1       Obs                1289

25%         1026              1       Sum of Wgt.        1289

 

50%         2664                      Mean           3253.134

                        Largest       Std. Dev.      2813.505

75%         4668          15600

90%         7000          19999       Variance        7915809

95%         8400          23292       Skewness        1.79416

99%        12648          25000       Kurtosis       9.428488

 

* If you want to know the median, ask for summarize, detail. The median is the 50th percentile, here $2,664.
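
* If you need the median as a number you can reuse, rather than reading it off the table, summarize with the detail option stores the percentiles in r(). A short sketch, assuming the same data are in memory:

```stata
* Run summarize, detail without printing the table
quietly summarize incwelfr if age>=15 & incwelfr>0, detail

* Median, mean, and interquartile range from the stored results
display "median = " r(p50)
display "mean   = " r(mean)
display "IQR    = " (r(p75) - r(p25))
```

* With a right-skewed variable like this one, the mean sits above the median, which is one reason to look at both.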

 

* Let’s create a new variable that is dichotomous for whether the respondent received welfare or not.

 

. gen byte receives_welfare=0

 

. replace receives_welfare=1 if incwelfr>0 & incwelfr~=.

(1289 real changes made)

 

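* Set the new variable equal to 1 if the welfare income is greater than zero and not missing. The second condition matters because Stata treats missing values as larger than any number, so incwelfr>0 alone would also be true for missing values.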
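
* To see what the missing-value condition does in practice, compare the two counts below. The sketch also shows a one-line way to build the indicator; receives_welfare2 is a hypothetical name used only for this comparison. Note one difference from the two-step version above: the one-line version leaves the indicator missing, rather than 0, when incwelfr is missing.

```stata
* These two counts differ whenever incwelfr has missing values
count if incwelfr>0
count if incwelfr>0 & !missing(incwelfr)

* One-line alternative: a true/false expression evaluates to 1/0
gen byte receives_welfare2 = (incwelfr>0) if !missing(incwelfr)

* Cross-check the two versions, showing missing values as their own row/column
tabulate receives_welfare receives_welfare2, missing
```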

 

 

. label define receives_welfare_lbl 0 "no" 1 "yes"

 

. label val receives_welfare receives_welfare_lbl

 

* We created a value label associating 0 with “no” and 1 with “yes”, then attached that label to the variable receives_welfare.

 

. label var receives_welfare "does respondent receive welfare"

 

. tabulate receives_welfare [fweight= perwt_rounded] if age>=15

 

       does |

 respondent |

    receive |

    welfare |      Freq.     Percent        Cum.

------------+-----------------------------------

         no |211,222,605       98.81       98.81

        yes |  2,551,246        1.19      100.00

------------+-----------------------------------

      Total |213,773,851      100.00

 

* In the US, among people age 15 or older (the people who are in the universe for the question about welfare income), there were about 2.55 million welfare recipients, or 1.19% of the population.

 

. tabulate receives_welfare [pweight= perwt_rounded] if age>=15

pweight not allowed

 

* Someone asked me why I used fweight and not pweight. One reason is that pweight is not allowed for tabulate. The second reason is that fweight, or frequency weight, is the kind of weight I want to apply to this table, so that the table reflects the frequencies in the US population.
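
* If you do want to treat the weights as sampling (probability) weights, one option is a command that accepts pweight. The sketch below uses proportion, which does; this is an alternative I am suggesting, not what we ran in class. The estimated shares should match the percentages in the fweight table, but the standard errors are computed for a weighted sample rather than as if we had observed 213 million people.

```stata
* Weighted share of welfare recipients among people age 15 or older,
* treating perwt_rounded as a sampling weight
proportion receives_welfare [pweight=perwt_rounded] if age>=15
```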

 

* And here is a table, restricted to people older than 20, using the new “receives_welfare” variable. Each cell shows 3 statistics: the weighted frequency, the mean welfare income, and (third) the proportion of individuals in that cell who receive welfare:

 

. table  educrec sex if age>20  [fweight= perwt_rounded], contents(freq mean  incwelfr mean  receives_welfare) row col

 

---------------------------------------------------------------

Educational attainment  |                  Sex                

recode                  |        Male       Female        Total

------------------------+--------------------------------------

      None or preschool |     409,822      463,962      873,784

                        |           0  201.8166229  107.1606301

                        |           0       .04848      .025742

                        |

   Grades 1, 2, 3, or 4 |     988,458      959,869      1948327

                        |  21.2051377  186.4335592  102.6070993

                        |     .011155      .039831      .025283

                        |

   Grades 5, 6, 7, or 8 |     4792742      5028804      9821546

                        | 10.72959028  119.6578288  66.50276097

                        |     .005356      .032857      .019437

                        |

                Grade 9 |     1926372      2028431      3954803

                        | 20.88420617  134.0259969  78.91498944

                        |     .007086      .046607      .027357

                        |

               Grade 10 |     2498378      2892776      5391154

                        | 22.49344775   214.192737  125.3551177

                        |     .008675        .0635      .038093

                        |

               Grade 11 |     2607008      3013104      5620112

                        | 23.15243145  216.6690639  126.9022747

                        |     .007129      .073434      .042677

                        |

               Grade 12 |    3.01e+07     3.47e+07     6.48e+07

                        | 11.72673341  67.85343211   41.8001018

                        |     .003832      .021749      .013432

                        |

1 to 3 years of college |    2.35e+07     2.70e+07     5.05e+07

                        | 7.269855825  44.67187372  27.25585651

                        |     .002034       .01304      .007915

                        |

    4+ years of college |    2.40e+07     2.28e+07     4.68e+07

                        | .3599692853   5.49143018  2.858781299

                        |     .000103      .002347      .001196

                        |

                  Total |    9.09e+07     9.89e+07     1.90e+08

                        | 8.382322858  61.73025686  36.18842854

                        |      .00282       .01907       .01129

---------------------------------------------------------------
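
* Why does the mean of receives_welfare give a proportion? Because the variable is coded 0/1, its mean is the share of ones. A short sketch, assuming the same data and weights are in memory; the second command is an alternative way to get just the cell proportions, without the frequencies and mean incomes.

```stata
* The weighted mean of a 0/1 variable is the weighted share of ones
summarize receives_welfare [fweight=perwt_rounded] if age>20

* Cell-by-cell means of receives_welfare only
* (suppress the standard deviations and frequencies)
tabulate educrec sex [fweight=perwt_rounded] if age>20, ///
    summarize(receives_welfare) nostandard nofreq
```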

 

. clear all

 

* After you have downloaded the *.gz data file and the *.do file, and after you have unzipped the *.gz data file and put the resulting data file in the same folder, then you have two steps left to ingest the data:

 

 

* First, copy the path of the folder that the files are in and set the working directory of Stata to that folder, using the cd command, thus (note the double quotes, which are needed because the path contains spaces):

 

. cd "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data"

C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data
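
* Before running the do-file, it is worth confirming that Stata is really in the folder you think it is and that the files are there. A minimal check:

```stata
* Print the current working directory
pwd

* List the raw data file(s) and do-file(s) in that directory
dir *.dat
dir *.do
```

* If dir does not list the .dat file, the do-file will not find it, because it refers to the data file by name only.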

 

* Then invoke the do-file from this data directory. The easiest way may be to go to the menus, select File>Do, and choose the do-file from your data directory.

 

. do "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\cps_00012.do"

 

* Then the do-file should run, read all your data in, and add labels and so on.

 

. * NOTE: You need to set the Stata working directory to the path

. * where the data file is located.

.

. set more off

 

.

. clear

 

. quietly infix             ///

>   int     year     1-4    ///

>   long    serial   5-9    ///

>   float   hwtsupp  10-19  ///

>   byte    month    20-21  ///

>   float   wtsupp   22-31  ///

>   byte    age      32-33  ///

>   byte    sex      34-34  ///

>   double  inctot   35-42  ///

>   using `"cps_00012.dat"'

 

.

. replace hwtsupp = hwtsupp / 10000

(149642 real changes made)

 

. replace wtsupp  = wtsupp  / 10000

(149642 real changes made)

 

.

. format hwtsupp %10.4f

 

. format wtsupp  %10.4f

 

. format inctot  %8.0f

 

.

. label var year    `"Survey year"'

 

. label var serial  `"Household serial number"'

 

. label var hwtsupp `"Household weight, Supplement"'

 

. label var month   `"Month"'

 

. label var wtsupp  `"Supplement Weight"'

 

. label var age     `"Age"'

 

. label var sex     `"Sex"'

 

. label var inctot  `"Total personal income"'

 

.

. label define hwtsupp_lbl 0000000000 `"0000000000"'

 

. label values hwtsupp hwtsupp_lbl

 

.

. label define month_lbl 01 `"January"'

 

. label define month_lbl 02 `"February"', add

 

. label define month_lbl 03 `"March"', add

 

. label define month_lbl 04 `"April"', add

 

. label define month_lbl 05 `"May"', add

 

. label define month_lbl 06 `"June"', add

 

. label define month_lbl 07 `"July"', add

 

. label define month_lbl 08 `"August"', add

 

. label define month_lbl 09 `"September"', add

 

. label define month_lbl 10 `"October"', add

 

. label define month_lbl 11 `"November"', add

 

. label define month_lbl 12 `"December"', add

 

. label values month month_lbl

 

.

. label define age_lbl 00 `"Under 1 year"'

 

. label define age_lbl 01 `"1"', add

 

. label define age_lbl 02 `"2"', add

 

. label define age_lbl 03 `"3"', add

 

. label define age_lbl 04 `"4"', add

 

. label define age_lbl 05 `"5"', add

 

. label define age_lbl 06 `"6"', add

 

. label define age_lbl 07 `"7"', add

 

. label define age_lbl 08 `"8"', add

 

. label define age_lbl 09 `"9"', add

 

. label define age_lbl 10 `"10"', add

 

. label define age_lbl 11 `"11"', add

 

. label define age_lbl 12 `"12"', add

 

. label define age_lbl 13 `"13"', add

 

. label define age_lbl 14 `"14"', add

 

. label define age_lbl 15 `"15"', add

 

. label define age_lbl 16 `"16"', add

 

. label define age_lbl 17 `"17"', add

 

. label define age_lbl 18 `"18"', add

 

. label define age_lbl 19 `"19"', add

 

. label define age_lbl 20 `"20"', add

 

. label define age_lbl 21 `"21"', add

 

. label define age_lbl 22 `"22"', add

 

. label define age_lbl 23 `"23"', add

 

. label define age_lbl 24 `"24"', add

 

. label define age_lbl 25 `"25"', add

 

. label define age_lbl 26 `"26"', add

 

. label define age_lbl 27 `"27"', add

 

. label define age_lbl 28 `"28"', add

 

. label define age_lbl 29 `"29"', add

 

. label define age_lbl 30 `"30"', add

 

. label define age_lbl 31 `"31"', add

 

. label define age_lbl 32 `"32"', add

 

. label define age_lbl 33 `"33"', add

 

. label define age_lbl 34 `"34"', add

 

. label define age_lbl 35 `"35"', add

 

. label define age_lbl 36 `"36"', add

 

. label define age_lbl 37 `"37"', add

 

. label define age_lbl 38 `"38"', add

 

. label define age_lbl 39 `"39"', add

 

. label define age_lbl 40 `"40"', add

 

. label define age_lbl 41 `"41"', add

 

. label define age_lbl 42 `"42"', add

 

. label define age_lbl 43 `"43"', add

 

. label define age_lbl 44 `"44"', add

 

. label define age_lbl 45 `"45"', add

 

. label define age_lbl 46 `"46"', add

 

. label define age_lbl 47 `"47"', add

 

. label define age_lbl 48 `"48"', add

 

. label define age_lbl 49 `"49"', add

 

. label define age_lbl 50 `"50"', add

 

. label define age_lbl 51 `"51"', add

 

. label define age_lbl 52 `"52"', add

 

. label define age_lbl 53 `"53"', add

 

. label define age_lbl 54 `"54"', add

 

. label define age_lbl 55 `"55"', add

 

. label define age_lbl 56 `"56"', add

 

. label define age_lbl 57 `"57"', add

 

. label define age_lbl 58 `"58"', add

 

. label define age_lbl 59 `"59"', add

 

. label define age_lbl 60 `"60"', add

 

. label define age_lbl 61 `"61"', add

 

. label define age_lbl 62 `"62"', add

 

. label define age_lbl 63 `"63"', add

 

. label define age_lbl 64 `"64"', add

 

. label define age_lbl 65 `"65"', add

 

. label define age_lbl 66 `"66"', add

 

. label define age_lbl 67 `"67"', add

 

. label define age_lbl 68 `"68"', add

 

. label define age_lbl 69 `"69"', add

 

. label define age_lbl 70 `"70"', add

 

. label define age_lbl 71 `"71"', add

 

. label define age_lbl 72 `"72"', add

 

. label define age_lbl 73 `"73"', add

 

. label define age_lbl 74 `"74"', add

 

. label define age_lbl 75 `"75"', add

 

. label define age_lbl 76 `"76"', add

 

. label define age_lbl 77 `"77"', add

 

. label define age_lbl 78 `"78"', add

 

. label define age_lbl 79 `"79"', add

 

. label define age_lbl 80 `"80"', add

 

. label define age_lbl 81 `"81"', add

 

. label define age_lbl 82 `"82"', add

 

. label define age_lbl 83 `"83"', add

 

. label define age_lbl 84 `"84"', add

 

. label define age_lbl 85 `"85"', add

 

. label define age_lbl 86 `"86"', add

 

. label define age_lbl 87 `"87"', add

 

. label define age_lbl 88 `"88"', add

 

. label define age_lbl 89 `"89"', add

 

. label define age_lbl 90 `"90 (90+, 1988-2002)"', add

 

. label define age_lbl 91 `"91"', add

 

. label define age_lbl 92 `"92"', add

 

. label define age_lbl 93 `"93"', add

 

. label define age_lbl 94 `"94"', add

 

. label define age_lbl 95 `"95"', add

 

. label define age_lbl 96 `"96"', add

 

. label define age_lbl 97 `"97"', add

 

. label define age_lbl 98 `"98"', add

 

. label define age_lbl 99 `"99+"', add

 

. label values age age_lbl

 

.

. label define sex_lbl 1 `"Male"'

 

. label define sex_lbl 2 `"Female"', add

 

. label define sex_lbl 9 `"NIU"', add

 

. label values sex sex_lbl

 

.

. label define inctot_lbl 00999997 `"00999997"'

 

. label define inctot_lbl 99999997 `"99999997"', add

 

. label define inctot_lbl 99999999 `"99999999"', add

 

. label values inctot inctot_lbl

 

.

.

.

end of do-file
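
* Once the do-file finishes, a quick look at what was read in is a good habit. The sketch below also counts the three high inctot values that the do-file attaches labels to. Treating those values as special codes rather than real incomes is my assumption here; check the IPUMS documentation for inctot before excluding them from any analysis.

```stata
* Variables, storage types, and labels
describe

* Number of observations read in
count

* How many cases carry the labeled high values of inctot?
count if inlist(inctot, 999997, 99999997, 99999999)

* Income summary with those values set aside (assumption: they are special codes)
summarize inctot if !inlist(inctot, 999997, 99999997, 99999999)
```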

 

. save "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\1995 March CPS.dta", replace

file C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data\1995 March CPS.dta saved

 

* Then, by all means, do File>Save because you have a brand new Stata file and you don’t want to have to create it again.
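
* In a later session you can then skip the do-file and load the saved Stata file directly. A sketch, using the folder from this log; substitute your own path:

```stata
* Move to the data folder, then load the saved dataset
cd "C:\Users\Michael\Documents\current class files\intro soc methods\newer 1995 CPS HW1 data"
use "1995 March CPS.dta", clear

* Brief confirmation of what was loaded
describe, short
```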

 

* Also note, if you execute a command that is taking too long or is not what you really wanted, you can interrupt it by hitting the break button (looks like a red stop sign with an X through it) in Stata.

 

. tabulate age

 

                Age |      Freq.     Percent        Cum.

--------------------+-----------------------------------

       Under 1 year |      2,029        1.36        1.36

                  1 |      2,249        1.50        2.86

                  2 |      2,400        1.60        4.46

                  3 |      2,384        1.59        6.06

                  4 |      2,527        1.69        7.74

                  5 |      2,500        1.67        9.42

                  6 |      2,403        1.61       11.02

                  7 |      2,416        1.61       12.64

                  8 |      2,371        1.58       14.22

                  9 |      2,358        1.58       15.80

                 10 |      2,303        1.54       17.33

                 11 |      2,370        1.58       18.92

                 12 |      2,342        1.57       20.48

                 13 |      2,306        1.54       22.02

                 14 |      2,283        1.53       23.55

                 15 |      2,237        1.49       25.05

                 16 |      2,154        1.44       26.48

                 17 |      2,115        1.41       27.90

                 18 |      1,962        1.31       29.21

                 19 |      1,789        1.20       30.40

                 20 |      1,799        1.20       31.61

                 21 |      1,775        1.19       32.79

                 22 |      1,876        1.25       34.05

--Break--

r(1);
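
* A table with one row per year of age is long, which is why I interrupted it. If all you want is the shape of the age distribution, one option is to group ages into 10-year bands first. In this sketch agegrp is a hypothetical variable name, and each band is labeled by its lowest age.

```stata
* Group age into bands 0-9, 10-19, ..., 90-99
egen agegrp = cut(age), at(0(10)100)

* A ten-row table instead of a hundred-row one
tabulate agegrp
```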

 

. log close

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\f

> all_2013_381_logs\class2.log

  log type:  text

 closed on:  26 Sep 2013, 16:00:54

------------------------------------------------------------------------