Class begins here

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* I said at the end of last class that the tail probability associated with a t-statistic of -5.7 was tiny. Here is the actual tail probability:

. display ttail(18536, 5.7164)

5.524e-09

* Notice that I used a positive value, 5.7164, in the command above. That is because Stata’s ttail() function returns the right-hand, or upper-tail, probability. Because the t distribution is symmetric, the upper tail above 5.7164 equals the lower tail below -5.7164, and that lower tail is 1 minus the upper-tail probability at -5.7164:

. display 1- ttail(18536, -5.7164)

5.524e-09
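* If you want to convince yourself of the symmetry, you can ask for the same probability three ways. This is only a small sketch: t() is Stata's cumulative (lower-tail) t distribution function, the mirror image of ttail(). The three lines should agree with each other up to rounding.

```stata
* upper tail above +5.7164
display ttail(18536, 5.7164)

* lower tail below -5.7164, written as 1 minus an upper tail
display 1 - ttail(18536, -5.7164)

* lower tail below -5.7164, using the cumulative t function directly
display t(18536, -5.7164)
```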

* Usually we take the tail probability and multiply it by two. The result is the chance that we would get a difference at least this large in either direction (between women’s and men’s education) if the true value of the difference were zero:

. display 2*ttail(18536, 5.7164)

1.105e-08

* Since this two-tailed test yields a tiny probability of about 1 in 90,000,000, we can reject the null hypothesis that young men and young women (ages 25 to 34) in the US have the same average years of education.
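* You do not have to retype the numbers from the output. After ttest runs, Stata keeps the results in r(), so the same calculation can be written as follows. This is a minimal sketch and assumes the CPS file is still in memory.

```stata
* rerun the test without printing the table; results are left behind in r()
quietly ttest yrsed if age>=25 & age<=34, by(sex)

* the t statistic is just the difference in means divided by its standard error
display "t by hand:                " -.2444469/.0427623
display "t from ttest:             " r(t)
display "degrees of freedom:       " r(df_t)

* two-tailed probability, first by hand and then as stored by ttest
display "two-tailed p by hand:     " 2*ttail(r(df_t), abs(r(t)))
display "two-tailed p from ttest:  " r(p)
```

* Using abs() means you never have to remember which sign to give ttail().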

. summarize incwelfr

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |    103226    40.62242    478.8231          0      25000

. summarize incwelfr if age>=15 & incwelfr>0 & incwelfr~=.

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |      1289    3253.134    2813.505          1      25000

. summarize incwelfr if age>=15 & incwelfr>0 & incwelfr~=. [fweight= perwt_rounded]

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |   2551246    3072.095    2803.442          1      25000

* Use “if” to restrict the analysis to the subset of the data that is relevant to the question. Use fweight to scale the sample up to the number of people it represents in the US. There are 1289 adults (age 15 and over) in the CPS who reported positive welfare income in 1999; this corresponds to 2.55 million adults in the US.
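* Here is a small sketch of the same comparison using stored results, assuming the CPS file is still in memory. Frequency weights must be whole numbers, which is presumably why the weight variable here is a rounded one.

```stata
* unweighted: number of CPS respondents in the subset
count if age>=15 & incwelfr>0 & incwelfr~=.

* weighted: each respondent counts perwt_rounded times
quietly summarize incwelfr if age>=15 & incwelfr>0 & incwelfr~=. [fweight=perwt_rounded]

* r(sum_w) is the sum of the weights, that is, the estimated number of people in the US
display "weighted count, in millions:          " r(sum_w)/1e6
display "people represented per respondent:    " r(sum_w)/1289
```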

* Now, generate a new variable and add value and variable labels.

. gen byte receives_welfare=0

. replace receives_welfare=1 if incwelfr>0 & incwelfr~=.

(1289 real changes made)

. tabulate receives_welfare

receives_we |
      lfare |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    132,421       99.04       99.04
          1 |      1,289        0.96      100.00
------------+-----------------------------------
      Total |    133,710      100.00
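* A caution about this recipe: because every case starts at 0, respondents whose incwelfr is missing are also counted as 0. The first summarize showed 103,226 nonmissing values of incwelfr, while the tabulation shows 133,710 cases, so about 30,000 of the zeros are really "unknown". If you would rather keep those cases missing, here is one possible alternative. The name receives_welfare_alt is only for illustration.

```stata
* (incwelfr > 0) evaluates to 1 or 0; the if qualifier leaves missing cases missing
gen byte receives_welfare_alt = (incwelfr > 0) if !missing(incwelfr)

* the missing option shows the missing cases as their own row
tabulate receives_welfare_alt, missing
```

* Which version you want depends on the question: the first treats "not asked" as "no", the second keeps the two apart.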

. label define receives_welfare_lbl 0 "no" 1 "yes"

. label values receives_welfare receives_welfare_lbl

. tabulate receives_welfare

receives_we |
      lfare |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |    132,421       99.04       99.04
        yes |      1,289        0.96      100.00
------------+-----------------------------------
      Total |    133,710      100.00

. label var receives_welfare "does respondent receive welfare"

. tabulate receives_welfare

       does |
 respondent |
    receive |
    welfare |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |    132,421       99.04       99.04
        yes |      1,289        0.96      100.00
------------+-----------------------------------
      Total |    133,710      100.00
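* Keep in mind that labeling takes three separate steps: label define creates a named set of value labels, label values attaches that set to a variable, and label var describes the variable itself. A value label does not show up in output until it has been attached. A few commands for checking your work, as a sketch:

```stata
* attach the label set to the variable (harmless to repeat)
label values receives_welfare receives_welfare_lbl

* show which numbers map to which text
label list receives_welfare_lbl

* show the storage type, the attached value label, and the variable label
describe receives_welfare

* see the underlying 0/1 codes again instead of no/yes
tabulate receives_welfare, nolabel
```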

* One thing you never want to do is tabulate a continuous variable. The output runs on like the phone book, with one row for every distinct value. If you do it by accident, hit the Break button to interrupt:

. tabulate incwage

   Wage and |
     salary |
     income |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     35,825       34.71       34.71
          1 |          7        0.01       34.71
          5 |         15        0.01       34.73
          7 |          1        0.00       34.73
          8 |          1        0.00       34.73
         10 |          1        0.00       34.73
         12 |          2        0.00       34.73
         18 |          1        0.00       34.73
         20 |         10        0.01       34.74
         21 |          2        0.00       34.74
         28 |          2        0.00       34.75
         30 |          5        0.00       34.75
         31 |          1        0.00       34.75
         34 |          4        0.00       34.76
         35 |          5        0.00       34.76
         36 |          1        0.00       34.76
         40 |          8        0.01       34.77
         44 |          1        0.00       34.77
         45 |          4        0.00       34.77
         46 |          3        0.00       34.78
--Break--
r(1);
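* What to do instead with a continuous variable: summarize it, or group it into brackets first and tabulate the brackets. The following is a sketch only. The variable name incwage_bracket and the cut points are made up for illustration, and the last cut point must be larger than the biggest value in the data, because egen's cut() sets anything at or above it to missing.

```stata
* percentiles, mean, and the smallest and largest values on one screen
summarize incwage, detail

* each person gets the lower bound of the bracket their wage falls in
egen incwage_bracket = cut(incwage), at(0 1 10000 25000 50000 100000 10000000)

* now the table has a handful of rows instead of thousands
tabulate incwage_bracket
```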

. table  educrec sex if age>20  [fweight= perwt_rounded], contents(freq mean  incwelfr mean  receives_welfare) row col

---------------------------------------------------------------
Educational attainment  |                  Sex
recode                  |        Male       Female        Total
------------------------+--------------------------------------
      None or preschool |     409,822      463,962      873,784
                        |           0  201.8166229  107.1606301
                        |           0       .04848      .025742
                        |
   Grades 1, 2, 3, or 4 |     988,458      959,869      1948327
                        |  21.2051377  186.4335592  102.6070993
                        |     .011155      .039831      .025283
                        |
   Grades 5, 6, 7, or 8 |     4792742      5028804      9821546
                        | 10.72959028  119.6578288  66.50276097
                        |     .005356      .032857      .019437
                        |
                Grade 9 |     1926372      2028431      3954803
                        | 20.88420617  134.0259969  78.91498944
                        |     .007086      .046607      .027357
                        |
               Grade 10 |     2498378      2892776      5391154
                        | 22.49344775   214.192737  125.3551177
                        |     .008675        .0635      .038093
                        |
               Grade 11 |     2607008      3013104      5620112
                        | 23.15243145  216.6690639  126.9022747
                        |     .007129      .073434      .042677
                        |
               Grade 12 |    3.01e+07     3.47e+07     6.48e+07
                        | 11.72673341  67.85343211   41.8001018
                        |     .003832      .021749      .013432
                        |
1 to 3 years of college |    2.35e+07     2.70e+07     5.05e+07
                        | 7.269855825  44.67187372  27.25585651
                        |     .002034       .01304      .007915
                        |
    4+ years of college |    2.40e+07     2.28e+07     4.68e+07
                        | .3599692853   5.49143018  2.858781299
                        |     .000103      .002347      .001196
                        |
                  Total |    9.09e+07     9.89e+07     1.90e+08
                        | 8.382322858  61.73025686  36.18842854
                        |      .00282       .01907       .01129
---------------------------------------------------------------

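* table is a useful command for creating tables of statistics broken down by other variables (in this case, education and sex). Each cell here holds three numbers, in the order requested in contents(): the weighted number of people, their mean welfare income, and the proportion who receive welfare.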
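* A version note, offered as a hedged sketch rather than something run in class: the contents(), row, and col options belong to the table command as it existed through Stata 16. Stata 17 rewrote table, and my understanding is that a roughly equivalent request in the newer syntax looks like the first command here, where totals are included by default. The second command uses the version prefix to ask a newer Stata to accept the original syntax. The /// line continuations work in a do-file, not when typed into the Command window.

```stata
* Stata 17 or later: row variable, column variable, then one statistic() per number
table (educrec) (sex) if age>20 [fweight=perwt_rounded], ///
    statistic(frequency)                                  ///
    statistic(mean incwelfr)                              ///
    statistic(mean receives_welfare)

* Stata 17 or later: run the original command under version control
version 16: table educrec sex if age>20 [fweight=perwt_rounded], ///
    contents(freq mean incwelfr mean receives_welfare) row col
```

* The layout of the newer output may differ from the table shown above even when the numbers match.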

. clear all

Now on to data ingestion. After you have downloaded and unzipped the data file and downloaded the Stata command file (the one with the .do extension), take the path of the directory that holds the data file and the do-file, and make it Stata's working directory:

. cd "C:\Users\Michael\Documents\current class files\intro soc methods\2005 data again"

C:\Users\Michael\Documents\current class files\intro soc methods\2005 data again
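* Before going further, it is worth checking that Stata is where you think it is and that both files are there. A minimal sketch, using the file names that appear in this session:

```stata
* print the current working directory
pwd

* list the files in it
dir

* each of these stops with an error if the file is not in the working directory
confirm file "cps_00010.dat"
confirm file "cps_00010.do"
```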

Then run the do-file. This is easiest with File > Do... in the Stata menu system:

. do "C:\Users\Michael\Documents\current class files\intro soc methods\2005 data again\cps_00010.do"

. * NOTE: You need to set the Stata working directory to the path

. * where the data file is located.

.

. set more off

.

. clear

. quietly infix             ///

>   int     year     1-4    ///

>   long    serial   5-9    ///

>   float   hwtsupp  10-19  ///

>   byte    month    20-21  ///

>   float   wtsupp   22-31  ///

>   float   wtfinl   32-41  ///

>   byte    age      42-43  ///

>   byte    sex      44-44  ///

>   double  inctot   45-52  ///

>   using `"cps_00010.dat"'

.

. replace hwtsupp = hwtsupp / 10000

(210648 real changes made)

. replace wtsupp  = wtsupp  / 10000

(210648 real changes made)

. replace wtfinl  = wtfinl  / 10000

(0 real changes made)

.

. format hwtsupp %10.4f

. format wtsupp  %10.4f

. format wtfinl  %10.4f

. format inctot  %8.0f

.

. label var year    `"Survey year"'

. label var serial  `"Household serial number"'

. label var hwtsupp `"Household weight, Supplement"'

. label var month   `"Month"'

. label var wtsupp  `"Supplement Weight"'

. label var wtfinl  `"Final Basic Weight"'

. label var age     `"Age"'

. label var sex     `"Sex"'

. label var inctot  `"Total personal income"'

.

Don't forget to save your new Stata file!
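* One way to do that, as a sketch. The file name cps_00010.dta is my assumption (any name will do), and replace tells Stata to overwrite a file of that name if one already exists.

```stata
* look over what the do-file created before saving it
describe
summarize

* shrink each variable to the smallest storage type that holds its values
compress

* write the data to the working directory in Stata format
save "cps_00010.dta", replace
```

* Next time you can skip the do-file and start with use "cps_00010.dta", clear.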

