* Class starts here. Always open a Stata log at the beginning of every work session.

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
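* The equal-variance t statistic above can be rebuilt from the group summaries alone. The following is a minimal Python sketch, not part of the Stata session. It pools the two sample variances, computes the standard error of the difference in means, and divides the difference by that standard error. The inputs are copied from the rounded table, so expect tiny discrepancies in the last digits.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic assuming equal variances (as in ttest ..., by())."""
    df = n1 + n2 - 2
    # Pooled variance: average of the two sample variances, weighted by n - 1
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of mean1 - mean2 under the pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Values copied from the ttest output (Male vs. Female, ages 25-34)
diff, se, t, df = pooled_t(9027, 13.31212, 2.967666, 9511, 13.55657, 2.854472)
print(f"diff={diff:.5f}  se={se:.7f}  t={t:.4f}  df={df}")
```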

* The t-statistic for the difference between men’s and women’s mean years of education (in the 25-34 age group) is -5.7. What probability is associated with a t-statistic of -5.7? As the display commands below show, the probability in one tail is about 5.5 parts in a billion (5.524 x 10^-9), which the ttest output rounds to 0.0000. Doubling it to get the probability in both tails gives about 1.1 in 100 million, i.e., 1.105 x 10^-8. As the course goes on, we will explain this in more detail.

. display 1-ttail(18536,-5.7164)

5.524e-09

. display ttail(18536,5.7164)

5.524e-09

. display 2*ttail(18536,5.7164)

1.105e-08
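* Stata's ttail(df,t) returns the upper-tail probability P(T > t) for Student's t distribution. Because the distribution is symmetric, 1-ttail(18536,-5.7164) and ttail(18536,5.7164) agree. The Python sketch below approximates the same tail area by integrating the t density with Simpson's rule. This is a swapped-in numerical method for illustration only, not Stata's own algorithm. The integration window width and step count n are arbitrary choices that work well for large df.

```python
import math

def t_log_density(x: float, df: float) -> float:
    """Log of Student's t density with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return log_norm - (df + 1) / 2 * math.log1p(x * x / df)

def t_upper_tail(t: float, df: float, width: float = 30.0, n: int = 20000) -> float:
    """Approximate P(T > t), t >= 0, by Simpson's rule on [t, t + width].

    Mass beyond t + width is ignored (negligible for large df); n must be even.
    """
    h = width / n
    total = math.exp(t_log_density(t, df)) + math.exp(t_log_density(t + width, df))
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2   # Simpson weights 4, 2, 4, ..., 4
        total += weight * math.exp(t_log_density(t + i * h, df))
    return total * h / 3

df = 18536
p_one = t_upper_tail(5.7164, df)   # analogue of ttail(18536, 5.7164)
p_two = 2 * p_one                  # analogue of 2*ttail(18536, 5.7164)
print(f"one-tailed: {p_one:.4g}   two-tailed: {p_two:.4g}")
```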

. summarize incwelfr

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |    103226    40.62242    478.8231          0      25000

. summarize incwelfr if age>=15 & incwelfr>0 & incwelfr~=.

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |      1289    3253.134    2813.505          1      25000

* There are 1,289 welfare recipients age 15 and over in the (unweighted) CPS sample.

. summarize incwelfr if age>=15 & incwelfr>0 & incwelfr~=. [fweight=perwt_rounded]

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    incwelfr |   2551246    3072.095    2803.442          1      25000

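* Applying the weights, we see that there are about 2.55 million welfare recipients age 15 and over in the US, according to the March 2000 CPS. With frequency weights, the Obs column reports the sum of the weights, not the number of sample observations.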
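* Frequency weights mean "count this row as if it appeared w times." The minimal Python sketch below shows that a weighted summary gives the same mean and standard deviation as physically expanding the data. The values and weights are made up for illustration and are not CPS data. The standard deviation divides by the sum of weights minus one, which is what frequency weighting implies.

```python
import math
import statistics

def fweight_summary(values, weights):
    """Obs, mean, and std. dev. with integer frequency weights."""
    n = sum(weights)                                    # "Obs" = sum of weights
    mean = sum(w * x for x, w in zip(values, weights)) / n
    ss = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return n, mean, math.sqrt(ss / (n - 1))

# Hypothetical data: three respondents with weights 3, 1, and 2
values = [1000, 2500, 4000]
weights = [3, 1, 2]
n, mean, sd = fweight_summary(values, weights)

# Expanding each row by its weight and summarizing the copies gives the same answer
expanded = [x for x, w in zip(values, weights) for _ in range(w)]
print(n, mean, sd)
print(len(expanded), statistics.mean(expanded), statistics.stdev(expanded))
```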

. display 2551246*3072

7.837e+09

* The display command is an in-line calculator. Multiplying the weighted number of welfare recipients by their average 1999 welfare income (rounded to $3,072) yields about $7.8 billion in total welfare payments. Does that sound like a lot? It is only about $40 per US adult.
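* A weighted mean is the weighted sum divided by the sum of the weights. Multiplying the two back together therefore recovers the weighted total of incwelfr. The short Python check below repeats the display arithmetic twice: once with the rounded mean (3072) used in the log, and once with the unrounded mean (3072.095). The two totals differ by only about $0.24 million.

```python
recipients = 2_551_246      # weighted Obs from summarize ... [fweight=perwt_rounded]
mean_benefit = 3072.095     # weighted mean of incwelfr among recipients

total_rounded = recipients * 3072          # what display 2551246*3072 computed
total_exact = recipients * mean_benefit    # same product with the unrounded mean
print(f"{total_rounded:,}  {total_exact:,.0f}")
```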

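* Now on to the syntax for creating new variables. Use the generate command (gen for short) to create a variable, and the replace command to change its values for observations that meet a condition: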

. replace receives_welfare =1 if incwelfr>0 & incwelfr~=.
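* The & incwelfr~=. condition matters because Stata treats the missing value . as larger than any number. Without it, incwelfr>0 would be true for missing values. The Python sketch below mimics that rule by using math.inf as a stand-in for Stata's missing value, an assumption made only for illustration. It also assumes receives_welfare was first created as 0 for everyone (gen receives_welfare = 0). The generate line is not shown in this log; this assumption is consistent with the weighted tabulation covering the whole population.

```python
import math

STATA_MISSING = math.inf  # illustration only: Stata's "." compares greater than any number

def receives_welfare(incwelfr: float) -> int:
    """Mimic (assumed): gen receives_welfare = 0, then
    replace receives_welfare = 1 if incwelfr>0 & incwelfr~=."""
    flag = 0
    if incwelfr > 0 and incwelfr != STATA_MISSING:
        flag = 1
    return flag

def receives_welfare_no_guard(incwelfr: float) -> int:
    """Same rule without the missing-value check: missing values wrongly become 1."""
    return 1 if incwelfr > 0 else 0

for value in [0, 1200, STATA_MISSING]:
    print(value, receives_welfare(value), receives_welfare_no_guard(value))
```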

* The next command defines a value label that associates the value 0 with the text “no welfare” and the value 1 with the text “receives welfare”.

* The next command attaches the value label defined above to the variable receives_welfare. At this point, if you wanted to save the newly created variable with the rest of your dataset, it would be good to

-----------------+-----------------------------------
      no welfare |271,536,575       99.07       99.07
receives welfare |  2,551,246        0.93      100.00
-----------------+-----------------------------------
           Total |274,087,821      100.00
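* In a tabulation, the Percent column is each frequency divided by the total, and Cum. is the running sum of those percentages. The Python sketch below recomputes the weighted tabulation from its two frequencies. The dictionary of labels plays the role of the value label and is illustrative only.

```python
# Illustrative stand-in for the value label attached to receives_welfare
welfare_labels = {0: "no welfare", 1: "receives welfare"}
weighted_freq = {0: 271_536_575, 1: 2_551_246}   # weighted counts from the tabulation

total = sum(weighted_freq.values())
cum = 0.0
for code in sorted(weighted_freq):
    pct = 100 * weighted_freq[code] / total
    cum += pct
    print(f"{welfare_labels[code]:>16} |{weighted_freq[code]:>11,} {pct:11.2f} {cum:11.2f}")
print(f"{'Total':>16} |{total:>11,} {100:11.2f}")
```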

. summarize perwt_rounded

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
perwt_roun~d |    133710    2049.868    1083.244         93      14281

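* The average weight is about 2,050, so each CPS respondent represents roughly 2,050 people. Indeed, 133,710 x 2,049.868 ≈ 274.1 million, the weighted total in the tabulation above.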

* Not only can values get labels, but the variable itself can get a label (note label “var” here, compared to label “val” above):

. label var perwt_rounded "integer perwt, negative values recoded to 0"

. summarize perwt_rounded, detail

             integer perwt, negative values recoded to 0
-------------------------------------------------------------
      Percentiles      Smallest
 1%          284             93
 5%          428             93
10%          603             93       Obs              133710
25%         1188             96       Sum of Wgt.      133710

50%         2049                      Mean           2049.868
                        Largest       Std. Dev.      1083.244
75%         2649          11824
90%         3534          12547       Variance        1173417
95%         3967          12905       Skewness       .6144906
99%         4893          14281       Kurtosis       4.006292
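* The percentiles above come from the sorted data. The Python sketch below implements the rule assumed here for unweighted summarize, detail, taken from our reading of Stata's Methods and formulas; check the manual for your version. Let P = n*p/100. If P is an integer, average the P-th and (P+1)-th smallest values. Otherwise, take the (floor(P)+1)-th smallest value. The test data are hypothetical (the integers 1 to 10).

```python
import math

def stata_percentile(sorted_x, p):
    """p-th percentile (0 < p < 100) of sorted data under the assumed summarize rule.

    x(k) denotes the k-th smallest value (1-based); Python lists are 0-based.
    """
    n = len(sorted_x)
    P = n * p / 100
    if P == int(P):
        k = int(P)
        if k >= n:               # guard for p = 100
            return sorted_x[-1]
        return (sorted_x[k - 1] + sorted_x[k]) / 2   # average of x(P) and x(P+1)
    return sorted_x[math.floor(P)]                   # x(floor(P) + 1)

data = sorted(range(1, 11))      # hypothetical data: 1, 2, ..., 10
for p in (5, 25, 50, 90, 99):
    print(p, stata_percentile(data, p))
```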

. table receives_welfare sex [fweight= perwt_rounded] , contents(freq mean age mean yrsed mean incwage) row col

--------------------------------------------------------
did respondent   |
in 1999          |        Male       Female        Total
-----------------+--------------------------------------
      no welfare |    1.34e+08     1.38e+08     2.72e+08
                 |        34.2         36.4         35.3
                 |    12.92792     12.90996      12.9187
                 | 26619.92881  14124.35177  20203.23216
                 |
receives welfare |     357,702      2193544      2551246
                 |        34.8         32.8         33.1
                 |    10.75763     11.14463     11.09037
                 | 4196.737659  3577.073717  3663.954806
                 |
           Total |    1.34e+08     1.40e+08     2.74e+08
                 |        34.2         36.3         35.3
                 |    12.92039     12.87497     12.89688
                 | 26542.14272  13915.27974  20005.84709
--------------------------------------------------------

* In each cell, the four rows follow the order given in contents(): weighted frequency, mean age, mean yrsed (years of education), and mean incwage (wage income).
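* The Total column is not a simple average of the Male and Female columns. It averages them using each group's weighted frequency as the weight. The short Python check below uses the counts and the rounded mean ages for welfare recipients from the table. Because the inputs are rounded to one decimal, the result matches only to about one decimal.

```python
# Weighted counts and mean ages for welfare recipients, copied from the table
groups = {
    "Male": (357_702, 34.8),
    "Female": (2_193_544, 32.8),
}

total_weight = sum(w for w, _ in groups.values())
combined_mean = sum(w * m for w, m in groups.values()) / total_weight
print(total_weight, round(combined_mean, 1))
```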

. codebook sex

--------------------------------------------------------------------------------------------------
sex                                                                                            Sex
--------------------------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  sexlbl

                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/133710

            tabulation:  Freq.   Numeric  Label
                         64791         1  Male
                         68919         2  Female

. codebook race

--------------------------------------------------------------------------------------------------
race                                                                                          Race
--------------------------------------------------------------------------------------------------

                  type:  numeric (int)
                 label:  racelbl

                 range:  [100,650]                    units:  10
         unique values:  4                        missing .:  0/133710

            tabulation:  Freq.   Numeric  Label
                       1.1e+05       100  White
                         13626       200  Black/Negro
                          1894       300  American Indian/Aleut/Eskimo
                          4715       650  Asian or Pacific Islander

. table receives_welfare sex, contents(freq mean incwelfr) row col

--------------------------------------------
did respondent   |
in 1999          |    Male   Female    Total
-----------------+--------------------------
      no welfare |  64,603   67,818  132,421
                 |       0        0        0
                 |
receives welfare |     188    1,101    1,289
                 |    2980     3300     3253
                 |
           Total |  64,791   68,919  133,710
                 |      11       67       41
--------------------------------------------

* A key variable for HW 1:

. tabulate citizen

             Citizenship status |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                            NIU |    117,310       87.73       87.73
Born abroad of American parents |        976        0.73       88.46
            Naturalized citizen |      5,348        4.00       92.46
                  Not a citizen |     10,076        7.54      100.00
--------------------------------+-----------------------------------
                          Total |    133,710      100.00

. codebook citizen

--------------------------------------------------------------------------------------------------
citizen                                                                         Citizenship status
--------------------------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  citizenlbl

                 range:  [0,3]                        units:  1
         unique values:  4                        missing .:  0/133710

            tabulation:  Freq.   Numeric  Label
                       1.2e+05         0  NIU
                           976         1  Born abroad of American parents
                          5348         2  Naturalized citizen
                         10076         3  Not a citizen

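* Where were the people coded NIU (not in universe) on citizen born? Answer: almost entirely in the US or its territories. In the tabulation below, 99.06% are coded “United States, n.s.”, and nearly all of the rest were born in Puerto Rico or U.S. outlying areas.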

. tabulate bpl if citizen==0

                 Birthplace |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
        United States, n.s. |    116,213       99.06       99.06
                Puerto Rico |        950        0.81       99.87
  U.S. outlying areas, n.s. |        140        0.12       99.99
                     Mexico |          4        0.00      100.00
                El Salvador |          3        0.00      100.00
----------------------------+-----------------------------------
                      Total |    117,310      100.00

. log close

name:  <unnamed>