-----------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pag

> es\soc_meth_proj3\2010_logs\fifth class.log

  log type:  text

 opened on:   9 Feb 2010, 14:05:46

 

. tabulate vetlast

 

     Veteran's most recent |

         period of service |      Freq.     Percent        Cum.

---------------------------+-----------------------------------

                       NIU |     30,904       23.11       23.11

                No service |     91,149       68.17       91.28

              World War II |      2,428        1.82       93.10

                Korean War |      1,716        1.28       94.38

               Vietnam Era |      3,683        2.75       97.14

             Other service |      3,830        2.86      100.00

---------------------------+-----------------------------------

                     Total |    133,710      100.00

 

. table vetlast if sex==1, contents(freq p25 age mean age p75 age)

 

------------------------------------------------------------------------

Veteran's     |

most recent   |

period of     |

service       |              Freq.           p25(age)          mean(age)

--------------+---------------------------------------------------------

          NIU |             15,810                  4 7.7223911285400391

   No service |             37,926                 25 38.128620147705078

 World War II |              2,339                 74 77.200088500976563

   Korean War |              1,681                 66 67.854255676269531

  Vietnam Era |              3,584                 49 52.687778472900391

Other service |              3,451                 35 45.978267669677734

------------------------------------------------------------------------

 

----------------------------------

Veteran's     |

most recent   |

period of     |

service       |           p75(age)

--------------+-------------------

          NIU |                 11

   No service |                 48

 World War II |                 80

   Korean War |                 70

  Vietnam Era |                 55

Other service |                 60

----------------------------------

 

. table vetlast if sex==1 & age>65 & age<71, contents(freq mean inctot)

 

------------------------------------------

Veteran's     |

most recent   |

period of     |

service       |        Freq.  mean(inctot)

--------------+---------------------------

   No service |          805   27938.16646

 World War II |           38   17378.52632

   Korean War |        1,037   32267.98457

  Vietnam Era |           47   37251.76596

Other service |          141   42317.05674

------------------------------------------

*One question that students struggled a bit with in HW1 (it was not part of the grade) was whether there were enough vets and non-vets of the same age to make a sensible or statistically sound comparison. Most students said "no," but I want to suggest that the answer is "yes."
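
* One way to check this directly is to count men by single year of age and service period inside the comparison window. A minimal sketch, assuming the CPS data set used in class is in memory:

```stata
* cross-tabulate age by service period for men aged 66-70
tabulate age vetlast if sex==1 & age>65 & age<71

* counts for the two groups we will end up comparing
count if sex==1 & age>65 & age<71 & vetlast==1
count if sex==1 & age>65 & age<71 & vetlast==6
```

* The two counts should match the 805 non-vets and 1,037 Korean War vets in the table above.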

 

 

 

. codebook vetlast

 

---------------------------------------------------------------------------------

vetlast                                   Veteran's most recent period of service

---------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  vetlastlbl

 

                 range:  [0,9]                        units:  1

         unique values:  6                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                         30904         0  NIU

                         91149         1  No service

                          2428         4  World War II

                          1716         6  Korean War

                          3683         8  Vietnam Era

                          3830         9  Other service

 

. *I am going to generate a home-made dummy variable that contrasts Korean War vets (coded 1) with persons with no service (coded 0). It is missing for everyone else, so it is useful only for the contrast between Korean War vets and non-vets.

 

. gen Korean_vet=0 if vetlast==1

(42561 missing values generated)

 

. replace Korean_vet=1 if vetlast==6

(1716 real changes made)

 

. tabulate vetlast Korean_vet, missing

 

Veteran's most recent |            Korean_vet

    period of service |         0          1          . |     Total

----------------------+---------------------------------+----------

                  NIU |         0          0     30,904 |    30,904

           No service |    91,149          0          0 |    91,149

         World War II |         0          0      2,428 |     2,428

           Korean War |         0      1,716          0 |     1,716

          Vietnam Era |         0          0      3,683 |     3,683

        Other service |         0          0      3,830 |     3,830

----------------------+---------------------------------+----------

                Total |    91,149      1,716     40,845 |   133,710
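
* The same dummy can be built in one line. A sketch (Korean_vet2 is a hypothetical name, not a variable from class):

```stata
* (vetlast==6) evaluates to 1 or 0; the if qualifier leaves everyone
* other than non-vets and Korean War vets as missing
gen Korean_vet2 = (vetlast==6) if inlist(vetlast, 1, 6)

* stops with an error if the two versions differ anywhere
assert Korean_vet2 == Korean_vet
```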

 

 

. display 32267-27938

4329

 

. *There is an income difference of 4329 between the Korean War vets and male non-vets of the same age.

 

. regress inctot  Korean_vet if sex==1 & age>65 & age<71

 

      Source |       SS       df       MS              Number of obs =    1842

-------------+------------------------------           F(  1,  1840) =    8.52

       Model |  8.4962e+09     1  8.4962e+09           Prob > F      =  0.0035

    Residual |  1.8342e+12  1840   996824336           R-squared     =  0.0046

-------------+------------------------------           Adj R-squared =  0.0041

       Total |  1.8427e+12  1841  1.0009e+09           Root MSE      =   31573

 

------------------------------------------------------------------------------

      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

  Korean_vet |   4329.818   1483.088     2.92   0.004     1421.106     7238.53

       _cons |   27938.17   1112.785    25.11   0.000     25755.71    30120.62

------------------------------------------------------------------------------

 

. *The t-statistic indicates a significant difference between Korean War vets and non-vets of the same age (see the t-test I perform below, which yields the same t-statistic in absolute value).

 

. * the coefficient of 4329 equals the difference between groups

 

* The R-squared is the proportion of the overall variance in inctot explained by our predictor variables. In this case our only predictor is Korean_vet, which explains 0.0046, or less than 1%, of the variance in inctot.
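
* The numbers in the regression table can be checked by hand. A sketch using the values printed above (the e() lines assume the regress command above was the last estimation command run):

```stata
* R-squared = Model SS / Total SS
display 8.4962e+09 / 1.8427e+12

* t = coefficient / standard error
display 4329.818 / 1483.088

* with a single predictor, F is the square of t
display (4329.818 / 1483.088)^2

* right after regress, the same R-squared is available from stored results
display e(mss) / (e(mss) + e(rss))
display e(r2)
```

* These should reproduce, up to rounding, the R-squared of 0.0046, the t of 2.92, and the F of 8.52.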

 

. *Stata has several ways to generate dummy variables. One way is the xi command.

 

. xi i.vetlast

i.vetlast         _Ivetlast_0-9       (naturally coded; _Ivetlast_0 omitted)

 

*by itself, the xi command generates a set of dummy variables, one for every level of the categorical variable, with one category (generally the first category) excluded and used as the comparison group. Each dummy variable is a zero-one contrast, like the dummy variables we made by hand.

 

. tabulate vetlast  _Ivetlast_6

 

Veteran's most recent |      vetlast==6

    period of service |         0          1 |     Total

----------------------+----------------------+----------

                  NIU |    30,904          0 |    30,904

           No service |    91,149          0 |    91,149

         World War II |     2,428          0 |     2,428

           Korean War |         0      1,716 |     1,716

          Vietnam Era |     3,683          0 |     3,683

        Other service |     3,830          0 |     3,830

----------------------+----------------------+----------

                Total |   131,994      1,716 |   133,710
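
* Note the difference from our home-made Korean_vet: the xi dummy is 0, not missing, for everyone who is not a Korean War vet. A sketch that checks this (vet6_byhand is a hypothetical name):

```stata
* assumes xi i.vetlast has been run, so _Ivetlast_6 exists
gen byte vet6_byhand = (vetlast==6)
assert vet6_byhand == _Ivetlast_6

* Korean_vet is missing for 40,845 cases; the xi dummy is never missing
count if missing(Korean_vet)
count if missing(_Ivetlast_6)
```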

 

*More usually, we will use xi and regress together, with the i.vetlast telling stata to use xi to generate dummy variables for each level of vetlast.

 

. xi: regress inctot i.vetlast if sex==1 & age>65 & age<71

i.vetlast         _Ivetlast_0-9       (naturally coded; _Ivetlast_0 omitted)

note: _Ivetlast_8 omitted because of collinearity

 

      Source |       SS       df       MS              Number of obs =    2068

-------------+------------------------------           F(  4,  2063) =    8.34

       Model |  3.6137e+10     4  9.0341e+09           Prob > F      =  0.0000

    Residual |  2.2337e+12  2063  1.0828e+09           R-squared     =  0.0159

-------------+------------------------------           Adj R-squared =  0.0140

       Total |  2.2699e+12  2067  1.0982e+09           Root MSE      =   32905

 

------------------------------------------------------------------------------

      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

 _Ivetlast_1 |  -9313.599   4937.879    -1.89   0.059    -18997.35    370.1461

 _Ivetlast_4 |  -19873.24   7178.542    -2.77   0.006    -33951.18   -5795.297

 _Ivetlast_6 |  -4983.781   4907.314    -1.02   0.310    -14607.59    4640.023

 _Ivetlast_8 |  (omitted)

 _Ivetlast_9 |   5065.291   5542.273     0.91   0.361    -5803.742    15934.32

       _cons |   37251.77   4799.749     7.76   0.000     27838.91    46664.62

------------------------------------------------------------------------------

 

*After running the regression, take a look at your Variables window: a bunch of new variables appear, all starting with _I

*The contrast we really want is between _Ivetlast_6 (value 6 corresponds to Korean War vets) and _Ivetlast_1 (corresponding to non-vets). One important thing to remember is that the omitted category is arbitrary, and often it won't be the one we really want to compare to. By default xi omits vetlast=0, the NIU respondents, but there are no NIU men aged 66-70 in this subsample, so Stata dropped _Ivetlast_8 for collinearity instead. The coefficients above are therefore contrasts with Vietnam Era vets (note that _cons, 37251.77, is the Vietnam Era mean from the earlier table). We can recover the comparison we want with lincom, which gives us the linear combination of coefficients that we ask for.

 

. lincom  _Ivetlast_6- _Ivetlast_1

 

 ( 1)  - _Ivetlast_1 + _Ivetlast_6 = 0

 

------------------------------------------------------------------------------

      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   4329.818   1545.699     2.80   0.005     1298.525    7361.111

------------------------------------------------------------------------------

 

* Here we recover the Korean War vet versus non-vet comparison, with the same difference in means (4329.818), but the t-statistic is slightly different because the other groups in the regression (vetlast==4, 8, and 9) help shape the common variance estimate and so affect the standard error and the t-statistic.
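
* Both pieces of the lincom result can be reproduced by hand. A sketch using numbers from the output above:

```stata
* the lincom estimate is just the difference of the two coefficients
display -4983.781 - (-9313.599)

* standard error of a difference in group means = Root MSE * sqrt(1/n1 + 1/n2)
* two-group regression: Root MSE = 31573
display 31573 * sqrt(1/805 + 1/1037)
* five-group regression: Root MSE = 32905
display 32905 * sqrt(1/805 + 1/1037)
```

* The first line gives 4329.818. The other two come out close to 1483 and 1546 (not exact, because the Root MSE is rounded in the output), which is why the t-statistic drops from 2.92 to 2.80.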

 

. char vetlast [omit] 1

 

*the above command tells stata to use vetlast=1, i.e. non-vets as the default omitted value, which makes more sense for us.

 

. xi: regress inctot i.vetlast if sex==1 & age>65 & age<71

i.vetlast         _Ivetlast_0-9       (naturally coded; _Ivetlast_1 omitted)

note: _Ivetlast_0 omitted because of collinearity

 

      Source |       SS       df       MS              Number of obs =    2068

-------------+------------------------------           F(  4,  2063) =    8.34

       Model |  3.6137e+10     4  9.0341e+09           Prob > F      =  0.0000

    Residual |  2.2337e+12  2063  1.0828e+09           R-squared     =  0.0159

-------------+------------------------------           Adj R-squared =  0.0140

       Total |  2.2699e+12  2067  1.0982e+09           Root MSE      =   32905

 

------------------------------------------------------------------------------

      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

 _Ivetlast_0 |  (omitted)

 _Ivetlast_4 |  -10559.64   5462.501    -1.93   0.053    -21272.23    152.9501

 _Ivetlast_6 |   4329.818   1545.699     2.80   0.005     1298.525    7361.111

 _Ivetlast_8 |   9313.599   4937.879     1.89   0.059    -370.1461    18997.35

 _Ivetlast_9 |   14378.89   3004.039     4.79   0.000     8487.626    20270.15

       _cons |   27938.17   1159.764    24.09   0.000     25663.74     30212.6

------------------------------------------------------------------------------

 

. *the char command set the default omitted value for vetlast to 1, which is the no service value, which is what we wanted.
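
* In Stata 11 and later, factor-variable notation does the same job without xi or char. A sketch, assuming a version of Stata that supports it:

```stata
* ib1. makes vetlast==1 (no service) the base category
regress inctot ib1.vetlast if sex==1 & age>65 & age<71

* coefficients are then referred to as 6.vetlast, 8.vetlast, and so on;
* for example, Korean War vets versus Vietnam Era vets
lincom 6.vetlast - 8.vetlast
```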

 

 

* If we go back to looking only at two groups, Korean War vets versus non-vets of the same age (remember that the Korean_vet dummy variable is missing for all the other levels of vetlast), the t-test gives the same difference and t-statistic as our first regression above, except that the sign is reversed because ttest reports mean(0) - mean(1).

 

. ttest inctot if sex==1 & age>65 & age<71, by( Korean_vet)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

       0 |     805    27938.17    1229.848    34893.89    25524.07    30352.26

       1 |    1037    32267.98    892.2152    28731.55    30517.23    34018.74

---------+--------------------------------------------------------------------

combined |    1842    30375.75    737.1402    31636.97    28930.03    31821.46

---------+--------------------------------------------------------------------

    diff |           -4329.818    1483.088                -7238.53   -1421.106

------------------------------------------------------------------------------

    diff = mean(0) - mean(1)                                      t =  -2.9195

Ho: diff = 0                                     degrees of freedom =     1840

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0018         Pr(|T| > |t|) = 0.0035          Pr(T > t) = 0.9982
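
* A quick check, using numbers from the t-test output, that this is the same result as the first regression:

```stata
* difference in means, in the order ttest uses: mean(0) - mean(1)
display 27938.17 - 32267.98

* t squared should be close to the regression F of 8.52
display (-2.9195)^2
```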

 

. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace

file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved

 

. clear

 

. exit, clear

 

* And now a few comments that relate to what I said at the end of class, but which didn't make it into the class log.

 

* In order to use xi sensibly with our 3 occupations for HW2, while excluding all the other occupations and also generating output that is readable, here is what I suggest.

 

* First, create a new variable that has 3 categories (for nurses, lawyers, and sociologists, missing for all other occupations).

 

. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125

 

                 Occupation, 1990 basis |      Freq.     Percent        Cum.

----------------------------------------+-----------------------------------

                      Registered nurses |        966       68.37       68.37

                  Sociology instructors |          6        0.42       68.79

                                Lawyers |        441       31.21      100.00

----------------------------------------+-----------------------------------

                                  Total |      1,413      100.00

 

. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125, nolab

 

Occupation, |

 1990 basis |      Freq.     Percent        Cum.

------------+-----------------------------------

         95 |        966       68.37       68.37

        125 |          6        0.42       68.79

        178 |        441       31.21      100.00

------------+-----------------------------------

      Total |      1,413      100.00

 

. gen hw2_occ=1 if occ1990==95

(132744 missing values generated)

 

. replace  hw2_occ=2 if occ1990==125

(6 real changes made)

 

. replace hw2_occ=3 if occ1990==178

(441 real changes made)

 

. label define  hw2_occ 1 "nurses" 2 "sociologists" 3 "lawyers"

 

. label val  hw2_occ hw2_occ

 

. tabulate occ1990  hw2_occ

 

     Occupation, 1990 |             hw2_occ

                basis |    nurses  sociologi    lawyers |     Total

----------------------+---------------------------------+----------

    Registered nurses |       966          0          0 |       966

Sociology instructors |         0          6          0 |         6

              Lawyers |         0          0        441 |       441

----------------------+---------------------------------+----------

                Total |       966          6        441 |     1,413
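
* recode can do the gen, replace, and label steps in one command. A sketch (hw2_occ_alt is a hypothetical name):

```stata
* map the three occupation codes to 1-3 with value labels; all else missing
recode occ1990 (95=1 "nurses") (125=2 "sociologists") (178=3 "lawyers") (else=.), generate(hw2_occ_alt)

* stops with an error if the two versions differ anywhere
assert hw2_occ_alt == hw2_occ
```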

 

*now we are ready to use xi and regress on hw2_occ…

 

 

. xi: regress inctot i.hw2_occ

i.hw2_occ         _Ihw2_occ_1-3       (naturally coded; _Ihw2_occ_1 omitted)

 

      Source |       SS       df       MS              Number of obs =    1413

-------------+------------------------------           F(  2,  1410) =  262.68

       Model |  1.0359e+12     2  5.1795e+11           Prob > F      =  0.0000

    Residual |  2.7802e+12  1410  1.9718e+09           R-squared     =  0.2715

-------------+------------------------------           Adj R-squared =  0.2704

       Total |  3.8161e+12  1412  2.7026e+09           Root MSE      =   44405

 

------------------------------------------------------------------------------

      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

 _Ihw2_occ_2 |   3576.166   18184.46     0.20   0.844    -32095.35    39247.68

 _Ihw2_occ_3 |   58455.42   2551.942    22.91   0.000      53449.4    63461.43

       _cons |   40787.17   1428.706    28.55   0.000     37984.55    43589.79

------------------------------------------------------------------------------

 

*Here nurses are the omitted comparison category, and sociologists and lawyers are each compared to them. (In the HW you will use incwage rather than inctot.)
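
* Group means and the lawyer versus sociologist contrast can be recovered from the coefficients. A sketch using the output above:

```stata
* predicted mean for each group = _cons + that group's coefficient
* nurses
display 40787.17
* sociologists
display 40787.17 + 3576.166
* lawyers
display 40787.17 + 58455.42

* lawyers versus sociologists, with a standard error
* (run right after the xi: regress command above)
lincom _Ihw2_occ_3 - _Ihw2_occ_2

* for the HW, the same model with incwage as the outcome
xi: regress incwage i.hw2_occ
```

* With only 6 sociologists in the sample, expect any contrast involving them to have a very large standard error.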