name:  <unnamed>

> 1_logs\class6.log

log type:  text

opened on:  10 Oct 2013, 13:43:13

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

* One leftover point from HW1: Q4 called for a direct comparison between the incomes of veterans and non-veterans, and most of the student HW that I read skipped over it. But the simple comparison is important, and revealing.

. tabulate vetlast

Veteran's most recent |

period of service |      Freq.     Percent        Cum.

---------------------------+-----------------------------------

NIU |     30,904       23.11       23.11

No service |     91,149       68.17       91.28

World War II |      2,428        1.82       93.10

Korean War |      1,716        1.28       94.38

Vietnam Era |      3,683        2.75       97.14

Other service |      3,830        2.86      100.00

---------------------------+-----------------------------------

Total |    133,710      100.00

. gen byte veteran=0 if vetlast~=0

(30904 missing values generated)

. replace veteran=1 if vetlast>1

. tabulate vetlast veteran

Veteran's most recent |        veteran

period of service |         0          1 |     Total

----------------------+----------------------+----------

No service |    91,149          0 |    91,149

World War II |         0      2,428 |     2,428

Korean War |         0      1,716 |     1,716

Vietnam Era |         0      3,683 |     3,683

Other service |         0      3,830 |     3,830

----------------------+----------------------+----------

Total |    91,149     11,657 |   102,806

. table veteran [aweight= perwt_rounded] , contents (mean inctot)

------------------------

veteran | mean(inctot)

----------+-------------

0 |  25052.93274

1 |   38866.1566

------------------------

* So note: the veterans have a lot more income (on average) than the non-veterans, roughly 38,866 versus 25,053. Why? Largely because the veterans are more likely to be male and more likely to be older than the non-veterans, and income differs a lot by sex and by age.
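
* A minimal sketch of how one might check that explanation, not part of the original session. It assumes sex and age are coded as they are used later in this log, and it uses the older table ..., contents() syntax seen above (Stata 16 and earlier).

```stata
* Composition: what share of veterans and of non-veterans falls in each sex category?
tabulate veteran sex, row

* Mean income and mean age within each veteran-by-sex cell, weighted as above
table veteran sex [aweight=perwt_rounded], contents(mean inctot mean age)
```

* If the sex and age story holds, the veteran gap within each sex should differ from the raw gap shown above.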

. graph box age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

. graph hbox age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

* Two orientations of the box plot. Look up graph box in the Stata manual for an explanation of how the outliers and whiskers are calculated.

*Now on to a brief discussion of dummy variables with metro as the predictor. Note that this is covered in more detail in my Excel sheet, “understanding dummy variables.”

. codebook metro

-----------------------------------------------------------------------

metro                                  Metropolitan central city status

-----------------------------------------------------------------------

type:  numeric (byte)

label:  metrolbl

range:  [0,4]                        units:  1

unique values:  5                        missing .:  0/133710

tabulation:  Freq.   Numeric  Label

340         0  Not identifiable

29658         1  Not in metro area

32481         2  Central city

51468         3  Outside central city

19763         4  Central city status unknown

. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

Not identifiable |            94    31743.04255

Not in metro area |         6,628     27189.6465

Central city |         6,727    34445.35841

Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

. regress incwage metro if age>29 & age<65

Source |       SS       df       MS              Number of obs =   60477

-------------+------------------------------           F(  1, 60475) =  464.31

Model |  5.0002e+11     1  5.0002e+11           Prob > F      =  0.0000

Residual |  6.5126e+13 60475  1.0769e+09           R-squared     =  0.0076

Total |  6.5626e+13 60476  1.0852e+09           Root MSE      =   32816

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

metro |   2870.889   133.2332    21.55   0.000     2609.752    3132.027

_cons |   20308.34   353.9993    57.37   0.000      19614.5    21002.18

------------------------------------------------------------------------------

* Please don’t ever do this: don’t treat a categorical variable like a continuous variable and just plug it into the regression. Stata will let you, but it is wrong, wrong, wrong. One way to think about how wrong it is: what are the units of metro? The codes 0 through 4 are only labels, so a "one-unit increase in metro" has no meaning. If metro doesn’t have units, you need to go the dummy variable route.
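
* One way to see the problem, sketched here under the assumption that the regression above is simply rerun; yhat_linear is a made-up variable name. The linear model forces the fitted value to rise by the same amount (the metro coefficient) from each code to the next, whatever the actual category means are.

```stata
* Refit the not-recommended model, treating metro as if it were continuous
regress incwage metro if age>29 & age<65

* Fitted values for the estimation sample only
predict double yhat_linear if e(sample)

* Compare the actual category means with the straight-line fitted values
table metro if e(sample), contents(freq mean incwage mean yhat_linear)
```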

* First, using the old syntax of xi: and i.variable to generate the dummy variables.

. xi: regress incwage i.metro if age>29 & age<65 & sex==1 & metro~=0

i.metro           _Imetro_0-4         (naturally coded; _Imetro_0 omitted)

note: _Imetro_1 omitted because of collinearity

Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

_Imetro_1 |          0  (omitted)

_Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

_Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

_Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

_cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

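* Note that the coefficients correspond to the actual differences of mean values between the categories. Here everything is compared to not in metro area (category 1), because I left category zero (not identifiable) out of the analysis, so xi dropped _Imetro_1 instead. The constant, 27189.65, is the mean for not in metro area from the table above.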
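
* A quick arithmetic check, using only numbers already printed in this log: subtract the base category's mean (27189.6465, the not in metro area row of the means table, which equals _cons) from each of the other category means. The results should reproduce the three coefficients up to rounding.

```stata
* Central city minus not in metro area (compare with _Imetro_2)
display 34445.35841 - 27189.6465

* Outside central city minus not in metro area (compare with _Imetro_3)
display 43203.0348 - 27189.6465

* Central city status unknown minus not in metro area (compare with _Imetro_4)
display 35557.95997 - 27189.6465
```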

. table metro, contents (mean _Imetro_1 mean _Imetro_2 mean _Imetro_3 mean _Imetro_4)

-------------------------------------------------------------------------------------

Metropolitan central city   |

status                      |      __000002      __000003      __000004      __000005

----------------------------+--------------------------------------------------------

Not identifiable |             0             0             0             0

Not in metro area |             1             0             0             0

Central city |             0             1             0             0

Outside central city |             0             0             1             0

Central city status unknown |             0             0             0             1

-------------------------------------------------------------------------------------

* What the dummy variables actually look like.

. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

Not identifiable |            94    31743.04255

Not in metro area |         6,628     27189.6465

Central city |         6,727    34445.35841

Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

. regress incwage ib2.metro if age>29 & age<65 & sex==1 & metro~=0

Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

------------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------------+----------------------------------------------------------------

metro |

Not in metro area  |  -7255.712   668.0533   -10.86   0.000    -8565.127   -5946.297

Outside central..  |   8757.676   591.1938    14.81   0.000      7598.91    9916.443

Central city st..  |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

|

_cons |   34445.36   470.6309    73.19   0.000      33522.9    35367.82

------------------------------------------------------------------------------------

* First, compared to central city (ib2. means the base, or omitted, category is metro==2). The constant is now the central city mean, 34445.36.
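
* For reference, a sketch of other ways to pick the comparison category with factor-variable notation. These commands were not run in this session; the sample restriction is copied from the regressions in this log.

```stata
* Use category 3 (outside central city) as the base
regress incwage ib3.metro if age>29 & age<65 & sex==1 & metro~=0

* Use the highest-numbered category as the base
regress incwage ib(last).metro if age>29 & age<65 & sex==1 & metro~=0

* Set a default base for metro, so that a plain i.metro uses it from then on
fvset base 2 metro
regress incwage i.metro if age>29 & age<65 & sex==1 & metro~=0

* Undo the setting to return to Stata's default base (the lowest value)
fvset clear metro
```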

. regress incwage i.metro if age>29 & age<65 & sex==1 & metro~=0

Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

------------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------------+----------------------------------------------------------------

metro |

Central city  |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

Outside central..  |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

Central city st..  |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

|

_cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------------

* Next, compared to rural (not in metro area, which is the default base once category zero is excluded). The two regressions above have different comparison categories for metro, so the coefficients are all different, but the model is the same and the same contrasts can be recovered:

. lincom 2.metro-3.metro

( 1)  2.metro - 3.metro = 0

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

(1) |  -8757.676   591.1938   -14.81   0.000    -9916.443    -7598.91

------------------------------------------------------------------------------

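* The urban-suburban contrast (central city minus outside central city). It is the same 8757.676 as the outside central city coefficient in the ib2 regression, with the sign flipped because the subtraction runs the other way.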
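
* A sketch of how the remaining contrasts could be recovered the same way. These lines were not run in this session and assume the i.metro regression above is still the most recent estimation.

```stata
* Suburban minus urban: the same contrast with the sign reversed
lincom 3.metro - 2.metro

* Central city status unknown minus outside central city
lincom 4.metro - 3.metro

* All pairwise differences between metro categories, with tests and confidence intervals
pwcompare metro, effects
```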

* Generating the 3 occupational dummy variables by hand, which is highly recommended.

. gen byte nurses=0

. replace nurses=1 if occ1990==95

. gen byte lawyers=0

. replace lawyers=1 if occ1990==178

. gen byte sociologists=0

. replace sociologists=1 if occ1990==125
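
* An alternative one-line construction, sketched as an assumption-laden variant rather than what was run here; nurses2 is a made-up name. A logical expression evaluates to 1 or 0, and the if qualifier leaves the dummy missing wherever occ1990 itself is missing, which the gen/replace pair above does not do.

```stata
* 1 for registered nurses, 0 for every other occupation, missing if occ1990 is missing
gen byte nurses2 = (occ1990==95) if !missing(occ1990)

* Check that it agrees with the hand-built dummy wherever occ1990 is observed
assert nurses2==nurses if !missing(occ1990)
```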

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, contents (freq mean inctot)

--------------------------------------------------

Occupation, 1990      |

basis                 |        Freq.  mean(inctot)

----------------------+---------------------------

Registered nurses |          966    40787.1677

Sociology instructors |            6   44363.33333

Lawyers |          441   99242.58277

--------------------------------------------------

. regress inctot nurses if occ1990==178| occ1990==95

Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  1,  1405) =  522.88

Model |  1.0346e+12     1  1.0346e+12           Prob > F      =  0.0000

Residual |  2.7800e+12  1405  1.9787e+09           R-squared     =  0.2712

Total |  3.8146e+12  1406  2.7131e+09           Root MSE      =   44482

------------------------------------------------------------------------------

inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

nurses |  -58455.42   2556.381   -22.87   0.000    -63470.15   -53440.68

_cons |   99242.58   2118.201    46.85   0.000     95087.41    103397.8

------------------------------------------------------------------------------

*nurses compared to lawyers.

. regress inctot lawyers if occ1990==178| occ1990==95

Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  1,  1405) =  522.88

Model |  1.0346e+12     1  1.0346e+12           Prob > F      =  0.0000

Residual |  2.7800e+12  1405  1.9787e+09           R-squared     =  0.2712

Total |  3.8146e+12  1406  2.7131e+09           Root MSE      =   44482

------------------------------------------------------------------------------

inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

lawyers |   58455.42   2556.381    22.87   0.000     53440.68    63470.15

_cons |   40787.17   1431.192    28.50   0.000     37979.66    43594.67

------------------------------------------------------------------------------

*lawyers compared to nurses.
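
* A regression on a single dummy is the same comparison as an equal-variance two-sample t test. A sketch, not run in this session: the difference in means should match the coefficients above in size, and the t statistic should match 22.87 in size, with the sign depending on which group ttest lists first.

```stata
* Two-sample t test of total income, lawyers versus nurses
ttest inctot if occ1990==178 | occ1990==95, by(lawyers)
```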

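* Without restricting the sample, we would get nurses compared to everyone else, which is not what we want in this case. The constant becomes the mean income of all non-nurses, and the nurses coefficient is the nurses' mean minus that.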

. regress inctot nurses

Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  1,103224) =  207.52

Model |  2.1289e+11     1  2.1289e+11           Prob > F      =  0.0000

Residual |  1.0590e+14103224  1.0259e+09           R-squared     =  0.0020

Total |  1.0611e+14103225  1.0279e+09           Root MSE      =   32029

------------------------------------------------------------------------------

inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

nurses |   14915.35   1035.387    14.41   0.000        12886    16944.69

_cons |   25871.82   100.1605   258.30   0.000     25675.51    26068.13

------------------------------------------------------------------------------
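
* To compare all three occupations in one model, a sketch along the same lines (not run in this session): include two of the three dummies and restrict the sample to the three occupations, so that the omitted occupation, lawyers, becomes the comparison category. With only 6 sociology instructors in the data, any contrast involving them will be very imprecise.

```stata
* Lawyers are the omitted category, so _cons should be the lawyers' mean from the table above
regress inctot nurses sociologists if inlist(occ1990, 95, 125, 178)

* Nurses versus sociology instructors, recovered from the same model
lincom nurses - sociologists
```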

. log close

name:  <unnamed>