-------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fal

> l_2010_s381_logs\class6.log

  log type:  text

 opened on:   7 Oct 2010, 14:08:36

 

. tabulate metro

 

  Metropolitan central city |

                     status |      Freq.     Percent        Cum.

----------------------------+-----------------------------------

           Not identifiable |        340        0.25        0.25

          Not in metro area |     29,658       22.18       22.44

               Central city |     32,481       24.29       46.73

       Outside central city |     51,468       38.49       85.22

Central city status unknown |     19,763       14.78      100.00

----------------------------+-----------------------------------

                      Total |    133,710      100.00

 

. tabulate metro, nolabel

 

Metropolita |

  n central |

city status |      Freq.     Percent        Cum.

------------+-----------------------------------

          0 |        340        0.25        0.25

          1 |     29,658       22.18       22.44

          2 |     32,481       24.29       46.73

          3 |     51,468       38.49       85.22

          4 |     19,763       14.78      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. codebook metro

 

-------------------------------------------------------------------------------

metro                                                   Metropolitan central city status

-------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  metrolbl

 

                 range:  [0,4]                        units:  1

         unique values:  5                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                           340         0  Not identifiable

                         29658         1  Not in metro area

                         32481         2  Central city

                         51468         3  Outside central city

                         19763         4  Central city status unknown

 

 

* codebook, and tabulate followed by tabulate, nolabel, are two ways of figuring out which numerical codes correspond to which actual categories.
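
* A third option, sketched here on the assumption that the value label attached to metro is named metrolbl (as the codebook output above reports), is to list the value label directly:

```stata
* list every numeric code and its text label for the value label metrolbl
label list metrolbl
```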

 

. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)

 

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

           Not identifiable |            94    31743.04255

          Not in metro area |         6,628     27189.6465

               Central city |         6,727    34445.35841

       Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

 

* As expected, "outside central city" (the suburbs) has the highest mean wage income, while rural "not in metro area" has the lowest.

 

. regress incwage metro if age>29 & age<65 & sex==1

 

      Source |       SS       df       MS              Number of obs =   29335

-------------+------------------------------           F(  1, 29333) =  400.17

       Model |  6.0248e+11     1  6.0248e+11           Prob > F      =  0.0000

    Residual |  4.4162e+13 29333  1.5055e+09           R-squared     =  0.0135

-------------+------------------------------           Adj R-squared =  0.0134

       Total |  4.4765e+13 29334  1.5260e+09           Root MSE      =   38801

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |   4512.636   225.5831    20.00   0.000     4070.483    4954.789

       _cons |   25359.28   598.1346    42.40   0.000     24186.91    26531.65

------------------------------------------------------------------------------

 

* Don't ever do this except by accident. Here we put metro into our regression as a numeric variable, which makes no sense, since the numbers of metro are just placeholders for the different categories. The numbers could be anything, so the slope on metro has no meaningful interpretation. If you can't think of what the units of the variable are, it probably should not be treated as a numeric variable in regression. So what do we need? We need dummy variables.
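
* A minimal sketch of why the numeric codes are arbitrary. The variable name metro_swap is hypothetical and was not part of this session. Swapping the codes of two categories changes nothing of substance in the data, yet the fitted slope would generally change:

```stata
* swap the placeholder codes for "not in metro area" (1) and "outside central city" (3);
* all other codes are copied unchanged into the new (hypothetical) variable
recode metro (1=3) (3=1), generate(metro_swap)

* same categories, different placeholder numbers: compare this slope to the one above
regress incwage metro_swap if age>29 & age<65 & sex==1
```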

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_0 omitted)

 

* metro has 5 levels, so Stata generated 4 dummy variables and omitted the first category, metro==0, or "not identifiable." Each indicator variable is coded 1 for its own category and 0 for all other categories. xi is built into Stata, and it works in both Stata version 10 and Stata version 11. When you run the xi command, you will see a new variable show up in your variable list for each new dummy variable.
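
* For comparison, a sketch of building the same kind of indicator variables by hand. The names d_central and metro_d are hypothetical; they are not created anywhere in this log.

```stata
* one indicator at a time: 1 if central city, 0 otherwise
* (safe here because the codebook shows metro has no missing values)
generate byte d_central = (metro == 2)

* or all five at once: creates metro_d1 through metro_d5, one per category of metro
tabulate metro, generate(metro_d)

* describe the indicator variables that xi created
describe _Imetro*
```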

 

. table metro, contents(mean  _Imetro_1 mean _Imetro_2 mean _Imetro_3 mean _Imetro_4)

 

----------------------------------------------------------------------------------------

Metropolitan central city   |

status                      |      __000002       __000003       __000004       __000005

----------------------------+-----------------------------------------------------------

           Not identifiable |             0              0              0              0

          Not in metro area |             1              0              0              0

               Central city |             0              1              0              0

       Outside central city |             0              0              1              0

Central city status unknown |             0              0              0              1

----------------------------------------------------------------------------------------

 

. char metro[omit] 1

 

* Unfortunately, with xi you need a separate command to change the omitted category for the variable metro. In this case we are changing the comparison category to metro==1, which are the rural folks.
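
* A short sketch of how the omit setting can be inspected, and later undone. The second command is only for when you want xi to return to its default; it was not run in this session.

```stata
* show the characteristics currently stored on metro, including omit
char list metro[]

* later, to go back to the default (lowest category omitted), clear the characteristic
char metro[omit]
```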

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_1 omitted)

 

 

. table metro, contents(mean  _Imetro_0 mean _Imetro_2 mean _Imetro_3 mean _Imetro_4)

 

----------------------------------------------------------------------------------------

Metropolitan central city   |

status                      |      __000002       __000003       __000004       __000005

----------------------------+-----------------------------------------------------------

           Not identifiable |             1              0              0              0

          Not in metro area |             0              0              0              0

               Central city |             0              1              0              0

       Outside central city |             0              0              1              0

Central city status unknown |             0              0              0              1

----------------------------------------------------------------------------------------

 

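* Because we made metro==1, "not in metro area," the omitted category above with the char command, now we get indicator variables for every category but that one. (The column headers __000002 through __000005 are temporary names; the columns are the means of _Imetro_0, _Imetro_2, _Imetro_3, and _Imetro_4, in that order.)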

 

. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)

 

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

           Not identifiable |            94    31743.04255

          Not in metro area |         6,628     27189.6465

               Central city |         6,727    34445.35841

       Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

 

. regress incwage _Imetro* if metro~=0 & age>29 & age<65 & sex==1

note: _Imetro_0 omitted because of collinearity

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |  (omitted)

   _Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

   _Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

   _Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

* So, a few things to note about the regression. First, we have 4 terms (including the constant) predicting 4 categories; we dropped "not identifiable" from our analysis, which is why _Imetro_0 is omitted for collinearity. With 4 terms predicting 4 group means, we can fit those means exactly. The constant equals the mean income for our excluded category, rural. Each of the other coefficients is the difference between that area's mean income and the rural mean income.
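
* The arithmetic can be checked from the printed output: 27189.65 + 7255.712 = 34445.36, which matches the central city mean in the table above, and 27189.65 + 16013.39 = 43203.04, which matches the outside central city mean. A sketch of the same check using the stored coefficients, assuming it is run immediately after the regression above:

```stata
* rural (omitted category) mean is the constant
display _b[_cons]

* constant plus a dummy coefficient gives that category's mean
display _b[_cons] + _b[_Imetro_2]
display _b[_cons] + _b[_Imetro_3]
display _b[_cons] + _b[_Imetro_4]
```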

 

 

. regress incwage _Imetro* if age>29 & age<65 & sex==1

 

      Source |       SS       df       MS              Number of obs =   29335

-------------+------------------------------           F(  4, 29330) =  190.17

       Model |  1.1316e+12     4  2.8291e+11           Prob > F      =  0.0000

    Residual |  4.3633e+13 29330  1.4877e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0251

       Total |  4.4765e+13 29334  1.5260e+09           Root MSE      =   38570

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |   4553.396   4006.316     1.14   0.256    -3299.164    12405.96

   _Imetro_2 |   7255.712   667.5305    10.87   0.000     5947.322    8564.102

   _Imetro_3 |   16013.39   593.5204    26.98   0.000     14850.06    17176.71

   _Imetro_4 |   8368.313   758.1121    11.04   0.000      6882.38    9854.247

       _cons |   27189.65   473.7616    57.39   0.000     26261.05    28118.24

------------------------------------------------------------------------------

 

* If we put the unimportant metro==0 "not identifiable" folks back into the regression, what changes? The N goes up and the fit statistics change a little. The constant and the other coefficients are unchanged, but all of the standard errors and t statistics change a little, because the new cases change the pooled residual variance (Root MSE) that every standard error is built from.

 

. lincom  _Imetro_3- _Imetro_2

 

 ( 1)  - _Imetro_2 + _Imetro_3 = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   8757.676   590.7312    14.83   0.000     7599.817    9915.536

------------------------------------------------------------------------------

 

* If we want to make a comparison between two of the other groups, lincom is one way. It uses the results of the previous regression, so this is the coefficient we would get for metro==3 if we had made metro==2 the comparison category.
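
* Another way to test the same comparison, sketched here and not run in this session, is the test command. It reports an F statistic with 1 numerator degree of freedom, which is the square of the lincom t statistic, so the p-value is the same; unlike lincom, it does not report the difference itself or a confidence interval.

```stata
* Wald test that the suburban and central city coefficients are equal,
* run after the same regression that lincom used
test _Imetro_3 = _Imetro_2
```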

 

. xi: regress incwage i.metro if age>29 & age<65 & sex==1

i.metro           _Imetro_0-4         (naturally coded; _Imetro_1 omitted)

 

      Source |       SS       df       MS              Number of obs =   29335

-------------+------------------------------           F(  4, 29330) =  190.17

       Model |  1.1316e+12     4  2.8291e+11           Prob > F      =  0.0000

    Residual |  4.3633e+13 29330  1.4877e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0251

       Total |  4.4765e+13 29334  1.5260e+09           Root MSE      =   38570

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |   4553.396   4006.316     1.14   0.256    -3299.164    12405.96

   _Imetro_2 |   7255.712   667.5305    10.87   0.000     5947.322    8564.102

   _Imetro_3 |   16013.39   593.5204    26.98   0.000     14850.06    17176.71

   _Imetro_4 |   8368.313   758.1121    11.04   0.000      6882.38    9854.247

       _cons |   27189.65   473.7616    57.39   0.000     26261.05    28118.24

------------------------------------------------------------------------------

 

* The xi: prefix combines the xi step and the regression step. Here we put an i. in front of every categorical variable for which we need dummy variables generated.
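
* A sketch of mixing categorical and continuous predictors under the xi: prefix. This model is only an illustration and was not run in this session: metro gets the i. prefix because it is categorical, while age is entered as is because it is continuous.

```stata
* i.metro is expanded into dummies; age enters as a single numeric predictor
xi: regress incwage i.metro age if age>29 & age<65 & sex==1
```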

 

. regress incwage i.metro if age>29 & age<65 & sex==1

 

      Source |       SS       df       MS              Number of obs =   29335

-------------+------------------------------           F(  4, 29330) =  190.17

       Model |  1.1316e+12     4  2.8291e+11           Prob > F      =  0.0000

    Residual |  4.3633e+13 29330  1.4877e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0251

       Total |  4.4765e+13 29334  1.5260e+09           Root MSE      =   38570

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          1  |  -4553.396   4006.316    -1.14   0.256    -12405.96    3299.164

          2  |   2702.316   4005.904     0.67   0.500    -5149.436    10554.07

          3  |   11459.99   3994.238     2.87   0.004     3631.107    19288.88

          4  |   3814.917    4021.99     0.95   0.343    -4068.363     11698.2

             |

       _cons |   31743.04   3978.206     7.98   0.000     23945.58     39540.5

------------------------------------------------------------------------------

 

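* The above is a similar syntax, which Stata calls factor variables, but without the xi: prefix. This syntax is only available starting with Stata 11. Note that factor variables do not use the char omit setting: the base category here is the lowest value, metro==0, so the constant is the "not identifiable" mean and the coefficients differ from the xi: regression above.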
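
* A sketch of two ways to make metro==1 the base category with factor variables, so that the coefficients line up with the xi: regression. Neither command was run in this session.

```stata
* set the base for a single command with the ib#. prefix
regress incwage ib1.metro if age>29 & age<65 & sex==1

* or store the base on the variable, so that plain i.metro uses it from now on
fvset base 1 metro
regress incwage i.metro if age>29 & age<65 & sex==1
```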

 

. regress incwage ib2.metro if age>29 & age<65 & sex==1

 

      Source |       SS       df       MS              Number of obs =   29335

-------------+------------------------------           F(  4, 29330) =  190.17

       Model |  1.1316e+12     4  2.8291e+11           Prob > F      =  0.0000

    Residual |  4.3633e+13 29330  1.4877e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0251

       Total |  4.4765e+13 29334  1.5260e+09           Root MSE      =   38570

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          0  |  -2702.316   4005.904    -0.67   0.500    -10554.07    5149.436

          1  |  -7255.712   667.5305   -10.87   0.000    -8564.102   -5947.322

          3  |   8757.676   590.7312    14.83   0.000     7599.817    9915.536

          4  |   1112.602   755.9304     1.47   0.141    -369.0558    2594.259

             |

       _cons |   34445.36   470.2626    73.25   0.000     33523.62    35367.09

------------------------------------------------------------------------------

 

* Like desmat (see below), factor variables (above; search fvvarlist in the Stata help) allow you to set the omitted category on the fly, which is nice. But factor variables (unlike xi and unlike desmat) do not create a new set of variables in your variable list, so later commands (say lincom) cannot refer to _Imetro-style variable names and must use the factor-variable level notation instead.
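
* A sketch, not run in this session, of referring to factor-variable coefficients after estimation by level rather than by variable name. After the ib2.metro regression, the difference between level 3 and level 1 should reproduce the suburban versus rural gap (8757.676 + 7255.712 = 16013.39 from the output above).

```stata
* refit with central city (metro==2) as the base category
regress incwage ib2.metro if age>29 & age<65 & sex==1

* compare "outside central city" (3) with "not in metro area" (1) using level notation
lincom 3.metro - 1.metro
```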

 

 

. desmat: regress incwage metro=ind(2) if age>29 & age<65 & sex==1

--------------------------------------------------------------------------------------

   Linear regression

--------------------------------------------------------------------------------------

   Dependent variable                                                         incwage

   Number of observations:                                                      29335

   F statistic:                                                               190.172

   Model degrees of freedom:                                                        4

   Residual degrees of freedom:                                                 29330

   R-squared:                                                                   0.025

   Adjusted R-squared:                                                          0.025

   Root MSE                                                                 38570.134

   Prob:                                                                        0.000

--------------------------------------------------------------------------------------

nr Effect                                                           Coeff        s.e.

--------------------------------------------------------------------------------------

   metro

1    Not identifiable                                            4553.396    4006.316

2    Central city                                                7255.712**   667.531

3    Outside central city                                       16013.388**   593.520

4    Central city status unknown                                 8368.313**   758.112

5  _cons                                                        27189.646**   473.762

--------------------------------------------------------------------------------------

*  p < .05

** p < .01

 

* Type findit desmat and follow the links to download. desmat assumes that predictor variables are always categorical and need to be made into dummies, unless you use the @ prefix to indicate that the predictor is continuous, so there is no i. prefix for the categorical metro. The ind(2) means that desmat will take the second category (which in this case is metro==1) and make it the omitted category.

 

. codebook metro

 

--------------------------------------------------------------------------------------

metro                                                 Metropolitan central city status

--------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  metrolbl

 

                 range:  [0,4]                        units:  1

         unique values:  5                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                           340         0  Not identifiable

                         29658         1  Not in metro area

                         32481         2  Central city

                         51468         3  Outside central city

                         19763         4  Central city status unknown

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_pr

> oj3\fall_2010_s381_logs\class6.log

  log type:  text

 closed on:   7 Oct 2010, 15:33:05

-------------------------------------------------------------------------------------------------