-----------------------------------------------------------------------------------

       log:  C:\AAA Miker Files\newer web pages\soc_meth_proj3\clas4_2009.log

  log type:  text

 opened on:  10 Feb 2009, 11:16:26

 

*Actually, this was class 5... So I renamed the log later.

 

. set mem 200m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.909M

    set memory          200M    max. data space                200.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                               203.163M

 

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

 

. tabulate metro

 

  Metropolitan central city |

                     status |      Freq.     Percent        Cum.

----------------------------+-----------------------------------

           Not identifiable |        340        0.25        0.25

          Not in metro area |     29,658       22.18       22.44

               Central city |     32,481       24.29       46.73

       Outside central city |     51,468       38.49       85.22

Central city status unknown |     19,763       14.78      100.00

----------------------------+-----------------------------------

                      Total |    133,710      100.00

 

. tabulate metro, nolab

 

Metropolita |

  n central |

city status |      Freq.     Percent        Cum.

------------+-----------------------------------

          0 |        340        0.25        0.25

          1 |     29,658       22.18       22.44

          2 |     32,481       24.29       46.73

          3 |     51,468       38.49       85.22

          4 |     19,763       14.78      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. table metro if age>29 & age<65 & sex==1, contents (mean incwage)

 

-------------------------------------------

Metropolitan central city   |

status                      | mean(incwage)

----------------------------+--------------

           Not identifiable |   31743.04255

          Not in metro area |    27189.6465

               Central city |   34445.35841

       Outside central city |    43203.0348

Central city status unknown |   35557.95997

-------------------------------------------

 

. *Suburbs (outside central city) have the highest mean wage income, rural (not in metro area) has the lowest, and central city is in between.

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_0 omitted)

 

. table metro, contents (mean _Imetro_1 mean _Imetro_2 mean _Imetro_3)

 

----------------------------------------------------------------------------

Metropolitan central city   |

status                      | mean(_Imetr~1)  mean(_Imetr~2)  mean(_Imetr~3)

----------------------------+-----------------------------------------------

           Not identifiable |              0               0               0

          Not in metro area |              1               0               0

               Central city |              0               1               0

       Outside central city |              0               0               1

Central city status unknown |              0               0               0

----------------------------------------------------------------------------

 

. *I want to change the comparison group for these dummy variables to rural, because the first category (Not identifiable) is basically empty: only 340 observations.

 

. char metro[omit] 1

 

*Change the omitted value to metro==1, i.e., rural.

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_1 omitted)

 

. regress incwage _Imetro* if age>29 & age<65 & sex==1 & metro~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |  (dropped)

   _Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

   _Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

   _Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

 

. *See my Excel file for an explanation of why this is the same as the result from our simple table.

 

. *For instance, for central city:

 

. display 27189+7255

34444

 

. *That's our central city average (34445.36 in the table above; the small gap comes from dropping the decimals).
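
*The same check can be done for every category without retyping rounded numbers, by using the stored coefficients. This is only a sketch, and it assumes the xi regression above is still the most recent estimation command.

```stata
* Each group mean = constant (the rural mean) + that group's dummy coefficient.
* Central city: the table above shows 34445.36
display _b[_cons] + _b[_Imetro_2]
* Outside central city (suburbs): the table above shows 43203.03
display _b[_cons] + _b[_Imetro_3]
* Central city status unknown: the table above shows 35557.96
display _b[_cons] + _b[_Imetro_4]
```

*The constant by itself, 27189.65, is the rural mean, because rural is the omitted category.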

 

. *Men in every other metro category make more than men in rural America; the t statistics are all significant.

 

. *What if we want to compare category 4 and category 2?

 

. lincom _Imetro_4-_Imetro_2

 

 ( 1) - _Imetro_2 + _Imetro_4 = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

------------------------------------------------------------------------------

 

. *The difference between category 4 and category 2 is about 1112 dollars in income, but it is not a significant difference (P = 0.141, and the confidence interval includes zero).
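
*Another way to ask the same question is the test command. A sketch, again assuming the xi regression is still the most recent estimation command:

```stata
* Tests the single restriction that the two coefficients are equal.
* With one restriction, the F statistic is the square of the lincom t
* (roughly 1.47 squared, about 2.16), and the P value is the same.
test _Imetro_4 = _Imetro_2
```

*lincom has the advantage of also reporting the size of the difference and its confidence interval; test reports only the F statistic and P value.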

 

. *One thing you definitely do not want to do is put the categorical variable straight into the regression:

 

. regress incwage metro if age>29 & age<65 & sex==1 & metro~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  1, 29239) =  400.78

       Model |  6.0432e+11     1  6.0432e+11           Prob > F      =  0.0000

    Residual |  4.4088e+13 29239  1.5078e+09           R-squared     =  0.0135

-------------+------------------------------           Adj R-squared =  0.0135

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38831

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |   4563.546   227.9541    20.02   0.000     4116.745    5010.346

       _cons |   25213.42    605.392    41.65   0.000     24026.83    26400.02

------------------------------------------------------------------------------

 

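. *Please don't do that: the metro codes are just labels for categories, so a single slope per one-unit change in the code has no meaningful interpretation.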
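
*To see how far off the straight-line fit is, compare its predictions with the group means from the table earlier in this log. The numbers typed in below are the rounded coefficients from the regression just above.

```stata
* Prediction = _cons + slope * metro code
* metro==1, rural: predicts about 29777; the actual mean is 27189.65
display 25213.42 + 4563.546*1
* metro==2, central city: predicts about 34341; the actual mean is 34445.36
display 25213.42 + 4563.546*2
* metro==3, suburbs: predicts about 38904; the actual mean is 43203.03
display 25213.42 + 4563.546*3
* metro==4, status unknown: predicts about 43468; the actual mean is 35557.96
display 25213.42 + 4563.546*4
```

*The line forces income to rise by the same amount at every step in the code, so it puts category 4 on top, when the suburbs (category 3) actually have the highest mean.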

 

. tabulate metro

 

  Metropolitan central city |

                     status |      Freq.     Percent        Cum.

----------------------------+-----------------------------------

           Not identifiable |        340        0.25        0.25

          Not in metro area |     29,658       22.18       22.44

               Central city |     32,481       24.29       46.73

       Outside central city |     51,468       38.49       85.22

Central city status unknown |     19,763       14.78      100.00

----------------------------+-----------------------------------

                      Total |    133,710      100.00

 

*Make my own set of dummy variables, with rural as the excluded category

 

. gen cent_city=0

 

. replace cent_city=1 if metro==2

(32481 real changes made)

 

. gen suburb=0

 

. replace suburb=1 if metro==3

(51468 real changes made)

 

. gen metro_cent_city_unkown=0

 

. replace  metro_cent_city_unkown=1 if metro==4

(19763 real changes made)
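
*A more compact way to build the same dummies is to generate each one from a logical expression, which evaluates to 1 when true and 0 when false. This is a sketch; the variable names ending in 2 are hypothetical, chosen so the hand-made versions are not overwritten.

```stata
* One line per dummy instead of gen followed by replace.
gen cent_city2 = (metro==2)
gen suburb2 = (metro==3)
gen unknown2 = (metro==4)
* assert stops with an error if any observation differs from the hand-made version.
assert cent_city2 == cent_city
assert suburb2 == suburb
assert unknown2 == metro_cent_city_unkown
```

*Like the gen and replace approach, this codes an observation as 0 if metro is missing. That does not matter here, since the tabulate total (133,710) shows that metro is never missing.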

 

. regress incwage  cent_city suburb metro_cent_city_unkown if age>29 & age<65 & sex==1 & metro~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   cent_city |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

      suburb |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

metro_cent~n |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

 

. *this has some relevance for HW2

 

. lincom  metro_cent_city_unkown- cent_city

 

 ( 1) - cent_city + metro_cent_city_unkown = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

------------------------------------------------------------------------------

 

. *let me change gears a bit, and talk about random subsets

 

. gen random=runiform()

 

. summarize random

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

      random |    133710    .5010981    .2889151   3.11e-06   .9999956

 

. histogram random

(bin=51, start=3.108e-06, width=.0196077)

 

. *OK, so the random value really does look uniform from 0 to 1: the mean is about .5, and the min and max are close to 0 and 1.
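
*The draws from runiform() change from session to session unless the seed is set first. A sketch of how to get the same random subset every time; the seed value is arbitrary and the variable name random2 is hypothetical.

```stata
* Setting the seed before drawing makes the random numbers reproducible.
set seed 20090210
gen random2 = runiform()
summarize random2
```

*Rerunning these lines with the same seed, on the same data in the same sort order, should give the same values of random2.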

 

. table sex if age>24 & age<35 & random<.05, contents (mean yrsed sd yrsed freq)

 

-------------------------------------------------

      Sex | mean(yrsed)    sd(yrsed)        Freq.

----------+--------------------------------------

     Male |    13.46273     3.079119          483

   Female |    13.47561     3.012099          451

-------------------------------------------------

 

. ttest yrsed if age>24 & age<35 & random<.05, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |     483    13.46273    .1401048    3.079119    13.18744    13.73802

  Female |     451    13.47561    .1418342    3.012098    13.19687    13.75435

---------+--------------------------------------------------------------------

combined |     934    13.46895    .0996458    3.045317    13.27339    13.66451

---------+--------------------------------------------------------------------

    diff |           -.0128768    .1995152               -.4044279    .3786742

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -0.0645

Ho: diff = 0                                     degrees of freedom =      932

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.4743         Pr(|T| > |t|) = 0.9486          Pr(T > t) = 0.5257

 

. *This random subset has hardly any difference in yrsed between men and women (about 0.013 years). The small difference that there is is nowhere near significant (two-sided P = 0.9486).
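
*Stata also has a sample command that draws a random subset directly. It drops the observations that are not sampled, so this sketch wraps it in preserve and restore to get the full dataset back afterwards.

```stata
* preserve saves a temporary copy of the data; restore brings it back.
preserve
keep if age>24 & age<35
* Keep a random 5 percent of the remaining observations.
sample 5
ttest yrsed, by(sex)
restore
```

*One difference from the random<.05 approach: sample 5 keeps a fixed 5 percent of the observations, while random<.05 keeps each observation with probability .05, so the size of that subset varies a little from draw to draw.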

 

. graph hbox yrsed if age>24 & age<35, over(sex)

 

. *The boxplot of yrsed by sex is not particularly informative, because the two sexes have essentially the same boxplot.

 

. log close

       log:  C:\AAA Miker Files\newer web pages\soc_meth_proj3\clas4_2009.log

  log type:  text

 closed on:  10 Feb 2009, 12:09:08

---------------------------------------------------------------------------------