-----------------------------------------------------------------------------------

       log:  C:\AAA Miker Files\newer web pages\soc_meth_proj3\section3_2009.log

  log type:  text

 opened on:  10 Feb 2009, 15:25:54

 

. set mem 200m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.909M

    set memory          200M    max. data space                200.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                               203.163M

 

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

 

. xi i.occ1990

--Break--

r(1);

 

*It might seem convenient to generate dummy variables for the entire set of occupational categories, but it is not a great idea: there are something like a thousand different occupational categories, so xi would add about a thousand dummy variables to your dataset, and you might run out of memory. The --Break-- above is xi being interrupted partway through. It is much better to first create a variable that holds only the 3 occupations you want, setting all others to missing, and then use that variable to generate the dummies.
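
*A minimal sketch of that approach. The name occ3 and the codes 95, 156, and 178 are placeholders made up for illustration; substitute the occ1990 codes of the three occupations you actually want.

```stata
* see how many distinct occupation codes there are before making dummies
codebook occ1990

* occ3 keeps only the three chosen codes; every other case is set to missing
generate occ3 = occ1990 if inlist(occ1990, 95, 156, 178)

* xi now creates dummies for three categories only, omitting one as the reference
xi i.occ3
```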

 

. memory

                                                  bytes

--------------------------------------------------------------------

Details of set memory usage

    overhead (pointers)                         534,840        0.26%

    data                                     45,327,690       21.61%

                                        ----------------------------

    data + overhead                          45,862,530       21.87%

    free                                    163,852,662       78.13%

                                        ----------------------------

    Total allocated                         209,715,192      100.00%

--------------------------------------------------------------------

Other memory usage

    set maxvar usage                          2,001,730

    set matsize usage                         1,315,200

    programs, saved results, etc.                76,822

                                        ---------------

    Total                                     3,393,752

-------------------------------------------------------

Grand total                                 213,108,944

 

. *xi of occ1990 was going to make about a thousand new dummy variables, which is not what we want or need. The dummies it created before the break are still in memory, so drop them.

 

. drop _Iocc*

 

. memory

                                                  bytes

--------------------------------------------------------------------

Details of set memory usage

    overhead (pointers)                         534,840        0.26%

    data                                     15,242,940        7.27%

                                        ----------------------------

    data + overhead                          15,777,780        7.52%

    free                                    193,937,412       92.48%

                                        ----------------------------

    Total allocated                         209,715,192      100.00%

--------------------------------------------------------------------

Other memory usage

    set maxvar usage                          2,001,730

    set matsize usage                         1,315,200

    programs, saved results, etc.                73,958

                                        ---------------

    Total                                     3,390,888

-------------------------------------------------------

Grand total                                 213,106,080

 

. tabulate metro

 

  Metropolitan central city |

                     status |      Freq.     Percent        Cum.

----------------------------+-----------------------------------

           Not identifiable |        340        0.25        0.25

          Not in metro area |     29,658       22.18       22.44

               Central city |     32,481       24.29       46.73

       Outside central city |     51,468       38.49       85.22

Central city status unknown |     19,763       14.78      100.00

----------------------------+-----------------------------------

                      Total |    133,710      100.00

 

. tabulate metro, nolab

 

Metropolita |

  n central |

city status |      Freq.     Percent        Cum.

------------+-----------------------------------

          0 |        340        0.25        0.25

          1 |     29,658       22.18       22.44

          2 |     32,481       24.29       46.73

          3 |     51,468       38.49       85.22

          4 |     19,763       14.78      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. char metro[omit] 1

 

*Redoing some things we did in class: making dummy variables for metro and putting them in the regression. The char metro[omit] 1 line above tells xi to omit metro==1 (Not in metro area).

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_1 omitted)

 

. regress incwage _Imetro* if age>29 & age<65 & sex==1

 

      Source |       SS       df       MS              Number of obs =   29335

-------------+------------------------------           F(  4, 29330) =  190.17

       Model |  1.1316e+12     4  2.8291e+11           Prob > F      =  0.0000

    Residual |  4.3633e+13 29330  1.4877e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0251

       Total |  4.4765e+13 29334  1.5260e+09           Root MSE      =   38570

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |   4553.396   4006.316     1.14   0.256    -3299.164    12405.96

   _Imetro_2 |   7255.712   667.5305    10.87   0.000     5947.322    8564.102

   _Imetro_3 |   16013.39   593.5204    26.98   0.000     14850.06    17176.71

   _Imetro_4 |   8368.313   758.1121    11.04   0.000      6882.38    9854.247

       _cons |   27189.65   473.7616    57.39   0.000     26261.05    28118.24

------------------------------------------------------------------------------

 

. *These are the results we got today in class.

 

. *metro==1 (Not in metro area, i.e. rural) is the excluded category here. Every coefficient is a difference from the rural mean.

 

. *The constant is the mean of the excluded category (27189.65), as the table below confirms.

 

. table metro if age>29 & age<65 & sex==1, contents (mean incwage)

 

-------------------------------------------

Metropolitan central city   |

status                      | mean(incwage)

----------------------------+--------------

           Not identifiable |   31743.04255

          Not in metro area |    27189.6465

               Central city |   34445.35841

       Outside central city |    43203.0348

Central city status unknown |   35557.95997

-------------------------------------------
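
*The same check can be done from the stored coefficients instead of by eye. A sketch, which reruns the unweighted regression from above so that its coefficients are the active results:

```stata
* rerun the unweighted regression (the _Imetro* dummies already exist from xi)
regress incwage _Imetro* if age>29 & age<65 & sex==1

* constant + coefficient = mean of that category (here, Outside central city)
display _b[_cons] + _b[_Imetro_3]

* lincom gives the same sum along with a standard error
lincom _cons + _Imetro_3
```

*The sum should reproduce the Outside central city mean in the table above (27189.65 + 16013.39 = 43203.04).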

 

. *One variation is to include the weights

 

. table metro if age>29 & age<65 & sex==1 [aweight= perwt_rounded], contents (mean incwage)

 

-------------------------------------------

Metropolitan central city   |

status                      | mean(incwage)

----------------------------+--------------

           Not identifiable |    32020.1697

          Not in metro area |   27344.17913

               Central city |    34517.6849

       Outside central city |   43963.55122

Central city status unknown |   35398.57026

-------------------------------------------

 

. table metro if age>29 & age<65 & sex==1 [aweight= perwt_rounded], contents (mean incwage sd incwage freq)

 

-------------------------------------------------------------------------

Metropolitan central city   |

status                      | mean(incwage)    sd(incwage)          Freq.

----------------------------+--------------------------------------------

           Not identifiable |    32020.1697       27352.47             94

          Not in metro area |   27344.17913       28233.76          6,628

               Central city |    34517.6849       38462.56          6,727

       Outside central city |   43963.55122       44645.15         11,639

Central city status unknown |   35398.57026       36143.29          4,247

-------------------------------------------------------------------------

 

.

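. *The thing about aweight is that it changes the means and standard deviations but not the N: the Freq. column is identical with and without the weights (compare the unweighted table below).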

 

. table metro if age>29 & age<65 & sex==1, contents (mean incwage sd incwage freq)

 

-------------------------------------------------------------------------

Metropolitan central city   |

status                      | mean(incwage)    sd(incwage)          Freq.

----------------------------+--------------------------------------------

           Not identifiable |   31743.04255       27474.74             94

          Not in metro area |    27189.6465       28299.05          6,628

               Central city |   34445.35841       38491.83          6,727

       Outside central city |    43203.0348       44057.68         11,639

Central city status unknown |   35557.95997       36639.06          4,247

-------------------------------------------------------------------------

 

. *In theory, the aweighted analysis is better, because the weights make the sample more representative of the population.

 

. table metro if age>29 & age<65 & sex==1 [aweight= perwt_rounded], contents (mean incwage sd incwage freq)

 

-------------------------------------------------------------------------

Metropolitan central city   |

status                      | mean(incwage)    sd(incwage)          Freq.

----------------------------+--------------------------------------------

           Not identifiable |    32020.1697       27352.47             94

          Not in metro area |   27344.17913       28233.76          6,628

               Central city |    34517.6849       38462.56          6,727

       Outside central city |   43963.55122       44645.15         11,639

Central city status unknown |   35398.57026       36143.29          4,247

-------------------------------------------------------------------------

 

. regress incwage _Imetro* if age>29 & age<65 & sex==1 [aweight= perwt_rounded]

(sum of wgt is   6.0783e+07)

 

      Source |       SS       df       MS              Number of obs =   29335

-------------+------------------------------           F(  4, 29330) =  191.94

       Model |  1.1913e+12     4  2.9784e+11           Prob > F      =  0.0000

    Residual |  4.5511e+13 29330  1.5517e+09           R-squared     =  0.0255

-------------+------------------------------           Adj R-squared =  0.0254

       Total |  4.6703e+13 29334  1.5921e+09           Root MSE      =   39392

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |   4675.991   4166.002     1.12   0.262    -3489.561    12841.54

   _Imetro_2 |   7173.506   713.0054    10.06   0.000     5775.983    8571.028

   _Imetro_3 |   16619.37   632.9456    26.26   0.000     15378.77    17859.97

   _Imetro_4 |   8054.391   819.5563     9.83   0.000     6448.024    9660.758

       _cons |   27344.18   529.7572    51.62   0.000     26305.83    28382.53

------------------------------------------------------------------------------

 

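. *With only dummy variables on the right-hand side, regression is really a comparison of mean values: the constant is the weighted mean of the excluded category (27344.18), and each coefficient is the difference between a category's mean and that one. For example, 27344.18 + 16619.37 = 43963.55, the weighted mean for Outside central city in the table above.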

 

*Rather than use xi, I prefer another dummy variable generator called desmat, a free add-on to Stata.

 

. ssc install desmat, replace

checking desmat consistency and verifying not already installed...

 

the following files will be replaced:

    c:\ado\stbplus\d\desmat.ado

 

installing into c:\ado\stbplus\...

installation complete.

 

. desmat: regress incwage metro=ind(2) if age>29 & age<65 & sex==1 [aweight= perwt_rounded]

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                                 29335

   aweight:                                                        perwt_rounded

   F statistic:                                                          191.942

   Model degrees of freedom:                                                   4

   Residual degrees of freedom:                                            29330

   R-squared:                                                              0.026

   Adjusted R-squared:                                                     0.025

   Root MSE                                                            39391.542

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

   metro

1    Not identifiable                                       4675.991    4166.002

2    Central city                                           7173.506**   713.005

3    Outside central city                                  16619.372**   632.946

4    Central city status unknown                            8054.391**   819.556

5  _cons                                                   27344.179**   529.757

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. *I used desmat to run the regression and make the dummy variables at the same time. The ind(2) tells desmat to use the second category of metro (metro==1, Not in metro area) as the excluded category, so the results match the weighted xi regression above.

 

. *desmat creates its own dummy variables, coding them _x_1, _x_2, etc.
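
*For comparison, xi can also be used as a prefix, so that the dummies are made and the regression is run in one line. A sketch of what should be the xi equivalent of the desmat command above:

```stata
* make metro==1 (Not in metro area) the omitted category for xi
char metro[omit] 1

* xi: expands i.metro into _Imetro_* dummies and then runs the regression
xi: regress incwage i.metro if age>29 & age<65 & sex==1 [aweight=perwt_rounded]
```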

 

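*How to calculate p values from a given t statistic? Let's say the t statistic is 2.5 and the N is about 1500. The first argument of ttail() is the degrees of freedom (N minus the number of estimated coefficients), which for a large N is close to N.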

 

. display ttail (1500, 2.5)

ttail not found

r(111);

 

. display ttail(1500, 2.5)

.0062627

 

. *How to generate P values from a T statistic

 

. *For a 2-tailed test, we want to double the p value that Stata gives us, because ttail() returns just the one-sided (upper) tail probability.

 

. display .0062627*2

.0125254

 

. *a little more than 1%, but less than 5%

 

. display 2*(ttail(20, 2.5))

.02123355

 

. *Note that ttail() is a function, and Stata functions do not allow a space between the function name and the open paren. With the space, Stata treats ttail as a variable name, cannot find it, and fails with r(111).
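
*The same calculation can be written so that the degrees of freedom come from N and the number of coefficients. A sketch for a do-file; the local names t, n, and k and the value k = 5 are made up for illustration:

```stata
* t statistic, number of observations, and number of estimated
* coefficients (including the constant); all three values are illustrative
local t = 2.5
local n = 1500
local k = 5

* two-tailed p value: double the upper-tail probability of |t|
display 2*ttail(`n' - `k', abs(`t'))

* right after a regress command, e(df_r) holds the residual degrees of freedom,
* so the p value for one coefficient can be recomputed like this:
* display 2*ttail(e(df_r), abs(_b[_Imetro_2]/_se[_Imetro_2]))
```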

 

. clear all

 

. exit, clear