-----------------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2018_logs\lastclass.log

  log type:  text

 opened on:   5 Dec 2018, 10:33:34

 

 

*In this class we look at what happens to a regression when we change its inputs: the predictors, their units, the reference category, the sample, and the weights. See also https://web.stanford.edu/~mrosenfe/soc_meth_proj3/soc_180B_regression_whatchanges.htm

 

 

. regress incwage male ib3.metro lawyers

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  6,103219) = 1311.98

       Model |  6.0852e+12     6  1.0142e+12           Prob > F      =  0.0000

    Residual |  7.9792e+13103219   773034198           R-squared     =  0.0709

-------------+------------------------------           Adj R-squared =  0.0708

       Total |  8.5877e+13103225   831940347           Root MSE      =   27803

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                        male |   12231.38   173.3196    70.57   0.000     11891.68    12571.09

                             |

                       metro |

           Not identifiable  |  -1997.594   1704.125    -1.17   0.241    -5337.656    1342.469

          Not in metro area  |  -7879.961   230.2617   -34.22   0.000    -8331.271   -7428.651

               Central city  |  -3375.647    224.791   -15.02   0.000    -3816.234   -2935.059

Central city status unknown  |  -3988.916   264.4686   -15.08   0.000    -4507.271   -3470.561

                             |

                     lawyers |   51195.58   1328.037    38.55   0.000     48592.64    53798.51

                       _cons |   16573.35   162.7125   101.86   0.000     16254.44    16892.27

----------------------------------------------------------------------------------------------

 

* Add yrsed (years of education) as a new predictor, and see how R-squared improves (from 0.0709 to 0.1800) and how all the other coefficients change:

 

. regress incwage male ib3.metro yrsed lawyers

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  7,103218) = 3235.73

       Model |  1.5454e+13     7  2.2077e+12           Prob > F      =  0.0000

    Residual |  7.0423e+13103218   682277869           R-squared     =  0.1800

-------------+------------------------------           Adj R-squared =  0.1799

       Total |  8.5877e+13103225   831940347           Root MSE      =   26120

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                        male |   12144.91   162.8297    74.59   0.000     11825.76    12464.05

                             |

                       metro |

           Not identifiable  |  -1735.335    1600.97    -1.08   0.278    -4873.214    1402.545

          Not in metro area  |   -6042.26   216.8909   -27.86   0.000    -6467.364   -5617.157

               Central city  |  -2266.333   211.3957   -10.72   0.000    -2680.666       -1852

Central city status unknown  |  -3106.574   248.5734   -12.50   0.000    -3593.774   -2619.373

                             |

                       yrsed |   3038.551   25.93063   117.18   0.000     2987.727    3089.374

                     lawyers |   38622.84   1252.251    30.84   0.000     36168.45    41077.24

                       _cons |  -22955.05   370.3499   -61.98   0.000    -23680.94   -22229.17

----------------------------------------------------------------------------------------------
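The log shows R-squared rising from 0.0709 to 0.1800 and the lawyers coefficient falling from 51,196 to 38,623 once yrsed enters the model. A minimal Python sketch on simulated data (every variable and number below is hypothetical, not CPS) illustrates both mechanisms: adding a predictor can never lower R-squared in OLS, and a coefficient shrinks when we add a correlated predictor it was partly standing in for (omitted-variable bias).

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept column prepended; returns (coefficients, R-squared)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical data: lawyers have about 5 more years of schooling than others.
lawyer = (rng.random(n) < 0.05).astype(float)
educ = 12 + 2 * rng.standard_normal(n) + 5 * lawyer
wage = 10_000 + 3_000 * educ + 20_000 * lawyer + 25_000 * rng.standard_normal(n)

b_short, r2_short = ols(wage, lawyer)                            # educ omitted
b_long, r2_long = ols(wage, np.column_stack([lawyer, educ]))     # educ included

print(f"R-squared without educ: {r2_short:.3f}, with educ: {r2_long:.3f}")
print(f"lawyer coefficient without educ: {b_short[1]:,.0f}, with educ: {b_long[1]:,.0f}")
```

Without educ, the lawyer dummy absorbs the lawyers' extra schooling (roughly 3,000 x 5 on top of the true 20,000), which is the same pattern as the lawyers coefficient in the log.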

 

 

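* What if we change the units of educational attainment from years to months (months_ed, i.e. 12 times yrsed)? The coefficient and its standard error are both divided by 12, so the t-stat stays the same. R-squared, the F-statistic, the constant, and all the other coefficients are unchanged.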

 

. regress incwage male ib3.metro months_ed lawyers

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  7,103218) = 3235.73

       Model |  1.5454e+13     7  2.2077e+12           Prob > F      =  0.0000

    Residual |  7.0423e+13103218   682277869           R-squared     =  0.1800

-------------+------------------------------           Adj R-squared =  0.1799

       Total |  8.5877e+13103225   831940347           Root MSE      =   26120

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                        male |   12144.91   162.8297    74.59   0.000     11825.76    12464.05

                             |

                       metro |

           Not identifiable  |  -1735.335    1600.97    -1.08   0.278    -4873.214    1402.545

          Not in metro area  |   -6042.26   216.8909   -27.86   0.000    -6467.364   -5617.157

               Central city  |  -2266.333   211.3957   -10.72   0.000    -2680.666       -1852

Central city status unknown  |  -3106.574   248.5734   -12.50   0.000    -3593.774   -2619.373

                             |

                   months_ed |   253.2126   2.160886   117.18   0.000     248.9773    257.4479

                     lawyers |   38622.84   1252.251    30.84   0.000     36168.45    41077.24

                       _cons |  -22955.05   370.3499   -61.98   0.000    -23680.94   -22229.17

----------------------------------------------------------------------------------------------
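A Python sketch on simulated data (hypothetical values, using NumPy) checks the unit-change rule: multiplying a predictor by 12 divides its coefficient and standard error by 12 and leaves the t-stat, the intercept, and R-squared untouched. The log agrees: 3038.551 / 12 = 253.2126 and 25.93063 / 12 = 2.160886.

```python
import numpy as np

def ols_table(y, X):
    """OLS with intercept; returns coefficients, standard errors, t-stats, R-squared."""
    X = np.column_stack([np.ones(len(y)), X])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                 # residual variance (Root MSE squared)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, beta / se, r2

rng = np.random.default_rng(1)
n = 5_000
yrsed = rng.integers(8, 21, n).astype(float)     # hypothetical years of schooling
months_ed = 12 * yrsed                           # the same information in months
wage = -20_000 + 3_000 * yrsed + 25_000 * rng.standard_normal(n)

b_y, se_y, t_y, r2_y = ols_table(wage, yrsed)
b_m, se_m, t_m, r2_m = ols_table(wage, months_ed)

print(b_y[1] / b_m[1], se_y[1] / se_m[1])        # both ratios are 12 (up to rounding)
print(t_y[1], t_m[1], r2_y, r2_m)                # identical t-stats and R-squared
```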

 

. regress incwage female ib3.metro months_ed lawyers

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  7,103218) = 3235.73

       Model |  1.5454e+13     7  2.2077e+12           Prob > F      =  0.0000

    Residual |  7.0423e+13103218   682277869           R-squared     =  0.1800

-------------+------------------------------           Adj R-squared =  0.1799

       Total |  8.5877e+13103225   831940347           Root MSE      =   26120

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -12144.91   162.8297   -74.59   0.000    -12464.05   -11825.76

                             |

                       metro |

           Not identifiable  |  -1735.335    1600.97    -1.08   0.278    -4873.214    1402.545

          Not in metro area  |   -6042.26   216.8909   -27.86   0.000    -6467.364   -5617.157

               Central city  |  -2266.333   211.3957   -10.72   0.000    -2680.666       -1852

Central city status unknown  |  -3106.574   248.5734   -12.50   0.000    -3593.774   -2619.373

                             |

                   months_ed |   253.2126   2.160886   117.18   0.000     248.9773    257.4479

                     lawyers |   38622.84   1252.251    30.84   0.000     36168.45    41077.24

                       _cons |  -10810.15   372.5182   -29.02   0.000    -11540.28   -10080.02

----------------------------------------------------------------------------------------------
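Swapping male for female only changes which gender is the reference group, assuming female = 1 - male for every case (which the identical fit statistics in the log suggest). The gender coefficient flips sign, the constant shifts by the old male coefficient (-22955.05 + 12144.91 = -10810.14, matching the log's -10810.15 up to display rounding), and everything else is unchanged. A hedged Python sketch on simulated data (hypothetical values) checks the same identities:

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept column prepended; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 5_000
male = (rng.random(n) < 0.5).astype(float)
female = 1 - male                                # complementary dummy
educ = rng.integers(8, 21, n).astype(float)
wage = -23_000 + 12_000 * male + 3_000 * educ + 25_000 * rng.standard_normal(n)

b_male = ols(wage, np.column_stack([male, educ]))
b_female = ols(wage, np.column_stack([female, educ]))

print(b_female[1], -b_male[1])                   # gender coefficient flips sign
print(b_female[0], b_male[0] + b_male[1])        # new constant = old constant + male coef
print(b_female[2], b_male[2])                    # educ coefficient unchanged

# The same constant check with the rounded numbers printed in the log.
cons_male, coef_male = -22955.05, 12144.91
print(round(cons_male + coef_male, 2))           # -10810.14
```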

 

 

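* If we look only at lawyers, the sample size drops drastically (from 103,226 to 441), and in this smaller sample every coefficient is different and the standard errors are much larger. (lawyers itself is left out of the model, since it equals 1 for everyone in this subsample.)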

 

 

. regress incwage female ib3.metro yrsed if lawyers==1

 

      Source |       SS       df       MS              Number of obs =     441

-------------+------------------------------           F(  5,   435) =    4.38

       Model |  1.0044e+11     5  2.0087e+10           Prob > F      =  0.0007

    Residual |  1.9964e+12   435  4.5894e+09           R-squared     =  0.0479

-------------+------------------------------           Adj R-squared =  0.0370

       Total |  2.0968e+12   440  4.7655e+09           Root MSE      =   67745

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -22645.57   7116.978    -3.18   0.002    -36633.51   -8657.626

                             |

                       metro |

          Not in metro area  |  -37248.55   12181.48    -3.06   0.002    -61190.43   -13306.68

               Central city  |  -963.7417   7048.969    -0.14   0.891    -14818.01    12890.53

Central city status unknown  |  -16951.57   13043.65    -1.30   0.194    -42587.98    8684.851

                             |

                       yrsed |   10454.45    7147.74     1.46   0.144    -3593.948    24502.85

                       _cons |  -91622.81   121461.7    -0.75   0.451    -330347.6      147102

----------------------------------------------------------------------------------------------

 

 

* Add in the nurses: the sample size goes up (from 441 to 1,407), and again every coefficient is different. In general, as the sample size goes up, standard errors (SEs) go down and t-stats go up.

 

. regress incwage female ib3.metro yrsed if lawyers==1| nurses==1

 

      Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  6,  1400) =   39.31

       Model |  4.2686e+11     6  7.1143e+10           Prob > F      =  0.0000

    Residual |  2.5338e+12  1400  1.8099e+09           R-squared     =  0.1442

-------------+------------------------------           Adj R-squared =  0.1405

       Total |  2.9607e+12  1406  2.1057e+09           Root MSE      =   42543

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -30191.21   2718.588   -11.11   0.000    -35524.15   -24858.26

                             |

                       metro |

           Not identifiable  |  -5228.588   21353.46    -0.24   0.807    -47116.82    36659.64

          Not in metro area  |  -13520.09    3272.99    -4.13   0.000    -19940.59   -7099.602

               Central city  |   1147.146   2874.773     0.40   0.690    -4492.181    6786.474

Central city status unknown  |  -8891.097    3445.64    -2.58   0.010    -15650.27   -2131.923

                             |

                       yrsed |   3315.386   807.7033     4.10   0.000     1730.947    4899.825

                       _cons |   21556.32   13766.42     1.57   0.118     -5448.71    48561.35

----------------------------------------------------------------------------------------------
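The general tendency follows from the textbook formula for the standard error of an OLS coefficient, where s is the Root MSE, s_{x_j} the standard deviation of predictor j, and R_j^2 the R-squared from regressing predictor j on the other predictors:

```latex
\widehat{\mathrm{SE}}(\hat\beta_j)
  = \sqrt{\frac{s^2}{\mathrm{SST}_j\,(1 - R_j^2)}}
  \approx \frac{s}{\sqrt{n}\; s_{x_j}\,\sqrt{1 - R_j^2}},
\qquad
\mathrm{SST}_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \approx n\, s_{x_j}^2,
\qquad
t_j = \frac{\hat\beta_j}{\widehat{\mathrm{SE}}(\hat\beta_j)}
```

With s and the predictors' spread held fixed, the SE falls like 1/sqrt(n). The lawyers-versus-lawyers-and-nurses comparison is only approximate, because adding the nurses also changes s (Root MSE falls from 67,745 to 42,543) and the mix of predictor values.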

 

. regress incwage female ib3.metro yrsed lawyers if lawyers==1| nurses==1

 

      Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  7,  1399) =   39.43

       Model |  4.8784e+11     7  6.9692e+10           Prob > F      =  0.0000

    Residual |  2.4728e+12  1399  1.7676e+09           R-squared     =  0.1648

-------------+------------------------------           Adj R-squared =  0.1606

       Total |  2.9607e+12  1406  2.1057e+09           Root MSE      =   42042

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -17643.27   3432.434    -5.14   0.000    -24376.54   -10909.99

                             |

                       metro |

           Not identifiable  |   -2555.09   21107.38    -0.12   0.904    -43960.61    38850.43

          Not in metro area  |  -11759.26    3248.38    -3.62   0.000    -18131.48   -5387.035

               Central city  |   -553.633     2855.7    -0.19   0.846    -6155.549    5048.283

Central city status unknown  |  -6600.401   3427.399    -1.93   0.054     -13323.8    122.9936

                             |

                       yrsed |   1893.814   834.0932     2.27   0.023     257.6055    3530.022

                     lawyers |   20572.89    3502.49     5.87   0.000     13702.19    27443.59

                       _cons |   28369.24   13653.96     2.08   0.038     1584.803    55153.68

----------------------------------------------------------------------------------------------

 

. regress incwage female ib3.metro yrsed lawyers

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  7,103218) = 3235.73

       Model |  1.5454e+13     7  2.2077e+12           Prob > F      =  0.0000

    Residual |  7.0423e+13103218   682277869           R-squared     =  0.1800

-------------+------------------------------           Adj R-squared =  0.1799

       Total |  8.5877e+13103225   831940347           Root MSE      =   26120

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -12144.91   162.8297   -74.59   0.000    -12464.05   -11825.76

                             |

                       metro |

           Not identifiable  |  -1735.335    1600.97    -1.08   0.278    -4873.214    1402.545

          Not in metro area  |   -6042.26   216.8909   -27.86   0.000    -6467.364   -5617.157

               Central city  |  -2266.333   211.3957   -10.72   0.000    -2680.666       -1852

Central city status unknown  |  -3106.574   248.5734   -12.50   0.000    -3593.774   -2619.373

                             |

                       yrsed |   3038.551   25.93063   117.18   0.000     2987.727    3089.374

                     lawyers |   38622.84   1252.251    30.84   0.000     36168.45    41077.24

                       _cons |  -10810.15   372.5182   -29.02   0.000    -11540.28   -10080.02

----------------------------------------------------------------------------------------------

 

. codebook metro

 

-----------------------------------------------------------------------------------------------------

metro                                                                Metropolitan central city status

-----------------------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  metrolbl

 

                 range:  [0,4]                        units:  1

         unique values:  5                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                           340         0  Not identifiable

                         29658         1  Not in metro area

                         32481         2  Central city

                         51468         3  Outside central city

                         19763         4  Central city status unknown

 

 

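* Change the reference (omitted) category for metro from 3 (Outside central city) to 2 (Central city). The metro coefficients and the constant change, because they are now contrasts with central-city residents; the female, yrsed, and lawyers coefficients, their SEs, and R-squared are unchanged.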

 

. regress incwage female ib2.metro yrsed lawyers

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  7,103218) = 3235.73

       Model |  1.5454e+13     7  2.2077e+12           Prob > F      =  0.0000

    Residual |  7.0423e+13103218   682277869           R-squared     =  0.1800

-------------+------------------------------           Adj R-squared =  0.1799

       Total |  8.5877e+13103225   831940347           Root MSE      =   26120

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -12144.91   162.8297   -74.59   0.000    -12464.05   -11825.76

                             |

                       metro |

           Not identifiable  |   530.9983   1604.149     0.33   0.741    -2613.114     3675.11

          Not in metro area  |  -3775.927    238.677   -15.82   0.000    -4243.731   -3308.124

       Outside central city  |   2266.333   211.3957    10.72   0.000         1852    2680.666

Central city status unknown  |  -840.2406   268.0711    -3.13   0.002    -1365.656   -314.8247

                             |

                       yrsed |   3038.551   25.93063   117.18   0.000     2987.727    3089.374

                     lawyers |   38622.84   1252.251    30.84   0.000     36168.45    41077.24

                       _cons |  -13076.48   377.9098   -34.60   0.000    -13817.18   -12335.78

----------------------------------------------------------------------------------------------
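Changing the base category is pure arithmetic on the old estimates: subtract the old coefficient of the new base (Central city, -2266.333) from every metro coefficient, and add it to the constant. A short Python check using only the coefficients printed in the ib3.metro log reproduces the ib2.metro output:

```python
# Coefficients from the ib3.metro model (base = 3, Outside central city).
old = {
    "Not identifiable": -1735.335,
    "Not in metro area": -6042.26,
    "Central city": -2266.333,
    "Central city status unknown": -3106.574,
    "Outside central city": 0.0,   # the base category's coefficient is implicitly zero
}
old_cons = -10810.15

new_base = "Central city"
shift = old[new_base]
# Re-express every category as a contrast with the new base.
new = {cat: coef - shift for cat, coef in old.items() if cat != new_base}
# The constant now describes the new base group.
new_cons = old_cons + shift

for cat, coef in new.items():
    print(f"{cat:28s} {coef:10.3f}")
print(f"{'_cons':28s} {new_cons:10.2f}")
```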

 

. gen random=runiform()

 

. summarize random

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

      random |    133710    .5010981    .2889151   3.11e-06   .9999956

 

 

* Reduce the sample size randomly by half. The SEs get larger and the t-stats smaller by a factor of roughly sqrt(2), about 1.41 (only roughly, because the half-sample is random).

 

. regress incwage female ib2.metro yrsed lawyers if random<=.5

 

      Source |       SS       df       MS              Number of obs =   51342

-------------+------------------------------           F(  7, 51334) = 1639.66

       Model |  7.9180e+12     7  1.1311e+12           Prob > F      =  0.0000

    Residual |  3.5413e+13 51334   689860495           R-squared     =  0.1827

-------------+------------------------------           Adj R-squared =  0.1826

       Total |  4.3331e+13 51341   843989298           Root MSE      =   26265

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -12486.96   232.0831   -53.80   0.000    -12941.85   -12032.08

                             |

                       metro |

           Not identifiable  |  -1205.238   2333.599    -0.52   0.606    -5779.116     3368.64

          Not in metro area  |  -3528.435   340.4361   -10.36   0.000    -4195.693   -2861.176

       Outside central city  |   2476.326   301.5399     8.21   0.000     1885.304    3067.347

Central city status unknown  |  -1075.994   382.9919    -2.81   0.005    -1826.662   -325.3264

                             |

                       yrsed |   3055.973   37.04056    82.50   0.000     2983.373    3128.573

                     lawyers |   41881.49   1786.683    23.44   0.000     38379.57    45383.41

                       _cons |  -13165.17   539.7285   -24.39   0.000    -14223.05    -12107.3

----------------------------------------------------------------------------------------------
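Using only numbers printed in the two regressions, a back-of-the-envelope Python check (assuming SE is proportional to Root MSE / sqrt(n) when the predictors' spread is similar) shows why the observed ratio is about 1.43 rather than exactly sqrt(2): the random half has 51,342 cases rather than exactly 51,613, and its Root MSE is slightly larger.

```python
import math

# Standard errors copied from the full-sample (ib2.metro) and random-half regressions above.
se_full = {"female": 162.8297, "yrsed": 25.93063, "lawyers": 1252.251, "_cons": 377.9098}
se_half = {"female": 232.0831, "yrsed": 37.04056, "lawyers": 1786.683, "_cons": 539.7285}
n_full, n_half = 103_226, 51_342
rmse_full, rmse_half = 26_120, 26_265

# Observed inflation of each standard error.
ratios = {name: se_half[name] / se_full[name] for name in se_full}

# Rough prediction: SE scales with Root MSE / sqrt(n).
predicted = (rmse_half / rmse_full) * math.sqrt(n_full / n_half)

for name, r in ratios.items():
    print(f"{name:8s} observed {r:.3f}   predicted {predicted:.3f}   sqrt(2) {math.sqrt(2):.3f}")
```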

 

. codebook union

 

-----------------------------------------------------------------------------------------------------

union                                                                                Union membership

-----------------------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  unionlbl

 

                 range:  [0,3]                        units:  1

         unique values:  4                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                        1.2e+05        0  NIU

                         11383         1  No union coverage

                          1883         2  Member of labor union

                           195         3  Covered by union but not a

                                          member

 

. gen byte new_union=1 if union==2| union==3

(131632 missing values generated)

 

. replace new_union=0 if union==1

(11383 real changes made)

 

. tabulate union new_union

 

                      |       new_union

     Union membership |         0          1 |     Total

----------------------+----------------------+----------

    No union coverage |    11,383          0 |    11,383

Member of labor union |         0      1,883 |     1,883

Covered by union but  |         0        195 |       195

----------------------+----------------------+----------

                Total |    11,383      2,078 |    13,461
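The two-step recode can be mirrored in a NumPy sketch, using NaN as a stand-in for Stata's missing value and the category counts from the codebook and the tabulation with missing values (row order is arbitrary here). It reproduces the log's 131,632 missing values after gen and the 11,383 changes made by replace:

```python
import numpy as np

# Category counts (0 = NIU, 1 = no union coverage, 2 = member,
# 3 = covered but not a member); row order does not matter.
union = np.repeat([0, 1, 2, 3], [120_249, 11_383, 1_883, 195])

# gen byte new_union=1 if union==2| union==3   (everything else starts missing)
new_union = np.where((union == 2) | (union == 3), 1.0, np.nan)
missing_after_gen = int(np.isnan(new_union).sum())

# replace new_union=0 if union==1
changes = int((union == 1).sum())
new_union[union == 1] = 0.0

print(missing_after_gen, changes)             # 131632 11383, as in the log
print(int(np.isnan(new_union).sum()))         # 120249 still missing (the NIU cases)
```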

 

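* new_union has a lot of missing values: it is missing for all 120,249 NIU (not in universe) cases. What if we used it as a predictor in the models? Because regress drops every observation that is missing any variable in the model, the sample size would be sharply reduced.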

 

 

. tabulate union new_union, miss

 

                      |            new_union

     Union membership |         0          1          . |     Total

----------------------+---------------------------------+----------

                  NIU |         0          0    120,249 |   120,249

    No union coverage |    11,383          0          0 |    11,383

Member of labor union |         0      1,883          0 |     1,883

Covered by union but  |         0        195          0 |       195

----------------------+---------------------------------+----------

                Total |    11,383      2,078    120,249 |   133,710

 

 

. regress incwage female ib2.metro yrsed lawyers i.new_union

 

      Source |       SS       df       MS              Number of obs =   13461

-------------+------------------------------           F(  8, 13452) =  435.79

       Model |  2.4371e+12     8  3.0464e+11           Prob > F      =  0.0000

    Residual |  9.4038e+12 13452   699064454           R-squared     =  0.2058

-------------+------------------------------           Adj R-squared =  0.2054

       Total |  1.1841e+13 13460   879714936           Root MSE      =   26440

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |   -14355.8   457.3965   -31.39   0.000    -15252.36   -13459.23

                             |

                       metro |

           Not identifiable  |  -3710.895   3885.242    -0.96   0.340    -11326.51    3904.724

          Not in metro area  |   -4151.12   679.8949    -6.11   0.000    -5483.809    -2818.43

       Outside central city  |   3518.947   592.6353     5.94   0.000     2357.298    4680.595

Central city status unknown  |  -710.1423   758.6761    -0.94   0.349    -2197.254    776.9695

                             |

                       yrsed |   3652.481   84.62179    43.16   0.000      3486.61    3818.351

                     lawyers |   40232.75    2879.16    13.97   0.000     34589.19     45876.3

                 1.new_union |   3882.035   633.6996     6.13   0.000     2639.895    5124.175

                       _cons |  -12776.82   1250.789   -10.22   0.000    -15228.54    -10325.1

----------------------------------------------------------------------------------------------
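regress uses only complete cases: an observation missing the outcome or any predictor is dropped (listwise deletion). Here 11,383 + 2,078 = 13,461, so every case with a non-missing new_union was used, and the estimates now describe that smaller population. The tiny hypothetical dataset below (not CPS) sketches the rule in NumPy:

```python
import numpy as np

def complete_cases(*columns):
    """Boolean mask of rows with no NaN in any column (listwise deletion)."""
    mask = np.ones(len(columns[0]), dtype=bool)
    for col in columns:
        mask &= ~np.isnan(col)
    return mask

# Hypothetical rows; NaN marks a missing value.
incwage   = np.array([30_000, 45_000, np.nan, 52_000, 18_000, 61_000])
yrsed     = np.array([12, 16, 14, np.nan, 10, 18], dtype=float)
new_union = np.array([np.nan, 1, 0, 0, np.nan, 1])

n_without_union = int(complete_cases(incwage, yrsed).sum())             # 4 usable rows
n_with_union = int(complete_cases(incwage, yrsed, new_union).sum())     # 2 usable rows
print(n_without_union, n_with_union)
```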

 

. regress incwage female ib2.metro yrsed lawyers

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  7,103218) = 3235.73

       Model |  1.5454e+13     7  2.2077e+12           Prob > F      =  0.0000

    Residual |  7.0423e+13103218   682277869           R-squared     =  0.1800

-------------+------------------------------           Adj R-squared =  0.1799

       Total |  8.5877e+13103225   831940347           Root MSE      =   26120

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -12144.91   162.8297   -74.59   0.000    -12464.05   -11825.76

                             |

                       metro |

           Not identifiable  |   530.9983   1604.149     0.33   0.741    -2613.114     3675.11

          Not in metro area  |  -3775.927    238.677   -15.82   0.000    -4243.731   -3308.124

       Outside central city  |   2266.333   211.3957    10.72   0.000         1852    2680.666

Central city status unknown  |  -840.2406   268.0711    -3.13   0.002    -1365.656   -314.8247

                             |

                       yrsed |   3038.551   25.93063   117.18   0.000     2987.727    3089.374

                     lawyers |   38622.84   1252.251    30.84   0.000     36168.45    41077.24

                       _cons |  -13076.48   377.9098   -34.60   0.000    -13817.18   -12335.78

----------------------------------------------------------------------------------------------

* aweights change the coefficients, SEs, and t-stats a little, but because the weights don't vary enormously in the CPS, the changes are minor. Note that aweights leave the sample size unchanged at 103,226.

 

 

 

. regress incwage female ib2.metro yrsed lawyers [aweight= perwt_rounded]

(sum of wgt is   2.1377e+08)

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  7,103218) = 3277.55

       Model |  1.6296e+13     7  2.3279e+12           Prob > F      =  0.0000

    Residual |  7.3312e+13103218   710267554           R-squared     =  0.1819

-------------+------------------------------           Adj R-squared =  0.1818

       Total |  8.9608e+13103225   868084070           Root MSE      =   26651

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -12335.87   166.0723   -74.28   0.000    -12661.37   -12010.37

                             |

                       metro |

           Not identifiable  |    1769.08   1681.643     1.05   0.293    -1526.918    5065.078

          Not in metro area  |  -3103.803    254.555   -12.19   0.000    -3602.728   -2604.879

       Outside central city  |   2506.564   211.2352    11.87   0.000     2092.545    2920.582

Central city status unknown  |  -679.0621   279.3252    -2.43   0.015    -1226.536   -131.5885

                             |

                       yrsed |   3218.374   27.16669   118.47   0.000     3165.128     3271.62

                     lawyers |   37575.29   1237.786    30.36   0.000     35149.24    40001.33

                       _cons |  -15674.09   397.9709   -39.39   0.000    -16454.11   -14894.07

----------------------------------------------------------------------------------------------
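Stata rescales analytic weights to sum to the number of observations, so the residual degrees of freedom stay at n - k and the reported sample size stays at 103,226. A NumPy sketch of weighted least squares under that convention (our own reimplementation under this assumption, not Stata's code; the data and weights are hypothetical) shows that only the relative weights matter: multiplying every weight by 10 leaves the coefficients and SEs unchanged.

```python
import numpy as np

def regress_aweight(y, X, w):
    """WLS with analytic weights rescaled to sum to n; returns (coefficients, SEs)."""
    X = np.column_stack([np.ones(len(y)), X])
    n, k = X.shape
    w = w * n / w.sum()                        # aweight normalization
    XtW = X.T * w                              # X'W without building an n-by-n matrix
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - X @ beta
    s2 = (w * resid**2).sum() / (n - k)        # residual df based on the n observations
    se = np.sqrt(s2 * np.diag(np.linalg.inv(XtW @ X)))
    return beta, se

rng = np.random.default_rng(3)
n = 2_000
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)
w = rng.integers(500, 4_000, n).astype(float)  # hypothetical person weights

beta1, se1 = regress_aweight(y, x, w)
beta2, se2 = regress_aweight(y, x, 10 * w)     # same weights on a different scale
print(beta1, se1)
print(beta2, se2)
```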

 

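* But if we use the weights as fweights instead of aweights, Stata treats each observation as perwt_rounded separate people, magnifying the sample size about 2,071 times (213,773,851 / 103,226). The coefficients are identical to the aweight results, but the SEs shrink and the t-stats grow by a factor of about sqrt(2071), roughly 45.5.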

 

 

. regress incwage female ib2.metro yrsed lawyers [fweight= perwt_rounded]

 

      Source |       SS       df       MS              Number of obs =213773851

-------------+------------------------------           F(  7,213773843) =       .

       Model |  3.3747e+16     7  4.8210e+15           Prob > F      =  0.0000

    Residual |  1.5182e+17213773843   710212535           R-squared     =  0.1819

-------------+------------------------------           Adj R-squared =  0.1819

       Total |  1.8557e+17213773850   868075664           Root MSE      =   26650

 

----------------------------------------------------------------------------------------------

                     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

                      female |  -12335.87     3.6492 -3380.43   0.000    -12343.02   -12328.72

                             |

                       metro |

           Not identifiable  |    1769.08   36.95168    47.88   0.000     1696.656    1841.504

          Not in metro area  |  -3103.803    5.59348  -554.90   0.000    -3114.766    -3092.84

       Outside central city  |   2506.564    4.64159   540.02   0.000     2497.466    2515.661

Central city status unknown  |  -679.0621   6.137768  -110.64   0.000     -691.092   -667.0323

                             |

                       yrsed |   3218.374   .5969488  5391.37   0.000     3217.204    3219.544

                     lawyers |   37575.29   27.19856  1381.52   0.000     37521.98    37628.59

                       _cons |  -15674.09   8.744839 -1792.38   0.000    -15691.23   -15656.95

----------------------------------------------------------------------------------------------
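Under the same assumptions, fweights give the same coefficients as aweights but count each row w times, so the residual df becomes sum(w) - k instead of n - k, and the SEs shrink by exactly sqrt((sum(w) - k) / (n - k)). The Python sketch below checks this on hypothetical data, including a brute-force version that literally duplicates rows, and then applies the formula to the log's numbers (k = 8 parameters, counting the constant):

```python
import math
import numpy as np

def weighted_fit(y, X, w, df):
    """Weighted least squares with an intercept; SEs use the given residual df."""
    X = np.column_stack([np.ones(len(y)), X])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - X @ beta
    s2 = (w * resid**2).sum() / df
    return beta, np.sqrt(s2 * np.diag(np.linalg.inv(XtW @ X)))

rng = np.random.default_rng(4)
n, k = 300, 2                                    # k = intercept + one slope
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)
w = rng.integers(1, 6, n).astype(float)          # hypothetical integer weights
W = w.sum()

# aweights: weights rescaled to sum to n, residual df n - k.
b_a, se_a = weighted_fit(y, x, w * n / W, n - k)
# fweights: each row counts w[i] times, residual df W - k.
b_f, se_f = weighted_fit(y, x, w, W - k)
# Brute force: duplicate each row w[i] times and run unweighted OLS.
reps = w.astype(int)
b_r, se_r = weighted_fit(np.repeat(y, reps), np.repeat(x, reps), np.ones(int(W)), W - k)

ratio = se_a / se_f                              # equals sqrt((W - k) / (n - k))

# Same formula with the log's numbers: sum of weights 213,773,851, n = 103,226, k = 8.
log_ratio = math.sqrt((213_773_851 - 8) / (103_226 - 8))
print(log_ratio, 166.0723 / 3.6492)              # both about 45.5
```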

 

. save "C:\Users\mexmi\Documents\current class files\intro soc methods\cps_mar_2000_new with additional vars.dta", replace

file C:\Users\mexmi\Documents\current class files\intro soc methods\cps_mar_2000_new with additional vars.dta saved

 

. exit