-----------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2018_logs\lastclass.log
log type: text
opened on: 5 Dec 2018, 10:33:34
* In this class we look at how regression results change when we change the inputs. See also https://web.stanford.edu/~mrosenfe/soc_meth_proj3/soc_180B_regression_whatchanges.htm
. regress incwage male ib3.metro lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 6,103219) = 1311.98
Model | 6.0852e+12 6 1.0142e+12 Prob > F = 0.0000
Residual | 7.9792e+13 103219 773034198 R-squared = 0.0709
-------------+------------------------------ Adj R-squared = 0.0708
Total | 8.5877e+13 103225 831940347 Root MSE = 27803
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
male | 12231.38 173.3196 70.57 0.000 11891.68 12571.09
|
metro |
Not identifiable | -1997.594 1704.125 -1.17 0.241 -5337.656 1342.469
Not in metro area | -7879.961 230.2617 -34.22 0.000 -8331.271 -7428.651
Central city | -3375.647 224.791 -15.02 0.000 -3816.234 -2935.059
Central city status unknown | -3988.916 264.4686 -15.08 0.000 -4507.271 -3470.561
|
lawyers | 51195.58 1328.037 38.55 0.000 48592.64 53798.51
_cons | 16573.35 162.7125 101.86 0.000 16254.44 16892.27
----------------------------------------------------------------------------------------------
* Add yrsed, a new predictor, and see how R-squared improves and all the other coefficients change:
. regress incwage male ib3.metro yrsed lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 7,103218) = 3235.73
Model | 1.5454e+13 7 2.2077e+12 Prob > F = 0.0000
Residual | 7.0423e+13 103218 682277869 R-squared = 0.1800
-------------+------------------------------ Adj R-squared = 0.1799
Total | 8.5877e+13 103225 831940347 Root MSE = 26120
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
male | 12144.91 162.8297 74.59 0.000 11825.76 12464.05
|
metro |
Not identifiable | -1735.335 1600.97 -1.08 0.278 -4873.214 1402.545
Not in metro area | -6042.26 216.8909 -27.86 0.000 -6467.364 -5617.157
Central city | -2266.333 211.3957 -10.72 0.000 -2680.666 -1852
Central city status unknown | -3106.574 248.5734 -12.50 0.000 -3593.774 -2619.373
|
yrsed | 3038.551 25.93063 117.18 0.000 2987.727 3089.374
lawyers | 38622.84 1252.251 30.84 0.000 36168.45 41077.24
_cons | -22955.05 370.3499 -61.98 0.000 -23680.94 -22229.17
----------------------------------------------------------------------------------------------
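* A minimal Python sketch (outside Stata) that recomputes R-squared and adjusted R-squared from the Model SS and Total SS shown in the two ANOVA tables above. It shows where the jump from 0.0709 to 0.1800 comes from. The SS inputs are Stata's rounded display values, so the results agree only to about four decimals.

```python
# Recompute R-squared and adjusted R-squared from the ANOVA tables in the log.
# SS values are copied as displayed (rounded), so expect ~4-decimal agreement.


def r_squared(model_ss: float, total_ss: float) -> float:
    """R-squared = Model SS / Total SS (equivalently 1 - Residual SS / Total SS)."""
    return model_ss / total_ss


def adj_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 103226
# Without yrsed: 6 predictors (male, 4 metro dummies, lawyers)
r2_before = r_squared(6.0852e12, 8.5877e13)
# With yrsed added: 7 predictors; Total SS is the same because y is unchanged
r2_after = r_squared(1.5454e13, 8.5877e13)

print(f"R2 before: {r2_before:.4f}, adj: {adj_r_squared(r2_before, n, 6):.4f}")
print(f"R2 after:  {r2_after:.4f}, adj: {adj_r_squared(r2_after, n, 7):.4f}")
```

Total SS is identical in both tables. The whole gain in R-squared therefore comes from the larger Model SS once yrsed is included.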
* What about changing the units of educational attainment from years to months? The coefficient and SE on education are both divided by 12, but the t-stat is unchanged, as are R-squared and all the other coefficients.
. regress incwage male ib3.metro months_ed lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 7,103218) = 3235.73
Model | 1.5454e+13 7 2.2077e+12 Prob > F = 0.0000
Residual | 7.0423e+13 103218 682277869 R-squared = 0.1800
-------------+------------------------------ Adj R-squared = 0.1799
Total | 8.5877e+13 103225 831940347 Root MSE = 26120
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
male | 12144.91 162.8297 74.59 0.000 11825.76 12464.05
|
metro |
Not identifiable | -1735.335 1600.97 -1.08 0.278 -4873.214 1402.545
Not in metro area | -6042.26 216.8909 -27.86 0.000 -6467.364 -5617.157
Central city | -2266.333 211.3957 -10.72 0.000 -2680.666 -1852
Central city status unknown | -3106.574 248.5734 -12.50 0.000 -3593.774 -2619.373
|
months_ed | 253.2126 2.160886 117.18 0.000 248.9773 257.4479
lawyers | 38622.84 1252.251 30.84 0.000 36168.45 41077.24
_cons | -22955.05 370.3499 -61.98 0.000 -23680.94 -22229.17
----------------------------------------------------------------------------------------------
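* A small Python check of the units change. It assumes months_ed = 12 * yrsed; the gen command is not shown in this log, but the coefficients are consistent with it. Dividing the yrsed row by 12 should reproduce the months_ed row, with an identical t-stat.

```python
# Rescaling a predictor by a constant c (years -> months, c = 12) divides its
# coefficient and standard error by c and leaves the t-statistic unchanged.
# Assumption: months_ed was created as 12 * yrsed.

YEARS_TO_MONTHS = 12

coef_years, se_years = 3038.551, 25.93063   # yrsed row from the log

coef_months = coef_years / YEARS_TO_MONTHS
se_months = se_years / YEARS_TO_MONTHS

t_years = coef_years / se_years
t_months = coef_months / se_months

print(f"months_ed coef ~ {coef_months:.4f}, SE ~ {se_months:.6f}")
print(f"t (years) = {t_years:.2f}, t (months) = {t_months:.2f}")
```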
. regress incwage female ib3.metro months_ed lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 7,103218) = 3235.73
Model | 1.5454e+13 7 2.2077e+12 Prob > F = 0.0000
Residual | 7.0423e+13 103218 682277869 R-squared = 0.1800
-------------+------------------------------ Adj R-squared = 0.1799
Total | 8.5877e+13 103225 831940347 Root MSE = 26120
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -12144.91 162.8297 -74.59 0.000 -12464.05 -11825.76
|
metro |
Not identifiable | -1735.335 1600.97 -1.08 0.278 -4873.214 1402.545
Not in metro area | -6042.26 216.8909 -27.86 0.000 -6467.364 -5617.157
Central city | -2266.333 211.3957 -10.72 0.000 -2680.666 -1852
Central city status unknown | -3106.574 248.5734 -12.50 0.000 -3593.774 -2619.373
|
months_ed | 253.2126 2.160886 117.18 0.000 248.9773 257.4479
lawyers | 38622.84 1252.251 30.84 0.000 36168.45 41077.24
_cons | -10810.15 372.5182 -29.02 0.000 -11540.28 -10080.02
----------------------------------------------------------------------------------------------
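* Why swapping male for female changes only the sign of that coefficient and the constant. A minimal Python sketch, assuming female = 1 - male for every case (consistent with the identical fit statistics in the two runs). Substituting male = 1 - female into y = b0 + b1*male + ... gives y = (b0 + b1) - b1*female + ...

```python
# Swap a 0/1 dummy for its complement (female = 1 - male).
# The slope flips sign and the constant absorbs the old slope.

b0_male_model = -22955.05   # _cons when male is the predictor
b1_male = 12144.91          # coefficient on male

b1_female = -b1_male
b0_female_model = b0_male_model + b1_male

print(f"female coef = {b1_female:.2f}")        # log shows -12144.91
print(f"constant    = {b0_female_model:.2f}")  # log shows -10810.15
```

The constant differs from the log in the last digit only because the inputs are Stata's rounded display values.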
* If we look only at lawyers, the sample size drops drastically (to 441), and in this smaller sample all the coefficients are different.
. regress incwage female ib3.metro yrsed if lawyers==1
Source | SS df MS Number of obs = 441
-------------+------------------------------ F( 5, 435) = 4.38
Model | 1.0044e+11 5 2.0087e+10 Prob > F = 0.0007
Residual | 1.9964e+12 435 4.5894e+09 R-squared = 0.0479
-------------+------------------------------ Adj R-squared = 0.0370
Total | 2.0968e+12 440 4.7655e+09 Root MSE = 67745
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -22645.57 7116.978 -3.18 0.002 -36633.51 -8657.626
|
metro |
Not in metro area | -37248.55 12181.48 -3.06 0.002 -61190.43 -13306.68
Central city | -963.7417 7048.969 -0.14 0.891 -14818.01 12890.53
Central city status unknown | -16951.57 13043.65 -1.30 0.194 -42587.98 8684.851
|
yrsed | 10454.45 7147.74 1.46 0.144 -3593.948 24502.85
_cons | -91622.81 121461.7 -0.75 0.451 -330347.6 147102
----------------------------------------------------------------------------------------------
* Add in the nurses: the sample size goes up, and again every coefficient is different. In general, as the sample size goes up, SEs go down and t-stats go up.
. regress incwage female ib3.metro yrsed if lawyers==1| nurses==1
Source | SS df MS Number of obs = 1407
-------------+------------------------------ F( 6, 1400) = 39.31
Model | 4.2686e+11 6 7.1143e+10 Prob > F = 0.0000
Residual | 2.5338e+12 1400 1.8099e+09 R-squared = 0.1442
-------------+------------------------------ Adj R-squared = 0.1405
Total | 2.9607e+12 1406 2.1057e+09 Root MSE = 42543
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -30191.21 2718.588 -11.11 0.000 -35524.15 -24858.26
|
metro |
Not identifiable | -5228.588 21353.46 -0.24 0.807 -47116.82 36659.64
Not in metro area | -13520.09 3272.99 -4.13 0.000 -19940.59 -7099.602
Central city | 1147.146 2874.773 0.40 0.690 -4492.181 6786.474
Central city status unknown | -8891.097 3445.64 -2.58 0.010 -15650.27 -2131.923
|
yrsed | 3315.386 807.7033 4.10 0.000 1730.947 4899.825
_cons | 21556.32 13766.42 1.57 0.118 -5448.71 48561.35
----------------------------------------------------------------------------------------------
. regress incwage female ib3.metro yrsed lawyers if lawyers==1| nurses==1
Source | SS df MS Number of obs = 1407
-------------+------------------------------ F( 7, 1399) = 39.43
Model | 4.8784e+11 7 6.9692e+10 Prob > F = 0.0000
Residual | 2.4728e+12 1399 1.7676e+09 R-squared = 0.1648
-------------+------------------------------ Adj R-squared = 0.1606
Total | 2.9607e+12 1406 2.1057e+09 Root MSE = 42042
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -17643.27 3432.434 -5.14 0.000 -24376.54 -10909.99
|
metro |
Not identifiable | -2555.09 21107.38 -0.12 0.904 -43960.61 38850.43
Not in metro area | -11759.26 3248.38 -3.62 0.000 -18131.48 -5387.035
Central city | -553.633 2855.7 -0.19 0.846 -6155.549 5048.283
Central city status unknown | -6600.401 3427.399 -1.93 0.054 -13323.8 122.9936
|
yrsed | 1893.814 834.0932 2.27 0.023 257.6055 3530.022
lawyers | 20572.89 3502.49 5.87 0.000 13702.19 27443.59
_cons | 28369.24 13653.96 2.08 0.038 1584.803 55153.68
----------------------------------------------------------------------------------------------
. regress incwage female ib3.metro yrsed lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 7,103218) = 3235.73
Model | 1.5454e+13 7 2.2077e+12 Prob > F = 0.0000
Residual | 7.0423e+13 103218 682277869 R-squared = 0.1800
-------------+------------------------------ Adj R-squared = 0.1799
Total | 8.5877e+13 103225 831940347 Root MSE = 26120
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -12144.91 162.8297 -74.59 0.000 -12464.05 -11825.76
|
metro |
Not identifiable | -1735.335 1600.97 -1.08 0.278 -4873.214 1402.545
Not in metro area | -6042.26 216.8909 -27.86 0.000 -6467.364 -5617.157
Central city | -2266.333 211.3957 -10.72 0.000 -2680.666 -1852
Central city status unknown | -3106.574 248.5734 -12.50 0.000 -3593.774 -2619.373
|
yrsed | 3038.551 25.93063 117.18 0.000 2987.727 3089.374
lawyers | 38622.84 1252.251 30.84 0.000 36168.45 41077.24
_cons | -10810.15 372.5182 -29.02 0.000 -11540.28 -10080.02
----------------------------------------------------------------------------------------------
. codebook metro
-----------------------------------------------------------------------------------------------------
metro Metropolitan central city status
-----------------------------------------------------------------------------------------------------
type: numeric (byte)
label: metrolbl
range: [0,4] units: 1
unique values: 5 missing .: 0/133710
tabulation: Freq. Numeric Label
340 0 Not identifiable
29658 1 Not in metro area
32481 2 Central city
51468 3 Outside central city
19763 4 Central city status unknown
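* Change the comparison (reference) category for metro from 3, Outside central city, to 2, Central city. Only the metro coefficients and the constant change; the fit and the other coefficients stay the same.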
. regress incwage female ib2.metro yrsed lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 7,103218) = 3235.73
Model | 1.5454e+13 7 2.2077e+12 Prob > F = 0.0000
Residual | 7.0423e+13 103218 682277869 R-squared = 0.1800
-------------+------------------------------ Adj R-squared = 0.1799
Total | 8.5877e+13 103225 831940347 Root MSE = 26120
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -12144.91 162.8297 -74.59 0.000 -12464.05 -11825.76
|
metro |
Not identifiable | 530.9983 1604.149 0.33 0.741 -2613.114 3675.11
Not in metro area | -3775.927 238.677 -15.82 0.000 -4243.731 -3308.124
Outside central city | 2266.333 211.3957 10.72 0.000 1852 2680.666
Central city status unknown | -840.2406 268.0711 -3.13 0.002 -1365.656 -314.8247
|
yrsed | 3038.551 25.93063 117.18 0.000 2987.727 3089.374
lawyers | 38622.84 1252.251 30.84 0.000 36168.45 41077.24
_cons | -13076.48 377.9098 -34.60 0.000 -13817.18 -12335.78
----------------------------------------------------------------------------------------------
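* A minimal Python sketch of the reference-category arithmetic, using the ib3.metro results from the earlier female model in this log. Each new coefficient is the old one minus the old coefficient of the new base (Central city). The constant shifts by that same amount.

```python
# Change the base category of a factor variable (ib3 -> ib2) by hand.

old = {  # base = 3, Outside central city
    "Not identifiable": -1735.335,
    "Not in metro area": -6042.26,
    "Central city": -2266.333,
    "Central city status unknown": -3106.574,
    "Outside central city": 0.0,  # omitted base category
}
old_cons = -10810.15

new_base = "Central city"
shift = old[new_base]

# Every category is now measured relative to Central city
new = {k: v - shift for k, v in old.items() if k != new_base}
new_cons = old_cons + shift

for label, coef in new.items():
    print(f"{label:30s} {coef:10.3f}")
print(f"{'_cons':30s} {new_cons:10.3f}")
```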
. gen random=runiform()
. summarize random
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
random | 133710 .5010981 .2889151 3.11e-06 .9999956
* Reduce the sample size randomly by about half. SEs get larger, and t-stats smaller, by roughly a factor of sqrt(2) (only roughly, because the subsample is random).
. regress incwage female ib2.metro yrsed lawyers if random<=.5
Source | SS df MS Number of obs = 51342
-------------+------------------------------ F( 7, 51334) = 1639.66
Model | 7.9180e+12 7 1.1311e+12 Prob > F = 0.0000
Residual | 3.5413e+13 51334 689860495 R-squared = 0.1827
-------------+------------------------------ Adj R-squared = 0.1826
Total | 4.3331e+13 51341 843989298 Root MSE = 26265
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -12486.96 232.0831 -53.80 0.000 -12941.85 -12032.08
|
metro |
Not identifiable | -1205.238 2333.599 -0.52 0.606 -5779.116 3368.64
Not in metro area | -3528.435 340.4361 -10.36 0.000 -4195.693 -2861.176
Outside central city | 2476.326 301.5399 8.21 0.000 1885.304 3067.347
Central city status unknown | -1075.994 382.9919 -2.81 0.005 -1826.662 -325.3264
|
yrsed | 3055.973 37.04056 82.50 0.000 2983.373 3128.573
lawyers | 41881.49 1786.683 23.44 0.000 38379.57 45383.41
_cons | -13165.17 539.7285 -24.39 0.000 -14223.05 -12107.3
----------------------------------------------------------------------------------------------
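* A quick Python check of the sqrt(2) claim, using SEs copied from the full-sample (ib2.metro) run and the random-half run above. The benchmark is sqrt(N_full / N_half), since the half sample is not exactly half.

```python
import math

# With about half the observations, SEs should grow by about sqrt(N_full / N_half).

n_full, n_half = 103226, 51342
expected_ratio = math.sqrt(n_full / n_half)

se_full = {"female": 162.8297, "yrsed": 25.93063, "lawyers": 1252.251}
se_half = {"female": 232.0831, "yrsed": 37.04056, "lawyers": 1786.683}

ratios = {k: se_half[k] / se_full[k] for k in se_full}

print(f"sqrt(N_full/N_half) = {expected_ratio:.3f}")
for name, r in ratios.items():
    print(f"{name:8s} SE ratio = {r:.3f}")
```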
. codebook union
-----------------------------------------------------------------------------------------------------
union Union membership
-----------------------------------------------------------------------------------------------------
type: numeric (byte)
label: unionlbl
range: [0,3] units: 1
unique values: 4 missing .: 0/133710
tabulation: Freq. Numeric Label
1.2e+05 0 NIU
11383 1 No union coverage
1883 2 Member of labor union
195 3 Covered by union but not a member
. gen byte new_union=1 if union==2| union==3
(131632 missing values generated)
. replace new_union=0 if union==1
(11383 real changes made)
. tabulate union new_union
| new_union
Union membership | 0 1 | Total
----------------------+----------------------+----------
No union coverage | 11,383 0 | 11,383
Member of labor union | 0 1,883 | 1,883
Covered by union but | 0 195 | 195
----------------------+----------------------+----------
Total | 11,383 2,078 | 13,461
* new_union is missing for the 120,249 NIU (not in universe) cases. What if we used it as a predictor in the models? We would get a sharply reduced sample size.
. tabulate union new_union, miss
| new_union
Union membership | 0 1 . | Total
----------------------+---------------------------------+----------
NIU | 0 0 120,249 | 120,249
No union coverage | 11,383 0 0 | 11,383
Member of labor union | 0 1,883 0 | 1,883
Covered by union but | 0 195 0 | 195
----------------------+---------------------------------+----------
Total | 11,383 2,078 120,249 | 133,710
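* A minimal Python sketch of the gen/replace recode, applied to the union frequencies shown in the tabulation. None stands in for Stata's missing value (.). Only cases with a non-missing new_union can enter a regression that uses it, which leaves at most 11,383 + 2,078 = 13,461 observations.

```python
# Reproduce the union -> new_union recode on the frequency table from the log.

# 0 = NIU, 1 = no union coverage, 2 = member, 3 = covered but not a member
union_freq = {0: 120249, 1: 11383, 2: 1883, 3: 195}


def recode_union(code):
    """2 or 3 -> 1, 1 -> 0, anything else (NIU) -> missing (None)."""
    if code in (2, 3):
        return 1
    if code == 1:
        return 0
    return None


new_union_freq = {}
for code, freq in union_freq.items():
    new = recode_union(code)
    new_union_freq[new] = new_union_freq.get(new, 0) + freq

print(new_union_freq)
```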
. regress incwage female ib2.metro yrsed lawyers i.new_union
Source | SS df MS Number of obs = 13461
-------------+------------------------------ F( 8, 13452) = 435.79
Model | 2.4371e+12 8 3.0464e+11 Prob > F = 0.0000
Residual | 9.4038e+12 13452 699064454 R-squared = 0.2058
-------------+------------------------------ Adj R-squared = 0.2054
Total | 1.1841e+13 13460 879714936 Root MSE = 26440
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -14355.8 457.3965 -31.39 0.000 -15252.36 -13459.23
|
metro |
Not identifiable | -3710.895 3885.242 -0.96 0.340 -11326.51 3904.724
Not in metro area | -4151.12 679.8949 -6.11 0.000 -5483.809 -2818.43
Outside central city | 3518.947 592.6353 5.94 0.000 2357.298 4680.595
Central city status unknown | -710.1423 758.6761 -0.94 0.349 -2197.254 776.9695
|
yrsed | 3652.481 84.62179 43.16 0.000 3486.61 3818.351
lawyers | 40232.75 2879.16 13.97 0.000 34589.19 45876.3
1.new_union | 3882.035 633.6996 6.13 0.000 2639.895 5124.175
_cons | -12776.82 1250.789 -10.22 0.000 -15228.54 -10325.1
----------------------------------------------------------------------------------------------
. regress incwage female ib2.metro yrsed lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 7,103218) = 3235.73
Model | 1.5454e+13 7 2.2077e+12 Prob > F = 0.0000
Residual | 7.0423e+13 103218 682277869 R-squared = 0.1800
-------------+------------------------------ Adj R-squared = 0.1799
Total | 8.5877e+13 103225 831940347 Root MSE = 26120
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -12144.91 162.8297 -74.59 0.000 -12464.05 -11825.76
|
metro |
Not identifiable | 530.9983 1604.149 0.33 0.741 -2613.114 3675.11
Not in metro area | -3775.927 238.677 -15.82 0.000 -4243.731 -3308.124
Outside central city | 2266.333 211.3957 10.72 0.000 1852 2680.666
Central city status unknown | -840.2406 268.0711 -3.13 0.002 -1365.656 -314.8247
|
yrsed | 3038.551 25.93063 117.18 0.000 2987.727 3089.374
lawyers | 38622.84 1252.251 30.84 0.000 36168.45 41077.24
_cons | -13076.48 377.9098 -34.60 0.000 -13817.18 -12335.78
----------------------------------------------------------------------------------------------
* aweights change the coefficients, SEs, and t-stats a little, but because the weights don’t vary enormously in the CPS, the changes are minor. Note that aweights keep the same sample size as before, 103K.
. regress incwage female ib2.metro yrsed lawyers [aweight= perwt_rounded]
(sum of wgt is 2.1377e+08)
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 7,103218) = 3277.55
Model | 1.6296e+13 7 2.3279e+12 Prob > F = 0.0000
Residual | 7.3312e+13 103218 710267554 R-squared = 0.1819
-------------+------------------------------ Adj R-squared = 0.1818
Total | 8.9608e+13 103225 868084070 Root MSE = 26651
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -12335.87 166.0723 -74.28 0.000 -12661.37 -12010.37
|
metro |
Not identifiable | 1769.08 1681.643 1.05 0.293 -1526.918 5065.078
Not in metro area | -3103.803 254.555 -12.19 0.000 -3602.728 -2604.879
Outside central city | 2506.564 211.2352 11.87 0.000 2092.545 2920.582
Central city status unknown | -679.0621 279.3252 -2.43 0.015 -1226.536 -131.5885
|
yrsed | 3218.374 27.16669 118.47 0.000 3165.128 3271.62
lawyers | 37575.29 1237.786 30.36 0.000 35149.24 40001.33
_cons | -15674.09 397.9709 -39.39 0.000 -16454.11 -14894.07
----------------------------------------------------------------------------------------------
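* Why aweights leave N at 103,226: Stata rescales analytic weights so they sum to the number of observations, which keeps the residual df at N - 8. The Python sketch below uses four hypothetical weights, not real perwt_rounded values. It shows the rescaling and then computes the average raw weight implied by the "sum of wgt" line above.

```python
# Rescale analytic weights so they sum to N; relative weights are preserved.


def normalize_aweights(weights):
    """Multiply each weight by N / sum(w) so the rescaled weights sum to N."""
    total = sum(weights)
    n = len(weights)
    return [w * n / total for w in weights]


hypothetical_perwt = [1500, 2000, 2500, 2300]  # hypothetical values, illustration only
scaled = normalize_aweights(hypothetical_perwt)
print(scaled, sum(scaled))

# From the log: 103226 records, sum of weights 2.1377e+08
n_obs, sum_wgt = 103226, 2.1377e8
print(f"average raw weight ~ {sum_wgt / n_obs:.0f}")
```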
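* But if we use the weights as fweights instead of aweights, Stata treats each record as perwt_rounded copies of itself. That magnifies the sample size about 2,070 times (213,773,851 / 103,226). The coefficients are the same as with aweights, but the SEs shrink, and the t-stats grow, by a factor of about sqrt(2070), approximately 45.5.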
. regress incwage female ib2.metro yrsed lawyers [fweight= perwt_rounded]
Source | SS df MS Number of obs = 213773851
-------------+------------------------------ F( 7,213773843) = .
Model | 3.3747e+16 7 4.8210e+15 Prob > F = 0.0000
Residual | 1.5182e+17 213773843 710212535 R-squared = 0.1819
-------------+------------------------------ Adj R-squared = 0.1819
Total | 1.8557e+17 213773850 868075664 Root MSE = 26650
----------------------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
female | -12335.87 3.6492 -3380.43 0.000 -12343.02 -12328.72
|
metro |
Not identifiable | 1769.08 36.95168 47.88 0.000 1696.656 1841.504
Not in metro area | -3103.803 5.59348 -554.90 0.000 -3114.766 -3092.84
Outside central city | 2506.564 4.64159 540.02 0.000 2497.466 2515.661
Central city status unknown | -679.0621 6.137768 -110.64 0.000 -691.092 -667.0323
|
yrsed | 3218.374 .5969488 5391.37 0.000 3217.204 3219.544
lawyers | 37575.29 27.19856 1381.52 0.000 37521.98 37628.59
_cons | -15674.09 8.744839 -1792.38 0.000 -15691.23 -15656.95
----------------------------------------------------------------------------------------------
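* A Python check of the fweight scaling, using SEs copied from the aweight and fweight runs above. If fweights only inflate N, each SE ratio should be close to sqrt(sum of weights / N).

```python
import math

# fweights make Stata's N equal to the sum of the weights, so SEs shrink
# by about sqrt(sum_w / N) relative to the aweight run.

n_records = 103226
n_fweighted = 213773851  # "Number of obs" under fweights
factor = math.sqrt(n_fweighted / n_records)

se_aweight = {"female": 166.0723, "yrsed": 27.16669, "lawyers": 1237.786}
se_fweight = {"female": 3.6492, "yrsed": 0.5969488, "lawyers": 27.19856}

print(f"sum_w / N = {n_fweighted / n_records:.1f}, sqrt = {factor:.2f}")
for name in se_aweight:
    print(f"{name:8s} SE ratio = {se_aweight[name] / se_fweight[name]:.2f}")
```

The precision under fweights is spurious here. The CPS weights represent population counts, not repeated independent observations.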
. save "C:\Users\mexmi\Documents\current class files\intro soc methods\cps_mar_2000_new with additional vars.dta", replace
file C:\Users\mexmi\Documents\current class files\intro soc methods\cps_mar_2000_new with additional vars.dta saved
. exit