--------------------------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2011_381_logs\class8.log

log type:  text

opened on:  20 Oct 2011, 13:23:48

. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

* Take a look at the Excel worksheet pages on regression additivity and regression linearity.

. table yrsed sex if age>=30 & age<=39 & incwage>0, contents(freq mean incwage)

------------------------------------
based on  |           Sex
educrec   |        Male       Female
----------+-------------------------
        0 |          21           12
          | 17776.19048  14534.33333
          |
      2.5 |          94           39
          | 18555.43617  10650.89744
          |
      6.5 |         326          204
          | 20013.61963  12691.59314
          |
        9 |         173          101
          |  18950.9422  10504.71287
          |
       10 |         180          151
          | 22419.21111  11830.55629
          |
       11 |         235          178
          | 22384.86383  13230.02247
          |
       12 |       3,082        2,510
          |  31565.6486  18713.77171
          |
       14 |       2,269        2,380
          |  37670.1353  22863.12983
          |
       17 |       2,506        2,281
          | 59410.84158  37053.81149
------------------------------------

. table yrsed sex if age>=30 & age<=39 & incwage>0, contents(freq mean incwage p25 incwage median incwage p75 incwage)

------------------------------------
based on  |           Sex
educrec   |        Male       Female
----------+-------------------------
        0 |          21           12
          | 17776.19048  14534.33333
          |       12480         7500
          |       15000        12340
          |       22000        14500
          |
      2.5 |          94           39
          | 18555.43617  10650.89744
          |       12000         7600
          |       16000        10000
          |       23000        14820
          |
      6.5 |         326          204
          | 20013.61963  12691.59314
          |       12000         6450
          |       16950        11930
          |       25000        15220
          |
        9 |         173          101
          |  18950.9422  10504.71287
          |       12000         4300
          |       17000        10000
          |       24000        14000
          |
       10 |         180          151
          | 22419.21111  11830.55629
          |       12740         5600
          |       20000        10404
          |       29000        16000
          |
       11 |         235          178
          | 22384.86383  13230.02247
          |       13000         6000
          |       20000        11880
          |       30000        18000
          |
       12 |       3,082        2,510
          |  31565.6486  18713.77171
          |       18200        10000
          |       28321        16900
          |       40000        25000
          |
       14 |       2,269        2,380
          |  37670.1353  22863.12983
          |       23777        11000
          |       34000        20000
          |       46000        30000
          |
       17 |       2,506        2,281
          | 59410.84158  37053.81149
          |       32300        20000
          |       50000        32875
          |       70000        47500
------------------------------------

. regress incwage yrsed male if age>=30 & age<=39 & incwage>0

      Source |       SS       df       MS              Number of obs =   16742
-------------+------------------------------           F(  2, 16739) = 1895.13
       Model |  2.6827e+12     2  1.3413e+12           Prob > F      =  0.0000
    Residual |  1.1848e+13 16739   707778375           R-squared     =  0.1846
       Total |  1.4530e+13 16741   867939018           Root MSE      =   26604
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       yrsed |   3636.148   73.18493    49.68   0.000     3492.698    3779.598
        male |   16014.47   412.5284    38.82   0.000     15205.88    16823.07
       _cons |  -25264.87   1050.082   -24.06   0.000    -27323.14    -23206.6
------------------------------------------------------------------------------

. * We will call this Model 1.

. predict linear_M1

(option xb assumed; fitted values)

(30484 missing values generated)

. table yrsed sex if age>=30 & age<=39 & incwage>0, contents(freq mean incwage mean linear_M1)

------------------------------------
based on  |           Sex
educrec   |        Male       Female
----------+-------------------------
        0 |          21           12
          | 17776.19048  14534.33333
          |   -9250.396    -25264.87
          |
      2.5 |          94           39
          | 18555.43617  10650.89744
          |   -160.0259     -16174.5
          |
      6.5 |         326          204
          | 20013.61963  12691.59314
          |    14384.57    -1629.909
          |
        9 |         173          101
          |  18950.9422  10504.71287
          |    23474.94      7460.46
          |
       10 |         180          151
          | 22419.21111  11830.55629
          |    27111.08     11096.61
          |
       11 |         235          178
          | 22384.86383  13230.02247
          |    30747.23     14732.76
          |
       12 |       3,082        2,510
          |  31565.6486  18713.77171
          |    34383.38      18368.9
          |
       14 |       2,269        2,380
          |  37670.1353  22863.12983
          |    41655.68      25641.2
          |
       17 |       2,506        2,281
          | 59410.84158  37053.81149
          |    52564.12     36549.64
------------------------------------
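
* A minimal sketch of where the linear_M1 values in the table above come from: each fitted value is the constant, plus yrsed times its coefficient, plus the male coefficient for men. The numbers are copied from the Model 1 output; the _b[] line assumes Model 1 is still the most recent regression in memory.

```stata
* By hand, women with 12 years of education (the table shows 18368.9):
display -25264.87 + 3636.148*12
* By hand, men with 12 years of education (the table shows 34383.38):
display -25264.87 + 3636.148*12 + 16014.47
* The same calculation from the stored coefficients, valid only right after Model 1:
display _b[_cons] + _b[yrsed]*12 + _b[male]
```

* Because Model 1 is linear in yrsed, it predicts negative wages for women below about 7 years of education (25264.87/3636.148 is roughly 6.9), even though the observed mean wages in those rows are positive.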

. gen byte HS=0

. replace HS=1 if yrsed==12

. gen byte Assoc=0

. replace Assoc=1 if yrsed==14

. gen byte BA_plus=0

. replace BA_plus=1 if yrsed==17
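
* A hedged alternative for building the same dummies. Starting from gen ...=0 and then using replace also codes anyone with missing yrsed as 0. That does not affect the regressions in this log, since the models with and without yrsed both use 16742 cases, but a one-line version that keeps missing values missing might look like the sketch below. The _alt variable names are hypothetical, chosen so that nothing created above is overwritten.

```stata
* Each dummy is 1 if the condition holds, 0 if not, and missing if yrsed is missing
gen byte HS_alt      = (yrsed==12) if !missing(yrsed)
gen byte Assoc_alt   = (yrsed==14) if !missing(yrsed)
gen byte BA_plus_alt = (yrsed==17) if !missing(yrsed)
* Check the coding against the original variable, showing missing values too
tabulate yrsed HS_alt, missing
```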

. regress incwage HS Assoc BA_plus male if age>=30 & age<=39 & incwage>0

      Source |       SS       df       MS              Number of obs =   16742
-------------+------------------------------           F(  4, 16737) = 1071.59
       Model |  2.9625e+12     4  7.4062e+11           Prob > F      =  0.0000
    Residual |  1.1568e+13 16737   691144260           R-squared     =  0.2039
       Total |  1.4530e+13 16741   867939018           Root MSE      =   26290
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          HS |   9300.967   726.1082    12.81   0.000     7877.718    10724.22
       Assoc |   14583.65   744.3114    19.59   0.000     13124.72    16042.58
     BA_plus |   32695.45   740.6737    44.14   0.000     31243.65    34147.25
        male |   15691.06    408.079    38.45   0.000     14891.18    16490.94
       _cons |   7848.007   680.6283    11.53   0.000     6513.903     9182.11
------------------------------------------------------------------------------

. predict M2_nonlinear

(option xb assumed; fitted values)

. table yrsed sex if age>=30 & age<=39 & incwage>0, contents(freq mean incwage mean M2_nonlinear)

------------------------------------
based on  |           Sex
educrec   |        Male       Female
----------+-------------------------
        0 |          21           12
          | 17776.19048  14534.33333
          |    23539.06     7848.006
          |
      2.5 |          94           39
          | 18555.43617  10650.89744
          |    23539.06     7848.006
          |
      6.5 |         326          204
          | 20013.61963  12691.59314
          |    23539.06     7848.006
          |
        9 |         173          101
          |  18950.9422  10504.71287
          |    23539.06     7848.006
          |
       10 |         180          151
          | 22419.21111  11830.55629
          |    23539.06     7848.006
          |
       11 |         235          178
          | 22384.86383  13230.02247
          |    23539.06     7848.006
          |
       12 |       3,082        2,510
          |  31565.6486  18713.77171
          |    32840.03     17148.97
          |
       14 |       2,269        2,380
          |  37670.1353  22863.12983
          |    38122.71     22431.66
          |
       17 |       2,506        2,281
          | 59410.84158  37053.81149
          |    56234.51     40543.46
------------------------------------
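
* As a check on the table above, the M2_nonlinear values can be rebuilt from the coefficients of the regression on HS, Assoc, BA_plus, and male: everyone below 12 years of education shares the constant, and each dummy adds its own step. A sketch using numbers copied from that output:

```stata
* Women with less than high school: the constant alone (the table shows 7848.006)
display 7848.007
* Men with less than high school (the table shows 23539.06):
display 7848.007 + 15691.06
* Women with a BA or more (the table shows 40543.46):
display 7848.007 + 32695.45
* Men with a high school degree (the table shows 32840.03):
display 7848.007 + 9300.967 + 15691.06
```

* None of these fitted values is negative, unlike the linear model's. The gap between men and women is forced to be the same 15691.06 at every education level, which is the additivity assumption at work.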

. table educrec, contents(mean yrsed)

-------------------------------------
Educational attainment  |
recode                  | mean(yrsed)
------------------------+------------
                    NIU |
      None or preschool |           0
   Grades 1, 2, 3, or 4 |         2.5
   Grades 5, 6, 7, or 8 |         6.5
1 to 3 years of college |          14
    4+ years of college |          17
-------------------------------------

* And now a brief look at what changes and what doesn't change in regression when we change the inputs.

. regress incwage yrsed male if age>=30 & age<=39 & incwage>0

      Source |       SS       df       MS              Number of obs =   16742
-------------+------------------------------           F(  2, 16739) = 1895.13
       Model |  2.6827e+12     2  1.3413e+12           Prob > F      =  0.0000
    Residual |  1.1848e+13 16739   707778375           R-squared     =  0.1846
       Total |  1.4530e+13 16741   867939018           Root MSE      =   26604
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       yrsed |   3636.148   73.18493    49.68   0.000     3492.698    3779.598
        male |   16014.47   412.5284    38.82   0.000     15205.88    16823.07
       _cons |  -25264.87   1050.082   -24.06   0.000    -27323.14    -23206.6
------------------------------------------------------------------------------

. regress incwage yrsed female if age>=30 & age<=39 & incwage>0

      Source |       SS       df       MS              Number of obs =   16742
-------------+------------------------------           F(  2, 16739) = 1895.13
       Model |  2.6827e+12     2  1.3413e+12           Prob > F      =  0.0000
    Residual |  1.1848e+13 16739   707778375           R-squared     =  0.1846
       Total |  1.4530e+13 16741   867939018           Root MSE      =   26604
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       yrsed |   3636.148   73.18493    49.68   0.000     3492.698    3779.598
      female |  -16014.47   412.5284   -38.82   0.000    -16823.07   -15205.88
       _cons |  -9250.395   1025.037    -9.02   0.000    -11259.58   -7241.214
------------------------------------------------------------------------------

* Changing the excluded category of gender from female to male reverses the sign of that coefficient. The SE is the same, so the t-statistic also changes sign, but it still means the same thing. The yrsed coefficient, SE, and t-statistic are unchanged, as is the R-squared. The constant has changed: it is now the intercept for men rather than for women, so it shifts by exactly the male coefficient (-25264.87 + 16014.47 = -9250.40). The model is exactly the same in substance, but different in appearance.
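
* This log does not show how female was created; presumably it is the complement of male. A sketch under that assumption (female_alt is a hypothetical name), followed by the arithmetic that links the two constants:

```stata
* Assumes male is coded 0/1; missing values stay missing
gen byte female_alt = 1 - male if !missing(male)
* Constant with male excluded = constant with female excluded + male coefficient
display -25264.87 + 16014.47
```

* The result is about -9250.40, which matches the _cons row of the regression on female up to rounding.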

. regress incwage yrsed female i.metro if age>=30 & age<=39 & incwage>0

      Source |       SS       df       MS              Number of obs =   16742
-------------+------------------------------           F(  6, 16735) =  685.40
       Model |  2.8662e+12     6  4.7771e+11           Prob > F      =  0.0000
    Residual |  1.1664e+13 16735   696977740           R-squared     =  0.1973
       Total |  1.4530e+13 16741   867939018           Root MSE      =   26400
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       yrsed |   3554.301   72.86128    48.78   0.000     3411.485    3697.117
      female |  -15890.47   409.5059   -38.80   0.000    -16693.15    -15087.8
             |
       metro |
          1  |  -481.5509   3800.688    -0.13   0.899    -7931.301    6968.199
          2  |   4539.458    3793.82     1.20   0.232     -2896.83    11975.75
          3  |   8288.016   3785.255     2.19   0.029     868.5163    15707.52
          4  |   2866.183       3811     0.75   0.452     -4603.78    10336.15
             |
       _cons |  -13037.94   3914.939    -3.33   0.001    -20711.63   -5364.245
------------------------------------------------------------------------------

* When you add a new variable, everything can change: the coefficients, the standard errors, and the constant. Here N stays the same (because there appear to be no missing values for metro) and R-squared goes up a little, from 0.1846 to 0.1973.
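
* Two checks that fit at this point, sketched with standard Stata commands: count the in-sample cases with missing metro, and test the four metro dummies jointly rather than one at a time.

```stata
* A count of zero would confirm that adding metro drops no cases
count if missing(metro) & age>=30 & age<=39 & incwage>0 & !missing(incwage)
* Refit the model, then test all metro categories together
regress incwage yrsed female i.metro if age>=30 & age<=39 & incwage>0
testparm i.metro
```

* The individual t-tests in the output compare each metro category only with the excluded category; testparm asks whether metro matters as a whole.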

. regress incwage yrsed female ib4.metro if age>=30 & age<=39 & incwage>0

      Source |       SS       df       MS              Number of obs =   16742
-------------+------------------------------           F(  6, 16735) =  685.40
       Model |  2.8662e+12     6  4.7771e+11           Prob > F      =  0.0000
    Residual |  1.1664e+13 16735   696977740           R-squared     =  0.1973
       Total |  1.4530e+13 16741   867939018           Root MSE      =   26400
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       yrsed |   3554.301   72.86128    48.78   0.000     3411.485    3697.117
      female |  -15890.47   409.5059   -38.80   0.000    -16693.15    -15087.8
             |
       metro |
          0  |  -2866.183       3811    -0.75   0.452    -10336.15     4603.78
          1  |  -3347.734   714.7852    -4.68   0.000    -4748.788   -1946.679
          2  |   1673.275   679.1819     2.46   0.014      342.007    3004.544
          3  |   5421.833   631.1918     8.59   0.000     4184.631    6659.036
             |
       _cons |  -10171.76   1146.288    -8.87   0.000     -12418.6    -7924.91
------------------------------------------------------------------------------

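* Changing the excluded category of metro from 0 to 4 gives very different-looking metro coefficients and t-statistics, and the constant changes, but it is the same model: N, R-squared, and the yrsed and female rows are unchanged. Each new metro coefficient is the old one minus the old coefficient for category 4 (for example, 8288.016 - 2866.183 = 5421.833).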
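
* The re-based metro coefficients can be recovered from the i.metro regression without changing the base. A sketch: the arithmetic uses numbers copied from the i.metro output above, and the lincom line assumes the i.metro model (base category 0) has just been refit.

```stata
* New coefficient for category k = old coefficient for k minus old coefficient for 4
display -481.5509 - 2866.183
display 4539.458 - 2866.183
display 8288.016 - 2866.183
* New constant = old constant plus old coefficient for category 4
display -13037.94 + 2866.183
* The same contrast with its standard error, after refitting with base category 0
regress incwage yrsed female i.metro if age>=30 & age<=39 & incwage>0
lincom 3.metro - 4.metro
```

* The display results match the metro and _cons rows of the ib4.metro output up to rounding, and the lincom line should reproduce its category 3 row. The standard errors differ across the two tables because each row is a comparison with the excluded category, and category 0 appears to contain few cases.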

. log close

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2011_381_logs\class8.log

log type:  text

closed on:  20 Oct 2011, 15:31:08

--------------------------------------------------------------------------------