. edit

(8 vars, 11 obs pasted into editor)

- preserve

. twoway (scatter y3 x3) (lfit y3 x3)

. *Looking at this plot, you might conclude that the point with the largest residual is always the most influential point. It doesn't have to be that way...

. regress y3 x3

Source |       SS       df       MS              Number of obs =      11

-------------+------------------------------           F(  1,     9) =   17.97

Model |  27.4700075     1  27.4700075           Prob > F      =  0.0022

Residual |  13.7561905     9  1.52846561           R-squared     =  0.6663

Total |  41.2261979    10  4.12261979           Root MSE      =  1.2363

------------------------------------------------------------------------------

y3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

x3 |   .4997273   .1178777     4.24   0.002     .2330695    .7663851

_cons |   3.002455   1.124481     2.67   0.026     .4587014    5.546208

------------------------------------------------------------------------------

. rvfplot, yline(0)

. predict y3x3_resid, residual

. predict y3x3_dfbeta, dfbeta(x3)

. gen abs_resid=abs( y3x3_resid)

. gen abs_dfbeta=abs( y3x3_dfbeta)

. sort  abs_dfbeta

. list x3 y3  y3x3_resid y3x3_dfbeta

+------------------------------------+

| x3      y3   y3x3_re~d   y3x3_df~a |

|------------------------------------|

1. |  9    7.11   -.3899998           0 |

2. |  7    6.42   -.0805454    .0134248 |

3. |  8    6.77   -.2302727    .0186437 |

4. |  6    6.08    .0791818   -.0208842 |

5. | 10    7.46   -.5397272   -.0441267 |

|------------------------------------|

6. |  5    5.73    .2289091   -.0874021 |

7. | 11    7.81   -.6894546   -.1172275 |

8. |  4    5.39    .3886362    -.208915 |

9. | 12    8.15   -.8491821   -.2313598 |

10. | 14    8.84   -1.158636   -.6674063 |

|------------------------------------|

11. | 13   12.74    3.241091    525.2796 |

+------------------------------------+

. list x3 y3   abs_resid abs_dfbeta

+----------------------------------+

| x3      y3   abs_re~d   abs_df~a |

|----------------------------------|

1. |  9    7.11   .3899998          0 |

2. |  7    6.42   .0805454   .0134248 |

3. |  8    6.77   .2302727   .0186437 |

4. |  6    6.08   .0791818   .0208842 |

5. | 10    7.46   .5397272   .0441267 |

|----------------------------------|

6. |  5    5.73   .2289091   .0874021 |

7. | 11    7.81   .6894546   .1172275 |

8. |  4    5.39   .3886362    .208915 |

9. | 12    8.15   .8491821   .2313598 |

10. | 14    8.84   1.158636   .6674063 |

|----------------------------------|

11. | 13   12.74   3.241091   525.2796 |

+----------------------------------+

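. *In this case the point at x=13, y=12.74 is our only influential point, and it also has the largest residual. Its DFBETA is enormous (525) because the other ten points fall almost exactly on a straight line: DFBETA is scaled using the fit with that observation left out, and that fit has almost no residual variation.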
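. *Added sketch, not part of the logged session: one way to turn the listing above into an explicit check is to flag observations whose absolute DFBETA exceeds a rule-of-thumb cutoff and then refit without them. The cutoff of 1 and the variable name flagged are assumptions chosen for illustration. The stricter cutoff 2/sqrt(N), about .603 here, would also flag the x=14 row.

```stata
* Flag observations with |DFBETA| above 1. The !missing() guard matters
* because Stata treats missing values as larger than any number.
gen byte flagged = abs_dfbeta > 1 & !missing(abs_dfbeta)
list x3 y3 abs_resid abs_dfbeta if flagged

* Refit without the flagged point(s) and compare the x3 slope
* with the .4997 estimated from all 11 observations.
regress y3 x3 if !flagged
```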

. twoway (scatter y3 x3) (lfit y3 x3)

. clear all

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\fifty_state_dataset.dta", clear

. twoway (scatter incwage  US_born_pct, mlabel(statefip)

parentheses do not balance

r(198);

. twoway (scatter incwage  US_born_pct, mlabel(statefip))

. twoway (scatter incwage  US_born_pct, mlabel(statefip)) (lfit  incwage US_born_pct)

. clear

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

. tabulate  ownershp

Ownership of dwelling |      Freq.     Percent        Cum.

----------------------+-----------------------------------

Owned or being bought |     92,671       69.31       69.31

No cash rent |      1,986        1.49       70.79

With cash rent |     39,053       29.21      100.00

----------------------+-----------------------------------

Total |    133,710      100.00

. tabulate  ownershp, nolab

Ownership |

of dwelling |      Freq.     Percent        Cum.

------------+-----------------------------------

10 |     92,671       69.31       69.31

21 |      1,986        1.49       70.79

22 |     39,053       29.21      100.00

------------+-----------------------------------

Total |    133,710      100.00

. gen byte homeowner=0

. replace  homeowner=1 if ownershp==10

. tabulate  ownershp homeowner

|       homeowner

Ownership of dwelling |         0          1 |     Total

----------------------+----------------------+----------

Owned or being bought |         0     92,671 |    92,671

No cash rent |     1,986          0 |     1,986

With cash rent |    39,053          0 |    39,053

----------------------+----------------------+----------

Total |    41,039     92,671 |   133,710
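. *Added sketch, not part of the logged session: the two-step gen/replace above codes every non-owner as 0, including any case where ownershp is missing (there are none in this file, since both tables total 133,710). A one-step version that leaves such cases missing might look like this. The name homeowner_alt is made up for illustration.

```stata
* The logical expression (ownershp == 10) evaluates to 1 when true, 0 when false.
* The if qualifier leaves homeowner_alt missing wherever ownershp is missing,
* instead of silently coding those cases as 0.
gen byte homeowner_alt = (ownershp == 10) if !missing(ownershp)

* Confirm the two versions agree; assert stops with an error if they do not.
assert homeowner_alt == homeowner if !missing(ownershp)
```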

. xi: regress  homeowner yrsed i.race age age_sq

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

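r(111);

. *r(111) is Stata's "variable not found" error: age_sq did not exist yet, so we generate it and rerun the regression.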

. gen age_sq=age^2

. xi: regress  homeowner yrsed i.race age age_sq

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  6,103219) = 1307.03

Model |  1507.14778     6  251.191296           Prob > F      =  0.0000

Residual |  19837.1291 103219   .19218486           R-squared     =  0.0706

Total |  21344.2769 103225  .206774298           Root MSE      =  .43839

------------------------------------------------------------------------------

homeowner |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

yrsed |   .0157783   .0004529    34.83   0.000     .0148905     .016666

_Irace_200 |  -.1705199   .0046503   -36.67   0.000    -.1796344   -.1614053

_Irace_300 |  -.0681401   .0121678    -5.60   0.000    -.0919887   -.0442914

_Irace_650 |  -.1310786   .0074083   -17.69   0.000    -.1455988   -.1165584

age |   .0073786   .0003761    19.62   0.000     .0066416    .0081157

age_sq |  -.0000256   3.89e-06    -6.58   0.000    -.0000332    -.000018

_cons |   .2644731   .0087256    30.31   0.000     .2473709    .2815752

------------------------------------------------------------------------------

. predict homeownderV1

(option xb assumed; fitted values)

(30484 missing values generated)

. summarize homeownderV1

Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

homeownderV1 |    103226    .7079127     .120833   .1988769   .9895798

. xi: regress  homeowner yrsed i.race age age_sq if age>19

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

Source |       SS       df       MS              Number of obs =   93544

-------------+------------------------------           F(  6, 93537) = 2059.91

Model |  2254.92051     6  375.820084           Prob > F      =  0.0000

Residual |  17065.3136 93537  .182444526           R-squared     =  0.1167

Total |  19320.2341 93543  .206538534           Root MSE      =  .42714

------------------------------------------------------------------------------

homeowner |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

yrsed |    .021881   .0004587    47.70   0.000      .020982    .0227801

_Irace_200 |  -.1652352   .0047924   -34.48   0.000    -.1746283    -.155842

_Irace_300 |   -.074545   .0127958    -5.83   0.000    -.0996246   -.0494655

_Irace_650 |  -.1355203   .0076276   -17.77   0.000    -.1504704   -.1205702

age |   .0255745   .0004594    55.67   0.000     .0246742    .0264749

age_sq |  -.0001845   4.50e-06   -41.03   0.000    -.0001933   -.0001756

_cons |  -.2904088   .0119861   -24.23   0.000    -.3139014   -.2669161

------------------------------------------------------------------------------

. drop  homeownderV1

. predict homeownderV1

(option xb assumed; fitted values)

(30484 missing values generated)

. summarize homeownderV1

Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

homeownderV1 |    103226    .6689267    .1941697  -.1135276   .9680372

. *Here we have some negative predicted values for home ownership (the minimum is -.1135). That's a problem, because the predicted value is supposed to be a probability...
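. *Added sketch, not part of the logged session: the regression used 93,544 observations (age>19), but predict filled in 103,226, because it computes fitted values for every observation with nonmissing covariates. To see how many fitted values inside the estimation sample fall outside the 0-1 range, something like the following could be run right after the regress command. The name lpm_hat is made up for illustration.

```stata
* e(sample) marks the observations used by the most recent estimation command.
predict double lpm_hat if e(sample)

* Count in-sample fitted values that are not valid probabilities.
count if e(sample) & (lpm_hat < 0 | lpm_hat > 1)
summarize lpm_hat
```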

. xi: logit  homeowner yrsed i.race age age_sq if age>19

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

Iteration 0:   log likelihood = -56454.512

Iteration 1:   log likelihood =  -51074.78

Iteration 2:   log likelihood =  -50967.81

Iteration 3:   log likelihood = -50967.623

Logistic regression                               Number of obs   =      93544

LR chi2(6)      =   10973.78

Prob > chi2     =     0.0000

Log likelihood = -50967.623                       Pseudo R2       =     0.0972

------------------------------------------------------------------------------

homeowner |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

yrsed |    .117566   .0025024    46.98   0.000     .1126614    .1224706

_Irace_200 |  -.8096816    .024064   -33.65   0.000    -.8568461   -.7625171

_Irace_300 |  -.3733268   .0653883    -5.71   0.000    -.5014856    -.245168

_Irace_650 |  -.6811511   .0385774   -17.66   0.000    -.7567614   -.6055409

age |   .1184691    .002498    47.43   0.000      .113573    .1233651

age_sq |   -.000819    .000025   -32.79   0.000    -.0008679     -.00077

_cons |  -3.920357   .0644681   -60.81   0.000    -4.046712   -3.794001

------------------------------------------------------------------------------

. predict homeowner2_logit

(option pr assumed; Pr(homeowner))

(30484 missing values generated)

. summarize  homeowner2_logit

Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

homeowner2~t |    103226    .6666674    .1992733   .0415957     .91392

. *With logistic regression, the coefficients have the same signs as in the linear model (this is usual but not guaranteed), and the significance statistics are similar but not identical (z instead of t, and as we know these are nearly the same with large N). Also, the predicted values are guaranteed to fall between 0 and 1, which can be important.
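. *Added sketch, not part of the logged session: the 0-to-1 guarantee comes from the inverse logit transformation that predict applies to the linear prediction. Rebuilding the probabilities by hand, right after the logit command, shows this. The names logit_xb and p_by_hand are made up for illustration.

```stata
* Linear prediction (the log odds) from the logit model just fitted.
predict double logit_xb, xb

* invlogit(z) = exp(z)/(1 + exp(z)) maps any real number into (0,1).
gen double p_by_hand = invlogit(logit_xb)

* These should match the probabilities from predict, and never leave [0,1].
summarize homeowner2_logit p_by_hand
assert inrange(p_by_hand, 0, 1) if !missing(p_by_hand)
```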

. xi: logit  homeowner yrsed i.race age age_sq if age>19, or

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

Iteration 0:   log likelihood = -56454.512

Iteration 1:   log likelihood =  -51074.78

Iteration 2:   log likelihood =  -50967.81

Iteration 3:   log likelihood = -50967.623

Logistic regression                               Number of obs   =      93544

LR chi2(6)      =   10973.78

Prob > chi2     =     0.0000

Log likelihood = -50967.623                       Pseudo R2       =     0.0972

------------------------------------------------------------------------------

homeowner | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

yrsed |   1.124756   .0028146    46.98   0.000     1.119253    1.130286

_Irace_200 |   .4449997   .0107085   -33.65   0.000     .4244988    .4664908

_Irace_300 |   .6884402    .045016    -5.71   0.000     .6056303     .782573

_Irace_650 |   .5060342   .0195215   -17.66   0.000     .4691835    .5457791

age |   1.125772   .0028122    47.43   0.000     1.120274    1.131297

age_sq |   .9991814    .000025   -32.79   0.000     .9991325    .9992303

------------------------------------------------------------------------------

. *Odds ratios below 1 mean the odds decrease, and odds ratios above 1 mean the odds increase. Think of the odds ratios as multiplicative. A coefficient of zero corresponds to an odds ratio of 1, because when you exponentiate zero you get 1.
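. *Added sketch, not part of the logged session: the odds ratios in the table are just the exponentiated logit coefficients, which can be checked with display. The four-year schooling comparison is an illustrative assumption, not something computed in the original log.

```stata
* Exponentiate the coefficient on yrsed from the logit output (.117566).
* The result should be about 1.1248, the odds ratio reported for yrsed.
display exp(.117566)

* The same check using the stored coefficient from the last logit.
display exp(_b[yrsed])

* A coefficient of zero gives an odds ratio of exactly 1.
display exp(0)

* Odds ratios multiply: four more years of schooling multiply the odds
* by the yrsed odds ratio raised to the fourth power.
display exp(4*_b[yrsed])
```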

. save "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", replace

file C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta saved

. exit, clear