-----------------------------------------------------------------------------------

       log:  C:\AAA Miker Files\newer web pages\soc_meth_proj3\class10_2009.log

  log type:  text

 opened on:  26 Feb 2009, 11:22:19

 

. set mem 200m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.909M

    set memory          200M    max. data space                200.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                               203.163M

 

. edit

(8 vars, 11 obs pasted into editor)

- preserve

 

. twoway (scatter y3 x3) (lfit y3 x3)

 

. *Looking at this plot would lead you to believe that the point with the largest residual is also the most influential point. It doesn't have to be that way: a point can have a large residual without being influential, and vice versa.

 

. regress y3 x3

 

      Source |       SS       df       MS              Number of obs =      11

-------------+------------------------------           F(  1,     9) =   17.97

       Model |  27.4700075     1  27.4700075           Prob > F      =  0.0022

    Residual |  13.7561905     9  1.52846561           R-squared     =  0.6663

-------------+------------------------------           Adj R-squared =  0.6292

       Total |  41.2261979    10  4.12261979           Root MSE      =  1.2363

 

------------------------------------------------------------------------------

          y3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

          x3 |   .4997273   .1178777     4.24   0.002     .2330695    .7663851

       _cons |   3.002455   1.124481     2.67   0.026     .4587014    5.546208

------------------------------------------------------------------------------

 

. rvfplot, yline(0)

 

. predict y3x3_resid, residual

 

. predict y3x3_dfbeta, dfbeta(x3)

 

. gen abs_resid=abs( y3x3_resid)

 

. gen abs_dfbeta=abs( y3x3_dfbeta)

 

. sort  abs_dfbeta

 

. list x3 y3  y3x3_resid y3x3_dfbeta

 

     +------------------------------------+

     | x3      y3   y3x3_re~d   y3x3_df~a |

     |------------------------------------|

  1. |  9    7.11   -.3899998           0 |

  2. |  7    6.42   -.0805454    .0134248 |

  3. |  8    6.77   -.2302727    .0186437 |

  4. |  6    6.08    .0791818   -.0208842 |

  5. | 10    7.46   -.5397272   -.0441267 |

     |------------------------------------|

  6. |  5    5.73    .2289091   -.0874021 |

  7. | 11    7.81   -.6894546   -.1172275 |

  8. |  4    5.39    .3886362    -.208915 |

  9. | 12    8.15   -.8491821   -.2313598 |

 10. | 14    8.84   -1.158636   -.6674063 |

     |------------------------------------|

 11. | 13   12.74    3.241091    525.2796 |

     +------------------------------------+

 

. list x3 y3   abs_resid abs_dfbeta

 

     +----------------------------------+

     | x3      y3   abs_re~d   abs_df~a |

     |----------------------------------|

  1. |  9    7.11   .3899998          0 |

  2. |  7    6.42   .0805454   .0134248 |

  3. |  8    6.77   .2302727   .0186437 |

  4. |  6    6.08   .0791818   .0208842 |

  5. | 10    7.46   .5397272   .0441267 |

     |----------------------------------|

  6. |  5    5.73   .2289091   .0874021 |

  7. | 11    7.81   .6894546   .1172275 |

  8. |  4    5.39   .3886362    .208915 |

  9. | 12    8.15   .8491821   .2313598 |

 10. | 14    8.84   1.158636   .6674063 |

     |----------------------------------|

 11. | 13   12.74   3.241091   525.2796 |

     +----------------------------------+

 

. *In this case the one point, x=13, y=12.74, is our only influential point. It also has the largest residual.
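One way to check how much a flagged point matters is to refit the model without it and compare the slopes. The following is a minimal sketch, assuming the variables created earlier in this session (y3x3_resid, y3x3_dfbeta, abs_dfbeta) are still in memory. The cutoff of 1 is an illustrative assumption, not a rule taken from this log.

```stata
* Sketch only: the cutoff of 1 is an assumption chosen for illustration.
* missing() guards against missing values, which Stata treats as larger than any number.
list x3 y3 y3x3_resid y3x3_dfbeta if abs_dfbeta > 1 & !missing(abs_dfbeta)

* Refit with and without the flagged observation to see how far the slope moves.
regress y3 x3
regress y3 x3 if abs_dfbeta <= 1
```

If the slope changes noticeably between the two fits, the excluded observation is influential in the sense the dfbeta statistic is meant to capture.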

 

. twoway (scatter y3 x3) (lfit y3 x3)

 

. clear all

 

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\fifty_state_dataset.dta"

> , clear

 

. twoway (scatter incwage  US_born_pct, mlabel(statefip)

parentheses do not balance

r(198);

 

. twoway (scatter incwage  US_born_pct, mlabel(statefip))

 

. twoway (scatter incwage  US_born_pct, mlabel(statefip)) (lfit  incwage US_born_pct)

 

. clear

 

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

 

. tabulate  ownershp

 

Ownership of dwelling |      Freq.     Percent        Cum.

----------------------+-----------------------------------

Owned or being bought |     92,671       69.31       69.31

         No cash rent |      1,986        1.49       70.79

       With cash rent |     39,053       29.21      100.00

----------------------+-----------------------------------

                Total |    133,710      100.00

 

. tabulate  ownershp, nolab

 

  Ownership |

of dwelling |      Freq.     Percent        Cum.

------------+-----------------------------------

         10 |     92,671       69.31       69.31

         21 |      1,986        1.49       70.79

         22 |     39,053       29.21      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. gen byte homeowner=0

 

. replace  homeowner=1 if ownershp==10

(92671 real changes made)

 

. tabulate  ownershp homeowner

 

                      |       homeowner

Ownership of dwelling |         0          1 |     Total

----------------------+----------------------+----------

Owned or being bought |         0     92,671 |    92,671

         No cash rent |     1,986          0 |     1,986

       With cash rent |    39,053          0 |    39,053

----------------------+----------------------+----------

                Total |    41,039     92,671 |   133,710
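The gen/replace pair above can also be written as a single logical expression. This is a minimal sketch, not what was run in this session. The name homeowner_alt is hypothetical, and the missing-value guard is an assumption about how missing ownershp codes should be handled (the two-step version would code them as 0).

```stata
* Sketch only: homeowner_alt is a hypothetical variable name.
* (ownershp == 10) evaluates to 1 when true and 0 when false.
* The if qualifier leaves homeowner_alt missing wherever ownershp is missing.
gen byte homeowner_alt = (ownershp == 10) if !missing(ownershp)

* Confirm the two versions agree wherever ownershp is observed.
assert homeowner_alt == homeowner if !missing(ownershp)
tabulate ownershp homeowner_alt, missing
```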

 

 

. xi: regress  homeowner yrsed i.race age age_sq

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

variable age_sq not found

r(111);

 

. gen age_sq=age^2

 

. xi: regress  homeowner yrsed i.race age age_sq

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

 

      Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  6,103219) = 1307.03

       Model |  1507.14778     6  251.191296           Prob > F      =  0.0000

    Residual |  19837.1291 103219   .19218486          R-squared     =  0.0706

-------------+------------------------------           Adj R-squared =  0.0706

       Total |  21344.2769 103225  .206774298          Root MSE      =  .43839

 

------------------------------------------------------------------------------

   homeowner |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       yrsed |   .0157783   .0004529    34.83   0.000     .0148905     .016666

  _Irace_200 |  -.1705199   .0046503   -36.67   0.000    -.1796344   -.1614053

  _Irace_300 |  -.0681401   .0121678    -5.60   0.000    -.0919887   -.0442914

  _Irace_650 |  -.1310786   .0074083   -17.69   0.000    -.1455988   -.1165584

         age |   .0073786   .0003761    19.62   0.000     .0066416    .0081157

      age_sq |  -.0000256   3.89e-06    -6.58   0.000    -.0000332    -.000018

       _cons |   .2644731   .0087256    30.31   0.000     .2473709    .2815752

------------------------------------------------------------------------------

 

. predict homeownderV1

(option xb assumed; fitted values)

(30484 missing values generated)

 

. summarize homeownderV1

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

homeownderV1 |    103226    .7079127     .120833   .1988769   .9895798

 

. xi: regress  homeowner yrsed i.race age age_sq if age>19

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

 

      Source |       SS       df       MS              Number of obs =   93544

-------------+------------------------------           F(  6, 93537) = 2059.91

       Model |  2254.92051     6  375.820084           Prob > F      =  0.0000

    Residual |  17065.3136 93537  .182444526           R-squared     =  0.1167

-------------+------------------------------           Adj R-squared =  0.1167

       Total |  19320.2341 93543  .206538534           Root MSE      =  .42714

 

------------------------------------------------------------------------------

   homeowner |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       yrsed |    .021881   .0004587    47.70   0.000      .020982    .0227801

  _Irace_200 |  -.1652352   .0047924   -34.48   0.000    -.1746283    -.155842

  _Irace_300 |   -.074545   .0127958    -5.83   0.000    -.0996246   -.0494655

  _Irace_650 |  -.1355203   .0076276   -17.77   0.000    -.1504704   -.1205702

         age |   .0255745   .0004594    55.67   0.000     .0246742    .0264749

      age_sq |  -.0001845   4.50e-06   -41.03   0.000    -.0001933   -.0001756

       _cons |  -.2904088   .0119861   -24.23   0.000    -.3139014   -.2669161

------------------------------------------------------------------------------

 

. drop  homeownderV1

 

. predict homeownderV1

(option xb assumed; fitted values)

(30484 missing values generated)

 

. summarize homeownderV1

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

homeownderV1 |    103226    .6689267    .1941697  -.1135276   .9680372

 

. *Here we have some negative predicted values for home ownership. That's a problem, because homeowner is a 0/1 variable and its predicted value is meant to be a probability.
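The summarize output above reports 103,226 predicted values even though the regression used 93,544 observations, which suggests that predict also filled in observations outside the age>19 estimation sample. The following is a minimal sketch of how the predictions could be restricted to the estimation sample and the out-of-range values counted. The name homeowner_lpm is hypothetical.

```stata
* Sketch only: homeowner_lpm is a hypothetical variable name.
xi: regress homeowner yrsed i.race age age_sq if age>19

* e(sample) marks the observations actually used by the last estimation command.
predict homeowner_lpm if e(sample)
summarize homeowner_lpm

* Count fitted values that cannot be read as probabilities.
count if homeowner_lpm < 0
count if homeowner_lpm > 1 & !missing(homeowner_lpm)
```

Comparing these counts with the unrestricted summary shows whether the negative values come from the estimation sample or from observations the model was never fit on.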

 

. xi: logit  homeowner yrsed i.race age age_sq if age>19

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

 

Iteration 0:   log likelihood = -56454.512

Iteration 1:   log likelihood =  -51074.78

Iteration 2:   log likelihood =  -50967.81

Iteration 3:   log likelihood = -50967.623

 

Logistic regression                               Number of obs   =      93544

                                                  LR chi2(6)      =   10973.78

                                                  Prob > chi2     =     0.0000

Log likelihood = -50967.623                       Pseudo R2       =     0.0972

 

------------------------------------------------------------------------------

   homeowner |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       yrsed |    .117566   .0025024    46.98   0.000     .1126614    .1224706

  _Irace_200 |  -.8096816    .024064   -33.65   0.000    -.8568461   -.7625171

  _Irace_300 |  -.3733268   .0653883    -5.71   0.000    -.5014856    -.245168

  _Irace_650 |  -.6811511   .0385774   -17.66   0.000    -.7567614   -.6055409

         age |   .1184691    .002498    47.43   0.000      .113573    .1233651

      age_sq |   -.000819    .000025   -32.79   0.000    -.0008679     -.00077

       _cons |  -3.920357   .0644681   -60.81   0.000    -4.046712   -3.794001

------------------------------------------------------------------------------

 

. predict homeowner2_logit

(option pr assumed; Pr(homeowner))

(30484 missing values generated)

 

. summarize  homeowner2_logit

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

homeowner2~t |    103226    .6666674    .1992733   .0415957     .91392

 

. *With logistic regression, the coefficients have the same signs as in OLS (usually, but not always), and the significance statistics are similar but not identical (z instead of t, and as we know these are nearly the same with large N). Also, the predicted values are guaranteed to lie between 0 and 1, which can be important.
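The predicted values stay between 0 and 1 because the logit model passes the linear index xb through the inverse logit function, exp(xb)/(1+exp(xb)). The following is a minimal sketch, assuming it is run right after the logit above so that its results are still in memory. The names xb_logit and p_check are hypothetical, and yrsed = 12 and age = 40 are illustrative values.

```stata
* Sketch only: xb_logit and p_check are hypothetical variable names.
* Get the linear index, then convert it to a probability by hand.
predict xb_logit, xb
gen p_check = invlogit(xb_logit)

* p_check should agree with the default predicted probability, up to rounding.
summarize p_check homeowner2_logit

* One worked case: reference race category, yrsed = 12, age = 40 (illustrative values).
display invlogit(_b[_cons] + _b[yrsed]*12 + _b[age]*40 + _b[age_sq]*40^2)
```

With the coefficients printed above, the worked case has a linear index of about 0.92, which corresponds to a predicted probability of about 0.71.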

 

. xi: logit  homeowner yrsed i.race age age_sq if age>19, or

i.race            _Irace_100-650      (naturally coded; _Irace_100 omitted)

 

Iteration 0:   log likelihood = -56454.512

Iteration 1:   log likelihood =  -51074.78

Iteration 2:   log likelihood =  -50967.81

Iteration 3:   log likelihood = -50967.623

 

Logistic regression                               Number of obs   =      93544

                                                  LR chi2(6)      =   10973.78

                                                  Prob > chi2     =     0.0000

Log likelihood = -50967.623                       Pseudo R2       =     0.0972

 

------------------------------------------------------------------------------

   homeowner | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       yrsed |   1.124756   .0028146    46.98   0.000     1.119253    1.130286

  _Irace_200 |   .4449997   .0107085   -33.65   0.000     .4244988    .4664908

  _Irace_300 |   .6884402    .045016    -5.71   0.000     .6056303     .782573

  _Irace_650 |   .5060342   .0195215   -17.66   0.000     .4691835    .5457791

         age |   1.125772   .0028122    47.43   0.000     1.120274    1.131297

      age_sq |   .9991814    .000025   -32.79   0.000     .9991325    .9992303

------------------------------------------------------------------------------

 

. *Odds ratios less than 1 are decreases in the odds, and odds ratios greater than 1 are increases. Think of the odds ratios as multiplicative. A coefficient of zero corresponds to an odds ratio of 1: when you exponentiate zero, you get 1.
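The odds ratios in the table above are the exponentiated logit coefficients from the earlier table. The following is a minimal sketch that reproduces two of them by hand, with the coefficients copied from the log. The four-year difference in schooling is an illustrative choice.

```stata
* Sketch only: coefficients are copied from the logit output earlier in this log.
* yrsed: should reproduce the reported odds ratio of 1.124756.
display exp(.117566)

* _Irace_200: should reproduce the reported odds ratio of .4449997.
display exp(-.8096816)

* A coefficient of zero gives an odds ratio of exactly 1.
display exp(0)

* Odds ratios multiply: four extra years of schooling multiply the odds
* by 1.124756^4, which is the same as exp(4 * .117566), about 1.60.
display exp(4 * .117566)
display 1.124756^4
```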

 

. save "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta",

> replace

file C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta saved

 

. exit, clear