-----------------------------------------------------------------------------------
log: C:\AAA Miker Files\newer web pages\soc_meth_proj3\class10_2009.log
log type: text
opened on: 26 Feb 2009, 11:22:19
. set mem 200m
Current memory allocation
                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory         200M     max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                             -----------
                                                                203.163M
. edit
(8 vars, 11 obs pasted into editor)
- preserve
. twoway (scatter y3 x3) (lfit y3 x3)
. *Looking at this plot would lead you to believe that the point with the largest residual is also the most influential point. It does not have to be that way: a point can have a large residual without moving the fitted line much, and vice versa.
. regress y3 x3
Source | SS df MS Number of obs = 11
-------------+------------------------------ F( 1, 9) = 17.97
Model | 27.4700075 1 27.4700075 Prob > F = 0.0022
Residual | 13.7561905 9 1.52846561 R-squared = 0.6663
-------------+------------------------------ Adj R-squared = 0.6292
Total | 41.2261979 10 4.12261979 Root MSE = 1.2363
------------------------------------------------------------------------------
y3 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x3 | .4997273 .1178777 4.24 0.002 .2330695 .7663851
_cons | 3.002455 1.124481 2.67 0.026 .4587014 5.546208
------------------------------------------------------------------------------
. rvfplot, yline(0)
. predict y3x3_resid, residual
. predict y3x3_dfbeta, dfbeta(x3)
. gen abs_resid=abs( y3x3_resid)
. gen abs_dfbeta=abs( y3x3_dfbeta)
. sort abs_dfbeta
. list x3 y3 y3x3_resid y3x3_dfbeta
+------------------------------------+
| x3 y3 y3x3_re~d y3x3_df~a |
|------------------------------------|
1. | 9 7.11 -.3899998 0 |
2. | 7 6.42 -.0805454 .0134248 |
3. | 8 6.77 -.2302727 .0186437 |
4. | 6 6.08 .0791818 -.0208842 |
5. | 10 7.46 -.5397272 -.0441267 |
|------------------------------------|
6. | 5 5.73 .2289091 -.0874021 |
7. | 11 7.81 -.6894546 -.1172275 |
8. | 4 5.39 .3886362 -.208915 |
9. | 12 8.15 -.8491821 -.2313598 |
10. | 14 8.84 -1.158636 -.6674063 |
|------------------------------------|
11. | 13 12.74 3.241091 525.2796 |
+------------------------------------+
. list x3 y3 abs_resid abs_dfbeta
+----------------------------------+
| x3 y3 abs_re~d abs_df~a |
|----------------------------------|
1. | 9 7.11 .3899998 0 |
2. | 7 6.42 .0805454 .0134248 |
3. | 8 6.77 .2302727 .0186437 |
4. | 6 6.08 .0791818 .0208842 |
5. | 10 7.46 .5397272 .0441267 |
|----------------------------------|
6. | 5 5.73 .2289091 .0874021 |
7. | 11 7.81 .6894546 .1172275 |
8. | 4 5.39 .3886362 .208915 |
9. | 12 8.15 .8491821 .2313598 |
10. | 14 8.84 1.158636 .6674063 |
|----------------------------------|
11. | 13 12.74 3.241091 525.2796 |
+----------------------------------+
. *In this case the point x=13, y=12.74 is our only influential point. It also has the largest residual.
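. *A minimal sketch, not run in this session, of another way to see the influence of that point. It re-enters the 11 (x3, y3) pairs from the listing above, adds leverage and Cook's distance alongside DFBETA, and refits the line without the x=13 point. The names y3x3_lev, y3x3_cooksd, and b_all are hypothetical.

```stata
* Re-enter the 11 observations shown in the listing above
clear
input x3 y3
 9  7.11
 7  6.42
 8  6.77
 6  6.08
10  7.46
 5  5.73
11  7.81
 4  5.39
12  8.15
14  8.84
13 12.74
end

* Full-sample fit; the slope should match the .4997273 reported above
regress y3 x3
scalar b_all = _b[x3]

* Leverage and Cook's distance are other standard influence measures
predict y3x3_lev, leverage
predict y3x3_cooksd, cooksd
list x3 y3 y3x3_lev y3x3_cooksd

* Refit without the suspect point and compare the two slopes
regress y3 x3 if x3 != 13
display "slope with all 11 points: " %9.6f b_all
display "slope without x3 = 13:    " %9.6f _b[x3]
```

. *Dropping the point lowers the slope from about .50 to about .35, which is the shift that the very large DFBETA for that observation is signalling.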
. twoway (scatter y3 x3) (lfit y3 x3)
. clear all
. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\fifty_state_dataset.dta"
> , clear
. twoway (scatter incwage US_born_pct, mlabel(statefip)
parentheses do not balance
r(198);
. twoway (scatter incwage US_born_pct, mlabel(statefip))
. twoway (scatter incwage US_born_pct, mlabel(statefip)) (lfit incwage US_born_pct)
. clear
. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear
. tabulate ownershp
Ownership of dwelling | Freq. Percent Cum.
----------------------+-----------------------------------
Owned or being bought | 92,671 69.31 69.31
No cash rent | 1,986 1.49 70.79
With cash rent | 39,053 29.21 100.00
----------------------+-----------------------------------
Total | 133,710 100.00
. tabulate ownershp, nolab
Ownership |
of dwelling | Freq. Percent Cum.
------------+-----------------------------------
10 | 92,671 69.31 69.31
21 | 1,986 1.49 70.79
22 | 39,053 29.21 100.00
------------+-----------------------------------
Total | 133,710 100.00
. gen byte homeowner=0
. replace homeowner=1 if ownershp==10
(92671 real changes made)
. tabulate ownershp homeowner
| homeowner
Ownership of dwelling | 0 1 | Total
----------------------+----------------------+----------
Owned or being bought | 0 92,671 | 92,671
No cash rent | 1,986 0 | 1,986
With cash rent | 39,053 0 | 39,053
----------------------+----------------------+----------
Total | 41,039 92,671 | 133,710
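. *A minimal sketch of a one-line alternative to the gen/replace pair above. One difference worth knowing: gen byte homeowner=0 followed by replace codes any observation with a missing ownershp as 0, while the version below leaves it missing. The four toy rows are hypothetical stand-ins for the CPS data, and homeowner_alt is a hypothetical name.

```stata
* Toy data (hypothetical): the three ownershp codes from the tabulation, plus one missing value
clear
input ownershp
10
21
22
.
end

* The logical expression evaluates to 1 or 0; the if qualifier keeps missing as missing
gen byte homeowner_alt = (ownershp == 10) if !missing(ownershp)
list ownershp homeowner_alt
```

. *Whether the distinction matters here depends on whether ownershp has any missing values in the CPS file, which this log does not show.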
. xi: regress homeowner yrsed i.race age age_sq
i.race _Irace_100-650 (naturally coded; _Irace_100 omitted)
variable age_sq not found
r(111);
. gen age_sq=age^2
. xi: regress homeowner yrsed i.race age age_sq
i.race _Irace_100-650 (naturally coded; _Irace_100 omitted)
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 6,103219) = 1307.03
Model | 1507.14778 6 251.191296 Prob > F = 0.0000
Residual | 19837.1291 103219 .19218486 R-squared = 0.0706
-------------+------------------------------ Adj R-squared = 0.0706
Total | 21344.2769 103225 .206774298 Root MSE = .43839
------------------------------------------------------------------------------
homeowner | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | .0157783 .0004529 34.83 0.000 .0148905 .016666
_Irace_200 | -.1705199 .0046503 -36.67 0.000 -.1796344 -.1614053
_Irace_300 | -.0681401 .0121678 -5.60 0.000 -.0919887 -.0442914
_Irace_650 | -.1310786 .0074083 -17.69 0.000 -.1455988 -.1165584
age | .0073786 .0003761 19.62 0.000 .0066416 .0081157
age_sq | -.0000256 3.89e-06 -6.58 0.000 -.0000332 -.000018
_cons | .2644731 .0087256 30.31 0.000 .2473709 .2815752
------------------------------------------------------------------------------
. predict homeownderV1
(option xb assumed; fitted values)
(30484 missing values generated)
. summarize homeownderV1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
homeownderV1 | 103226 .7079127 .120833 .1988769 .9895798
. xi: regress homeowner yrsed i.race age age_sq if age>19
i.race _Irace_100-650 (naturally coded; _Irace_100 omitted)
Source | SS df MS Number of obs = 93544
-------------+------------------------------ F( 6, 93537) = 2059.91
Model | 2254.92051 6 375.820084 Prob > F = 0.0000
Residual | 17065.3136 93537 .182444526 R-squared = 0.1167
-------------+------------------------------ Adj R-squared = 0.1167
Total | 19320.2341 93543 .206538534 Root MSE = .42714
------------------------------------------------------------------------------
homeowner | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | .021881 .0004587 47.70 0.000 .020982 .0227801
_Irace_200 | -.1652352 .0047924 -34.48 0.000 -.1746283 -.155842
_Irace_300 | -.074545 .0127958 -5.83 0.000 -.0996246 -.0494655
_Irace_650 | -.1355203 .0076276 -17.77 0.000 -.1504704 -.1205702
age | .0255745 .0004594 55.67 0.000 .0246742 .0264749
age_sq | -.0001845 4.50e-06 -41.03 0.000 -.0001933 -.0001756
_cons | -.2904088 .0119861 -24.23 0.000 -.3139014 -.2669161
------------------------------------------------------------------------------
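. *A minimal sketch of how to read the age and age_sq coefficients together. With a positive coefficient on age and a negative one on age_sq, the fitted age profile b_age*age + b_agesq*age^2 rises and then falls, peaking where its derivative b_age + 2*b_agesq*age equals zero. The scalar names are hypothetical; the values are copied from the table above.

```stata
* Coefficients copied from the age>19 regression table above
scalar b_age   =  .0255745
scalar b_agesq = -.0001845

* The quadratic peaks at -b_age / (2*b_agesq)
scalar age_peak = -b_age/(2*b_agesq)
display "predicted ownership peaks at age " %4.1f age_peak

* Change in predicted ownership per additional year of age, at three ages
display "at age 25: " %7.4f (b_age + 2*b_agesq*25)
display "at age 45: " %7.4f (b_age + 2*b_agesq*45)
display "at age 65: " %7.4f (b_age + 2*b_agesq*65)
```

. *With these coefficients the peak falls at about age 69, so over most of the adult age range predicted ownership rises with age, but at a decreasing rate.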
. drop homeownderV1
. predict homeownderV1
(option xb assumed; fitted values)
(30484 missing values generated)
. summarize homeownderV1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
homeownderV1 | 103226 .6689267 .1941697 -.1135276 .9680372
. *Here we have some negative predicted values for home ownership (the minimum is -.1135276). That is a problem, because a predicted probability cannot be negative.
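. *A minimal sketch, using the coefficients from the age>19 regression above, of how the linear model arrives at a negative prediction. The profile (age 15, yrsed 0, race code 200) is a hypothetical one chosen for illustration, and the scalar names are hypothetical. Note that this profile is younger than the age>19 estimation sample: predict without an if qualifier fills in every observation with nonmissing covariates (103226 here, against 93544 used in the fit).

```stata
* Coefficients copied from the age>19 regression table above
scalar b_cons    = -.2904088
scalar b_yrsed   =  .021881
scalar b_race200 = -.1652352
scalar b_age     =  .0255745
scalar b_agesq   = -.0001845

* Linear prediction for the hypothetical profile: age 15, yrsed 0, race 200
scalar xb_lpm = b_cons + b_yrsed*0 + b_race200 + b_age*15 + b_agesq*15^2
display "linear prediction: " %9.6f xb_lpm

* To keep predictions inside the estimation sample, one option is:
*     predict homeowner_insample if e(sample)
* (homeowner_insample is a hypothetical variable name)
```

. *The result lands close to the minimum reported by summarize, which suggests (but does not prove) that the lowest fitted values come from observations outside the age>19 sample.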
. xi: logit homeowner yrsed i.race age age_sq if age>19
i.race _Irace_100-650 (naturally coded; _Irace_100 omitted)
Iteration 0: log likelihood = -56454.512
Iteration 1: log likelihood = -51074.78
Iteration 2: log likelihood = -50967.81
Iteration 3: log likelihood = -50967.623
Logistic regression Number of obs = 93544
LR chi2(6) = 10973.78
Prob > chi2 = 0.0000
Log likelihood = -50967.623 Pseudo R2 = 0.0972
------------------------------------------------------------------------------
homeowner | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | .117566 .0025024 46.98 0.000 .1126614 .1224706
_Irace_200 | -.8096816 .024064 -33.65 0.000 -.8568461 -.7625171
_Irace_300 | -.3733268 .0653883 -5.71 0.000 -.5014856 -.245168
_Irace_650 | -.6811511 .0385774 -17.66 0.000 -.7567614 -.6055409
age | .1184691 .002498 47.43 0.000 .113573 .1233651
age_sq | -.000819 .000025 -32.79 0.000 -.0008679 -.00077
_cons | -3.920357 .0644681 -60.81 0.000 -4.046712 -3.794001
------------------------------------------------------------------------------
. predict homeowner2_logit
(option pr assumed; Pr(homeowner))
(30484 missing values generated)
. summarize homeowner2_logit
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
homeowner2~t | 103226 .6666674 .1992733 .0415957 .91392
. *With logistic regression, the coefficients usually (but not always) have the same signs as in the linear regression, and the significance statistics are similar but not identical (z instead of t, which are nearly the same with large N). Also, the predicted probabilities are guaranteed to lie between 0 and 1, which can be important.
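. *A minimal sketch of why logit predictions stay between 0 and 1: predict applies the inverse logit, 1/(1+exp(-xb)), to the linear index xb. The coefficients are copied from the logit table above; the profile (age 15, yrsed 0, race code 200) is a hypothetical one, and xb_logit is a hypothetical name.

```stata
* Linear index for the hypothetical profile: age 15, yrsed 0, race 200
scalar xb_logit = -3.920357 + .117566*0 - .8096816 + .1184691*15 - .000819*15^2
display "index xb:         " %9.6f xb_logit

* Two equivalent ways to turn the index into a probability
display "Pr(homeowner):    " %9.6f invlogit(xb_logit)
display "same, by formula: " %9.6f (1/(1 + exp(-xb_logit)))

* However extreme the index, the inverse logit stays inside (0, 1)
display invlogit(-10) "  " invlogit(10)
```

. *This profile gets a probability of about .04, close to the minimum of .0415957 in the summarize output above.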
. xi: logit homeowner yrsed i.race age age_sq if age>19, or
i.race _Irace_100-650 (naturally coded; _Irace_100 omitted)
Iteration 0: log likelihood = -56454.512
Iteration 1: log likelihood = -51074.78
Iteration 2: log likelihood = -50967.81
Iteration 3: log likelihood = -50967.623
Logistic regression Number of obs = 93544
LR chi2(6) = 10973.78
Prob > chi2 = 0.0000
Log likelihood = -50967.623 Pseudo R2 = 0.0972
------------------------------------------------------------------------------
homeowner | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | 1.124756 .0028146 46.98 0.000 1.119253 1.130286
_Irace_200 | .4449997 .0107085 -33.65 0.000 .4244988 .4664908
_Irace_300 | .6884402 .045016 -5.71 0.000 .6056303 .782573
_Irace_650 | .5060342 .0195215 -17.66 0.000 .4691835 .5457791
age | 1.125772 .0028122 47.43 0.000 1.120274 1.131297
age_sq | .9991814 .000025 -32.79 0.000 .9991325 .9992303
------------------------------------------------------------------------------
. *Odds ratios of less than 1 are decreases in the odds, and odds ratios of more than 1 are increases. Think of the odds ratios as multiplicative. A coefficient of zero corresponds to an odds ratio of 1, because when you exponentiate zero you get 1.
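. *A minimal sketch of the link between the two logit tables above: each odds ratio is exp() of the corresponding coefficient, and effects multiply. The four-year comparison is a hypothetical illustration.

```stata
* exp(coefficient) reproduces the odds ratio column
display "yrsed:      " %9.6f exp(.117566)
display "_Irace_200: " %9.6f exp(-.8096816)

* A coefficient of zero means no change in the odds
display "exp(0):     " exp(0)

* Multiplicative: four more years of education multiply the odds by OR^4
display "4 years:    " %9.4f exp(.117566)^4
display "same thing: " %9.4f exp(4*.117566)
```

. *Four more years of schooling multiply the odds of ownership by about 1.6. An odds ratio of about .445 for _Irace_200 means the odds are a bit less than half those of the omitted category, _Irace_100.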
. save "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta",
> replace
file C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta saved
. exit, clear