log type: text
opened on: 31 Oct 2005, 10:58:47
. *A practical, though not entirely useful, introduction to stepwise modeling
. table husb wife, contents (sum count) row col
------------------------------------------------------------------------------------
           |                                  wife
      husb |       black     Mexican    Oth Hisp  All Others       white       Total
-----------+------------------------------------------------------------------------
     black |        4074          63          32          42         215        4426
   Mexican |          25        3947         143          95        1009        5219
  Oth Hisp |          16         132         239          18         304         709
All Others |          19          78          18        1022         360        1497
     white |         103        1156         373         492       28453       30577
           |
     Total |        4237        5376         805        1669       30341       42428
------------------------------------------------------------------------------------
. *This is the familiar HW 2 dataset.
. *Let me show you some of the sets of interaction terms (design variables) coded on the husb x wife table.
. table husb wife, contents (mean endog)
------------------------------------------------------------------------
           |                            wife
      husb |       black     Mexican    Oth Hisp  All Others       white
-----------+------------------------------------------------------------
     black |           1           0           0           0           0
   Mexican |           0           2           0           0           0
  Oth Hisp |           0           0           3           0           0
All Others |           0           0           0           4           0
     white |           0           0           0           0           5
------------------------------------------------------------------------
. table husb wife, contents (mean QS)
------------------------------------------------------------------------
           |                            wife
      husb |       black     Mexican    Oth Hisp  All Others       white
-----------+------------------------------------------------------------
     black |           0          21          31          41          51
   Mexican |          21           0          32          42          52
  Oth Hisp |          31          32           0          43          53
All Others |          41          42          43           0          54
     white |          51          52          53          54           0
------------------------------------------------------------------------
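. *QS holds the quasi-symmetry terms: cells (i,j) and (j,i) share a code, so the off-diagonal associations are symmetric; the diagonal is coded 0, the reference category.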
. table husb wife, contents (mean QS2)
------------------------------------------------------------------------
           |                            wife
      husb |       black     Mexican    Oth Hisp  All Others       white
-----------+------------------------------------------------------------
     black |           1          21           0          41          51
   Mexican |          21           2          32           0           0
  Oth Hisp |           0          32           3           0           0
All Others |          41           0           0           4          45
     white |          51           0           0          45           5
------------------------------------------------------------------------
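. *QS2 is another, restricted version of the quasi-symmetry terms: the diagonal cells get their own codes (1-5), only five of the ten symmetric off-diagonal pairs get codes, and all remaining cells share the reference code 0.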
. table husb wife, contents (mean Asym)
------------------------------------------------------------------------
           |                            wife
      husb |       black     Mexican    Oth Hisp  All Others       white
-----------+------------------------------------------------------------
     black |           0           0           0           0           0
   Mexican |          21           0           0           0           0
  Oth Hisp |          31          32           0           0           0
All Others |          41          42          43           0           0
     white |          51          52          53          54           0
------------------------------------------------------------------------
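. *A sketch, in Python rather than Stata, of one set of coding rules that reproduces the endog, QS and Asym tables above. The log does not show how these variables were actually generated, so the function names below are hypothetical. Categories are numbered 1-5 in table order (black, Mexican, Oth Hisp, All Others, white); QS2 is a hand-picked mix of codes and is not reproduced.

```python
# Hypothetical helpers: one plausible set of rules that reproduces the
# endog, QS and Asym tables above. Rows index the husband's category and
# columns the wife's, both numbered 1..5 (black, Mexican, Oth Hisp,
# All Others, white).

def endog_code(i, j):
    """Diagonal (endogamous) cells get their category number, others 0."""
    return i if i == j else 0

def qs_code(i, j):
    """Quasi-symmetry: cells (i, j) and (j, i) share the code 10*max + min."""
    return 0 if i == j else 10 * max(i, j) + min(i, j)

def asym_code(i, j):
    """Asymmetry: below-diagonal cells (i > j) get 10*i + j, others 0."""
    return 10 * i + j if i > j else 0

def code_table(rule, k=5):
    """k x k table of codes produced by rule(i, j), rows = husb, cols = wife."""
    return [[rule(i, j) for j in range(1, k + 1)] for i in range(1, k + 1)]

if __name__ == "__main__":
    for name, rule in (("endog", endog_code), ("QS", qs_code), ("Asym", asym_code)):
        print(name)
        for row in code_table(rule):
            print(" ".join(f"{c:3d}" for c in row))
```

. *Because only the below-diagonal member of each pair gets an Asym code, adding Asym to QS lets the (i,j) and (j,i) cells differ.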
. set linesize 79
. desmat wife husb QS
Desmat generated the following design matrix:
nr Variables Term Parameterization
First Last
1 _x_1 _x_4 wife ind(1)
2 _x_5 _x_8 husb ind(1)
3 _x_9 _x_18 QS ind(0)
. sw poisson count (_x_1-_x_8) _x_9-_x_18, forward pe(.001) pr(.05)
begin with empty model
p = 0.0000 < 0.0010 adding _x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8
p = 0.0000 < 0.0010 adding _x_15
p = 0.0000 < 0.0010 adding _x_16
p = 0.0000 < 0.0010 adding _x_9
p = 0.0000 < 0.0010 adding _x_18
p = 0.0000 < 0.0010 adding _x_13
p = 0.0000 < 0.0010 adding _x_12
p = 0.0000 < 0.0010 adding _x_10
p = 0.0000 < 0.0010 adding _x_17
p = 0.0000 < 0.0010 adding _x_11
p = 0.0000 < 0.0010 adding _x_14
Poisson regression Number of obs = 25
LR chi2(18) = 160097.82
Prob > chi2 = 0.0000
Log likelihood = -89.595854 Pseudo R2 = 0.9989
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_x_1 | .3989251 .0513505 7.77 0.000 .2982799 .4995703
_x_2 | -.9695073 .0646542 -15.00 0.000 -1.096227 -.8427874
_x_3 | -.193238 .0573864 -3.37 0.001 -.3057133 -.0807627
_x_4 | 1.318979 .048543 27.17 0.000 1.223836 1.414121
_x_5 | -.4305946 .0513505 -8.39 0.000 -.5312398 -.3299495
_x_6 | -1.86641 .0646542 -28.87 0.000 -1.99313 -1.73969
_x_7 | -1.189626 .0573864 -20.73 0.000 -1.302101 -1.077151
_x_8 | .6246497 .048543 12.87 0.000 .5295072 .7197922
_x_15 | -4.274379 .0589036 -72.57 0.000 -4.389828 -4.15893
_x_16 | -2.283614 .0231474 -98.66 0.000 -2.328983 -2.238246
_x_9 | -4.596011 .1089742 -42.18 0.000 -4.809596 -4.382425
_x_18 | -2.549685 .0380546 -67.00 0.000 -2.624271 -2.4751
_x_13 | -3.148446 .0780804 -40.32 0.000 -3.301481 -2.995411
_x_12 | -4.322503 .1316565 -32.83 0.000 -4.580545 -4.064461
_x_10 | -3.813722 .1499479 -25.43 0.000 -4.107615 -3.51983
_x_17 | -2.046833 .050421 -40.59 0.000 -2.145656 -1.94801
_x_11 | -1.955531 .068899 -28.38 0.000 -2.09057 -1.820491
_x_14 | -3.313855 .170508 -19.44 0.000 -3.648045 -2.979666
_cons | 8.312381 .0156671 530.56 0.000 8.281674 8.343088
------------------------------------------------------------------------------
. desrep
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -89.596
LR chi square: 160097.818
Model degrees of freedom: 18
Pseudo R-squared: 0.999
Prob: 0.000
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
wife
1 Mexican 0.399** 0.051
2 Oth Hisp -0.970** 0.065
3 All Others -0.193** 0.057
4 white 1.319** 0.049
husb
5 Mexican -0.431** 0.051
6 Oth Hisp -1.866** 0.065
7 All Others -1.190** 0.057
8 white 0.625** 0.049
QS
9 51 -4.274** 0.059
10 52 -2.284** 0.023
11 21 -4.596** 0.109
12 54 -2.550** 0.038
13 42 -3.148** 0.078
14 41 -4.323** 0.132
15 31 -3.814** 0.150
16 53 -2.047** 0.050
17 32 -1.956** 0.069
18 43 -3.314** 0.171
19 _cons 8.312** 0.016
-------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 1.379208
Prob > chi2(6) = 0.9671
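. *Where the 6 degrees of freedom and the p-value come from: 25 cells minus 19 estimated parameters (constant, 4 wife, 4 husb, 10 QS) leaves 6 df. Below is a minimal Python sketch of the chi-square tail probability, assuming poisgof's statistic is referred to a chi2(6) distribution; chi2_sf is a hypothetical helper, not a Stata function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series; fine for the small
    statistics here, slow for very large x.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# Residual df: 25 cells minus (constant + 4 wife + 4 husb + 10 QS) parameters.
df_qs = 25 - (1 + 4 + 4 + 10)
p_qs = chi2_sf(1.379208, df_qs)

if __name__ == "__main__":
    print(df_qs, round(p_qs, 4))  # compare with poisgof's chi2(6) and p above
```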
. desmat wife husb QS2
Desmat generated the following design matrix:
nr Variables Term Parameterization
First Last
1 _x_1 _x_4 wife ind(1)
2 _x_5 _x_8 husb ind(1)
3 _x_9 _x_17 QS2 ind(0)
. sw poisson count (_x_1-_x_8) _x_9-_x_17, forward pe(.001) pr(.05)
begin with empty model
p = 0.0000 < 0.0010 adding _x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8
p = 0.0000 < 0.0010 adding _x_9
p = 0.0000 < 0.0010 adding _x_10
p = 0.0000 < 0.0010 adding _x_12
p = 0.0000 < 0.0010 adding _x_16
p = 0.0000 < 0.0010 adding _x_11
p = 0.0000 < 0.0010 adding _x_14
p = 0.0000 < 0.0010 adding _x_13
p = 0.0000 < 0.0010 adding _x_17
p = 0.0000 < 0.0010 adding _x_15
Poisson regression Number of obs = 25
LR chi2(17) = 160092.91
Prob > chi2 = 0.0000
Log likelihood = -92.0512 Pseudo R2 = 0.9989
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_x_1 | -2.02435 .2156884 -9.39 0.000 -2.447092 -1.601609
_x_2 | -3.178646 .2263842 -14.04 0.000 -3.622351 -2.734941
_x_3 | .6088636 .1785219 3.41 0.001 .2589672 .95876
_x_4 | 3.0674 .1639647 18.71 0.000 2.746035 3.388765
_x_5 | -2.85263 .2063502 -13.82 0.000 -3.257069 -2.448191
_x_6 | -4.078832 .2154986 -18.93 0.000 -4.501201 -3.656462
_x_7 | -.3880939 .1689121 -2.30 0.022 -.7191556 -.0570321
_x_8 | 2.3733 .1522354 15.59 0.000 2.074924 2.671676
_x_9 | 1.603634 .3317839 4.83 0.000 .953349 2.253918
_x_10 | 6.448944 .1442074 44.72 0.000 6.166303 6.731586
_x_12 | -1.893438 .1469865 -12.88 0.000 -2.181526 -1.60535
_x_16 | -3.496464 .085671 -40.81 0.000 -3.664376 -3.328552
_x_11 | 6.025194 .1692812 35.59 0.000 5.693409 6.356979
_x_14 | 4.28146 .1558883 27.46 0.000 3.975924 4.586995
_x_13 | -.5694784 .18482 -3.08 0.002 -.9317189 -.2072378
_x_17 | -4.419243 .2138822 -20.66 0.000 -4.838445 -4.000042
_x_15 | -3.520818 .2108343 -16.70 0.000 -3.934045 -3.10759
_cons | 6.708747 .3314138 20.24 0.000 6.059188 7.358306
------------------------------------------------------------------------------
. poisgof
Goodness-of-fit chi2 = 6.289901
Prob > chi2(7) = 0.5063
. desrep
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -92.051
LR chi square: 160092.907
Model degrees of freedom: 17
Pseudo R-squared: 0.999
Prob: 0.000
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
wife
1 Mexican -2.024** 0.216
2 Oth Hisp -3.179** 0.226
3 All Others 0.609** 0.179
4 white 3.067** 0.164
husb
5 Mexican -2.853** 0.206
6 Oth Hisp -4.079** 0.215
7 All Others -0.388* 0.169
8 white 2.373** 0.152
QS2
9 1 1.604** 0.332
10 2 6.449** 0.144
11 5 -1.893** 0.147
12 45 -3.496** 0.086
13 3 6.025** 0.169
14 32 4.281** 0.156
15 21 -0.569** 0.185
16 51 -4.419** 0.214
17 41 -3.521** 0.211
18 _cons 6.709** 0.331
-------------------------------------------------------------------------------
* p < .05
** p < .01
. desmat wife husb QS2 Asym
Desmat generated the following design matrix:
nr Variables Term Parameterization
First Last
1 _x_1 _x_4 wife ind(1)
2 _x_5 _x_8 husb ind(1)
3 _x_9 _x_16 QS2 ind(0)
4 _x_17 _x_23 Asym ind(0)
. sw poisson count (_x_1-_x_8) _x_9-_x_23, forward pe(.001) pr(.05)
begin with empty model
p = 0.0000 < 0.0010 adding _x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8
p = 0.0000 < 0.0010 adding _x_9
p = 0.0000 < 0.0010 adding _x_10
p = 0.0000 < 0.0010 adding _x_11
p = 0.0000 < 0.0010 adding _x_16
p = 0.0000 < 0.0010 adding _x_15
p = 0.0000 < 0.0010 adding _x_13
p = 0.0000 < 0.0010 adding _x_14
p = 0.0000 < 0.0010 adding _x_12
Poisson regression Number of obs = 25
LR chi2(16) = 160066.21
Prob > chi2 = 0.0000
Log likelihood = -105.39899 Pseudo R2 = 0.9987
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_x_1 | -3.037627 .0802013 -37.88 0.000 -3.194818 -2.880435
_x_2 | -4.251087 .0810706 -52.44 0.000 -4.409983 -4.092192
_x_3 | -.2085674 .0560674 -3.72 0.000 -.3184575 -.0986773
_x_4 | 2.432403 .0843156 28.85 0.000 2.267147 2.597658
_x_5 | -3.803611 .0853107 -44.59 0.000 -3.970817 -3.636405
_x_6 | -5.070484 .0907108 -55.90 0.000 -5.248274 -4.892694
_x_7 | -1.15285 .0557006 -20.70 0.000 -1.262021 -1.043679
_x_8 | 1.797491 .0783424 22.94 0.000 1.643943 1.951039
_x_9 | 6.813906 .1344068 50.70 0.000 6.550474 7.077339
_x_10 | 6.489992 .1496864 43.36 0.000 6.196612 6.783372
_x_11 | -2.281927 .1344963 -16.97 0.000 -2.545535 -2.01832
_x_16 | -5.40365 .0900035 -60.04 0.000 -5.580054 -5.227247
_x_15 | -3.699758 .079329 -46.64 0.000 -3.855239 -3.544276
_x_13 | 4.696629 .1408534 33.34 0.000 4.420561 4.972697
_x_14 | -4.317155 .1313244 -32.87 0.000 -4.574547 -4.059764
_x_12 | -1.17485 .1264584 -9.29 0.000 -1.422704 -.9269958
_cons | 8.308043 .0156765 529.97 0.000 8.277317 8.338768
------------------------------------------------------------------------------
. desrep
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -105.399
LR chi square: 160066.212
Model degrees of freedom: 16
Pseudo R-squared: 0.999
Prob: 0.000
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
wife
1 Mexican -3.038** 0.080
2 Oth Hisp -4.251** 0.081
3 All Others -0.209** 0.056
4 white 2.432** 0.084
husb
5 Mexican -3.804** 0.085
6 Oth Hisp -5.070** 0.091
7 All Others -1.153** 0.056
8 white 1.797** 0.078
QS2
9 2 6.814** 0.134
10 3 6.490** 0.150
11 5 -2.282** 0.134
12 51 -5.404** 0.090
13 45 -3.700** 0.079
14 32 4.697** 0.141
15 41 -4.317** 0.131
16 21 -1.175** 0.126
17 _cons 8.308** 0.016
-------------------------------------------------------------------------------
* p < .05
** p < .01
. sw poisson count (_x_1-_x_8) _x_9-_x_23, pe(.001) pr(.05)
begin with full model
p = 0.4095 >= 0.0500 removing _x_23
p = 0.0640 >= 0.0500 removing _x_12
Poisson regression Number of obs = 25
LR chi2(21) = 160091.88
Prob > chi2 = 0.0000
Log likelihood = -92.562575 Pseudo R2 = 0.9988
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_x_1 | -4.06539 .1083477 -37.52 0.000 -4.277747 -3.853032
_x_2 | -5.089304 .1351684 -37.65 0.000 -5.354229 -4.824379
_x_3 | -.1919224 .0645821 -2.97 0.003 -.318501 -.0653437
_x_4 | 2.26076 .1173969 19.26 0.000 2.030666 2.490853
_x_5 | -3.655269 .1149749 -31.79 0.000 -3.880615 -3.429922
_x_6 | -4.859961 .1234203 -39.38 0.000 -5.101861 -4.618062
_x_7 | -1.190941 .0645821 -18.44 0.000 -1.31752 -1.064363
_x_8 | 1.561703 .1201971 12.99 0.000 1.326121 1.797285
_x_9 | 7.688989 .1591377 48.32 0.000 7.377085 8.000893
_x_10 | 7.113348 .2019181 35.23 0.000 6.717596 7.5091
_x_11 | -1.878834 .2098434 -8.95 0.000 -2.29012 -1.467549
_x_22 | 1.136799 .1874922 6.06 0.000 .769321 1.504277
_x_13 | 5.442126 .1698054 32.05 0.000 5.109313 5.774938
_x_14 | -4.32311 .132376 -32.66 0.000 -4.582562 -4.063658
_x_15 | -3.488946 .1152098 -30.28 0.000 -3.714753 -3.263139
_x_16 | -5.214588 .1201611 -43.40 0.000 -5.450099 -4.979077
_x_17 | -1.438236 .2306929 -6.23 0.000 -1.890386 -.9860864
_x_18 | -.6798307 .2788056 -2.44 0.015 -1.22628 -.1333817
_x_19 | 1.30066 .1687754 7.71 0.000 .9698659 1.631453
_x_20 | .8582368 .2788395 3.08 0.002 .3117215 1.404752
_x_21 | 1.244027 .1637258 7.60 0.000 .9231305 1.564924
_cons | 8.312381 .0156671 530.56 0.000 8.281674 8.343088
------------------------------------------------------------------------------
. poisgof
Goodness-of-fit chi2 = 7.31265
Prob > chi2(3) = 0.0626
. desrep
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -92.563
LR chi square: 160091.885
Model degrees of freedom: 21
Pseudo R-squared: 0.999
Prob: 0.000
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
wife
1 Mexican -4.065** 0.108
2 Oth Hisp -5.089** 0.135
3 All Others -0.192** 0.065
4 white 2.261** 0.117
husb
5 Mexican -3.655** 0.115
6 Oth Hisp -4.860** 0.123
7 All Others -1.191** 0.065
8 white 1.562** 0.120
QS2
9 2 7.689** 0.159
10 3 7.113** 0.202
11 5 -1.879** 0.210
Asym
12 53 1.137** 0.187
QS2
13 32 5.442** 0.170
14 41 -4.323** 0.132
15 45 -3.489** 0.115
16 51 -5.215** 0.120
Asym
17 21 -1.438** 0.231
18 31 -0.680* 0.279
19 42 1.301** 0.169
20 43 0.858** 0.279
21 52 1.244** 0.164
22 _cons 8.312** 0.016
-------------------------------------------------------------------------------
* p < .05
** p < .01
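. *It is worth noting that the forward and backward stepwise runs, with the same entry and removal criteria (pe(.001) pr(.05)), gave quite different answers. The forward run kept none of the asymmetric (Asym) terms; the backward run kept six of the seven and reached a higher log likelihood (-92.56 vs -105.40) with five more parameters.
. *Stepwise selection is another tool worth knowing about, but its answer can depend on the search direction.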
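. *One hedged way to put the four fits on a common scale is Akaike's information criterion, AIC = -2 log L + 2k. AIC is not used anywhere in this log and is swapped in here only for illustration; k is taken as the model degrees of freedom plus the constant, and the log likelihoods are copied from the output above.

```python
# Log likelihoods and parameter counts (model df + constant) copied from the
# four Poisson fits above; the dictionary keys are just descriptive labels.
fits = {
    "QS":                (-89.595854, 18 + 1),
    "QS2":               (-92.0512, 17 + 1),
    "QS2+Asym forward":  (-105.39899, 16 + 1),
    "QS2+Asym backward": (-92.562575, 21 + 1),
}

def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k (smaller is better)."""
    return -2.0 * loglik + 2.0 * k

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}

if __name__ == "__main__":
    for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{value:8.2f}  {name}")
```

. *By this yardstick the plain QS model comes out ahead, and the backward run beats the forward run on the same candidate terms.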
. clear all
. *And now for something rather different.
. edit
(3 vars, 16 obs pasted into editor)
- preserve
. *This is the educational intermarriage dataset we looked at earlier in the quarter.
. table hed wed, contents (sum count) row col
--------------------------------------------------
| wed
hed | 1 2 3 4 Total
----------+---------------------------------------
1 | 32016 33374 8407 988 74785
2 | 28370 137876 43783 8446 218475
3 | 7051 48766 61633 18195 135645
4 | 984 13794 28635 51224 94637
|
Total | 68421 233810 142458 78853 523542
--------------------------------------------------
. desmat: poisson count hed wed
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 16
Initial log likelihood: -221501.223
Log likelihood: -113882.425
LR chi square: 215237.595
Model degrees of freedom: 6
Pseudo R-squared: 0.486
Prob: 0.000
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
hed
1 2 1.072** 0.004
2 3 0.595** 0.005
3 4 0.235** 0.005
wed
4 2 1.229** 0.004
5 3 0.733** 0.005
6 4 0.142** 0.005
7 _cons 9.187** 0.005
-------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 227578.9
Prob > chi2(9) = 0.0000
. predict P_independent
(option n assumed; predicted number of events)
. table hed wed, contents (sum P_independent)
--------------------------------------------------
| wed
hed | 1 2 3 4
----------+---------------------------------------
1 | 9773.551 33398.43 20349.32 11263.7
2 | 28552.2 97569.33 59447.98 32905.5
3 | 17727.26 60578.06 36909.58 20430.1
4 | 12367.98 42264.19 25751.13 14253.7
--------------------------------------------------
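. *Under the independence model, every local odds ratio of the predicted values must equal 1. The check below uses rows 3-4 and columns 2-3; the tiny departure from 1 comes from rounding the predicted values.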
. display (60578*25751)/(36909.6*42264)
.99999791
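. *A Python sketch of the same check done for the whole table: under independence the ML fitted count is (row total x column total) / N, so every local odds ratio of the fitted table is exactly 1. The helper names are hypothetical and the counts are copied from the table above; the deviance G-squared computed at the end should match poisgof's 227578.9.

```python
import math

# Observed counts from the hed x wed table above (rows = hed 1..4, cols = wed 1..4).
observed = [
    [32016, 33374, 8407, 988],
    [28370, 137876, 43783, 8446],
    [7051, 48766, 61633, 18195],
    [984, 13794, 28635, 51224],
]

def independence_fit(table):
    """ML fitted counts under independence: row total * column total / N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]

def local_odds_ratios(table):
    """(I-1) x (J-1) local odds ratios m[i][j] m[i+1][j+1] / (m[i][j+1] m[i+1][j])."""
    return [
        [table[i][j] * table[i + 1][j + 1] / (table[i][j + 1] * table[i + 1][j])
         for j in range(len(table[0]) - 1)]
        for i in range(len(table) - 1)
    ]

def deviance(obs, fit):
    """Poisson deviance G^2 = 2 * sum[o * log(o / f) - (o - f)]."""
    total = 0.0
    for orow, frow in zip(obs, fit):
        for o, f in zip(orow, frow):
            total += (o * math.log(o / f) if o > 0 else 0.0) - (o - f)
    return 2.0 * total

fitted = independence_fit(observed)
ors = local_odds_ratios(fitted)
g2 = deviance(observed, fitted)

if __name__ == "__main__":
    print(round(fitted[0][0], 3))  # compare with the predicted table above
    print(ors)                     # every entry is 1 up to floating-point error
    print(round(g2, 1))            # compare with poisgof above
```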
. gen score=hed*wed
. table hed wed, contents (mean score)
----------------------------------
| wed
hed | 1 2 3 4
----------+-----------------------
1 | 1 2 3 4
2 | 2 4 6 8
3 | 3 6 9 12
4 | 4 8 12 16
----------------------------------
. desmat: poisson count hed wed @score
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 16
Initial log likelihood: -221501.223
Log likelihood: -6373.659
LR chi square: 430255.129
Model degrees of freedom: 7
Pseudo R-squared: 0.971
Prob: 0.000
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
hed
1 2 -0.836** 0.006
2 3 -3.731** 0.012
3 4 -7.128** 0.021
wed
4 2 -0.671** 0.006
5 3 -3.656** 0.012
6 4 -7.418** 0.022
7 score 1.000** 0.003
8 _cons 9.270** 0.005
-------------------------------------------------------------------------------
* p < .05
** p < .01
. *The score coefficient is 1.000, so every local odds ratio of the predicted values should be exp(1.000):
. display exp(1.00)
2.7182818
. predict P_modelrplusc
(option n assumed; predicted number of events)
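. *The model just fit is what we are calling the r plus c model. Strictly, with fixed integer scores and a single association parameter it is Goodman's uniform association (linear-by-linear) model, the simplest member of the family that also includes the R+C and log-multiplicative RC models. Because the scores are fixed in advance, the association term is linear in its one parameter, so the model is an ordinary loglinear model that standard Poisson maximum likelihood can fit; the RC model, whose scores are estimated, cannot be fit as a single ordinary loglinear model.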
. table hed wed, contents (sum P_modelrplusc)
--------------------------------------------------
| wed
hed | 1 2 3 4
----------+---------------------------------------
1 | 28854.76 40075.84 5506.446 347.9548
2 | 33991.05 128324.8 47927 8232.138
3 | 5110.256 52440.83 53237.8 24856.12
4 | 464.9252 12968.53 35786.75 45416.79
--------------------------------------------------
. display (52440.8*35786.8)/(53237.8*12968.5)
2.7182058
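. *The same check over all nine local odds ratios of the predicted table, as a Python sketch using the rounded values printed above; they should all be close to exp(1.000) = 2.718, with tiny differences due to rounding.

```python
import math

# Predicted counts from the score model, copied (rounded) from the table above;
# rows = hed 1..4, columns = wed 1..4.
predicted = [
    [28854.76, 40075.84, 5506.446, 347.9548],
    [33991.05, 128324.8, 47927.0, 8232.138],
    [5110.256, 52440.83, 53237.8, 24856.12],
    [464.9252, 12968.53, 35786.75, 45416.79],
]

def local_odds_ratios(table):
    """(I-1) x (J-1) local odds ratios m[i][j] m[i+1][j+1] / (m[i][j+1] m[i+1][j])."""
    return [
        [table[i][j] * table[i + 1][j + 1] / (table[i][j + 1] * table[i + 1][j])
         for j in range(len(table[0]) - 1)]
        for i in range(len(table) - 1)
    ]

# Flatten the 3 x 3 grid of local odds ratios.
ors = [x for row in local_odds_ratios(predicted) for x in row]

if __name__ == "__main__":
    print(min(ors), max(ors), math.exp(1.0))
```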
. poisgof
Goodness-of-fit chi2 = 12561.32
Prob > chi2(8) = 0.0000
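. *For one extra degree of freedom, the r plus c (uniform association) model, which forces every local odds ratio of the predicted values to the same value (about exp(1.000) = 2.718 here), does surprisingly well: the goodness-of-fit chi2 falls from 227578.9 on 9 df under independence to 12561.32 on 8 df.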
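. *Why a single score term forces a constant local odds ratio: a short derivation, with mu_ij the expected count for hed = i and wed = j (symbols introduced here for illustration). The row and column main effects cancel, leaving only the score coefficient.

```latex
\log \mu_{ij} = \lambda + \lambda^{H}_{i} + \lambda^{W}_{j} + \beta\, i\, j,
\qquad i, j = 1, \dots, 4

\log \theta_{ij}
  = \log \frac{\mu_{ij}\, \mu_{i+1,j+1}}{\mu_{i,j+1}\, \mu_{i+1,j}}
  = \beta \left[\, ij + (i+1)(j+1) - i(j+1) - (i+1)j \,\right]
  = \beta

\hat\theta_{ij} = e^{\hat\beta} \approx e^{1.000} \approx 2.718,
\qquad \mathrm{df} = 16 - (1 + 3 + 3 + 1) = 8
```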
. *Adding the endogamy diagonal to this improves the fit further, as expected.
. gen endog=0
. replace endog=hed if hed==wed
(4 real changes made)
. desmat: poisson count hed wed endog @score
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 16
Initial log likelihood: -221501.223
Log likelihood: -152.338
LR chi square: 442697.771
Model degrees of freedom: 11
Pseudo R-squared: 0.999
Prob: 0.000
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
hed
1 2 -0.433** 0.010
2 3 -2.465** 0.017
3 4 -5.109** 0.027
wed
4 2 -0.243** 0.010
5 3 -2.359** 0.017
6 4 -5.431** 0.027
endog
7 1 0.409** 0.011
8 2 0.430** 0.008
9 3 0.249** 0.008
10 4 0.845** 0.011
11 score 0.705** 0.004
12 _cons 9.260** 0.009
-------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 118.6754
Prob > chi2(4) = 0.0000
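. *To see where the coefficients and poisgof statistics in this section come from, here is a minimal pure-Python sketch that refits the three hed x wed models by iteratively reweighted least squares (IRLS). Stata's poisson maximizes the likelihood with its own ml engine, but for the canonical log link IRLS is equivalent to Newton-Raphson, so both should reach the same estimates. The coding (category 1 as reference for hed and wed, off-diagonal cells as reference for endog, score entered as one continuous column) is an assumption chosen to mirror the desmat output above, and the helper names are hypothetical.

```python
import math

# Observed hed x wed counts from the table above (rows = hed, cols = wed).
OBS = [
    [32016, 33374, 8407, 988],
    [28370, 137876, 43783, 8446],
    [7051, 48766, 61633, 18195],
    [984, 13794, 28635, 51224],
]

def design_row(i, j, with_score, with_diag):
    """Indicator coding, category 1 as reference for hed and wed."""
    x = [1.0]                                          # constant
    x += [1.0 if i == k else 0.0 for k in (2, 3, 4)]   # hed dummies
    x += [1.0 if j == k else 0.0 for k in (2, 3, 4)]   # wed dummies
    if with_diag:                                      # endog: one dummy per diagonal cell
        x += [1.0 if i == j == k else 0.0 for k in (1, 2, 3, 4)]
    if with_score:
        x.append(float(i * j))                         # score = hed*wed
    return x

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_poisson(with_score, with_diag, iters=100, tol=1e-10):
    """Poisson log-linear fit by IRLS; returns (coefficients, deviance G^2)."""
    cells = [(i, j, OBS[i - 1][j - 1]) for i in range(1, 5) for j in range(1, 5)]
    X = [design_row(i, j, with_score, with_diag) for i, j, _ in cells]
    y = [float(c) for _, _, c in cells]
    mu = [v + 0.5 for v in y]                         # starting values
    eta = [math.log(m) for m in mu]
    p = len(X[0])
    beta = None
    for _ in range(iters):
        # Working response and weights (W = mu) for the log link.
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]
        xtwx = [[sum(mu[r] * X[r][a] * X[r][b] for r in range(len(y)))
                 for b in range(p)] for a in range(p)]
        xtwz = [sum(mu[r] * X[r][a] * z[r] for r in range(len(y))) for a in range(p)]
        new = solve(xtwx, xtwz)
        eta = [sum(xa * ba for xa, ba in zip(row, new)) for row in X]
        mu = [math.exp(e) for e in eta]
        converged = beta is not None and max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    g2 = 2.0 * sum(v * math.log(v / m) - (v - m) for v, m in zip(y, mu))
    return beta, g2

models = {
    "hed wed": fit_poisson(False, False),
    "hed wed @score": fit_poisson(True, False),
    "hed wed endog @score": fit_poisson(True, True),
}

if __name__ == "__main__":
    for name, (coefs, g2) in models.items():
        print(f"{name:22s} G2 = {g2:12.4f}  last coef = {coefs[-1]:.3f}")
```

. *The deviances should match poisgof (227578.9, 12561.32 and 118.6754), and for the two score models the last coefficient is the score effect (about 1.000 and 0.705).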
. *The fit is still rejected (chi2 = 118.6754 on 4 df), so the r+c (uniform association) model is not a panacea, but it is one useful way to take advantage of the ordinal nature of the data. See Clogg and Shihadeh's book, Statistical Models for Ordinal Variables, for a good summary.
. exit, clear