log type: text
opened on: 3 Nov 2003, 11:10:18
. set linesize 79
. use "HW2 with QS.dta", clear
. *Today we take a more systematic approach to fitting these data.
. *One model for this kind of table is the quasi-symmetry (QS) model,
. *described in Hout.
. *QS models suit two-way tables whose row and column variables share the same
. *categories, here husband's and wife's race/ethnicity.
. table husb wife, contents (mean QS)
-----------------------------------------------------------------------
| wife
husb | black mexican oth hisp all others white
-----------+-----------------------------------------------------------
black | 1 21 31 41 51
mexican | 21 2 32 42 52
oth hisp | 31 32 3 43 53
all others | 41 42 43 4 54
white | 51 52 53 54 5
-----------------------------------------------------------------------
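. *QS holds the full set of symmetric interactions: each diagonal cell has its
. *own code (1-5), and cells (i,j) and (j,i) share one code (10*larger + smaller).
. *That gives 15 distinct codes, but they are not all linearly independent of
. *the husb and wife main effects.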
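The coding rule visible in the table can be written out directly. This is a minimal Python sketch, assuming husb and wife are numbered 1-5 in table order (black, mexican, oth hisp, all others, white); that numbering is inferred from the table, not stated in the log. In Stata the same variable could presumably be built with `gen QS = cond(husb==wife, husb, 10*max(husb,wife) + min(husb,wife))`, under the same assumption.

```python
# Assumed numbering (inferred from the table order): 1 black, 2 mexican,
# 3 oth hisp, 4 all others, 5 white.
CATS = ["black", "mexican", "oth hisp", "all others", "white"]


def qs_code(husb, wife):
    """Quasi-symmetry code for cell (husb, wife), categories numbered 1..5."""
    if husb == wife:
        return husb  # each diagonal cell gets its own level
    # (i, j) and (j, i) share one level: 10 * larger index + smaller index
    return 10 * max(husb, wife) + min(husb, wife)


def qs_table(n=5):
    """Rows are husb, columns are wife, as in the Stata table."""
    return [[qs_code(h, w) for w in range(1, n + 1)] for h in range(1, n + 1)]


if __name__ == "__main__":
    for name, row in zip(CATS, qs_table()):
        print(f"{name:>10} | " + " ".join(f"{v:3d}" for v in row))
    levels = {v for row in qs_table() for v in row}
    print(len(levels), "distinct QS codes")
```

The two-digit code only works because there are fewer than ten categories.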
. codebook QS
------------------------------------------------------------------------------------------
QS (unlabeled)
------------------------------------------------------------------------------------------
type: numeric (byte)
range: [1,54] units: 1
unique values: 15 missing .: 0/25
mean: 34.2
std. dev: 18.6123
percentiles: 10% 25% 50% 75% 90%
3 21 41 51 53
. desmat husb wife QS
Desmat generated the following design matrix:
nr Variables Term Parameterization
First Last
1 _x_1 _x_4 husb ind(1)
2 _x_5 _x_8 wife ind(1)
3 _x_9 _x_18 QS ind(1)
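. *desmat drops the collinear terms: of the 14 QS dummies (15 codes minus the
. *reference code 1), only the 10 off-diagonal ones (_x_9 to _x_18) remain.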
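The 6 residual degrees of freedom that poisgof reports for this model follow from a parameter count on the 25 cells. The standard quasi-symmetry formula for an I x I table, (I-1)(I-2)/2, gives the same answer:

```latex
\begin{aligned}
\text{QS dummies:}\quad & 15 - 1 = 14,\ \text{of which 4 (diagonal codes 2--5) are collinear} \;\Rightarrow\; 10 \\
\text{parameters:}\quad & 1 + (5-1) + (5-1) + 10 = 19 \\
\text{residual df:}\quad & 25 - 19 = 6 = \frac{(I-1)(I-2)}{2}\ \text{with } I = 5
\end{aligned}
```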
. desmat: poisson count husb wife QS
------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -89.596
LR chi square: 160097.818
Model degrees of freedom: 18
Pseudo R-squared: 0.999
Prob: 0.000
------------------------------------------------------------------------------------------
nr Effect Coeff s.e.
------------------------------------------------------------------------------------------
count
husb
1 mexican -0.431** 0.051
2 oth hisp -1.866** 0.065
3 all others -1.190** 0.057
4 white 0.625** 0.049
wife
5 mexican 0.399** 0.051
6 oth hisp -0.970** 0.065
7 all others -0.193** 0.057
8 white 1.319** 0.049
QS
9 21 -4.596** 0.109
10 31 -3.814** 0.150
11 32 -1.956** 0.069
12 41 -4.323** 0.132
13 42 -3.148** 0.078
14 43 -3.314** 0.171
15 51 -4.274** 0.059
16 52 -2.284** 0.023
17 53 -2.047** 0.050
18 54 -2.550** 0.038
19 _cons 8.312** 0.016
------------------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 1.379208
Prob > chi2(6) = 0.9671
. *The quasi-symmetry model fits the data very well (deviance 1.38 on 6 df,
. *p = 0.97). But we've lost our favorite endogamy (diagonal) terms: they have
. *been replaced by a full complement of off-diagonal symmetric interactions.
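The p-value poisgof prints is the upper tail of a chi-square distribution at the deviance. As a cross-check outside Stata, here is a minimal standard-library Python sketch of the chi-square survival function using the power series for the regularized lower incomplete gamma function. It is adequate for the small statistics in this log but is not tuned for very large values.

```python
import math


def chi2_sf(x, df, tol=1e-12):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Computes 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series. Fine for moderate x; a
    continued fraction would be more accurate for large x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > tol * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


if __name__ == "__main__":
    # Deviance and df copied from the poisgof output of the QS model
    print(round(chi2_sf(1.379208, 6), 4))  # Stata reports 0.9671
```

The same function reproduces the other poisgof p-values in this log, for example chi2_sf(6.304709, 8) and chi2_sf(1.380085, 7).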
. table husb wife, contents (mean QS)
-----------------------------------------------------------------------
| wife
husb | black mexican oth hisp all others white
-----------+-----------------------------------------------------------
black | 1 21 31 41 51
mexican | 21 2 32 42 52
oth hisp | 31 32 3 43 53
all others | 41 42 43 4 54
white | 51 52 53 54 5
-----------------------------------------------------------------------
. table husb wife, contents (mean QS2)
-----------------------------------------------------------------------
| wife
husb | black mexican oth hisp all others white
-----------+-----------------------------------------------------------
black | 0 21 31 0 51
mexican | 21 0 32 0 52
oth hisp | 31 32 0 0 0
all others | 0 0 0 0 0
white | 51 52 0 0 0
-----------------------------------------------------------------------
. desmat: poisson count husb wife race_endog QS2
------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -89.596
LR chi square: 160097.818
Model degrees of freedom: 18
Pseudo R-squared: 0.999
Prob: 0.000
------------------------------------------------------------------------------------------
nr Effect Coeff s.e.
------------------------------------------------------------------------------------------
count
husb
1 mexican 0.743** 0.152
2 oth hisp -0.858** 0.214
3 all others -0.684** 0.219
4 white 2.397** 0.136
wife
5 mexican 1.573** 0.165
6 oth hisp 0.039 0.223
7 all others 0.313 0.230
8 white 3.092** 0.150
race_endog
9 1 4.828** 0.314
10 2 2.480** 0.232
11 3 2.811** 0.186
12 4 3.817** 0.177
13 5 1.283** 0.175
QS2
14 21 -0.942** 0.253
15 31 0.006 0.200
16 32 0.690** 0.110
17 51 -1.219** 0.221
18 52 -0.402* 0.188
19 _cons 3.484** 0.313
------------------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 1.379208
Prob > chi2(6) = 0.9671
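. *This parameterization (endogamy terms plus five symmetric off-diagonal
. *interactions, QS2) is more consistent with the set of models from HW2, but
. *the model itself is simply QS: the log likelihood (-89.596) and deviance
. *(1.379 on 6 df) are identical to the quasi-symmetry model above.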
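Why two different sets of dummies give the same fit can be checked without Stata. If two design matrices span the same column space, a Poisson log-linear model fitted by maximum likelihood has the same fitted values under both. This minimal Python sketch computes exact ranks over the rationals. It assumes race_endog equals the category on the diagonal and 0 off it, and that QS2 keeps the QS code only for pairs 21, 31, 32, 51 and 52. That coding is inferred from the tables and the desmat output, not documented in the log.

```python
from fractions import Fraction
from itertools import product

# Assumed coding: husb and wife numbered 1-5 in table order; race_endog = the
# category on the diagonal, 0 elsewhere; QS2 = QS code for these pairs, else 0.
QS2_LEVELS = [21, 31, 32, 51, 52]


def qs_code(h, w):
    return h if h == w else 10 * max(h, w) + min(h, w)


def qs2_code(h, w):
    code = qs_code(h, w)
    return code if code in QS2_LEVELS else 0


def dummies(value, levels):
    return [1 if value == lev else 0 for lev in levels]


def rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


cells = list(product(range(1, 6), repeat=2))           # 25 (husb, wife) cells
qs_levels = sorted({qs_code(h, w) for h, w in cells})   # 15 QS codes
main = [[1] + dummies(h, [2, 3, 4, 5]) + dummies(w, [2, 3, 4, 5]) for h, w in cells]

# Model 1: constant + main effects + 14 QS dummies (reference code 1), before
# any collinear columns are dropped
X_qs = [m + dummies(qs_code(h, w), qs_levels[1:]) for m, (h, w) in zip(main, cells)]

# Model 2: constant + main effects + race_endog (ref 0) + QS2 (ref 0)
X_qs2 = [m + dummies(h if h == w else 0, [1, 2, 3, 4, 5]) + dummies(qs2_code(h, w), QS2_LEVELS)
         for m, (h, w) in zip(main, cells)]

# Stacking both sets of columns: same rank means the same column space
X_both = [a + b for a, b in zip(X_qs, X_qs2)]

if __name__ == "__main__":
    print(rank(X_qs), rank(X_qs2), rank(X_both), len(cells) - rank(X_qs))
```

All three ranks come out equal (19), which is why both desmat runs report 18 model df and the same deviance on 6 df.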
. desmat husb wife race_endog QS2
Desmat generated the following design matrix:
nr Variables Term Parameterization
First Last
1 _x_1 _x_4 husb ind(1)
2 _x_5 _x_8 wife ind(1)
3 _x_9 _x_13 race_endog ind(0)
4 _x_14 _x_18 QS2 ind(0)
. sw poisson count (_x_1-_x_8) _x_9-_x_18, forward pe(.01) pr(.1)
begin with empty model
p = 0.0000 < 0.0100 adding _x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8
p = 0.0000 < 0.0100 adding _x_9
p = 0.0000 < 0.0100 adding _x_10
p = 0.0000 < 0.0100 adding _x_12
p = 0.0000 < 0.0100 adding _x_13
p = 0.0000 < 0.0100 adding _x_11
p = 0.0000 < 0.0100 adding _x_16
p = 0.0000 < 0.0100 adding _x_17
p = 0.0002 < 0.0100 adding _x_14
Poisson regression Number of obs = 25
LR chi2(16) = 160092.89
Prob > chi2 = 0.0000
Log likelihood = -92.058605 Pseudo R2 = 0.9989
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_x_1 | .6580387 .1237163 5.32 0.000 .4155592 .9005183
_x_2 | -.5675102 .1336998 -4.24 0.000 -.829557 -.3054634
_x_3 | -.3744226 .1266318 -2.96 0.003 -.6226164 -.1262289
_x_4 | 2.387004 .1030538 23.16 0.000 2.185023 2.588986
_x_5 | 1.48623 .1386654 10.72 0.000 1.214451 1.758009
_x_6 | .3329503 .1481206 2.25 0.025 .0426393 .6232614
_x_7 | .6221063 .1419782 4.38 0.000 .3438341 .9003785
_x_8 | 3.080996 .1205447 25.56 0.000 2.844733 3.317259
_x_9 | 5.127894 .210949 24.31 0.000 4.714442 5.541347
_x_10 | 2.951956 .0833704 35.41 0.000 2.788553 3.115359
_x_12 | 3.497347 .0853671 40.97 0.000 3.330031 3.664663
_x_13 | 1.603523 .0796469 20.13 0.000 1.447418 1.759628
_x_11 | 2.526537 .1204165 20.98 0.000 2.290525 2.762549
_x_16 | .7836307 .1017227 7.70 0.000 .5842579 .9830034
_x_17 | -.9086142 .1332381 -6.82 0.000 -1.169756 -.6474723
_x_14 | -.5558246 .1472988 -3.77 0.000 -.844525 -.2671243
_cons | 3.184486 .2103664 15.14 0.000 2.772176 3.596797
------------------------------------------------------------------------------
. poisgof
Goodness-of-fit chi2 = 6.304709
Prob > chi2(8) = 0.6131
. desrep
------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -92.059
LR chi square: 160092.893
Model degrees of freedom: 16
Pseudo R-squared: 0.999
Prob: 0.000
------------------------------------------------------------------------------------------
nr Effect Coeff s.e.
------------------------------------------------------------------------------------------
count
husb
1 mexican 0.658** 0.124
2 oth hisp -0.568** 0.134
3 all others -0.374** 0.127
4 white 2.387** 0.103
wife
5 mexican 1.486** 0.139
6 oth hisp 0.333* 0.148
7 all others 0.622** 0.142
8 white 3.081** 0.121
race_endog
9 1 5.128** 0.211
10 2 2.952** 0.083
11 4 3.497** 0.085
12 5 1.604** 0.080
13 3 2.527** 0.120
QS2
14 32 0.784** 0.102
15 51 -0.909** 0.133
16 21 -0.556** 0.147
17 _cons 3.184** 0.210
------------------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 6.304709
Prob > chi2(8) = 0.6131
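. *This is the result of forward stepwise selection: terms enter at p < .01 and
. *leave at p >= .1, with the main effects (_x_1-_x_8) entering as one grouped
. *term. It drops _x_15 (QS2 = 31) and _x_18 (QS2 = 52) and still fits well
. *(deviance 6.30 on 8 df, p = 0.61).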
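Because the stepwise model is nested in the full QS model, the two dropped terms can also be tested jointly. The statistic is the difference in deviances, and its df is the difference in residual df. This minimal Python sketch uses the closed form of the chi-square upper tail for even df. The deviances are copied from the poisgof output in this log.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for X ~ chi-square(df), df even.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    z = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= z / k
        total += term
    return math.exp(-z) * total


dev_full_qs = 1.379208   # quasi-symmetry model, 6 residual df
dev_forward = 6.304709   # forward stepwise pe(.01) model, 8 residual df
stat = dev_forward - dev_full_qs          # LR statistic for dropping _x_15 and _x_18
p_joint = chi2_sf_even(stat, 8 - 6)       # 2 df: exactly exp(-stat / 2)

if __name__ == "__main__":
    print(round(stat, 4), round(p_joint, 4))
```

Jointly, the two omitted interactions are not significant at .05 (p is about 0.085), even though one of them clears .05 on its own.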
. sw poisson count (_x_1-_x_8) _x_9-_x_18, forward pe(.05) pr(.1)
begin with empty model
p = 0.0000 < 0.0500 adding _x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8
p = 0.0000 < 0.0500 adding _x_9
p = 0.0000 < 0.0500 adding _x_10
p = 0.0000 < 0.0500 adding _x_12
p = 0.0000 < 0.0500 adding _x_13
p = 0.0000 < 0.0500 adding _x_11
p = 0.0000 < 0.0500 adding _x_16
p = 0.0000 < 0.0500 adding _x_17
p = 0.0002 < 0.0500 adding _x_14
p = 0.0326 < 0.0500 adding _x_18
Poisson regression Number of obs = 25
LR chi2(17) = 160097.82
Prob > chi2 = 0.0000
Log likelihood = -89.596292 Pseudo R2 = 0.9989
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_x_1 | .7410304 .1282705 5.78 0.000 .4896249 .9924359
_x_2 | -.8602365 .196971 -4.37 0.000 -1.246293 -.4741804
_x_3 | -.6866007 .1978931 -3.47 0.001 -1.074464 -.2987374
_x_4 | 2.394867 .1030996 23.23 0.000 2.192795 2.596938
_x_5 | 1.570528 .1428993 10.99 0.000 1.290451 1.850606
_x_6 | .0367311 .2079401 0.18 0.860 -.370824 .4442863
_x_7 | .3096806 .2080673 1.49 0.137 -.0981238 .7174851
_x_8 | 3.089169 .1205587 25.62 0.000 2.852878 3.325459
_x_9 | 4.823029 .258398 18.67 0.000 4.316578 5.329479
_x_10 | 2.4798 .2316559 10.70 0.000 2.025763 2.933838
_x_12 | 3.817085 .1769228 21.57 0.000 3.470322 4.163847
_x_13 | 1.282621 .1746304 7.34 0.000 .940352 1.624891
_x_11 | 2.810617 .1856191 15.14 0.000 2.44681 3.174424
_x_16 | .6896763 .1091149 6.32 0.000 .475815 .9035376
_x_17 | -1.22155 .2023955 -6.04 0.000 -1.618238 -.8248617
_x_14 | -.9445918 .234645 -4.03 0.000 -1.404488 -.484696
_x_18 | -.4024037 .1883423 -2.14 0.033 -.7715477 -.0332596
_cons | 3.489352 .2579226 13.53 0.000 2.983833 3.994871
------------------------------------------------------------------------------
. desrep
------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -89.596
LR chi square: 160097.817
Model degrees of freedom: 17
Pseudo R-squared: 0.999
Prob: 0.000
------------------------------------------------------------------------------------------
nr Effect Coeff s.e.
------------------------------------------------------------------------------------------
count
husb
1 mexican 0.741** 0.128
2 oth hisp -0.860** 0.197
3 all others -0.687** 0.198
4 white 2.395** 0.103
wife
5 mexican 1.571** 0.143
6 oth hisp 0.037 0.208
7 all others 0.310 0.208
8 white 3.089** 0.121
race_endog
9 1 4.823** 0.258
10 2 2.480** 0.232
11 4 3.817** 0.177
12 5 1.283** 0.175
13 3 2.811** 0.186
QS2
14 32 0.690** 0.109
15 51 -1.222** 0.202
16 21 -0.945** 0.235
17 52 -0.402* 0.188
18 _cons 3.489** 0.258
------------------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 1.380085
Prob > chi2(7) = 0.9862
. *If we relax the entry criterion from pe(.01) to pe(.05), one more term enters:
. *_x_18 (QS2 = 52, Wald p = 0.0326). The deviance drops to 1.38 on 7 df.
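The entry p-value of 0.0326 that sw reports is the Wald z-test for _x_18 (it matches the P>|z| column). A likelihood-ratio test of the same term can be computed from the two logged log likelihoods. For 1 df, the chi-square upper tail equals erfc(sqrt(stat/2)). This is a minimal standard-library Python sketch. It also applies the same test to _x_15, using the deviance difference between the full QS model and the pe(.05) model.

```python
import math


def chi2_1df_sf(stat):
    """P(X > stat) for X ~ chi-square(1), i.e. P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2.0))


# _x_18: log likelihoods of the pe(.01) model and the pe(.05) model from sw
stat_18 = 2.0 * (-89.596292 - (-92.058605))
p_18 = chi2_1df_sf(stat_18)

# _x_15: deviance of the pe(.05) model minus deviance of the full QS model
stat_15 = 1.380085 - 1.379208
p_15 = chi2_1df_sf(stat_15)

if __name__ == "__main__":
    print(round(stat_18, 4), round(p_18, 4))
    print(round(stat_15, 6), round(p_15, 4))
```

The LR p-value for _x_18 (about 0.026) differs slightly from the Wald value but lies on the same side of both thresholds: it passes pe(.05) and fails pe(.01). For _x_15 the LR p-value (about 0.976) agrees closely with the 0.9764 that the backward run reports.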
. sw poisson count (_x_1-_x_8) _x_9-_x_18, backward pe(.05) pr(.1)
backward not allowed
r(198);
. sw poisson count (_x_1-_x_8) _x_9-_x_18, pr(.1)
begin with full model
p = 0.9764 >= 0.1000 removing _x_15
Poisson regression Number of obs = 25
LR chi2(17) = 160097.82
Prob > chi2 = 0.0000
Log likelihood = -89.596292 Pseudo R2 = 0.9989
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_x_1 | .7410304 .1282705 5.78 0.000 .4896249 .9924359
_x_2 | -.8602365 .196971 -4.37 0.000 -1.246293 -.4741804
_x_3 | -.6866007 .1978931 -3.47 0.001 -1.074464 -.2987374
_x_4 | 2.394867 .1030996 23.23 0.000 2.192795 2.596938
_x_5 | 1.570528 .1428993 10.99 0.000 1.290451 1.850606
_x_6 | .0367311 .2079401 0.18 0.860 -.370824 .4442863
_x_7 | .3096806 .2080673 1.49 0.137 -.0981238 .7174851
_x_8 | 3.089169 .1205587 25.62 0.000 2.852878 3.325459
_x_9 | 4.823029 .258398 18.67 0.000 4.316578 5.329479
_x_10 | 2.4798 .2316559 10.70 0.000 2.025763 2.933838
_x_11 | 2.810617 .1856191 15.14 0.000 2.44681 3.174424
_x_12 | 3.817085 .1769228 21.57 0.000 3.470322 4.163847
_x_13 | 1.282621 .1746304 7.34 0.000 .940352 1.624891
_x_14 | -.9445918 .234645 -4.03 0.000 -1.404488 -.484696
_x_18 | -.4024037 .1883423 -2.14 0.033 -.7715477 -.0332596
_x_16 | .6896763 .1091149 6.32 0.000 .475815 .9035376
_x_17 | -1.22155 .2023955 -6.04 0.000 -1.618238 -.8248617
_cons | 3.489352 .2579226 13.53 0.000 2.983833 3.994871
------------------------------------------------------------------------------
. desrep
------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -89.596
LR chi square: 160097.817
Model degrees of freedom: 17
Pseudo R-squared: 0.999
Prob: 0.000
------------------------------------------------------------------------------------------
nr Effect Coeff s.e.
------------------------------------------------------------------------------------------
count
husb
1 mexican 0.741** 0.128
2 oth hisp -0.860** 0.197
3 all others -0.687** 0.198
4 white 2.395** 0.103
wife
5 mexican 1.571** 0.143
6 oth hisp 0.037 0.208
7 all others 0.310 0.208
8 white 3.089** 0.121
race_endog
9 1 4.823** 0.258
10 2 2.480** 0.232
11 3 2.811** 0.186
12 4 3.817** 0.177
13 5 1.283** 0.175
QS2
14 21 -0.945** 0.235
15 52 -0.402* 0.188
16 32 0.690** 0.109
17 51 -1.222** 0.202
18 _cons 3.489** 0.258
------------------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 1.380085
Prob > chi2(7) = 0.9862
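. *In this case backward selection and forward stepwise selection led to the
. *same model: both keep every term except _x_15 (QS2 = 31).
. *(sw has no backward option; giving pr() alone requests backward selection.)
. *Backward selection starts with the full model and removes terms whose p-value
. *is at or above pr(); forward stepwise starts with a base model (here, empty)
. *and adds terms whose p-value is below pe().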
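The two search strategies described above can be sketched in a few lines of Python. This is a hypothetical illustration of the logic, not Stata's implementation. `pvalue(term, model_terms)` is an assumed callback that would refit the model and return the term's p-value. The stub in the usage lines keeps p-values fixed, which real refits do not.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: p-value of `term` in a model containing `model_terms`
PValue = Callable[[str, List[str]], float]


def backward_selection(terms: Sequence[str], pvalue: PValue, pr: float) -> List[str]:
    """Start from the full model; drop the least significant term while its p >= pr."""
    model = list(terms)
    while model:
        worst = max(model, key=lambda t: pvalue(t, model))
        if pvalue(worst, model) < pr:
            break
        model.remove(worst)
    return model


def forward_stepwise(terms: Sequence[str], pvalue: PValue, pe: float, pr: float,
                     base: Sequence[str] = ()) -> List[str]:
    """Start from `base`; add the most significant candidate while its p < pe,
    then drop any non-base term whose p has risen to pr or above."""
    if not pe < pr:
        raise ValueError("forward stepwise needs pe < pr")
    model = list(base)
    while True:
        candidates = [t for t in terms if t not in model]
        if not candidates:
            break
        best = min(candidates, key=lambda t: pvalue(t, model + [t]))
        if pvalue(best, model + [best]) >= pe:
            break
        model.append(best)
        for t in list(model):  # removal step after each addition
            if t not in base and pvalue(t, model) >= pr:
                model.remove(t)
    return model


# Stub with fixed p-values loosely echoing this log ("main" = grouped _x_1-_x_8)
P = {"main": 0.0, "x9": 0.0, "x15": 0.9764, "x18": 0.0326}


def stub(term, model):
    return P[term]
```

With this stub, forward_stepwise(list(P), stub, pe=.05, pr=.1) and backward_selection(list(P), stub, pr=.1) both keep main, x9 and x18, while pe=.01 stops before x18. That is the same pattern the sw runs above show.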
. exit, clear