----------------------------------------------------------------------------
       log:  C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\fouth_class_notes.log
  log type:  text
 opened on:
. describe
Contains data
  obs:            16
 vars:             3
 size:           160 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
hed             byte   %8.0g
wed             byte   %8.0g
count           long   %12.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. set linesize 75
. describe
Contains data
  obs:            16
 vars:             3
 size:           160 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
hed             byte   %8.0g
wed             byte   %8.0g
count           long   %12.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. table hed wed, contents (sum count) row col
--------------------------------------------------
          |                  wed
      hed |      1       2       3       4   Total
----------+---------------------------------------
        1 |  32016   33374    8407     988   74785
        2 |  28370  137876   43783    8446  218475
        3 |   7051   48766   61633   18195  135645
        4 |    984   13794   28635   51224   94637
          |
    Total |  68421  233810  142458   78853  523542
--------------------------------------------------
. label define ed_lbl 1 "<HS" 2 "HS" 3 "Some Col" 4 "BA+"
. label val hed ed_lbl
. label val wed ed_lbl
. table hed wed, contents (sum count) row col
*Note the use of value labels to attach text to variables that are coded as numbers.
------------------------------------------------------------
          |                          wed
      hed |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
       HS |    28370    137876     43783      8446    218475
 Some Col |     7051     48766     61633     18195    135645
      BA+ |      984     13794     28635     51224     94637
          |
    Total |    68421    233810    142458     78853    523542
------------------------------------------------------------
. *The first model to look at is the independence model.
. desmat: poisson count hed wed
------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 16
Initial log likelihood: -221501.223
Log likelihood: -113882.425
LR chi square: 215237.595
Model degrees of freedom: 6
Pseudo R-squared: 0.486
Prob: 0.000
------------------------------------------------------------------------------
nr   Effect                         Coeff      s.e.
------------------------------------------------------------------------------
     count
       hed
1        HS                         1.072**   0.004
2        Some Col
3        BA+                        0.235**   0.005
       wed
4        HS                         1.229**   0.004
5        Some Col
6        BA+                        0.142**   0.005
7      _cons                        9.187**   0.005
------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 227578.9
Prob > chi2(9) = 0.0000
. *This chi-square test decisively rejects the null hypothesis, which in this case is that the independence model fits the data.
. *For comparison, the expected value of a chi-square statistic with 9 df is 9; the observed value is 227578.9.
. *
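. *A minimal sketch of where the 9 df and the p-value come from. It assumes the statistic reported by poisgof is the deviance, which is consistent with the log likelihoods reported in this log. The scalar names are made up for illustration.

```stata
* Residual df = number of cells minus number of fitted parameters.
* Independence model: 1 constant + 3 hed terms + 3 wed terms = 7 parameters.
scalar cells  = 16
scalar params = 7
scalar df     = cells - params
display "residual df = " df

* Upper-tail probability of the reported statistic on that many df.
* The value 227578.9 is copied from the poisgof output above.
scalar dev = 227578.9
display "p-value = " chi2tail(df, dev)

* A chi-square variable has mean equal to its df, so a model that fit
* well would give a statistic near 9, not in the hundreds of thousands.
display "statistic / df = " dev/df
```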
. predict P_independence
(option n assumed; predicted number of events)
. table hed wed, contents(sum count sum P_independence) row col
------------------------------------------------------------
          |                          wed
      hed |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
          | 9773.551  33398.43  20349.32   11263.7     74785
          |
       HS |    28370    137876     43783      8446    218475
          |  28552.2  97569.33  59447.98   32905.5    218475
          |
 Some Col |     7051     48766     61633     18195    135645
          | 17727.26  60578.06  36909.58   20430.1    135645
          |
      BA+ |      984     13794     28635     51224     94637
          | 12367.98  42264.19  25751.13   14253.7     94637
          |
    Total |    68421    233810    142458     78853    523542
          |    68421    233810    142458     78853    523542
------------------------------------------------------------
. *The eyeball test shows that the independence model under-predicts the endogamy diagonal, where spouses have the same education, and over-predicts the two far off-diagonal corners, where spouses' educations differ the most.
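. *The predicted counts from the independence model can also be reproduced by hand from the margins, since the expected count in each cell is (row total x column total) / N. A sketch under that assumption, re-entering the 16 observed counts from the table above; the variable names row_tot, col_tot, n_tot, E_indep and resid are made up for illustration.

```stata
clear
input byte hed byte wed long count
1 1  32016
1 2  33374
1 3   8407
1 4    988
2 1  28370
2 2 137876
2 3  43783
2 4   8446
3 1   7051
3 2  48766
3 3  61633
3 4  18195
4 1    984
4 2  13794
4 3  28635
4 4  51224
end

* Row, column, and grand totals of the observed counts.
egen double row_tot = total(count), by(hed)
egen double col_tot = total(count), by(wed)
egen double n_tot   = total(count)

* Expected count under independence.
gen double E_indep = row_tot * col_tot / n_tot

* Observed minus expected: positive on the diagonal,
* negative in the two far corners.
gen double resid = count - E_indep
list hed wed count E_indep resid, sepby(hed) noobs
```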
. label var hed "husband's education"
. label var wed "wife's education"
. save "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta"
file C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta saved
. *The next thing to add to this model is a term that captures the special preference for marrying someone with the same education as oneself.
. gen byte ed_endogamy_simple =0
. replace ed_endogamy_simple=1 if hed==wed
(4 real changes made)
. table hed wed, contents(mean ed_endogamy_simple)
--------------------------------------------------
husband's |           wife's education
education |      <HS        HS  Some Col       BA+
----------+---------------------------------------
      <HS |        1         0         0         0
       HS |        0         1         0         0
 Some Col |        0         0         1         0
      BA+ |        0         0         0         1
--------------------------------------------------
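. *The gen/replace pair used for the endogamy indicator can be collapsed into one line, because a logical expression in Stata evaluates to 1 when true and 0 when false. A small sketch on a hypothetical 4x4 grid of codes (no counts are needed to build the indicator):

```stata
clear
set obs 16

* Hypothetical 4x4 grid of husband/wife education codes, 1 through 4.
gen byte hed = ceil(_n/4)
gen byte wed = mod(_n - 1, 4) + 1

* (hed == wed) is 1 on the diagonal and 0 elsewhere, so this single
* line is equivalent to gen ... = 0 followed by replace ... = 1 if hed==wed.
gen byte ed_endogamy_simple = (hed == wed)

* There should be exactly 4 diagonal cells.
count if ed_endogamy_simple == 1
```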
. desmat: poisson count hed wed ed_endogamy_simple
----------------------------------------------------------------------
Poisson regression
----------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 16
Initial log likelihood: -221501.223
Log likelihood: -41944.565
LR chi square: 359113.316
Model degrees of freedom: 7
Pseudo R-squared: 0.811
Prob: 0.000
----------------------------------------------------------------------
nr   Effect                         Coeff      s.e.
----------------------------------------------------------------------
     count
       hed
1        HS                         0.740**   0.005
2        Some Col
3        BA+                        0.216**   0.005
       wed
4        HS                         0.979**   0.005
5        Some Col
6        BA+                        0.081**   0.005
       ed_endogamy_simple
7        1                          1.115**   0.003
8      _cons                        9.067**   0.005
----------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 83703.13
Prob > chi2(8) = 0.0000
. *The first thing to notice is that this is an enormous improvement over the independence model: the goodness-of-fit chi2 drops by about 144K (227578.9 - 83703.13 = 143875.77) on 1 df.
. *Still, the goodness-of-fit test rejects this model, which is to say that this model does not yet fit the data very well.
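. *The comparison of the two nested models can be written out as a likelihood-ratio test. A sketch using only numbers copied from the two sets of output above; the scalar names are made up for illustration.

```stata
* Goodness-of-fit statistics copied from the two poisgof outputs:
* independence model (9 residual df), simple endogamy model (8 residual df).
scalar dev_indep  = 227578.9
scalar dev_simple = 83703.13

* The drop between nested models is itself a chi-square statistic
* on the difference in degrees of freedom (here 9 - 8 = 1).
scalar lr_chi2 = dev_indep - dev_simple
scalar lr_df   = 9 - 8
display "LR chi2(" lr_df ") = " lr_chi2
display "p-value = " chi2tail(lr_df, lr_chi2)

* The same statistic from the reported log likelihoods:
* 2 * (ll of larger model - ll of smaller model).
display 2 * (-41944.565 - (-113882.425))
```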
. predict P_simple endogamy
(option n assumed; predicted number of events)
too many variables specified
r(103);
. predict P_simple_endogamy
(option n assumed; predicted number of events)
. table hed wed, contents (sum count sum P_simple_endogamy) row col
------------------------------------------------------------
husband's |                wife's education
education |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
          | 26426.32  23047.51  15915.36  9395.808     74785
          |
       HS |    28370    137876     43783      8446    218475
          | 18145.71  147304.7  33341.21  19683.35    218475
          |
 Some Col |     7051     48766     61633     18195    135645
          | 13104.12  34867.67  73458.66  14214.54    135645
          |
      BA+ |      984     13794     28635     51224     94637
          | 10744.85  28590.09  19742.76   35559.3     94637
          |
    Total |    68421    233810    142458     78853    523542
          |    68421    233810    142458     78853    523542
------------------------------------------------------------
. table hed wed if hed==wed, contents (sum count sum P_simple_endogamy) row col
------------------------------------------------------------
husband's |                wife's education
education |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016                                   32016
          | 26426.32                                26426.32
          |
       HS |             137876                        137876
          |           147304.7                      147304.7
          |
 Some Col |                        61633               61633
          |                     73458.66            73458.66
          |
      BA+ |                                  51224     51224
          |                                35559.3   35559.3
          |
    Total |    32016    137876     61633     51224    282749
          | 26426.32  147304.7  73458.66   35559.3    282749
------------------------------------------------------------
. *One reasonable next question is whether the force of endogamy, which is strongly positive, differs across educational levels.
. *Let's quantify the difference in educational endogamy.
. *One natural way to do this is to add 4 terms for endogamy, one for each diagonal cell, to see whether that improves the goodness of fit and whether the resulting coefficients are very different.
. gen byte ed_endog_full=0
. replace ed_endog_full=hed if hed==wed
(4 real changes made)
. table hed wed, contents(mean ed_endog_full)
--------------------------------------------------
husband's |           wife's education
education |      <HS        HS  Some Col       BA+
----------+---------------------------------------
      <HS |        1         0         0         0
       HS |        0         2         0         0
 Some Col |        0         0         3         0
      BA+ |        0         0         0         4
--------------------------------------------------
. desmat: poisson count hed wed ed_endog_full
----------------------------------------------------------------------
Poisson regression
----------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 16
Initial log likelihood: -221501.223
Log likelihood: -24059.274
LR chi square: 394883.898
Model degrees of freedom: 10
Pseudo R-squared: 0.891
Prob: 0.000
----------------------------------------------------------------------
nr   Effect                         Coeff      s.e.
----------------------------------------------------------------------
     count
       hed
1        HS                         1.134**   0.007
2        Some Col                   0.819**   0.006
3        BA+                       -0.017*    0.007
       wed
4        HS                         1.372**   0.007
5        Some Col                   1.020**   0.007
6        BA+                       -0.278**   0.008
       ed_endog_full
7        1                          1.722**   0.009
8        2                          0.676**   0.007
9        3                          0.537**   0.008
10       4                          2.487**   0.009
11     _cons                        8.652**   0.008
----------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 47932.55
Prob > chi2(5) = 0.0000
. *We improved the goodness of fit by about 36K (83703.13 - 47932.55 = 35770.58) on 3 additional degrees of freedom. In other words, we need the additional 3 terms to fit the data, but this model does not yet fit the data well.
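. *To see how different the four endogamy coefficients are, it can help to exponentiate them. In a log-linear model the coefficients are on the log scale, so exp(b) is the factor by which a diagonal cell's expected count is multiplied, relative to what the hed and wed terms alone would give. A sketch using the four ed_endog_full coefficients copied from the output above:

```stata
* Multiplicative effect of each diagonal (endogamy) term.
* Coefficients are copied by hand from the ed_endog_full rows above.
display "<HS      : " exp(1.722)
display "HS       : " exp(0.676)
display "Some Col : " exp(0.537)
display "BA+      : " exp(2.487)
```

. *The two extreme categories come out at roughly 5.6 and 12.0, against roughly 2.0 and 1.7 for the two middle categories.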
. predict P_endogamy_full
(option n assumed; predicted number of events)
. table hed wed, contents(sum count sum P_endogamy_full) row col
------------------------------------------------------------
husband's |                wife's education
education |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
          |    32016  22561.17  15875.39  4332.443     74785
          |
       HS |    28370    137876     43783      8446    218475
          | 17790.29    137876  49342.89  13465.83    218475
          |
 Some Col |     7051     48766     61633     18195    135645
          |  12987.8  51193.47     61633   9830.73    135645
          |
      BA+ |      984     13794     28635     51224     94637
          | 5626.913  22179.36  15606.73     51224     94637
          |
    Total |    68421    233810    142458     78853    523542
          |    68421    233810    142458     78853    523542
------------------------------------------------------------
. *The independence model implies that education does not matter at all in mate selection, i.e. that mate selection occurs independently of the education of the spouse. That is clearly not true.
. *The second model, simple endogamy, implies that there is a uniform force of endogamy and that everyone else marries without regard to education. This fits better, but still not well enough.
. *This last model allows the force of educational endogamy to vary across educational groups, which seems to be true, but it still has no terms for what happens away from the endogamy diagonal (the off-diagonal cells are still treated as independent), so the fit there is still not good.
. *The next thing to add to the model is some kind of allowance for the scarcity of marriages in which the spouses' educational attainments are most unequal.
. gen byte ed_diff_3=0
. replace ed_diff_3=1 if (hed==4 & wed==1) | (wed==4& hed==1)
(2 real changes made)
. table hed wed, contents(mean ed_diff_3)
--------------------------------------------------
husband's |           wife's education
education |      <HS        HS  Some Col       BA+
----------+---------------------------------------
      <HS |        0         0         0         1
       HS |        0         0         0         0
 Some Col |        0         0         0         0
      BA+ |        1         0         0         0
--------------------------------------------------
. desmat: poisson count hed wed ed_endog_full ed_diff_3
----------------------------------------------------------------------
Poisson regression
----------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 16
Initial log likelihood: -221501.223
Log likelihood: -17940.195
LR chi square: 407122.056
Model degrees of freedom: 11
Pseudo R-squared: 0.919
Prob: 0.000
----------------------------------------------------------------------
nr   Effect                         Coeff      s.e.
----------------------------------------------------------------------
     count
       hed
1        HS                         0.942**   0.007
2        Some Col
3        BA+                        0.009     0.007
       wed
4        HS                         1.132**   0.007
5        Some Col
6        BA+                       -0.276**   0.008
       ed_endog_full
7        1                          1.410**   0.010
8        2                          0.796**   0.007
9        3                          0.583**   0.007
10       4                          2.147**   0.010
       ed_diff_3
11       1                         -1.947**   0.023
12     _cons                        8.964**   0.008
----------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 35694.39
Prob > chi2(4) = 0.0000
. *One last thing to look at is ways of testing whether two coefficients are significantly different from each other.
. test _x_8--_x_9=0
( 1) [count]_x_8 + [count]_x_9 = 0
chi2( 1) =28496.82
Prob > chi2 = 0.0000
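. *Note that the doubled minus sign made Stata test the sum, _x_8 + _x_9 = 0, as the echoed constraint shows; the equality test we want is test _x_8 = _x_9. Judging from the estimates (0.796 vs. 0.583, each with s.e. 0.007), the two middle categories of educational endogamy still look clearly different at this point, but as we add other terms to the model this difference will dissipate, and we will end up saving 1 df by combining them.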
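. *A sketch of the equality test for the HS and Some Col endogamy terms (the command as echoed above tested their sum instead). This is not the desmat workflow used in this log: it assumes Stata 11 or later and swaps in the built-in factor-variable notation (i.), so the coefficients are referred to as 2.ed_endog_full and 3.ed_endog_full rather than _x_8 and _x_9. The data are re-entered from the observed table so the block runs on its own.

```stata
clear
input byte hed byte wed long count
1 1  32016
1 2  33374
1 3   8407
1 4    988
2 1  28370
2 2 137876
2 3  43783
2 4   8446
3 1   7051
3 2  48766
3 3  61633
3 4  18195
4 1    984
4 2  13794
4 3  28635
4 4  51224
end

* Rebuild the two constructed terms used in the last model.
gen byte ed_endog_full = cond(hed == wed, hed, 0)
gen byte ed_diff_3     = (abs(hed - wed) == 3)

* Same model as the last desmat call, with the lowest category of
* each factor as the reference category.
poisson count i.hed i.wed i.ed_endog_full i.ed_diff_3

* Wald test that the HS and Some Col endogamy terms are equal (1 df).
test 2.ed_endog_full = 3.ed_endog_full

* The same contrast reported as an estimate with a standard error.
lincom 2.ed_endog_full - 3.ed_endog_full
```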
* Take a look at my excel file for a summary of this analysis.
. * if you have made changes to the dataset, remember to save before quitting
. save "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta", replace
file C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta saved
. exit, clear