fourth class notes

----------------------------------------------------------------------------

log: C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\fouth_class_notes.log

log type: text

opened on: 4 Oct 2007, 11:06:46

. describe

Contains data
  obs:            16
 vars:             3
 size:           160 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
hed             byte   %8.0g
wed             byte   %8.0g
count           long   %12.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. set linesize 75

. describe

Contains data
  obs:            16
 vars:             3
 size:           160 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
hed             byte   %8.0g
wed             byte   %8.0g
count           long   %12.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. table hed wed, contents (sum count) row col

--------------------------------------------------
          |                  wed
      hed |      1       2       3       4   Total
----------+---------------------------------------
        1 |  32016   33374    8407     988   74785
        2 |  28370  137876   43783    8446  218475
        3 |   7051   48766   61633   18195  135645
        4 |    984   13794   28635   51224   94637
    Total |  68421  233810  142458   78853  523542
--------------------------------------------------

. label define ed_lbl 1 "<HS" 2 "HS" 3 "Some Col" 4 "BA+"

. label val hed ed_lbl

. label val wed ed_lbl

. table hed wed, contents (sum count) row col

*Note the use of value labels to attach text to variables that are coded as numbers.

------------------------------------------------------------
          |                       wed
      hed |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
       HS |    28370    137876     43783      8446    218475
 Some Col |     7051     48766     61633     18195    135645
      BA+ |      984     13794     28635     51224     94637
    Total |    68421    233810    142458     78853    523542
------------------------------------------------------------

. *The first model to look at is the independence model.

. desmat: poisson count hed wed

------------------------------------------------------------------------------

Poisson regression

------------------------------------------------------------------------------

Dependent variable count

Optimization: ml

Number of observations: 16

Initial log likelihood: -221501.223

Log likelihood: -113882.425

LR chi square: 215237.595

Model degrees of freedom: 6

Pseudo R-squared: 0.486

Prob: 0.000

------------------------------------------------------------------------------

nr Effect Coeff s.e.

------------------------------------------------------------------------------

count

hed

1 HS 1.072** 0.004

2 Some Col 0.595** 0.005

3 BA+ 0.235** 0.005

wed

4 HS 1.229** 0.004

5 Some Col 0.733** 0.005

6 BA+ 0.142** 0.005

7 _cons 9.187** 0.005

------------------------------------------------------------------------------

* p < .05

** p < .01

. poisgof

Goodness-of-fit chi2 = 227578.9

Prob > chi2(9) = 0.0000

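. *This chi-square test decisively rejects the null hypothesis, which in this case is that the independence model fits the data.

. *The expected value of a chi-square with 9 df is 9; the statistic here is 227578.9.

. *The independence model has (r-1)+(c-1)+1 terms: 3 + 3 + 1 = 7 for this 4x4 table, which leaves 16 - 7 = 9 residual degrees of freedom.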
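*As a quick check of the term count above, here is a minimal Python sketch. It is not part of the Stata session, and the function name independence_df is made up for illustration. It counts the terms of the independence model and the residual degrees of freedom for an r x c table.

```python
# Parameter count and residual degrees of freedom for the independence
# model on an r x c table (here r = c = 4, as in the hed x wed table).
def independence_df(r, c):
    cells = r * c
    terms = (r - 1) + (c - 1) + 1   # row effects + column effects + constant
    return terms, cells - terms

terms, resid_df = independence_df(4, 4)
print(terms, resid_df)
```

*The 7 terms are the 6 model degrees of freedom reported by desmat plus the constant, and the residual df matches the chi2(9) reported by poisgof.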

. predict P_independence

(option n assumed; predicted number of events)
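*What predict stores here is exp(linear predictor). A rough Python sketch, using the rounded coefficients printed in the independence-model output above, rebuilds two of the predicted counts; the helper name predicted_count is hypothetical. Because the printed coefficients are rounded to three decimals, the results only approximate the values Stata computes.

```python
import math

# Coefficients copied from the independence-model output above.
# The reference category for both variables is "<HS" (effect 0).
cons = 9.187
hed = {"<HS": 0.0, "HS": 1.072, "Some Col": 0.595, "BA+": 0.235}
wed = {"<HS": 0.0, "HS": 1.229, "Some Col": 0.733, "BA+": 0.142}

def predicted_count(h, w):
    """Predicted count = exp(constant + husband effect + wife effect)."""
    return math.exp(cons + hed[h] + wed[w])

ref_cell = predicted_count("<HS", "<HS")
hs_hs = predicted_count("HS", "HS")
print(round(ref_cell), round(hs_hs))
```

*The two results should land within a fraction of a percent of the predicted values for the <HS/<HS and HS/HS cells, 9773.551 and 97569.33.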

. table hed wed, contents(sum count sum P_independence) row col

------------------------------------------------------------
          |                       wed
      hed |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
          | 9773.551  33398.43  20349.32   11263.7     74785
          |
       HS |    28370    137876     43783      8446    218475
          |  28552.2  97569.33  59447.98   32905.5    218475
          |
 Some Col |     7051     48766     61633     18195    135645
          | 17727.26  60578.06  36909.58   20430.1    135645
          |
      BA+ |      984     13794     28635     51224     94637
          | 12367.98  42264.19  25751.13   14253.7     94637
          |
    Total |    68421    233810    142458     78853    523542
          |    68421    233810    142458     78853    523542
------------------------------------------------------------

. *The eyeball test shows that the independence model under-predicts the endogamy diagonal, where spouses have the same education, and over-predicts the off-diagonal corners, where spouses differ the most.
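*The predicted counts in the table above can also be reproduced by hand, because under independence the fitted value for a cell is (row total x column total) / N. The Python sketch below does that, and also computes the likelihood-ratio statistic G2 = 2 * sum(O * ln(O/E)), on the assumption that this deviance is what poisgof reports by default.

```python
import math

# Observed counts from the table above: rows = hed, columns = wed.
observed = [
    [32016, 33374, 8407, 988],
    [28370, 137876, 43783, 8446],
    [7051, 48766, 61633, 18195],
    [984, 13794, 28635, 51224],
]

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)

# Expected count under independence: row total * column total / N.
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]

# Likelihood-ratio (deviance) statistic: G2 = 2 * sum(O * ln(O / E)).
g2 = 2 * sum(
    o * math.log(o / e)
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

print(round(expected[0][0], 3), round(g2, 1))
```

*The first number should match the predicted count for the <HS/<HS cell (9773.551), and the second should agree with the goodness-of-fit chi2 of 227578.9 up to rounding.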

. label var hed "husband's education"

. label var wed "wife's education"

. save "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta"

file C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta saved

. *The next thing to add to this model is a term that captures the special preference for marrying someone with the same education as yourself.

. gen byte ed_endogamy_simple =0

. replace ed_endogamy_simple=1 if hed==wed

(4 real changes made)

. table hed wed, contents(mean ed_endogamy_simple)

--------------------------------------------------
husband's |           wife's education
education |      <HS        HS  Some Col       BA+
----------+---------------------------------------
      <HS |        1         0         0         0
       HS |        0         1         0         0
 Some Col |        0         0         1         0
      BA+ |        0         0         0         1
--------------------------------------------------

. desmat: poisson count hed wed ed_endogamy_simple

----------------------------------------------------------------------

Poisson regression

----------------------------------------------------------------------

Dependent variable count

Optimization: ml

Number of observations: 16

Initial log likelihood: -221501.223

Log likelihood: -41944.565

LR chi square: 359113.316

Model degrees of freedom: 7

Pseudo R-squared: 0.811

Prob: 0.000

----------------------------------------------------------------------

nr Effect Coeff s.e.

----------------------------------------------------------------------

count

hed

1 HS 0.740** 0.005

2 Some Col 0.414** 0.005

3 BA+ 0.216** 0.005

wed

4 HS 0.979** 0.005

5 Some Col 0.608** 0.005

6 BA+ 0.081** 0.005

ed_endogamy_simple

7 1 1.115** 0.003

8 _cons 9.067** 0.005

----------------------------------------------------------------------

* p < .05

** p < .01

. poisgof

Goodness-of-fit chi2 = 83703.13

Prob > chi2(8) = 0.0000

. *The first thing to notice is that this is an enormous improvement over the independence model: the goodness-of-fit chi-square drops by about 144K (227578.9 - 83703.13) on 1 df.

. *Still, the goodness-of-fit test rejects this model, which is to say that this model does not yet fit the data well.
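*The size of the improvement can be checked from the numbers already in the log. The Python sketch below takes the two goodness-of-fit statistics and the two log likelihoods as printed above and shows that the drop in deviance equals twice the gain in log likelihood, up to rounding in the printed values.

```python
# Fit statistics copied from the desmat and poisgof output above.
dev_independence, df_independence = 227578.9, 9
dev_simple, df_simple = 83703.13, 8
ll_independence, ll_simple = -113882.425, -41944.565

# Improvement in fit = drop in deviance, on the difference in residual df.
improvement = dev_independence - dev_simple
df_used = df_independence - df_simple

# The same quantity from the log likelihoods: 2 * (LL_larger - LL_smaller).
lr_stat = 2 * (ll_simple - ll_independence)

print(round(improvement, 2), df_used, round(lr_stat, 2))
```

*Both routes give roughly 143,876 on 1 df, far beyond any conventional critical value for a chi-square with 1 df.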

. predict P_simple endogamy

(option n assumed; predicted number of events)

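too many variables specified

r(103);

. *The error comes from the space in the new variable name; the command is retyped with an underscore below.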

. predict P_simple_endogamy

(option n assumed; predicted number of events)

. table hed wed, contents (sum count sum P_simple_endogamy) row col

------------------------------------------------------------
husband's |                 wife's education
education |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
          | 26426.32  23047.51  15915.36  9395.808     74785
          |
       HS |    28370    137876     43783      8446    218475
          | 18145.71  147304.7  33341.21  19683.35    218475
          |
 Some Col |     7051     48766     61633     18195    135645
          | 13104.12  34867.67  73458.66  14214.54    135645
          |
      BA+ |      984     13794     28635     51224     94637
          | 10744.85  28590.09  19742.76   35559.3     94637
          |
    Total |    68421    233810    142458     78853    523542
          |    68421    233810    142458     78853    523542
------------------------------------------------------------

. table hed wed if hed==wed, contents (sum count sum P_simple_endogamy) row col

------------------------------------------------------------
husband's |                 wife's education
education |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016                                   32016
          | 26426.32                                26426.32
          |
       HS |             137876                        137876
          |           147304.7                      147304.7
          |
 Some Col |                        61633               61633
          |                     73458.66            73458.66
          |
      BA+ |                                  51224     51224
          |                                35559.3   35559.3
          |
    Total |    32016    137876     61633     51224    282749
          | 26426.32  147304.7  73458.66   35559.3    282749
------------------------------------------------------------
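*The diagonal-only table above shows that the single endogamy term reproduces the diagonal total but not the individual diagonal cells. A small Python sketch, using only the numbers printed in that table, makes the pattern of misses explicit.

```python
# Diagonal cells from the table above: observed counts and the counts
# predicted by the model with a single endogamy term.
levels = ["<HS", "HS", "Some Col", "BA+"]
observed = [32016, 137876, 61633, 51224]
predicted = [26426.32, 147304.7, 73458.66, 35559.3]

# The diagonal totals agree, up to rounding in the printed table ...
total_gap = sum(observed) - sum(predicted)

# ... but the cell-by-cell residuals (observed - predicted) do not vanish.
residuals = {lvl: o - p for lvl, o, p in zip(levels, observed, predicted)}

print(round(total_gap, 2))
for lvl, res in residuals.items():
    print(lvl, round(res, 1))
```

*The residuals are positive for <HS and BA+ and negative for HS and Some Col: one uniform endogamy term is too weak at the two ends of the education scale and too strong in the middle.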

. *One of the next reasonable questions is whether the force of endogamy, which is strongly positive, differs across educational levels.

. *Let's quantify the difference in educational endogamy.

. *One natural way to do this is to add 4 terms for endogamy, one for each diagonal cell, to see whether that improves the goodness of fit and whether the resulting coefficients are very different.

. gen byte ed_endog_full=0

. replace ed_endog_full=hed if hed==wed

(4 real changes made)

. table hed wed, contents(mean ed_endog_full)

--------------------------------------------------
husband's |           wife's education
education |      <HS        HS  Some Col       BA+
----------+---------------------------------------
      <HS |        1         0         0         0
       HS |        0         2         0         0
 Some Col |        0         0         3         0
      BA+ |        0         0         0         4
--------------------------------------------------
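*For readers more used to a general-purpose language, the two gen/replace pairs used so far can be mirrored in a few lines of Python. This is only a sketch of the coding logic over the 16 cells of the table, not a substitute for the Stata commands.

```python
# Husband's and wife's education codes, 1..4, one pair per cell of the table.
cells = [(hed, wed) for hed in range(1, 5) for wed in range(1, 5)]

# gen byte ed_endogamy_simple = 0 ; replace ... = 1 if hed == wed
ed_endogamy_simple = [1 if hed == wed else 0 for hed, wed in cells]

# gen byte ed_endog_full = 0 ; replace ... = hed if hed == wed
ed_endog_full = [hed if hed == wed else 0 for hed, wed in cells]

print(sum(ed_endogamy_simple))      # number of diagonal cells
print(sorted(set(ed_endog_full)))   # distinct codes; 0 marks off-diagonal cells
```

*Treated as a categorical variable with 0 as the reference category, ed_endog_full contributes four dummy terms where ed_endogamy_simple contributed one, which is why the next model spends 3 more degrees of freedom.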

. desmat: poisson count hed wed ed_endog_full

----------------------------------------------------------------------

Poisson regression

----------------------------------------------------------------------

Dependent variable count

Optimization: ml

Number of observations: 16

Initial log likelihood: -221501.223

Log likelihood: -24059.274

LR chi square: 394883.898

Model degrees of freedom: 10

Pseudo R-squared: 0.891

Prob: 0.000

----------------------------------------------------------------------

nr Effect Coeff s.e.

----------------------------------------------------------------------

count

hed

1 HS 1.134** 0.007

2 Some Col 0.819** 0.006

3 BA+ -0.017* 0.007

wed

4 HS 1.372** 0.007

5 Some Col 1.020** 0.007

6 BA+ -0.278** 0.008

ed_endog_full

7 1 1.722** 0.009

8 2 0.676** 0.007

9 3 0.537** 0.008

10 4 2.487** 0.009

11 _cons 8.652** 0.008

----------------------------------------------------------------------

* p < .05

** p < .01

. poisgof

Goodness-of-fit chi2 = 47932.55

Prob > chi2(5) = 0.0000

. *We improved the goodness of fit by about 36K (83703.13 - 47932.55) on 3 additional degrees of freedom. In other words, we need the additional 3 terms to fit the data, but this model does not yet fit the data well.
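*To see how different the four endogamy coefficients in the output above are, it helps to exponentiate them. In a Poisson (log-linear) model, exp(coefficient) is the factor by which the term multiplies the expected count. The Python sketch below applies that to the printed, rounded coefficients.

```python
import math

# Endogamy coefficients copied from the ed_endog_full output above.
endogamy_coef = {"<HS": 1.722, "HS": 0.676, "Some Col": 0.537, "BA+": 2.487}

# exp(coefficient) = factor by which a diagonal cell's expected count is
# multiplied, relative to what the row and column effects alone would give.
multiplier = {lvl: math.exp(b) for lvl, b in endogamy_coef.items()}

for lvl, m in multiplier.items():
    print(lvl, round(m, 2))
```

*The multipliers come out at roughly 5.6 for <HS, 2.0 for HS, 1.7 for Some Col, and 12.0 for BA+, so the force of endogamy is much stronger at the two ends of the education scale than in the middle.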

. predict P_endogamy_full

(option n assumed; predicted number of events)

. table hed wed, contents(sum count sum P_endogamy_full) row col

------------------------------------------------------------
husband's |                 wife's education
education |      <HS        HS  Some Col       BA+     Total
----------+-------------------------------------------------
      <HS |    32016     33374      8407       988     74785
          |    32016  22561.17  15875.39  4332.443     74785
          |
       HS |    28370    137876     43783      8446    218475
          | 17790.29    137876  49342.89  13465.83    218475
          |
 Some Col |     7051     48766     61633     18195    135645
          |  12987.8  51193.47     61633   9830.73    135645
          |
      BA+ |      984     13794     28635     51224     94637
          | 5626.913  22179.36  15606.73     51224     94637
          |
    Total |    68421    233810    142458     78853    523542
          |    68421    233810    142458     78853    523542
------------------------------------------------------------

. *The independence model implies that education does not matter at all in mate selection, i.e. that mate selection occurs independently of the spouse's education. That is clearly not true.

. *The second model, simple endogamy, implies that there is a uniform force of endogamy and that everyone who marries outside their own group does so without regard to education. This fits better, but still not well enough.

. *This last model lets the force of educational endogamy vary across educational groups, which seems to be right, but it still has no terms for what happens away from the endogamy diagonal, so the fit there is still not good.

. *The next thing to add to the model is some kind of allowance for the scarcity of marriages in which the educational attainments are most unequal.

. gen byte ed_diff_3=0

. replace ed_diff_3=1 if (hed==4 & wed==1) | (wed==4& hed==1)

(2 real changes made)

. table hed wed, contents(mean ed_diff_3)

--------------------------------------------------
husband's |           wife's education
education |      <HS        HS  Some Col       BA+
----------+---------------------------------------
      <HS |        0         0         0         1
       HS |        0         0         0         0
 Some Col |        0         0         0         0
      BA+ |        1         0         0         0
--------------------------------------------------
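*The two flagged corner cells are exactly the cells in which the spouses are three education levels apart, which is presumably where the name ed_diff_3 comes from. The Python sketch below checks that the explicit corner definition used in the log and a distance-based definition pick out the same cells; the name by_distance is made up for illustration.

```python
# One pair per cell of the 4 x 4 table, education coded 1..4.
cells = [(hed, wed) for hed in range(1, 5) for wed in range(1, 5)]

# The log's definition: flag the two corner cells explicitly.
ed_diff_3 = [
    1 if (hed == 4 and wed == 1) or (wed == 4 and hed == 1) else 0
    for hed, wed in cells
]

# Equivalent definition: the spouses are three education levels apart.
by_distance = [1 if abs(hed - wed) == 3 else 0 for hed, wed in cells]

print(ed_diff_3 == by_distance, sum(ed_diff_3))
```

*Writing the indicator in terms of abs(hed - wed) makes it easy to define analogous terms for distances of 1 or 2 levels, should later models call for them.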

. desmat: poisson count hed wed ed_endog_full ed_diff_3

----------------------------------------------------------------------

Poisson regression

----------------------------------------------------------------------

Dependent variable count

Optimization: ml

Number of observations: 16

Initial log likelihood: -221501.223

Log likelihood: -17940.195

LR chi square: 407122.056

Model degrees of freedom: 11

Pseudo R-squared: 0.919

Prob: 0.000

----------------------------------------------------------------------

nr Effect Coeff s.e.

----------------------------------------------------------------------

count

hed

1 HS 0.942** 0.007

2 Some Col 0.667** 0.007

3 BA+ 0.009 0.007

wed

4 HS 1.132** 0.007

5 Some Col 0.815** 0.007

6 BA+ -0.276** 0.008

ed_endog_full

7 1 1.410** 0.010

8 2 0.796** 0.007

9 3 0.583** 0.007

10 4 2.147** 0.010

ed_diff_3

11 1 -1.947** 0.023

12 _cons 8.964** 0.008

----------------------------------------------------------------------

* p < .05

** p < .01

. poisgof

Goodness-of-fit chi2 = 35694.39

Prob > chi2(4) = 0.0000
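*With four models now fitted, the goodness-of-fit results reported by poisgof can be lined up to show what each step bought. The Python sketch below uses only the statistics printed in this log; the short model labels are informal.

```python
# (model, deviance from poisgof, residual df), copied from the log above.
models = [
    ("independence", 227578.9, 9),
    ("simple endogamy", 83703.13, 8),
    ("full endogamy", 47932.55, 5),
    ("full endogamy + ed_diff_3", 35694.39, 4),
]

# For each model after the first, report the drop in deviance and the
# degrees of freedom spent relative to the model just before it.
rows = []
for (name, dev, df), (_, prev_dev, prev_df) in zip(models[1:], models):
    rows.append((name, round(prev_dev - dev, 2), prev_df - df))

for name, drop, df_spent in rows:
    print(f"{name:<28}{drop:>12.2f}{df_spent:>4}")
```

*Each step is a very large improvement for the degrees of freedom spent, yet the last deviance (35694.39 on 4 df) is still far too large for the model to be accepted as fitting the data.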

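. *One last thing to look at is how to test whether two coefficients are significantly different from each other.

. test _x_8--_x_9=0

( 1) [count]_x_8 + [count]_x_9 = 0

chi2( 1) =28496.82

Prob > chi2 = 0.0000

. *Careful: the doubled minus sign was read as "minus a negative", so the restriction echoed in line ( 1) is that the two coefficients sum to zero, and the chi2 above belongs to that hypothesis. The equality test intended here is: test _x_8 = _x_9

. *The two middle categories of educational endogamy (0.796 and 0.583, each with s.e. 0.007) still look clearly different at this point, but as we add other terms to the model this difference will dissipate, and we will end up saving 1 df by combining them.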
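*A note on the test command as typed: the restriction Stata echoes in line ( 1) has a plus sign, because two consecutive minus signs are parsed as subtracting a negative. Python parses the same expression the same way, so a short sketch with the two printed coefficients shows the difference between what was typed and a true difference.

```python
# Coefficients 8 and 9 from the last model above:
# ed_endog_full for HS and for Some Col.
b8, b9 = 0.796, 0.583

# "b8--b9" is read as b8 - (-b9), i.e. the SUM of the two coefficients.
as_typed = b8--b9

# The difference, which is what an equality test needs, uses a single minus.
difference = b8 - b9

print(round(as_typed, 3), round(difference, 3))
```

*The typed expression evaluates to 1.379 and the difference to 0.213, so the chi2 of 28496.82 tests whether the sum is zero. A test of equality would be requested with a single minus sign, or with _x_8 = _x_9; its result is not in this log.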

* Take a look at my Excel file for a summary of this analysis.

. * if you have made changes to the dataset, remember to save before quitting

. save "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta", replace

file C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta saved

. exit, clear