----------------------------------------------------------------------------

       log:  C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\fouth_class_notes.log

  log type:  text

 opened on:   4 Oct 2007, 11:06:46

 

. describe

 

Contains data

  obs:            16                         

 vars:             3                         

 size:           160 (99.9% of memory free)

-------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

-------------------------------------------------------------------------------

hed             byte   %8.0g                 

wed             byte   %8.0g                  

count           long   %12.0g                

-------------------------------------------------------------------------------

Sorted by: 

     Note:  dataset has changed since last saved

 

. set linesize 75

 

. table hed wed, contents (sum count) row col

 

--------------------------------------------------

          |                  wed                 

      hed |      1       2       3       4   Total

----------+---------------------------------------

        1 |  32016   33374    8407     988   74785

        2 |  28370  137876   43783    8446  218475

        3 |   7051   48766   61633   18195  135645

        4 |    984   13794   28635   51224   94637

          |

    Total |  68421  233810  142458   78853  523542

--------------------------------------------------

 

. label define ed_lbl 1 "<HS" 2 "HS" 3 "Some Col" 4 "BA+"

 

. label val hed ed_lbl

 

. label val wed ed_lbl

 

. table hed wed, contents (sum count) row col

 

*Note the use of value labels to attach text to variables that are coded as numbers.

 

------------------------------------------------------------

          |                       wed                      

      hed |      <HS        HS  Some Col       BA+     Total

----------+-------------------------------------------------

      <HS |    32016     33374      8407       988     74785

       HS |    28370    137876     43783      8446    218475

 Some Col |     7051     48766     61633     18195    135645

      BA+ |      984     13794     28635     51224     94637

          |

    Total |    68421    233810    142458     78853    523542

------------------------------------------------------------
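
*A quick way to confirm that the labels were attached, and to get the same cross-tabulation another way. This is a sketch, not part of the original session: it assumes the data are still in memory as one row per hed-by-wed cell, with count holding the cell frequency.

```stata
* Show the number-to-text mapping stored in the value label defined above
label list ed_lbl

* Confirm that both variables now carry the label
describe hed wed

* Same two-way table, treating count as a frequency weight
tabulate hed wed [fweight = count]
```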

 

. *The first model to look at is the independence model.

. desmat: poisson count hed wed

------------------------------------------------------------------------------

   Poisson regression

------------------------------------------------------------------------------

   Dependent variable                                                   count

   Optimization:                                                           ml

   Number of observations:                                                 16

   Initial log likelihood:                                        -221501.223

   Log likelihood:                                                -113882.425

   LR chi square:                                                  215237.595

   Model degrees of freedom:                                                6

   Pseudo R-squared:                                                    0.486

   Prob:                                                                0.000

------------------------------------------------------------------------------

nr Effect                                                   Coeff        s.e.

------------------------------------------------------------------------------

   count

     hed

1      HS                                                   1.072**     0.004

2      Some Col                                             0.595**     0.005

3      BA+                                                  0.235**     0.005

     wed

4      HS                                                   1.229**     0.004

5      Some Col                                             0.733**     0.005

6      BA+                                                  0.142**     0.005

7    _cons                                                  9.187**     0.005

------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  227578.9

         Prob > chi2(9)        =    0.0000

 

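. *This chi-square test decisively rejects the null hypothesis, which in this case is that the independence model fits the data.

. *The expected value of a chi-square with 9 df is 9, so a value of 227578.9 is far out in the tail.

. *The independence model has (r-1)+(c-1)+1 terms: 3+3+1 = 7 for this 4x4 table, which leaves 16-7 = 9 residual df.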
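
*The degrees-of-freedom bookkeeping can be checked with display. A sketch using only numbers printed above (r = c = 4, 16 cells, goodness-of-fit chi2 = 227578.9); none of these lines were run in the original session.

```stata
* Terms in the independence model for a 4x4 table: (r-1) + (c-1) + 1
display (4-1) + (4-1) + 1

* Residual degrees of freedom: number of cells minus number of terms
display 16 - ((4-1) + (4-1) + 1)

* 5% critical value of a chi-square with 9 df, for comparison with 227578.9
display invchi2tail(9, 0.05)

* Upper-tail probability of the observed goodness-of-fit statistic
display chi2tail(9, 227578.9)
```

*The critical value is about 16.9, so the observed statistic is larger by several orders of magnitude.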

. predict P_independence

(option n assumed; predicted number of events)

 

. table hed wed, contents(sum count sum P_independence) row col

 

------------------------------------------------------------

          |                       wed                      

      hed |      <HS        HS  Some Col       BA+     Total

----------+-------------------------------------------------

      <HS |    32016     33374      8407       988     74785

          | 9773.551  33398.43  20349.32   11263.7     74785

          |

       HS |    28370    137876     43783      8446    218475

          |  28552.2  97569.33  59447.98   32905.5    218475

          |

 Some Col |     7051     48766     61633     18195    135645

          | 17727.26  60578.06  36909.58   20430.1    135645

          |

      BA+ |      984     13794     28635     51224     94637

          | 12367.98  42264.19  25751.13   14253.7     94637

          |

    Total |    68421    233810    142458     78853    523542

          |    68421    233810    142458     78853    523542

------------------------------------------------------------

 

. *The eyeball test shows that the independence model under-predicts the endogamy diagonal, where spouses have the same education, and over-predicts the off-diagonal corners, where spouses differ the most.
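
*Under independence the expected count in each cell is (row total x column total) / grand total, so the predicted values in the table above can be reproduced by hand. A minimal sketch; row_total, col_total, grand_total, E_indep and pearson_res are names made up for this illustration, and the table command uses the same syntax as the rest of this session.

```stata
* Marginal totals for each husband's and each wife's education category
bysort hed: egen double row_total = total(count)
bysort wed: egen double col_total = total(count)
egen double grand_total = total(count)

* Expected count under independence
gen double E_indep = row_total * col_total / grand_total

* Should agree with the model prediction up to rounding
sort hed wed
list hed wed count P_independence E_indep, sepby(hed)

* Pearson residuals: positive where the model under-predicts
gen double pearson_res = (count - E_indep) / sqrt(E_indep)
table hed wed, contents(mean pearson_res)
```

*For the <HS by <HS cell, for example, 74785 x 68421 / 523542 = 9773.55, which matches the predicted value shown in the table.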

. label var hed "husband's education"

 

. label var wed "wife's education"

 

. save "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta"

file C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta saved

 

. *The next thing to add to this model is a term for the special preference to marry someone with the same education as yourself.

. gen byte ed_endogamy_simple =0

 

. replace  ed_endogamy_simple=1 if hed==wed

(4 real changes made)

 

. table hed wed, contents(mean  ed_endogamy_simple)

 

--------------------------------------------------

husband's |            wife's education          

education |      <HS        HS  Some Col       BA+

----------+---------------------------------------

      <HS |        1         0         0         0

       HS |        0         1         0         0

 Some Col |        0         0         1         0

      BA+ |        0         0         0         1

--------------------------------------------------
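
*The two-step gen/replace above can be collapsed into one line, because a logical expression in Stata evaluates to 1 when true and 0 when false. A sketch; ed_endogamy_alt is a hypothetical name used only for the comparison.

```stata
* Indicator for couples on the diagonal, built in one step
gen byte ed_endogamy_alt = (hed == wed)

* Stops with an error if the two versions ever disagree
assert ed_endogamy_alt == ed_endogamy_simple
```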

 

 

 

. desmat: poisson count hed wed ed_endogamy_simple

----------------------------------------------------------------------

   Poisson regression

----------------------------------------------------------------------

   Dependent variable                                           count

   Optimization:                                                   ml

   Number of observations:                                         16

   Initial log likelihood:                                -221501.223

   Log likelihood:                                         -41944.565

   LR chi square:                                          359113.316

   Model degrees of freedom:                                        7

   Pseudo R-squared:                                            0.811

   Prob:                                                        0.000

----------------------------------------------------------------------

nr Effect                                           Coeff        s.e.

----------------------------------------------------------------------

   count

     hed

1      HS                                           0.740**     0.005

2      Some Col                                     0.414**     0.005

3      BA+                                          0.216**     0.005

     wed

4      HS                                           0.979**     0.005

5      Some Col                                     0.608**     0.005

6      BA+                                          0.081**     0.005

     ed_endogamy_simple

7      1                                            1.115**     0.003

8    _cons                                          9.067**     0.005

----------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  83703.13

         Prob > chi2(8)        =    0.0000

 

. *First thing to notice: this is an enormous improvement over the independence model. The goodness-of-fit chi-square drops by about 144K (from 227578.9 to 83703.13) on 1 df.

. *Still, the goodness-of-fit test rejects this model, which is to say that this model does not yet fit the data well.
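
*The size of the improvement and of the endogamy effect can be worked out from the printed output. A sketch using the two goodness-of-fit statistics (227578.9 and 83703.13) and the ed_endogamy_simple coefficient (1.115) copied from above.

```stata
* Drop in the goodness-of-fit chi-square from adding the one endogamy term
display 227578.9 - 83703.13

* Its upper-tail probability on 1 degree of freedom
display chi2tail(1, 227578.9 - 83703.13)

* The coefficient is on the log scale; exponentiate it to get a
* multiplier on the expected count in the diagonal cells
display exp(1.115)
```

*The last line comes out at roughly 3: under this model a diagonal cell holds about three times as many couples as the row and column terms alone would imply.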

. predict P_simple_endogamy

(option n assumed; predicted number of events)

 

. table hed wed, contents (sum count sum  P_simple_endogamy) row col

 

------------------------------------------------------------

husband's |                 wife's education               

education |      <HS        HS  Some Col       BA+     Total

----------+-------------------------------------------------

      <HS |    32016     33374      8407       988     74785

          | 26426.32  23047.51  15915.36  9395.808     74785

          |

       HS |    28370    137876     43783      8446    218475

          | 18145.71  147304.7  33341.21  19683.35    218475

          |

 Some Col |     7051     48766     61633     18195    135645

          | 13104.12  34867.67  73458.66  14214.54    135645

          |

      BA+ |      984     13794     28635     51224     94637

          | 10744.85  28590.09  19742.76   35559.3     94637

          |

    Total |    68421    233810    142458     78853    523542

          |    68421    233810    142458     78853    523542

------------------------------------------------------------

 

. table hed wed if hed==wed, contents (sum count sum  P_simple_endogamy) row col

 

------------------------------------------------------------

husband's |                 wife's education               

education |      <HS        HS  Some Col       BA+     Total

----------+-------------------------------------------------

      <HS |    32016                                   32016

          | 26426.32                                26426.32

          |

       HS |             137876                        137876

          |           147304.7                      147304.7

          |

 Some Col |                        61633               61633

          |                     73458.66            73458.66

          |

      BA+ |                                  51224     51224

          |                                35559.3   35559.3

          |

    Total |    32016    137876     61633     51224    282749

          | 26426.32  147304.7  73458.66   35559.3    282749

------------------------------------------------------------

 

. *One of the next reasonable questions is whether the force of endogamy, which is strongly positive, differs across educational levels.

. *Let's quantify the difference in educational endogamy.

. *One natural way to do this is to add 4 terms for endogamy, one for each diagonal cell, to see whether that improves the goodness of fit and whether the resulting coefficients are very different.

. gen byte ed_endog_full=0

 

. replace ed_endog_full=hed if hed==wed

(4 real changes made)

 

. table hed wed, contents(mean ed_endog_full)

 

--------------------------------------------------

husband's |            wife's education          

education |      <HS        HS  Some Col       BA+

----------+---------------------------------------

      <HS |        1         0         0         0

       HS |        0         2         0         0

 Some Col |        0         0         3         0

      BA+ |        0         0         0         4

--------------------------------------------------

 

. desmat: poisson count hed wed  ed_endog_full

----------------------------------------------------------------------

   Poisson regression

----------------------------------------------------------------------

   Dependent variable                                           count

   Optimization:                                                   ml

   Number of observations:                                         16

   Initial log likelihood:                                -221501.223

   Log likelihood:                                         -24059.274

   LR chi square:                                          394883.898

   Model degrees of freedom:                                       10

   Pseudo R-squared:                                            0.891

   Prob:                                                        0.000

----------------------------------------------------------------------

nr Effect                                           Coeff        s.e.

----------------------------------------------------------------------

   count

     hed

1      HS                                           1.134**     0.007

2      Some Col                                     0.819**     0.006

3      BA+                                         -0.017*      0.007

     wed

4      HS                                           1.372**     0.007

5      Some Col                                     1.020**     0.007

6      BA+                                         -0.278**     0.008

     ed_endog_full

7      1                                            1.722**     0.009

8      2                                            0.676**     0.007

9      3                                            0.537**     0.008

10     4                                            2.487**     0.009

11   _cons                                          8.652**     0.008

----------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  47932.55

         Prob > chi2(5)        =    0.0000

 

. *We improved the goodness of fit by about 36K (from 83703.13 to 47932.55) on 3 additional degrees of freedom. In other words, we need the additional 3 terms to fit the data, but this model does not yet fit the data well.
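
*To compare the force of endogamy across the four education levels, the ed_endog_full coefficients can be put on the multiplicative scale. A sketch using values copied from the output above (coefficients 7 to 10 and the two goodness-of-fit statistics).

```stata
* Drop in goodness-of-fit chi-square from splitting endogamy into four terms
display 83703.13 - 47932.55

* Its upper-tail probability on the 3 additional degrees of freedom
display chi2tail(3, 83703.13 - 47932.55)

* Endogamy multipliers, in order: <HS, HS, Some Col, BA+
display exp(1.722)
display exp(0.676)
display exp(0.537)
display exp(2.487)
```

*The multipliers for the two end categories are much larger than those for the two middle categories, which is the pattern the coefficients already suggest.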

. predict P_endogamy_full

(option n assumed; predicted number of events)

 

. table hed wed, contents(sum count sum  P_endogamy_full) row col

 

------------------------------------------------------------

husband's |                 wife's education               

education |      <HS        HS  Some Col       BA+     Total

----------+-------------------------------------------------

      <HS |    32016     33374      8407       988     74785

          |    32016  22561.17  15875.39  4332.443     74785

          |

       HS |    28370    137876     43783      8446    218475

          | 17790.29    137876  49342.89  13465.83    218475

          |

 Some Col |     7051     48766     61633     18195    135645

          |  12987.8  51193.47     61633   9830.73    135645

          |

      BA+ |      984     13794     28635     51224     94637

          | 5626.913  22179.36  15606.73     51224     94637

          |

    Total |    68421    233810    142458     78853    523542

          |    68421    233810    142458     78853    523542

------------------------------------------------------------
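
*Because this model spends one parameter on each diagonal cell, those four cells are fit exactly, as the table above shows (32016, 137876, 61633 and 51224 appear as both observed and predicted). A sketch of how to check this and to see where the remaining misfit sits; resid_full is a hypothetical name.

```stata
* Diagonal cells: predicted should equal observed, up to numerical precision
assert abs(P_endogamy_full - count) < 0.5 if hed == wed

* Off-diagonal cells: raw residuals show where the model still misses
gen double resid_full = count - P_endogamy_full
table hed wed if hed != wed, contents(mean resid_full)
```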

 

. *The independence model implies that education does not matter at all in mate selection, i.e. that mate selection occurs independently of the education of the spouse. That is clearly not true.

. *The second model, simple endogamy, implies that there is a uniform force of endogamy and that everyone who marries outside their own group does so without regard to education. This fits better but still not well enough.

. *This last model allows the force of educational endogamy to vary across educational groups, which seems to be true, but it still makes no allowance for what happens away from the endogamy diagonal, so the fit is still not good.

. *The next thing to add to the model is some allowance for the scarcity of marriages where the educational attainments are most unequal.

. gen byte ed_diff_3=0

 

. replace ed_diff_3=1 if (hed==4 & wed==1) | (wed==4& hed==1)

(2 real changes made)

 

. table hed wed, contents(mean  ed_diff_3)

 

--------------------------------------------------

husband's |            wife's education          

education |      <HS        HS  Some Col       BA+

----------+---------------------------------------

      <HS |        0         0         0         1

       HS |        0         0         0         0

 Some Col |        0         0         0         0

      BA+ |        1         0         0         0

--------------------------------------------------
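
*The two corner cells are exactly the cells where husband and wife are three categories apart, so the same indicator can be built from the absolute difference. A sketch; ed_diff_alt and ed_dist are hypothetical names, and ed_dist is shown only to make the distance pattern visible.

```stata
* Indicator for couples whose education levels are 3 categories apart
gen byte ed_diff_alt = (abs(hed - wed) == 3)

* Stops with an error if it differs from the version built above
assert ed_diff_alt == ed_diff_3

* Distance between spouses' categories in every cell of the table
gen byte ed_dist = abs(hed - wed)
table hed wed, contents(mean ed_dist)
```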

 

 

. desmat: poisson count hed wed  ed_endog_full ed_diff_3

----------------------------------------------------------------------

   Poisson regression

----------------------------------------------------------------------

   Dependent variable                                           count

   Optimization:                                                   ml

   Number of observations:                                         16

   Initial log likelihood:                                -221501.223

   Log likelihood:                                         -17940.195

   LR chi square:                                          407122.056

   Model degrees of freedom:                                       11

   Pseudo R-squared:                                            0.919

   Prob:                                                        0.000

----------------------------------------------------------------------

nr Effect                                           Coeff        s.e.

----------------------------------------------------------------------

   count

     hed

1      HS                                           0.942**     0.007

2      Some Col                                     0.667**     0.007

3      BA+                                          0.009       0.007

     wed

4      HS                                           1.132**     0.007

5      Some Col                                     0.815**     0.007

6      BA+                                         -0.276**     0.008

     ed_endog_full

7      1                                            1.410**     0.010

8      2                                            0.796**     0.007

9      3                                            0.583**     0.007

10     4                                            2.147**     0.010

     ed_diff_3

11     1                                           -1.947**     0.023

12   _cons                                          8.964**     0.008

----------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  35694.39

         Prob > chi2(4)        =    0.0000

 

. *One last thing to look at is how to test whether two coefficients are significantly different from each other.

. test _x_8--_x_9=0

 

 ( 1)  [count]_x_8 + [count]_x_9 = 0

 

           chi2(  1) =28496.82

         Prob > chi2 =    0.0000

 

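. *Note that the constraint printed above is _x_8 + _x_9 = 0: Stata read the double minus as a plus, so this test asks whether the sum of the two coefficients is zero, not whether they are equal. The equality test is test _x_8=_x_9.

. *Judging by the estimates (0.796 and 0.583, each with s.e. 0.007), the two middle categories of educational endogamy still look clearly different at this point, but as we add other terms into the model this difference will dissipate, and we will end up saving 1 df by combining them.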
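
*A sketch of an equality test written explicitly as a difference. It assumes the model with ed_endog_full and ed_diff_3 is still the active estimation, and that desmat's generated variables _x_8 and _x_9 are the HS and Some Col endogamy terms, as the numbering in the output suggests.

```stata
* Wald test that the two endogamy coefficients are equal
test _x_8 = _x_9

* Same hypothesis, reported as an estimated difference with a confidence interval
lincom _x_8 - _x_9
```

*The constraint line that test prints is worth reading each time, since it shows the hypothesis Stata actually tested.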

* Take a look at my Excel file for a summary of this analysis.

. * if you have made changes to the dataset, remember to save before quitting

. save "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta", replace

file C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\ed_intermar.dta saved

 

. exit, clear