log type:  text

 opened on:  12 Oct 2005, 10:58:52

 

. *First, let me show you the index of dissimilarity (ID) and BIC,

. *starting with our simple dataset A.

. edit

(3 vars, 4 obs pasted into editor)

- preserve

 

. desmat: poisson count race occ

------------------------------------------------------------------------------------------

   Poisson regression

------------------------------------------------------------------------------------------

   Dependent variable                                                               count

   Optimization:                                                                       ml

   Number of observations:                                                              4

   Initial log likelihood:                                                     -26656.550

   Log likelihood:                                                                -59.074

   LR chi square:                                                               53194.953

   Model degrees of freedom:                                                            2

   Pseudo R-squared:                                                                0.998

   Prob:                                                                            0.000

------------------------------------------------------------------------------------------

nr Effect                                                               Coeff        s.e.

------------------------------------------------------------------------------------------

   count

     race

1      w                                                                1.829**     0.011

     occ

2      WC                                                              -0.921**     0.008

3    _cons                                                              8.825**     0.011

------------------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  73.77235

         Prob > chi2(1)        =    0.0000

 

. set linesize 79

 

. tabulate race occ [fweight=count]

 

           |          occ

      race |       Oth         WC |     Total

-----------+----------------------+----------

         n |     7,146      2,361 |     9,507

         w |    42,012     17,216 |    59,228

-----------+----------------------+----------

     Total |    49,158     19,577 |    68,735

 

 

. *BIC = GOF chi2 - df*ln(N), where N is the total count in the table

 

. display 73.77-(1*(ln(68735)))

62.631986

 

. *A positive BIC means the model is rejected in comparison to the saturated model, whereas a negative BIC means the model is preferred to the saturated model.
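
. *A minimal sketch, not part of the original session, of the same BIC calculation done from stored results instead of typed-in numbers. It assumes the independence model above is still the active estimation and that poisgof leaves r(chi2) and r(df) behind; the scalar names Ntot and bic are made up for illustration. It should come out at about 62.63, as above.

```stata
* Total table count N (68,735 for dataset A): sum of the cell counts.
* Run summarize first, because poisgof overwrites r().
quietly summarize count
scalar Ntot = r(sum)

* Deviance goodness-of-fit chi2 and its residual df for the active model
quietly poisgof

* BIC = GOF chi2 - df*ln(N)
scalar bic = r(chi2) - r(df)*ln(Ntot)
display "BIC = " bic
```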

. predict A_indep

(option n assumed; predicted number of events)

 

. generate ID_parts= 50*(abs((A_indep/68735)-(count/68735)))

 

. table race occ, contents(sum ID_parts) row col

 

----------------------------------------

          |             occ            

     race |      Oth        WC     Total

----------+-----------------------------

        n | .2522511  .2522511  .5045021

        w | .2522511  .2522511  .5045021

          |

    Total | .5045021  .5045021  1.009004

----------------------------------------

 

. *The ID for this model is the cell statistic summed over all cells. I get it by generating the cell statistic, tabling the sums, and reading the grand total, which is 1.009.

. *This is just a brief demonstration of what I mean by BIC and ID, and how to calculate them.

. *The interpretation of ID is the percentage of the actual data that would have to move to fit the model, or vice versa. Smaller numbers mean a smaller percentage needs to move, which means better fit.

. *In this case roughly 1% of the dataset would have to move from one cell to another to transform the actual data into the data expected under the hypothesis of independence. Equivalently, the same 1% would have to move from the expected values under independence to recreate the actual data.

. *1% may not seem like a lot, but here it is not only statistically significant but also substantial when applied to the whole US labor market of 100 million persons.
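
. *A minimal sketch, not part of the original session, of getting the ID total directly instead of reading it off the table. It assumes A_indep (the predicted counts from the independence model) is still in memory; the names Ntot and id_cell are made up for illustration.

```stata
* Total table count N: sum of the observed cell counts
quietly summarize count
scalar Ntot = r(sum)

* Per-cell contribution: 50 * |expected share - observed share|
generate double id_cell = 50*abs(A_indep/Ntot - count/Ntot)

* ID is the sum of the cell contributions, in percentage points
quietly summarize id_cell
display "ID = " r(sum)
```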

. clear all

 

. use "C:\AAA Miker Files\newer web pages\soc_388_notes\ed intermar.dta", clear

 

. *Reminder: we are now back to the educational intermarriage dataset.

. desmat: poisgof count hed wed  endog

varlist not allowed

r(101);

 

. desmat: poisson count hed wed  endog

-------------------------------------------------------------------------------

   Poisson regression

-------------------------------------------------------------------------------

   Dependent variable                                                    count

   Optimization:                                                            ml

   Number of observations:                                                  16

   Initial log likelihood:                                         -221501.223

   Log likelihood:                                                  -24059.274

   LR chi square:                                                   394883.898

   Model degrees of freedom:                                                10

   Pseudo R-squared:                                                     0.891

   Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

   count

     hed

1      2                                                     1.134**     0.007

2      3                                                     0.819**     0.006

3      4                                                    -0.017*      0.007

     wed

4      2                                                     1.372**     0.007

5      3                                                     1.020**     0.007

6      4                                                    -0.278**     0.008

     endog

7      1                                                     1.722**     0.009

8      2                                                     0.676**     0.007

9      3                                                     0.537**     0.008

10     4                                                     2.487**     0.009

11   _cons                                                   8.652**     0.008

-------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  47932.55

         Prob > chi2(5)        =    0.0000

 

. *In order to fit the data better, we had to account for some of the off-diagonal interactions.

. desmat: poisson count hed wed  endog eddiff3 eddiff2

-------------------------------------------------------------------------------

   Poisson regression

-------------------------------------------------------------------------------

   Dependent variable                                                    count

   Optimization:                                                            ml

   Number of observations:                                                  16

   Initial log likelihood:                                         -221501.223

   Log likelihood:                                                    -145.628

   LR chi square:                                                   442711.189

   Model degrees of freedom:                                                12

   Pseudo R-squared:                                                     0.999

   Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

   count

     hed

1      2                                                     0.627**     0.008

2      3                                                     0.355**     0.007

3      4                                                     0.180**     0.008

     wed

4      2                                                     0.817**     0.008

5      3                                                     0.461**     0.007

6      4                                                    -0.142**     0.009

     endog

7      1                                                     0.763**     0.011

8      2                                                     0.779**     0.007

9      3                                                     0.601**     0.008

10     4                                                     1.195**     0.011

     eddiff3

11     1                                                    -2.749**     0.024

     eddiff2

12     1                                                    -1.068**     0.006

13   _cons                                                   9.611**     0.009

-------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  105.2568

         Prob > chi2(3)        =    0.0000

 

. *My examination of the Pearson residuals showed that the two most extreme cells in terms of educational difference were fit poorly, so the next model adds the term eddiff3m.
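
. *A minimal sketch, not part of the original session, of one way to look at Pearson residuals by cell. It assumes the model with eddiff3 and eddiff2 is still the active estimation and that each observation is one cell of the table; the variable names mu_hat and pearson are made up for illustration.

```stata
* Predicted cell counts from the active Poisson model
predict double mu_hat, n

* Pearson residual for a Poisson cell: (observed - expected)/sqrt(expected)
generate double pearson = (count - mu_hat)/sqrt(mu_hat)

* Large absolute values flag the cells the model fits worst
list hed wed count mu_hat pearson
```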

. desmat: poisson count hed wed  endog eddiff3 eddiff2 eddiff3m

-------------------------------------------------------------------------------

   Poisson regression

-------------------------------------------------------------------------------

   Dependent variable                                                    count

   Optimization:                                                            ml

   Number of observations:                                                  16

   Initial log likelihood:                                         -221501.223

   Log likelihood:                                                    -117.905

   LR chi square:                                                   442766.636

   Model degrees of freedom:                                                13

   Pseudo R-squared:                                                     0.999

   Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

   count

     hed

1      2                                                     0.630**     0.008

2      3                                                     0.360**     0.007

3      4                                                     0.188**     0.008

     wed

4      2                                                     0.813**     0.008

5      3                                                     0.456**     0.007

6      4                                                    -0.153**     0.009

     endog

7      1                                                     0.762**     0.011

8      2                                                     0.779**     0.007

9      3                                                     0.601**     0.008

10     4                                                     1.197**     0.011

     eddiff3

11     1                                                    -2.563**     0.033

     eddiff2

12     1                                                    -1.068**     0.006

     eddiff3m

13     1                                                    -0.346**     0.046

14   _cons                                                   9.612**     0.009

-------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  49.81037

         Prob > chi2(2)        =    0.0000
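
. *A minimal sketch, not part of the original session, applying BIC = GOF chi2 - df*ln(N) to the three intermarriage models above. The chi2 and df values are copied from the poisgof output in this log. N is computed from the data because the total count is not shown here. The scalar name Ntot is made up for illustration.

```stata
* Total table count N for the intermarriage data
quietly summarize count
scalar Ntot = r(sum)

* BIC for each model; the lowest value is preferred
display "hed wed endog:       " 47932.55-5*ln(Ntot)
display "+ eddiff3 eddiff2:   " 105.2568-3*ln(Ntot)
display "+ eddiff3m:          " 49.81037-2*ln(Ntot)
```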

 

. log close

 

  log type:  text

 closed on:  12 Oct 2005, 11:55:29

-------------------------------------------------------------------------------