log type:  text

 opened on:  11 Oct 2005, 15:52:23

 

. edit

(3 vars, 4 obs pasted into editor)

- preserve

 

. tabulate race occ [fweight=count

invalid syntax

r(198);

 

. tabulate race occ [fweight=count]

 

           |          occ

      race |       Oth         WC |     Total

-----------+----------------------+----------

         n |     7,146      2,361 |     9,507

         w |    42,012     17,216 |    59,228

-----------+----------------------+----------

     Total |    49,158     19,577 |    68,735

 

 

. tabulate race occ [fweight=count], lrchi2 chi2

 

           |          occ

      race |       Oth         WC |     Total

-----------+----------------------+----------

         n |     7,146      2,361 |     9,507

         w |    42,012     17,216 |    59,228

-----------+----------------------+----------

     Total |    49,158     19,577 |    68,735

 

          Pearson chi2(1) =  72.0617   Pr = 0.000

 likelihood-ratio chi2(1) =  73.7553   Pr = 0.000

 

. desmat: poisson count race occ

-------------------------------------------------------------------------------

   Poisson regression

-------------------------------------------------------------------------------

   Dependent variable                                                    count

   Optimization:                                                            ml

   Number of observations:                                                   4

   Initial log likelihood:                                          -26656.550

   Log likelihood:                                                     -59.074

   LR chi square:                                                    53194.953

   Model degrees of freedom:                                                 2

   Pseudo R-squared:                                                     0.998

   Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

   count

     race

1      w                                                     1.829**     0.011

     occ

2      WC                                                   -0.921**     0.008

3    _cons                                                   8.825**     0.011

-------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  73.77235

         Prob > chi2(1)        =    0.0000

 

. poisgof, pearson

 

         Goodness-of-fit chi2  =  72.06174

         Prob > chi2(1)        =    0.0000

 

. *The independence model uses 3 terms (the constant, race, and occ), so with 4 cells there is 4 - 3 = 1 residual df.

. *The tabulate command and poisgof after the loglinear model yield essentially the same statistics on 1 df relative to the saturated model: 72.06 (Pearson) and 73.76 or 73.77 (LR chi2). Both tests indicate that the independence model does not fit the data well, which is consistent with the high degree of statistical significance we found when calculating the log odds ratio of this dataset by hand; the interaction between race and occupation is precisely what the independence model leaves out. It is interesting to note that the eyeball test (Question 3) showed that the independence model and the actual data were not too far apart. With large samples, the LRT has so much power that some scholars consider it to overstate the evidence for distinguishing between two competing hypotheses.

. *What is the probability of a chi-square of 73.77 on one df?

. display chi2tail(1,77.3)

1.469e-18

 

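. *SMALL. (The command above used 77.3 rather than 73.77, a transposition of digits; the tail probability for 73.77 on 1 df is likewise vanishingly small.)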

. predict A_independence

(option n assumed; predicted number of events)

 

. table race occ, contents (sum count sum  A_independence) row col

 

----------------------------------------

          |             occ            

     race |      Oth        WC     Total

----------+-----------------------------

        n |     7146      2361      9507

          |  6799.23   2707.77      9507

          |

        w |    42012     17216     59228

          | 42358.77  16869.23     59228

          |

    Total |    49158     19577     68735

          |    49158     19577     68735

----------------------------------------

 

. *That's a comparison of the actual data and the data under the assumption of independence.
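
The expected counts and both chi-square statistics can also be rebuilt by hand from the four observed cells. The following is a minimal sketch, not part of the original session; the scalar names (o11, e11, ntot, and so on) are made up for illustration, and it assumes no variable in memory shares those names.

```stata
* Observed counts copied from the race-by-occ table above
scalar o11 = 7146
scalar o12 = 2361
scalar o21 = 42012
scalar o22 = 17216
scalar ntot = o11 + o12 + o21 + o22

* Expected count under independence = row total * column total / N
scalar e11 = (o11 + o12)*(o11 + o21)/ntot
scalar e12 = (o11 + o12)*(o12 + o22)/ntot
scalar e21 = (o21 + o22)*(o11 + o21)/ntot
scalar e22 = (o21 + o22)*(o12 + o22)/ntot
display e11 "  " e12 "  " e21 "  " e22

* Pearson chi2: sum over cells of (O - E)^2 / E
display (o11-e11)^2/e11 + (o12-e12)^2/e12 + (o21-e21)^2/e21 + (o22-e22)^2/e22

* Likelihood-ratio chi2: 2 * sum over cells of O * ln(O/E)
display 2*(o11*ln(o11/e11) + o12*ln(o12/e12) + o21*ln(o21/e21) + o22*ln(o22/e22))
```

The expected counts should match the A_independence rows in the table above, and the two statistics should match the Pearson and likelihood-ratio values that tabulate reported for dataset A.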

. *Now on to question 7:

 

. desmat: poisson count race*occ

-------------------------------------------------------------------------------

   Poisson regression

-------------------------------------------------------------------------------

   Dependent variable                                                    count

   Optimization:                                                            ml

   Number of observations:                                                   4

   Initial log likelihood:                                          -26656.550

   Log likelihood:                                                     -22.196

   LR chi square:                                                    53268.708

   Model degrees of freedom:                                                 3

   Pseudo R-squared:                                                     0.999

   Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

   count

     race

1      w                                                     1.771**     0.013

     occ

2      WC                                                   -1.107**     0.024

     race.occ

3      w.WC                                                  0.215**     0.025

4    _cons                                                   8.874**     0.012

-------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  .0170694

         Prob > chi2(0)        =         .

 

. *Of course the residual degrees of freedom are zero, since the model has 4 terms for 4 cells. With as many terms as there are data points, the model fits the data exactly. The goodness-of-fit statistic differs from zero only because the iterative fitting of the model does not reproduce the data to the last digit.

. *And note, of course, that the interaction coefficient and its standard error are simply the log odds ratio and standard error we calculated by hand.
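
As a reminder of that hand calculation, here is a short sketch using the four cell counts of dataset A. It is an illustration added for reference, not a command from the original session.

```stata
* Log odds ratio: ln of (n,Oth * w,WC) / (n,WC * w,Oth)
display ln((7146*17216)/(2361*42012))

* Standard error: square root of the sum of reciprocal cell counts
display sqrt(1/7146 + 1/2361 + 1/42012 + 1/17216)
```

To three decimals these should agree with the w.WC coefficient (0.215) and standard error (0.025) in the saturated model output above.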

 

. predict A_saturated

(option n assumed; predicted number of events)

 

. table race occ, contents (sum count sum   A_saturated) row col

 

-------------------------------

          |         occ       

     race |   Oth     WC  Total

----------+--------------------

        n |  7146   2361   9507

          |  7146   2361   9507

          |

        w | 42012  17216  59228

          | 42012  17216  59228

          |

    Total | 49158  19577  68735

          | 49158  19577  68735

-------------------------------

 

. clear all.

 

. *Now let me load dataset B and get back to question 5.

. edit

(3 vars, 25 obs pasted into editor)

- preserve

 

. *starting with the independence model for dataset B

. desmat: poisson count husb wife

-------------------------------------------------------------------------------

   Poisson regression

-------------------------------------------------------------------------------

   Dependent variable                                                    count

   Optimization:                                                            ml

   Number of observations:                                                  25

   Initial log likelihood:                                          -80138.505

   Log likelihood:                                                  -22065.255

   LR chi square:                                                   116146.499

   Model degrees of freedom:                                                 8

   Pseudo R-squared:                                                     0.725

   Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

   count

     husb

1      Black                                                 1.084**     0.030

2      Mexican                                               1.249**     0.029

3      Oth Hisp                                             -0.747**     0.046

4      White                                                 3.017**     0.026

     wife

5      Black                                                 0.932**     0.029

6      Mexican                                               1.170**     0.028

7      Oth Hisp                                             -0.729**     0.043

8      White                                                 2.900**     0.025

9    _cons                                                   4.076**     0.035

-------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =   43952.7

         Prob > chi2(16)       =    0.0000

 

. *Independence has r+c-1 = 9 terms, so the residual df is 25 - 9 = 16.
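
The same count can be checked two ways for a 5 x 5 table; this small sketch is added for illustration only.

```stata
* Cells minus terms in the independence model (constant + 4 husb + 4 wife)
display 5*5 - (5 + 5 - 1)

* Equivalent shortcut: (rows - 1) * (columns - 1)
display (5 - 1)*(5 - 1)
```

Both lines give 16, the residual df that poisgof reports.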

. display chi2tail(16,43953)

0

 

. *The probability here displays as zero because it is too small to represent. We already noted that dataset B seems to be much further from independence than dataset A, though the goodness-of-fit chi-square for the independence model from dataset A has a very small probability as well.

 

. tabulate husb wife [fweight=count], lrchi2

 

           |                          wife

      husb | All Other      Black    Mexican   Oth Hisp      White |     Total

-----------+-------------------------------------------------------+----------

All Others |     1,022         19         78         18        360 |     1,497

     Black |        42      4,074         63         32        215 |     4,426

   Mexican |        95         25      3,947        143      1,009 |     5,219

  Oth Hisp |        18         16        132        239        304 |       709

     White |       492        103      1,156        373     28,453 |    30,577

-----------+-------------------------------------------------------+----------

     Total |     1,669      4,237      5,376        805     30,341 |    42,428

 

likelihood-ratio chi2(16) =  4.4e+04   Pr = 0.000

 

. scalar lrt_B=r(chi2_lr)

 

. display lrt_B

43952.723

 

. *The output from the tabulate command gave us only two significant digits for the chi-square statistic, but you can get the exact figure from Stata's stored results, as with r(chi2_lr) above. See the help or the manuals for this.
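
One way to see what tabulate leaves behind is return list. The sketch below is written for a do-file (it uses /// line continuation) and re-enters dataset B with tabi so that it runs on its own; the cell counts are copied from the table above, and the %12.3f display format is just one reasonable choice.

```stata
* Dataset B as an immediate table, rows = husb, columns = wife
tabi 1022 19 78 18 360 \ ///
     42 4074 63 32 215 \ ///
     95 25 3947 143 1009 \ ///
     18 16 132 239 304 \ ///
     492 103 1156 373 28453, lrchi2

* List everything the command stored in r()
return list

* Show the likelihood-ratio chi2 with three decimals
display %12.3f r(chi2_lr)
```

The last line should show the same figure as display lrt_B above.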

. *Anyway, the point is that the chisquare test for independence that accompanies every table you have ever seen is really a likelihood ratio test comparison of the actual data to the independence model.
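
For dataset A this can be checked directly from the two log likelihoods reported earlier in this log (-59.074 for the independence model, -22.196 for the saturated model). The scalar names below are made up for illustration.

```stata
* Log likelihoods copied from the two desmat: poisson outputs for dataset A
scalar ll_indep = -59.074
scalar ll_sat   = -22.196

* Likelihood-ratio statistic: twice the gap in log likelihood
display 2*(ll_sat - ll_indep)

* Its tail probability on 1 df
display chi2tail(1, 2*(ll_sat - ll_indep))
```

The first display gives about 73.756, which agrees with tabulate's likelihood-ratio chi2 of 73.7553 up to the rounding of the log likelihoods.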

 

. log close

  log type:  text

 closed on:  11 Oct 2005, 16:26:12

-------------------------------------------------------------------------------