
       log:  C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2003\clas

> s 14.log

  log type:  text

 opened on:  12 Nov 2003, 11:26:59


. *We're going to look at some exact tests for odds ratios and independence


. csi 23 10 27 16, or


                 |   Exposed   Unexposed  |     Total


           Cases |        23          10  |        33

        Noncases |        27          16  |        43


           Total |        50          26  |        76

                 |                        |

            Risk |       .46    .3846154  |  .4342105

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |         .0753846       | -.1571114    .3078807 

      Risk ratio |            1.196       |  .6753688    2.117978 

 Attr. frac. ex. |         .1638796       | -.4806724    .5278515 

 Attr. frac. pop |         .1142191       |

      Odds ratio |         1.362963       |  .5249006    3.529677  (Cornfield)


                             chi2(1) =     0.40  Pr>chi2 = 0.5293
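
The point estimates and the chi-square test in the table above can be recomputed by hand from the four cell counts. The following is a minimal Python sketch, not Stata's implementation; the function name `two_by_two_measures` and the dictionary keys are made up for illustration, and the cell layout is assumed to follow `csi a b c d` (exposed cases, unexposed cases, exposed noncases, unexposed noncases).

```python
import math

def two_by_two_measures(a, b, c, d):
    """Measures for a 2x2 table laid out as in `csi a b c d`:
    a = exposed cases,    b = unexposed cases,
    c = exposed noncases, d = unexposed noncases."""
    n1, n0 = a + c, b + d          # exposed and unexposed totals
    m1, m0 = a + b, c + d          # case and noncase totals
    n = n1 + n0
    risk1, risk0 = a / n1, b / n0
    rr = risk1 / risk0
    # attributable fraction among the exposed (meaningful when rr > 1)
    afe = (rr - 1) / rr
    # Pearson chi-square for a 2x2 table, 1 degree of freedom
    chi2 = n * (a * d - b * c) ** 2 / (m1 * m0 * n1 * n0)
    return {
        "risk_difference": risk1 - risk0,
        "risk_ratio": rr,
        "attr_frac_exposed": afe,
        "attr_frac_pop": afe * a / m1,
        "odds_ratio": (a * d) / (b * c),
        "chi2": chi2,
        # upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2))
        "p_value": math.erfc(math.sqrt(chi2 / 2)),
    }

result = two_by_two_measures(23, 10, 27, 16)
for name, value in result.items():
    print(f"{name:18s} {value:.7f}")
```

The odds ratio is just the cross-product ratio ad/bc, which is why changing a single cell (16 to 15 in the next command) moves it from 1.36 to 1.28.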


. csi 23 10 27 15, or


                 |   Exposed   Unexposed  |     Total


           Cases |        23          10  |        33

        Noncases |        27          15  |        42


           Total |        50          25  |        75

                 |                        |

            Risk |       .46          .4  |       .44

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |              .06       | -.1765637    .2965637 

      Risk ratio |             1.15       |   .652775    2.025966 

 Attr. frac. ex. |         .1304348       | -.5319213    .5064083 

 Attr. frac. pop |         .0909091       |

      Odds ratio |         1.277778       |  .4882768    3.335527  (Cornfield)


                             chi2(1) =     0.24  Pr>chi2 = 0.6217


. *That's the frog data.

. csi 23 10 27 15, or exact


                 |   Exposed   Unexposed  |     Total


           Cases |        23          10  |        33

        Noncases |        27          15  |        42


           Total |        50          25  |        75

                 |                        |

            Risk |       .46          .4  |       .44

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |              .06       | -.1765637    .2965637 

      Risk ratio |             1.15       |   .652775    2.025966 

 Attr. frac. ex. |         .1304348       | -.5319213    .5064083 

 Attr. frac. pop |         .0909091       |

      Odds ratio |         1.277778       |  .4882768    3.335527  (Cornfield)


                                1-sided Fisher's exact P = 0.4040

                                2-sided Fisher's exact P = 0.8055


. csi 3 4 5 6, or


                 |   Exposed   Unexposed  |     Total


           Cases |         3           4  |         7

        Noncases |         5           6  |        11


           Total |         8          10  |        18

                 |                        |

            Risk |      .375          .4  |  .3888889

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |            -.025       | -.4774796    .4274796 

      Risk ratio |            .9375       |   .290024     3.03046 

 Prev. frac. ex. |            .0625       |  -2.03046     .709976 

 Prev. frac. pop |         .0277778       |

      Odds ratio |               .9       |   .146097    5.647991  (Cornfield)


                             chi2(1) =     0.01  Pr>chi2 = 0.9139


. csi 3 4 5 6, or exact


                 |   Exposed   Unexposed  |     Total


           Cases |         3           4  |         7

        Noncases |         5           6  |        11


           Total |         8          10  |        18

                 |                        |

            Risk |      .375          .4  |  .3888889

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |            -.025       | -.4774796    .4274796 

      Risk ratio |            .9375       |   .290024     3.03046 

 Prev. frac. ex. |            .0625       |  -2.03046     .709976 

 Prev. frac. pop |         .0277778       |

      Odds ratio |               .9       |   .146097    5.647991  (Cornfield)


                                1-sided Fisher's exact P = 0.6478

                                2-sided Fisher's exact P = 1.0000
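
Fisher's exact test holds all four margins fixed and asks how probable each possible value of the upper-left cell is under the hypergeometric distribution. The sketch below is a hedged Python illustration of that idea, not Stata's code. The function name `fisher_exact_2x2` is hypothetical. It assumes that the one-sided P is the smaller of the two tail probabilities, and that the two-sided P sums every table no more probable than the observed one; both assumptions reproduce the values logged for the 3 4 5 6 table.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for a 2x2 table laid out as in `csi a b c d`.
    Returns (one_sided_p, two_sided_p)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 + col1 - n)   # smallest possible value of cell a
    hi = min(row1, col1)           # largest possible value of cell a
    denom = comb(n, row1)
    # hypergeometric probability of each possible value of cell a,
    # holding all four margins fixed
    prob = {k: comb(col1, k) * comb(n - col1, row1 - k) / denom
            for k in range(lo, hi + 1)}
    p_obs = prob[a]
    lower = sum(p for k, p in prob.items() if k <= a)
    upper = sum(p for k, p in prob.items() if k >= a)
    # two-sided: every table no more probable than the observed one
    # (the small relative tolerance guards against floating-point ties)
    two_sided = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-7))
    return min(lower, upper), min(1.0, two_sided)

one_sided, two_sided = fisher_exact_2x2(3, 4, 5, 6)
print(f"1-sided Fisher's exact P = {one_sided:.4f}")   # 0.6478
print(f"2-sided Fisher's exact P = {two_sided:.4f}")   # 1.0000
```

The two-sided P is exactly 1 here because the observed table (3 exposed cases) is the single most probable table given the margins, so every possible table counts as at least as extreme.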


. csi 1000 4 5 6, or


                 |   Exposed   Unexposed  |     Total


           Cases |      1000           4  |      1004

        Noncases |         5           6  |        11


           Total |      1005          10  |      1015

                 |                        |

            Risk |  .9950249          .4  |  .9891626

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |         .5950249       |  .2913574    .8986923 

      Risk ratio |         2.487562       |  1.164393    5.314328 

 Attr. frac. ex. |             .598       |  .1411833    .8118295 

 Attr. frac. pop |         .5956175       |

      Odds ratio |              300       |  68.40492     1337.21  (Cornfield)


                             chi2(1) =   327.02  Pr>chi2 = 0.0000


. csi 1000 4 5 6, or exact


                 |   Exposed   Unexposed  |     Total


           Cases |      1000           4  |      1004

        Noncases |         5           6  |        11


           Total |      1005          10  |      1015

                 |                        |

            Risk |  .9950249          .4  |  .9891626

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |         .5950249       |  .2913574    .8986923 

      Risk ratio |         2.487562       |  1.164393    5.314328 

 Attr. frac. ex. |             .598       |  .1411833    .8118295 

 Attr. frac. pop |         .5956175       |

      Odds ratio |              300       |  68.40492     1337.21  (Cornfield)


                                1-sided Fisher's exact P = 0.0000

                                2-sided Fisher's exact P = 0.0000


. csi 42012 7146 17216 2361, or


                 |   Exposed   Unexposed  |     Total


           Cases |     42012        7146  |     49158

        Noncases |     17216        2361  |     19577


           Total |     59228        9507  |     68735

                 |                        |

            Risk |  .7093267    .7516567  |  .7151815

                 |                        |

                 |      Point estimate    |  [95% Conf. Interval]


 Risk difference |          -.04233       | -.0517533   -.0329067 

      Risk ratio |         .9436844       |  .9318199       .9557 

 Prev. frac. ex. |         .0563156       |     .0443    .0681801 

 Prev. frac. pop |         .0485264       |

      Odds ratio |         .8062581       |  .7670994    .8474158  (Cornfield)


                             chi2(1) =    72.06  Pr>chi2 = 0.0000
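
With counts this large the confidence intervals become very narrow. The intervals for the risk difference and the risk ratio can be approximated with the usual large-sample formulas: a Wald interval for the difference and a log-scale interval for the ratio. The Python sketch below uses those textbook formulas, which appear to reproduce the logged limits; it is not taken from Stata's source, and the function name `risk_intervals` is hypothetical. The Cornfield interval for the odds ratio is iterative and is not attempted here.

```python
import math
from statistics import NormalDist

def risk_intervals(a, b, c, d, level=0.95):
    """Large-sample intervals for a table laid out as in `csi a b c d`.
    Returns {name: (estimate, lower, upper)}."""
    n1, n0 = a + c, b + d                      # exposed and unexposed totals
    p1, p0 = a / n1, b / n0                    # risks in each group
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # Wald interval for the risk difference
    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    # interval for the risk ratio, built on the log scale
    rr = p1 / p0
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    return {
        "risk_difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "risk_ratio": (rr,
                       rr * math.exp(-z * se_log_rr),
                       rr * math.exp(z * se_log_rr)),
    }

for name, (est, lower, upper) in risk_intervals(42012, 7146, 17216, 2361).items():
    print(f"{name:16s} {est: .7f}   [{lower: .7f}, {upper: .7f}]")
```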


. *OK, that's enough for exact tests for the moment.

. use "C:\AAA Miker Files\newer web pages\soc_388_notes\ed intermar.dta", clear


. *This is the educational intermarriage data we have seen before.

. tabulate hed wed [fweight=count]


           |                     wed

       hed |         1          2          3          4 |     Total


         1 |    32,016     33,374      8,407        988 |    74,785

         2 |    28,370    137,876     43,783      8,446 |   218,475

         3 |     7,051     48,766     61,633     18,195 |   135,645

         4 |       984     13,794     28,635     51,224 |    94,637


     Total |    68,421    233,810    142,458     78,853 |   523,542



. desmat: poisson count hed wed


   Poisson regression


   Dependent variable                                                               count

   Optimization:                                                                       ml

   Number of observations:                                                             16

   Initial log likelihood:                                                    -221501.223

   Log likelihood:                                                            -113882.425

   LR chi square:                                                              215237.595

   Model degrees of freedom:                                                            6

   Pseudo R-squared:                                                                0.486

   Prob:                                                                            0.000


nr Effect                                                               Coeff        s.e.




1      2                                                                1.072**     0.004

2      3                                                                0.595**     0.005

3      4                                                                0.235**     0.005


4      2                                                                1.229**     0.004

5      3                                                                0.733**     0.005

6      4                                                                0.142**     0.005

7    _cons                                                              9.187**     0.005


*  p < .05

** p < .01


. poisgof, pearson


         Goodness-of-fit chi2  =    257372

         Prob > chi2(9)        =    0.0000


. *This is the independence model

. predict indep_count

(option n assumed; predicted number of events)


. table hed wed, contents (sum count sum  indep_count) row col



          |                       wed                      

      hed |        1         2         3         4     Total


        1 |    32016     33374      8407       988     74785

          | 9773.551  33398.43  20349.32   11263.7     74785


        2 |    28370    137876     43783      8446    218475

          |  28552.2  97569.33  59447.98   32905.5    218475


        3 |     7051     48766     61633     18195    135645

          | 17727.26  60578.06  36909.58   20430.1    135645


        4 |      984     13794     28635     51224     94637

          | 12367.98  42264.19  25751.13   14253.7     94637


    Total |    68421    233810    142458     78853    523542

          |    68421    233810    142458     78853    523542
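
For the independence model, the fitted counts that `predict` returns have a closed form: each expected count is its row total times its column total, divided by the grand total. The Python sketch below recomputes the second line of each cell in the table above from the observed counts alone. The function name `independence_expected` is hypothetical, and no Poisson fitting is done.

```python
# observed counts: rows are hed = 1..4, columns are wed = 1..4
observed = [
    [32016, 33374, 8407, 988],
    [28370, 137876, 43783, 8446],
    [7051, 48766, 61633, 18195],
    [984, 13794, 28635, 51224],
]

def independence_expected(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = independence_expected(observed)
for row in expected:
    print("  ".join(f"{e:10.2f}" for e in row))

# degrees of freedom for the test of independence: (rows - 1) * (cols - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print("degrees of freedom:", dof)
```

The degrees of freedom, (4 - 1) x (4 - 1) = 9, match the `chi2(9)` reported by `poisgof`.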



. *residuals are simply observed minus expected

. gen indep_residuals=count- indep_count


. table hed wed, contents (sum  indep_residuals) row col



          |                          wed                        

      hed |         1          2          3          4      Total


        1 |  22242.45  -24.42969  -11942.32   -10275.7          0

        2 | -182.2031   40306.67  -15664.98   -24459.5  -.0039063

        3 | -10676.26  -11812.06   24723.42    -2235.1  -.0019531

        4 | -11383.98  -28470.19   2883.871    36970.3  -.0019531


    Total |  .0019531  -.0039063  -.0039063  -.0019531  -.0078125



. *for all practical purposes, the sum of residuals (row sums, col sums, and total sum) is

>  zero.

. *And it has to be that way.
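
The reason it has to be that way: the independence model reproduces the observed row and column totals exactly, so within every row and every column the expected counts add up to the same total as the observed counts. The Python sketch below checks this numerically, using the closed-form expected counts rather than Stata's fitted values. The tiny nonzero sums in the Stata table (such as -.0039063) presumably reflect single-precision storage of the predicted counts; that explanation is an assumption, not something the log states.

```python
# observed counts: rows are hed = 1..4, columns are wed = 1..4
observed = [
    [32016, 33374, 8407, 988],
    [28370, 137876, 43783, 8446],
    [7051, 48766, 61633, 18195],
    [984, 13794, 28635, 51224],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# expected counts under independence
expected = [[r * c / n for c in col_totals] for r in row_totals]

# raw residuals: observed minus expected
residuals = [[o - e for o, e in zip(orow, erow)]
             for orow, erow in zip(observed, expected)]

row_sums = [sum(row) for row in residuals]
col_sums = [sum(col) for col in zip(*residuals)]
print("row sums of residuals:   ", row_sums)
print("column sums of residuals:", col_sums)
```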

. *One way of thinking about the expected standard deviation of the count in each cell is

>  the square root of the expected count (for a Poisson count, the variance equals the mean).

. *gen sqrt_indep_count= indep_count^.5

. gen sqrt_indep_count= indep_count^.5


. gen std_resid= indep_residuals/ sqrt_indep_count


. table hed wed, contents (sum   std_resid) row col



          |                          wed                        

      hed |         1          2          3          4      Total


        1 |  224.9865  -.1336765    -83.717  -96.82131   44.31449

        2 | -1.078291   129.0388  -64.24824  -134.8383  -71.12604

        3 | -80.18597   -47.9919   128.6883   -15.6373   -15.1269

        4 | -102.3634  -138.4854   17.97123   309.6629   86.78525


    Total |   41.3588  -57.57221  -1.305752   62.36596    44.8468
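
The standardized residuals in the table above are (observed - expected) / sqrt(expected), often called Pearson residuals. As a hedged illustration, the Python sketch below recomputes them from the observed counts and reports the cell with the largest absolute value. The variable names are made up for the example. Note that the Total row and column of the Stata table are sums of standardized residuals, which have no simple interpretation.

```python
import math

# observed counts: rows are hed = 1..4, columns are wed = 1..4
observed = [
    [32016, 33374, 8407, 988],
    [28370, 137876, 43783, 8446],
    [7051, 48766, 61633, 18195],
    [984, 13794, 28635, 51224],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson residual for each cell: (O - E) / sqrt(E), with E = row * col / N
pearson = [[(o - r * c / n) / math.sqrt(r * c / n)
            for o, c in zip(row, col_totals)]
           for row, r in zip(observed, row_totals)]

for row in pearson:
    print("  ".join(f"{x:10.4f}" for x in row))

# cell whose observed count is farthest from independence, in SD units
size, hed, wed = max((abs(x), i + 1, j + 1)
                     for i, row in enumerate(pearson)
                     for j, x in enumerate(row))
print(f"largest |residual| = {size:.4f} at hed={hed}, wed={wed}")
```

The largest residuals sit on the diagonal, at hed=4/wed=4 and hed=1/wed=1, which is where spouses have the same education level at the two extremes.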



. *Standardized residuals (and there are several more sophisticated ways of standardizing)

>  tell you which cells' actual counts are farthest from the counts expected under independence.

. gen std_resid_squared= std_resid^2


. table hed wed, contents (sum  std_resid_squared) row col



          |                       wed                      

      hed |        1         2         3         4     Total


        1 | 50618.92  .0178694  7008.537  9374.366  67001.84

        2 | 1.162712  16651.01  4127.836  18181.37  38961.37

        3 | 6429.789  2303.222  16560.67   244.525  25538.21

        4 | 10478.27  19178.21   322.965  95891.09  125870.5


    Total | 67528.14  38132.46  28020.01  123691.4    257372



. *The sum of the squared standardized (Pearson) residuals gives you the Pearson chi-square

>  statistic.  Each squared residual tells you how much that cell contributes to the lack

> of good fit.

. save "C:\AAA Miker Files\newer web pages\soc_388_notes\ed intermar.dta", replace

file C:\AAA Miker Files\newer web pages\soc_388_notes\ed intermar.dta saved


. *Note that the sum of the squared Pearson residuals, 257,372, is the Pearson goodness of

>  fit statistic for this model.
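
The link between the cell-by-cell table and the `poisgof, pearson` statistic can be checked directly: summing (observed - expected)^2 / expected over all 16 cells should give the same number. The following Python sketch does that and also prints each cell's share of the total. The names `contributions` and `chi2` are illustrative only.

```python
# observed counts: rows are hed = 1..4, columns are wed = 1..4
observed = [
    [32016, 33374, 8407, 988],
    [28370, 137876, 43783, 8446],
    [7051, 48766, 61633, 18195],
    [984, 13794, 28635, 51224],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# each cell's contribution to the Pearson chi-square: (O - E)^2 / E
contributions = [[(o - r * c / n) ** 2 / (r * c / n)
                  for o, c in zip(row, col_totals)]
                 for row, r in zip(observed, row_totals)]

chi2 = sum(sum(row) for row in contributions)
print(f"Pearson chi-square = {chi2:.0f}")

# share of the total lack of fit coming from each cell, in percent
for row in contributions:
    print("  ".join(f"{100 * x / chi2:6.2f}%" for x in row))
```

Read this way, the hed=4/wed=4 cell alone accounts for more than a third of the total lack of fit of the independence model.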

. exit, clear