Log for analysis of weights with log-linear models

--------------------------------------------------------------------------------------------

      log:  C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2003\clogg and eliason log.log
 log type:  text
opened on:  17 Nov 2003, 11:11:08

. use "C:\AAA Miker Files\newer web pages\soc_388_notes\clogg and eliason data.dta", clear

. table color labor, contents(sum uwcount sum wtcount) by(sex)

--------------------------------------------
  sex and |              labor
    color | unemployed  part-time      other
----------+---------------------------------
male      |
    white |       3511       4227      31467
          |    5024241    5951616   4.43e+07
    black |        604        356       2245
          |    1160284     658244    3960180
    other |        165        157        924
          |     169785     176468    1134672
----------+---------------------------------
female    |
    white |       2281       7833      18945
          |    3179714   1.08e+07   2.66e+07
    black |        545        563       2132
          |     929225     916001    3556176
    other |         89        216        725
          |      91581     231120     817075
--------------------------------------------

. table color labor, contents(sum uwcount sum weight sum wtcount) by(sex)

--------------------------------------------
  sex and |              labor
    color | unemployed  part-time      other
----------+---------------------------------
male      |
    white |       3511       4227      31467
          |       1431       1408       1408
          |    5024241    5951616   4.43e+07
    black |        604        356       2245
          |       1921       1849       1764
          |    1160284     658244    3960180
    other |        165        157        924
          |       1029       1124       1228
          |     169785     176468    1134672
----------+---------------------------------
female    |
    white |       2281       7833      18945
          |       1394       1373       1405
          |    3179714   1.08e+07   2.66e+07
    black |        545        563       2132
          |       1705       1627       1668
          |     929225     916001    3556176
    other |         89        216        725
          |       1029       1070       1127
          |      91581     231120     817075
--------------------------------------------
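In every cell, the three rows are consistent with wtcount being exactly uwcount times the cell weight. The Poisson runs in this log report 18 observations, so the data have one row per cell, and `sum weight` is simply that cell's weight. The Python sketch below only checks consistency on numbers copied from the table. The dictionary layout is illustrative, and cells that Stata printed in e-notation can only be checked to three significant digits.

```python
# (uwcount, weight, wtcount) per cell, copied from the table above.
# Cells that Stata printed in e-notation are stored as the printed float.
cells = {
    ("male", "white", "unemployed"): (3511, 1431, 5024241),
    ("male", "white", "part-time"): (4227, 1408, 5951616),
    ("male", "white", "other"): (31467, 1408, 4.43e07),
    ("male", "black", "unemployed"): (604, 1921, 1160284),
    ("male", "black", "part-time"): (356, 1849, 658244),
    ("male", "black", "other"): (2245, 1764, 3960180),
    ("male", "other", "unemployed"): (165, 1029, 169785),
    ("male", "other", "part-time"): (157, 1124, 176468),
    ("male", "other", "other"): (924, 1228, 1134672),
    ("female", "white", "unemployed"): (2281, 1394, 3179714),
    ("female", "white", "part-time"): (7833, 1373, 1.08e07),
    ("female", "white", "other"): (18945, 1405, 2.66e07),
    ("female", "black", "unemployed"): (545, 1705, 929225),
    ("female", "black", "part-time"): (563, 1627, 916001),
    ("female", "black", "other"): (2132, 1668, 3556176),
    ("female", "other", "unemployed"): (89, 1029, 91581),
    ("female", "other", "part-time"): (216, 1070, 231120),
    ("female", "other", "other"): (725, 1127, 817075),
}

def matches(uw, w, wt):
    """True if uw * w reproduces the printed wtcount to its displayed precision."""
    product = uw * w
    if isinstance(wt, int):
        return product == wt                 # fully printed cells must match exactly
    return abs(product - wt) / wt < 0.005    # e-notation cells carry 3 significant digits

mismatches = [key for key, (uw, w, wt) in cells.items() if not matches(uw, w, wt)]
print("cells where wtcount != uwcount * weight:", mismatches)
```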

. table color, contents (mean weight)

------------------------
    color | mean(weight)
----------+-------------
    white |      1403.17
    black |      1755.67
    other |      1101.17
------------------------

. *The weights are not uniform.

. *Uniform weights would change only the constant and leave the other coefficients unaffected.

. *One reason the CPS has weights is that some populations respond to the survey at lower rates than others.

. *The weight used here is roughly the inverse of the sampling rate. Blacks are effectively sampled at a lower rate because they are less likely to respond to the survey, so the Blacks who do respond get a higher weight in the CPS.
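The mean weights in the color table are plain averages over the six cells of each color, not averages over respondents, because the data have one row per cell. If we take the comment above at face value and treat a weight as roughly 1 / (effective sampling rate), the ratio of mean weights gives each group's effective sampling rate relative to whites. This is a rough Python sketch using the cell weights from the earlier table. Treating weights as pure inverse sampling rates is a simplifying assumption.

```python
# Cell weights from the "sum weight" rows of the earlier table, in the order
# male (unemployed, part-time, other), then female (same order).
weights = {
    "white": [1431, 1408, 1408, 1394, 1373, 1405],
    "black": [1921, 1849, 1764, 1705, 1627, 1668],
    "other": [1029, 1124, 1228, 1029, 1070, 1127],
}

# "table color, contents(mean weight)" averages the six cell rows of each color,
# so an unweighted mean of the cell weights reproduces it.
mean_weight = {color: sum(ws) / len(ws) for color, ws in weights.items()}
for color, m in mean_weight.items():
    print(f"{color:6s} {m:8.2f}")

# Assumption: weight ~ 1 / effective sampling rate, so rate ratios invert weight ratios.
relative_rate = {c: mean_weight["white"] / m for c, m in mean_weight.items()}
print({c: round(r, 2) for c, r in relative_rate.items()})
```

Under that reading, Black respondents are represented at about 0.80 times the white rate, and respondents of other races at about 1.27 times the white rate.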

. desmat: poisson uwcount labor*sex labor*color sex*color, desmat(zval)

option desmat() not allowed

r(198);

. desmat: poisson uwcount labor*sex labor*color sex*color, desrep(zval)

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          uwcount
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -81627.074
Log likelihood:             -123.390
LR chi square:              163007.367
Model degrees of freedom:   13
Pseudo R-squared:           0.998
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.           z
------------------------------------------------------------------------------------------
uwcount
    labor
 1    part-time                0.210**    0.022       9.583
 2    other                    2.182**    0.017     127.658
    sex
 3    female                  -0.446**    0.025     -18.192
    labor.sex
 4    part-time.female         1.017**    0.030      33.621
 5    other.female            -0.049      0.026      -1.891
    color
 6    black                   -1.771**    0.035     -51.143
 7    other                   -3.178**    0.067     -47.667
    labor.color
 8    part-time.black         -1.043**    0.048     -21.928
 9    part-time.other         -0.380**    0.084      -4.550
10    other.black             -0.822**    0.036     -22.834
11    other.other             -0.292**    0.069      -4.238
    sex.color
12    female.black             0.354**    0.027      13.279
13    female.other             0.126**    0.044       2.891
14    _cons                    8.170**    0.016     502.506
------------------------------------------------------------------------------------------
* p < .05
** p < .01

. poisgof

Goodness-of-fit chi2 = 86.53056

Prob > chi2(4) = 0.0000

. poisgof, pearson

Goodness-of-fit chi2 = 89.79915

Prob > chi2(4) = 0.0000
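poisgof compares both statistics with a chi-square on 4 degrees of freedom: 18 cells minus the 14 estimated parameters. The Python sketch below writes out the textbook deviance and Pearson formulas (not Stata's source code). It also gives the closed-form upper tail of a chi-square with 4 df, which shows why both p-values print as 0.0000.

```python
import math

def deviance(observed, fitted):
    """Poisson deviance G^2 = 2 * sum[y ln(y/mu) - (y - mu)]; a cell with y = 0 contributes 2*mu."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        total += (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
    return 2 * total

def pearson(observed, fitted):
    """Pearson X^2 = sum (y - mu)^2 / mu."""
    return sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))

def chi2_sf_df4(x):
    """P(chi-square with 4 df > x); for even df the upper tail has a closed form."""
    return math.exp(-x / 2) * (1 + x / 2)

n_cells = 18    # 3 labor x 2 sex x 3 color
n_params = 14   # 13 effects + _cons, as listed by desrep
df = n_cells - n_params

print("residual df:", df)
print("p for deviance 86.53056:", chi2_sf_df4(86.53056))
print("p for Pearson 89.79915:", chi2_sf_df4(89.79915))
```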

. *From the summary statistics, you can tell this is C+E's model #1. But the z values don't match the z values C+E report, because C+E use deviation coding and omit the highest category of each variable.

. desmat: poisson uwcount labor*sex labor*color sex*color, defcon(dev(3)) desrep(zval)

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          uwcount
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -81627.074
Log likelihood:             -123.390
LR chi square:              163007.367
Model degrees of freedom:   13
Pseudo R-squared:           0.998
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.           z
------------------------------------------------------------------------------------------
uwcount
    labor
 1    unemployed              -0.677**    0.018     -38.575
 2    part-time               -0.433**    0.017     -26.205
    sex
 3    male                    -0.018*     0.009      -2.004
    labor.sex
 4    unemployed.male          0.161**    0.009      18.482
 5    part-time.male          -0.347**    0.007     -46.843
    color
 6    white                    1.852**    0.011     162.348
 7    black                   -0.364**    0.014     -25.659
    labor.color
 8    unemployed.white        -0.282**    0.018     -15.386
 9    unemployed.black         0.340**    0.022      15.425
10    part-time.white          0.193**    0.017      11.300
11    part-time.black         -0.229**    0.022     -10.465
    sex.color
12    male.white               0.080**    0.009       9.162
13    male.black              -0.097**    0.011      -8.679
14    _cons                    7.053**    0.011     640.281
------------------------------------------------------------------------------------------
* p < .05
** p < .01

. *It's the same model, but with the same coding of the design variables that C+E use: deviation coding (see the desmat help file).
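Deviation-coded parameters are the deviations of marginal means of the fitted log counts from their grand mean, so they can be recovered from the dummy-coded fit. This Python sketch takes the rounded dummy-coded estimates from the first desmat run (reference categories unemployed, male, white), computes the fitted log count for each of the 18 cells, and averages. It shows the standard ANOVA-style decomposition. It is not desmat's own algorithm.

```python
from itertools import product

LABOR = ["unemployed", "part-time", "other"]
SEX = ["male", "female"]
COLOR = ["white", "black", "other"]

# Dummy-coded estimates from the first desmat run; omitted (reference) terms are 0.
b = {
    "_cons": 8.170,
    ("labor", "part-time"): 0.210, ("labor", "other"): 2.182,
    ("sex", "female"): -0.446,
    ("labor.sex", "part-time", "female"): 1.017, ("labor.sex", "other", "female"): -0.049,
    ("color", "black"): -1.771, ("color", "other"): -3.178,
    ("labor.color", "part-time", "black"): -1.043, ("labor.color", "part-time", "other"): -0.380,
    ("labor.color", "other", "black"): -0.822, ("labor.color", "other", "other"): -0.292,
    ("sex.color", "female", "black"): 0.354, ("sex.color", "female", "other"): 0.126,
}

def log_mu(l, s, c):
    """Fitted log expected count for one cell under the dummy-coded model."""
    return (b["_cons"]
            + b.get(("labor", l), 0) + b.get(("sex", s), 0) + b.get(("color", c), 0)
            + b.get(("labor.sex", l, s), 0) + b.get(("labor.color", l, c), 0)
            + b.get(("sex.color", s, c), 0))

cells = {(l, s, c): log_mu(l, s, c) for l, s, c in product(LABOR, SEX, COLOR)}
POS = {"labor": 0, "sex": 1, "color": 2}

def mean_where(**fixed):
    """Average fitted log count over the cells matching the given factor levels."""
    vals = [v for k, v in cells.items() if all(k[POS[f]] == lvl for f, lvl in fixed.items())]
    return sum(vals) / len(vals)

grand = mean_where()
labor_effect = {l: mean_where(labor=l) - grand for l in LABOR}
sex_effect = {s: mean_where(sex=s) - grand for s in SEX}
color_effect = {c: mean_where(color=c) - grand for c in COLOR}
labor_sex = {(l, s): mean_where(labor=l, sex=s) - mean_where(labor=l) - mean_where(sex=s) + grand
             for l in LABOR for s in SEX}

print(round(grand, 3), round(labor_effect["unemployed"], 3), round(sex_effect["male"], 3))
```

Up to rounding of the three-decimal inputs, this reproduces the deviation-coded constant and effects (7.053, -0.677, -0.433, -0.018, 0.161, 1.852, -0.364). Both codings describe the same fitted table, which is why the fit statistics are identical.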

. poisgof

Goodness-of-fit chi2 = 86.53056

Prob > chi2(4) = 0.0000

. poisgof, pearson.

option pearson. not allowed

r(198);

. poisgof, pearson

Goodness-of-fit chi2 = 89.79915

Prob > chi2(4) = 0.0000

. *Same model, #1

. *Model #2, which Clogg and Eliason do not report, inflates each count by its weight and uses the resulting huge weighted counts as the dependent variable.

. desmat: poisson wtcount labor*sex labor*color sex*color, defcon(dev(3)) desrep(zval)

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          wtcount
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -1.137e+08
Log likelihood:             -71806.645
LR chi square:              2.273e+08
Model degrees of freedom:   13
Pseudo R-squared:           0.999
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.           z
------------------------------------------------------------------------------------------
wtcount
    labor
 1    unemployed              -0.689**    0.001   -1329.498
 2    part-time               -0.435**    0.000    -915.000
    sex
 3    male                     0.013**    0.000      50.973
    labor.sex
 4    unemployed.male          0.165**    0.000     717.857
 5    part-time.male          -0.340**    0.000   -1735.469
    color
 6    white                    1.858**    0.000    5625.057
 7    black                   -0.133**    0.000    -345.910
    labor.color
 8    unemployed.white        -0.263**    0.001    -490.386
 9    unemployed.black         0.383**    0.001     631.891
10    part-time.white          0.186**    0.000     380.579
11    part-time.black         -0.231**    0.001    -394.068
    sex.color
12    male.white               0.059**    0.000     240.222
13    male.black              -0.083**    0.000    -280.576
14    _cons                   14.294**    0.000   44573.248
------------------------------------------------------------------------------------------
* p < .05
** p < .01

. poisgof

Goodness-of-fit chi2 = 143357.3

Prob > chi2(4) = 0.0000

. poisgof, pearson

Goodness-of-fit chi2 = 148750.5

Prob > chi2(4) = 0.0000

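. *Using the weighted counts (here the weights are huge) does two things.

. *It shrinks the SEs to minuscule values, because the model treats the roughly 110 million weighted cases as if they were independent observations.

. *It inflates the GOF statistics enormously.

. *Using weights this way is the most wrong approach.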
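Why does inflating the counts have this effect? In a Poisson log-linear model, multiplying every count by the same constant c has three consequences:

- every coefficient except the constant stays the same (the constant shifts by ln c);
- G² and X² are multiplied by c;
- every SE is divided by sqrt(c).

The CPS weights are not constant, so wtcount is not an exact multiple of uwcount, but with weights around 1,400 the effect is of that order.

The small Python sketch below shows the scaling rule using only the standard library. It uses the labor-by-sex margin of uwcount (summed over color from the first table) and a hypothetical multiplier of 1420, the mean of the 18 cell weights. The independence model and the 2x2 log odds ratio are stand-ins chosen because their ML estimates have closed forms. They are not the model fitted in this log.

```python
import math

def independence_fit(table):
    """Closed-form ML fitted counts for the row + column (independence) Poisson model."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    return [[r * k / n for k in cols] for r in rows]

def g2(table):
    """Deviance of the independence model, 2 * sum y ln(y / mu) (all cells > 0 here)."""
    fit = independence_fit(table)
    return 2 * sum(y * math.log(y / mu)
                   for row, frow in zip(table, fit) for y, mu in zip(row, frow))

def log_odds_ratio(a, b, c, d):
    """Log odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return math.log(a * d / (b * c))

def se_log_odds_ratio(a, b, c, d):
    """Large-sample SE of a log odds ratio under Poisson sampling."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Labor (unemployed, part-time, other) by sex, summing uwcount over color.
labor_by_sex = [[4280, 4740, 34636],   # male
                [2915, 8612, 21802]]   # female
scale = 1420  # hypothetical uniform weight: the mean of the 18 cell weights
inflated = [[y * scale for y in row] for row in labor_by_sex]

# Unemployed vs part-time by male vs female, as a 2x2 sub-table.
small = (4280, 4740, 2915, 8612)
big = tuple(y * scale for y in small)

print("G2 ratio:", g2(inflated) / g2(labor_by_sex))                      # equals scale
print("log OR:", log_odds_ratio(*small), log_odds_ratio(*big))           # unchanged
print("SE ratio:", se_log_odds_ratio(*small) / se_log_odds_ratio(*big))  # sqrt(scale)
```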

. desmat, sigcut (0.001 0.0000001) sigsym(*** *&*)

varlist not allowed

r(101);

. desrep, sigcut (0.001 0.0000001) sigsym(*** *&*)

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          wtcount
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -1.137e+08
Log likelihood:             -71806.645
LR chi square:              2.273e+08
Model degrees of freedom:   13
Pseudo R-squared:           0.999
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.
------------------------------------------------------------------------------------------
wtcount
 1    _x_1                    -0.689*&*   0.001
 2    _x_2                    -0.435*&*   0.000
 3    _x_3                     0.013*&*   0.000
 4    _x_4                     0.165*&*   0.000
 5    _x_5                    -0.340*&*   0.000
 6    _x_6                     1.858*&*   0.000
 7    _x_7                    -0.133*&*   0.000
 8    _x_8                    -0.263*&*   0.001
 9    _x_9                     0.383*&*   0.001
10    _x_10                    0.186*&*   0.000
11    _x_11                   -0.231*&*   0.001
12    _x_12                    0.059*&*   0.000
13    _x_13                   -0.083*&*   0.000
14    _cons                   14.294*&*   0.000
------------------------------------------------------------------------------------------
*** p < .001
*&* p < 1.00000000e-07

. *With desrep, you can set the significance cutoffs and symbols however you like.

. desrep, zval

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          wtcount
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -1.137e+08
Log likelihood:             -71806.645
LR chi square:              2.273e+08
Model degrees of freedom:   13
Pseudo R-squared:           0.999
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.           z
------------------------------------------------------------------------------------------
wtcount
 1    _x_1                    -0.689**    0.001   -1329.498
 2    _x_2                    -0.435**    0.000    -915.000
 3    _x_3                     0.013**    0.000      50.973
 4    _x_4                     0.165**    0.000     717.857
 5    _x_5                    -0.340**    0.000   -1735.469
 6    _x_6                     1.858**    0.000    5625.057
 7    _x_7                    -0.133**    0.000    -345.910
 8    _x_8                    -0.263**    0.001    -490.386
 9    _x_9                     0.383**    0.001     631.891
10    _x_10                    0.186**    0.000     380.579
11    _x_11                   -0.231**    0.001    -394.068
12    _x_12                    0.059**    0.000     240.222
13    _x_13                   -0.083**    0.000    -280.576
14    _cons                   14.294**    0.000   44573.248
------------------------------------------------------------------------------------------
* p < .05
** p < .01

. *The model that uses wtcount as the dependent variable actually has the correct coefficients.

. desmat: poisson wtcount labor*sex labor*color sex*color, defcon(dev(3)) desrep(zval)

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          wtcount
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -1.137e+08
Log likelihood:             -71806.645
LR chi square:              2.273e+08
Model degrees of freedom:   13
Pseudo R-squared:           0.999
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.           z
------------------------------------------------------------------------------------------
wtcount
    labor
 1    unemployed              -0.689**    0.001   -1329.498
 2    part-time               -0.435**    0.000    -915.000
    sex
 3    male                     0.013**    0.000      50.973
    labor.sex
 4    unemployed.male          0.165**    0.000     717.857
 5    part-time.male          -0.340**    0.000   -1735.469
    color
 6    white                    1.858**    0.000    5625.057
 7    black                   -0.133**    0.000    -345.910
    labor.color
 8    unemployed.white        -0.263**    0.001    -490.386
 9    unemployed.black         0.383**    0.001     631.891
10    part-time.white          0.186**    0.000     380.579
11    part-time.black         -0.231**    0.001    -394.068
    sex.color
12    male.white               0.059**    0.000     240.222
13    male.black              -0.083**    0.000    -280.576
14    _cons                   14.294**    0.000   44573.248
------------------------------------------------------------------------------------------
* p < .05
** p < .01

. *This model is disastrous, yet its coefficients are actually correct; compare them with the final column of C+E Table 6.

. *That might lead a person to simply rescale the weights and use (rescaled weight * uwcount) as the new depvar. That would be Clogg + Eliason #2.
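The log does not show how wt_count_rescale was created. This Python sketch rests on a hypothetical assumption: that it equals uwcount * weight divided by the mean of the 18 cell weights. Under that assumption, the rescaled table is about the same size as the unweighted table, and its constant should be lower than the constant of the wtcount run by ln(1420). The printed constants are consistent with this: 14.294 for the wtcount run and 7.036 for the rescaled run.

```python
import math

# (uwcount, weight) per cell, copied from the tables earlier in the log;
# each line lists unemployed, part-time, other.
cells = [
    (3511, 1431), (4227, 1408), (31467, 1408),   # male, white
    (604, 1921), (356, 1849), (2245, 1764),      # male, black
    (165, 1029), (157, 1124), (924, 1228),       # male, other
    (2281, 1394), (7833, 1373), (18945, 1405),   # female, white
    (545, 1705), (563, 1627), (2132, 1668),      # female, black
    (89, 1029), (216, 1070), (725, 1127),        # female, other
]

# Assumption: rescaled weight = weight / (mean of the 18 cell weights).
mean_w = sum(w for _, w in cells) / len(cells)
rescaled = [uw * w / mean_w for uw, w in cells]   # hypothetical wt_count_rescale

print("mean cell weight:", mean_w)
print("unweighted total:", sum(uw for uw, _ in cells))
print("rescaled total:  ", round(sum(rescaled)))
# Dividing every count by a constant shifts only the constant, by -ln(mean_w).
print("ln(mean_w):", round(math.log(mean_w), 3), " 14.294 - 7.036 =", round(14.294 - 7.036, 3))
```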

. desmat: poisson wt_count_rescale labor*sex labor*color sex*color, defcon(dev(3)) desrep(zval)

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          wt_count_rescale
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -80153.975
Log likelihood:             -130.414
LR chi square:              160047.123
Model degrees of freedom:   13
Pseudo R-squared:           0.998
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.           z
------------------------------------------------------------------------------------------
wt_count_rescale
    labor
 1    unemployed              -0.689**    0.020     -35.281
 2    part-time               -0.435**    0.018     -24.282
    sex
 3    male                     0.013      0.010       1.353
    labor.sex
 4    unemployed.male          0.165**    0.009      19.050
 5    part-time.male          -0.340**    0.007     -46.055
    color
 6    white                    1.858**    0.012     149.274
 7    black                   -0.133**    0.015      -9.180
    labor.color
 8    unemployed.white        -0.263**    0.020     -13.013
 9    unemployed.black         0.383**    0.023      16.769
10    part-time.white          0.186**    0.018      10.100
11    part-time.black         -0.231**    0.022     -10.457
    sex.color
12    male.white               0.059**    0.009       6.375
13    male.black              -0.083**    0.011      -7.446
14    _cons                    7.036**    0.012     582.209
------------------------------------------------------------------------------------------
* p < .05
** p < .01

. *Okay. This model has the correct coefficients, because it takes the weights into account. And its standard errors are not crazy, because the scale of the dataset reflects the actual scale of the unweighted counts.

. *But the SEs are still not quite right.

. poisgof

Goodness-of-fit chi2 = 100.9525

Prob > chi2(4) = 0.0000

. poisgof, pearson

Goodness-of-fit chi2 = 104.7538

Prob > chi2(4) = 0.0000
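Apart from the constant, the wtcount and wt_count_rescale runs have identical coefficients. That pattern fits the rescaled counts being wtcount divided by a single constant c. If so, each z value should differ by a factor of sqrt(c), each GOF statistic by a factor of c, and the two constants by ln(c). This Python sketch checks those ratios using figures copied from this log.

```python
import math

# z statistics and GOF statistics copied from the wtcount and wt_count_rescale runs
z_wtcount = {"unemployed": -1329.498, "part-time": -915.000, "white": 5625.057}
z_rescaled = {"unemployed": -35.281, "part-time": -24.282, "white": 149.274}
gof_wtcount = {"deviance": 143357.3, "pearson": 148750.5}
gof_rescaled = {"deviance": 100.9525, "pearson": 104.7538}

# If wt_count_rescale = wtcount / c, each z shrinks by sqrt(c) and each GOF statistic by c.
for effect in z_wtcount:
    ratio = z_wtcount[effect] / z_rescaled[effect]
    print(f"{effect:10s} z ratio {ratio:7.3f}  implied c {ratio ** 2:8.1f}")
for stat in gof_wtcount:
    print(f"{stat:10s} GOF ratio (c) {gof_wtcount[stat] / gof_rescaled[stat]:8.1f}")

# The constants should differ by ln(c).
print("ln(1420) =", round(math.log(1420), 3), " 14.294 - 7.036 =", round(14.294 - 7.036, 3))
```

All of these ratios point to c of about 1420, the mean of the 18 cell weights. The rescaled run is therefore just the wtcount run on a smaller scale.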

. *Stata can take the weights into account through the exposure() option, which adds ln(invweight) to the linear predictor as an offset whose coefficient is fixed at 1.
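Written out, the exposure model changes only what the Poisson mean refers to. The log does not show how invweight was created; the sketch below assumes invweight = 1/weight, as the name suggests. Let mu_i be the expected unweighted count and w_i the cell weight:

```latex
\mu_i = E[\mathrm{uwcount}_i], \qquad w_i = \mathrm{weight}_i, \qquad \mathrm{invweight}_i = 1/w_i \ \text{(assumed)}

\ln \mu_i = \ln(\mathrm{invweight}_i) + \mathbf{x}_i^{\top}\boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\ln\!\left(w_i\,\mu_i\right) = \mathbf{x}_i^{\top}\boldsymbol{\beta}
```

So beta describes the expected weighted counts w_i mu_i. That is consistent with a constant (14.292) close to the 14.294 of the wtcount run. The Poisson variance behind the SEs is still mu_i, the expected unweighted count.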

. desmat: poisson uwcount labor*sex labor*color sex*color, exposure (invweight) defcon(dev(3)) desrep(zval)

------------------------------------------------------------------------------------------
Poisson regression
------------------------------------------------------------------------------------------
Dependent variable          uwcount
Optimization:               ml
Number of observations:     18
Initial log likelihood:     -84619.027
Log likelihood:             -124.919
LR chi square:              168988.216
Model degrees of freedom:   13
Pseudo R-squared:           0.999
Prob:                       0.000
------------------------------------------------------------------------------------------
nr  Effect                     Coeff       s.e.           z
------------------------------------------------------------------------------------------
uwcount
    labor
 1    unemployed              -0.688**    0.018     -39.155
 2    part-time               -0.440**    0.017     -26.614
    sex
 3    male                     0.013      0.009       1.381
    labor.sex
 4    unemployed.male          0.166**    0.009      18.987
 5    part-time.male          -0.343**    0.007     -46.302
    color
 6    white                    1.860**    0.011     163.037
 7    black                   -0.136**    0.014      -9.590
    labor.color
 8    unemployed.white        -0.265**    0.018     -14.436
 9    unemployed.black         0.386**    0.022      17.495
10    part-time.white          0.191**    0.017      11.175
11    part-time.black         -0.238**    0.022     -10.861
    sex.color
12    male.white               0.058**    0.009       6.705
13    male.black              -0.085**    0.011      -7.611
14    _cons                   14.292**    0.011    1296.842
      ln(invweight)         (offset)
------------------------------------------------------------------------------------------
* p < .05
** p < .01

. poisgof

Goodness-of-fit chi2 = 89.58815

Prob > chi2(4) = 0.0000

. poisgof, pearson

Goodness-of-fit chi2 = 93.54537

Prob > chi2(4) = 0.0000

. *This model has essentially the coefficients of models #2 and #3, but the standard errors of model #1.
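The summary comment above can be checked against the printed estimates. This Python sketch uses the thirteen non-constant deviation-coded estimates copied from four runs: the exposure run, the wtcount and rescaled runs (which share coefficients), and the deviation-coded unweighted run (model #1).

```python
# Deviation-coded estimates copied from this log, in desrep's effect order.
effects = ["unemployed", "part-time", "male", "unemployed.male", "part-time.male",
           "white", "black", "unemployed.white", "unemployed.black",
           "part-time.white", "part-time.black", "male.white", "male.black"]
coef_exposure = [-0.688, -0.440, 0.013, 0.166, -0.343, 1.860, -0.136,
                 -0.265, 0.386, 0.191, -0.238, 0.058, -0.085]
coef_weighted = [-0.689, -0.435, 0.013, 0.165, -0.340, 1.858, -0.133,
                 -0.263, 0.383, 0.186, -0.231, 0.059, -0.083]   # wtcount / rescaled runs
se_exposure = [0.018, 0.017, 0.009, 0.009, 0.007, 0.011, 0.014,
               0.018, 0.022, 0.017, 0.022, 0.009, 0.011]
se_unweighted = [0.018, 0.017, 0.009, 0.009, 0.007, 0.011, 0.014,
                 0.018, 0.022, 0.017, 0.022, 0.009, 0.011]      # model #1
se_rescaled = [0.020, 0.018, 0.010, 0.009, 0.007, 0.012, 0.015,
               0.020, 0.023, 0.018, 0.022, 0.009, 0.011]

def max_gap(xs, ys):
    """Largest absolute difference between paired estimates."""
    return max(abs(x - y) for x, y in zip(xs, ys))

print("coef gap, exposure vs weighted counts:", round(max_gap(coef_exposure, coef_weighted), 3))
print("SE gap, exposure vs model #1:", round(max_gap(se_exposure, se_unweighted), 3))
print("SE gap, exposure vs rescaled:", round(max_gap(se_exposure, se_rescaled), 3))
```

At the printed precision, the exposure coefficients are within 0.007 of the weighted-count coefficients. The exposure SEs are identical to those of model #1, while the SEs of the rescaled model are up to 0.002 larger.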

. exit, clear