. *Now a quick look at one way to incorporate continuous variables (which take too many distinct values to cross-classify) into a dataset of cell counts.

. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_y2k_numeric.dta", clear

 

. tabulate  maritl race if age>18

 

                      |                     p25

   Marital Status p17 |     White      Black  Amer Indi      Asian |     Total

----------------------+--------------------------------------------+----------

married, spouse prese |    49,583      3,398        566      1,962 |    55,509

married, AF spouse pr |       282         43          2         20 |       347

married, spouse absen |     1,047        155         24        120 |     1,346

              widowed |     5,544        794         62        157 |     6,557

             divorced |     8,118      1,063        150        167 |     9,498

            separated |     1,472        498         35         56 |     2,061

        never married |    15,776      3,100        330        846 |    20,052

----------------------+--------------------------------------------+----------

                Total |    81,822      9,051      1,169      3,328 |    95,370

 

 

. tabulate race maritl if age>18

 

            |                   Marital Status p17

        p25 | married,   married,   married,     widowed   divorced |     Total

------------+-------------------------------------------------------+----------

      White |    49,583        282      1,047      5,544      8,118 |    81,822

      Black |     3,398         43        155        794      1,063 |     9,051

Amer Indian |       566          2         24         62        150 |     1,169

      Asian |     1,962         20        120        157        167 |     3,328

------------+-------------------------------------------------------+----------

      Total |    55,509        347      1,346      6,557      9,498 |    95,370

 

 

            |  Marital Status p17

        p25 | separated  never mar |     Total

------------+----------------------+----------

      White |     1,472     15,776 |    81,822

      Black |       498      3,100 |     9,051

Amer Indian |        35        330 |     1,169

      Asian |        56        846 |     3,328

------------+----------------------+----------

      Total |     2,061     20,052 |    95,370

 

 

. *OK, my race variable has 4 categories.

. *If we want to include income and age as continuous variables, here is what we do:

. contract sex race maritl if age>18, zero

 

. rename _freq count
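
*A minimal sketch of what contract does, run on made-up toy data rather than the CPS file (the four observations below are hypothetical). The zero option keeps cells with no observations, and the freq() option names the frequency variable directly, which would make the separate rename step unnecessary.

```stata
* Toy data (hypothetical, not the CPS file): four people, two variables.
clear
input sex race
1 1
1 1
2 1
2 2
end

* One row per sex-by-race cell. zero keeps the empty cell (sex 1, race 2);
* freq(count) calls the frequency variable count instead of _freq.
contract sex race, zero freq(count)
list, clean
```

*The result has 2 x 2 = 4 rows, one of them with count equal to 0, in the same way that the CPS contract above yields 56 rows.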

 

. sort maritl sex race

 

*The data need to be sorted and saved so we can match merge later.

 

. describe

 

Contains data from C:\Documents and Settings\Michael Rosenfeld\My Documents\newer

> web pages\soc_meth_proj3\cps_y2k_numeric.dta

  obs:            56                         

 vars:             4                          16 May 2004 11:38

 size:           504 (99.9% of memory free)

-------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

-------------------------------------------------------------------------------

maritl          byte   %26.0g      marlbl     Marital Status p17

sex             byte   %8.0g       sexnm      p20

race            byte   %11.0g      racenm     p25

count           int    %12.0g                 Frequency

-------------------------------------------------------------------------------

Sorted by:  maritl  sex  race

     Note:  dataset has changed since last saved

 

. save "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\methods tabular arrays\race sex maritl status dataset.dta"

file C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\methods tabular arrays\race sex maritl status dataset.dta saved

 

. clear all

 

. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_y2k_numeric.dta", clear

 

. table maritl sex race, contents (mean age mean ernval) replace

 

----------------------------------------------------------------------------------

                           |                      p25 and p20                    

                           | ----- White ----- ----- Black ----- -- Amer Indian --

        Marital Status p17 |     male   female     male   female     male   female

---------------------------+------------------------------------------------------

   married, spouse present |  48.8624  46.5191  48.3705  46.1828  46.5054  44.5709

                           | 35293.31 15117.26 26513.44 17735.44 25456.47 12258.17

                           |

married, AF spouse present |  35.2727  32.4205     38.7  33.6667              36.5

                           |    24867 10558.61    17190 18295.15             29500

                           |

    married, spouse absent |  45.3807  48.6034  46.2206  46.6705     34.6  44.1429

                           | 21912.31 13030.78 13399.46 13870.09    20800     3910

                           |

                   widowed |  72.5132  72.5255  70.0347  69.3856     69.4       66

                           |  8137.62 3717.476 6222.861 4397.989   4944.2 3321.167

                           |

                  divorced |  47.3213  48.0498  47.9876   48.384  47.4328  44.0482

                           | 28454.29 19517.37 21636.77 17372.46 16301.63    18223

                           |

                 separated |  42.9467  40.7664  44.4148  45.8308  42.9167   42.087

                           | 25741.93 13212.84 15448.19 12733.79     9515 8515.218

                           |

             never married |  16.6091  15.9766  17.6193  19.4669  16.4299    15.09

                           | 6881.324 4936.816  5928.62 6224.273 3787.177 2753.067

----------------------------------------------------------------------------------

 

----------------------------------------------

                           |    p25 and p20  

                           | ----- Asian -----

        Marital Status p17 |     male   female

---------------------------+------------------

   married, spouse present |  47.3441   44.207

                           |  38021.8 17515.35

                           |

married, AF spouse present |       29  34.6316

                           |    29000 13664.21

                           |

    married, spouse absent |  43.8824  46.3654

                           | 25892.93 13103.06

                           |

                   widowed |  71.5455  67.9853

                           | 6909.091 5301.691

                           |

                  divorced |  46.2586   46.055

                           | 33023.66 22472.59

                           |

                 separated |  45.0909     42.2

                           | 37878.14 11585.14

                           |

             never married |  17.2418  16.4641

                           | 8573.331 6483.706

----------------------------------------------

 

. rename table1 age

 

. rename table2 income

 

. clear all

 

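. *The table above used all ages (note the never married mean ages of 15 to 19). I need to impose the same age>18 restriction that I used for the counts, so I start over.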

. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_y2k_numeric.dta", clear

 

. table maritl sex race if age>18, contents (mean age mean ernval) replace

 

----------------------------------------------------------------------------------

                           |                      p25 and p20                    

                           | ----- White ----- ----- Black ----- -- Amer Indian --

        Marital Status p17 |     male   female     male   female     male   female

---------------------------+------------------------------------------------------

   married, spouse present |  48.8765  46.5879  48.3895  46.2185  46.5054  44.7596

                           | 35307.28 15142.04 26528.47 17751.03 25456.47 12315.93

                           |

married, AF spouse present |  35.2727  32.6462     38.7  33.6667              36.5

                           |    24867 10665.28    17190 18295.15             29500

                           |

    married, spouse absent |  45.4848  48.9674  46.2206       47     34.6  44.1429

                           | 21965.96 13164.15 13399.46 14029.52    20800     3910

                           |

                   widowed |  72.6264  72.5255  70.0347  69.4662     69.4       66

                           | 8153.624 3717.476 6222.861 4404.755   4944.2 3321.167

                           |

                  divorced |  47.4123  48.1209  48.2344  48.4849  47.4328  44.0482

                           | 28533.57 19558.24 21793.66 17424.95 16301.63    18223

                           |

                 separated |  43.6056  41.0984  45.4353  46.0976  42.9167   42.087

                           | 26347.53 13382.48 15993.42 12799.95     9515 8515.218

                           |

             never married |  30.5652  31.2294  32.3861  33.1888  29.9565  29.5205

                           | 18887.22 15237.13 15700.48 14066.88 10755.82 9350.466

----------------------------------------------------------------------------------

 

----------------------------------------------

                           |    p25 and p20  

                           | ----- Asian -----

        Marital Status p17 |     male   female

---------------------------+------------------

   married, spouse present |  47.3441  44.2587

                           |  38021.8 17539.51

                           |

married, AF spouse present |       29  34.6316

                           |    29000 13664.21

                           |

    married, spouse absent |  43.8824  46.3654

                           | 25892.93 13103.06

                           |

                   widowed |  71.5455  68.3704

                           | 6909.091 5340.963

                           |

                  divorced |  46.2586   46.055

                           | 33023.66 22472.59

                           |

                 separated |  45.0909  42.9412

                           | 37878.14 11826.47

                           |

             never married |  28.8807  29.1221

                           | 21152.29 17624.26

----------------------------------------------

 

. rename table1 age

 

. rename table2 income
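
*The table command with contents() and replace is the syntax of the Stata version used in this log, and newer releases changed table, so it may not run as written there. As a hedged alternative, collapse should produce the same kind of cell means in one step. The sketch below uses made-up toy data, not the CPS file.

```stata
* Toy data (hypothetical): five people, one of them under 19.
clear
input maritl sex age ernval
1 1 40 30000
1 1 50 40000
1 2 45 20000
7 1 17     0
7 1 25 15000
end

* Cell means of age and earnings for adults only, one row per
* maritl-by-sex cell; the mean of ernval is stored as income.
collapse (mean) age income=ernval if age>18, by(maritl sex)
list, clean
```

*Like table with replace, collapse returns only the cells that contain observations, so the merge with the saved counts file (which was built with the zero option) is still what brings the empty cell back in.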

 

. sort maritl sex race

*The dataset has to be sorted in order to merge

 

 

. merge maritl sex race using "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\methods tabular arrays\race sex maritl status dataset.dta"

(label sexnm already defined)

(label marlbl already defined)

(label racenm already defined)

 

. tabulate _merge

 

     _merge |      Freq.     Percent        Cum.

------------+-----------------------------------

          2 |          1        1.79        1.79

          3 |         55       98.21      100.00

------------+-----------------------------------

      Total |         56      100.00

 

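*We had one zero count out of 56 cells (American Indian men who are married with an Armed Forces spouse present). That cell exists only in the saved counts file (_merge==2), so it has missing values for mean age and mean income.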
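
*The merge command in this log uses the older syntax, which requires both files to be sorted on the key variables. In Stata 11 or later the same match merge can be written as merge 1:1, which does not need the sort. A minimal sketch on made-up toy data (the tempfile name counts and all values are hypothetical):

```stata
* Toy counts file with one empty cell (maritl 1, sex 2).
clear
input maritl sex count
1 1 10
1 2  0
end
tempfile counts
save `counts'

* Toy cell means: the empty cell has no row here.
clear
input maritl sex age
1 1 45
end

* 1:1 match merge on the cell identifiers; no prior sort is required.
merge 1:1 maritl sex using `counts'
tabulate _merge
```

*As in the real data, the empty cell shows up with _merge equal to 2 and a missing mean age. Run the whole block at once from a do-file, since the tempfile only lasts for that run.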

 

. drop _merge

 

. describe

 

Contains data

  obs:            56                         

 vars:             6                         

 size:           952 (99.9% of memory free)

-------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

-------------------------------------------------------------------------------

maritl          byte   %26.0g      marlbl     Marital Status p17

sex             byte   %8.0g       sexnm      p20

race            byte   %11.0g      racenm     p25

age             float  %8.0g                  mean(age)

income          float  %9.0g                  mean(ernval)

count           int    %12.0g                 Frequency

-------------------------------------------------------------------------------

Sorted by: 

     Note:  dataset has changed since last saved

 

* Notice the dataset still has 56 cells: 7 (maritl) x 4 (race) x 2 (sex). All we have done is add the average age and average income to each of the 56 cells. This lets us take account of additional variables without expanding the number of cells (and increasing data sparseness) beyond reason.

 

 

 

. desmat: poisson count sex*maritl race*maritl

----------------------------------------------------------------------------------

   Poisson regression

----------------------------------------------------------------------------------

   Dependent variable                                                       count

   Optimization:                                                               ml

   Number of observations:                                                     56

   Initial log likelihood:                                            -154544.164

   Log likelihood:                                                       -308.260

   LR chi square:                                                      308471.809

   Model degrees of freedom:                                                   34

   Pseudo R-squared:                                                        0.998

   Prob:                                                                    0.000

----------------------------------------------------------------------------------

nr Effect                                                       Coeff        s.e.

----------------------------------------------------------------------------------

   count

     sex

1      female                                                  -0.012       0.008

     maritl

2      married, AF spouse present                              -6.835**     0.176

3      married, spouse absent                                  -3.865**     0.042

4      widowed                                                 -3.200**     0.030

5      divorced                                                -1.984**     0.017

6      separated                                               -3.812**     0.039

7      never married                                           -1.083**     0.012

     sex.maritl

8      female.married, AF spouse present                        2.265**     0.183

9      female.married, spouse absent                            0.015       0.055

10     female.widowed                                           1.505**     0.033

11     female.divorced                                          0.324**     0.022

12     female.separated                                         0.525**     0.046

13     female.never married                                    -0.130**     0.017

     race

14     Black                                                   -2.680**     0.018

15     Amer Indian                                             -4.473**     0.042

16     Asian                                                   -3.230**     0.023

     race.maritl

17     Black.married, AF spouse present                         0.800**     0.165

18     Black.married, spouse absent                             0.770**     0.088

19     Black.widowed                                            0.737**     0.042

20     Black.divorced                                           0.647**     0.037

21     Black.separated                                          1.597**     0.055

22     Black.never married                                      1.053**     0.026

23     Amer Indian.married, AF spouse present                  -0.476       0.711

24     Amer Indian.married, spouse absent                       0.697**     0.211

25     Amer Indian.widowed                                     -0.021       0.135

26     Amer Indian.divorced                                     0.482**     0.093

27     Amer Indian.separated                                    0.734**     0.176

28     Amer Indian.never married                                0.606**     0.070

29     Asian.married, AF spouse present                         0.584*      0.233

30     Asian.married, spouse absent                             1.063**     0.099

31     Asian.widowed                                           -0.335**     0.084

32     Asian.divorced                                          -0.654**     0.081

33     Asian.separated                                         -0.039       0.138

34     Asian.never married                                      0.304**     0.042

35   _cons                                                     10.124**     0.006

----------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =   223.269

         Prob > chi2(21)       =    0.0000
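
*Where the 21 degrees of freedom come from: the residual df is the number of cells minus the number of estimated parameters, here 56 cells minus 34 effects and the constant. The p-value can be checked by hand with chi2tail(), using the numbers printed above.

```stata
* Residual degrees of freedom: cells minus (effects + constant).
display 56 - (34 + 1)

* Upper-tail probability of the goodness-of-fit statistic on 21 df.
display chi2tail(21, 223.269)
```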

 

. gen age_sq=age^2

(1 missing value generated)

 

. summarize age

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

         age |        55    45.45743    12.02094   28.88069   72.62635

 

. replace age=45.46 if age==.

(1 real change made)

 

*Replace the missing value with the mean of the 55 non-missing cell means (45.46 after rounding). This is a reasonable way to impute missing values, but certainly not the only one.
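
*Instead of typing the mean by hand, the value saved by summarize can be used directly, which avoids the rounding. A minimal sketch on made-up toy data (three hypothetical cells, one with a missing mean age):

```stata
* Toy data (hypothetical): one missing value.
clear
input age
30
50
.
end

* summarize stores the mean of the non-missing values in r(mean).
summarize age, meanonly
replace age = r(mean) if missing(age)
list, clean
```

*The same two lines would work for income.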

 

. summarize income

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

      income |        55    17034.32    8729.349   3321.167    38021.8

 

. replace income = 17034 if income==.

(1 real change made)

 

. replace age_sq=age^2

(1 real change made)

 

. desmat: poisson count sex*maritl race*maritl @age @age_sq @income

----------------------------------------------------------------------------------

   Poisson regression

----------------------------------------------------------------------------------

   Dependent variable                                                       count

   Optimization:                                                               ml

   Number of observations:                                                     56

   Initial log likelihood:                                            -154544.164

   Log likelihood:                                                       -306.453

   LR chi square:                                                      308475.421

   Model degrees of freedom:                                                   37

   Pseudo R-squared:                                                        0.998

   Prob:                                                                    0.000

----------------------------------------------------------------------------------

nr Effect                                                       Coeff        s.e.

----------------------------------------------------------------------------------

   count

     sex

1      female                                                   0.081       0.066

     maritl

2      married, AF spouse present                              -6.424**     0.319

3      married, spouse absent                                  -3.769**     0.080

4      divorced                                                -1.943**     0.036

5      separated                                               -3.670**     0.115

     sex.maritl

6      female.married, AF spouse present                        2.312**     0.195

7      female.married, spouse absent                           -0.113       0.119

8      female.widowed                                           1.416**     0.063

9      female.divorced                                          0.246**     0.065

10     female.separated                                         0.509**     0.057

11     female.never married                                    -0.246**     0.070

     race

12     Black                                                   -2.665**     0.021

13     Amer Indian                                             -4.417**     0.059

14     Asian                                                   -3.202**     0.043

     maritl

15     widowed                                                 -2.887**     0.854

16     never married                                           -0.454       0.392

     race.maritl

17     Black.married, AF spouse present                         0.713**     0.171

18     Black.married, spouse absent                             0.774**     0.088

19     Black.widowed                                            0.631**     0.161

20     Black.divorced                                           0.635**     0.042

21     Black.separated                                          1.502**     0.097

22     Black.never married                                      0.955**     0.069

23     Amer Indian.married, AF spouse present                  -0.755       0.727

24     Amer Indian.married, spouse absent                       0.864**     0.238

25     Amer Indian.widowed                                     -0.233       0.315

26     Amer Indian.divorced                                     0.488**     0.093

27     Amer Indian.separated                                    0.694**     0.182

28     Amer Indian.never married                                0.626**     0.082

29     Asian.married, AF spouse present                         0.491*      0.242

30     Asian.married, spouse absent                             1.068**     0.099

31     Asian.widowed                                           -0.470*      0.197

32     Asian.divorced                                          -0.663**     0.082

33     Asian.separated                                         -0.121       0.153

34     Asian.never married                                      0.365**     0.074

35   mean(age)                                                  0.110       0.091

36   age_sq                                                    -0.001       0.001

37   mean(ernval)                                               0.000       0.000

38   _cons                                                      6.995**     2.195

----------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. poisgof

 

         Goodness-of-fit chi2  =  219.6567

         Prob > chi2(18)       =    0.0000

 

 

 

. desmat: poisson count sex*maritl race*maritl @age*maritl @income*maritl, difficult

----------------------------------------------------------------------------------

   Poisson regression

----------------------------------------------------------------------------------

   Dependent variable                                                       count

   Optimization:                                                               ml

   Number of observations:                                                     56

   Initial log likelihood:                                            -154544.164

   Log likelihood:                                                       -212.432

   LR chi square:                                                      308663.463

   Model degrees of freedom:                                                   48

   Pseudo R-squared:                                                        0.999

   Prob:                                                                    0.000

----------------------------------------------------------------------------------

nr Effect                                                       Coeff        s.e.

----------------------------------------------------------------------------------

   count

     sex

1      female                                                  -0.346**     0.124

     maritl

2      married, spouse absent                                 -15.588**     4.450

     sex.maritl

3      female.married, AF spouse present                        1.927**     0.695

4      female.married, spouse absent                            0.447*      0.199

5      female.widowed                                           1.911**     0.236

6      female.divorced                                          0.890**     0.172

7      female.separated                                         0.990**     0.173

8      female.never married                                     0.709**     0.149

     race

9      Black                                                   -2.741**     0.029

10     Amer Indian                                             -4.740**     0.120

11     Asian                                                   -3.446**     0.113

     race.maritl

12     Black.married, AF spouse present                         1.196**     0.365

13     Black.married, spouse absent                             1.092**     0.154

14     Black.widowed                                            0.908**     0.222

15     Black.divorced                                           0.803**     0.064

16     Black.separated                                          1.498**     0.335

17     Black.never married                                      0.943**     0.165

18     Amer Indian.married, AF spouse present                   0.549       1.241

19     Amer Indian.married, spouse absent                       2.055**     0.672

20     Amer Indian.widowed                                      0.485       0.505

21     Amer Indian.divorced                                     1.093**     0.281

22     Amer Indian.separated                                    1.027**     0.282

23     Amer Indian.never married                                2.824**     0.206

24     Asian.married, AF spouse present                         0.966**     0.368

25     Asian.married, spouse absent                             1.386**     0.182

26     Asian.widowed                                            0.001       0.274

27     Asian.divorced                                          -0.446**     0.153

28     Asian.separated                                          0.081       0.189

29     Asian.never married                                      0.649**     0.199

30   mean(age)                                                 -0.116*      0.055

     maritl

31     never married                                          -21.517**     3.669

     age.maritl

32     age.married, AF spouse present                           0.143       0.132

33     age.married, spouse absent                               0.221*      0.088

34     age.widowed                                              0.154       0.093

35     age.divorced                                             0.181*      0.084

36     age.separated                                            0.163       0.088

37     age.never married                                        0.459**     0.099

38   mean(ernval)                                              -0.000       0.000

     maritl

39     married, AF spouse present                             -12.128*      5.495

40     widowed                                                -11.885       6.322

41     divorced                                               -11.795**     4.340

42     separated                                              -11.742**     3.697

     income.maritl

43     income.married, AF spouse present                       -0.000       0.000

44     income.married, spouse absent                            0.000*      0.000

45     income.widowed                                           0.000       0.000

46     income.divorced                                          0.000*      0.000

47     income.separated                                         0.000       0.000

48     income.never married                                     0.000**     0.000

49   _cons                                                     15.927**     2.676

----------------------------------------------------------------------------------

*  p < .05

** p < .01

 

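. poisgof

 

         Goodness-of-fit chi2  =  31.61457

         Prob > chi2(7)        =    0.0000

 

*Bringing in age and income as continuous variables, interacted with marital status, improves the fit a lot (goodness-of-fit chi2 of 31.6 on 7 df, versus 223.3 on 21 df for the first model), but does not expand the number of cells beyond 56. The main effects of age, age_sq, and income alone barely changed the fit (219.7 on 18 df), and even this last model is still rejected by the goodness-of-fit test.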
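
*The change in fit between the nested models can be checked from the log likelihoods and model degrees of freedom reported above. This is only a sketch of the arithmetic, not something I ran as part of the original session.

```stata
* Third model (age and income interacted with maritl) versus the
* first model: 48 - 34 = 14 additional parameters.
display 2*(-212.432 - (-308.260))
display chi2tail(14, 2*(-212.432 - (-308.260)))

* Second model (age, age_sq, income main effects) versus the
* first model: 37 - 34 = 3 additional parameters.
display 2*(-306.453 - (-308.260))
display chi2tail(3, 2*(-306.453 - (-308.260)))
```

*The first likelihood ratio statistic is about 191.7 on 14 df, which matches the drop in the goodness-of-fit chi2 from 223.3 to 31.6. The second is only about 3.6 on 3 df, well short of the .05 critical value of 7.81.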

 

. clear all

 

. exit, clear