. *Now a quick look at one way to incorporate continuous variables with many levels (here age and income) into a dataset of cell counts.
. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_y2k_numeric.dta", clear
. tabulate maritl race if age>18
                      |                        p25
   Marital Status p17 |     White      Black  Amer Indi      Asian |     Total
----------------------+--------------------------------------------+----------
married, spouse prese |    49,583      3,398        566      1,962 |    55,509
married, AF spouse pr |       282         43          2         20 |       347
married, spouse absen |     1,047        155         24        120 |     1,346
              widowed |     5,544        794         62        157 |     6,557
             divorced |     8,118      1,063        150        167 |     9,498
            separated |     1,472        498         35         56 |     2,061
        never married |    15,776      3,100        330        846 |    20,052
----------------------+--------------------------------------------+----------
                Total |    81,822      9,051      1,169      3,328 |    95,370
. tabulate race maritl if age>18
            |                          Marital Status p17
        p25 |  married,   married,   married,    widowed   divorced |     Total
------------+-------------------------------------------------------+----------
      White |    49,583        282      1,047      5,544      8,118 |    81,822
      Black |     3,398         43        155        794      1,063 |     9,051
Amer Indian |       566          2         24         62        150 |     1,169
      Asian |     1,962         20        120        157        167 |     3,328
------------+-------------------------------------------------------+----------
      Total |    55,509        347      1,346      6,557      9,498 |    95,370

            |  Marital Status p17
        p25 | separated  never mar |     Total
------------+----------------------+----------
      White |     1,472     15,776 |    81,822
      Black |       498      3,100 |     9,051
Amer Indian |        35        330 |     1,169
      Asian |        56        846 |     3,328
------------+----------------------+----------
      Total |     2,061     20,052 |    95,370
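. *OK, my race variable has 4 categories.
. *If we want to include income and age as continuous variables, here is what we do: first collapse the data to one row per maritl x sex x race cell (keeping empty cells), then attach the cell means of age and income to those rows.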
. contract sex race maritl if age>18, zero
. rename _freq count
. sort maritl sex race
*The data need to be sorted and saved so we can match merge later.
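* A minimal Python sketch (not Stata) of what contract with the zero option produces, assuming each person is a (maritl, sex, race) tuple with hypothetical codes 1-7, 1-2 and 1-4. Every combination of the three variables becomes one row holding its frequency, and combinations nobody falls into are kept with a count of 0. The rows come out ordered by maritl, sex, race, like the sort above.

```python
from collections import Counter
from itertools import product


def contract_zero(records, levels):
    """Count records in every cell of the full cross-classification.

    records: iterable of tuples of category codes, one tuple per person.
    levels:  the possible codes of each variable, in the same order.
    Empty cells are kept with a count of 0, as with contract ..., zero.
    """
    observed = Counter(records)
    return {cell: observed[cell] for cell in product(*levels)}


# Hypothetical codes: maritl 1-7, sex 1-2, race 1-4.
levels = [range(1, 8), range(1, 3), range(1, 5)]
people = [(1, 1, 1), (1, 1, 1), (2, 2, 3), (7, 1, 2)]
cells = contract_zero(people, levels)
print(len(cells))        # 56
print(cells[(1, 1, 1)])  # 2
print(cells[(2, 1, 3)])  # 0: the empty cell is kept, not dropped
```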
. describe
Contains data from C:\Documents and Settings\Michael Rosenfeld\My Documents\newer
> web pages\soc_meth_proj3\cps_y2k_numeric.dta
  obs:            56
 vars:             4
 size:           504 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
maritl          byte   %26.0g      marlbl     Marital Status p17
sex             byte   %8.0g       sexnm      p20
race            byte   %11.0g      racenm     p25
count           int    %12.0g                 Frequency
-------------------------------------------------------------------------------
Sorted by: maritl sex race
Note: dataset has changed since last saved
. save "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\methods tabular arrays\race sex maritl status dataset.dta"
file C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\methods tabular arrays\race sex maritl status dataset.dta saved
. clear all
. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_y2k_numeric.dta", clear
. table maritl sex race, contents (mean age mean ernval) replace
----------------------------------------------------------------------------------
| p25 and p20
| ----- White ----- ----- Black ----- -- Amer Indian --
Marital Status p17 | male female male female male female
---------------------------+------------------------------------------------------
married, spouse present | 48.8624 46.5191 48.3705 46.1828 46.5054 44.5709
| 35293.31 15117.26 26513.44 17735.44 25456.47 12258.17
|
married, AF spouse present | 35.2727 32.4205 38.7 33.6667 36.5
| 24867 10558.61 17190 18295.15 29500
|
married, spouse absent | 45.3807 48.6034 46.2206 46.6705 34.6 44.1429
| 21912.31 13030.78 13399.46 13870.09 20800 3910
|
widowed | 72.5132 72.5255 70.0347 69.3856 69.4 66
| 8137.62 3717.476 6222.861 4397.989 4944.2 3321.167
|
divorced | 47.3213 48.0498 47.9876 48.384 47.4328 44.0482
| 28454.29 19517.37 21636.77 17372.46 16301.63 18223
|
separated | 42.9467 40.7664 44.4148 45.8308 42.9167 42.087
| 25741.93 13212.84 15448.19 12733.79 9515 8515.218
|
never married | 16.6091 15.9766 17.6193 19.4669 16.4299 15.09
| 6881.324 4936.816 5928.62 6224.273 3787.177 2753.067
----------------------------------------------------------------------------------
----------------------------------------------
| p25 and p20
| ----- Asian -----
Marital Status p17 | male female
---------------------------+------------------
married, spouse present | 47.3441 44.207
| 38021.8 17515.35
|
married, AF spouse present | 29 34.6316
| 29000 13664.21
|
married, spouse absent | 43.8824 46.3654
| 25892.93 13103.06
|
widowed | 71.5455 67.9853
| 6909.091 5301.691
|
divorced | 46.2586 46.055
| 33023.66 22472.59
|
separated | 45.0909 42.2
| 37878.14 11585.14
|
never married | 17.2418 16.4641
| 8573.331 6483.706
----------------------------------------------
. rename table1 age
. rename table2 income
. clear all
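. *The means above include people aged 18 and under (note the mean age of about 16 in the never married row). I need to impose the same age>18 restriction used for the counts.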
. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_y2k_numeric.dta", clear
. table maritl sex race if age>18, contents (mean age mean ernval) replace
----------------------------------------------------------------------------------
| p25 and p20
| ----- White ----- ----- Black ----- -- Amer Indian --
Marital Status p17 | male female male female male female
---------------------------+------------------------------------------------------
married, spouse present | 48.8765 46.5879 48.3895 46.2185 46.5054 44.7596
| 35307.28 15142.04 26528.47 17751.03 25456.47 12315.93
|
married, AF spouse present | 35.2727 32.6462 38.7 33.6667 36.5
| 24867 10665.28 17190 18295.15 29500
|
married, spouse absent | 45.4848 48.9674 46.2206 47 34.6 44.1429
| 21965.96 13164.15 13399.46 14029.52 20800 3910
|
widowed | 72.6264 72.5255 70.0347 69.4662 69.4 66
| 8153.624 3717.476 6222.861 4404.755 4944.2 3321.167
|
divorced | 47.4123 48.1209 48.2344 48.4849 47.4328 44.0482
| 28533.57 19558.24 21793.66 17424.95 16301.63 18223
|
separated | 43.6056 41.0984 45.4353 46.0976 42.9167 42.087
| 26347.53 13382.48 15993.42 12799.95 9515 8515.218
|
never married | 30.5652 31.2294 32.3861 33.1888 29.9565 29.5205
| 18887.22 15237.13 15700.48 14066.88 10755.82 9350.466
----------------------------------------------------------------------------------
----------------------------------------------
| p25 and p20
| ----- Asian -----
Marital Status p17 | male female
---------------------------+------------------
married, spouse present | 47.3441 44.2587
| 38021.8 17539.51
|
married, AF spouse present | 29 34.6316
| 29000 13664.21
|
married, spouse absent | 43.8824 46.3654
| 25892.93 13103.06
|
widowed | 71.5455 68.3704
| 6909.091 5340.963
|
divorced | 46.2586 46.055
| 33023.66 22472.59
|
separated | 45.0909 42.9412
| 37878.14 11826.47
|
never married | 28.8807 29.1221
| 21152.29 17624.26
----------------------------------------------
. rename table1 age
. rename table2 income
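* The table ..., contents(mean age mean ernval) replace step amounts to a grouped mean. A minimal Python sketch, assuming person records are dicts with hypothetical keys maritl, sex, race, age and ernval. The age filter matters: without it, children pull down the mean age of the never married cells. A cell with nobody in it produces no row at all.

```python
from collections import defaultdict


def cell_means(people, min_age=18):
    """Mean age and mean earnings per (maritl, sex, race) cell, for age > min_age."""
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # [n, total age, total ernval]
    for p in people:
        if p["age"] > min_age:                 # same restriction as "if age>18"
            acc = sums[(p["maritl"], p["sex"], p["race"])]
            acc[0] += 1
            acc[1] += p["age"]
            acc[2] += p["ernval"]
    return {cell: {"age": a / n, "income": e / n} for cell, (n, a, e) in sums.items()}


# Hypothetical cell (maritl=7, sex=1, race=1) holding three people, one a child.
people = [
    {"maritl": 7, "sex": 1, "race": 1, "age": 16, "ernval": 0},
    {"maritl": 7, "sex": 1, "race": 1, "age": 30, "ernval": 20000},
    {"maritl": 7, "sex": 1, "race": 1, "age": 40, "ernval": 30000},
]
print(cell_means(people)[(7, 1, 1)])  # {'age': 35.0, 'income': 25000.0}
print(cell_means(people, min_age=-1)[(7, 1, 1)]["age"])  # no filter: the child lowers the mean
```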
. sort maritl sex race
*Both datasets have to be sorted on maritl, sex and race for this (old-syntax) merge.
. merge maritl sex race using "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\methods tabular arrays\race sex maritl status dataset.dta"
(label sexnm already defined)
(label marlbl already defined)
(label racenm already defined)
. tabulate _merge
     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |          1        1.79        1.79
          3 |         55       98.21      100.00
------------+-----------------------------------
      Total |         56      100.00
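*_merge==2 marks a cell that appears only in the using dataset (the saved counts). One of the 56 cells has a zero count, so there is no one from whom to compute its mean age and mean income, and those two variables are missing for that cell.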
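* A minimal Python sketch of this match-merge, assuming counts and cell means are held in dicts keyed by (maritl, sex, race); the structures and numbers are hypothetical toy values, not Stata internals. It mimics the _merge codes: 1 = master only, 2 = using only, 3 = both. A cell with a count but no means gets code 2 and missing (None) age and income.

```python
def match_merge(master, using):
    """One-to-one merge of two dicts keyed by cell, with Stata-style _merge codes.

    _merge = 1: cell only in master; 2: only in using; 3: in both.
    """
    merged = {}
    for cell in sorted(set(master) | set(using)):
        in_master, in_using = cell in master, cell in using
        row = {"age": None, "income": None, "count": None}  # None plays the role of "."
        row.update(master.get(cell, {}))
        row.update(using.get(cell, {}))
        row["_merge"] = 3 if (in_master and in_using) else (1 if in_master else 2)
        merged[cell] = row
    return merged


# Hypothetical toy data: means in memory (master), counts from the saved file (using).
means = {(1, 1, 1): {"age": 50.0, "income": 30000.0}}
counts = {(1, 1, 1): {"count": 120}, (2, 2, 3): {"count": 0}}
merged = match_merge(means, counts)
print(merged[(2, 2, 3)])  # {'age': None, 'income': None, 'count': 0, '_merge': 2}
```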
. drop _merge
. describe
Contains data
  obs:            56
 vars:             6
 size:           952 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
maritl          byte   %26.0g      marlbl     Marital Status p17
sex             byte   %8.0g       sexnm      p20
race            byte   %11.0g      racenm     p25
age             float  %8.0g                  mean(age)
income          float  %9.0g                  mean(ernval)
count           int    %12.0g                 Frequency
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
* Notice that the dataset still has 56 cells: 7 (maritl) x 4 (race) x 2 (sex). All we have done is add the average age and average income to each of the 56 cells. This lets us take account of additional variables without expanding the number of cells (and increasing data sparseness) beyond reason.
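* To put numbers on the sparseness point, a small Python sketch using the 95,370 adults from the tabulation above. The 10 age groups are a hypothetical alternative, not something done in this log: cutting age into categories multiplies the cells instead of adding one column. Even at 56 cells some are tiny (only 2 Amer Indian respondents are married with an AF spouse present, and one cell is empty).

```python
N_ADULTS = 95_370  # total from "tabulate maritl race if age>18"


def mean_cell_size(n, *levels):
    """Number of cells in a full cross-classification and mean people per cell."""
    cells = 1
    for k in levels:
        cells *= k
    return cells, n / cells


print(mean_cell_size(N_ADULTS, 7, 4, 2))      # 56 cells, about 1,703 people each
print(mean_cell_size(N_ADULTS, 7, 4, 2, 10))  # 560 cells, about 170 each (hypothetical age groups)
```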
. desmat: poisson count sex*maritl race*maritl
----------------------------------------------------------------------------------
Poisson regression
----------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 56
Initial log likelihood: -154544.164
Log likelihood: -308.260
LR chi square: 308471.809
Model degrees of freedom: 34
Pseudo R-squared: 0.998
Prob: 0.000
----------------------------------------------------------------------------------
nr Effect Coeff s.e.
----------------------------------------------------------------------------------
count
sex
1 female -0.012 0.008
maritl
2 married, AF spouse present -6.835** 0.176
3 married, spouse absent -3.865** 0.042
4 widowed -3.200** 0.030
5 divorced -1.984** 0.017
6 separated -3.812** 0.039
7 never married -1.083** 0.012
sex.maritl
8 female.married, AF spouse present 2.265** 0.183
9 female.married, spouse absent 0.015 0.055
10 female.widowed 1.505** 0.033
11 female.divorced 0.324** 0.022
12 female.separated 0.525** 0.046
13 female.never married -0.130** 0.017
race
14 Black -2.680** 0.018
15 Amer Indian -4.473** 0.042
16 Asian -3.230** 0.023
race.maritl
17 Black.married, AF spouse present 0.800** 0.165
18 Black.married, spouse absent 0.770** 0.088
19 Black.widowed 0.737** 0.042
20 Black.divorced 0.647** 0.037
21 Black.separated 1.597** 0.055
22 Black.never married 1.053** 0.026
23 Amer Indian.married, AF spouse present -0.476 0.711
24 Amer Indian.married, spouse absent 0.697** 0.211
25 Amer Indian.widowed -0.021 0.135
26 Amer Indian.divorced 0.482** 0.093
27 Amer Indian.separated 0.734** 0.176
28 Amer Indian.never married 0.606** 0.070
29 Asian.married, AF spouse present 0.584* 0.233
30 Asian.married, spouse absent 1.063** 0.099
31 Asian.widowed -0.335** 0.084
32 Asian.divorced -0.654** 0.081
33 Asian.separated -0.039 0.138
34 Asian.never married 0.304** 0.042
35 _cons 10.124** 0.006
----------------------------------------------------------------------------------
* p < .05
** p < .01
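* For reference, desmat expands sex*maritl race*maritl into main effects plus interactions, so the model above is the log-linear model below, with mu the expected count for marital status i, sex j and race k, and the first category of each variable (married spouse present, male, White) as the reference. There is no sex-by-race term, so sex and race are treated as independent within each marital status. Counting parameters reproduces the 21 df that poisgof reports.

```latex
\log \mu_{ijk} = \lambda + \lambda^{S}_{j} + \lambda^{M}_{i} + \lambda^{SM}_{ij}
               + \lambda^{R}_{k} + \lambda^{RM}_{ik},
\qquad i = 1,\dots,7 \text{ (maritl)},\; j = 1,2 \text{ (sex)},\; k = 1,\dots,4 \text{ (race)}

\text{parameters} = 1 + 1 + 6 + 6 + 3 + 18 = 35, \qquad \text{df} = 56 - 35 = 21
```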
. poisgof
Goodness-of-fit chi2 = 223.269
Prob > chi2(21) = 0.0000
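* A minimal Python sketch of the arithmetic behind these summaries, using numbers copied from the desmat header. It treats desmat's "Initial log likelihood" as the null model, which the LR chi-square confirms. It also assumes poisgof reports the deviance G2 = 2 sum[y log(y/mu) - (y - mu)]; the drops in this statistic between nested models later in the log match twice the change in log likelihood, as deviance should. The 223.269 itself cannot be reproduced here because the fitted cell counts are not printed.

```python
import math


def lr_chi2(ll_model, ll_null):
    """Likelihood-ratio chi-square of a model against the null model."""
    return 2 * (ll_model - ll_null)


def pseudo_r2(ll_model, ll_null):
    """McFadden pseudo R-squared: 1 - LL_model / LL_null."""
    return 1 - ll_model / ll_null


def poisson_deviance(observed, expected):
    """G2 = 2 * sum[y * log(y / mu) - (y - mu)]; a cell with y = 0 adds 2 * mu."""
    total = 0.0
    for y, mu in zip(observed, expected):
        if y > 0:
            total += y * math.log(y / mu)
        total -= y - mu
    return 2 * total


LL_NULL, LL_MODEL1 = -154544.164, -308.260      # from the desmat header above
print(round(lr_chi2(LL_MODEL1, LL_NULL), 2))    # 308471.81 (desmat: 308471.809)
print(round(pseudo_r2(LL_MODEL1, LL_NULL), 3))  # 0.998
print(56 - 35)  # residual df: 56 cells minus 34 coefficients and the constant
```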
. gen age_sq=age^2
(1 missing value generated)
. summarize age
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         age |        55    45.45743    12.02094   28.88069   72.62635
. replace age=45.46 if age==.
(1 real change made)
*Replace the missing value with the mean of the 55 cell means (an unweighted average across cells, not the person-level mean): a reasonable, but certainly not the only, way to impute missing values.
. summarize income
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      income |        55    17034.32    8729.349   3321.167    38021.8
. replace income = 17034 if income==.
(1 real change made)
. replace age_sq=age^2
(1 real change made)
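* A minimal Python sketch of the imputation steps above, with None standing in for Stata's missing value and hypothetical toy cell means. It fills the gap with the unweighted mean of the other cells, as the log does, and shows why age_sq had to be regenerated: squaring before imputing leaves that cell missing.

```python
def impute_mean(values):
    """Replace None (Stata's missing) with the unweighted mean of the other values."""
    present = [v for v in values if v is not None]
    fill = sum(present) / len(present)
    return [fill if v is None else v for v in values], fill


# Hypothetical cell means of age, one cell missing.
ages = [40.0, 50.0, None, 60.0]
ages_sq_early = [a * a if a is not None else None for a in ages]  # like gen age_sq before imputing
ages, fill = impute_mean(ages)
ages_sq = [a * a for a in ages]                                   # like replace age_sq=age^2 afterwards
print(fill)           # 50.0
print(ages_sq_early)  # [1600.0, 2500.0, None, 3600.0]
print(ages_sq)        # [1600.0, 2500.0, 2500.0, 3600.0]
```

Weighting each cell mean by its count would give a person-level mean instead; the log uses the unweighted version.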
. desmat: poisson count sex*maritl race*maritl @age @age_sq @income
----------------------------------------------------------------------------------
Poisson regression
----------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 56
Initial log likelihood: -154544.164
Log likelihood: -306.453
LR chi square: 308475.421
Model degrees of freedom: 37
Pseudo R-squared: 0.998
Prob: 0.000
----------------------------------------------------------------------------------
nr Effect Coeff s.e.
----------------------------------------------------------------------------------
count
sex
1 female 0.081 0.066
maritl
2 married, AF spouse present -6.424** 0.319
3 married, spouse absent -3.769** 0.080
4 divorced -1.943** 0.036
5 separated -3.670** 0.115
sex.maritl
6 female.married, AF spouse present 2.312** 0.195
7 female.married, spouse absent -0.113 0.119
8 female.widowed 1.416** 0.063
9 female.divorced 0.246** 0.065
10 female.separated 0.509** 0.057
11 female.never married -0.246** 0.070
race
12 Black -2.665** 0.021
13 Amer Indian -4.417** 0.059
14 Asian -3.202** 0.043
maritl
15 widowed -2.887** 0.854
16 never married -0.454 0.392
race.maritl
17 Black.married, AF spouse present 0.713** 0.171
18 Black.married, spouse absent 0.774** 0.088
19 Black.widowed 0.631** 0.161
20 Black.divorced 0.635** 0.042
21 Black.separated 1.502** 0.097
22 Black.never married 0.955** 0.069
23 Amer Indian.married, AF spouse present -0.755 0.727
24 Amer Indian.married, spouse absent 0.864** 0.238
25 Amer Indian.widowed -0.233 0.315
26 Amer Indian.divorced 0.488** 0.093
27 Amer Indian.separated 0.694** 0.182
28 Amer Indian.never married 0.626** 0.082
29 Asian.married, AF spouse present 0.491* 0.242
30 Asian.married, spouse absent 1.068** 0.099
31 Asian.widowed -0.470* 0.197
32 Asian.divorced -0.663** 0.082
33 Asian.separated -0.121 0.153
34 Asian.never married 0.365** 0.074
35 mean(age) 0.110 0.091
36 age_sq -0.001 0.001
37 mean(ernval) 0.000 0.000
38 _cons 6.995** 2.195
----------------------------------------------------------------------------------
* p < .05
** p < .01
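* In desmat the @ prefix treats a variable as continuous (one coefficient each for mean(age), age_sq and mean(ernval) above), so this model adds three slopes to the previous one. Written out, with a-bar the cell's mean age and y-bar its mean earnings; note that age_sq is the square of the cell mean, not the mean of squared ages:

```latex
\log \mu_{ijk} = \lambda + \lambda^{S}_{j} + \lambda^{M}_{i} + \lambda^{SM}_{ij}
               + \lambda^{R}_{k} + \lambda^{RM}_{ik}
               + \beta_{1}\,\bar{a}_{ijk} + \beta_{2}\,\bar{a}_{ijk}^{2} + \beta_{3}\,\bar{y}_{ijk}

\text{parameters} = 35 + 3 = 38, \qquad \text{df} = 56 - 38 = 18
```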
. poisgof
Goodness-of-fit chi2 = 219.6567
Prob > chi2(18) = 0.0000
r(1);
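* Because the first two models are nested, the drop in the goodness-of-fit chi2 is a likelihood-ratio test of the three added terms. A small Python sketch with the numbers copied from the two poisgof runs: the improvement is about 3.6 on 3 df, below 7.815, the 5% critical value of chi-square with 3 df.

```python
# Goodness-of-fit chi2 (deviance) and its df, copied from the two poisgof runs.
GOF_1, DF_1 = 223.269, 21    # sex*maritl race*maritl
GOF_2, DF_2 = 219.6567, 18   # the same plus @age @age_sq @income
CRIT_05_DF3 = 7.815          # 95th percentile of chi-square with 3 df

lr_stat = GOF_1 - GOF_2      # drop in deviance = LR chi-square for the added terms
lr_df = DF_1 - DF_2          # number of added terms
print(round(lr_stat, 2), lr_df)  # 3.61 3
print(lr_stat > CRIT_05_DF3)     # False: not a significant improvement at the 5% level
```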
. desmat: poisson count sex*maritl race*maritl @age*maritl @income*maritl, difficult
----------------------------------------------------------------------------------
Poisson regression
----------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 56
Initial log likelihood: -154544.164
Log likelihood: -212.432
LR chi square: 308663.463
Model degrees of freedom: 48
Pseudo R-squared: 0.999
Prob: 0.000
----------------------------------------------------------------------------------
nr Effect Coeff s.e.
----------------------------------------------------------------------------------
count
sex
1 female -0.346** 0.124
maritl
2 married, spouse absent -15.588** 4.450
sex.maritl
3 female.married, AF spouse present 1.927** 0.695
4 female.married, spouse absent 0.447* 0.199
5 female.widowed 1.911** 0.236
6 female.divorced 0.890** 0.172
7 female.separated 0.990** 0.173
8 female.never married 0.709** 0.149
race
9 Black -2.741** 0.029
10 Amer Indian -4.740** 0.120
11 Asian -3.446** 0.113
race.maritl
12 Black.married, AF spouse present 1.196** 0.365
13 Black.married, spouse absent 1.092** 0.154
14 Black.widowed 0.908** 0.222
15 Black.divorced 0.803** 0.064
16 Black.separated 1.498** 0.335
17 Black.never married 0.943** 0.165
18 Amer Indian.married, AF spouse present 0.549 1.241
19 Amer Indian.married, spouse absent 2.055** 0.672
20 Amer Indian.widowed 0.485 0.505
21 Amer Indian.divorced 1.093** 0.281
22 Amer Indian.separated 1.027** 0.282
23 Amer Indian.never married 2.824** 0.206
24 Asian.married, AF spouse present 0.966** 0.368
25 Asian.married, spouse absent 1.386** 0.182
26 Asian.widowed 0.001 0.274
27 Asian.divorced -0.446** 0.153
28 Asian.separated 0.081 0.189
29 Asian.never married 0.649** 0.199
30 mean(age) -0.116* 0.055
maritl
31 never married -21.517** 3.669
age.maritl
32 age.married, AF spouse present 0.143 0.132
33 age.married, spouse absent 0.221* 0.088
34 age.widowed 0.154 0.093
35 age.divorced 0.181* 0.084
36 age.separated 0.163 0.088
37 age.never married 0.459** 0.099
38 mean(ernval) -0.000 0.000
maritl
39 married, AF spouse present -12.128* 5.495
40 widowed -11.885 6.322
41 divorced -11.795** 4.340
42 separated -11.742** 3.697
income.maritl
43 income.married, AF spouse present -0.000 0.000
44 income.married, spouse absent 0.000* 0.000
45 income.widowed 0.000 0.000
46 income.divorced 0.000* 0.000
47 income.separated 0.000 0.000
48 income.never married 0.000** 0.000
49 _cons 15.927** 2.676
----------------------------------------------------------------------------------
* p < .05
** p < .01
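* Here @age*maritl and @income*maritl give each marital status its own slope for mean age and for mean income (one main slope plus six interactions each), and age_sq is no longer in the model. Written out in the same notation as the first model:

```latex
\log \mu_{ijk} = \lambda + \lambda^{S}_{j} + \lambda^{M}_{i} + \lambda^{SM}_{ij}
               + \lambda^{R}_{k} + \lambda^{RM}_{ik}
               + \beta^{A}_{i}\,\bar{a}_{ijk} + \beta^{Y}_{i}\,\bar{y}_{ijk}

\text{parameters} = 35 + 7 + 7 = 49, \qquad \text{df} = 56 - 49 = 7
```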
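. poisgof
Goodness-of-fit chi2 = 31.61457
Prob > chi2(7) = 0.0000
*Bringing in age and income as continuous variables, with slopes that differ by marital status, improves the fit a lot (the goodness-of-fit chi2 falls from 223.3 on 21 df to 31.6 on 7 df) without expanding the number of cells beyond 56. The model is still rejected at conventional levels, though (Prob > chi2 = 0.0000).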
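* A small Python sketch that lines up the three models using only numbers copied from the log: residual df (56 cells minus parameters), the goodness-of-fit chi2 per df (a rough lack-of-fit heuristic), and the LR chi-square of each model against the first, computed as twice the gain in log likelihood.

```python
CELLS = 56

# (model, log likelihood, parameters incl. _cons, poisgof chi2), copied from the log.
MODELS = [
    ("model 1: sex*maritl race*maritl", -308.260, 35, 223.269),
    ("model 2: + @age @age_sq @income", -306.453, 38, 219.6567),
    ("model 3: + @age*maritl @income*maritl", -212.432, 49, 31.61457),
]


def fit_summary(models, cells=CELLS):
    """Residual df, chi2/df, and LR chi-square against the first model."""
    base_ll = models[0][1]
    rows = []
    for name, ll, k, gof in models:
        df = cells - k
        rows.append((name, df, gof / df, 2 * (ll - base_ll)))
    return rows


for name, df, ratio, lr in fit_summary(MODELS):
    print(f"{name:38s} df={df:2d}  chi2/df={ratio:6.2f}  LR vs model 1={lr:7.2f}")
```

Model 3 gains about 191.7 on 14 df over model 1, yet its 31.6 on 7 df is still well above 14.07, the 5% critical value of chi-square with 7 df, which is why poisgof still rejects it.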
. clear all
. exit, clear