-------------------------------------------------------------------------------------------
log type: text
opened on: 25 Nov 2003, 23:33:52
. use LA_intermar.dta, clear
. *This is the data from HW2, Los Angeles intermarriage data.
. tabulate husb wife [fweight=count]
| wife
husb | AllOth Black Mex OthHisp White | Total
-----------+-------------------------------------------------------+----------
AllOth | 1,022 19 78 18 360 | 1,497
Black | 42 4,074 63 32 215 | 4,426
Mex | 95 25 3,947 143 1,009 | 5,219
OthHisp | 18 16 132 239 304 | 709
White | 492 103 1,156 373 28,453 | 30,577
-----------+-------------------------------------------------------+----------
Total | 1,669 4,237 5,376 805 30,341 | 42,428
. *let's make a simple model here.
. *Independence Model plus one term for racial endogamy.
. gen byte endogamy_simple=0
. replace endogamy_simple=1 if husb==wife
(5 real changes made)
. sort wife husb
. edit
- preserve
. *For this exercise Sorting is important in order to be able to visualize the X matrix,
> sorting has no effect on any of the results...
. desmat: poisson count husb wife endogamy_simple
-----------------------------------------------------------------------------------------
Poisson regression
-----------------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -1405.264
LR chi square: 157466.482
Model degrees of freedom: 9
Pseudo R-squared: 0.982
Prob: 0.000
-----------------------------------------------------------------------------------------
nr Effect Coeff s.e.
-----------------------------------------------------------------------------------------
count
husb
1 Black 0.822** 0.039
2 Mex 0.796** 0.039
3 OthHisp -0.466** 0.054
4 White 1.848** 0.035
wife
5 Black 0.388** 0.038
6 Mex 0.645** 0.037
7 OthHisp -0.492** 0.050
8 White 1.507** 0.033
endogamy_simple
9 1 2.850** 0.017
10 _cons 4.061** 0.030
-----------------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 2632.715
Prob > chi2(15) = 0.0000
. predict M1_pred
(option n assumed; predicted number of events)
. ereturn list
scalars:
e(rc) = 0
e(ll) = -1405.263996403603
e(rank) = 10
e(df_m) = 9
e(ll_0) = -80138.5049037998
e(chi2) = 157466.4818147924
e(p) = 0
e(k) = 10
e(k_eq) = 1
e(k_dv) = 1
e(ic) = 6
e(N) = 25
e(r2_p) = .9824645593514564
macros:
e(cmd) : "poisson"
e(predict) : "poisso_p"
e(gof) : "poiss_g"
e(opt) : "ml"
e(title) : "Poisson regression"
e(depvar) : "count"
e(user) : "poiss_lf"
e(chi2type) : "LR"
e(crittype) : "log likelihood"
e(technique) : "nr"
matrices:
e(b) : 1 x 10
e(V) : 10 x 10
e(ilog) : 1 x 20
functions:
e(sample)
. matrix Betas=e(b)
. mkmat _x*, matrix (my_X_matrix)
*mkmat takes variables, in this case the 9 _x dummy variabels that desmat created,
and makes a matrix out of them.
. matrix dir
my_X_matrix[25,9]
Betas[1,10]
. matrix list my_X_matrix
my_X_matrix[25,9]
_x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8 _x_9
r1 0 0 0 0 0 0 0 0 1
r2 1 0 0 0 0 0 0 0 0
r3 0 1 0 0 0 0 0 0 0
r4 0 0 1 0 0 0 0 0 0
r5 0 0 0 1 0 0 0 0 0
r6 0 0 0 0 1 0 0 0 0
r7 1 0 0 0 1 0 0 0 1
r8 0 1 0 0 1 0 0 0 0
r9 0 0 1 0 1 0 0 0 0
r10 0 0 0 1 1 0 0 0 0
r11 0 0 0 0 0 1 0 0 0
r12 1 0 0 0 0 1 0 0 0
r13 0 1 0 0 0 1 0 0 1
r14 0 0 1 0 0 1 0 0 0
r15 0 0 0 1 0 1 0 0 0
r16 0 0 0 0 0 0 1 0 0
r17 1 0 0 0 0 0 1 0 0
r18 0 1 0 0 0 0 1 0 0
r19 0 0 1 0 0 0 1 0 1
r20 0 0 0 1 0 0 1 0 0
r21 0 0 0 0 0 0 0 1 0
r22 1 0 0 0 0 0 0 1 0
r23 0 1 0 0 0 0 0 1 0
r24 0 0 1 0 0 0 0 1 0
r25 0 0 0 1 0 0 0 1 1
. *The X Matrix has 25 rows, one for each observation in the data set, but only 9 columns
But Beta vector has 10 entries. We need a 10th column for the X matrix, corresponding to the constant term.
. *The thing to add is a column of ones, because the constant term applies to every cell.
. matrix dir
my_X_matrix[25,9]
Betas[1,10]
. matrix my_X_matrix=my_X_matrix,J(25,1,1)
. matrix list my_X_matrix
my_X_matrix[25,10]
_x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8 _x_9 c1
r1 0 0 0 0 0 0 0 0 1 1
r2 1 0 0 0 0 0 0 0 0 1
r3 0 1 0 0 0 0 0 0 0 1
r4 0 0 1 0 0 0 0 0 0 1
r5 0 0 0 1 0 0 0 0 0 1
r6 0 0 0 0 1 0 0 0 0 1
r7 1 0 0 0 1 0 0 0 1 1
r8 0 1 0 0 1 0 0 0 0 1
r9 0 0 1 0 1 0 0 0 0 1
r10 0 0 0 1 1 0 0 0 0 1
r11 0 0 0 0 0 1 0 0 0 1
r12 1 0 0 0 0 1 0 0 0 1
r13 0 1 0 0 0 1 0 0 1 1
r14 0 0 1 0 0 1 0 0 0 1
r15 0 0 0 1 0 1 0 0 0 1
r16 0 0 0 0 0 0 1 0 0 1
r17 1 0 0 0 0 0 1 0 0 1
r18 0 1 0 0 0 0 1 0 0 1
r19 0 0 1 0 0 0 1 0 1 1
r20 0 0 0 1 0 0 1 0 0 1
r21 0 0 0 0 0 0 0 1 0 1
r22 1 0 0 0 0 0 0 1 0 1
r23 0 1 0 0 0 0 0 1 0 1
r24 0 0 1 0 0 0 0 1 0 1
r25 0 0 0 1 0 0 0 1 1 1
. *Now our 25x10 X matrix is ready to multiply with our 10x1 column vector of Betas, to
> yield a 25x1 column vector of predicted log counts, which we could then exponentiate to
> get our predicted values.
. matrix dir
my_X_matrix[25,10]
Betas[1,10]
. *Only thing we need to do is transpose the Beta vector. I don't know why Stata produces a row,
but column vector is needed here.
. matrix Betas=Betas'
. matrix dir
Betas[10,1]
my_X_matrix[25,10]
. matrix XB=my_X_matrix*Betas
. matrix dir
XB[25,1]
Betas[10,1]
my_X_matrix[25,10]
. svmat XB, names(Pred_log_count)
*svmat translates a matrix or vector into a variable in your dataset.
. gen Pred_count=exp( Pred_log_count1)
. table husb wife, contents(sum count sum M1_pred sum Pred_count)
------------------------------------------------------------
| wife
husb | AllOth Black Mex OthHisp White
----------+-------------------------------------------------
AllOth | 1022 19 78 18 360
| 1003.61 85.49981 110.6361 35.466 261.7885
| 1003.61 85.49979 110.6361 35.466 261.7886
|
Black | 42 4074 63 32 215
| 132.0576 3365.631 251.7948 80.71646 595.8
| 132.0576 3365.63 251.7948 80.71646 595.8
|
Mex | 95 25 3947 143 1009
| 128.6253 189.5302 4241.911 78.61861 580.315
| 128.6253 189.5302 4241.91 78.6186 580.3151
|
OthHisp | 18 16 132 239 304
| 36.42676 53.67503 69.45509 385.0976 164.3455
| 36.42675 53.67504 69.45508 385.0977 164.3455
|
White | 492 103 1156 373 28453
| 368.2808 542.6639 702.2031 225.1013 28738.75
| 368.2809 542.6639 702.2031 225.1013 28738.75
------------------------------------------------------------
. *What we have here is actual counts on top, followed by Stata's predicted counts,
followed by the predicted counts we produced by our own little Matrix operations.
I just wanted to show that that is the way it's actually done.
. log close
log type: text
closed on: 26 Nov 2003, 00:03:39
-----------------------------------------------------------------------------------------