linear model as matrix operation

-------------------------------------------------------------------------------------------

log type: text

opened on: 25 Nov 2003, 23:33:52

. use LA_intermar.dta, clear

. *This is the data from HW2, Los Angeles intermarriage data.

. tabulate husb wife [fweight=count]

| wife

husb | AllOth Black Mex OthHisp White | Total

-----------+-------------------------------------------------------+----------

AllOth | 1,022 19 78 18 360 | 1,497

Black | 42 4,074 63 32 215 | 4,426

Mex | 95 25 3,947 143 1,009 | 5,219

OthHisp | 18 16 132 239 304 | 709

White | 492 103 1,156 373 28,453 | 30,577

-----------+-------------------------------------------------------+----------

Total | 1,669 4,237 5,376 805 30,341 | 42,428

. *let's make a simple model here.

. *Independence Model plus one term for racial endogamy.

. gen byte endogamy_simple=0

. replace endogamy_simple=1 if husb==wife

(5 real changes made)

. sort wife husb

. edit

- preserve

. *For this exercise Sorting is important in order to be able to visualize the X matrix,

> sorting has no effect on any of the results...

. desmat: poisson count husb wife endogamy_simple

-----------------------------------------------------------------------------------------

Poisson regression

-----------------------------------------------------------------------------------------

Dependent variable count

Optimization: ml

Number of observations: 25

Initial log likelihood: -80138.505

Log likelihood: -1405.264

LR chi square: 157466.482

Model degrees of freedom: 9

Pseudo R-squared: 0.982

Prob: 0.000

-----------------------------------------------------------------------------------------

nr Effect Coeff s.e.

-----------------------------------------------------------------------------------------

count

husb

1 Black 0.822** 0.039

2 Mex 0.796** 0.039

3 OthHisp -0.466** 0.054

4 White 1.848** 0.035

wife

5 Black 0.388** 0.038

6 Mex 0.645** 0.037

7 OthHisp -0.492** 0.050

8 White 1.507** 0.033

endogamy_simple

9 1 2.850** 0.017

10 _cons 4.061** 0.030

-----------------------------------------------------------------------------------------

* p < .05

** p < .01

. poisgof

Goodness-of-fit chi2 = 2632.715

Prob > chi2(15) = 0.0000

. predict M1_pred

(option n assumed; predicted number of events)

. ereturn list

scalars:

e(rc) = 0

e(ll) = -1405.263996403603

e(rank) = 10

e(df_m) = 9

e(ll_0) = -80138.5049037998

e(chi2) = 157466.4818147924

e(p) = 0

e(k) = 10

e(k_eq) = 1

e(k_dv) = 1

e(ic) = 6

e(N) = 25

e(r2_p) = .9824645593514564

macros:

e(cmd) : "poisson"

e(predict) : "poisso_p"

e(gof) : "poiss_g"

e(opt) : "ml"

e(title) : "Poisson regression"

e(depvar) : "count"

e(user) : "poiss_lf"

e(chi2type) : "LR"

e(crittype) : "log likelihood"

e(technique) : "nr"

matrices:

e(b) : 1 x 10

e(V) : 10 x 10

e(ilog) : 1 x 20

functions:

e(sample)

. matrix Betas=e(b)

. mkmat _x*, matrix (my_X_matrix)

*mkmat takes variables, in this case the 9 _x dummy variabels that desmat created,

and makes a matrix out of them.

. matrix dir

my_X_matrix[25,9]

Betas[1,10]

. matrix list my_X_matrix

my_X_matrix[25,9]

_x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8 _x_9

r1 0 0 0 0 0 0 0 0 1

r2 1 0 0 0 0 0 0 0 0

r3 0 1 0 0 0 0 0 0 0

r4 0 0 1 0 0 0 0 0 0

r5 0 0 0 1 0 0 0 0 0

r6 0 0 0 0 1 0 0 0 0

r7 1 0 0 0 1 0 0 0 1

r8 0 1 0 0 1 0 0 0 0

r9 0 0 1 0 1 0 0 0 0

r10 0 0 0 1 1 0 0 0 0

r11 0 0 0 0 0 1 0 0 0

r12 1 0 0 0 0 1 0 0 0

r13 0 1 0 0 0 1 0 0 1

r14 0 0 1 0 0 1 0 0 0

r15 0 0 0 1 0 1 0 0 0

r16 0 0 0 0 0 0 1 0 0

r17 1 0 0 0 0 0 1 0 0

r18 0 1 0 0 0 0 1 0 0

r19 0 0 1 0 0 0 1 0 1

r20 0 0 0 1 0 0 1 0 0

r21 0 0 0 0 0 0 0 1 0

r22 1 0 0 0 0 0 0 1 0

r23 0 1 0 0 0 0 0 1 0

r24 0 0 1 0 0 0 0 1 0

r25 0 0 0 1 0 0 0 1 1

. *The X Matrix has 25 rows, one for each observation in the data set, but only 9 columns

But Beta vector has 10 entries. We need a 10th column for the X matrix, corresponding to the constant term.

. *The thing to add is a column of ones, because the constant term applies to every cell.

. matrix dir

my_X_matrix[25,9]

Betas[1,10]

. matrix my_X_matrix=my_X_matrix,J(25,1,1)

. matrix list my_X_matrix

my_X_matrix[25,10]

_x_1 _x_2 _x_3 _x_4 _x_5 _x_6 _x_7 _x_8 _x_9 c1

r1 0 0 0 0 0 0 0 0 1 1

r2 1 0 0 0 0 0 0 0 0 1

r3 0 1 0 0 0 0 0 0 0 1

r4 0 0 1 0 0 0 0 0 0 1

r5 0 0 0 1 0 0 0 0 0 1

r6 0 0 0 0 1 0 0 0 0 1

r7 1 0 0 0 1 0 0 0 1 1

r8 0 1 0 0 1 0 0 0 0 1

r9 0 0 1 0 1 0 0 0 0 1

r10 0 0 0 1 1 0 0 0 0 1

r11 0 0 0 0 0 1 0 0 0 1

r12 1 0 0 0 0 1 0 0 0 1

r13 0 1 0 0 0 1 0 0 1 1

r14 0 0 1 0 0 1 0 0 0 1

r15 0 0 0 1 0 1 0 0 0 1

r16 0 0 0 0 0 0 1 0 0 1

r17 1 0 0 0 0 0 1 0 0 1

r18 0 1 0 0 0 0 1 0 0 1

r19 0 0 1 0 0 0 1 0 1 1

r20 0 0 0 1 0 0 1 0 0 1

r21 0 0 0 0 0 0 0 1 0 1

r22 1 0 0 0 0 0 0 1 0 1

r23 0 1 0 0 0 0 0 1 0 1

r24 0 0 1 0 0 0 0 1 0 1

r25 0 0 0 1 0 0 0 1 1 1

. *Now our 25x10 X matrix is ready to multiply with our 10x1 column vector of Betas, to

> yield a 25x1 column vector of predicted log counts, which we could then exponentiate to

> get our predicted values.

. matrix dir

my_X_matrix[25,10]

Betas[1,10]

. *Only thing we need to do is transpose the Beta vector. I don't know why Stata produces a row,

but column vector is needed here.

. matrix Betas=Betas'

. matrix dir

Betas[10,1]

my_X_matrix[25,10]

. matrix XB=my_X_matrix*Betas

. matrix dir

XB[25,1]

Betas[10,1]

my_X_matrix[25,10]

. svmat XB, names(Pred_log_count)

*svmat translates a matrix or vector into a variable in your dataset.

. gen Pred_count=exp( Pred_log_count1)

. table husb wife, contents(sum count sum M1_pred sum Pred_count)

------------------------------------------------------------

| wife

husb | AllOth Black Mex OthHisp White

----------+-------------------------------------------------

AllOth | 1022 19 78 18 360

| 1003.61 85.49981 110.6361 35.466 261.7885

| 1003.61 85.49979 110.6361 35.466 261.7886

Black | 42 4074 63 32 215

| 132.0576 3365.631 251.7948 80.71646 595.8

| 132.0576 3365.63 251.7948 80.71646 595.8

Mex | 95 25 3947 143 1009

| 128.6253 189.5302 4241.911 78.61861 580.315

| 128.6253 189.5302 4241.91 78.6186 580.3151

OthHisp | 18 16 132 239 304

| 36.42676 53.67503 69.45509 385.0976 164.3455

| 36.42675 53.67504 69.45508 385.0977 164.3455

White | 492 103 1156 373 28453

| 368.2808 542.6639 702.2031 225.1013 28738.75

| 368.2809 542.6639 702.2031 225.1013 28738.75

------------------------------------------------------------

. *What we have here is actual counts on top, followed by Stata's predicted counts,

followed by the predicted counts we produced by our own little Matrix operations.

I just wanted to show that that is the way it's actually done.

. log close

log type: text

closed on: 26 Nov 2003, 00:03:39

-----------------------------------------------------------------------------------------