--------------------------------------------------------------------------------------------
log: C:\AAA Miker Files\newer web pages\soc_388_notes\Soc_388_2007\second_class_notes.log
log type: text
opened on: 27 Sep 2007, 11:22:49
. edit
(1 var, 1 obs pasted into editor)
(1 var, 1 obs pasted into editor)
(1 var, 1 obs pasted into editor)
. edit
(3 vars, 4 obs pasted into editor)
- preserve
. edit
- preserve
. save "C:\AAA Miker Files\newer web pages\soc_388_notes\Soc_388_2007\frogs.dta"
file C:\AAA Miker Files\newer web pages\soc_388_notes\Soc_388_2007\frogs.dta saved
. exit, clear
-----------------------------------------------------------------------------------------------
log: C:\AAA Miker Files\newer web pages\soc_388_notes\Soc_388_2007\second_class_notes.log
log type: text
opened on: 27 Sep 2007, 12:16:15
*Note (all my comments will start with an asterisk): I find it is always better to keep Stata logs in the .log format, which is plain text, rather than the .smcl format, which is marked up and can only be read properly by a Stata-aware viewer.
. *Here's a lesson for students: I quit Stata, restarted, and forgot to turn the log on again, so here is my redo. The notes will not be word-for-word what I had in class because I have had to retype everything.
. use "C:\AAA Miker Files\newer web pages\soc_388_notes\Soc_388_2007\frogs.dta", clear
. *open the data set by pasting it from Excel into the Stata data editor, or by opening the saved data file directly
. *first model, constant only
. poisson count
Iteration 0: log likelihood = -14.328367
Iteration 1: log likelihood = -14.328367 (backed up)
Poisson regression Number of obs = 4
LR chi2(0) = -0.00
Prob > chi2 = .
Log likelihood = -14.328367 Pseudo R2 = -0.0000
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 2.931194 .1154701 25.38 0.000 2.704877 3.157511
------------------------------------------------------------------------------
. *one term in the model
. poisgof
Goodness-of-fit chi2 = 9.822078
Prob > chi2(3) = 0.0201
. *4 cells minus 1 term leaves 3 degrees of freedom for goodness of fit, a subject that I will be explaining more next class
. *where does this model fit the data?
. predict constant_only
(option n assumed; predicted number of events)
. table color live [iweight=count], row col
-------------------------------
| live
Color | Lilly Water Total
----------+--------------------
Blue | 23 27 50
Green | 10 15 25
|
Total | 33 42 75
-------------------------------
. table live color [iweight=count], row col
-------------------------------
| Color
live | Blue Green Total
----------+--------------------
Lilly | 23 10 33
Water | 27 15 42
|
Total | 50 25 75
-------------------------------
. *That's our actual dataset
. *now the predicted values
. table live color [iweight= constant_only], row col
-------------------------------
| Color
live | Blue Green Total
----------+--------------------
Lilly | 18.75 18.75 37.5
Water | 18.75 18.75 37.5
|
Total | 37.5 37.5 75
-------------------------------
. *with one term, the model gets the total number of frogs right (75) and predicts the same count in every cell (75/4 = 18.75).
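. *a quick hand check one could run here (a sketch added for illustration, not part of the class session): the numbers are copied from the output above, and the only functions used are Stata's built-in exp() and ln().

```stata
* equal split of the 75 frogs over the 4 cells
display 75/4
* exponentiating the constant from the constant-only model
* should give roughly the same predicted count per cell
display exp(2.931194)
* going the other way, the log of the mean cell count
* should reproduce _cons, about 2.931
display ln(75/4)
```

. *the constant in a constant-only Poisson model is just the log of the average cell count.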
. *now on to the second model, which is the independence model
. set linesize 79
. *Setting linesize is useful when using desmat because otherwise desmat output will stretch across the whole results window.
. desmat: poisson count live color
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 4
Initial log likelihood: -14.328
Log likelihood: -9.540
LR chi square: 9.578
Model degrees of freedom: 2
Pseudo R-squared: 0.334
Prob: 0.008
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
live
1 Water 0.241 0.233
color
2 Green -0.693** 0.245
3 _cons 3.091** 0.192
-------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = .2445188
Prob > chi2(1) = 0.6210
. *this model uses 3 terms, so it has 4 - 3 = 1 df left over
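. *a sketch of where the poisgof p-values come from (added for illustration, not part of the class session): chi2tail(df, x) is Stata's upper-tail chi-squared function, and the chi2 values are copied from the two poisgof results above.

```stata
* constant-only model: goodness-of-fit chi2 on 3 df, about 0.020
display chi2tail(3, 9.822078)
* independence model: goodness-of-fit chi2 on 1 df, about 0.621
display chi2tail(1, .2445188)
* the drop in goodness-of-fit chi2 between the two models, about 9.578,
* is the LR chi square that desmat reports for the independence model
display 9.822078 - .2445188
* that drop uses up 3 - 1 = 2 df, giving a p-value of about 0.008
display chi2tail(2, 9.822078 - .2445188)
```

. *adding the two terms (live and color) improves the fit by about 9.58 on 2 df, which is why the independence model fits so much better than the constant-only model.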
. predict independence_model
(option n assumed; predicted number of events)
. table live color [iweight= count], row col
-------------------------------
| Color
live | Blue Green Total
----------+--------------------
Lilly | 23 10 33
Water | 27 15 42
|
Total | 50 25 75
-------------------------------
. table live color [iweight= independence_model], row col
-------------------------------
| Color
live | Blue Green Total
----------+--------------------
Lilly | 22 11 33
Water | 28 14 42
|
Total | 50 25 75
-------------------------------
. *these are the predicted values of our independence model, just as we calculated them by hand in Excel (row total times column total divided by grand total, e.g. 33*50/75 = 22). Note that the model fits the marginal distributions of color and live exactly.
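. *note also that the independence model is fundamentally a multiplicative model. Loglinear models always have a multiplicative interpretation: the model is additive for the log of the count, so exponentiating the coefficients turns it into a product of terms for the count itself. See my notes.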
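. *a sketch of the multiplicative interpretation (added for illustration, not part of the class session): the coefficients are the rounded values from the desmat output above, so the results will only match the predicted table approximately.

```stata
* hand calculation: row total * column total / grand total
* Lilly/Blue, Water/Blue, Lilly/Green, Water/Green
display 33*50/75
display 42*50/75
display 33*25/75
display 42*25/75
* the same four cells rebuilt as products of exponentiated coefficients
* Lilly/Blue is the baseline cell: exp(_cons)
display exp(3.091)
* Water/Blue: baseline times the Water multiplier
display exp(3.091)*exp(0.241)
* Lilly/Green: baseline times the Green multiplier
display exp(3.091)*exp(-0.693)
* Water/Green: baseline times both multipliers
display exp(3.091)*exp(0.241)*exp(-0.693)
```

. *both sets of numbers should come out at roughly 22, 28, 11 and 14, the predicted counts in the table above.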
. *final model:
. desmat: poisson count color*live
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 4
Initial log likelihood: -14.328
Log likelihood: -9.417
LR chi square: 9.822
Model degrees of freedom: 3
Pseudo R-squared: 0.343
Prob: 0.020
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
count
color
1 Green -0.833* 0.379
live
2 Water 0.160 0.284
color.live
3 Green.Water 0.245 0.497
4 _cons 3.135** 0.209
-------------------------------------------------------------------------------
* p < .05
** p < .01
. *Key point: the interaction term is the log odds ratio (.245), with the same standard error (.497) that we calculated by hand in Excel.
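. *a sketch of the hand calculation (added for illustration, not part of the class session), using the four observed counts from the table above: 23 and 27 for Blue, 10 and 15 for Green.

```stata
* odds of Water versus Lilly for Green frogs, relative to Blue frogs
display (15/10)/(27/23)
* log odds ratio, which should be about .245
display ln((23*15)/(10*27))
* standard error of the log odds ratio: square root of the sum of
* the reciprocals of the four cell counts, about .497
display sqrt(1/23 + 1/10 + 1/27 + 1/15)
```

. *these should agree with the Green.Water coefficient and its s.e. in the output above.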
. *note that Stata has a built-in way to generate dummy variables, xi, which for simple models works just as well as desmat
. xi: poisson count i.color*i.live
i.color _Icolor_1-2 (_Icolor_1 for color==Blue omitted)
i.live _Ilive_1-2 (_Ilive_1 for live==Lilly omitted)
i.color*i.live _IcolXliv_#_# (coded as above)
Iteration 0: log likelihood = -9.417463
Iteration 1: log likelihood = -9.4173319
Iteration 2: log likelihood = -9.4173319
Poisson regression Number of obs = 4
LR chi2(3) = 9.82
Prob > chi2 = 0.0201
Log likelihood = -9.4173319 Pseudo R2 = 0.3427
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Icolor_2 | -.8329091 .3787852 -2.20 0.028 -1.575315 -.0905037
_Ilive_2 | .1603427 .2837522 0.57 0.572 -.3958014 .7164867
_IcolXliv_~2 | .2451225 .497174 0.49 0.622 -.7293206 1.219566
_cons | 3.135494 .2085144 15.04 0.000 2.726813 3.544175
------------------------------------------------------------------------------
. poisgof
Goodness-of-fit chi2 = 7.95e-06
Prob > chi2(0) = .
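. *same interaction term, same model. With 4 terms for 4 cells there are 0 df left over, so the model reproduces the observed counts and the goodness-of-fit chi2 is essentially zero.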
. *what do the dummy variables look like? take a look at the data browser...
. *The log will be automatically updated and saved as you work (if you are smart enough to remember to open it), whereas the dataset will only be saved if you choose to save it.
. *here I have made additions to the frog dataset that I don't particularly care to save.
. exit, clear