. use "C:\Users\mexmi\Documents\current class files\intro soc methods\Soc 382 stuff\Soc 382 event
> history\HCMST file for class, single observation per couple v2.dta", clear
( )
. *class begins here
*Re-running the single-observation stset command with the id() option, to get ready for multiple observations per case (couple).
. stset relationship_duration_at_end, failure(ever_broke_up_w2345==1) enter(time how_long_relationship_w1) id( caseid_new)
id: caseid_new
failure event: ever_broke_up_w2345 == 1
obs. time interval: (relationship_duration_at_end[_n-1], relationship_duration_at_end]
enter on or after: time how_long_relationship_w1
exit on or before: failure
------------------------------------------------------------------------------
2670 total observations
33 event time missing (relationship_duration_at_end>=.) PROBABLE ERROR
------------------------------------------------------------------------------
2637 observations remaining, representing
2637 subjects
494 failures in single-failure-per-subject data
8265.833 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 73.33334
. drop case_duration
*stsplit is the key command: it turns our 2,670-observation dataset into a roughly 100K-observation dataset, with one record per couple-month.
. stsplit case_duration, every(0.083334) after (how_long_relationship_w1)
(96553 observations (episodes) created)
*0.083334 is approximately 1/12 of a year, or in other words one month.
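* A minimal sketch of what stsplit does, using made-up toy data. The three couples and the names id, duration, broke_up, and month_start are all hypothetical and are not from the class file. Each couple's single record is cut into monthly episodes, and only the last episode of a couple that broke up carries _d==1.

```stata
* Toy data (hypothetical): three couples, durations in years
clear
input id duration broke_up
1 0.25 1
2 0.50 0
3 1.00 1
end

* Declare survival-time data; id() must be set before stsplit can be used
stset duration, failure(broke_up==1) id(id)

* Cut each record at multiples of 1/12 of a year (one month)
stsplit month_start, every(`=1/12')

* One row per couple-month; _t0 and _t bound each episode
list id month_start _t0 _t _d, sepby(id)
```

* The failure indicator _d is 0 on every episode except the final one for couples 1 and 3, which is the same pattern as the listing for caseid_new 26315.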
. gen age_tv=ppage+case_duration
(33 missing values generated)
. gen how_long_relationship_tv=how_long_relationship+case_duration
(33 missing values generated)
. list caseid_new case_duration ppage age_tv how_long_relationship_tv ever_broke_up_w2345 _d if ca
> seid_new==26315
     +-------------------------------------------------------------------+
     | caseid~w   case_d~n   ppage     age_tv   how_lo~v   eve~2345   _d |
     |-------------------------------------------------------------------|
103. |    26315          0      31         31          8          .    0 |
104. |    26315    .083334      31   31.08333   8.083334          .    0 |
105. |    26315    .166668      31   31.16667   8.166668          .    0 |
106. |    26315    .250002      31      31.25   8.250002          .    0 |
107. |    26315    .333336      31   31.33334   8.333336          .    0 |
     |-------------------------------------------------------------------|
108. |    26315     .41667      31   31.41667    8.41667          .    0 |
109. |    26315    .500004      31       31.5   8.500004          .    0 |
110. |    26315    .583338      31   31.58334   8.583338          .    0 |
111. |    26315    .666672      31   31.66667   8.666672          1    1 |
     +-------------------------------------------------------------------+
* I should add that although it is easy enough to generate the time dependency of age and relationship duration by period in this dataset, we don’t actually NEED to do this. stcox has a tvc() option that lets you use time-varying covariates that change in a predictable way (as a function of analysis time) without expanding the dataset to the vastly larger couple-period version. What can only be done in a couple-period dataset, however, is to include time-varying covariates such as marriage, cohabitation, and the presence of children, which change in unpredictable ways.
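* A hedged sketch of the tvc() route, assuming the one-record-per-couple file from the top of this log is loaded and stset (before stsplit). tvc() interacts the listed variable with the function of analysis time given in texp(), so here the effect of respondent age is allowed to change linearly with relationship duration. The choice of ppage and of a linear texp(_t) is only an illustration, not something we ran in class.

```stata
* Assumes: one record per couple, stset as at the top of this log
stcox ppage, tvc(ppage) texp(_t)

* The "tvc" block of the output reports the ppage-by-time interaction.
* A hazard ratio near 1 there suggests the age effect does not drift
* with relationship duration.
```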
. clear all
. use "C:\Users\mexmi\Documents\current class files\intro soc methods\Soc 382 stuff\Soc 382 event history\wave1 to 5 combined for sharing with class stset v3 reduced vars.dta", clear
( )
. stcox resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv met_online_augmented ln_hhinc_2009dollars_tv
failure _d: broke_up_tv_v2
analysis time _t: how_long_relationship_tv
enter on or after: case_duration==1
id: caseid_new
Iteration 0: log likelihood = -2567.7135
Iteration 1: log likelihood = -2418.7815
Iteration 2: log likelihood = -2413.966
Iteration 3: log likelihood = -2413.9618
Refining estimates:
Iteration 0: log likelihood = -2413.9618
Cox regression -- Breslow method for ties
No. of subjects = 2593 Number of obs = 116677
No. of failures = 482
Time at risk = 9723.033187
LR chi2(6) = 307.50
Log likelihood = -2413.9618 Prob > chi2 = 0.0000
-----------------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
------------------------+----------------------------------------------------------------
resp_col_dgre_tv | .8951854 .0870987 -1.14 0.255 .7397651 1.083259
married_tv | .3466575 .0493536 -7.44 0.000 .2622499 .4582327
coresident_tv | .274713 .0341612 -10.39 0.000 .2152931 .3505323
children_in_hh_tv | 1.117576 .0524007 2.37 0.018 1.01945 1.225146
met_online_augmented | 1.102708 .1311427 0.82 0.411 .8734317 1.392169
ln_hhinc_2009dollars_tv | .8729056 .0472785 -2.51 0.012 .7849904 .9706669
-----------------------------------------------------------------------------------------
. stcurve, hazard
* Plots the estimated (smoothed) hazard of breakup by relationship duration from the Cox model, with the covariates held at their mean values, which is the stcurve default.
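* stcurve can also draw the curve at covariate values you choose rather than at the means. A minimal sketch, assuming the couple-period file is loaded and stset as above; comparing unmarried and married couples is just one illustrative choice.

```stata
* Refit the Cox model quietly so stcurve has estimates to work from
quietly stcox resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv ///
    met_online_augmented ln_hhinc_2009dollars_tv

* Two hazard curves: unmarried vs. married, other covariates at their means
stcurve, hazard at1(married_tv=0) at2(married_tv=1)
```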
. logit broke_up_tv_v2 how_long_relationship_tv inv_relationship_duration_tv resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv met_online_augmented ln_hhinc_2009dollars_tv, or
Iteration 0: log likelihood = -3137.4236
Iteration 1: log likelihood = -2922.1718
Iteration 2: log likelihood = -2760.0368
Iteration 3: log likelihood = -2688.6227
Iteration 4: log likelihood = -2642.293
Iteration 5: log likelihood = -2641.5022
Iteration 6: log likelihood = -2641.5016
Iteration 7: log likelihood = -2641.5016
Logistic regression Number of obs = 119270
LR chi2(8) = 991.84
Prob > chi2 = 0.0000
Log likelihood = -2641.5016 Pseudo R2 = 0.1581
----------------------------------------------------------------------------------------------
broke_up_tv_v2 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
how_long_relationship_tv | .9498672 .0061788 -7.91 0.000 .9378337 .962055
inv_relationship_duration_tv | 1.087082 .027266 3.33 0.001 1.034934 1.141858
resp_col_dgre_tv | .8729314 .08529 -1.39 0.164 .7207974 1.057175
married_tv | .3367909 .0471636 -7.77 0.000 .2559526 .4431606
coresident_tv | .2366521 .0282098 -12.09 0.000 .1873459 .2989349
children_in_hh_tv | 1.126575 .053512 2.51 0.012 1.026428 1.236494
met_online_augmented | 1.154389 .1377265 1.20 0.229 .9136883 1.4585
ln_hhinc_2009dollars_tv | .8737686 .0477246 -2.47 0.013 .7850629 .9724974
_cons | .1526157 .0882817 -3.25 0.001 .0491151 .4742238
----------------------------------------------------------------------------------------------
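* Key point: if the logit model has good and appropriate controls for the time axis (in this case relationship duration), the odds ratios in this discrete-time model should be very close to the hazard ratios in the Cox proportional hazards model, because the probability of breakup in any single month is small. Compare married_tv: an odds ratio of 0.337 here versus a hazard ratio of 0.347 in the Cox model above.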
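* One way to see the two sets of estimates side by side is to store both models and tabulate them. This is a sketch under the assumption that the couple-period file is loaded and stset; the stored-estimate names cox and dlogit are arbitrary labels I made up.

```stata
* Cox model: hazard ratios
quietly stcox resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv ///
    met_online_augmented ln_hhinc_2009dollars_tv
estimates store cox

* Discrete-time logit with duration controls: odds ratios
quietly logit broke_up_tv_v2 how_long_relationship_tv inv_relationship_duration_tv ///
    resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv ///
    met_online_augmented ln_hhinc_2009dollars_tv
estimates store dlogit

* eform exponentiates the coefficients; equations(1) lines up the two models
estimates table cox dlogit, eform b(%9.4f) equations(1)
```

* The duration terms and the constant appear only in the logit column, because the Cox model absorbs the time axis into its baseline hazard.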
. logit broke_up_tv_v2 how_long_relationship_tv inv_relationship_duration_tv resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv met_online_augmented ln_hhinc_2009dollars_tv, or cluster(caseid_new)
Iteration 0: log pseudolikelihood = -3137.4236
Iteration 1: log pseudolikelihood = -2922.1718
Iteration 2: log pseudolikelihood = -2760.0368
Iteration 3: log pseudolikelihood = -2688.6227
Iteration 4: log pseudolikelihood = -2642.293
Iteration 5: log pseudolikelihood = -2641.5022
Iteration 6: log pseudolikelihood = -2641.5016
Iteration 7: log pseudolikelihood = -2641.5016
Logistic regression Number of obs = 119270
Wald chi2(8) = 1013.48
Prob > chi2 = 0.0000
Log pseudolikelihood = -2641.5016 Pseudo R2 = 0.1581
(Std. Err. adjusted for 2593 clusters in caseid_new)
----------------------------------------------------------------------------------------------
| Robust
broke_up_tv_v2 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
how_long_relationship_tv | .9498672 .0072971 -6.70 0.000 .9356723 .9642774
inv_relationship_duration_tv | 1.087082 .0148074 6.13 0.000 1.058444 1.116495
resp_col_dgre_tv | .8729314 .0897432 -1.32 0.186 .7136264 1.067799
married_tv | .3367909 .0512634 -7.15 0.000 .2499181 .4538611
coresident_tv | .2366521 .0344213 -9.91 0.000 .1779517 .3147159
children_in_hh_tv | 1.126575 .0513125 2.62 0.009 1.030363 1.231771
met_online_augmented | 1.154389 .1518652 1.09 0.275 .8920162 1.493935
ln_hhinc_2009dollars_tv | .8737686 .0500412 -2.36 0.018 .780994 .9775639
_cons | .1526157 .0926196 -3.10 0.002 .0464538 .5013921
----------------------------------------------------------------------------------------------
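* I talked a little about the relevance of clustering the observations by couple, because multiple observations within couples are not really independent. In this case clustering leaves the odds ratios unchanged and changes the standard errors only modestly for most covariates. The cost is that the model now reports a log pseudolikelihood rather than a true log likelihood, so likelihood-ratio tests are no longer valid (note that Stata reports a Wald chi2 in place of the LR chi2).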
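* To check what clustering changes, the two logit fits can be stored and compared. A sketch, assuming the couple-period file is loaded; the local macro xvars and the names plain and clustered are labels I chose for illustration.

```stata
local xvars how_long_relationship_tv inv_relationship_duration_tv ///
    resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv ///
    met_online_augmented ln_hhinc_2009dollars_tv

* Ordinary standard errors
quietly logit broke_up_tv_v2 `xvars'
estimates store plain

* Standard errors clustered on couple
quietly logit broke_up_tv_v2 `xvars', vce(cluster caseid_new)
estimates store clustered

* Coefficients (log odds) should match; only the standard errors differ
estimates table plain clustered, b(%9.4f) se(%9.4f)
```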
. stcox met_online_augmented
failure _d: broke_up_tv_v2
analysis time _t: how_long_relationship_tv
enter on or after: case_duration==1
id: caseid_new
Iteration 0: log likelihood = -2567.7135
Iteration 1: log likelihood = -2567.7073
Iteration 2: log likelihood = -2567.7073
Refining estimates:
Iteration 0: log likelihood = -2567.7073
Cox regression -- Breslow method for ties
No. of subjects = 2593 Number of obs = 116677
No. of failures = 482
Time at risk = 9723.033187
LR chi2(1) = 0.01
Log likelihood = -2567.7073 Prob > chi2 = 0.9116
--------------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------------------+----------------------------------------------------------------
met_online_augmented | 1.013255 .1200004 0.11 0.911 .8033608 1.277989
--------------------------------------------------------------------------------------
. stcurve, survival
*If we run a Cox model with no significant covariates and then generate the post-Cox survival curve, it looks a LOT like the Kaplan-Meier curve below. Survival curves can never rise: they are monotonically non-increasing.
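* The two curves can also be drawn on one graph. A rough sketch, assuming the couple-period file is loaded and stset; km_s and cox_s0 are new variable names I made up. Note that the Cox curve here is the baseline survivor function (met_online_augmented==0), not the curve at the covariate mean that stcurve draws, although with a hazard ratio this close to 1 the difference should be small.

```stata
* Kaplan-Meier survivor function, saved as a variable
sts generate km_s = s

* Cox model, then the baseline survivor function
quietly stcox met_online_augmented
predict double cox_s0, basesurv

* Overlay the two step functions against analysis time
twoway (line km_s _t, sort connect(stairstep)) ///
       (line cox_s0 _t, sort connect(stairstep)), ///
       legend(order(1 "Kaplan-Meier" 2 "Cox baseline")) ///
       xtitle("Relationship duration (years)") ytitle("Survival")
```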
. sts graph
failure _d: broke_up_tv_v2
analysis time _t: how_long_relationship_tv
enter on or after: case_duration==1
id: caseid_new
. sts list, at(0 1 2 3 4 5 10 20 30)
failure _d: broke_up_tv_v2
analysis time _t: how_long_relationship_tv
enter on or after: case_duration==1
id: caseid_new
              Beg.                Survivor      Std.
    Time     Total      Fail      Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
       0         0         0        1.0000         .          .         .
       1       127        64        0.3572    0.0543     0.2529    0.4626
       2       180        61        0.2378    0.0384     0.1669    0.3158
       3       200        52        0.1838    0.0304     0.1286    0.2468
       4       150        31        0.1541    0.0259     0.1074    0.2085
       5       275        28        0.1402    0.0237     0.0977    0.1903
      10       332       117        0.1000    0.0172     0.0695    0.1368
      20       189        94        0.0746    0.0131     0.0517    0.1029
      30       117        31        0.0629    0.0112     0.0434    0.0872
-------------------------------------------------------------------------------
Note: Survivor function is calculated over full data and evaluated at
indicated times; it is not calculated from aggregates shown at left.
* sts list just helps you identify exact values in the sts graph.
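* The same table can be produced separately by group, and the groups can be compared formally. A sketch, assuming the couple-period file is loaded and stset; splitting by met_online_augmented is just an illustrative choice.

```stata
* Survivor function at the same time points, separately by how the couple met
sts list, at(0 1 2 3 4 5 10 20 30) by(met_online_augmented)

* Log-rank test of equality of the survivor functions across the two groups
sts test met_online_augmented
```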
. log close
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\Soc_382\logs\prep for 2nd soc 382 event hist
> ory class.log
log type: text
closed on: 21 Feb 2019, 12:24:19
--------------------------------------------------------------------------------------------------