. use "C:\Users\mexmi\Documents\current class files\intro soc methods\Soc 382 stuff\Soc 382 event history\HCMST file for class, single observation per couple v2.dta", clear

( )

 

. *class begins here

 

*Re-running stset on the single-observation-per-couple data, this time with the id() option, to get ready for multiple observations (records) per couple.

 

. stset relationship_duration_at_end, failure(ever_broke_up_w2345==1) enter(time how_long_relationship_w1) id( caseid_new)

 

                id:  caseid_new

     failure event:  ever_broke_up_w2345 == 1

obs. time interval:  (relationship_duration_at_end[_n-1], relationship_duration_at_end]

 enter on or after:  time how_long_relationship_w1

 exit on or before:  failure

 

------------------------------------------------------------------------------

     2670  total observations

       33  event time missing (relationship_duration_at_end>=.) PROBABLE ERROR

------------------------------------------------------------------------------

     2637  observations remaining, representing

     2637  subjects

      494  failures in single-failure-per-subject data

 8265.833  total analysis time at risk and under observation

                                              at risk from t =         0

                                   earliest observed entry t =         0

                                        last observed exit t =  73.33334

 

. drop case_duration

 

*stsplit is the key command that turns our 2,670-observation dataset (one row per couple) into a couple-month dataset of about 100K observations (2,670 + 96,553 = 99,223).

 

. stsplit case_duration, every(0.083334) after (how_long_relationship_w1)

(96553 observations (episodes) created)

 

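*0.083334 is approximately 1/12 of a year, in other words a month (analysis time is measured in years). Because it rounds 1/12 = 0.083333... up slightly, case_duration drifts a little, e.g. .666672 rather than .666667 after eight months in the listing below.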
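To make concrete what stsplit does, here is a minimal Python sketch (not Stata, and not how stsplit is implemented internally). It splits one spell (entry, stop] into monthly episodes measured from the entry time, mimicking the after() option used above, and flags the failure only on the last episode, as _d does after stsplit. The function name `split_spell` and the example couple (entry at 8 years, breakup at 8.7 years, age 31 at entry) are hypothetical.

```python
def split_spell(entry, stop, failed, width=1 / 12):
    """Split one spell (entry, stop] into consecutive episodes of length `width`.

    Returns a list of (start, end, case_duration, d) tuples, where
    case_duration is the time since entry at the start of the episode and
    d is 1 only on the final episode of a spell that ended in failure.
    """
    episodes = []
    k = 0
    start = entry
    while start < stop:
        # Boundaries are measured from entry (mimicking after() here).
        end = min(entry + (k + 1) * width, stop)
        episodes.append((start, end, k * width, 0))
        start = end
        k += 1
    if failed and episodes:
        # Only the last episode carries the failure, like _d after stsplit.
        s, e, c, _ = episodes[-1]
        episodes[-1] = (s, e, c, 1)
    return episodes


# Hypothetical couple: enters at relationship duration 8 years,
# breaks up at 8.7 years; respondent age at entry is 31.
for start, end, case_duration, d in split_spell(8.0, 8.7, failed=True):
    age_tv = 31 + case_duration  # same idea as gen age_tv=ppage+case_duration
    print(f"{start:.4f} {end:.4f} {case_duration:.4f} {age_tv:.4f} {d}")
```

Like the listing for couple 26315, this yields nine monthly rows with the breakup flagged only on the last one. Using exact 1/12 avoids the small rounding drift of 0.083334.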

 

. gen age_tv=ppage+case_duration

(33 missing values generated)

 

. gen how_long_relationship_tv=how_long_relationship+case_duration

(33 missing values generated)

 

. list caseid_new case_duration ppage age_tv how_long_relationship_tv ever_broke_up_w2345 _d if caseid_new==26315

 

       +-------------------------------------------------------------------+

       | caseid~w   case_d~n   ppage     age_tv   how_lo~v   eve~2345   _d |

       |-------------------------------------------------------------------|

  103. |    26315          0      31         31          8          .    0 |

  104. |    26315    .083334      31   31.08333   8.083334          .    0 |

  105. |    26315    .166668      31   31.16667   8.166668          .    0 |

  106. |    26315    .250002      31      31.25   8.250002          .    0 |

  107. |    26315    .333336      31   31.33334   8.333336          .    0 |

       |-------------------------------------------------------------------|

  108. |    26315     .41667      31   31.41667    8.41667          .    0 |

  109. |    26315    .500004      31       31.5   8.500004          .    0 |

  110. |    26315    .583338      31   31.58334   8.583338          .    0 |

  111. |    26315    .666672      31   31.66667   8.666672          1    1 |

 

 

 

* I should add that although it is easy enough to generate period-by-period versions of age and relationship duration in this dataset, we don't actually *need* to do this. stcox has a tvc() option (used with texp() to set the function of time) that lets you model covariates that change in a predictable way over time without expanding the data into the vastly larger couple-period version. What cannot be done any way other than in a person-period (here, couple-period) dataset, however, is to include time-varying covariates that change in unpredictable ways, such as marriage, cohabitation, and the presence of children.
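One way to see why predictably changing covariates do not require the expanded file is a Python sketch of a single term of the Cox partial likelihood. It assumes case_duration equals analysis time t minus the entry time, so that each couple's age_tv is a fixed baseline value plus the same t. Within any risk set, that common t cancels from the ratio, so in a Cox model age_tv carries the same information as age at the start of the relationship. The function name `cox_pl_term` and all numbers are hypothetical.

```python
import math


def cox_pl_term(beta, x_failed, x_risk_set):
    """Contribution of one event to the Cox partial likelihood:
    exp(beta * x_failed) / sum of exp(beta * x_j) over the risk set."""
    numerator = math.exp(beta * x_failed)
    denominator = sum(math.exp(beta * x) for x in x_risk_set)
    return numerator / denominator


# Hypothetical values: age at relationship start for three couples in the
# risk set at analysis time t; the first couple is the one that breaks up.
beta = 0.05
age_at_start = [23.0, 27.0, 31.0]
t = 8.5
age_tv = [a + t for a in age_at_start]  # every couple shifted by the same t

baseline_term = cox_pl_term(beta, age_at_start[0], age_at_start)
shifted_term = cox_pl_term(beta, age_tv[0], age_tv)
print(baseline_term, shifted_term)  # equal up to floating-point rounding
```

The same cancellation cannot happen for marriage or cohabitation, because their timing differs across couples and cannot be written as a fixed value plus a shared function of t.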

 

. clear all

 

. use "C:\Users\mexmi\Documents\current class files\intro soc methods\Soc 382 stuff\Soc 382 event history\wave1 to 5 combined for sharing with class stset v3 reduced vars.dta", clear

( )

 

. stcox resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv met_online_augmented ln_hhinc_2009dollars_tv

 

         failure _d:  broke_up_tv_v2

   analysis time _t:  how_long_relationship_tv

  enter on or after:  case_duration==1

                 id:  caseid_new

 

Iteration 0:   log likelihood = -2567.7135

Iteration 1:   log likelihood = -2418.7815

Iteration 2:   log likelihood =  -2413.966

Iteration 3:   log likelihood = -2413.9618

Refining estimates:

Iteration 0:   log likelihood = -2413.9618

 

Cox regression -- Breslow method for ties

 

No. of subjects =         2593                     Number of obs   =    116677

No. of failures =          482

Time at risk    =  9723.033187

                                                   LR chi2(6)      =    307.50

Log likelihood  =   -2413.9618                     Prob > chi2     =    0.0000

 

-----------------------------------------------------------------------------------------

                     _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]

------------------------+----------------------------------------------------------------

       resp_col_dgre_tv |   .8951854   .0870987    -1.14   0.255     .7397651    1.083259

             married_tv |   .3466575   .0493536    -7.44   0.000     .2622499    .4582327

          coresident_tv |    .274713   .0341612   -10.39   0.000     .2152931    .3505323

      children_in_hh_tv |   1.117576   .0524007     2.37   0.018      1.01945    1.225146

   met_online_augmented |   1.102708   .1311427     0.82   0.411     .8734317    1.392169

ln_hhinc_2009dollars_tv |   .8729056   .0472785    -2.51   0.012     .7849904    .9706669

-----------------------------------------------------------------------------------------

 

. stcurve, hazard

 

* Shows the plot of the (smoothed) hazard function of breakup by relationship duration from the Cox model, with the covariates held at their means (stcurve's default).
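For intuition, the raw discrete-time analogue of a hazard is the share of couples at risk in a period who break up in that period. Here is a minimal Python sketch of that calculation on person-period rows. It is simpler than what stcurve plots: it has no covariate adjustment and no smoothing. The function name `discrete_hazard` and the rows are hypothetical.

```python
from collections import Counter


def discrete_hazard(rows):
    """rows: (period, d) pairs from a person-period file, one row per
    couple per period at risk. Returns {period: failures / rows at risk}."""
    at_risk = Counter()
    failures = Counter()
    for period, d in rows:
        at_risk[period] += 1
        failures[period] += d
    return {p: failures[p] / at_risk[p] for p in sorted(at_risk)}


# Hypothetical rows: 4 couples at risk in period 1 (1 breakup),
# 3 still at risk in period 2 (1 breakup).
rows = [(1, 0), (1, 1), (1, 0), (1, 0), (2, 0), (2, 1), (2, 0)]
print(discrete_hazard(rows))
```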

 

. logit broke_up_tv_v2 how_long_relationship_tv inv_relationship_duration_tv resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv met_online_augmented ln_hhinc_2009dollars_tv, or

 

Iteration 0:   log likelihood = -3137.4236 

Iteration 1:   log likelihood = -2922.1718 

Iteration 2:   log likelihood = -2760.0368 

Iteration 3:   log likelihood = -2688.6227 

Iteration 4:   log likelihood =  -2642.293 

Iteration 5:   log likelihood = -2641.5022 

Iteration 6:   log likelihood = -2641.5016 

Iteration 7:   log likelihood = -2641.5016 

 

Logistic regression                               Number of obs   =     119270

                                                  LR chi2(8)      =     991.84

                                                  Prob > chi2     =     0.0000

Log likelihood = -2641.5016                       Pseudo R2       =     0.1581

 

----------------------------------------------------------------------------------------------

              broke_up_tv_v2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

    how_long_relationship_tv |   .9498672   .0061788    -7.91   0.000     .9378337     .962055

inv_relationship_duration_tv |   1.087082    .027266     3.33   0.001     1.034934    1.141858

            resp_col_dgre_tv |   .8729314     .08529    -1.39   0.164     .7207974    1.057175

                  married_tv |   .3367909   .0471636    -7.77   0.000     .2559526    .4431606

               coresident_tv |   .2366521   .0282098   -12.09   0.000     .1873459    .2989349

           children_in_hh_tv |   1.126575    .053512     2.51   0.012     1.026428    1.236494

        met_online_augmented |   1.154389   .1377265     1.20   0.229     .9136883      1.4585

     ln_hhinc_2009dollars_tv |   .8737686   .0477246    -2.47   0.013     .7850629    .9724974

                       _cons |   .1526157   .0882817    -3.25   0.001     .0491151    .4742238

----------------------------------------------------------------------------------------------

 

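* Key point: if the logit model has good and appropriate controls for the time axis, in this case relationship duration, the odds ratios from this discrete-time model should be very close to the hazard ratios from the Cox proportional hazards model (compare married_tv: 0.337 in the logit vs. 0.347 in the Cox model). The approximation works here because breakup in any given month is rare, so odds and probabilities are nearly the same.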
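The arithmetic behind the odds-ratio/hazard-ratio approximation can be checked with a short Python sketch. In the discrete-time model, the monthly hazard is the probability of breaking up that month given survival so far. When that probability is small, odds ≈ probability, so an odds ratio ≈ the ratio of probabilities. The baseline of about 0.4% per month is a rough figure derived from the Cox output above (482 failures over about 117,000 couple-months). The 0.35 ratio and the 30% "common event" case are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio(p1, p0):
    """Odds ratio comparing probability p1 with baseline probability p0."""
    return odds(p1) / odds(p0)


# Rare event: baseline monthly breakup probability of about 0.4%.
p0 = 0.004
p1 = 0.35 * p0  # a covariate that cuts the probability to 35% of baseline
print(round(odds_ratio(p1, p0), 4))  # close to the probability ratio 0.35

# Common event: the same 0.35 probability ratio with a 30% baseline.
print(round(odds_ratio(0.35 * 0.3, 0.3), 4))  # noticeably below 0.35
```

This is why short periods (months rather than years) help the logit results line up with the Cox results.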

 

. logit broke_up_tv_v2 how_long_relationship_tv inv_relationship_duration_tv resp_col_dgre_tv married_tv coresident_tv children_in_hh_tv met_online_augmented ln_hhinc_2009dollars_tv, or cluster(caseid_new)

 

Iteration 0:   log pseudolikelihood = -3137.4236 

Iteration 1:   log pseudolikelihood = -2922.1718 

Iteration 2:   log pseudolikelihood = -2760.0368 

Iteration 3:   log pseudolikelihood = -2688.6227 

Iteration 4:   log pseudolikelihood =  -2642.293 

Iteration 5:   log pseudolikelihood = -2641.5022 

Iteration 6:   log pseudolikelihood = -2641.5016 

Iteration 7:   log pseudolikelihood = -2641.5016 

 

Logistic regression                               Number of obs   =     119270

                                                  Wald chi2(8)    =    1013.48

                                                  Prob > chi2     =     0.0000

Log pseudolikelihood = -2641.5016                 Pseudo R2       =     0.1581

 

                                          (Std. Err. adjusted for 2593 clusters in caseid_new)

----------------------------------------------------------------------------------------------

                             |               Robust

              broke_up_tv_v2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]

-----------------------------+----------------------------------------------------------------

    how_long_relationship_tv |   .9498672   .0072971    -6.70   0.000     .9356723    .9642774

inv_relationship_duration_tv |   1.087082   .0148074     6.13   0.000     1.058444    1.116495

            resp_col_dgre_tv |   .8729314   .0897432    -1.32   0.186     .7136264    1.067799

                  married_tv |   .3367909   .0512634    -7.15   0.000     .2499181    .4538611

               coresident_tv |   .2366521   .0344213    -9.91   0.000     .1779517    .3147159

           children_in_hh_tv |   1.126575   .0513125     2.62   0.009     1.030363    1.231771

        met_online_augmented |   1.154389   .1518652     1.09   0.275     .8920162    1.493935

     ln_hhinc_2009dollars_tv |   .8737686   .0500412    -2.36   0.018      .780994    .9775639

                       _cons |   .1526157   .0926196    -3.10   0.002     .0464538    .5013921

----------------------------------------------------------------------------------------------

 

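* I talked a little about the relevance of clustering the observations by couple, because multiple observations within the same couple are not really independent. In this case clustering makes little difference to the output: the odds ratios are identical and most standard errors change only slightly. However, Stata now reports a log pseudolikelihood and a Wald chi2 instead of a true log likelihood and LR chi2, so likelihood-ratio tests are no longer valid, which is a drawback.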

 

. stcox met_online_augmented

 

         failure _d:  broke_up_tv_v2

   analysis time _t:  how_long_relationship_tv

  enter on or after:  case_duration==1

                 id:  caseid_new

 

Iteration 0:   log likelihood = -2567.7135

Iteration 1:   log likelihood = -2567.7073

Iteration 2:   log likelihood = -2567.7073

Refining estimates:

Iteration 0:   log likelihood = -2567.7073

 

Cox regression -- Breslow method for ties

 

No. of subjects =         2593                     Number of obs   =    116677

No. of failures =          482

Time at risk    =  9723.033187

                                                   LR chi2(1)      =      0.01

Log likelihood  =   -2567.7073                     Prob > chi2     =    0.9116

 

--------------------------------------------------------------------------------------

                  _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]

---------------------+----------------------------------------------------------------

met_online_augmented |   1.013255   .1200004     0.11   0.911     .8033608    1.277989

--------------------------------------------------------------------------------------

 

. stcurve, survival

 

*If we run a Cox model with no significant covariates and then generate the post-Cox survival curve, it looks a LOT like the Kaplan-Meier curve below. Survival curves never go up: they are monotonically non-increasing.
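Here is a minimal Python sketch of the Kaplan-Meier estimator with delayed entry, which is the situation stset's enter() option sets up. It is not Stata's sts code. A subject counts as at risk at failure time t only if entry < t <= stop, so a late entrant is not counted before it enters. Each step multiplies by a factor between 0 and 1, which is why the curve can never go up. The function name `kaplan_meier` and the data are hypothetical.

```python
def kaplan_meier(spells):
    """Kaplan-Meier survivor function with delayed entry (left truncation).

    spells: list of (entry, stop, d) tuples. A subject is in the risk set at
    failure time t if entry < t <= stop.
    Returns [(t, S(t))] at each distinct failure time.
    """
    failure_times = sorted({stop for _, stop, d in spells if d == 1})
    survival = 1.0
    curve = []
    for t in failure_times:
        n_at_risk = sum(1 for entry, stop, _ in spells if entry < t <= stop)
        n_failed = sum(1 for _, stop, d in spells if stop == t and d == 1)
        survival *= 1 - n_failed / n_at_risk  # factor in [0, 1]
        curve.append((t, survival))
    return curve


# Hypothetical data: the fourth subject enters late, at t = 1.
spells = [(0, 1, 1), (0, 2, 0), (0, 3, 1), (1, 3, 1), (0, 4, 0)]
print(kaplan_meier(spells))
```

Ignoring the late entry would put five subjects at risk at t = 1 instead of four and overstate survival. This is the same reason the sts list output below shows few couples at risk near time 0.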

 

. sts graph

 

         failure _d:  broke_up_tv_v2

   analysis time _t:  how_long_relationship_tv

  enter on or after:  case_duration==1

                 id:  caseid_new

 

. sts list, at(0 1 2 3 4 5 10 20 30)

 

         failure _d:  broke_up_tv_v2

   analysis time _t:  how_long_relationship_tv

  enter on or after:  case_duration==1

                 id:  caseid_new

 

              Beg.                      Survivor      Std.

    Time     Total     Fail             Function     Error     [95% Conf. Int.]

-------------------------------------------------------------------------------

       0         0        0              1.0000         .          .         .

       1       127       64              0.3572    0.0543     0.2529    0.4626

       2       180       61              0.2378    0.0384     0.1669    0.3158

       3       200       52              0.1838    0.0304     0.1286    0.2468

       4       150       31              0.1541    0.0259     0.1074    0.2085

       5       275       28              0.1402    0.0237     0.0977    0.1903

      10       332      117              0.1000    0.0172     0.0695    0.1368

      20       189       94              0.0746    0.0131     0.0517    0.1029

      30       117       31              0.0629    0.0112     0.0434    0.0872

-------------------------------------------------------------------------------

Note:  Survivor function is calculated over full data and evaluated at

       indicated times; it is not calculated from aggregates shown at left.

 

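* sts list just helps you read exact values off the sts graph: it prints the Kaplan-Meier survivor function, with standard errors and confidence intervals, at the times given in at().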
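Reading a survivor function "at" chosen times means evaluating a right-continuous step function. It equals 1 before the first failure, and otherwise it takes the value at the most recent failure time at or before the requested time. Here is a minimal Python sketch of that lookup, using the standard-library `bisect` module. The function name `survivor_at` and the fitted curve are hypothetical.

```python
import bisect


def survivor_at(curve, times):
    """Evaluate a Kaplan-Meier step function at the requested times.

    curve: (t, S(t)) pairs sorted by t, where S drops at each listed t.
    Returns S at each requested time: 1 before the first failure time,
    otherwise the value at the latest failure time <= the requested time.
    """
    failure_times = [t for t, _ in curve]
    values = []
    for x in times:
        i = bisect.bisect_right(failure_times, x)
        values.append(1.0 if i == 0 else curve[i - 1][1])
    return values


# Hypothetical fitted curve, evaluated at chosen times like sts list, at().
curve = [(1, 0.75), (3, 0.25)]
print(survivor_at(curve, [0, 1, 2, 3, 5]))  # [1.0, 0.75, 0.75, 0.25, 0.25]
```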

 

. log close

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\Soc_382\logs\prep for 2nd soc 382 event history class.log

  log type:  text

 closed on:  21 Feb 2019, 12:24:19

--------------------------------------------------------------------------------------------------