Data Set
This example uses the Artificial Data Y-missing, Z-full data set,
which was used for the Table 2 results: 5 longitudinal observations at times 0, 1, 2, 3, 4 plus
a background variable, with missing values coded 999. These data, shown in part
in Exhibit 1 (p. 153), were created by tpsim; see the description on p. 159
and Appendix A.
In this data set each individual is a "row" (which may be wrapped in this electronic form) with
column 1 the ID, columns 2 through 6 the Y values, and the rightmost column the values of the
exogenous variable (called Z or W).
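The layout just described can be read with a few lines of code. The following is a minimal sketch, not part of timepath: it assumes the file is whitespace-delimited with exactly 7 values per individual (ID, 5 Y values, Z), so that wrapped rows can be rejoined by counting values. The function name parse_records and the sample values are made up for illustration.

```python
import math

MISSING_CODE = 999.0  # missing-data code stated for this data set
N_WAVES = 5           # Y observed at times 0, 1, 2, 3, 4


def parse_records(text, n_waves=N_WAVES, missing=MISSING_CODE):
    """Split whitespace-delimited text into one record per individual.

    Assumes every individual contributes exactly n_waves + 2 values
    (ID, Y values, Z), so line wrapping in the file does not matter.
    """
    tokens = text.split()
    width = n_waves + 2
    if len(tokens) % width != 0:
        raise ValueError("value count is not a multiple of %d" % width)
    records = []
    for start in range(0, len(tokens), width):
        chunk = tokens[start:start + width]
        values = [float(v) for v in chunk[1:]]
        # Recode the missing-data code as NaN so it cannot enter a fit.
        values = [math.nan if v == missing else v for v in values]
        records.append({"id": chunk[0],
                        "y": values[:n_waves],
                        "z": values[n_waves]})
    return records


# Made-up values; the first individual's row is wrapped across two lines.
sample = """1 10.2 11.0 999 13.1
14.0 3.5
2 8.0 8.4 9.1 999 10.2 2.0"""
records = parse_records(sample)
```

Counting values rather than lines is what lets a wrapped row be treated as a single individual; if the real file uses a different delimiter or a fixed-width layout, the split step would need to change.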
Program Input Information
A recording of the program interface is provided to show the
questions that the timepath program asks and the appropriate responses for this data set. Also
shown is an abbreviated version of the console information provided on the progress of
the bootstrap resampling.
The program asks about the following items:
- Run File
- The run file is a simple ASCII file that contains the information asked for by the program,
allowing one to avoid responding individually to the series of queries from timepath. Typically, one
will not have an already created run file; the run file created by this run
is provided for reference.
- Bootstrap Replications
- The program requests the number of resamplings (the 4000 used here should be adequate) and the
coverage coefficient (here .90) to be used for computing the endpoints of the bootstrap confidence
intervals.
- Time Observations
- The program requires the number of longitudinal observations in the design (some individual data
can be missing) and the numerical values of those time observations.
- Missing Data Code
- The missing data code in this data set is 999.
- Background Variable
- The program needs to know if there is an exogenous variable (Z) in the data set.
- Input and Output Files
- Filenames (with path, if appropriate) are required.
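To make the roles of the number of resamplings and the coverage coefficient concrete, here is a minimal sketch of a percentile bootstrap. It is not the timepath code: it assumes whole individuals are resampled with replacement and that the interval endpoints are simple percentiles of the replicated statistic. The function name, the seed, and the rate values are hypothetical.

```python
import math
import random


def bootstrap_percentile_interval(data, statistic, n_reps=4000,
                                  coverage=0.90, seed=0):
    """Percentile bootstrap interval for statistic(data).

    Each replication draws len(data) individuals with replacement.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(statistic(resample))
    reps.sort()
    # Coverage .90 leaves .05 in each tail (5th and 95th percentiles).
    tail = (1.0 - coverage) / 2.0
    lower = reps[math.floor(tail * (n_reps - 1))]
    upper = reps[math.ceil((1.0 - tail) * (n_reps - 1))]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Made-up individual rates, one per individual.
rates = [0.8, 1.1, 1.4, 0.3, 2.0, 1.7, 0.9, 1.2, 1.5, 0.6,
         1.0, 1.3, 0.7, 1.9, 1.6, 0.4, 1.8, 0.5, 2.1, 1.1]
lower, upper = bootstrap_percentile_interval(rates, mean)
```

With coverage .90 the two endpoints are the 5th and 95th percentiles of the 4000 replicated values; a larger number of resamplings mainly stabilizes those tail percentiles.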
Program Output
The output from this run is available. An ordered listing of what the
output contains is given below. Some explanation of the quantities involved is in Rogosa-Saner pp.155-6,
with technical details and forms for the estimates given in Appendix B.
- Initial Descriptive Pages (4 in this example)
- These initial pages summarize the individual least squares regressions of Y on t, giving the values
for the Empirical Rate (theta-hat), the squared multiple correlation for the Y-on-t regression, and the increase in
squared multiple correlation if a quadratic fit were used (in which case the empirical rate represents the average
slope). Useful supplements such as stem-and-leaf diagrams are easily built from this electronic output.
The rightmost columns provide the data: the exogenous variable followed by the Y-observations. The
data listing and the summary values have many diagnostic and data-cleaning uses.
- Cross-sectional Description
- Cross-sectional averages and measures of spread are provided.
- Extreme Cases (top and bottom 10%) on R-square, MS residual, Rate
- Examining the cases with the greatest and smallest rates often has substantive interest. For data checking or
cleaning, examining the cases with the largest residual variance or smallest R-square is very useful.
(The situation of case #183 in these data illustrates that these are not identical indicators).
- Between-wave Correlations
- These correlations (and hopefully scatterplots too) are often the emphasis in traditional description
of longitudinal data.
- Descriptive Summaries of Rate, R-square, MS residual
- Percentiles and descriptive measures for the sample distributions are provided (for construction of
displays such as 5-number summaries). The rightmost column gives values for
an individual version of the Foulkes-Davis tracking measure called gamma (see below).
- Point Estimates of Parameters and Variance Components
- This section (one page if there is no exogenous variable, two pages if there is a Z-variable)
contains the point estimates for parameters discussed in Rogosa-Saner (see also Rogosa, 1995).
The Foulkes-Davis Tracking Index (which they term gamma) is a measure of consistency of
individual differences, described in Rogosa et al. (1984) and Rogosa (1995), and references therein; see my
vita.
- Bootstrap Estimates, Standard Errors and Confidence Intervals
- From the 4000 bootstrap replications, average values, standard deviations, and percentiles (e.g., 5%
and 95% for the specified coverage of .90) are given separately, in the same format as the point estimates.
These values are close to, but not an exact match for, those reported in Table 2. (As noted in Rogosa-Saner,
better methods for the bootstrap confidence intervals could be employed.)
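The per-individual quantities listed under Initial Descriptive Pages (rate, R-square, MS residual, and the R-square gain from a quadratic fit) can be sketched as follows. This is an illustration under stated assumptions, not the timepath computation: missing Y values are assumed to be already recoded as NaN, each individual is assumed to have at least two distinct observed times with some variation in Y, and the names fit_line and individual_summary are hypothetical.

```python
import math


def fit_line(t, y):
    """Least squares fit of y on t; returns (intercept, slope, residuals)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    return intercept, slope, resid


def individual_summary(times, y_values):
    """Rate, R-square, MS residual and quadratic R-square gain for one case."""
    pairs = [(t, y) for t, y in zip(times, y_values) if not math.isnan(y)]
    t = [p[0] for p in pairs]
    y = [p[1] for p in pairs]
    _, rate, resid = fit_line(t, y)
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum(e * e for e in resid)
    # Part of t^2 not explained by t; adding it to the model lowers the
    # residual sum of squares by (q . resid)^2 / (q . q).
    _, _, q = fit_line(t, [ti * ti for ti in t])
    q_ss = sum(qi * qi for qi in q)
    if q_ss > 1e-12:
        q_dot_e = sum(qi * ei for qi, ei in zip(q, resid))
        gain = q_dot_e ** 2 / (q_ss * ss_total)
    else:
        gain = 0.0  # too few distinct times for a quadratic term
    ms_resid = ss_resid / (len(y) - 2) if len(y) > 2 else math.nan
    return {"rate": rate,
            "r_square": 1.0 - ss_resid / ss_total,
            "r_square_gain": gain,
            "ms_resid": ms_resid}


times = [0, 1, 2, 3, 4]
linear_case = individual_summary(times, [1.0, 3.0, math.nan, 7.0, 9.0])
curved_case = individual_summary(times, [0.0, 1.0, 4.0, 9.0, 16.0])
```

In the second made-up case Y equals t squared, so the straight-line rate of 4 is the average slope over the interval from 0 to 4, and the linear R-square plus the quadratic gain equals 1.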
In addition to the main output, the timepath program produces a file automatically named
bootreps.dat. This auxiliary file contains some summary information
on the bootstrap resamples, which may be useful for diagnostic examination. There is a row for
each bootstrap resampling (here 4000 rows) with the four columns containing the estimates of
t^o, variance(theta), reliability(theta-hat), correlation(theta, eta(t_1)) that are obtained from each
resample.
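For a diagnostic look at this auxiliary file, a short summary of each column may be enough. The sketch below assumes bootreps.dat is plain whitespace-delimited text with one resample per line and exactly four values per line; the column labels and the function name summarize_bootreps are hypothetical, and the three sample rows are made up.

```python
import math
import statistics

# Hypothetical labels for the four columns, in the order listed above.
COLUMNS = ("t_o", "var_theta", "rel_theta_hat", "corr_theta_eta")


def summarize_bootreps(text, coverage=0.90):
    """Mean, SD and percentile endpoints for each column of the file text."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError("expected %d values per row" % len(COLUMNS))
    tail = (1.0 - coverage) / 2.0
    last = len(rows) - 1
    summary = {}
    for j, name in enumerate(COLUMNS):
        column = sorted(row[j] for row in rows)
        summary[name] = {
            "mean": statistics.mean(column),
            "sd": statistics.stdev(column),
            "lower": column[math.floor(tail * last)],
            "upper": column[math.ceil((1.0 - tail) * last)],
        }
    return summary


# Made-up rows standing in for the contents of bootreps.dat.
sample = """1.0 0.5 0.8 0.1
2.0 0.7 0.9 0.3
3.0 0.6 0.7 0.2"""
summary = summarize_bootreps(sample)
```

For the real file, the text would come from something like open("bootreps.dat").read(), and a histogram or stem-and-leaf display of any one column is a natural next step.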