
Rows

  1. Marker Names (Optional; string) The first row in the file can contain a list of identifiers for each of the markers in the data set. This row contains $L$ strings of integers or characters, where $L$ is the number of loci.
  2. Inter-Marker Distances (Optional; real) The next row in the file is a set of inter-marker distances, for use with linked loci. These should be genetic distances (e.g., centiMorgans), or some proxy for them based, for example, on physical distances. The actual units of distance do not matter too much, provided that the marker distances are (roughly) proportional to recombination rate (the algorithm estimates an appropriate scaling from the data). The markers must be in map order within linkage groups. When consecutive markers are from different linkage groups (e.g., different chromosomes), this should be indicated by the value -1. The first marker is also assigned the value -1. All other distances are non-negative. This row contains $L$ real numbers.
  3. Individual Data (Required) Data for each sampled individual is arranged into one or more rows as described above (further details below).
  4. Phase Information (Optional; diploid data only; real number in the range [0,1]) This is for use with linked loci only. It is a single row of $L$ probabilities that appears after the genotype data for each individual. If phase is known completely, or no phase information is available, these rows are unnecessary. They may be useful when there is partial phase information from family data. There are two alternative representations for the phase information: (1) the two rows of data for an individual are assumed to correspond to the paternal and maternal contributions, respectively, and the phase line indicates the probability that the ordering is correct at the current marker; (2) the phase line indicates the probability that the phase of one allele relative to the previous allele is correct. The first entry should be set to 0.5 so that the line has $L$ entries.
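
The constraints on the inter-marker distance row and the phase row can be checked before a data file is used. The following is a minimal sketch in Python, not part of the program itself. It assumes that entries within a row are separated by whitespace, and the function names parse_distance_row and parse_phase_row are hypothetical, introduced here only for illustration.

```python
def parse_distance_row(line, num_loci):
    """Parse an inter-marker distance row (hypothetical helper).

    Returns (distances, groups), where groups lists the marker indices
    that belong to each linkage group, in file order.
    Assumes whitespace-separated entries.
    """
    values = [float(token) for token in line.split()]
    if len(values) != num_loci:
        raise ValueError(f"expected {num_loci} distances, found {len(values)}")
    if values[0] != -1:
        raise ValueError("the first marker must be assigned the value -1")

    groups = []
    for index, distance in enumerate(values):
        if distance == -1:
            # -1 marks the start of a new linkage group.
            groups.append([index])
        elif distance >= 0:
            # A non-negative distance links this marker to the previous one.
            groups[-1].append(index)
        else:
            raise ValueError(f"marker {index}: distance must be -1 or non-negative")
    return values, groups


def parse_phase_row(line, num_loci):
    """Parse a phase row of probabilities in [0, 1] (hypothetical helper)."""
    values = [float(token) for token in line.split()]
    if len(values) != num_loci:
        raise ValueError(f"expected {num_loci} probabilities, found {len(values)}")
    for index, probability in enumerate(values):
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"marker {index}: probability must lie in [0, 1]")
    return values
```

For example, with five loci the row "-1 2.5 0.8 -1 1.2" describes two linkage groups: markers 0 to 2 and markers 3 to 4 (counting from zero).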

William Wen 2002-07-18