Rosenberg lab at Stanford University
Researchers uncover unexpected genetic relatedness within HapMap 3 populations
By Tracy Vence

Genome Technology, December 2010/January 2011

Members of Noah Rosenberg's lab at the University of Michigan were using the publicly available HapMap Phase 3 data set for principal components analysis, genotype imputation, and other projects when they obtained unusual results and, consequently, abnormal plots. The anomalies pointed to a statistical sensitivity to unreported close relatives in the cohort. Trevor Pemberton, a postdoc in the Rosenberg lab, decided to take a closer look at the data. Using allele-sharing and RELPAIR analyses, Pemberton and his colleagues were able to empirically ascertain familial relationships.

Pemberton says that the aim of his team's study, published in the American Journal of Human Genetics in September, "was just to make the HapMap resource more useful for individuals that need to have a data set with clearly defined relatedness in it. ...It started off being a 'We need this,' and then it became a 'This is something that will be useful to a lot more people,' so we reported it."

In a nutshell, Pemberton says, RELPAIR (which was developed by researchers at Michigan in 1997) examines tracts of genomic sequence that are shared between individuals and, based on the length and frequency of these shared tracts, assigns a probabilistic relationship to the pair: whether they are likely parent and offspring, full siblings, or otherwise closely related.
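To make the simpler allele-sharing idea concrete, here is a minimal Python sketch. It is not RELPAIR and not the code used in the study. It assumes genotypes coded as 0, 1, or 2 copies of a reference allele per SNP, and the function names and toy genotypes are invented for illustration. It relies on one well-known property: barring genotyping errors, a parent and child share at least one allele at every SNP, so the fraction of SNPs with zero shared alleles should be zero for such a pair.

```python
def mean_ibs(g1, g2):
    """Mean proportion of alleles shared identical by state (IBS).

    Genotypes are 0, 1, or 2 copies of a reference allele; None marks
    a missing call. This coding is an assumption for illustration.
    """
    shared = 0
    n = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        shared += 2 - abs(a - b)  # 0, 1, or 2 alleles shared at this SNP
        n += 1
    return shared / (2 * n) if n else float("nan")


def ibs0_fraction(g1, g2):
    """Fraction of SNPs at which the pair shares no alleles (0 vs. 2)."""
    zero = 0
    n = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        if abs(a - b) == 2:  # opposite homozygotes
            zero += 1
        n += 1
    return zero / n if n else float("nan")


# Toy genotypes (invented, far too few SNPs for real inference).
parent = [0, 1, 2, 1, 0, 2]
child = [1, 1, 1, 2, 0, 1]
unrelated = [2, 0, 0, 1, 2, 0]

print(ibs0_fraction(parent, child))      # no opposite homozygotes
print(ibs0_fraction(parent, unrelated))  # several opposite homozygotes
```

In practice such statistics would be computed over hundreds of thousands of SNPs and serve only as a screen; assigning a specific relationship to a pair is the job of a likelihood-based tool such as RELPAIR.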

Pemberton et al. confirmed the 358 relative pairs reported in notes that accompanied the release of the HapMap 3 data set and also identified 25 previously unreported parent-offspring pairs, 33 unexpected full sibling pairs, 118 unreported second-degree relative pairs, and five surprising parent-parent-offspring trios.

Using the same methodologies with which researchers constructed the HGDP-CEPH Human Genome Diversity Cell Line Panel, Pemberton and his colleagues assembled two subsets of individuals — dubbed HAP1161 and HAP1117 — that contain no known pairs of individuals with a first-degree relationship and no known pairs of individuals with a relationship closer than that of first cousins, respectively. Pemberton is now using these subsets in his investigations of genome-wide homozygosity across human populations worldwide.
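One way to picture how a subset free of close relatives might be assembled is the greedy sketch below. The article does not describe the actual procedure, so this is an assumed strategy rather than the authors' method: repeatedly drop the individual involved in the most remaining relative pairs until none are left. The function name and sample identifiers are hypothetical.

```python
from collections import defaultdict


def unrelated_subset(individuals, related_pairs):
    """Greedily remove individuals until no related pair remains.

    Assumed strategy for illustration only: at each step, drop the
    individual with the most relatives still in the kept set.
    """
    neighbors = defaultdict(set)
    for a, b in related_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    kept = set(individuals)
    while kept:
        # Sorting makes tie-breaking deterministic (first in sort order).
        worst = max(sorted(kept), key=lambda i: len(neighbors[i] & kept))
        if not neighbors[worst] & kept:
            break  # no related pairs remain among kept individuals
        kept.remove(worst)
    return kept


# Hypothetical sample IDs and relative pairs.
samples = ["A", "B", "C", "D", "E"]
pairs = [("A", "B"), ("A", "C"), ("D", "E")]

subset = unrelated_subset(samples, pairs)
print(sorted(subset))
```

Running the function twice, once with only first-degree pairs and once with all pairs closer than first cousins, would give two nested levels of stringency analogous to the two subsets described above. A greedy pass does not guarantee the largest possible subset.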

He says that he and his collaborators hope that "the study we've reported on HapMap 3 is going to be the definitive study of relatedness in this project." Should the HapMap researchers release genetic data gleaned from additional populations in the future, Pemberton intends to "revisit this project and repeat what we've done," he says.