The Y-Chromosome and Genetic Genealogy.

DNA. DNA contains the codes that determine our inherited characteristics. It consists of long strands of molecules in the form of the now famous "double helix," which looks somewhat like a spiral staircase. The coding part of DNA consists of four types of bases, which pair up and are weakly bonded across the "steps" of the staircase. These four bases have been named adenine (A), thymine (T), cytosine (C), and guanine (G); A always pairs with T, and C with G. The four letters A, T, C and G are consequently used to describe sequences of DNA. "Genes" consist of specific sequences of the four bases, which control the production of RNA or proteins. However, mixed in among the genes are long segments of DNA that appear to serve no function. Most of our DNA is contained in 46 chromosomes, found in 23 pairs in the nucleus of nearly every cell of the human body.

The Y-Chromosome. The y-chromosome is inherited more or less unchanged from father to son to grandson, indefinitely. It is one of the 46 chromosomes in the nucleus of nearly every cell of a human male. Most chromosomes, including the two x-chromosomes possessed by females, are recombined, or shuffled, each generation before being passed down to offspring. The y-chromosome, by contrast, is unique in remaining more or less unchanged when passed from father to son. Thus while most chromosomes contain a random mixture of genetic material from one's grandparents and great-grandparents, a male's y-chromosome will be identical or nearly identical to that of his father, grandfather, great-grandfather and beyond for countless generations. Since surnames tend to be inherited in the same way as y-chromosomes (from father to son, or patrilineally), y-chromosome testing lends itself particularly well to surname studies.

"Unchanged" must be qualified by "more or less" because mutations occasionally occur. Otherwise all males would have identical y-chromosomes, making them useless for genealogical purposes. By looking at specific locations on the y-chromosome (known as markers among genealogists), we can compare individuals and support or disprove suspected genealogical relationships.
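To make the idea of comparing two men concrete, here is a minimal sketch. It assumes each man's results are stored as a mapping from marker name to repeat count. The marker names are standard STR locus names, but the values are made up for illustration. Counting mismatched markers, or summing the differences in repeat counts, are two simple ways to compare results; they are not necessarily how any particular testing company scores a match.

```python
# Hypothetical results (repeat counts) for two men. The marker names are real
# STR loci, but these values are invented for illustration only.
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS392": 13}
man_b = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10, "DYS392": 13}


def count_mismatches(a: dict[str, int], b: dict[str, int]) -> int:
    """Number of markers, tested in both men, whose repeat counts differ."""
    return sum(1 for m in a.keys() & b.keys() if a[m] != b[m])


def step_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of the absolute differences in repeat counts over shared markers."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


print(count_mismatches(man_a, man_b))  # 1 (only DYS391 differs)
print(step_distance(man_a, man_b))     # 1
```

A small number of mismatches is consistent with a relatively recent common ancestor, while many mismatches argue against one. Where to draw the line depends on how many markers were tested.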

Application to Genealogy. One approach is to use y-chromosome testing to focus on specific, well-defined puzzles or hypotheses. For example, several ancestral Bachmans lived in the same area of the same village in 17th-century Switzerland. A reasonable supposition would be that they shared a common ancestor from whom they inherited their surname. By comparing the y-chromosomes of descendants of each of the ancestral Bachmans, we should be able to substantiate or disprove the hypothesis of a common Bachman ancestor. This approach requires at least two people, descended from different lines, to submit samples.

Another puzzle concerned the naturalist Rev. John Bachman, Audubon's collaborator and coauthor. John Bachman's biographical sketch suggested he was a descendant of Johann Georg Bachman, who had settled in Saucon Township, Pennsylvania, in the early 1700s. But other details were inconsistent, and a link had never been proven. By testing a known descendant of Johann Georg and a descendant of John Bachman, we were able to show that the two lines did indeed share a common patrilineal ancestor, which tends to substantiate the relationship.

Another approach is to establish a surname y-chromosome study and invite any male sharing the same surname to participate. Our Bachman/Baughman study has evolved in this direction. The testing companies encourage this approach by giving discounts to surname groups and publicizing the fact that specific surname groups exist. As the number of participants grows, some who share a surname will be found to be related through a common ancestor whose identity may not yet be known. This approach is particularly useful when you have a combination of individuals with deep patrilineal lineages and others with fairly shallow knowledge (say, back to about 1800) but hopes of finding connections. Even if the results are negative, knowing that two branches sharing the same last name do not share a common ancestor may save time that would otherwise be wasted searching for connections that do not exist. For example, one participant in the Bachman study, whose earliest known ancestor was a John Baughman born about 1800 in Westmoreland Co., PA, was found to have a y-chromosome matching those of several American Bachman/Baughman lines in Pennsylvania and Kentucky and that of a Swiss Bachmann whose line originated in Canton Aargau. Although the specific links have not yet been found, the common origin has been established.
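As a surname project grows, participants can be sorted into groups whose results match closely. The following minimal sketch shows one way to do that. It assumes hypothetical participants P1 to P4 with made-up values, and an arbitrary cutoff of at most one mismatched marker. Any two people within the cutoff are linked, and the connected groups are returned using a union-find structure (single-linkage clustering). This is an illustrative technique, not the method used by the Bachman/Baughman study or any testing company.

```python
from itertools import combinations

# Hypothetical participants and repeat counts, invented for illustration.
participants = {
    "P1": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "P2": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
    "P3": {"DYS393": 14, "DYS390": 23, "DYS19": 15, "DYS391": 10},
    "P4": {"DYS393": 14, "DYS390": 23, "DYS19": 15, "DYS391": 10},
}


def mismatches(a: dict[str, int], b: dict[str, int]) -> int:
    """Count markers tested in both men whose repeat counts differ."""
    return sum(1 for m in a.keys() & b.keys() if a[m] != b[m])


def group_participants(people: dict, max_mismatches: int = 1) -> list[list[str]]:
    """Link every pair within `max_mismatches` (an assumed cutoff) and return
    the resulting connected groups, each sorted by name."""
    parent = {name: name for name in people}

    def find(x: str) -> str:
        # Follow parent pointers to the group's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(people, 2):
        if mismatches(people[a], people[b]) <= max_mismatches:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for name in people:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


print(group_participants(participants))  # [['P1', 'P2'], ['P3', 'P4']]
```

Single linkage can chain together people who are each close to a neighbor but far from one another. In practice, any proposed group would still need to be checked against the paper trail and against results on more markers.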

Females cannot participate directly in y-chromosome studies, since they have no y-chromosome. However, if they wish to research their father's patrilineage, they can help sponsor a test of their father, a brother, or another male relative in their father's direct male line, such as a paternal uncle.

Markers. There are a number of different kinds of mutations (changes in the genetic code) that can occur when DNA is copied within a cell and passed on to the next generation. Short tandem repeats (STRs), also known as microsatellites, are the markers tested in most genealogical y-chromosome studies. STRs occur when a short DNA sequence is repeated over and over, end to end, along a portion of a chromosome. They are found at specific locations on the y-chromosome, often referred to as loci, which are given names such as "DYS391." DYS391, for example, consists of repeats of the base sequence -GATA- and can have values ranging from 7 to 14 repeats, with 10 and 11 being common in populations of European ancestry. Over 200 STR markers have been identified on the y-chromosome, but not all are variable enough to be useful for genealogy. Testing companies currently test between 10 and 43 markers.

Once an STR exists, it may change by gaining or losing a repeat or two when the DNA is copied. Estimates of the frequency of such changes range from fewer than 2 to over 7 mutations per 1000 generations for each STR, depending on the marker. Thus, over a long period of time, men descended from a common ancestor will tend to show at least some differences in the values (numbers of repeats) of the various STR markers on their y-chromosomes. If you look at 25 markers, there is about a 50% chance you will find at least 1 mutation in 9-10 generations (or, counting both up and down from the common ancestor, between yourself and a 4th cousin).
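The two ideas in this section, reading an STR value as a count of repeats and estimating the chance of at least one mutation, can be sketched briefly. This is a minimal illustration under stated assumptions. The DNA sequence is a made-up toy, not the real DYS391 region. The rate of 3 mutations per 1000 generations is an assumed average within the range quoted above; real rates differ from marker to marker. Markers and generations are treated as mutating independently.

```python
import math
import re


def count_tandem_repeats(sequence: str, motif: str = "GATA") -> int:
    """Return the length, in repeats, of the longest uninterrupted run of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def prob_at_least_one_mutation(rates: list[float], generations: int) -> float:
    """Chance that at least one marker changes over `generations` father-to-son
    transmissions, assuming each marker mutates independently at its own
    fixed per-generation rate."""
    # P(no change anywhere) = product over markers of (1 - rate) ** generations
    log_p_none = sum(generations * math.log1p(-r) for r in rates)
    return 1.0 - math.exp(log_p_none)


# Toy sequence: 11 GATA repeats flanked by other bases (invented).
print(count_tandem_repeats("TT" + "GATA" * 11 + "CC"))  # 11

# 25 markers at an assumed average rate of 3 per 1000 generations.
rates = [0.003] * 25
print(round(prob_at_least_one_mutation(rates, 9), 2))   # 0.49
print(round(prob_at_least_one_mutation(rates, 10), 2))  # 0.53
```

Under these assumptions, the result is close to the "about 50% in 9-10 generations" figure given above. Passing a list of per-marker rates, rather than a single average, lets the same function handle faster and slower markers.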

Haplogroups and "clans." Another kind of mutation is a base substitution (single nucleotide polymorphism, or SNP). A change to a given base is extremely rare compared to changes in STRs, and each specific substitution is believed to have occurred only once in human history. Thus all people who share a specific SNP value can usually trace it back to a mutation in a single ancestor. Consequently, SNPs can be used for broad anthropological studies of our ancestry, and they have been used to create a "family tree" of the paternal heritage of all humankind. Large haplogroups (or "clans" in the terminology of some testing companies) originated with a single ancestor who had a specific SNP mutation, and these haplogroups have been given names beginning with capital letters. The most common haplogroup among Europeans is labeled R1b. It is especially common along the Atlantic seaboard (over 80% of some populations), but is also frequent throughout Europe. Other common European haplogroups include R1a and I, which are common in northern and central Europe. To know your haplogroup with certainty, you would need to pay for a separate SNP test. But certain combinations of STR values are commonly associated with specific haplogroups, and most people's haplogroups can be accurately guessed from their STR values. This is because even after thousands of years, the STR values of the original father of each haplogroup are still reflected in the STR values found among his descendants. For example, most of the Richterswil Bachman/Baughmans who have been tested so far have STR haplotypes that are clearly R1b.
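Guessing a haplogroup from STR values can be sketched as picking the reference haplotype closest to a tester's results. This is a simplified stand-in for the idea described above, not how any testing company actually predicts haplogroups. The reference values and the labels group_A to group_C are placeholders, not published modal haplotypes for any real haplogroup. Real predictors rely on published reference data and typically on probability models rather than a single nearest match.

```python
# Placeholder reference haplotypes. These are NOT published modal values for
# any real haplogroup; substitute real reference data before relying on this.
REFERENCE_HAPLOTYPES = {
    "group_A": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS392": 13},
    "group_B": {"DYS393": 13, "DYS390": 25, "DYS19": 16, "DYS391": 10, "DYS392": 11},
    "group_C": {"DYS393": 14, "DYS390": 22, "DYS19": 15, "DYS391": 10, "DYS392": 11},
}


def step_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of absolute repeat-count differences over the markers both share."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def guess_haplogroup(haplotype: dict[str, int],
                     references: dict = REFERENCE_HAPLOTYPES) -> tuple[str, int]:
    """Return the label of the closest reference haplotype and its distance."""
    best = min(references, key=lambda label: step_distance(haplotype, references[label]))
    return best, step_distance(haplotype, references[best])


# A hypothetical tester's results (invented values).
tester = {"DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS391": 10, "DYS392": 11}
print(guess_haplogroup(tester))  # ('group_B', 1)
```

A small distance to one reference and large distances to the others is what makes a guess "clear." When two references are about equally close, only a SNP test can settle the question.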

Knowing one's haplogroup may not tell you much about your more recent genealogy, but many people find it interesting to know whether their ancient patrilineal ancestor was one of the Cro-Magnon people who first resettled western Europe after the ice ages (R1b), one of the Gravettians who came into Europe from the east a bit later (I), or one of the early agriculturalists who came from the Middle East thousands of years later (J, among others).

Another kind of haplogroup or "clan" involves classification using mitochondrial DNA (mtDNA). MtDNA is found outside the nucleus of cells, does not recombine, and is passed on only through females, from a mother to all of her children. Thus it can be used to establish matrilineal groupings (popularized in the book "The Seven Daughters of Eve"). But because it mutates relatively slowly, it is of less use for genealogy.

Risks. The markers that are tested for family studies are "junk" DNA and are believed to be unrelated to any physical or medical characteristics. This means there is little danger even if the privacy guarantees of the testing companies were somehow breached. Nor can the tested markers be used to uniquely identify individuals, since the same set of marker values may be shared by many related or even unrelated (in a genealogical time frame) individuals. Thus the results would not be useful in civil or criminal proceedings. The single largest potential risk of y-chromosome tests (apart from the cost) is the possibility that a participant will discover that he is not biologically related to someone else in the way expected. Sometimes a long-established and accepted genealogy will turn out to have been wrong. Illegitimacy rates vary by time, place, and economic and social status, but illegitimacy has always occurred. Adoptions have also always occurred, and knowledge of them might easily be lost by later generations, especially before vital records were widely kept. Thus unexpected non-matches can occur, and some people may find this disturbing or even traumatic, especially if a "non-paternal event" appears to have occurred in the recent past.

Conclusion. The most effective use of y-chromosome testing for genealogical purposes will be either within a family surname project or when testing a specific hypothesis about a possible common ancestry of two individuals. Not every genealogical puzzle can be solved with DNA, and it is important that participants in such studies realize that there is no guarantee that the results will be as desired or expected. However, under the appropriate circumstances, genetic or molecular genealogy can be a powerful tool to substantiate or disprove hypotheses where traditional documentation is weak or non-existent.

Philip Ritter, 2005