HOPES: Huntington's Outreach Project for Education, at Stanford
Oct 26, 2010

Human Genome Project

What is a Genome?

A genome is the complete set of DNA in an organism. The HOPES article DNA and Chromosomes explains that DNA is the biological material that stores information about how to build and run a living organism, so a genome is all of the genetic information for one organism. Every living thing, from bacteria to daisies to humans, has its own genome, and each individual carries a complete copy of its genome in almost every cell of its body. Sexually reproducing organisms, such as humans, get DNA from both the mother and the father, so each individual has a slightly different genome (except in the case of identical twins). However, the differences between individuals of the same species are much smaller than the differences between species. For this reason, it is possible to talk about both “John Doe’s” specific genome and the more general “human genome.”

Most scientific studies of the information contained in DNA have focused on relatively short sections called genes. Genes tell the body how to make RNA and proteins, both of which have innumerable functions within the body.

What was the Human Genome Project?

DNA is essentially a long, double-stranded chemical string made up of four nucleotide “letters”: A, C, T and G. Information is “encoded” by specific combinations of these letters in different orders and different lengths. Scientists have developed many methods of sequencing DNA (that is, determining the order of those letters) so that we can start to understand the information encoded in the DNA. Most sequencing efforts focus on a single gene or other DNA segment. The Human Genome Project (1990-2003) was a public project whose two central goals were to sequence the entire human genome and to find all of the genes within it. Finding genes can be difficult because, according to recent estimates, they make up only 1 to 3 percent of the whole genome. (The rest, sometimes called “junk DNA”, was once thought to be completely inactive, but more recent studies have shown that at least some of it is actually very important.) Knowing where the genes are is important for medicine because the proteins that they make are often involved in disease. Although a small amount of work remains to be done in these areas, the Human Genome Project did achieve its goals and was declared finished in 2003.
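To make the idea of DNA as a string of letters concrete, here is a minimal Python sketch. The example sequence and the function names are made up for illustration and are not taken from any HGP software. The sketch relies on the standard base-pairing rule (A pairs with T, C pairs with G) to read off the second strand of a double-stranded molecule, and it counts how often each letter occurs.

```python
# Standard base pairing between the two DNA strands: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence):
    """Return the opposite strand, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def base_counts(sequence):
    """Count how many times each of the four letters appears."""
    return {base: sequence.count(base) for base in "ACGT"}


# Hypothetical 8-letter sequence, far shorter than any real gene.
example = "ATGCGTTA"
opposite_strand = reverse_complement(example)
counts = base_counts(example)

print(opposite_strand)  # TAACGCAT
print(counts)           # {'A': 2, 'C': 1, 'G': 2, 'T': 3}
```

Sequencing a genome amounts to recovering a string like `example`, only billions of letters long, from the physical molecule.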

In the Beginning

The only genomes that had been sequenced by the mid-1980s were those of tiny viruses, and they consisted of just thousands of nucleotide letters, also called bases. The human genome, on the other hand, has about 3.2 billion bases. During early discussions about sequencing the human genome, most scientists and researchers believed that the project was probably impossible and potentially dangerous. The first meetings on the subject, in 1985 and 1986, failed both to generate significant interest in the rest of the scientific community and to bring in grant money.

One of the problems was that it was difficult to imagine the field of biology taking a step into what is called “Big Science”. Generally, scientific study is done on a small, laboratory-specific scale. “Big Science”, on the other hand, involves projects that require a large budget, a huge amount of technology, and the cooperation of many laboratories. Examples of “Big Science” from other scientific fields are the Hubble Space Telescope and the development of the atomic bomb. Some scientists were also concerned about breaking with the traditional scientific method, which requires that science be done by experiments that test hypotheses. Sequencing is not an experiment; it is simply a procedure that determines the order of nucleotide “letters” in the genome. Finally, many people were worried that personal genetic information, such as susceptibility to diseases, would become publicly available and that people might be discriminated against by insurance companies and other entities.

Despite these worries, the potential scientific and medical benefits of knowing the human genome sequence were too great to put off. In the end, it was the US Department of Energy (DOE) that began the sequencing project with its Human Genome Initiative in 1986. The DOE had been researching the human genome and the ways that it could be affected by radiation since the atomic bombs were dropped in 1945. The National Institutes of Health (NIH) joined the project in 1988 with the office that later became the National Human Genome Research Institute, and together the two agencies drafted the first five-year plan to present to Congress. The Human Genome Project officially began on October 1, 1990, when Congress committed $200 million per year, or $3 billion total, and set the project length at 15 years. DNA was taken from a large number of donors, but only a few of these samples were ultimately used to determine the human genome sequence. This method was intended to protect the identity of the individuals whose DNA sequence was being made public.

Soon, sequencing the human genome became an international endeavor. The Wellcome Trust Sanger Institute in the United Kingdom became one of the five major contributors to the final sequence. The other four were US-based laboratories. Leading among the many other countries that participated in the project were Japan, France, Germany and China. The international effort was coordinated in large part by the Human Genome Organisation (HUGO), which seeks to promote research and collaboration in the field of human genetics.

Progress of the HGP

People are often surprised to hear that the first chromosome sequence (Chromosome 22) was not finished until December 1st, 1999. What exactly were HGP researchers working on during those first nine years?

Some of them were working on the less central, but equally important, goals of the HGP. These included technology advancement, sequencing of model organisms, creating a human genetic map, and exploring the ethical, legal and social implications (abbreviated as “ELSI”) of genetic research. The ELSI program was designed to address the concerns expressed at the beginning of the project about the potential for immoral use of genome data. It received five percent of the DOE and NIH budgets each year. Among the issues considered were privacy and confidentiality, uncertainty of genetic tests, impact on reproductive decision making, and the line between medical treatment and enhancement. For more information, please see the HOPES article, HGP: Ethical, Legal and Social Implications.

The strategy of sequencing the entire human genome at once, instead of chromosome by chromosome, also caused a delay in getting the first piece of the sequence finished. The good thing about this technique was that the whole sequence was completed within a short time frame; the entire project finished less than four years after the completion of Chromosome 22. You might think about it in terms of cooking a meal for 200 people. If you make each serving separately, the first meal will be ready very quickly. If you make all of the meals at the same time, the first person will be served a little later, but all of the meals will be ready around the same time. This means that the last person will be served much more quickly than if you had prepared the meals one by one.

The Human Genome Project vs. Celera Genomics

There was another reason that the Human Genome Project was completed in 2003, two full years ahead of schedule. In 1998, Celera Genomics, a private company led by Dr. Craig Venter, announced that it would sequence the human genome in three years. Celera was critical of the amount of time and money it was taking the HGP to complete the human genome sequence. It intended to obtain a patent on the information and charge a fee to anyone who wanted to access substantial amounts of the sequence. This idea was generally unpopular because most public scientists felt that the human genome sequence should belong to everyone. HGP researchers released their data daily into an internet database open to the public. They believed it was vital to make their information available to other researchers right away so that important medical discoveries could be made as quickly as possible. Celera refused to release its data in the same way, so plans for the two to collaborate and share data fell through. In the end it became clear that the only way to keep Celera from getting a patent was for the HGP to publish its data first. Francis Collins, director of the NIH’s efforts, was able to get laboratories around the world to commit to a rigorous new schedule that would produce a “working draft” for publication in 2001.

President Bill Clinton was able to facilitate an ultimately short-lived truce between the HGP and Celera in June 2000, by recognizing both Collins and Venter for their work at a White House ceremony. The two project leaders announced plans to publish both drafts simultaneously in the journal Science. By December of 2000, relations had become tense again, each side doubting the accuracy of the other’s sequence and the motives behind it. Although the two papers were still published at the same time in February of 2001, the HGP moved its paper to the journal Nature.

Earlier, in March of 2000, US President Bill Clinton and UK Prime Minister Tony Blair had made a joint announcement that no patents would be given for simply figuring out a genetic sequence. This decision meant that Celera Genomics could not patent the human genome sequence. It could still charge people to view and use its data, but it could not prevent other groups, such as the HGP, from publishing their results as well. It is possible to get a patent for a specific genetic sequence, meaning a small piece of the whole genome, if you know what it does (for example, what protein it codes for). Many people think that stricter guidelines are needed to protect the accessibility of the genome sequence. In their opinion, patenting keeps multiple researchers from working on the same genetic sequence and so slows down the rate of discoveries that could be medically important. The other side of the argument is that private scientists might choose not to work with genetic sequences at all if they cannot get a patent for their discoveries. For more information on this debate and related topics, please see HGP: Ethical, Legal and Social Implications.

The “Final” Sequence

The papers published in 2001 presented a “working draft” that was 99.9 percent accurate. That level of accuracy seems excellent, but because there are about three billion nucleotide “letters” in the human genome, it actually meant that an estimated 3 million “letters” were incorrect or missing. Still, the data was very good, containing approximately 97 percent of the human DNA sequence. Sequencing is done using small chunks of DNA at a time, so some of the sequence was still in unidentified bits. However, about 85 percent of the sequence had already been put in the right order and 24 percent was already in “finished” form, meaning that it was 99.99 percent accurate.
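The arithmetic behind these accuracy figures can be checked with a short Python sketch. The function name is an illustrative choice, and the genome size is rounded to three billion letters as in the text. An accuracy of 99.9 percent means about 1 letter in 1,000 is wrong, which works out to roughly 3 million letters across the genome. The “finished” standard of 99.99 percent would correspond to roughly 300,000 if it applied to every letter.

```python
from fractions import Fraction

# "About three billion" letters, the rounded figure used in the text.
GENOME_SIZE = 3_000_000_000


def expected_errors(genome_size, accuracy_percent):
    """Letters expected to be wrong or missing at a given percent accuracy.

    The accuracy is passed as a string so Fraction can parse it exactly,
    avoiding floating-point rounding in values such as 99.9.
    """
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return int(genome_size * error_rate)


draft_errors = expected_errors(GENOME_SIZE, "99.9")      # working-draft accuracy
finished_errors = expected_errors(GENOME_SIZE, "99.99")  # "finished" accuracy

print(f"{draft_errors:,}")     # 3,000,000
print(f"{finished_errors:,}")  # 300,000
```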

In 2003, both Celera and the HGP published their “finished” sequences, which are 99.99 percent accurate for about 92 percent of the genome. This is accepted as “finished” because it is not possible to sequence the other eight percent with current technology. Most of these sections are highly repetitive and appear to contain very few genes. They include sequences that mainly have structural importance such as centromeres, present at the middle of each chromosome, and telomeres, the sequences of DNA at the ends of a chromosome. Papers discussing the completed sequence were published in the journals Science and Nature to commemorate the 50th anniversary of James Watson and Francis Crick’s discovery of the double-helical structure of DNA in 1953. To find out more about DNA and its structure please see An Introduction to DNA and Chromosomes.

The Human Genome Project was biology’s first step into “Big Science”. It involved a substantial amount of risk for those who first advocated it, but the results met or exceeded expectations for both accuracy and scope. Additionally, the project was finished two years ahead of schedule and at a final cost of $2.7 billion, $300 million under budget. In many ways, the HGP opened the door to large-scale projects in the life sciences and marked, some say, the true beginning of the “Age of Biology” in science.

Further Reading

  • The National Human Genome Research Institute’s website for the Human Genome Project.
    This website is intended for the general public and is very informative. The Human Genome Project is only one of the topics discussed, but the whole website is devoted to human genetics so it is very helpful.
  • The Department of Energy’s website for the Human Genome Project.
    This website, too, is directed at the general public; the material is very accessible.
  • National Institutes of Health (June 26, 2000). International Human Genome Sequencing Consortium Announces “Working Draft” of Human Genome. Press Release.
    This is the NIH’s press release about the working draft of the human genome. It is fairly easy to read.

F. Clum, 12/06/08