Scientific Background

WHAT ARE PROTEINS?

Proteins are long chain molecules, like necklaces of amino acids. They are the basis of how biology gets things done. As enzymes, they drive nearly all of the biochemical reactions that make biology work. As antibodies, they recognize invading elements and allow the immune system to get rid of them. For these reasons, scientists have sequenced the human genome, the blueprint for all human proteins, but how can we understand what these proteins do and how they work?

WHAT ARE GENES?

Genes are the functional units of the genome. Each gene encodes the information for how to make a specific protein. Scientists understand exactly how a gene specifies the sequence of amino acids in a protein -- the genetic code. What we don't understand is how and why many different protein sequences, each encoded by a different gene, can form the same three-dimensional protein structure.
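The well-understood step, from gene to amino acid sequence, can be sketched in a few lines. The example below is only an illustration and is not part of Genome@home: it builds the standard genetic code table and translates a made-up DNA coding sequence into one-letter amino acid codes. The function name `translate` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads codons starting at the first base and stops at the first stop
    codon. Trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene fragment: ATG TTT GGC AAA TGG TAA
print(translate("ATGTTTGGCAAATGGTAA"))  # MFGKW
```

The table lookup is the whole of the genetic code; nothing in it says what shape the resulting chain will fold into, which is the open question.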

The number of different genes, each encoding a specific protein sequence, is thousands of times greater than the number of different three-dimensional protein structures. Protein sequence design seeks to understand this mystery by designing as many sequences as possible that could form the same structure.

WHY IS PROTEIN DESIGN SO DIFFICULT?

Because there are twenty different amino acids, and proteins contain dozens to hundreds of amino acids, the number of possible protein sequences is astronomically large. Finding the small subset of protein sequences that will form any given protein structure requires complex algorithms that try to fit sequences onto a structure in realistic ways.
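To put a number on "astronomically large": with twenty choices at each position, a chain of N amino acids has 20^N possible sequences, so a 100-residue protein has roughly 10^130 of them. The short sketch below simply does this arithmetic; the function name `sequence_space_size` and the chain lengths are chosen for illustration.

```python
def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of the given length.

    Each position can hold any one of `alphabet_size` amino acids,
    independently of the others.
    """
    return alphabet_size ** length


# Illustrative chain lengths spanning "dozens to hundreds" of residues.
for length in (50, 100, 300):
    digits = len(str(sequence_space_size(length)))
    print(f"{length} residues: a number with {digits} digits")
```

Checking every sequence one by one is therefore out of the question, which is why design algorithms have to search this space selectively.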

THE HUMAN GENOME PROJECT

Since proteins play such fundamental roles in biology, scientists have sequenced the human genome. The genome is, in a sense, a "blueprint" for these proteins: it contains the DNA code that specifies the sequence of amino acid "beads" along the protein "necklace."

DESIGNING NEW GENOMES

Scientists have been working for decades to unravel the protein "sequence-structure relationship". Now that the human genome and many others are being sequenced, there is a unique opportunity to compare natural genomes and their proteins with sets of designed proteins. Why does nature choose specific genes? Can we design better proteins? Why do some genetic mutations (which change the protein sequence) cause disease while others do not?

As the Human Genome Project nears completion, we need to further our understanding of the intimate relationship between genes and protein structures.

HOW DOES THIS PROJECT HELP US UNDERSTAND "REAL" GENOMES AND PROTEINS?

Genome@home studies real genomes and proteins directly by designing new sequences for existing 3-D protein structures, which come from real genomes. The protein structure files that are sent out as work contain the Cartesian atomic coordinates of a protein. These data were obtained experimentally through X-ray crystallography or NMR. We did not collect them ourselves; thousands of scientists have spent decades compiling these data, which are generously made freely available to the public. By designing new sequences that could form these specific protein structures, we are setting the stage to attack a number of significant contemporary issues in structural biology, genetics, and medicine. For example, the Genome@home data will be used to:

  • Try to unravel a fundamental issue in the "protein folding problem" (which itself lies at the heart of a huge amount of modern biomedical research): the fact that thousands of different sequences can all form the same three-dimensional structure.

  • Predict the functions of newly discovered genes and protein structures. Modern approaches to structural biology, known as "proteomics" or "structural genomics", often solve protein structures without knowing what the proteins do. Because techniques for function prediction tend to work best with large amounts of sequence data, a virtual library of sequences for a new protein structure will be an invaluable resource.

  • Potentially design and make new versions of existing proteins for use in medical therapy.
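For readers curious what the structure files mentioned above look like: the text does not name the file format, so the sketch below assumes the fixed-column PDB format that is widely used for experimentally determined structures. It reads the atom name, residue name, and Cartesian x, y, z coordinates from ATOM records. The names `Atom`, `parse_atom_line`, and `parse_atoms` are hypothetical, and the two sample records are made up for illustration.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA" for an alpha carbon
    residue: str   # three-letter amino acid code
    x: float       # Cartesian coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM record, assuming the PDB fixed-column layout."""
    return Atom(
        name=line[12:16].strip(),     # columns 13-16
        residue=line[17:20].strip(),  # columns 18-20
        x=float(line[30:38]),         # columns 31-38
        y=float(line[38:46]),         # columns 39-46
        z=float(line[46:54]),         # columns 47-54
    )


def parse_atoms(text: str) -> List[Atom]:
    """Collect every ATOM record in the text, skipping all other lines."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith("ATOM")
    ]


# Made-up sample records (not taken from a real structure).
SAMPLE = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
])

for atom in parse_atoms(SAMPLE):
    print(atom.name, atom.residue, atom.x, atom.y, atom.z)
```

Sequence design keeps these backbone coordinates fixed and asks which amino acid sequences could plausibly adopt them.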

 
