Job Dekker, PhD (left) and Zhiping Weng, PhD
The first comprehensive decoding and annotation of the human genome is being published today by the ENCyclopedia Of DNA Elements (ENCODE) project, an international consortium of scientists from 32 institutions, including UMass Medical School. The groundbreaking ENCODE discovery appears in a set of 30 papers in Nature, Genome Research and Genome Biology.
Using data generated from 1,649 experiments—with prominent contributions from the labs of Job Dekker, PhD, professor of biochemistry & molecular pharmacology and molecular medicine, and Zhiping Weng, PhD, professor of biochemistry & molecular pharmacology—the group has identified biochemical functions for an astounding 80 percent of the human genome. These findings promise to fundamentally change our understanding of how the tens of thousands of genes and hundreds of thousands of gene regulatory elements, or switches, contained in the human genome interact in an overlapping regulatory network to determine human biology and disease.
As little as a decade ago, the human genome was viewed by scientists as a collection of independent genes that contained the instructions for making the proteins that carried out the basic biological functions necessary for life. Driven by this premise, most researchers focused on understanding the relatively small portion of the genome that made up protein-coding genes, while the non-coding portion of the genome, often referred to as “junk DNA,” received little attention. The sequencing of the human genome in 2003 and more recent efforts over the last decade by the ENCODE consortium, which is funded by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH), and by others have begun to fundamentally change researchers’ views on the importance of the non-coding portion of the genome.
Scientists now know that the protein-coding portions of the genome make up only one part of our genetic picture. Of equal importance are those areas of the genome that regulate genes. These elements, such as regulatory DNA elements and non-coding RNA, control when a gene is turned on and off and can also amplify or curtail expression of a gene. Even a small change in when a gene is turned on can have a huge biological impact, or in specific circumstances, contribute to disease.
Taken together, genes and their regulatory elements create a vast network of overlapping systems that carry out the basic biological processes necessary for life, a system that scientists are only now beginning to understand. Using a wide variety of experimental and computational approaches, members of the ENCODE consortium have generated comprehensive information about the identities, locations and characteristics of human genes and regulatory switches throughout the genome. This data represents an expansive resource that biomedical researchers can use to begin unraveling how this system works and how it contributes to disease.
“ENCODE’s work provides a critical map of tens of thousands of genes and hundreds of thousands of regulatory switches that are scattered all over the 3 billion nucleotides of the genome,” said Dr. Dekker, co-director of the Program in Systems Biology. “As a group, we’ve identified more than 4 million sites that, through binding specific proteins, affect biological function.”
Dekker: Three-dimensional wiring of the genome
What this map doesn’t tell scientists, though, is which switches or elements regulate which genes. “The genome is like a panel of light switches in a room full of lights, except there are thousands of lights and almost a million switches,” said Dekker, the lead author on one of the six ENCODE papers that appear in Nature. “We don’t know what switches turn on which lights. And some switches turn on the same lights or turn on multiple lights.”
That is where his work provides unique insights. Over the last decade, Dekker has pioneered the development of chromosome conformation capture (3C) technologies and combined them with next-generation sequencing technologies (5C) to create three-dimensional models of folded chromosomes. In turn, these models can be used to determine which parts of the genome, when folded, come into physical contact with each other.
Over the last several years, it has become clear that one of the ways in which regulatory elements can turn genes on and off is through direct physical contact. As part of the ENCODE project, Dekker’s task was to determine where these switches and genes were touching along the genome. To do this, Dekker and his team produced the first three-dimensional diagram of a section of the genome that shows which gene regulatory switches touch, and control, which genes, in essence producing a wiring diagram for the genome. “These switches can be located far from the genes they regulate in the one-dimensional genome sequence, but in three dimensions, the chromosome is folded so that they physically touch,” said Dekker.
In the Nature paper, Dekker and his UMMS colleagues discovered important patterns in the three-dimensional wiring of the genome, which may help researchers understand how the genome is put together and works as a system. For instance, there is a preferred order and distance between regulatory elements and their targets. Identification of more such “rules” could help predict the three-dimensional wiring between genes and regulatory elements for other genomes in the future, and ultimately enhance understanding of the genetics of disease.
“Genetic variation between individuals is often the result of differences in regulatory elements, not genes,” said Dekker. “Unraveling the three-dimensional wiring behind the system of switches and lights that makes up the genome will help us find genes that are misfiring because of a defect in a regulatory element that might be causing disease.”
Weng: Transcription factors
Controlling many of the regulatory switches identified by Dekker is a type of protein called a transcription factor. The human genome has roughly 1,500 different transcription factors that bind to DNA, as well as to each other. Together, these complex interactions of DNA and proteins form intricate networks that control regulatory switches and dictate the expression levels of genes in a cell.
To understand how regulatory switches are turned on and off, members of the ENCODE project consortium went about systematically identifying where transcription factors bind to DNA in particular cell types and the expression levels of all the genes in those cells. With this experimental data in hand, a team led by Dr. Weng, director of the Program in Bioinformatics and Integrative Biology, set out to integrate all that information in an effort to better understand the basic components of transcriptional networks.
Using new computational methods developed in her lab, Weng and colleagues performed a comprehensive analysis of all 457 sets of transcription factor and DNA interaction data generated by the ENCODE consortium. The results are published in Genome Research.
What they found, according to Weng, was that “some regulatory factors like to bind to neighboring sites within the same switch in an effort to co-regulate a gene, while other transcription factors piggyback onto other transcription factors in order to exert another layer of control.”
They also found that in cells that use a particular regulatory switch, the DNA of the switch is depleted of nucleosomes, the storage units for genomic DNA. In other cells where the switch isn’t needed, however, the DNA making up the regulatory switches is packaged into nucleosomes. “It appears that the sequence features of the DNA in regulatory switches actually promote nucleosome formation, which is a great way to prevent turning on a switch in the wrong cell type, which could lead to disease or tumor formation,” said Weng.
Weng’s lab published the results of two other studies in Genome Biology, the first of which describes a method for computationally predicting and experimentally testing binding sites for transcription factors inside regulatory switches. In the other study, Weng and colleagues built a computational algorithm that could predict the expression of a gene from the epigenetic state of its regulatory switch.
“Together, these three studies significantly further our understanding of gene regulation in the human genome,” said Weng. “This new knowledge has an impact on improving human health because many diseases are caused, not by genetic defects in genes, but by misregulation of otherwise normal genes.”
Next steps
For the next phase of the ENCODE project, Weng received a four-year, $8 million grant from the NIH to lead the Data Analysis Center of the project. The effort will include researchers from the Massachusetts Institute of Technology, Yale University, the Dana-Farber Cancer Institute, Johns Hopkins University, the University of Washington and the Institut Municipal d’Investigacio Medica in Spain. Set to begin in September under the direction of Weng, the team will perform a comprehensive and integrative analysis of the data collected by the ENCODE consortium.
Meanwhile, Dekker and his laboratory will expand their work to map the 3D wiring of the entire genome. This includes analyzing the remaining 99 percent of the human genome for which long-range interactions between genes and switches have yet to be studied.