A multidisciplinary team of scientists led by Elinor Karlsson, PhD, associate professor of molecular medicine in the Program in Bioinformatics and Computational Biology, has captured biodiversity at a genetic level. By sequencing the genomes of 240 mammalian species, 122 of which had never been sequenced before, researchers identified a correlation between regions of reduced genetic diversity and a species' higher risk of extinction. Further use of these comparative genomes will allow scientists to identify stretches of DNA that have remained unchanged (or conserved) in mammals for millions of years, leading to new insights into human health, disease and biodiversity.
“What we’ve been able to do by sequencing these genomes is capture biodiversity at a genetic level,” said Dr. Karlsson. “Taking this data, we can analyze mammalian genomes across species to see what’s changing or not changing over millions of years in interesting ways across all these genomes. This includes areas of the genome where changes are most likely to lead to disease or illness.”
The data, published in Nature, has already been used to further the understanding of disease and illness. Earlier this year, Karlsson was one of the authors who used the work in a Proceedings of the National Academy of Sciences study that identified species that may be especially vulnerable to human-to-animal transmission of SARS-CoV-2.
To generate a useful genomic data set from a broad and diverse array of species, Karlsson included at least one species from each eutherian family. Among the species selected are nine that are the sole members of their family and seven that are critically endangered: the Mexican howler monkey, hirola, Russian saiga, social tuco-tuco, indri, northern white rhinoceros and black rhinoceros. In total, 80 percent of mammalian families are represented in Karlsson’s comparative analysis.
“A lot of these animals can’t be found in zoos,” explained Karlsson. “We could only get DNA samples by going out into the field and finding these species in their native habitat. For species that live in remote places, like the rain forest or deep ocean, getting a DNA sample back to the lab that was of a quality that could be sequenced is a huge challenge.”
Once Karlsson and her team had the sequences, they had to analyze the data. To do this, the various genomes had to be lined up correctly so that corresponding genetic regions could be compared accurately. Comparing 240 genomes, including the human genome, base by base and aligning them all at single-base resolution took nine months of cloud computing. “Computationally, this is a huge lift,” said Karlsson.
Once all the data was processed, scientists were able to isolate the 3.1 percent of the mammalian genome that was nearly identical across all 240 species.
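To make the idea of a conserved fraction concrete, here is a minimal Python sketch. It is not the team's pipeline, which the article does not describe in detail. The function name `conserved_fraction`, the `min_identity` threshold and the three short sequences are all hypothetical, chosen only to show how one might count the positions in an already-aligned set of sequences where every species carries the same base.

```python
def conserved_fraction(aligned, min_identity=1.0):
    """Fraction of alignment columns in which the most common base makes up
    at least `min_identity` of the sequences.

    `aligned` is a list of equal-length strings that are assumed to be
    aligned already; "-" marks a gap. Columns containing a gap are
    counted as not conserved (a simplifying assumption).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    if length == 0:
        return 0.0

    conserved = 0
    for column in zip(*aligned):  # one tuple of bases per position
        if "-" in column:
            continue
        most_common = max(column.count(base) for base in set(column))
        if most_common / len(column) >= min_identity:
            conserved += 1
    return conserved / length


# Toy data: three made-up "species", eight aligned positions.
toy_alignment = [
    "ACGTACGT",
    "ACGTTCGA",
    "ACGAACGT",
]

print(conserved_fraction(toy_alignment))  # 5 of 8 columns are identical: 0.625
```

In this toy case five of the eight positions are identical in all three sequences. Real comparative analyses work on billions of positions and typically use statistical models of evolution rather than a simple identity count.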
“What this means,” said Karlsson, “is that these DNA sequences were unchanged since the time all these species shared a common ancestor—going back millions and millions of years. This is more than we would expect from random mutations. This would suggest that these areas of DNA are critical to life, and that animals with mutations in these areas tended not to survive long enough to reproduce.”
One of the initial questions Karlsson was able to investigate was how much diversity exists in the genome of a given species.
“If we are looking for early signals that a population might be threatened, and could benefit from intervention from conservation groups, we can find that in the genetic data,” said Karlsson. “Species with less biodiversity are likely to have fewer genetic differences between the DNA inherited from mom and the DNA inherited from dad. These species could be identified using genetic data before population numbers drop precipitously, and prioritized for in-depth study.”
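The comparison Karlsson describes, between the DNA inherited from each parent, can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the method used in the study: the function name `heterozygosity` and the two pairs of ten-base sequences are hypothetical, and real analyses work from sequencing reads across whole genomes rather than from two tidy strings.

```python
def heterozygosity(maternal, paternal):
    """Fraction of positions at which the two inherited copies differ.

    Both arguments are assumed to be equal-length strings covering the
    same stretch of the genome, one copy from each parent.
    """
    if len(maternal) != len(paternal):
        raise ValueError("the two copies must cover the same positions")
    if not maternal:
        return 0.0
    differing = sum(1 for m, p in zip(maternal, paternal) if m != p)
    return differing / len(maternal)


# Made-up example: one individual with few differences between its two
# copies, and one with more.
low_diversity = heterozygosity("ACGTACGTAC", "ACGTACGAAC")   # 1 of 10 differ
high_diversity = heterozygosity("ACGTACGTAC", "ACCTACGAAT")  # 3 of 10 differ

print(low_diversity, high_diversity)
```

A lower value means the two inherited copies are more alike, which is the kind of signal the quote associates with species that may be at risk.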
While looking for areas of similarities between species can lead to insights into human health and disease, Karlsson is also intrigued by genetic differences between species. “If you think of all the things other species can do that humans can’t, like hibernation,” said Karlsson. “Every year, animals that hibernate stock up on calories, they become insulin resistant, and they hibernate. Then they just bounce back. Humans cannot do that. It would be disastrous. What are the genes that control that? What does that mean and how does it relate back to how the human genome works? That’s the ultimate question.”