New Mathematical Model to Add Rigor to Studies of Disease Genetics and Evolution

USC College scientist develops model and software to better understand hotspots of genetic shuffling.
By Eva Emerson

USC College computational biologist Peter Calabrese has developed a new model to simulate the evolution of so-called recombination hotspots in the genome.

Published this month in the online edition of the Proceedings of the National Academy of Sciences, the mathematical model and its associated software bring much-needed precision to evolutionary investigations of how natural selection acts on individual genes, said Calabrese, a research assistant professor of biological sciences.

They may also aid the search for disease-associated genes within the human genome.

The new tools “are more rigorous and less time-consuming than previous, simpler models,” Calabrese said.

Recent interest in genetic recombination hotspots — one possible location where mutations and evolutionary change may occur in an organism — has been fueled partly by the promise of genome association studies, which try to locate the chromosomal regions responsible for genetic diseases. Such studies require an understanding of genetic recombination at a very fine scale.

Genes are packaged in larger structures called chromosomes. Humans have 23 pairs of matching chromosomes, with one member of each pair inherited from each parent. But a person’s sex cells (sperm or egg) each carry only one copy of each chromosome, and that copy contains a mix of genetic material from the person’s mother and father.

The mixing occurs through the process of genetic recombination. During the creation of new sperm or egg cells, the maternal and paternal copies of each chromosome line up and exchange stretches of DNA before dividing. In this way completely new chromosomes can arise, to be passed on to offspring.

This genetic re-arrangement and re-shuffling is a major source of genetic diversity, and so is considered the primary benefit of sexual reproduction. It’s the biological process that makes each individual (save identical twins) unique, even from close relatives such as siblings.

To the surprise of many, recent research indicates that most recombination occurs in small regions of the genome called hotspots. As scientists have explored details of this process, it’s become clear that the majority — approximately 80 percent — of recombination occurs at these narrow bands of activity. Hotspots make up only 10 to 20 percent of the human genome. Rates of recombination at a hotspot may be as much as hundreds to thousands of times that of the surrounding gene sequence. Little is known about hotspot origins or how they work.
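As a rough back-of-the-envelope check on those figures (an illustration only, not part of Calabrese’s model), the sketch below computes how much higher the average per-base recombination rate would be inside hotspots than outside them, assuming 80 percent of recombination falls in hotspots covering 10 or 20 percent of the genome. The function name is hypothetical.

```python
def average_enrichment(recomb_share, genome_share):
    """Ratio of the average per-base recombination rate inside vs. outside hotspots."""
    inside = recomb_share / genome_share               # rate density within hotspots
    outside = (1 - recomb_share) / (1 - genome_share)  # rate density elsewhere
    return inside / outside


# Article's figures: ~80% of recombination in 10-20% of the genome.
for genome_share in (0.10, 0.20):
    print(genome_share, round(average_enrichment(0.80, genome_share), 1))
```

Under these assumptions the genome-wide average works out to roughly 36-fold and 16-fold. Because this is an average over all hotspots and all non-hotspot sequence, it is compatible with individual hotspots being far more intense relative to their immediate surroundings.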

Scientists had identified a small number of human recombination hotspots over the last few years. An important advance came in 2005, when an Oxford University team estimated the location of approximately 25,000 potential hotspots on the human genome. In doing so, they assumed the locations of hotspots would not differ greatly between individuals.

However, comparisons of large sets of human genomic data (including data from the International HapMap Project, which created rough maps of the genomes of hundreds of people from all around the globe) have revealed a much more complex picture of hotspots across the human population. Work by geneticist Norman Arnheim, a USC Distinguished Professor and the Ester Dornsife Chair in Biological Sciences in the College, and his colleagues, among other scientists, shows that hotspots, like genes themselves, do vary across the population.

Arnheim, one of Calabrese’s collaborators, runs one of a handful of laboratories in the world that use sperm typing, a painstaking and powerful lab technique, to study genetic recombination in great detail. He and others have shown that some hotspots are heterogeneous: not everyone has the same hotspots at the same locations.

Calabrese’s model and software take these differences, as well as the chance that the rate of recombination might not be constant over time, into account, where older models did not.

“Pete has made a major contribution to the field through his pioneering computational approach,” Arnheim said.

Calabrese’s work moves scientists a step closer to understanding the evolution of hotspots and helps explain a number of puzzles in the field.

The first is that while chimpanzees, our closest primate relatives, share 99 percent of their genetic code with humans, studies have revealed almost no overlap in hotspot positions between the two genomes.

“The chimp-human comparison really was a surprise,” Calabrese said. “Even with a very similar DNA sequence, the chromosomal position of the chimps’ hotspots appear completely independent of hotspot positions in humans.”

Calabrese’s model fits with and helps to explain this finding. Since the last common ancestor of chimpanzees and humans lived 6 to 7 million years ago, the model predicts that enough time has passed for humans to evolve a distinct set of hotspots.

The model also fits with human evidence. Data from the HapMap Project, for example, shows that African-Americans and Asian-Americans have differences in the locations and frequency of some genomic hotspots, findings backed by other studies in a number of ethnic groups.

Only about 100,000 years have passed since the last major human migration out of Africa, Calabrese writes in his paper. His model indicates that this is not enough time for geographically separated populations to have evolved completely unique sets of hotspots.
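To put the two time spans side by side, the short sketch below converts them into generations. The years come from the figures above; the 25-year generation time is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
YEARS_SINCE_CHIMP_SPLIT = (6_000_000, 7_000_000)  # range given in the article
YEARS_SINCE_MIGRATION = 100_000                   # figure given in the article
GENERATION_YEARS = 25                             # assumption, for illustration only


def generations(years, generation_years=GENERATION_YEARS):
    """Whole number of generations that fit into a span of years."""
    return years // generation_years


for years in YEARS_SINCE_CHIMP_SPLIT:
    # Also show how many times longer this span is than the migration span.
    print(years, generations(years), years / YEARS_SINCE_MIGRATION)
print(YEARS_SINCE_MIGRATION, generations(YEARS_SINCE_MIGRATION))
```

Whatever generation time is assumed, the chimpanzee-human split is 60 to 70 times older than the migration out of Africa, which is the contrast the model’s two predictions rest on.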

To Calabrese, one of the most exciting applications of his model is how it might inform the discussion of the “hotspot paradox.”

The paradox considers how hotspot regions in the genome are identified by the cellular machinery that splices and then patches the long strands of DNA during recombination.

Previously, researchers identified at least one short sequence of DNA bases, called a DNA motif, that acts like a tag to identify a hotspot location. They showed that this motif was associated with about 10 percent of hotspots in humans.

The paradox arises from the idea that if a tag or sequence motif lies too close to the hotspot location, the high rate of DNA splicing near the hotspot is likely to eventually affect the motif itself. So, scientists expect that within relatively few generations, any tagging motif in the DNA sequence would disappear. Yet, scientists know that the hotspots remain.

Calabrese’s model incorporates a distance between the motif and the break point, so there is some probability that the motif will not be lost in a recombination event. Using experimental estimates for this distance, the model shows that motifs will not quickly disappear, thus possibly resolving the paradox.
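The intuition can be sketched with a toy calculation. This is not Calabrese’s model or software: it is a minimal deterministic sketch that assumes the stretch of DNA rewritten during repair has an exponentially distributed length, an infinitely large population, and made-up parameter values. All function and parameter names are hypothetical.

```python
import math


def motif_conversion_prob(distance_bp, mean_tract_bp):
    """Chance that a repair tract starting at the break reaches the motif.

    Toy assumption: tract lengths are exponentially distributed.
    """
    return math.exp(-distance_bp / mean_tract_bp)


def generations_until(p0, target, break_prob, distance_bp, mean_tract_bp,
                      max_generations=1_000_000):
    """Generations for the motif's frequency to fall from p0 to target."""
    loss = break_prob * motif_conversion_prob(distance_bp, mean_tract_bp)
    p, gen = p0, 0
    while p > target and gen < max_generations:
        # The motif is overwritten only when its carrier is paired with a
        # motif-free chromosome, which happens with probability (1 - p).
        p *= 1 - loss * (1 - p)
        gen += 1
    return gen


# Illustrative values only: motif at the break point vs. 1,000 bases away.
near = generations_until(0.5, 0.25, break_prob=0.01, distance_bp=0, mean_tract_bp=500)
far = generations_until(0.5, 0.25, break_prob=0.01, distance_bp=1000, mean_tract_bp=500)
print(near, far)
```

In this sketch, moving the motif away from the break point lowers the per-generation chance of losing it, so the motif takes several times longer to decline. That is the qualitative effect described above; the numbers themselves carry no biological weight.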

The model also considers other ways a hotspot may be identified by the cell’s machinery, such as a so-called epigenetic tag — a molecule attached to the DNA sequence but not encoded in the DNA sequence itself.

The simulation software allows scientists to compare DNA sequences to find hotspot patterns in the population, which may be important to understanding disease or evolution.

The simulation computer program, available free for download at Calabrese’s Web site, showed that existing software can reliably detect the most common (present in 50 percent or more of individuals) hotspots in large sets of human genome data, but probably misses the majority of rarer (present in less than 10 percent of individuals) hotspot sites.
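One plausible intuition for why rare hotspots are harder to find, offered here as an illustration rather than as the analysis in the paper, is that population data reflect an average over all chromosomes. The sketch below assumes a hypothetical hotspot that is 100 times more active than background on the chromosomes that carry it; the function name and the 100-fold value are assumptions.

```python
def population_average_rate(carrier_fraction, hotspot_fold, background=1.0):
    """Average rate at a site when only some chromosomes carry the hotspot."""
    return background * (carrier_fraction * hotspot_fold + (1 - carrier_fraction))


# Hotspot carried by 50%, 10% and 5% of chromosomes.
for fraction in (0.50, 0.10, 0.05):
    print(fraction, population_average_rate(fraction, hotspot_fold=100))
```

With these assumed values, a hotspot carried by half the population raises the average rate about 50-fold over background, while one carried by 5 percent raises it only about 6-fold, leaving a much weaker signal to detect.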

Clarifying where recombination hotspots are located and how they work is critical to building a fundamental understanding of the biological mechanisms that promote genetic variation, Calabrese said. Indirectly, the knowledge also may inform the work of scientists designing better, faster ways to search for genes thought to play a role in human disease.

The project was supported by a grant from the National Human Genome Research Institute’s Center for Excellence in Genomic Sciences at USC.