Breakthroughs in genomics show that 'junk' DNA is incredibly important
One of the most important pieces of news to come out of science this year sounds a bit, well, familiar. In March, a team of dozens of researchers announced that they'd published a complete draft of the human genome. For the first time.
But wait: didn't the Human Genome Project make the same claim 20 years ago?
I asked neurogeneticist Erich Jarvis, a member of the team behind this year's announcement, to explain.
“I had the same response as you did before I got involved in genomics in this way,” he said. But in Jarvis's case, the surprise didn't stem from idle curiosity or journalistic skepticism. Jarvis, now a professor at The Rockefeller University and an investigator at the Howard Hughes Medical Institute, was an assistant professor at Duke University when the Human Genome Project made its earth-shaking announcement in 2003.
How did humans evolve language?
Jarvis has spent his academic career trying to solve some challenging puzzles. “I'm fascinated by some of the more complex things we [humans] can do, like speech, consciousness, and theory of mind. I’m also interested in brain evolution,” he says. “Putting all of that together, I've been interested in understanding the genetics behind our ability to imitate sounds.”
“Imitating sounds” may seem like an underwhelming way to describe human language. Still, that broader definition is vital to scientists like Jarvis because it makes it possible to see what humans have in common with the handful of other animals that are also capable of imitating sounds. That short list includes songbirds, hummingbirds, parrots, dolphins, some whales, and bats.
“Somehow, in the last 65 million years, eight or so lineages evolved this ability to imitate sounds,” Jarvis says. That's a remarkable timeline because those animals aren't especially close evolutionary cousins. Humans and birds set off on different evolutionary paths roughly 300 million years ago. Even more incredibly, the species that can imitate do so in similar ways. They apparently arrived at the same place independently.
Genomes may hold important clues
Twenty years ago, Jarvis and his graduate students at Duke turned to the relatively new science of genomics to hunt for genetic similarities across those species with the rare ability to imitate sounds. But there was a problem. When they started trying to compare the genomes, they realized that many of the long sequences of A, T, C, and G had severe issues.
The human genome wasn’t so bad “because there was a lot of effort put into it,” but “many other species had incorrect sequences or they had pieces of the sequence that were missing,” he says.
It soon became clear that the human genome was missing essential elements as well, especially in areas of the DNA that are highly repetitive, which might read something like CGGCGGCGGCGGCGG. Those typically occur at the telomeres at the ends of a chromosome or at the centromere near its middle. The technology of the day wasn't suited to sequencing those sections (and they seemed dull), so the Human Genome Project simply didn't include them in its definition of the entire human genome.
But that wasn't good enough for researchers like Jarvis.
"I needed a lot of genomes for comparative analysis, including humans. And I ended up getting involved in genomics as a neuroscientist because my students were suffering from having this missing sequence," he says.
Creating high-quality genomes is incredibly difficult
Sequencing a genome isn't as simple as unraveling a chromosome, feeding the whole thing through a machine, and reading the long strand of DNA in its entirety. In reality, several new techniques have been developed since the Human Genome Project. One involves watching DNA polymerase molecules as they copy DNA, using a high-speed camera and microscope, and incorporating different colors of bright dyes, one for each base. Another new technology involves using nanopores to detect and record each nucleotide base — A, T, C, or G — as it passes through.
This happens on a vast scale. The DNA is traveling through millions of pores simultaneously, Jarvis says. But the technology isn’t perfect, so researchers have to use many copies of the same DNA sequence to get reliable results. “You want the same sequence to come through the pores somewhere between 30 to 50 times [so] you can average the information,” Jarvis says. “Any one individual sequence has errors in it.”
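To make the idea of "averaging" repeated reads concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the method Jarvis's team uses: the reads are invented toy data, they are assumed to be already aligned and of equal length, and the function name `consensus` is hypothetical. At each position it simply keeps the base that most of the reads agree on.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote consensus over aligned reads of equal length.

    Assumption: the reads are already lined up position by position.
    Real pipelines must align noisy reads first; this sketch skips that.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes reads of equal length")
    # zip(*reads) yields one column of bases per position;
    # keep the most frequent base in each column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Hypothetical toy data: five noisy reads of the same 12-base stretch.
reads = [
    "ACGTTGCAAGCT",
    "ACGTTGCATGCT",  # one wrong base
    "ACGATGCAAGCT",  # one wrong base
    "ACGTTGCAAGCT",
    "ACGTTGCAAGGT",  # one wrong base
]

print(consensus(reads))  # prints ACGTTGCAAGCT
```

Each individual read above may contain an error, but because the errors fall in different places, the majority at every position recovers the underlying sequence. With 30 to 50 copies instead of five, the vote becomes correspondingly more reliable.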
But that's not the hard part. Once the pieces have been sequenced, researchers have to figure out how the sequences go together. For the whole human genome, that’s about 30 million pieces.
Some stretches of the genome are relatively easy to put together because the sequences are distinctive. Luckily for researchers, most of the genes scientists were studying 20 years ago were located in areas of the genome that were relatively easy to keep track of. By contrast, the highly repetitive stretches of DNA were so boring that some people called them "junk DNA." And that was just as well because sequencing the junk regions (imagine CGGCGGCGGCGGCGG ad nauseam) was beyond what researchers could do at that time.
New technologies and methods made a complete human genome possible
The team of researchers behind the new sequence has a few advantages over the Human Genome Project researchers. For one, modern sequencing machines can eliminate much of the work of stitching the pieces together because they can sequence longer segments of chopped-up DNA. Researchers have also figured out how to use cells from a hydatidiform mole, an unusual pregnancy complication, as a sample that contains DNA from only one parent, simplifying the entire process.
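A toy Python example can show why longer segments make the stitching easier. Everything here is an assumption made for illustration: the "genome" is an invented 28-base string, the helper `placements` is hypothetical, and exact string matching stands in for the error-tolerant alignment that real assembly software performs. The sketch asks a simple question: in how many places could a given read fit?

```python
def placements(genome, read):
    """Return every start position where `read` matches `genome` exactly.

    Overlapping matches are included. Exact matching is a simplification;
    real reads contain errors and need tolerant alignment.
    """
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Invented toy genome: a CGG repeat flanked by distinctive sequence.
genome = "ATGCA" + "CGG" * 6 + "TTAGC"

# A short read that falls entirely inside the repeat.
short_read = "CGGCGG"

# A longer read that spans the whole repeat plus a bit of each flank.
long_read = "GCA" + "CGG" * 6 + "TTA"

print(placements(genome, short_read))  # prints [5, 8, 11, 14, 17]
print(placements(genome, long_read))   # prints [2]
```

The short read fits equally well in five places, so there is no way to tell where it belongs or how long the repeat really is. The long read reaches the distinctive sequence on both sides of the repeat and fits in exactly one place.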
They also know that "junk" DNA is actually essential. Some of those C- and G-rich regions include what researchers call promoters. It turns out that promoters regulate which genes produce proteins in which cells.
Those regions regulate "how much of a protein you make in the brain, how much of another protein is made in the liver," Jarvis says. They determine "whether a gene is turned off or turned on at a certain time."
That's a big deal.
"If you make too much of a protein, that can cause disease. If you make too little, that can cause disease," he says.
Changes to promoters can also drive evolution. "If you [start to] make a lot of a protein in the brain that wasn't made before, you can cause a new connection to form a new trait, like speech," Jarvis says.
"I wasn't sure if I [would] need a complete genome for my vocal learning and language studies, but once you start to discover the missing 8 percent [from the Human Genome Project sequence] and the missing whatever percent from the other [species' genomes] that we were comparing, you start to realize that in that dark matter are some nuggets that seem to be correlated with traits of interest for a number of people [studying all different kinds of questions]," he says.
Those more accurate genomes have also helped him answer some questions.
The answers to Jarvis's big questions are finally in sight
After comparing the eight lineages that can imitate sounds, “we’ve found that the regulation of hundreds of genes that control speech has occurred in a similar way,” Jarvis says. “In these vocal learning birds and humans, a similar brain circuit has evolved.”
Jarvis thinks “the human speech circuit has evolved out of a surrounding brain pathway that controls learning how to move.” That pathway developed excellent control over the larynx, tongue, and jaw in humans. Similar circuits give birds precise control over their beaks.
But there are still some big questions. What exact genetic changes caused the shift? Why did this happen multiple times in distantly related species? Was it the same genes? What evolutionary forces caused the changes?
Jarvis and his colleagues need time — and genomes — to figure it out.