A huge team of scientists finally finishes decoding the last 8% of the human genome
A team of 99 researchers from across the globe published a complete draft of the human genome today in the academic journal Science.
The breakthrough comes nearly twenty years after the Human Genome Project made a similar claim while leaving out sections of DNA that were then believed to be unimportant.
“The sequence means that we've entered the beginning of a new frontier,” neurogeneticist Erich Jarvis, who is a co-author on the new paper, tells IE.
“With complete genomes, I can start to ask new questions of biology that were not possible before,” he says.
This breakthrough was decades in the making
The new dataset amounts to an extraordinarily long sequence of just four letters — A, T, C, and G — that represent the four molecules that encode a person’s genes.
Unlike the genome unveiled in 2003, today’s announcement includes highly repetitive (but crucially important) regions of the genome that were too difficult for researchers to parse in the 1990s and early 2000s. The full genome is more than three billion letters long. That means that if it were printed in 12-point font, your genetic code would stretch all the way from Houston to Boston.
The breakthrough was possible because scientists have better technology and a more refined understanding of genomics than they did two decades ago. It also took a lot of collaboration.
The researchers refined older techniques
Almost every cell in the body contains a person’s entire genome, recorded in the exact molecular structure of their DNA. The molecules represented by A, T, C, and G are arranged in a sequence along the length of the DNA. If it were unraveled and stretched out, the DNA contained in a single cell would be roughly eight feet long. Of course, that’s not how DNA exists inside our cells. Evolution has led living things to discover all kinds of innovative ways to fold DNA into a set of packages so small they easily fit inside a cell’s nucleus.
Researchers read DNA by chopping it up into pieces that are small enough for existing technology to manage. One reason researchers were able to decipher the complete genome now is that newer machines can read longer pieces than ever before. Under ideal circumstances, a cutting-edge machine can read DNA fragments that are a few hundred thousand base pairs long. Some of these machines work by threading a strand of DNA through a microscopic pore.
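To see why longer reads matter for repetitive stretches of DNA, here is a minimal Python sketch. It is a toy illustration, not the researchers' method: the sequence, the 20-letter repeat unit, the read lengths, and the helper names `rand_seq` and `unique_fraction` are all assumptions made up for this example, and the reads are treated as error-free. It asks what fraction of possible reads of a given length appear at exactly one place in the sequence.

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed so the toy sequence is reproducible


def rand_seq(n):
    """Return n random letters drawn from A, C, G, T (hypothetical helper)."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def unique_fraction(genome, read_len):
    """Fraction of error-free reads of length read_len that occur exactly once."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


# Toy "genome": 300 random letters, a 500-letter repeat array, 300 random letters.
unit = rand_seq(20)
genome = rand_seq(300) + unit * 25 + rand_seq(300)

for read_len in (50, 200, 600):
    print(read_len, round(unique_fraction(genome, read_len), 3))
```

In this toy setup, reads shorter than the repeat array can fall entirely inside it and look identical to their neighbors, so there is no way to tell where they belong. Reads longer than the array always include some of the surrounding unique sequence, which anchors them to one spot.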
“The DNA is physically going through this pore,” Jarvis says. “As it passes through, the pore reads off the different base pairs.”
The researchers didn’t read just one copy of the DNA. They cultured special cells to produce dozens of identical copies, which were chopped into fragments and read simultaneously.
“Imagine your phone is a very thin wafer, filled with millions of pores, and you have the DNA going through all the pores at the same time… you want the same sequence to come through somewhere between 30 to 50 times,” he says. “Then you want to average the information.”
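The idea of reading the same stretch many times and then combining the results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple majority vote at each position as a stand-in for "averaging," it assumes the reads are already lined up and the same length, and the 5 percent error rate, the 40 reads, and the function names `noisy_read` and `consensus` are invented for the example. Real pipelines are far more involved.

```python
import random
from collections import Counter

BASES = "ACGT"


def noisy_read(truth, error_rate, rng):
    """Copy truth, swapping each letter for a different one with probability error_rate."""
    out = []
    for base in truth:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(reads):
    """Majority vote at each position across equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


rng = random.Random(0)  # fixed seed so the example is reproducible
truth = "".join(rng.choice(BASES) for _ in range(200))

# Assumed numbers: 40 reads of the same stretch, each with about 5% errors.
reads = [noisy_read(truth, 0.05, rng) for _ in range(40)]
called = consensus(reads)

mismatches = sum(a != b for a, b in zip(called, truth))
print("mismatches after consensus:", mismatches)
```

Because the errors in each read land in different places, the correct letter wins the vote at nearly every position even though individual reads are imperfect.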
Fewer errors helped the researchers assemble the complete genome
That redundancy makes it possible to find and fix errors. Errors not only present a hurdle to scientists who will use this dataset in their research down the line, they also add a layer of difficulty for the researchers tasked with reassembling the fragments into a complete genome.
Commercially available algorithms are able to get roughly 97 or 98 percent of the sequence correct, Jarvis says, “but the remaining two percent still has errors in it.” Those errors present a tremendous challenge when they occur in highly repetitive and “hard-to-sequence regions where it's hard to sort out one copy from another copy.”
A member of Jarvis's lab, Giulio Formenti, developed an algorithm that serves as “the last check of sequence accuracy… to clean up the last remaining two percent,” Jarvis says.
That contribution — among many others from researchers across the world — made it possible for these researchers to fill in the missing sections of the genome.
The researchers plan to sequence a lot more genomes
But this is hardly the end of the effort to decode and understand the human genome and its impact on organisms. Bioinformatician Adam Phillippy, a co-leader of the project, says “[t]ruly finishing the human genome sequence was like putting on a new pair of glasses. Now that we can clearly see everything, we are one step closer to understanding what it all means.”
Having one complete genome puts us a big step closer to the kind of personalized medicine that researchers have been talking about for decades. “In the future, when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their healthcare,” Phillippy says.
The new genome is also an important step for researchers who need a complete genome for other reasons. Jarvis is co-leading an effort to sequence hundreds of complete genomes from people around the world.
“The goal is to create as complete a human genome as possible, representing much more of human diversity,” he says.