After about 24 years in the hands of Human Genome Project contributors, the human reference genome project, which aims to map our DNA, was handed over to the Genome Reference Consortium in 2004. Recently, for the first time, the latest update to the Consortium's GRCh38 assembly included a chromosome X assembled in full from end to end or, more accurately, from telomere to telomere.
While ongoing assembly work makes each release of the human reference genome more accurate, reading a chromosome from end to end remains a challenge. Hundreds of gaps are still waiting to be meticulously analyzed.
A new method for assembly: nanopore sequencing
What makes this process especially challenging is the repetitive sections, some spanning millions of base pairs, that make existing methods error-prone. The methods scientists have relied on read short pieces of DNA and stitch them together, but the short read length keeps them from accurately assembling long repetitive stretches.
Ideally, researchers want to stitch reads together with full dovetail overlaps, where the end of one read matches the start of the next, because this minimizes the risk of error. With the advent of a new sequencing method, this shortcoming has largely been addressed.
In a press release, senior author Adam Phillippy, from the National Human Genome Research Institute (NHGRI), said: “Imagine having to reconstruct a jigsaw puzzle. If you are working with smaller pieces, each contains less context for figuring out where it came from, especially in parts of the puzzle without any unique clues, like a blue sky.
The same is true for sequencing the human genome. Until now, the pieces were too small, and there was no way to put the hardest parts of the genome puzzle together.”
Thanks to nanopore sequencing, the method detailed in their research, this is no longer the case. In this method, DNA molecules are fed through a tiny pore, and the changes in electric current at the pore are measured as the molecule passes through. This enables much longer uninterrupted reads.
But the centromere, a notoriously repetitive region of the chromosome, still proved a challenge. In this case it was roughly 3.1 megabases long (3.1 million base pairs). Fortunately, the scientists found distinct sequences within it to anchor their data, and they closed all 29 gaps in this region.
To double-check their findings, the team also ran two other industry-standard sequencing methods and compared the results with their own. The results agreed at a rate of more than 99.9%.
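As a rough illustration of what an agreement figure like this means, the sketch below computes the percentage of matching positions between two sequences of equal length. The source does not say how the team measured agreement, so this is an assumed, simplified metric; real comparisons align the sequences first and account for insertions and deletions. The function name and toy data are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy data: two 1,000-base sequences that differ at a single position.
reference = "A" * 1000
candidate = "A" * 999 + "C"
print(f"{percent_identity(reference, candidate):.1f}%")  # 99.9%
```

At this rate, one disagreement per thousand bases is the most the two results could differ.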
The takeaway from this research
This research could substantially improve how we approach biomedical research. As lead author Karen Miga, from the UC Santa Cruz Genomics Institute, puts it, “We’re starting to find that some of these regions where there were gaps in the reference sequence are actually among the richest for variation in human populations, so we’ve been missing a lot of information that could be important to understanding human biology and disease.”
While the completion of the X chromosome has the spotlight, the researchers also announced that they plan to work through every remaining chromosome and put together a complete human genome by the end of 2020.