First complete sequence of the human genome published

4th Apr 2022

After more than 20 years, the final pieces of the human genome puzzle are put in place.

On the 1st of October 1990, humanity embarked on a voyage to uncover the secrets of the human genome. This expedition into the darkest corners of our existence unravelled the blueprint that nature uses to create a new member of the species Homo sapiens.



Over the first 13 years, an international team of scientists mapped most of the protein-coding genes in our bodies, but left unresolved a small portion of so-called "junk DNA" made up of long stretches of highly repetitive sequence. This "insignificant" remaining 8% turned out to be roughly 200 million base pairs (bp) long, bringing the complete genome to a mind-boggling 3.055 billion bp.



Until recently, the sequences of the short arms of chromosomes 13, 14, 15, 21, and 22 were a black box. That changed when the Telomere-to-Telomere (T2T) Consortium came along with new sequencing methods, which allowed it not only to uncover a whole chromosome's worth of new base pairs but also to correct some significant mistakes made years ago.



“Genomes that we generate in the lab can have many errors in them. If even just one or a few base pairs are wrong, that can have big consequences for the overall accuracy of the genomic sequence. Stretches of identical base pairs, such as AAA, are hard for existing technology to assess. There are often errors in those sequences, even now. Merfin corrects them,” says Giulio Formenti, a postdoctoral researcher developing Merfin in Erich Jarvis’ lab.



Merfin is a k-mer-based variant-filtering algorithm that improves both genotyping accuracy and the polishing of genome assemblies. It evaluates each variant against the k-mer multiplicity expected from the reads, so the verdict does not depend on the quality of the read alignment or on the variant caller’s internal score. In several benchmarks it increased the precision of genotyped calls, and it reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads.
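
To make the idea of checking a variant against k-mers a little more concrete, here is a toy sketch in Python. It is not Merfin's implementation: the function names, the tiny sequences and the simple "fewer missing k-mers" rule are all assumptions made for illustration, whereas the real tool compares observed and expected k-mer multiplicities in a more careful way. The sketch only shows why a correction can be judged from the reads' k-mers alone, with no alignment involved.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often every k-mer (substring of length k) occurs in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmers_over(seq, pos, length, k):
    """Return the k-mers of seq that overlap the interval [pos, pos + length).

    Assumes length >= 1, i.e. alleles are non-empty (VCF-style, with an anchor base).
    """
    first = max(0, pos - k + 1)
    last = min(len(seq) - k, pos + length - 1)
    return [seq[i:i + k] for i in range(first, last + 1)]


def missing_kmers(seq, pos, length, k, read_counts):
    """Number of k-mers over the interval that never occur in the reads."""
    return sum(1 for kmer in kmers_over(seq, pos, length, k) if read_counts[kmer] == 0)


def apply_variant(seq, pos, ref_allele, alt_allele):
    """Return seq with ref_allele at pos replaced by alt_allele."""
    if seq[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the sequence")
    return seq[:pos] + alt_allele + seq[pos + len(ref_allele):]


def keep_variant(seq, pos, ref_allele, alt_allele, k, read_counts):
    """Hypothetical filter rule: keep a variant only if applying it leaves
    fewer k-mers unsupported by the reads than the unchanged sequence has."""
    edited = apply_variant(seq, pos, ref_allele, alt_allele)
    before = missing_kmers(seq, pos, len(ref_allele), k, read_counts)
    after = missing_kmers(edited, pos, len(alt_allele), k, read_counts)
    return after < before


if __name__ == "__main__":
    # Made-up data: error-free "reads" sampled from a made-up true sequence.
    truth = "GATTACAGGCTTAACCGTGA"
    reads = [truth[i:i + 10] for i in range(0, 11, 2)]
    # A draft assembly carrying one wrong base at position 9 (A instead of C).
    draft = "GATTACAGGATTAACCGTGA"
    counts = count_kmers(reads, k=5)
    print(keep_variant(draft, 9, "A", "C", 5, counts))  # a genuine correction
    print(keep_variant(draft, 3, "T", "G", 5, counts))  # a spurious change
```

In this sketch the correction is kept because every k-mer of the edited sequence is seen in the reads, while the spurious change is rejected because it creates k-mers the reads never contain.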



This grand achievement brings us closer to fully understanding how diseases, including cancer, work. Does this mean that we are a step closer to better, or even personalised, medicine? Maybe, but there is still a lot to learn before we reach that point. What we can be sure about is that this project and others like it have greatly influenced research and medical practice, bringing different fields together to reach a common goal.


References: 

Filling the gaps


The human genome is, at long last, complete


Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation


The human genome project: big science transforms biology and medicine