REVIEW ARTICLE


The Challenge of Genome Sequence Assembly



Andrew Collins*
Genetic Epidemiology and Bioinformatics Research Group, Faculty of Medicine, Duthie Building (MP 808), University of Southampton, Southampton General Hospital, Southampton, SO16 6YD, UK





© 2018 Andrew Collins.

Open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: (https://creativecommons.org/licenses/by/4.0/legalcode). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to the author at the Genetic Epidemiology and Bioinformatics Research Group, Faculty of Medicine, Duthie Building (MP 808), University of Southampton, Southampton General Hospital, Southampton, SO16 6YD, UK; Tel: 44(0)2381206939; E-mail: arc@soton.ac.uk


Abstract

Background:

Although whole genome sequencing is enabling numerous advances in many fields, achieving complete chromosome-level sequence assemblies for diverse species remains difficult. The problems partly reflect the limitations of current sequencing technologies. Chromosome assembly from ‘short read’ sequence data is confounded by repetitive genome regions containing numerous similar sequence tracts that cannot be accurately positioned in the assembled sequence. Longer sequence reads often have higher error rates and may still be too short to span the larger gaps between contigs.
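Why repeats defeat short reads can be illustrated with a toy example. The sketch below assumes exact, error-free, forward-strand matching on a hypothetical 31-bp genome containing two copies of an 8-bp repeat; the names `match_positions`, `REPEAT`, `genome`, `short_read` and `long_read` are illustrative only and are not taken from any assembler. A read lying wholly inside the repeat matches both copies, whereas a read long enough to reach unique flanking sequence has a single placement.

```python
def match_positions(genome: str, read: str) -> list[int]:
    """Return every 0-based start index where `read` matches `genome` exactly.

    Assumption: forward strand only and no sequencing errors (a deliberate
    simplification; real aligners also consider reverse complements and
    mismatches).
    """
    k = len(read)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == read]


# Hypothetical toy genome: an 8-bp repeat occurs twice, separated by
# unique flanking sequence.
REPEAT = "ACGTACGT"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

# A short read lying wholly inside the repeat matches both copies,
# so its true origin cannot be determined.
short_read = "ACGTAC"
print(match_positions(genome, short_read))  # [5, 18]

# A longer read spanning the repeat and reaching unique sequence on
# both sides has exactly one placement.
long_read = "GGATC" + REPEAT + "C"
print(match_positions(genome, long_read))  # [13]
```

The same logic explains the trade-off noted above: longer reads resolve repeats only when they are longer than the repeat itself, and in practice their higher error rates mean exact matching like this would have to be replaced by tolerant alignment.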

Objective:

Given the emergence of exciting new applications of sequencing technology, such as the Earth BioGenome Project, it is necessary to further develop and apply a range of strategies to achieve robust chromosome-level sequence assembly. This review covers methods to enhance assembly, including the use of cross-species synteny to understand relationships between sequence contigs, the development of independent genetic and/or physical scaffold maps as frameworks for assembly (for example, radiation hybrid, optical motif and chromatin interaction maps) and the use of patterns of linkage disequilibrium to help locate, order and orient contigs.

Results and Conclusion:

A range of existing methods could be further developed to facilitate cost-effective, large-scale sequence assembly for diverse species. A combination of strategies is required to best assemble sequence data into chromosome-level sequences. There are several routes to developing maps that span whole chromosomes (including physical, genetic and linkage disequilibrium maps), and constructing these whole-chromosome maps greatly facilitates the ordering and orientation of sequence contigs.
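How a chromosome-wide map helps order and orient contigs can be sketched in a few lines. This is a simplified illustration, not the method of any particular study: it assumes each contig carries markers whose positions are known both within the contig (in bp) and on an independent map (in arbitrary map units, e.g. cM on a genetic map), orders contigs by the median map position of their markers, and orients each by a majority vote on whether map position rises or falls along the contig. The input layout, the functions `orientation` and `order_and_orient`, and the example data are all hypothetical.

```python
from statistics import median

# Hypothetical input: for each contig, (position_in_contig_bp, map_position)
# pairs for markers placed on an independent chromosome-wide map
# (for example a genetic, radiation hybrid or linkage disequilibrium map).
Anchors = dict[str, list[tuple[int, float]]]


def orientation(pairs: list[tuple[int, float]]) -> str:
    """'+' if map position mostly rises along the contig, '-' if it mostly
    falls, '?' if undetermined (fewer than two markers, or a tie)."""
    if len(pairs) < 2:
        return "?"
    ordered = sorted(pairs)  # sort markers by position within the contig
    steps = [b[1] - a[1] for a, b in zip(ordered, ordered[1:])]
    up = sum(1 for s in steps if s > 0)
    down = sum(1 for s in steps if s < 0)
    if up > down:
        return "+"
    if down > up:
        return "-"
    return "?"


def order_and_orient(anchors: Anchors) -> list[tuple[str, str]]:
    """Order anchored contigs by the median map position of their markers
    and attach an orientation; contigs with no markers are left unplaced."""
    placed = [
        (median(m for _, m in pairs), name, orientation(pairs))
        for name, pairs in anchors.items()
        if pairs
    ]
    placed.sort()  # ascending median map position
    return [(name, strand) for _, name, strand in placed]


# Hypothetical example data.
anchors: Anchors = {
    "ctg1": [(1_000, 12.0), (50_000, 12.8), (90_000, 13.5)],  # rising map
    "ctg2": [(2_000, 5.1), (40_000, 4.2)],                    # falling map
    "ctg3": [(500, 20.3)],                                    # one marker
    "ctg4": [],                                               # unanchored
}
print(order_and_orient(anchors))
# [('ctg2', '-'), ('ctg1', '+'), ('ctg3', '?')]
```

The example shows the practical limits discussed above: a contig with a single marker can be placed but not oriented, and a contig with no markers cannot be placed at all, which is why denser whole-chromosome maps improve assembly.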

Keywords: Chromosome assembly, Cross-species synteny, Earth BioGenome Project, Linkage disequilibrium map, Sequence contigs, Whole genome sequencing.