Advanced computational approaches for understanding allele-specific biology of complex diseases
From Isabelle Hanlon
Reconstructing the complete phased sequence of every chromosome copy in human and non-human species is important for medical, population and comparative genetics. Unprecedented advances in sequencing technologies have opened up new avenues for reconstructing these phased sequences, which would enable a deeper understanding of the molecular, cellular and developmental processes underlying complex diseases. Despite these sequencing innovations, highly polymorphic and gene-dense regions such as the human leukocyte antigen (HLA) region are not yet fully phased in the reference genome. The reference genome also still contains gaps in multi-megabase repetitive regions, so the annotation of novel expression and methylation results is incomplete and inaccurate, which affects the interpretation of the molecular genetics and epigenetics of diseases. There is a pressing need for streamlined, production-level, easy-to-use computational approaches that can reconstruct high-quality, chromosome-scale phased sequences and that can be applied to hundreds of human genomes.
In this talk, I will first present a combinatorial optimization formulation of, and solution to, the haplotype reconstruction problem that leverages new long-range strand-specific sequencing technology and long reads to generate chromosome-scale phasing. Second, I will present an efficient graph-based algorithm for accurate haplotype-resolved assembly of human individuals. The advantage of graphs is that they provide a compact representation of massive datasets, allowing them to be integrated in a common genome sequence space. This method takes advantage of a new long, accurate data type (PacBio HiFi) together with long-range Hi-C data. For the first time, we can generate accurate chromosome-scale phased assemblies with a base-level accuracy of Q50 (about one error per 100,000 bases) and a contiguity of 25 Mb within 24 hours per sample, setting a milestone for the genomics community. Third, I will present a generalized computational approach that works with any type of sequencing data, for different numbers of haplotypes and repeat variation. Finally, I will discuss the importance of haplotype-resolved assemblies for various medical applications.
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh 2021 and may only be used in accordance with the terms of the licence.