Nanopore sequencing for robust detection of DNA methylation signatures
Laboratory Genetics and Genomics
Primary Categories:
- Laboratory Genetics
Secondary Categories:
- Laboratory Genetics
Introduction:
Many neurodevelopmental disorders are caused by genomic variants that disrupt the function of genes involved in reading, writing, or removing epigenomic information (e.g., DNA methylation) across the genome. Over the past decade, DNA methylation profiling using Illumina EPIC DNA methylation microarrays, combined with machine learning, has led to the development of episignatures. Episignatures are characteristic DNA methylation profiles associated with specific genes or diseases and can be used to diagnose disease as well as to clarify variants of uncertain significance (e.g., missense or noncoding variants) in genes regulating the epigenome. Nanopore long-read whole-genome sequencing (LRS), which reads native DNA methylation, is a potential single assay for detecting genomic variants and establishing causality via episignature classification. We investigated the capacity of LRS to leverage episignatures developed on the EPIC microarray to classify pathogenic variants in several genes and loci, including KANSL1, KMT2D, EHMT1, and the Prader-Willi syndrome (PWS) locus.
Methods:
We acquired blood samples from patients harboring pathogenic variants and from control individuals. High molecular weight DNA was isolated and sequenced on a nanopore flow cell (P24, R10.4.1). POD5 files were basecalled using Dorado and processed to BAM and MethylBed formats using EPI2ME/wf-human-variation. Data were reformatted with custom scripts and analyzed within EpigenCentral for all available episignatures. For control samples, we compared beta values from the EPICv2 DNA methylation microarray to MethylBed methylation frequencies to assess correlation and concordance between platforms. Additionally, a control sample was sequenced multiple times to measure assay variability.
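The cross-platform comparison described above can be illustrated with a minimal sketch. It is not the custom scripts used in this study; the function names (`pearson`, `compare_platforms`), the in-memory input layout, and the `min_cov` coverage threshold of 10 reads are all assumptions chosen for illustration. Array beta values are taken on a 0 to 1 scale and nanopore methylation frequencies as percentages with a per-site read coverage.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def compare_platforms(array_beta, nanopore_pct, min_cov=10):
    """Compare array beta values with nanopore methylation frequencies.

    array_beta:   {(chrom, pos): beta}, beta in [0, 1]         (hypothetical layout)
    nanopore_pct: {(chrom, pos): (percent_methylated, coverage)} (hypothetical layout)
    min_cov:      assumed minimum read coverage for a CpG to be compared
    """
    # Keep only CpGs measured on both platforms with enough nanopore reads.
    shared = sorted(
        site
        for site in array_beta
        if site in nanopore_pct and nanopore_pct[site][1] >= min_cov
    )
    betas = [array_beta[site] for site in shared]
    # Rescale percentages to the 0-1 range used by beta values.
    freqs = [nanopore_pct[site][0] / 100 for site in shared]
    return {
        "n_sites": len(shared),
        "pearson_r": pearson(betas, freqs),
        "mean_abs_diff": sum(abs(a - b) for a, b in zip(betas, freqs)) / len(shared),
    }
```

The same function could be applied to two nanopore runs of the same control sample (after converting one run to the 0 to 1 scale) to summarize run-to-run variability. Reporting the mean absolute difference alongside the correlation is a design choice: correlation alone can be high even when one platform is systematically offset from the other.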
Results:
Analysis of control samples sequenced multiple times showed significant concordance between nanopore sequencing runs, and comparison to the EPIC microarray demonstrated significant correlation between the two independent measures of CpG methylation. Control samples were negative for all tested episignatures. Samples with pathogenic variants sequenced with LRS showed recovery of the pathogenic variants and positive classification for the corresponding episignatures.
Conclusion:
Our study demonstrates that LRS can serve as a single assay to identify disease-associated variants and infer their pathogenicity using the CpG methylation information inherent to the sequencing data. We also demonstrated the diagnostic utility of EpigenCentral for classifying variants in several epigenetic regulators using LRS data. Further evaluation of positive and negative samples in a clinical laboratory setting will facilitate clinical validation and support the adoption of LRS as a single diagnostic assay for patients with neurodevelopmental disorders and of EpigenCentral for variant classification.