Conference Program

Utilizing long-read sequencing technologies to enable discovery of disease genes associated with complex neurological phenotypes

Education and Research Strategies
  • Primary Categories:
    • Genomic Medicine
  • Secondary Categories:
    • Genomic Medicine
Introduction:
Diagnosing rare diseases remains a significant challenge: many conditions go unexplained despite prolonged diagnostic odysseys involving routine genome sequencing techniques. There are approximately 7,000 rare diseases, around 80% of which have a genetic origin. Although individually rare, these diseases collectively affect 1 in 10 people in the US, representing a growing public health issue. Complex neurological phenotypes, such as ataxia, pose particular diagnostic and research difficulties because of their clinical and genetic heterogeneity. Ataxia, characterized by impaired coordination, can arise from a variety of etiologies, including genetic factors. Despite the increasing availability of advanced genetic testing, a substantial proportion of patients remain undiagnosed, highlighting critical gaps in the ability to detect and interpret complex genomic changes. Emerging evidence suggests that these unexplained phenotypes are often associated with repeat expansions, structural variants, and other genetic alterations that are difficult to identify using traditional sequencing methods. Consequently, we are leveraging long-read sequencing technologies to uncover the molecular underpinnings of these rare neurological diseases.

Methods:
High molecular weight DNA was extracted from blood (buffy coat) using the Nanobind CCB Big DNA kit (Circulomics), and its concentration was measured using the Qubit fluorometry assay and a NanoDrop 8000 spectrophotometer. DNA was sheared with the Megaruptor 3 system and ligated with SMRTbell adaptors, and libraries were prepared for loading onto the PacBio Sequel II System at the Mayo Clinic Genome Analysis Core, utilizing 2 SMRT cells per subject to ensure optimal coverage. Analysis of complex neurological phenotypes was performed using a custom bioinformatics workflow that employed HiFi reads (>99% accuracy) aligned to the GRCh38 reference genome. Single-nucleotide variants (SNVs) and indels were called with DeepVariant, while structural variants were identified using pbsv, Sniffles, and TRGT. Variants were annotated using snpEff and AnnotSV. Tandem repeat regions, as well as a catalog of over 50 pathogenic repeats linked to genetic disorders, were analyzed for repeat expansions, with results visualized using the Integrative Genomics Viewer (IGV).

Results:
Our cohort consists of high-priority cases (n = 13) with adult-onset ataxia, a positive family history, and cerebellar atrophy observed on MRI. With long-read sequencing, we identified pathogenic events in RFC1 in 2 probands, both of whom had clinical diagnoses of cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS). CANVAS is known to be caused by either biallelic pentanucleotide repeat expansions in RFC1 or a repeat expansion in trans with a truncating variant. In proband 1 (age of onset: 35 years), we identified a biallelic pentanucleotide repeat expansion (AAGGG) in the RFC1 gene, with 448 and 991 repeats. In proband 2 (age of onset: 22 years), we detected an AAGGG repeat expansion with 1511 repeats in trans with a truncating variant, c.1267C>T (p.Arg423Ter).
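
To give a sense of scale for the repeat counts reported above, they can be converted to approximate base-pair lengths. The following is a minimal sketch, not part of the study's workflow: the function name is hypothetical, and it assumes every unit is a perfect, uninterrupted 5-bp AAGGG motif.

```python
# Illustrative only: converts the reported repeat counts to approximate lengths.
# Assumption: each repeat unit is a perfect, uninterrupted copy of the motif.
MOTIF = "AAGGG"  # pentanucleotide motif reported in RFC1


def expansion_length_bp(repeat_count: int, motif: str = MOTIF) -> int:
    """Approximate expansion length in base pairs (hypothetical helper)."""
    return repeat_count * len(motif)


# Repeat counts as stated in the Results section.
reported = {
    "proband 1, allele 1": 448,
    "proband 1, allele 2": 991,
    "proband 2, expanded allele": 1511,
}

lengths = {label: expansion_length_bp(n) for label, n in reported.items()}

for label, bp in lengths.items():
    print(f"{label}: {bp} bp")
```

Under that assumption the three expanded alleles span 2,240 bp, 4,955 bp, and 7,555 bp respectively.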

Conclusion:
In our cohort of 13 patients with complex neurological phenotypes, this approach yielded a diagnosis in 2 patients, both of whom carry the intronic AAGGG repeat expansion in the RFC1 gene (the reference sequence at this locus contains 11 AAAAG repeats). Expansion of this tandem repeat was first reported to be pathogenic in 2020 and is now recognized as a frequent cause of late-onset ataxia, particularly when sensory neuropathy and vestibular areflexia are present. The smallest pathogenic expansion reported is 250 repeats. The RFC1 locus is highly polymorphic and contains multiple motifs; however, AAAGG (size-related) and biallelic AAGGG repeats are the disease-causing variants. Thus, long-read sequencing enables the detection and analysis of these complex repeat expansions, facilitating accurate diagnoses, novel gene discoveries, and a deeper understanding of repeat-mediated diseases, and offering hope to patients with previously undiagnosed or misdiagnosed genetic conditions.
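
The genotype patterns described in this abstract (biallelic AAGGG expansion, or one AAGGG expansion in trans with a truncating variant, with 250 repeats as the smallest reported pathogenic size) can be written as a simple rule. The sketch below is a simplified illustration under those stated assumptions, not the workflow used in the study and not a clinical classifier. All names are hypothetical, phasing is supplied by the caller as a flag, and the size-related AAAGG motif is deliberately left out.

```python
from dataclasses import dataclass
from typing import List

# Values taken from the abstract text; everything else here is hypothetical.
PATHOGENIC_MOTIF = "AAGGG"
MIN_PATHOGENIC_REPEATS = 250  # smallest pathogenic expansion cited in the text


@dataclass
class RepeatAllele:
    motif: str
    repeats: int


def is_expanded(allele: RepeatAllele) -> bool:
    """True if the allele carries the AAGGG motif at or above the cited size."""
    return (
        allele.motif == PATHOGENIC_MOTIF
        and allele.repeats >= MIN_PATHOGENIC_REPEATS
    )


def classify_rfc1(alleles: List[RepeatAllele], truncating_in_trans: bool = False) -> str:
    """Apply the two genotype patterns described in the text.

    `truncating_in_trans` must be established separately (for example from
    phased reads); this sketch does not infer phase.
    """
    expanded = sum(is_expanded(a) for a in alleles)
    if expanded >= 2:
        return "biallelic expansion"
    if expanded == 1 and truncating_in_trans:
        return "expansion in trans with truncating variant"
    return "not explained by this rule"


# Genotypes as reported for the two probands.
proband_1 = classify_rfc1([RepeatAllele("AAGGG", 448), RepeatAllele("AAGGG", 991)])
proband_2 = classify_rfc1([RepeatAllele("AAGGG", 1511)], truncating_in_trans=True)
print(proband_1)
print(proband_2)
```

A heterozygous carrier with a single expansion and no second hit, or an expansion of the reference AAAAG motif, falls through to the last branch, which is why the rule needs both motif and zygosity rather than repeat length alone.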
