
Conference Program

A Virtual Registry of 9K Leigh Syndrome and Primary Mitochondrial Disease Cases Constructed through Semi-Automated Literature Mining and Expert Curation 

Laboratory Genetics and Genomics
  • Primary Categories:
    • Laboratory Genetics
  • Secondary Categories:
    • Laboratory Genetics
Introduction:
Primary mitochondrial disease (PMD) is rare, with a prevalence of 1 in 4,300 people; the most common PMD, Leigh Syndrome (LS), has a prevalence of 1 in 34,000. Building a PMD community registry requires intensive collaborative effort, so most registries hold a limited number of local or national cases. For example, 1,555 participants were enrolled in the North American Mitochondrial Disease Consortium (NAMDC) Registry as of 2020.

Methods:
To enhance our knowledge of PMD, we built a PMD Virtual Registry within MSeqDR using both semi-automated literature mining and expert review. Specifically, we developed a semi-automated pseudo-case curation platform, hosted at MSeqDR.org (https://mseqdr.org/virtualregistry.php), to aid in capturing, extracting, and standardizing tabular case-level data, along with metadata, from publications. In parallel, the NIH-funded ClinGen Mitochondrial Diseases Gene Curation Expert Panel (HD-093483, https://www.clinicalgenome.org/affiliation/40027/) manually curated ~350 deeply phenotyped cases associated with 113 genes from the literature to evaluate causative gene associations for Leigh Syndrome spectrum (LSS) disorders.
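
The extraction step can be pictured with a minimal Python sketch. It assumes a publication's case table has already been captured as CSV text; the column names (`Patient`, `Sex`, `Gene`, `Variant`), the `PseudoCase` record, the `extract_cases` function, and the small sex vocabulary are hypothetical illustrations, not the MSeqDR platform's actual schema or code. Each row becomes one pseudo-case that carries its source identifier and keeps the verbatim row next to the standardized fields.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary for a free-text "Sex" column.
SEX_TERMS = {"m": "male", "male": "male", "f": "female", "female": "female"}


@dataclass
class PseudoCase:
    case_id: str
    source: str                 # publication identifier (metadata)
    gene: str                   # gene symbol, upper-cased
    variant: str                # variant as written in the paper
    sex: str                    # standardized term, or "unknown"
    original: dict = field(default_factory=dict)  # verbatim table row


def extract_cases(table_text: str, source: str) -> list:
    """Turn a case table copied from a publication into standardized records."""
    cases = []
    reader = csv.DictReader(io.StringIO(table_text))
    for i, row in enumerate(reader, start=1):
        # "or ''" guards against missing cells, which DictReader fills with None.
        sex_raw = (row.get("Sex") or "").strip().lower()
        cases.append(
            PseudoCase(
                case_id=f"{source}-P{i}",
                source=source,
                gene=(row.get("Gene") or "").strip().upper(),
                variant=(row.get("Variant") or "").strip(),
                sex=SEX_TERMS.get(sex_raw, "unknown"),
                original=dict(row),
            )
        )
    return cases


if __name__ == "__main__":
    # Illustrative rows only, not real patients.
    table = "Patient,Sex,Gene,Variant\n1,M,surf1,c.845_846del\n2,F,MT-ATP6,m.8993T>G\n"
    for case in extract_cases(table, source="SRC"):
        print(case.case_id, case.gene, case.sex)
```

Keeping the `original` row mirrors the registry's practice of presenting case data in both standardized and original terms.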

Results:
As of November 2024, the MSeqDR PMD Virtual Registry contains >9,000 de-identified PMD pseudo-cases, including ~2,000 LS/LLS cases (>1,700 from semi-automated literature mining plus 350 from U24 project expert curation) and ~3,500 other PMD cases, among them 170 MELAS, 225 CPEO, and 75 LHON cases. Courtesy of the MitoPhen team, 3,600 virtual cases were downloaded from MitoPhen.org and transformed for presentation within the Virtual Registry.

The heterogeneous clinical and demographic data were mapped to over 100 “standard” clinical, biochemical, and genetic feature terms, including Human Phenotype Ontology (HPO) and OMIM disease terms. Through data transformation, inheritance modes were inferred for nearly 6,000 cases, including 3,699 with mitochondrial (HP:0001427), 498 with autosomal recessive (HP:0000007), 314 with autosomal dominant (HP:0000006), and 104 with X-linked (HP:0001417) inheritance.
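
Inheritance-mode inference can be sketched as a lookup from free-text labels to the four HPO terms cited above. The synonym patterns, the function name `infer_inheritance`, and the fallback that treats a gene symbol starting with `MT-` as mitochondrially inherited are assumptions made for illustration; the abstract does not describe the rules actually used.

```python
import re
from typing import Optional

# HPO mode-of-inheritance terms cited in the abstract.
INHERITANCE_HPO = {
    "mitochondrial": "HP:0001427",
    "autosomal recessive": "HP:0000007",
    "autosomal dominant": "HP:0000006",
    "x-linked": "HP:0001417",
}

# Hypothetical synonym patterns for free-text inheritance fields,
# tried in order; the first match wins.
PATTERNS = [
    (re.compile(r"\b(mito(chondrial)?|maternal|mtdna)\b", re.IGNORECASE), "mitochondrial"),
    (re.compile(r"\b(ar|autosomal[\s-]+recessive)\b", re.IGNORECASE), "autosomal recessive"),
    (re.compile(r"\b(ad|autosomal[\s-]+dominant)\b", re.IGNORECASE), "autosomal dominant"),
    (re.compile(r"\b(xl|x[\s-]*linked)\b", re.IGNORECASE), "x-linked"),
]


def infer_inheritance(raw: Optional[str], gene: Optional[str] = None) -> Optional[str]:
    """Return an HPO inheritance term ID, or None if no mode can be inferred."""
    if raw:
        for pattern, label in PATTERNS:
            if pattern.search(raw):
                return INHERITANCE_HPO[label]
    # Assumed fallback: mtDNA-encoded genes carry the "MT-" prefix in HGNC symbols.
    if gene and gene.upper().startswith("MT-"):
        return INHERITANCE_HPO["mitochondrial"]
    return None
```

A label that matches none of the patterns, with no mtDNA gene to fall back on, returns `None` rather than a guessed mode, which keeps uninferred cases distinguishable from inferred ones.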

Although HPO mapping is only partially complete, 4,397 cases already have two or more HPO terms mapped. The most common phenotypes are skeletal muscle atrophy (HP:0003202), increased serum lactate (HP:0002151), global developmental delay (HP:0001263), intellectual disability (HP:0001249), and abnormal basal ganglia morphology (HP:0002134). Pathogenicity assessments linking clinical phenotypes, diseases, and causative variants, usually based on ACMG guidelines, are available for most cases: 6,100 cases carry pathogenicity-assessed variants, including 3,349 cases with mtDNA variants.
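
The two summary figures in this paragraph (cases with at least two HPO terms, and the most frequent phenotypes) could be computed with a short sketch like the one below. It assumes each case's mapped HPO term IDs are stored as a set, so a term is counted at most once per case; `hpo_summary` is a hypothetical helper, not part of MSeqDR.

```python
from collections import Counter


def hpo_summary(cases: dict, min_terms: int = 2, top: int = 5):
    """cases maps a case ID to the set of HPO term IDs mapped for that case.

    Returns (number of cases with >= min_terms terms, top-N (term, count) pairs).
    """
    well_annotated = sum(1 for terms in cases.values() if len(terms) >= min_terms)
    # Sets guarantee each term contributes at most once per case.
    counts = Counter(term for terms in cases.values() for term in terms)
    return well_annotated, counts.most_common(top)
```

Counting per case rather than per mention avoids inflating a phenotype that one publication reports repeatedly for the same patient.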

The Web front end includes a Case Browser that supports queries with single or composite filters built from atomized, standardized keywords such as OMIM disease, causative genes and variants, phenotypes and HPO terms, inheritance mode, zygosity, consanguinity, sex, age at onset and at death, and ethnicity. Matching cases are hyperlinked to full single-case reports and curation pages, which present all case data elements in both standardized and original terms. Registered MSeqDR users can curate phenotype information in a crowdsourced fashion to collaboratively build this community resource. Each user’s curations are saved and tracked without overwriting other contributors’ input. The platform also lets eligible users create additional virtual registries from de-identified pseudo-cases.
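
Composite filtering of the kind the Case Browser offers can be modeled by combining small predicate functions. This is a hedged sketch: the `Case` fields, the filter helpers (`has_gene`, `has_hpo`, `onset_before`, `all_of`, `any_of`), and `browse` are hypothetical and cover only a few of the keywords listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Case:
    case_id: str
    disease: str                                 # e.g. an OMIM disease name
    genes: set = field(default_factory=set)      # causative gene symbols
    hpo_terms: set = field(default_factory=set)  # mapped HPO term IDs
    sex: Optional[str] = None
    age_at_onset_months: Optional[float] = None


# Every filter is a predicate on a single case.
Filter = Callable[[Case], bool]


def has_gene(symbol: str) -> Filter:
    return lambda c: symbol in c.genes


def has_hpo(term: str) -> Filter:
    return lambda c: term in c.hpo_terms


def onset_before(months: float) -> Filter:
    # Cases with unknown onset never match an age filter.
    return lambda c: c.age_at_onset_months is not None and c.age_at_onset_months < months


def all_of(*filters: Filter) -> Filter:
    return lambda c: all(f(c) for f in filters)


def any_of(*filters: Filter) -> Filter:
    return lambda c: any(f(c) for f in filters)


def browse(cases: Iterable[Case], query: Filter) -> list:
    """Return the IDs of matching cases, in input order."""
    return [c.case_id for c in cases if query(c)]
```

For example, `all_of(any_of(has_gene("SURF1"), has_gene("MT-ATP6")), onset_before(24))` selects cases carrying either gene with onset before 24 months. Because every filter shares one signature, further keyword filters (zygosity, consanguinity, ethnicity) could be added without changing `browse`.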

Conclusion:
The PMD Virtual Registry, with >9,000 de-identified cases (>2,000 Leigh/LSS), is larger than most ad hoc Leigh/LSS and PMD case registries.

The platform can be used to build custom virtual case registries for non-PMD diseases from tabular data linking cases to diseases/phenotypes, genes, and variants.

This Virtual Registry supported the ClinGen Mito-VCEP and GCEP expert panels in curating mitochondrial disease genes and variants, including variant pathogenicity, according to ACMG guidelines.

Future work will refine the mapping and standardization of clinical phenotype data to disease and phenotype ontologies to improve data interoperability.

Novel GPT-style AI tools will be introduced to improve the speed and quality of clinical data capture, transformation, and reporting.
