Conference Program

Advancing Scientific Annotation: Introducing LEAP for Efficient Literature Review

Laboratory Genetics and Genomics
  • Primary Categories:
    • Laboratory Genetics
  • Secondary Categories:
    • Laboratory Genetics
Introduction:
Literature review is an essential component of gene curation and variant classification. Identifying and reviewing relevant papers ensures that the information used to analyze variants is as current and accurate as possible. Although scientific and clinical articles generally follow defined sections, there is no standardized way to describe and format research results. As a result, the information gathered from these review efforts has been largely unstructured and manually summarized ad hoc. A tool that annotates the relevant data from the literature in a structured format improves the ability to capture trends and to perform comprehensive assessment of gene-disease relationships. The key to success is to annotate the elements of a paper that document the reported raw data, along with the links between those data elements. For example, knowing that a variant appears in a paper is essential, but knowing that the variant was found in a patient with a specific clinical phenotype is significantly more valuable when analyzing the data.

Methods:
We have categorized key entities and their relationships for clinical variant classification into the following domains: genomic (including variant, gene, zygosity, and phase), demographic (encompassing individual, family, and population), phenotypic (covering clinical characteristics, age of onset, age at testing, biomarkers, and disorders), effect (such as functional effect on the gene product and splicing effect), and technology (e.g., detection method). To systematically capture and document these entities from literature sources, we developed the Literature Evidence Annotation Platform (LEAP) application. LEAP is an interactive application designed to facilitate the structured annotation of relevant entities and their relationships, enabling large-scale literature curation within a unified framework.
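
The entity and relationship domains described above can be illustrated with a minimal sketch. This is not LEAP's actual schema: the class names, field names, relation labels, and example values below are all hypothetical, chosen only to show how annotated entities might be linked so that a variant can be traced to the phenotypes of the individuals who carry it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Domain(Enum):
    """The five annotation domains named in the abstract."""
    GENOMIC = "genomic"
    DEMOGRAPHIC = "demographic"
    PHENOTYPIC = "phenotypic"
    EFFECT = "effect"
    TECHNOLOGY = "technology"


@dataclass(frozen=True)
class Entity:
    entity_id: str     # hypothetical local identifier within one paper
    domain: Domain
    entity_type: str   # e.g. "variant", "individual", "phenotype"
    text: str          # the value as reported in the paper


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    relation_type: str  # hypothetical labels, e.g. "variant-individual"


@dataclass
class AnnotatedPaper:
    paper_id: str
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def link(self, source_id: str, target_id: str, relation_type: str) -> None:
        # Only entities that have already been annotated can be linked.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both entities must be annotated before linking")
        self.relationships.append(Relationship(source_id, target_id, relation_type))

    def phenotypes_for_variant(self, variant_id: str) -> List[str]:
        """Follow variant -> individual -> phenotype links; return sorted phenotype text."""
        carriers = {
            r.target_id for r in self.relationships
            if r.relation_type == "variant-individual" and r.source_id == variant_id
        }
        return sorted(
            self.entities[r.target_id].text for r in self.relationships
            if r.relation_type == "individual-phenotype" and r.source_id in carriers
        )


def build_example() -> AnnotatedPaper:
    """Build a made-up annotated paper; every value here is illustrative only."""
    paper = AnnotatedPaper(paper_id="example-paper")
    paper.add_entity(Entity("v1", Domain.GENOMIC, "variant", "c.123A>G (hypothetical)"))
    paper.add_entity(Entity("i1", Domain.DEMOGRAPHIC, "individual", "Patient 1"))
    paper.add_entity(Entity("i2", Domain.DEMOGRAPHIC, "individual", "Patient 2"))
    paper.add_entity(Entity("p1", Domain.PHENOTYPIC, "phenotype", "seizures"))
    paper.add_entity(Entity("p2", Domain.PHENOTYPIC, "phenotype", "developmental delay"))
    paper.add_entity(Entity("p3", Domain.PHENOTYPIC, "phenotype", "hearing loss"))
    paper.link("v1", "i1", "variant-individual")
    paper.link("i1", "p1", "individual-phenotype")
    paper.link("i1", "p2", "individual-phenotype")
    paper.link("i2", "p3", "individual-phenotype")
    return paper
```

In this sketch, calling `build_example().phenotypes_for_variant("v1")` returns only the phenotypes of Patient 1, because Patient 2 is not linked to the variant. Storing relationships as separate records, rather than nesting phenotypes under variants, keeps each link independently countable and queryable.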

Results:
We have reviewed 750 clinical genetics publications within LEAP and captured more than 15,000 annotations. The three most frequent entity types are individuals (4,108), variants (3,434), and phenotypes (2,787). We have also annotated relationships among these entities, including 7,971 individual-phenotype relationships, 5,309 variant-individual relationships, and 529 individual-individual (family) relationships. As a result of these annotation efforts, we are able to automatically extract and summarize genetic variant and clinical evidence for downstream use cases: variant classification, gene curation, labels for modeling, and clinical reporting.

Conclusion:
LEAP reflects a human-in-the-loop development strategy for organizing complex clinical and medical genetics literature. Drawing on our domain expertise, we have defined the entities and entity relationships that are of value for variant classification, and established a framework and curation platform that systematically collects and summarizes this information. This structure is particularly important given the growing opportunities for large language models and machine learning to supplement and enhance scientists' curation and insights in the clinical genomics domain.
