Conference Program

Excavator: Removing the Noise in Exome Analysis Using Machine Learning

Laboratory Genetics and Genomics
  • Primary Categories:
    • Laboratory Genetics
  • Secondary Categories:
    • Laboratory Genetics
Introduction:
Historically, exome sequencing (ES) analysis has been a manual process, with hundreds of variants requiring individual review. As such, the triage, curation, and reporting of ES results is a labor-intensive, subjective process that creates a significant bottleneck. With data showing the utility of ES as a first-tier test, there is a need for expedited review processes to decrease turnaround times (TAT). Algorithms and machine learning (ML)-based artificial intelligence have been used to assist in variant identification in phenotype-driven testing such as ES. However, these programs are largely not freely available or rely on inputs that are not accessible to the public.

In this study, we describe the creation and validation of Excavator, an ML algorithm developed internally at PreventionGenetics to assist in variant prioritization.

Methods:
Model training and testing were performed on a dataset of 169,817,734 categorized variants from 15,770 ES tests performed at PreventionGenetics since 2022. Inputs for Excavator’s training include features such as PreventionGenetics’ internal allele frequencies and variant interpretations, along with variant data from sources such as gnomAD, ClinVar, the Human Gene Mutation Database, Exomiser, and Online Mendelian Inheritance in Man. The data were randomly split 70/30 into training and testing sets, and model training was then performed using 10-fold cross-validation.
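The split-then-cross-validate scheme described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `split_train_test` and `k_fold`, the seed, and the toy dataset size of 1,000 items are assumptions made for illustration, and only the standard library is used. It also assumes the folds are drawn from the training portion only.

```python
import random

def split_train_test(n_items, test_fraction=0.3, seed=0):
    """Randomly split indices 0..n_items-1 into (train, test) index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validate in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate

# Toy example: 1,000 items stand in for the variant dataset (hypothetical size).
train_idx, test_idx = split_train_test(1000)
folds = list(k_fold(train_idx, k=10))
```

Each training item is used for validation exactly once across the 10 folds, and the 30% test set is never touched during cross-validation, which is what allows it to serve as an independent check later.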

Three types of ensemble machine learning algorithms were considered for training the model: adaptive boosting, random forests, and histogram gradient-boosted trees. For each algorithm, models were trained using different combinations of hyperparameters with the goal of maximizing recall (sensitivity). A total of 44 models were trained during cross-validation, and those with an average sensitivity greater than 96% and an average false positive rate (FPR) below 0.3% were retrained using the full training set. A total of 15 models met these criteria. The final model was the one with the highest recall on the test set whose performance was consistent with cross-validation, meaning that its sensitivity and FPR met the cross-validation thresholds and fell within one standard deviation of the mean cross-validation values. The final model, trained with an adaptive boosting algorithm, achieved a recall of 98.4% on the testing set with an FPR of 0.2%.
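The selection procedure above can be sketched as a two-stage filter. The thresholds (96% sensitivity, 0.3% FPR) come from the text; everything else is an assumption for illustration, including the function names, the use of the sample standard deviation across folds, and the three made-up candidates with three folds each. None of the numbers below are results from the study.

```python
from statistics import mean, stdev

SENS_MIN = 0.96   # average sensitivity threshold stated in the text
FPR_MAX = 0.003   # average false positive rate threshold stated in the text

def passes_cv(cv_sens, cv_fpr):
    """Stage 1: shortlist on mean cross-validation sensitivity and FPR."""
    return mean(cv_sens) > SENS_MIN and mean(cv_fpr) < FPR_MAX

def consistent(test_value, cv_values):
    """True if the test value is within one (sample) SD of the CV mean."""
    return abs(test_value - mean(cv_values)) <= stdev(cv_values)

def select_final(candidates):
    """Stage 2: among consistent shortlisted models, take the highest test recall."""
    eligible = [
        c for c in candidates
        if passes_cv(c["cv_sens"], c["cv_fpr"])
        and c["test_sens"] > SENS_MIN and c["test_fpr"] < FPR_MAX
        and consistent(c["test_sens"], c["cv_sens"])
        and consistent(c["test_fpr"], c["cv_fpr"])
    ]
    return max(eligible, key=lambda c: c["test_sens"], default=None)

# Hypothetical candidates (illustrative values only).
fpr_folds = [0.001, 0.002, 0.003]
candidates = [
    {"name": "A", "cv_sens": [0.97, 0.98, 0.99], "cv_fpr": fpr_folds,
     "test_sens": 0.985, "test_fpr": 0.002},
    {"name": "B", "cv_sens": [0.94, 0.95, 0.96], "cv_fpr": fpr_folds,
     "test_sens": 0.990, "test_fpr": 0.002},   # fails the CV sensitivity threshold
    {"name": "C", "cv_sens": [0.970, 0.971, 0.972], "cv_fpr": fpr_folds,
     "test_sens": 0.995, "test_fpr": 0.002},   # test recall far from its CV mean
]
final = select_final(candidates)
```

In this toy data, candidate C has the highest test recall but is rejected because that recall lies well outside one standard deviation of its cross-validation mean. The consistency check exists to guard against picking a model whose test-set score is a favorable outlier.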

Results:
After the final Excavator model was established, it was run in parallel with PreventionGenetics’ standard exome review for approximately 6 months to determine its performance. A total of 613 diagnostic variants were reported. Excavator correctly flagged 591 variants but failed to flag 22. However, 12 of these 22 variants were common risk variants in APOL1 associated with renal disease; because they were not considered diagnostic, they were excluded from this analysis, leaving 601 variants. Excluding the risk variants, Excavator’s sensitivity was 98.3% (591 of 601) while flagging an average of just 11.46 variants per case. The 10 variants that Excavator failed to identify were either associated with autosomal recessive disease (N=8) or only explained minor features of the patient (N=2). In 6 of these 8 autosomal recessive cases, the likely pathogenic/pathogenic variant was flagged while the variant of uncertain significance was not.
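The reported sensitivity can be reproduced from the counts given above. The short calculation below uses only numbers stated in the text; the variable names are illustrative.

```python
reported = 613          # variants reported during the parallel run
flagged = 591           # reported variants that Excavator flagged
missed = reported - flagged                    # 22 variants not flagged
excluded_risk = 12      # APOL1 risk variants, all among the missed variants

diagnostic = reported - excluded_risk          # 601 variants kept in the analysis
missed_diagnostic = missed - excluded_risk     # 10 diagnostic variants not flagged
sensitivity = flagged / diagnostic             # 591 / 601

print(f"Sensitivity: {sensitivity:.1%}")       # Sensitivity: 98.3%
```

Because all 12 excluded variants were among those Excavator did not flag, the exclusion reduces the denominator from 613 to 601 without changing the count of flagged variants.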

Conclusion:
Excavator demonstrates the ability to detect diagnostic variants with high sensitivity and minimal FPR and outperforms many existing tools. A consistent weakness was cases involving two or more variants in a gene associated with autosomal recessive disease; the presence of multiple variants in such genes was not included as a predictor in the model due to technical challenges. Future iterations that take this into account should further increase recall. Excavator’s low number of flagged variants per case shows promise for use at scale. Although Excavator was designed for primary diagnosis, it could potentially be used to aid automated reanalysis of unsolved cases.
