Comparing automated vs. ACMG/AMP-based variant interpretation for the CDC Tier 1 conditions in a clinicogenomic cohort from US-based health systems
Health Services and Implementation
Primary Categories:
- Population Genetics
Secondary Categories:
- Population Genetics
Introduction:
Population genetic screening for disease risk is becoming common in healthcare, especially for the CDC Tier 1 (CDCT1) conditions: Hereditary Breast and Ovarian Cancer (HBOC), Familial Hypercholesterolemia (FH), and Lynch Syndrome (LS). Variant interpretation for these screening efforts still relies on the ACMG/AMP rubric, which was developed for case-by-case diagnostic interpretation, incorporates patient-specific information, and does not scale to large cohorts. We tested whether an automated approach to variant interpretation, based on well-understood properties of disease-causing variants and clinically established variant calls, could match the accuracy of ACMG/AMP diagnostic-style interpretations. We re-interpreted small variants in the 8 genes underlying the CDCT1 conditions (BRCA1, BRCA2, APOB, PCSK9, MSH2, MLH1, MSH6, and PMS2) in a clinicogenomic cohort of 161,406 individuals from seven health systems for which diagnostic-style variant interpretation had previously been performed. We compared the variant interpretations from each method and then validated the genotypic risk groups by examining clinical phenotypes for the relevant conditions across participants.
Methods:
SNVs and indels (<50 bp) were called and vetted for quality from clinical-grade exome sequencing. Diagnostic-style interpretations for all conditions were performed between 2020 and 2024 in a clinical testing laboratory. For automated interpretation, each variant was assigned to one of four categories (Pathogenic, high-scoring VUS, low-scoring VUS, or Benign) using the following rules: 1) well-established pathogenic and likely pathogenic variants from ClinVar; 2) variants scoring six or more points via data categories from the most up-to-date VCEPs for each condition (e.g., ENIGMA, InSiGHT, FH) that could be applied unambiguously and automatically across the cohort. Variant interpretation categorizations for each condition were compared against the diagnostic-style interpretations. Clinical relevance was analyzed using Cox proportional hazards regression on age at first diagnosis of: breast or ovarian cancer in individuals of female sex for HBOC; colorectal, endometrial, ovarian, stomach, small bowel, kidney, or bladder cancer for LS; and high LDL or first statin prescription for FH.
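The two rules above can be illustrated with a minimal sketch. It is not the pipeline used in this study. The six-point pathogenic threshold comes from the Methods, but the cut-offs that separate high-scoring VUS, low-scoring VUS, and Benign are not stated in the abstract, so the values below (`HIGH_VUS_MIN_POINTS`, `BENIGN_MAX_POINTS`) are hypothetical placeholders, as are the `Variant` record and its field names.

```python
from dataclasses import dataclass

# From the Methods: six or more points counts as pathogenic.
PATHOGENIC_MIN_POINTS = 6

# Hypothetical cut-offs for illustration only; the abstract does not give them.
HIGH_VUS_MIN_POINTS = 3
BENIGN_MAX_POINTS = -1


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant record; field names are assumptions."""
    variant_id: str
    clinvar_plp: bool  # well-established P/LP assertion in ClinVar
    points: int        # summed points from automatable VCEP data categories


def classify(variant: Variant) -> str:
    """Assign one of the four automated interpretation categories."""
    # Rule 1: well-established ClinVar P/LP; Rule 2: six or more points.
    if variant.clinvar_plp or variant.points >= PATHOGENIC_MIN_POINTS:
        return "Pathogenic"
    if variant.points <= BENIGN_MAX_POINTS:
        return "Benign"
    if variant.points >= HIGH_VUS_MIN_POINTS:
        return "High-scoring VUS"
    return "Low-scoring VUS"


if __name__ == "__main__":
    examples = [
        Variant("example-1", clinvar_plp=True, points=0),
        Variant("example-2", clinvar_plp=False, points=7),
        Variant("example-3", clinvar_plp=False, points=4),
        Variant("example-4", clinvar_plp=False, points=1),
        Variant("example-5", clinvar_plp=False, points=-3),
    ]
    for v in examples:
        print(v.variant_id, classify(v))
```

Because the rules depend only on variant-level evidence, the same function can be applied to every variant in a cohort without patient-specific input.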
Results:
Via our automated interpretation pipeline, 1,146 (0.7%), 751 (0.46%), and 559 (0.35%) individuals received a pathogenic variant interpretation for HBOC, FH, and LS, respectively. These results closely match the interpretations made through diagnostic-style testing, with high sensitivity and specificity (HBOC: 96.01%, 99.03%; FH: 87.04%, 96.45%; LS: 99.03%, 93.84%). Individuals harboring pathogenic variants from the automated method for each condition exhibited the expected trends of earlier onset and higher lifetime risk of relevant diagnoses compared to those without risk variants (HBOC: HR=7.3 [6.1-8.8]; FH: HR=4.1 [3.6-4.7]; LS: HR=6.7 [4.0-8.9]). Interestingly, we saw little evidence of earlier onset or higher incidence of relevant diagnoses among individuals harboring VUS, whether higher-scoring (HBOC: n=84, HR=0.9 [0.3-3.1]; FH: n=9, HR=N/A; LS: n=100, HR=0.6 [0.08-4.4]) or lower-scoring (HBOC: n=352, HR=1.1 [0.6-1.7]; FH: n=2060, HR=1.6 [1.45-1.85]; LS: n=468, HR=1.1 [0.9-1.2]), suggesting low to no collective risk across the variants within these interpretation groups.
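As an illustration of how the sensitivity and specificity figures above could be derived, the sketch below compares paired pathogenic/non-pathogenic calls, treating the diagnostic-style interpretation as the reference. The function name and the toy inputs are assumptions, and the abstract does not state whether the comparison was made per variant or per individual.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    automated: Sequence[bool], diagnostic: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of automated calls.

    Each element is True when the unit (variant or individual) was called
    pathogenic. The diagnostic-style call is the reference standard.
    """
    if len(automated) != len(diagnostic):
        raise ValueError("inputs must be paired and of equal length")

    tp = fp = tn = fn = 0
    for auto_call, ref_call in zip(automated, diagnostic):
        if ref_call and auto_call:
            tp += 1
        elif ref_call and not auto_call:
            fn += 1
        elif not ref_call and auto_call:
            fp += 1
        else:
            tn += 1

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and negative")

    sensitivity = tp / (tp + fn)  # share of reference positives recovered
    specificity = tn / (tn + fp)  # share of reference negatives recovered
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data only; not drawn from the cohort.
    automated = [True, True, False, False, True]
    diagnostic = [True, False, False, False, True]
    sens, spec = sensitivity_specificity(automated, diagnostic)
    print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```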
Conclusion:
For the genes underlying the CDCT1 conditions, automated interpretation is highly reliable. An automated strategy alleviates the operational challenges of applying case-by-case diagnostic-style interpretation to populations. Further, systematic variant interpretation for all individuals opens up opportunities to study disease risk across each variant interpretation group in clinicogenomic cohorts. We find that individuals harboring pathogenic variants called by the automated pipeline have diagnosis rates and trends in line with known estimates for the CDCT1 conditions. Individuals harboring VUS, even when split into higher- and lower-scoring groups, show little evidence of increased risk, which supports reporting only pathogenic variants in screening scenarios.