Leveraging a Genotype-First Approach in All of Us Cohort to Describe Prevalence and Phenotypic Spectrum of Rare Genetic Disorders
Clinical Genetics and Therapeutics
Primary Categories:
- Genomic Medicine
Secondary Categories:
- Genomic Medicine
Introduction:
Estimates of the prevalence and phenotypic presentation of rare Mendelian disorders are often biased by symptom-based ascertainment of participants. A genotype-first approach reverses the typical phenotype-first paradigm of clinical practice and reduces the effect of ascertainment bias, thereby allowing researchers to agnostically evaluate genotype-phenotype associations. Implementing a genotype-first approach in large electronic health record (EHR) biobanks, like the NIH’s All of Us Research Program, facilitates fast and inexpensive population-scale exploration. We hypothesize that this approach will improve understanding of the prevalence and phenotypic spectrum of rare Mendelian disorders.
Methods:
We developed a pipeline to efficiently screen for known and predicted pathogenic variants and explore genotype-phenotype associations. This pipeline filters variants based on ClinVar classifications of pathogenic/likely pathogenic, predicted loss of function (pLOF), and in silico predictors (REVEL and SpliceAI). The only required user input is the gene coordinates; users can optionally adjust hyperparameters (e.g., the minor allele frequency (MAF) threshold and the REVEL and SpliceAI cutoffs). The output (unique participant IDs and variants) allows users to review clinical data of participants carrying these variants and run phenome-wide association studies (PheWAS) to assess cohort-wide case-control differences. The pipeline incorporates Hail, an open-source library for genomic analysis, and is integrated with the All of Us Researcher Workbench. As a proof of principle, we tested this approach on autosomal dominant polycystic kidney disease (ADPKD; PKD1 and PKD2) and X-linked recessive ornithine transcarbamylase deficiency (OTC deficiency; OTC).
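The filtering step described above can be illustrated with a minimal sketch in plain Python. It is not the pipeline itself, which runs on Hail in the All of Us Researcher Workbench. The `Variant` fields, the default cutoffs, and the choice to combine the criteria as a union (any one criterion is enough once the MAF filter passes) are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# ClinVar labels treated as "known pathogenic" (assumed label spellings).
PATHOGENIC_LABELS = {
    "Pathogenic",
    "Likely pathogenic",
    "Pathogenic/Likely pathogenic",
}


@dataclass
class Variant:
    """Hypothetical annotated variant record; field names are illustrative."""
    variant_id: str
    maf: float                        # minor allele frequency
    clinvar: Optional[str] = None     # ClinVar clinical significance, if any
    is_plof: bool = False             # predicted loss of function
    revel: Optional[float] = None     # REVEL score for missense variants
    spliceai: Optional[float] = None  # maximum SpliceAI delta score


def is_candidate(
    v: Variant,
    max_maf: float = 0.001,        # assumed default, user-adjustable
    revel_cutoff: float = 0.75,    # assumed default, user-adjustable
    spliceai_cutoff: float = 0.8,  # assumed default, user-adjustable
) -> bool:
    """Return True if a rare variant meets at least one pathogenicity criterion."""
    if v.maf > max_maf:
        return False  # too common to be considered
    clinvar_hit = v.clinvar in PATHOGENIC_LABELS
    revel_hit = v.revel is not None and v.revel >= revel_cutoff
    splice_hit = v.spliceai is not None and v.spliceai >= spliceai_cutoff
    return clinvar_hit or v.is_plof or revel_hit or splice_hit


# Toy records with made-up identifiers and scores, not real data.
variants = [
    Variant("var_a", maf=0.0001, clinvar="Pathogenic"),
    Variant("var_b", maf=0.0002, revel=0.40),
    Variant("var_c", maf=0.0500, is_plof=True),
    Variant("var_d", maf=0.0003, spliceai=0.92),
]
selected = [v.variant_id for v in variants if is_candidate(v)]
```

In this toy input, `var_c` is excluded by the MAF threshold even though it is pLOF, and `var_b` is excluded because its REVEL score falls below the cutoff. Tightening or loosening the three keyword arguments corresponds to the optional hyperparameter adjustments mentioned in the text.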
Results:
In PKD1, 43 known or predicted pathogenic variants were identified in 197 heterozygous All of Us participants. In PKD2, 35 variants were identified in 123 heterozygous participants. We estimated the prevalence of ADPKD in All of Us as ~1:800-1:1,300, which is consistent with published estimates. In PheWAS, cystic kidney disease was the strongest association for variants in both PKD1 (OR = 59.74 (40.61-87.88)) and PKD2 (OR = 55.74 (34.26-90.65)), followed by codes for congenital anomalies of the urinary system and genitourinary congenital anomalies. Variants in PKD2 were also significantly associated with elevated risk of pancreatic cancer (OR = 14.28 (4.45-45.88)), a novel finding in large cohort studies. In OTC, 9 variants were identified in 37 participants. According to previous case reports and functional studies, these variants were primarily hypomorphic and associated with late-onset disease. A PheWAS of OTC variants showed increased risk for disorders of urea cycle metabolism (OR = 11.21 (3.88-32.43)) and disorders of amino acid metabolism (OR = 8.81 (3.05-25.43)). Heterozygous females displayed phenotypes characteristic of the disorder, including diagnostic codes for pregnancy and perinatal complications, psychiatric disorders, and migraines.
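For readers unfamiliar with how an odds ratio and its interval relate to carrier and non-carrier counts, the following sketch computes an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. This is a textbook calculation shown for orientation only: the abstract does not state how its PheWAS estimates were obtained (such analyses often use covariate-adjusted regression), and the counts in the example are hypothetical, not study data.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald interval from a 2x2 table.

    a: carriers with the phenotype code
    b: carriers without the phenotype code
    c: non-carriers with the phenotype code
    d: non-carriers without the phenotype code
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to exercise the function.
or_est, ci_low, ci_high = odds_ratio_wald(a=10, b=90, c=100, d=9900)
print(f"OR = {or_est:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

With few carriers, as is typical for rare variants, the reciprocal-count terms for the carrier cells dominate the standard error, which is why the intervals reported for rare-disease associations tend to be wide.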
Conclusion:
Our findings indicate that the All of Us cohort may be sufficiently powered to estimate the prevalence and expand the phenotypic spectrum of rare Mendelian disorders using a genotype-first approach. An efficient, scalable genotype-first pipeline in All of Us can be generalized to any gene of interest, particularly for other autosomal dominant and X-linked conditions.