Discovery of population-specific rare disease variants using genomic data from diverse populations in All of Us

Clinical Genetics and Therapeutics
  • Primary Categories:
    • Population Genetics
  • Secondary Categories:
    • Population Genetics
Introduction:
The failure to include diverse populations and the conflation of non-biological factors such as race and ethnicity with genetic ancestry are well-recognized problems in genetic medicine. Because disease susceptibility and manifestation vary by ancestry, individuals from diverse ancestral backgrounds must be included in research studies to extend the benefits of genomic medicine to all. Here, we aimed to identify populations from genomic data alone and to characterize how the genetic architecture of disease risk varies among them, by comparing the transportability of polygenic risk scores (PRS) and discovering population-specific disease-causing variants.

Methods:
To identify groups of individuals with increased 'genetic similarity' using genetic information alone, we first conducted an identity-by-descent (IBD) sharing network analysis on genomic data from 245,388 individuals in the All of Us Research Program (AoU). Related individuals with a kinship coefficient > 0.1 were removed prior to analysis. IBD segments, which are genomic segments shared by a pair of individuals because they were inherited from a common ancestor, were detected for all pairs of individuals, and a network was constructed using the total amount of IBD shared by each pair. This enabled us to detect groups of individuals who share recent ancestry, especially groups with high consanguinity or a strong founder effect, who are likely to carry founder variants for rare diseases. To investigate differences in genetic architecture among populations, we estimated PRS for all populations with more than 100 individuals and examined the transportability of PRS among them. Finally, we examined the population-specific enrichment of 25,763 variants classified as pathogenic or likely pathogenic in ClinVar to find candidate disease-causing variants for genetic screening, particularly in understudied groups with high consanguinity or a strong founder effect. Enrichment was tested with Fisher’s exact test, and adjusted P values were obtained with the Benjamini-Hochberg method. The clinical significance of the variants was examined using ClinVar gold star ratings and association analyses across 3,331 phenotypes from electronic health records (EHR) in AoU: variants with minor allele frequency (MAF) ≥ 0.01 were annotated with phenotypes associated at p < 5e-08 in ancestry-specific GWAS, and variants with MAF < 0.01 were annotated with phenotypes associated at p < 0.05/3331 in rare variant association studies using SAIGE-GENE. The level of relatedness within each population was assessed by estimating normalized IBD sharing within populations.
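
The enrichment testing and phenotype-annotation rules described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a one-sided test on allele counts (focal population versus all other individuals) and an FDR cutoff of 0.05, neither of which the abstract specifies, and the helper names (`fisher_enrichment_p`, `benjamini_hochberg`, `enriched_variants`, `association_threshold`) are hypothetical. In practice, `scipy.stats.fisher_exact` (with `alternative="greater"`) and `statsmodels.stats.multitest.multipletests` (with `method="fdr_bh"`) provide the same tests.

```python
from math import exp, lgamma

N_PHENOTYPES = 3331  # EHR phenotypes tested in the AoU association analyses


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in the focal population.

    Assumed 2x2 table of allele counts for one variant:
                       alt   ref
        population      a     b
        everyone else   c     d
    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denom = _log_comb(n, col1)
    p = 0.0
    # Sum probabilities of all tables at least as extreme as the observed one.
    for x in range(a, min(row1, col1) + 1):
        p += exp(_log_comb(row1, x) + _log_comb(n - row1, col1 - x) - log_denom)
    return min(p, 1.0)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_variants(tables: dict, fdr: float = 0.05) -> list:
    """Return (population, variant, q) for every test with BH-adjusted q < fdr.

    `tables` maps (population, variant) -> (a, b, c, d) allele counts.
    The 0.05 FDR cutoff is an assumption for illustration.
    """
    keys = list(tables)
    pvals = [fisher_enrichment_p(*tables[k]) for k in keys]
    qvals = benjamini_hochberg(pvals)
    return [(pop, var, q) for (pop, var), q in zip(keys, qvals) if q < fdr]


def association_threshold(maf: float) -> float:
    """P-value cutoff for annotating a variant with associated EHR phenotypes."""
    if maf >= 0.01:
        return 5e-8  # common variants: ancestry-specific GWAS
    return 0.05 / N_PHENOTYPES  # rare variants: SAIGE-GENE, Bonferroni over phenotypes
```

For example, `enriched_variants({("popA", "var1"): (30, 170, 50, 9750), ("popB", "var1"): (2, 198, 78, 9722)})` flags only the hypothetical population `popA`, where the alternate allele frequency is 15% versus about 0.5% in everyone else. Working in log space with `lgamma` avoids the enormous exact binomial coefficients that arise with hundreds of thousands of alleles.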

Results:
We identified 58 groups of individuals (n ≥ 20) with increased 'genetic similarity' based on shared ancestry at a much finer scale than continental ancestry. We successfully identified distinct admixed American communities (Puerto Ricans, Mexicans, Dominicans and Cubans) and populations with elevated levels of relatedness within groups (e.g., several Jewish groups, Middle Eastern groups, South Asian groups and the Garifuna). PRS portability varied even within the same continental ancestry, suggesting that fine-scale population structure provides more accurate insight into genetic risk. We also detected 11,055 significant enrichments (9,021 unique variants) of likely disease-causing variants, including known founder variants in well-studied populations such as Ashkenazi Jews and Puerto Ricans. Of these, 1,256 exceeded the Tier 2 frequency threshold (MAF > 0.005) and have significant clinical implications. A variant in MYBPC3, likely associated with cardiomyopathy, was observed at high frequency (MAF = 0.02) in the Garifuna group, who showed the highest risk of cardiomyopathy among all studied populations.

Conclusion:
We demonstrated the genetic diversity of populations in the US and differences in disease risk among them. We identified 9,021 population- or continental-ancestry-specific candidate disease-causing variants, which can help extend the benefits of clinical genetics through genetic screening in understudied populations in the US.
