Conference Program

Comparison of Electronic Health Record Condition Codes and Personal Health History Survey Data in the All of Us Research Program 

Education and Research Strategies
  • Primary Categories:
    • Population Genetics
  • Secondary Categories:
    • Population Genetics
Introduction:
Electronic Health Records (EHRs) provide a rich source of data for researchers, but complexity, contradictory findings, and missingness pose challenges and may introduce bias when studies use them to identify disease presence. Incorporating EHR components other than billing codes, such as medication exposures or laboratory values, improves phenotyping performance, but it is unclear how adding survey data affects performance. The All of Us Research Program (All of Us) provides a diverse, trans-ancestry dataset of EHR-linked genomic information from over 245,000 individuals across the United States. This project aims to evaluate how phenotyping that incorporates survey data performs in genetic association studies compared with phenotyping based on EHR billing codes alone.

 

Methods:
The All of Us “Personal and Family Health History” (PFHH) survey contains 152 unique medical conditions. We mapped 129 of these to one or more phecodes using the Phecode 1.2 Map. Phecodes are ICD code aggregates that represent clinically meaningful phenotypes. We used PheTK, a Python package designed to efficiently handle biobank-scale data, to generate phecode counts for participants with ICD-9-CM and ICD-10-CM codes in All of Us. Of the mapped phecodes, 52 matched to 3,259 single nucleotide polymorphism (SNP)-phecode associations in the Phenotype-Genotype Reference Map (PGRM), a set of well-established variant-disease associations that have been mapped to phecodes. We then ran binomial logistic regressions associating known genetic variants with disease status, adjusted for age, sex at birth, and 10 principal components. We performed this regression twice: once with cases defined as participants with either survey or EHR evidence of disease, and once with cases defined as participants with concordant survey and EHR evidence.
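The two case definitions described above can be illustrated with a minimal sketch. It is not the PheTK workflow used in the study; the function name `define_cases` and the participant IDs are hypothetical, and the handling of controls is deliberately left out because the abstract does not specify it.

```python
def define_cases(ehr_cases, survey_cases):
    """Return case sets under the two definitions compared in the abstract.

    ehr_cases: participant IDs with a billing code mapped to the phecode.
    survey_cases: participant IDs who self-reported the condition.
    """
    ehr = set(ehr_cases)
    survey = set(survey_cases)
    return {
        "either": ehr | survey,  # survey OR EHR evidence (union)
        "both": ehr & survey,    # concordant survey AND EHR evidence (intersection)
    }


# Hypothetical participant IDs for a single condition.
ehr_cases = {"p1", "p2", "p3"}
survey_cases = {"p2", "p3", "p4"}
cases = define_cases(ehr_cases, survey_cases)
print(sorted(cases["either"]))
print(sorted(cases["both"]))
```

Each case set would then serve as the outcome in a separate regression for the same variant, so that the resulting odds ratios can be compared across definitions.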

 

Results:
The proportion of participants with both billing codes and PFHH survey data for a condition ranged from 1% (chickenpox) to 66.3% (HIV/AIDS). The balance between participants with only EHR codes and those with only survey responses also varied greatly. For example, for blindness, 74.5% of participants had a billing code without reporting the condition in the survey, while for sexually transmitted infections, 81.2% of participants reported the condition in the survey without having a corresponding billing code.
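A breakdown of this kind can be sketched as follows. The denominator is an assumption: the abstract does not state it, so this example uses participants with any evidence (EHR or survey) for the condition. The function name `evidence_breakdown` and the participant IDs are hypothetical.

```python
def evidence_breakdown(ehr_cases, survey_cases):
    """Percentage of participants with both, EHR-only, or survey-only evidence.

    Assumption: the denominator is everyone with at least one source of
    evidence for the condition.
    """
    ehr, survey = set(ehr_cases), set(survey_cases)
    any_evidence = ehr | survey
    if not any_evidence:
        raise ValueError("no participant has evidence for this condition")
    n = len(any_evidence)
    return {
        "both": 100 * len(ehr & survey) / n,
        "ehr_only": 100 * len(ehr - survey) / n,
        "survey_only": 100 * len(survey - ehr) / n,
    }


# Hypothetical participant IDs for a single condition.
breakdown = evidence_breakdown(ehr_cases={1, 2, 3, 4}, survey_cases={4, 5})
print(breakdown)
```

The three percentages always sum to 100 under this denominator, which makes conditions with very different prevalence directly comparable.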

Among the 245,388 All of Us participants with short-read whole genome sequencing available, 86,424 took the PFHH survey and had at least one billing code. We found R² = 0.5756 for the correlation between the odds ratios (ORs) obtained when using either survey responses or phecodes and those obtained when requiring both. For some SNPs, we found a considerably higher OR when requiring both survey responses and EHR billing codes. For a known type 1 diabetes pathogenic variant (rs9273363), we found an OR of 4.046 (3.641, 4.496) with both survey and billing codes, compared to 1.692 (1.592, 1.780) using either survey responses or phecodes (PGRM OR = 5.48 (4.80, 6.27)). For a known celiac disease pathogenic variant (rs2187668), we found an OR of 4.276 (3.736, 4.894) using both survey and billing codes, compared to 2.441 (2.225, 2.678) using either survey responses or phecodes (PGRM OR = 6.326 (5.95, 6.52)).
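For readers who want to reproduce this kind of comparison, the R² between two sets of per-variant ORs can be sketched as the squared Pearson correlation. This is an assumption about the calculation: the abstract does not say whether ORs were log-transformed or how R² was derived. The function name `r_squared` and the OR values below are hypothetical, not study results.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-variant ORs under the two case definitions.
or_either = [1.10, 1.25, 1.60, 2.40]
or_both = [1.15, 1.50, 2.90, 4.10]
print(round(r_squared(or_either, or_both), 4))
```

Passing the natural logarithm of each OR instead of the raw value is a common alternative, since ORs are multiplicative; which scale to use is a design choice the abstract leaves open.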



 

Conclusion:
We present here a first map of PFHH survey conditions to phecodes, which we hope will facilitate genetic and other analyses that combine billing and survey data. When comparing phenotyping methods that use either phecodes or survey responses with a method that requires both, we found that the inclusion of survey data strengthens genetic associations for known variants. The varied distributions of EHR and survey data in our cohort suggest that further study will be needed to compare the phenotyping performance of survey data across different conditions.

 
