Evaluating indication for testing based on cluster analysis of Human Phenotype Ontology terms
Clinical Genetics and Therapeutics
Primary Categories:
- Clinical Genetics
Secondary Categories:
- Clinical Genetics
Introduction:
Historically, rare diseases have been categorized by the affected organ system, for example as neurological, cardiovascular, or metabolic disorders. As our understanding of the natural history and phenotypic spectrum of genetic conditions deepens, this superficial grouping is increasingly inadequate and hampers our ability to characterize patients. Here, we describe a method to generate phenotypic profiles based on Human Phenotype Ontology (HPO) terms. We evaluated this approach by describing the diagnostic yield of exome sequencing (ES) according to the indications for testing.
Methods:
We compiled patient phenotypic information from clinician-provided ICD-10 codes, HPO terms, and clinical notes into a set of disease-indicative HPO terms for individuals referred for ES between January 2020 and August 2023. Using these HPO terms, we calculated pairwise semantic similarity values with the graph information content (IC) measure (Deng et al. 2015), with IC derived from OMIM. We then converted these scores to dissimilarity measures and performed hierarchical clustering using Ward’s method, following Rojano et al. 2021. Cluster labels were assigned using the dynamic branch-cutting method developed by Langfelder et al. 2008. To visualize associations between cluster label and clinical indication, we counted, for each cluster, the number of HPO terms applied under each third-level term of the ontology and plotted the counts in chord diagrams. For each cluster, we assessed patient demographics, the most frequently applied terms, and the genes in which diagnostic findings were identified. Statistical significance was assessed using ANOVA and G-tests of independence with an alpha of 0.05.
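The similarity-to-clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the six patients and their term sets are hypothetical, Jaccard overlap stands in for the graph IC similarity measure, similarity is assumed to be computed between patients, and a fixed two-cluster cut with SciPy's `fcluster` stands in for dynamic branch cutting.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical patients with toy term sets (illustrative only, not study data).
patients = {
    "P1": {"muscle weakness", "neuropathy", "difficulty walking"},
    "P2": {"muscle weakness", "neuropathy"},
    "P3": {"muscle weakness", "difficulty walking"},
    "P4": {"short stature", "failure to thrive", "neurodevelopmental delay"},
    "P5": {"short stature", "failure to thrive"},
    "P6": {"short stature", "neurodevelopmental delay"},
}


def jaccard(a, b):
    """Stand-in similarity in [0, 1]; this is NOT the graph IC measure."""
    return len(a & b) / len(a | b)


ids = sorted(patients)
n = len(ids)

# Symmetric patient-by-patient similarity matrix with ones on the diagonal.
sim = np.ones((n, n))
for i in range(n):
    for j in range(i + 1, n):
        sim[i, j] = sim[j, i] = jaccard(patients[ids[i]], patients[ids[j]])

# Convert similarity to dissimilarity; the diagonal becomes exactly zero.
dissim = 1.0 - sim

# SciPy's linkage expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dissim, checks=True)

# Ward linkage. Ward's criterion formally assumes Euclidean distances, so
# applying it to semantic dissimilarities is a heuristic.
Z = linkage(condensed, method="ward")

# Fixed cut into two clusters (assumption: replaces dynamic branch cutting).
labels = fcluster(Z, t=2, criterion="maxclust")

for pid, label in zip(ids, labels):
    print(pid, int(label))
```

In this toy example the neuromuscular-type term sets (P1 to P3) and the growth and development term sets (P4 to P6) fall into separate clusters. With real data, the similarity matrix would come from the ontology-based measure, and the number of clusters would be determined by the branch-cutting procedure rather than fixed in advance.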
Results:
Our cohort consisted of 2,810 individuals with a variety of clinical indications. The average number of HPO terms per individual was 16.7 (range 1-177). Hierarchical clustering identified 18 clusters ranging in size from 31 to 381 patients, with diagnostic yield ranging from 8% to 45%. Mean age, mean number of HPO terms assigned, and diagnostic yield differed significantly between clusters (p < 0.0001 for all tests). The chord diagrams allow visual assessment of the main indications in each cluster and of the variability of indications within clusters. Whereas some clusters align with specific genetic diseases (e.g., late-onset epilepsy), others are much more variable and involve multiple organs or systems. The cluster with the lowest diagnostic yield (8%) had a mean age at testing of 41.3 years, and its most frequent HPO terms were muscle weakness, neuropathy, and difficulty walking. The cluster with the highest diagnostic yield (45%) had a mean age at testing of 6.1 years, and its most frequent HPO terms were abnormality of the head, short stature, neurodevelopmental delay, and failure to thrive.
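A comparison of diagnostic yield between clusters with a G-test of independence can be sketched as below. The counts are hypothetical and chosen only so that the yields resemble the 8% and 45% extremes reported here; they are not the study data. The sketch assumes SciPy's `chi2_contingency`, where `lambda_="log-likelihood"` selects the G statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts for three illustrative clusters (not the study data).
# Rows are clusters; columns are [diagnostic finding, no diagnostic finding].
table = np.array([
    [8, 92],
    [25, 75],
    [45, 55],
])

# lambda_="log-likelihood" requests the G-test (likelihood-ratio) statistic
# instead of Pearson's chi-squared statistic.
g_stat, p_value, dof, expected = chi2_contingency(table, lambda_="log-likelihood")

alpha = 0.05
print(f"G = {g_stat:.2f}, dof = {dof}, p = {p_value:.3g}")
print("reject independence" if p_value < alpha else "fail to reject independence")
```

The degrees of freedom equal (rows - 1) x (columns - 1), so a table covering all 18 clusters against a binary diagnostic outcome would have 17.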
Conclusion:
This work highlights a method to systematically describe patient phenotypic profiles. It can be used to describe more accurately the constellation of phenotypic features in patients who are more likely to receive a positive genetic test result, and it could inform strategies for variant ranking in ES and genome sequencing workflows. Moreover, detailed information on the evolving phenotypic spectrum of rare diseases can help guide clinical management and accelerate the development of targeted therapies.