StarPhase: Leveraging Long-Read Sequencing to Update Pharmacogenomic Benchmarks
Laboratory Genetics and Genomics
Primary Categories:
- Genomic Medicine
Secondary Categories:
- Genomic Medicine
Introduction:
Pharmacogenomics (PGx) is a critical part of personalized medicine, informing the dosing and safety of treatments for individual patients. The first step in PGx testing is “PGx diplotyping”: identifying the two haplotypes an individual carries, commonly referred to as “star (*) alleles”. For simple PGx genes, this requires detecting phased small variants. To fully characterize complex PGx genes like HLA-A or CYP2D6, both small variants and copy number changes must be considered. When developing PGx assays, most labs rely on the gold-standard PGx benchmark from Genetic Testing Reference Materials (GeT-RM). This original benchmark was generated from a consensus of nine labs using seven different SNP array or PCR amplification assays. Many of these assays cannot reliably determine variant phase or directly observe full PGx haplotypes, which limited the scope of star alleles that could be accurately assessed when the benchmark was generated in 2016. In contrast, long-read sequencing inherently captures nearby variants on a single molecule, directly phasing variation even for complex star alleles. Long reads can also be assembled with relative ease, enabling full-length diplotyping of highly polymorphic genes like HLA-A and HLA-B.
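To make the role of phase concrete, here is a minimal sketch that calls a diplotype for a hypothetical gene by matching each phased haplotype exactly against toy star-allele definitions. The variant names (`v1`, `v2`), the allele definitions, and the helper functions are invented for illustration. They are not PharmVar or CPIC data, and exact set matching is a simplification, not StarPhase's algorithm. The sketch shows that a single unphased genotype can produce different diplotypes depending on whether the variants sit in cis or in trans.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical star-allele definitions for a toy gene (NOT real PharmVar/CPIC data).
# Each allele is defined by the exact set of core variants on one haplotype;
# the reference allele *1 carries none of them.
TOY_DEFINITIONS: Dict[str, FrozenSet[str]] = {
    "*1": frozenset(),
    "*2": frozenset({"v1"}),
    "*3": frozenset({"v2"}),
    "*4": frozenset({"v1", "v2"}),  # v1 and v2 in cis define a distinct allele
}


def star_sort_key(allele: str) -> Tuple[int, str]:
    """Order *1 < *2 < *10 numerically; put non-numeric labels (e.g. 'novel') last."""
    suffix = allele.lstrip("*")
    return (int(suffix), "") if suffix.isdigit() else (10**9, allele)


def match_haplotype(variants: FrozenSet[str], definitions: Dict[str, FrozenSet[str]]) -> str:
    """Return the star allele whose core-variant set exactly equals this haplotype's."""
    for allele, core in definitions.items():
        if core == variants:
            return allele
    return "novel"  # no exact match: a candidate new star allele


def diplotype(hap1: FrozenSet[str], hap2: FrozenSet[str],
              definitions: Dict[str, FrozenSet[str]] = TOY_DEFINITIONS) -> str:
    """Call a diplotype from two phased haplotypes, sorted for stable reporting."""
    alleles = [match_haplotype(hap1, definitions), match_haplotype(hap2, definitions)]
    return "/".join(sorted(alleles, key=star_sort_key))


# One unphased genotype (heterozygous at v1 and v2), two possible phasings:
print(diplotype(frozenset({"v1"}), frozenset({"v2"})))  # trans -> *2/*3
print(diplotype(frozenset({"v1", "v2"}), frozenset()))  # cis   -> *1/*4
```

An assay that reports only unphased genotypes cannot distinguish these two calls. Reads that span both positions resolve the ambiguity directly.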
Methods:
StarPhase is a robust and flexible tool that diplotypes 21 genes with CPIC Level A recommendations, including three structurally diverse genes: HLA-A, HLA-B, and CYP2D6. StarPhase generates full-length consensus sequences, enabling detailed reporting of 4-field HLA alleles and CYP2D6 sub-alleles. StarPhase also provides visualizations of these consensus sequences, making it easier to explore potentially novel star alleles. We first measure StarPhase’s concordance with the gold-standard GeT-RM PGx benchmark across 25 of its samples, and then adjudicate any discrepancies. To measure StarPhase’s accuracy across more diverse populations, we diplotype 147 samples from the Human Pangenome Reference Consortium (HPRC) and compare the results to previously published diplotypes.
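A concordance check like this one depends on treating a diplotype as an unordered pair, so that `*2/*1` and `*1/*2` count as the same call. The following is a minimal sketch of such a comparison. It assumes that calls and benchmark diplotypes are keyed by hypothetical (sample, gene) pairs, and the function names and data are illustrative only. The sketch merely flags mismatches for review. It does not reproduce the manual adjudication described here, and it does not separate minor discrepancies from true mismatches.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (sample ID, gene); hypothetical keying scheme


def normalize(diplotype: str) -> Tuple[str, str]:
    """Treat a diplotype as an unordered pair so '*2/*1' equals '*1/*2'."""
    a, b = (allele.strip() for allele in diplotype.split("/"))
    first, second = sorted((a, b))
    return (first, second)


def compare(calls: Dict[Key, str], benchmark: Dict[Key, str]) -> Dict[str, List[Key]]:
    """Bucket each called sample-gene key as a match, a mismatch to adjudicate,
    or tool-only (the benchmark has no diplotype for that combination)."""
    buckets: Dict[str, List[Key]] = {"match": [], "mismatch": [], "tool_only": []}
    for key in sorted(calls):
        if key not in benchmark:
            buckets["tool_only"].append(key)
        elif normalize(calls[key]) == normalize(benchmark[key]):
            buckets["match"].append(key)
        else:
            buckets["mismatch"].append(key)
    return buckets


# Hypothetical calls and benchmark entries (not GeT-RM or HPRC data).
calls = {
    ("S1", "GENE1"): "*2/*1",
    ("S1", "GENE2"): "*1/*3",
    ("S2", "GENE1"): "*1/*1",
}
benchmark = {
    ("S1", "GENE1"): "*1/*2",
    ("S1", "GENE2"): "*1/*1",
}
buckets = compare(calls, benchmark)
compared = len(buckets["match"]) + len(buckets["mismatch"])
print(buckets)
print(f"concordance: {len(buckets['match']) / compared:.1%}")  # concordance: 50.0%
```

The `tool_only` bucket corresponds to sample-gene combinations that a benchmark does not cover. Those calls add new diplotypes but cannot contribute to a concordance rate.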
Results:
StarPhase diplotypes match the comparator diplotypes in 3,188 comparisons, or 96.2% of the combined benchmark set. We identify a combined 109 minor discrepancies (3.3%), caused either by star alleles newly added to the PGx databases or by differences in how diplotypes are reported when they are not defined in a PGx database. For the remaining 16 mismatches (0.5%), manual inspection supports the StarPhase diplotype in every case, and most can be traced to known limitations of the comparator. Critically, we did not identify any StarPhase diplotyping errors in any dataset. StarPhase identifies updated or corrected diplotypes for a combined 67 of the GeT-RM benchmark diplotypes (26.2%), and it also generates 269 additional diplotypes for sample-gene combinations not included in the existing gold-standard GeT-RM benchmark.
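As a quick consistency check on the reported percentages, the sketch below recomputes each category's share. It assumes that matches, minor discrepancies, and the "remaining" mismatches partition the combined benchmark set. Under that assumption, the set contains 3,313 comparisons. The final line infers the GeT-RM denominator implied by "67 ... (26.2%)". This is an inference from rounding, not a count stated in the source.

```python
# Counts reported in the Results; assumed to partition the combined benchmark set.
counts = {"match": 3188, "minor_discrepancy": 109, "mismatch": 16}
total = sum(counts.values())  # implied size of the combined set

shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(total, shares)  # 3313 {'match': 96.2, 'minor_discrepancy': 3.3, 'mismatch': 0.5}

# Which GeT-RM diplotype counts would make 67 round to 26.2%? (inferred, not reported)
implied = [d for d in range(200, 300) if round(100 * 67 / d, 1) == 26.2]
print(implied)  # [256]
```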
Conclusion:
Pharmacogenomic diplotyping is difficult because of the complexity of PGx genes, and existing benchmarks have gaps that stem from the limitations of the technologies used to generate them. Long-read sequencing resolves many of these complexities, and StarPhase leverages the technology to generate highly accurate PGx diplotypes for patient samples. StarPhase’s accuracy and supporting visualizations enable us to update or correct previous benchmark diplotypes with high confidence. StarPhase also provides additional PGx diplotypes for commonly sequenced samples in the GeT-RM and HPRC pharmacogenomic benchmarks.