Population-Specific Structural Variation Linked to Metabolic Diseases in People of Pacific Ancestry
Biochemical/Metabolic and Therapeutics
Primary Categories:
- Basic Research
Secondary Categories:
- Basic Research
Introduction:
People of Pacific ancestry have been significantly underrepresented in human genomic research, particularly in studies focusing on structural genetic variation. Limited whole-genome sequence data from the Human Genome Diversity Project suggest that individuals from the Pacific region may carry a high number of population-specific structural variants at high frequencies. However, comprehensive studies are still lacking. Studying these high-frequency, population-specific variants offers a unique opportunity to increase the statistical power for detecting disease associations and to uncover novel gene-disease links that may not be apparent in other populations. This research is especially critical for people of Pacific ancestry, as it has the potential to uncover genetic factors underlying the higher rates of metabolic diseases observed in this population, including obesity, gout, type 2 diabetes, and kidney disease.
Methods:
We sequenced the genomes of 23 individuals of Pacific ancestry from Tonga (6), Samoa (6), Fiji (4), the Philippines (2), the Marshall Islands (2), Guam (1), Tahiti (1), and Pohnpei (1) using PacBio’s HiFi long-read sequencing technology, achieving an average depth of 33X. From these reads, we generated 46 high-quality, partially phased haploid assemblies and identified whole-gene duplications in each. We then defined Pacific ancestry-specific whole-gene duplications as those absent from the 94 genome assemblies of the Human Pangenome Reference Consortium (HPRC). Next, we identified large insertions and deletions (>50 bp) in the 46 haploid assemblies and genotyped these variants in 203 additional samples of Pacific ancestry using pangenomic methods. Finally, we used the genotyped variants to test for association with disease phenotypes, including BMI, type 2 diabetes, gout, and estimated glomerular filtration rate (eGFR).
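The two filtering steps described above (excluding duplications seen in HPRC assemblies, and keeping insertions and deletions larger than 50 bp) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the dictionary layout, and the toy assembly labels are all hypothetical, and the gene sets are invented solely to show the logic.

```python
from collections import Counter


def pacific_specific_duplications(pacific_dups, hprc_dups):
    """Return {gene: number of Pacific assemblies carrying a duplication}
    for genes that are not duplicated in any HPRC assembly.

    Both arguments map an assembly label to the set of gene names
    duplicated in that assembly (hypothetical input layout).
    """
    # Every gene duplicated in at least one HPRC assembly.
    hprc_genes = set().union(*hprc_dups.values())
    # How many Pacific assemblies carry each duplicated gene.
    counts = Counter(gene for genes in pacific_dups.values() for gene in genes)
    return {gene: n for gene, n in counts.items() if gene not in hprc_genes}


def is_large_indel(ref_len, alt_len, min_size=50):
    """True when the allele length difference exceeds min_size bp."""
    return abs(ref_len - alt_len) > min_size


# Toy data for illustration only; not results from the study.
pacific = {
    "PA1.hap1": {"NPHP1", "CLPS"},
    "PA1.hap2": {"NPHP1"},
    "PA2.hap1": {"CLPS"},
}
hprc = {"HPRC1.hap1": {"CLPS"}, "HPRC1.hap2": set()}

print(pacific_specific_duplications(pacific, hprc))
print(is_large_indel(ref_len=259, alt_len=1))
```

Counting carriers per gene at the same time as filtering makes it straightforward to separate duplications seen in a single assembly from those seen in two or more.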
Results:
We identified between 50 and 100 duplicated genes per assembly among individuals of Pacific ancestry, totaling 704 unique gene duplications. Of these, 376 duplications were specific to Pacific ancestry, and 39 were observed in two or more assemblies. Notably, duplications of the NPHP1 gene, which is associated with renal disease, were absent from all HPRC assemblies but present in three Pacific ancestry assemblies. Gene duplication frequencies also differed between Pacific ancestry and HPRC assemblies. For example, the CLPS gene, which encodes pancreatic colipase and is associated with type 2 diabetes, was duplicated in 57% of HPRC assemblies but in only 32% of Pacific ancestry assemblies.
Analyzing large insertions and deletions in our 23 Pacific ancestry genomes, we identified a Pacific ancestry-specific 258 bp in-frame deletion in exon 6 of Proteoglycan 4 (PRG4) in three individuals. PRG4, also known as lubricin, plays a key role in joint lubrication and cartilage protection, and recent studies have also linked it to glucose control and obesity. Interestingly, this variant, located within a variable number tandem repeat (VNTR) region, was undetectable in Illumina short-read data from the same samples. Genotyping this deletion in 203 Pacific ancestry samples revealed its presence in 32 individuals (allele frequency = 0.08), all of whom were heterozygous. Phenotype association analysis in these 203 individuals showed a significant association between the PRG4 deletion and increased BMI (p = 0.017).
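The reported allele frequency follows directly from the genotype counts: 32 heterozygous carriers and no homozygous carriers among 203 individuals. The short Python check below reproduces that arithmetic, assuming a diploid autosomal locus; the function name is hypothetical.

```python
def alt_allele_frequency(n_het, n_hom_alt, n_individuals):
    """Alternate allele frequency at a diploid autosomal locus."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    # Each heterozygote carries one copy, each homozygote two,
    # out of two chromosomes per individual.
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


# 32 heterozygous carriers, 0 homozygous, 203 genotyped individuals.
af = alt_allele_frequency(n_het=32, n_hom_alt=0, n_individuals=203)
print(f"{af:.3f}")  # prints 0.079, i.e. 0.08 at two decimal places
```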
Conclusion:
These findings represent the most detailed analysis of whole-gene duplications and other structural variants in individuals of Pacific ancestry to date. Many of these variants are unique to people of Pacific ancestry and may explain genetic factors contributing to higher rates of metabolic diseases in this population. Additionally, the structural variant callsets generated from our 46 haploid assemblies can serve as a reference panel to genotype clinically relevant structural variants that might otherwise be missed with short-read sequencing, as in the case of the PRG4 deletion. Our work highlights the importance of including underrepresented populations in genetic research to uncover unique variants linked to disease susceptibility.