Leveraging biobank-scale whole-genome sequencing for polygenic risk prediction
Basic information
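- Award number: 10716534
- Principal investigator:
- Amount: $447,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-18 to 2027-07-31
- Project status: Ongoing
- Source: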
- Keywords: Algorithms, Alleles, Base Pairing, Blood, Blood Cells, Cardiovascular Diseases, Chromosome abnormality, Clonal Expansion, Collaborations, Complex, Computer software, Computing Methodologies, DNA, DNA Sequence, Data, Data Set, Disease, European ancestry, Frequencies, Future, Gene Frequency, Genes, Genetic, Genetic Diseases, Genetic Models, Genetic Polymorphism, Genetic Risk, Genetic Variation, Genome, Genomics, Genotype, Haplotypes, Hematologic Neoplasms, Heritability, Individual, Inherited, Letters, Link, Mediating, Memory, Methodology, Methods, Modeling, Mutation, Mutation Detection, Performance, Point Mutation, Population, Publications, Research, Resolution, Resources, Risk, SNP array, Sampling, Single Nucleotide Polymorphism, Somatic Mutation, Source, Statistical Algorithm, Statistical Methods, Structure, Therapeutic, Training, Variant, age related, analytical method, biobank, cardiovascular risk factor, causal variant, cohort, cost, disorder risk, empowerment, experience, genetic risk factor, genetic variant, genome sequencing, genome-wide, genome-wide analysis, high risk, human disease, improved, insertion/deletion mutation, polygenic risk score, precision medicine, rare variant, risk prediction, risk variant, trait, whole genome
Project Summary/Abstract
Whole-genome sequencing of population biobank cohorts holds great promise for enabling accurate prediction
of genetically mediated risk for heritable human diseases and traits. Such information has the potential to be a
powerful resource for precision medicine, informing preventative and therapeutic decisions. To more fully
realize this potential, new statistical methods are needed to incorporate all genetic variants – including
structural variants, blood-derived somatic mutations, and rare SNPs and indels – into genetic risk models.
These classes of genetic variation, which are known to include many variants with large effects on disease
risk, can be detected in high-coverage whole-genome sequencing data now being generated at biobank scale.
However, such variants have not been accessible from previous genetic data sets (which have relied on SNP-
array genotyping and imputation). Consequently, existing methods for polygenic prediction have typically
considered only common inherited SNPs and indels.
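In its simplest form, a polygenic score is a weighted sum of an individual's allele dosages. The Python/NumPy sketch below illustrates the point made above: once structural variants, somatic mutations, or rare variants are genotyped, they can in principle enter the same weighted sum as additional columns alongside common SNPs and indels. The function name, the variant-class labels, the coding of a somatic mutation as a 0/1 carrier indicator, and all numbers are hypothetical toy values chosen for illustration, not the project's model or estimates from any cohort.

```python
import numpy as np


def polygenic_score(dosages: dict, weights: dict) -> np.ndarray:
    """Weighted sum of dosages over every variant class supplied (hypothetical helper).

    dosages[cls] is an (n_individuals, n_variants) array for that class
    (e.g. 0/1/2 alt-allele counts; for a somatic mutation we assume a 0/1
    carrier indicator). weights[cls] holds one effect size per column.
    """
    total = None
    for cls, X in dosages.items():
        X = np.asarray(X, dtype=float)
        w = np.asarray(weights[cls], dtype=float)
        if X.shape[1] != w.shape[0]:
            raise ValueError(f"{cls}: {X.shape[1]} variants but {w.shape[0]} weights")
        contribution = X @ w  # this class's share of each individual's score
        total = contribution if total is None else total + contribution
    if total is None:
        raise ValueError("no variant classes supplied")
    return total


# Hypothetical toy data for three individuals (not real effect sizes).
dosages = {
    "common_snp": np.array([[0, 1], [2, 0], [1, 1]]),
    "structural_variant": np.array([[0], [1], [0]]),
    "somatic_mutation": np.array([[0], [0], [1]]),  # 0/1 carrier indicator (assumption)
}
weights = {
    "common_snp": np.array([0.1, 0.2]),
    "structural_variant": np.array([0.5]),
    "somatic_mutation": np.array([0.8]),
}
print(polygenic_score(dosages, weights))  # approximately [0.2, 0.7, 1.1]
```

Stacking all classes into one matrix would give the same scores; keeping them separate just makes each class's contribution easy to inspect.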
We propose to develop a suite of statistical methods to enable these additional classes of genetic variants to
be incorporated into models of genetic risk, thereby improving predictive power. For variant types that are
currently difficult to ascertain even from whole-genome sequencing data – including somatic mutations and
some types of structural variants – we will develop new genotyping algorithms that improve statistical inference
by harnessing information across large sequenced cohorts. We will efficiently integrate information across all
variant types into genetic risk models using fast Bayesian regression methods. We will apply these approaches
to train genetic risk models for common diseases using data from very large biobank sequencing projects.
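The idea of improving genotype inference by pooling information across a sequenced cohort can be illustrated, in a much simpler setting than the somatic mutations and structural variants targeted here, with a textbook expectation-maximization (EM) sketch for one biallelic site. Each sample's alt/total read counts give a binomial likelihood for genotypes 0, 1 and 2; a Hardy-Weinberg prior whose allele frequency is estimated from the whole cohort turns these into posterior genotype probabilities. The function names, the fixed sequencing-error rate, and the Hardy-Weinberg assumption are illustrative choices, not the project's genotyping algorithm.

```python
import numpy as np

GENOTYPE_DOSAGE = np.array([0.0, 1.0, 2.0])  # hom-ref, het, hom-alt


def _genotype_posteriors(loglik: np.ndarray, freq: float) -> np.ndarray:
    """Combine per-sample log-likelihoods with a Hardy-Weinberg prior at allele frequency freq."""
    prior = np.array([(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2])
    logpost = loglik + np.log(prior)
    logpost -= logpost.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(logpost)
    return post / post.sum(axis=1, keepdims=True)


def cohort_genotype_posteriors(alt_reads, total_reads, error=0.01, max_iter=200, tol=1e-10):
    """EM estimate of a shared allele frequency plus per-sample genotype posteriors.

    Hypothetical helper. Assumes a biallelic site, independent reads, a fixed
    symmetric error rate, and Hardy-Weinberg equilibrium in the cohort.
    """
    k = np.asarray(alt_reads, dtype=float)[:, None]
    n = np.asarray(total_reads, dtype=float)[:, None]
    # Expected alt-read fraction under each genotype.
    p_alt = np.array([error, 0.5, 1.0 - error])
    # Binomial log-likelihood up to the binomial coefficient, which is identical
    # across genotypes for a given sample and cancels in the posterior.
    loglik = k * np.log(p_alt) + (n - k) * np.log1p(-p_alt)

    freq = 0.5  # initial guess
    for _ in range(max_iter):
        post = _genotype_posteriors(loglik, freq)  # E-step
        new_freq = (post @ GENOTYPE_DOSAGE).mean() / 2.0  # M-step: mean expected dosage / 2
        new_freq = float(np.clip(new_freq, 1e-12, 1 - 1e-12))  # keep log(prior) finite
        converged = abs(new_freq - freq) < tol
        freq = new_freq
        if converged:
            break
    return freq, _genotype_posteriors(loglik, freq)
```

For a sample with only a few reads, the cohort-level frequency acts as a shared prior that pulls its posterior toward genotypes common in the cohort, which is one simple sense in which large sequenced cohorts sharpen per-sample calls.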
This project will have three specific aims. First, we will develop and apply methods for incorporating structural
variants into polygenic scores. Many structural variants are known to confer substantial disease risk but are
imperfectly modeled by existing polygenic scores, so directly including these variants will increase
prediction accuracy and cross-ancestry transferability. Second, we will develop and apply methods for
incorporating somatic mutations detectable in blood-derived DNA into genetic risk models. Such acquired
mutations, often indicative of clonal expansions of blood cells, provide a source of risk orthogonal to that of
the inherited variants considered by standard polygenic scores. Third, we will develop and apply efficient
computational methods for training polygenic score models on biobank-scale sequencing data. These methods
will allow model fitting to be performed on individual-level genetic data, optimizing prediction accuracy. We
anticipate that these efforts will significantly improve the performance of genetic risk models trained on current and
future population-scale whole-genome sequencing data sets.
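As a minimal illustration of Bayesian regression on individual-level data (the third aim), consider the simplest infinitesimal model: y = Xβ + ε with independent priors β_j ~ N(0, σ_β²) and noise ε ~ N(0, σ_e² I). The posterior mean of the effects is β̂ = (XᵀX + λI)⁻¹ Xᵀy with λ = σ_e²/σ_β². The sketch below solves this system by conjugate gradient, touching X only through products with X and Xᵀ, so the M × M matrix XᵀX is never formed. It assumes λ is known and that X fits in memory; the function name is hypothetical, and this is a generic textbook approach rather than the method proposed in this project.

```python
import numpy as np


def gaussian_prior_effects(X, y, lam, tol=1e-10, max_iter=1000):
    """Posterior-mean effects under an infinitesimal Gaussian prior (hypothetical helper).

    Solves (X^T X + lam * I) beta = X^T y by conjugate gradient, using X only
    through matrix-vector products, so X^T X is never materialized.
    lam = sigma_e^2 / sigma_beta^2 is assumed known here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def apply_system(v):
        # (X^T X + lam I) v evaluated as X^T (X v) + lam v
        return X.T @ (X @ v) + lam * v

    rhs = X.T @ y
    beta = np.zeros(X.shape[1])
    resid = rhs - apply_system(beta)
    direction = resid.copy()
    rs_old = resid @ resid
    stop = tol * np.linalg.norm(rhs)  # relative residual stopping rule
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= stop:
            break
        a_dir = apply_system(direction)
        step = rs_old / (direction @ a_dir)
        beta += step * direction
        resid -= step * a_dir
        rs_new = resid @ resid
        direction = resid + (rs_new / rs_old) * direction
        rs_old = rs_new
    return beta
```

Each iteration costs two passes over X, roughly O(NM) work, instead of forming and factoring XᵀX directly, which is why iterative solvers of this kind are attractive when the number of variants M is large. Scores for new individuals with genotypes X_new, standardized the same way, would then be X_new @ beta.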
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Uncovering complex trait heritability hidden in the repeatome.
- DOI:
- Published: 2023-12-13
- Journal:
- Impact factor: 0
- Authors: Loh; Po
- Corresponding author: Po
Other grants by Po-Ru Loh
Identifying structural variants influencing human health in population cohorts
- Award number: 10889519
- Fiscal year: 2023
- Amount: $447,500
- Project category:
Fast and powerful extensions of mixed model methods for GWAS
- Award number: 8974184
- Fiscal year: 2014
- Amount: $447,500
- Project category:
Fast and powerful extensions of mixed model methods for GWAS
- Award number: 9186420
- Fiscal year: 2014
- Amount: $447,500
- Project category:
Fast and powerful extensions of mixed model methods for GWAS
- Award number: 8712922
- Fiscal year: 2014
- Amount: $447,500
- Project category:
Similar National Natural Science Foundation of China (NSFC) grants
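Construction of an allele aggregation network model and its application to leaf trichome development
- Award number: 32370714
- Year approved: 2023
- Amount: CNY 500,000
- Project category: General Program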
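Mutant-allele-specific knockout as a treatment for long QT syndrome types 1 and 2, studied with human induced pluripotent stem cell technology
- Award number: 82300353
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund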
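Mechanisms by which phoPQ allelic differences mediate the coexistence of distinct subpopulations in Enterobacter polymyxin heteroresistance
- Award number: 82302575
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund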
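Mechanistic analysis of how different ACR11A alleles regulate low-temperature stress responses in tomato
- Award number: 32302535
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund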
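Cloning of a maize kernel row number QTL and identification of superior alleles
- Award number:
- Year approved: 2022
- Amount: CNY 550,000
- Project category: General Program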
Similar international grants
A comprehensive study of tandem repeat variation as a cause of Alzheimer's disease
- Award number: 10585034
- Fiscal year: 2023
- Amount: $447,500
- Project category:
Canine MHC-I genotyping and tumor specific neoantigen determination
- Award number: 10220542
- Fiscal year: 2021
- Amount: $447,500
- Project category:
Canine MHC-I genotyping and tumor specific neoantigen determination
- Award number: 10404109
- Fiscal year: 2021
- Amount: $447,500
- Project category:
Canine MHC-I genotyping and tumor specific neoantigen determination
- Award number: 10630913
- Fiscal year: 2021
- Amount: $447,500
- Project category:
Sequence-resolved structural variation of human genomes
- Award number: 10202688
- Fiscal year: 2018
- Amount: $447,500
- Project category: