Statistical Methods for Integrative Analysis of Large-Scale Multi-Ethnic Whole Genome Sequencing Studies and Biobanks of Common Diseases
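Basic information
- Award number: 10622567
- Principal investigator:
- Amount: $499,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-05-15 to 2026-04-30
- Project status: Ongoing
- Source: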
- Keywords: Address; Biological Markers; Cells; Cellular Assay; Cloud Computing; Code; Complex; Computer software; Computing Methodologies; Data; Data Set; Disease; Electronic Health Record; Environment; Epidemiology; European; European ancestry; FAIR principles; Face; Genetic; Genetic Research; Genetic study; Genome; Heart Diseases; Individual; Institution; Intervention; Lung diseases; Mendelian randomization; Meta-Analysis; Methods; Modeling; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; Nature; Pathway interactions; Performance; Population; Prevention strategy; Principal Component Analysis; Research; Risk Factors; Sample Size; Sampling; Statistical Methods; Structure; System; Testing; Trans-Omics for Precision Medicine; Underrepresented Populations; United States National Institutes of Health; Untranslated RNA; Variant; Veterans; biobank; catalyst; cell type; cloud platform; cluster computing; data privacy; disorder risk; empowerment; exome; genetic variant; genome sequencing; genome wide association study; health disparity; improved; interest; learning strategy; multi-ethnic; polygenic risk score; power analysis; privacy protection; programs; rare variant; research study; risk prediction; statistical learning; trait; user friendly software; whole genome
Project abstract
This proposal aims to develop advanced and scalable statistical methods for integrative analysis of large-scale
Whole Genome Sequencing (WGS) studies and biobanks of common diseases, such as heart and lung
diseases. Genome-Wide Association Studies (GWAS) have revealed thousands of genetic variants associated
with many common diseases, but they are largely limited to common variants and to individuals of predominantly
European ancestry. Large-scale multi-ethnic WGS studies and biobanks have emerged rapidly to overcome
these limitations, and to study the genetic underpinnings of complex diseases and traits in both coding and
non-coding rare variants across populations. Examples include the NHLBI Trans-Omics for Precision Medicine
Program (TOPMed), the NHGRI Genome Sequencing Program (GSP), UK Biobank, and All of Us. Various
omics data are also available in TOPMed. Full use of these datasets can fuel genetic discoveries applicable
to genetically understudied populations. These studies comprise hundreds of millions of rare variants (RVs),
and their analysis faces several challenges. First, although several methods have been developed for RV
analysis, they have limited power for non-coding RVs, whose functions are often unknown or cell-type
specific. There is a pressing need to empower RV Association Tests (RVATs) for non-coding variants by
developing more powerful statistical learning methods that use integrative analysis and incorporate
cell-type-specific variant functional annotations. Second, the large sample sizes of WGS studies, together
with the data privacy considerations of many national and institutional biobanks and their unbalanced
case-control ratios, call for distributed WGS analyses. Third, it is of substantial interest to develop
polygenic risk scores using both common and rare variants in WGS studies, and to investigate the causal
effects of biomarkers and omics markers on diseases through Mendelian Randomization (MR) with both common
and rare variants as instrumental variables. This proposal addresses these needs through four aims. First,
we will develop statistical learning-based ensemble RVATs to boost power. This ensemble RVAT framework will be extended to use
cell-type-specific functional annotations calculated from single-cell assays, and to perform meta-analysis.
Second, we will develop distributed methods for important tasks in the analysis of large WGS and federated
biobank data: estimating population structure via distributed fast principal component analysis, distributed
methods for fitting generalized linear mixed models, and distributed RVATs. Third, we will develop methods for
constructing polygenic risk scores (PRSs) using both common and rare variants in WGS studies, and develop Mendelian
Randomization methods for studying the causal effects of biomarkers and omics markers on diseases by using
WGS-based PRSs as instrumental variables. Fourth, we will develop open-access statistical software capable of
implementing our proposed methods in both offline and cloud computing environments. We will apply the
proposed methods to the analysis of the TOPMed and GSP data and the biobanks.
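
To make the third aim more concrete, the following is a minimal sketch of how a PRS can serve as a single instrumental variable in Mendelian Randomization. It uses the textbook ratio (Wald) estimator, cov(Z, Y) / cov(Z, X), which is a stand-in and not the method the proposal will develop. All function names, the toy dosages, and the per-variant weights are hypothetical and chosen only for illustration.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted allele count for one individual: sum_j weight_j * dosage_j."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def wald_ratio(instrument, exposure, outcome):
    """Single-instrument MR estimate of the exposure's effect on the outcome."""
    denom = covariance(instrument, exposure)
    if denom == 0:
        raise ValueError("instrument is uncorrelated with the exposure")
    return covariance(instrument, outcome) / denom


# Toy data (hypothetical): 4 individuals, 3 variants; the third plays the
# role of a rare variant with a larger assumed weight.
dosages = [[0, 1, 0], [1, 0, 0], [2, 1, 1], [1, 2, 0]]
weights = [0.2, 0.1, 0.8]  # assumed per-variant effects on the biomarker
prs = [polygenic_risk_score(g, weights) for g in dosages]

biomarker = [1.0, 1.3, 2.9, 1.9]            # exposure X
disease_trait = [2 * x for x in biomarker]  # outcome Y, built so the effect is 2
estimate = wald_ratio(prs, biomarker, disease_trait)
```

Because the toy outcome is constructed as exactly twice the biomarker, `estimate` recovers 2 up to floating-point error. Real analyses would also need standard errors, weak-instrument diagnostics, and checks for pleiotropy, none of which this sketch attempts.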
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants awarded to XIHONG LIN
Powering whole genome sequence-based genetic discovery for common human diseases - Extended 2021-2022.
- Award number: 10355760
- Fiscal year: 2021
- Award amount: $499,800
- Project category:
Powering whole genome sequence-based genetic discovery for common human diseases
- Award number: 10168752
- Fiscal year: 2020
- Award amount: $499,800
- Project category:
Powering whole genome sequence-based genetic discovery for common human diseases
- Award number: 10085285
- Fiscal year: 2020
- Award amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Award number: 10676866
- Fiscal year: 2015
- Award amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Award number: 10221623
- Fiscal year: 2015
- Award amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Award number: 8955524
- Fiscal year: 2015
- Award amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Award number: 9980301
- Fiscal year: 2015
- Award amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Award number: 9120850
- Fiscal year: 2015
- Award amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Award number: 9321418
- Fiscal year: 2015
- Award amount: $499,800
- Project category:
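Similar grants from the National Natural Science Foundation of China
A Bacillus subtilis cell-sensor-mediated strategy for dynamic detection of colorectal cancer-related biomarkers in the gut environment
- Award number: 82372355
- Approval year: 2023
- Award amount: CNY 480,000
- Project category: General Program
Methods for metagenomic-biomarker-driven single-cell genome-phenome association analysis of microbiota
- Award number: 32370097
- Approval year: 2023
- Award amount: CNY 500,000
- Project category: General Program
Confirmatory study of APOBEC signature mutations as an immunotherapy biomarker for ovarian clear cell carcinoma
- Award number: 82303968
- Approval year: 2023
- Award amount: CNY 300,000
- Project category: Young Scientists Fund
tsRNAs in plasma exosomes as diagnostic biomarkers for glioma and the molecular mechanisms by which they regulate glioma cell proliferation
- Award number: 82303586
- Approval year: 2023
- Award amount: CNY 300,000
- Project category: Young Scientists Fund
Molecular mechanism by which APOC3, a potential biomarker of Guillain-Barré syndrome, regulates macrophage polarization through metabolic reprogramming
- Award number: 82371359
- Approval year: 2023
- Award amount: CNY 490,000
- Project category: General Program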
Similar overseas grants
Potential of tissue kallikreins as therapeutic targets for neuropsychiatric lupus
- Award number: 10667764
- Fiscal year: 2023
- Award amount: $499,800
- Project category:
Gain-of-function complement activators as a new class of immunotherapeutic molecules
- Award number: 10629623
- Fiscal year: 2023
- Award amount: $499,800
- Project category:
Characterization of Altered Immunity in Patients with Inflammatory Arthritis Induced by Immune Checkpoint Inhibitor Therapy
- Award number: 10885381
- Fiscal year: 2023
- Award amount: $499,800
- Project category:
Individualized Profiles of Sensorineural Hearing Loss from Non-Invasive Biomarkers of Peripheral Pathology
- Award number: 10827155
- Fiscal year: 2023
- Award amount: $499,800
- Project category: