Novel methods to improve the utility of genomics summary statistics
Basic information
- Grant number: 10646125
- Principal investigator:
- Amount: $412,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-19 to 2025-08-31
- Project status: Ongoing
- Source:
- Keywords: Acceleration, Address, Age, Aging, Alzheimer's Disease, Clinical, Cognitive, Cohort Studies, Computer software, Computing Methodologies, Confidentiality of Patient Information, Data, Data Science, Data Set, Databases, Development, Disease, Documentation, Educational workshop, Electronic Health Record, Ensure, Etiology, Fatty Acids, Funding, Future, Genetic, Genetic Markers, Genetic Variation, Genetic study, Genome, Genomics, Genotype, Health, Heart, Human, Impaired cognition, Individual, Linear Models, Mediating, Mediation, Meta-Analysis, Methodology, Methods, Modeling, National Human Genome Research Institute, Outcome, Outcomes Research, Phenotype, Polyunsaturated Fatty Acids, Privacy, Pythons, Research, Research Personnel, Role, Statistical Methods, Testing, Training, Validation, Work, biobank, clinically relevant, cohort, cost, data access, data privacy, dietary, digital repositories, flexibility, genetic epidemiology, genetic variant, genome sciences, genomic data, human disease, human genome sequencing, improved, innovation, interest, member, novel, open source, open source tool, response, sex, statistics, survival outcome, tool, working group
Project abstract
The repeated experimental and computational breakthroughs in the two decades since the sequencing of the
human genome have provided an unprecedented opportunity to understand the etiology of human diseases.
The diminishing cost of genomics data means it is now possible for researchers to obtain complete genome
sequence information on hundreds of thousands of individuals, with widespread access to those data via large
repositories of electronic health records (EHRs) and biobanks. However, interrelated computational, statistical
and privacy questions remain about how to leverage these data to study the contribution of genetic variation
to common diseases. Importantly, there is a need for methodological innovation to minimize computational
complexity and respect data privacy concerns while maximizing data access and utility. At the forefront of
these innovations are computational methods that leverage non-individually identifiable summary statistics pre-
computed on biobank/EHR data to maximize downstream functional understanding and clinical utility. One
emerging set of summary statistics consists of point estimates and standard errors from separate regression models
of a phenotype ($Y$) on individual genotypes ($X_i$), sometimes with limited covariate adjustment (e.g., age, sex
and principal components (PCs)). A key limitation of any set of pre-computed summary statistics is not being
able to anticipate all possible downstream uses of such statistics. For example, researchers may want to use:
(a) different sets of covariates than those considered in pre-computed analyses, (b) sets of genetic variants as
predictors, instead of single markers and (c) alternative phenotype definitions that are functions of existing
variables (e.g., a researcher wants to know about a phenotype, $Y_C$, of clinical importance, but only has pre-
computed summary statistics on $Y_1, Y_2, \ldots, Y_k$, where $Y_C = f(Y_1, Y_2, \ldots, Y_k)$). In this project we will (1) develop a
computationally efficient framework to evaluate genetic variants with clinically relevant phenotypes using
summary statistics and apply these methods to perform harmonized analyses of clinically relevant phenotypes
in multi-cohort studies using summary statistics and (2) validate the feasibility of these methodological
innovations within two related consortia currently exploring the role of genetic variants in cognitive outcomes and
the potential moderating and/or mediating role of dietary polyunsaturated fatty acid (PUFA) levels. Preliminary
methods will be implemented in open-source tools (R/Python packages) and tested extensively
on both simulated and real data across a wide range of clinically relevant phenotypes. This work will set the stage
for a future R01 project to provide additional methodological expansion, more widespread testing and
comprehensive dissemination.
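The abstract does not spell out the proposed methods, so the following is only a minimal sketch of the simplest version of case (c), under the assumption that $f$ is a known linear combination, $Y_C = \sum_j w_j Y_j$, and that every per-phenotype regression uses the same individuals and the same genotype. Because the ordinary least squares slope is linear in the outcome, the slope for $Y_C$ is then the same weighted sum of the pre-computed slopes. The function names, weights, and simulated data are hypothetical illustrations, not part of the project's software. The standard error of the combined slope would additionally require the covariance among the phenotypes, which this sketch does not attempt.

```python
import numpy as np


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def combined_slope(betas, weights):
    """Slope for Y_C = sum_j weights[j] * Y_j, using only per-phenotype slopes.

    Assumes every slope was estimated on the same individuals and genotype.
    """
    return float(np.dot(weights, betas))


# Simulated data (hypothetical effect sizes, for illustration only).
rng = np.random.default_rng(0)
n = 500
x = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
y1 = 0.5 * x + rng.normal(size=n)
y2 = -0.2 * x + rng.normal(size=n)
weights = np.array([2.0, 1.0])  # Y_C = 2*Y_1 + 1*Y_2

# "Pre-computed" summary statistics: one slope per phenotype.
betas = np.array([slope(x, y1), slope(x, y2)])

# Slope for Y_C recovered from the summary statistics alone ...
from_summary = combined_slope(betas, weights)
# ... compared with the slope computed from individual-level data.
from_individual = slope(x, weights[0] * y1 + weights[1] * y2)

print(abs(from_summary - from_individual) < 1e-9)
```

The two values agree up to floating-point error only because $f$ is linear and the samples match; nonlinear phenotype definitions or differing covariate sets would need the additional methodology the abstract proposes to develop.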
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Nathan L Tintle
Wastewater data integration and modelling to accurately predict community and organizational outbreaks due to viral pathogens
- Grant number: 10481536
- Fiscal year: 2022
- Funding amount: $412,200
- Project category:
Wastewater data integration and modelling to accurately predict community and organizational outbreaks due to viral pathogens
- Grant number: 10768053
- Fiscal year: 2022
- Funding amount: $412,200
- Project category:
Large-scale data integration and harmonization to accurately predict sites facing future health-based drinking water crises
- Grant number: 10253600
- Fiscal year: 2021
- Funding amount: $412,200
- Project category:
Analyzing the behavior and interpreting the results of gene based tests of rare variant association
- Grant number: 9099474
- Fiscal year: 2012
- Funding amount: $412,200
- Project category:
Analyzing the behavior and interpreting the results of gene based tests of rare v
- Grant number: 8367623
- Fiscal year: 2012
- Funding amount: $412,200
- Project category:
Analyzing the behavior and interpreting the results of gene based tests of rare variant association
- Grant number: 9813293
- Fiscal year: 2012
- Funding amount: $412,200
- Project category:
Evaluating the Cost Effectiveness of Alternative Sample Designs for Genetic Assoc
- Grant number: 7841342
- Fiscal year: 2009
- Funding amount: $412,200
- Project category:
Evaluating the Cost Effectiveness of Alternative Sample Designs for Genetic Assoc
- Grant number: 8264409
- Fiscal year: 2008
- Funding amount: $412,200
- Project category:
Evaluating the Cost Effectiveness of Alternative Sample Designs for Genetic Assoc
- Grant number: 7363067
- Fiscal year: 2008
- Funding amount: $412,200
- Project category:
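Similar National Natural Science Foundation of China (NSFC) grants
Research on neuromorphic visual object recognition algorithms driven by spatiotemporal sequences
- Grant number: 61906126
- Approval year: 2019
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Ontology-driven spatial semantic modeling of address data and address matching methods
- Grant number: 41901325
- Approval year: 2019
- Funding amount: CNY 220,000
- Project category: Young Scientists Fund
Research on optimized address mapping table design and memory access optimization for large-capacity solid-state drives
- Grant number: 61802133
- Approval year: 2018
- Funding amount: CNY 230,000
- Project category: Young Scientists Fund
Research on memory safety defense techniques for objects targeted by memory attacks
- Grant number: 61802432
- Approval year: 2018
- Funding amount: CNY 250,000
- Project category: Young Scientists Fund
Research on IP-address-driven multipath routing and traffic transmission control
- Grant number: 61872252
- Approval year: 2018
- Funding amount: CNY 640,000
- Project category: General Program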
Similar overseas grants
The Proactive and Reactive Neuromechanics of Instability in Aging and Dementia with Lewy Bodies
- Grant number: 10749539
- Fiscal year: 2024
- Funding amount: $412,200
- Project category:
The contribution of air pollution to racial and ethnic disparities in Alzheimer’s disease and related dementias: An application of causal inference methods
- Grant number: 10642607
- Fiscal year: 2023
- Funding amount: $412,200
- Project category:
Effects of Aging on Neuronal Lysosomal Damage Responses Driven by CMT2B-linked Rab7
- Grant number: 10678789
- Fiscal year: 2023
- Funding amount: $412,200
- Project category:
Parallel Characterization of Genetic Variants in Chemotherapy-Induced Cardiotoxicity Using iPSCs
- Grant number: 10663613
- Fiscal year: 2023
- Funding amount: $412,200
- Project category: