High-performance mixed model toolset for integrative omics analysis of big data
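Basic information
- Grant number: 9312511
- Principal investigator:
- Amount: $584,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-04-15 to 2020-03-31
- Project status: Completed
- Source: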
- Keywords: Algorithms; Attention; Beds; Big Data; Biological; Biological Models; Biology; Cloud Computing; Code; Collaborations; Communities; Complex; Complex Genetic Trait; Computer software; Data; Data Analyses; Data Set; Development; Disease; Epigenetic Process; Equation; Family; Genetic; Genetic Epistasis; Genomics; Genotype; Goals; Health; Heterogeneity; Human; Investments; Memory; Meta-Analysis; Methodology; Mitochondria; Modeling; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; Performance; Phase; Phenotype; Play; Population; Process; Publishing; Research; Research Personnel; Resource Allocation; Resources; Role; Sample Size; Sampling; Sequence Analysis; Shapes; System; Systems Biology; Technology; Testing; Time; Trans-Omics for Precision Medicine; Variant; Weight; Work; analytical method; animal breeding; base; biological systems; cloud based; cohort; cost; data access; data management; data space; epigenome-wide association studies; file format; flexibility; genetic analysis; genetic pedigree; genomic data; improved; insight; large scale production; method development; novel; novel strategies; precision medicine; pressure; rare variant; response; scale up; simulation; simulation software; terabyte; tool; trait; virtual; whole genome; working group
Project abstract
PROJECT SUMMARY/ABSTRACT
The recent large-scale production of whole genome sequence and other multi-omics data in TOPMed and other
projects calls for parallel development of a comprehensive, powerful and flexible toolset capable of large data
management, analysis and integration. Mega/integrated analyses are essential to fully utilize these data to
elucidate the complexity of the biological mechanisms and advance our understanding of complex trait biology to
drive precision medicine. TOPMed estimates that the VCF for 60,000 subjects will contain 400M variants and
require 100TB of space, and much of our current genetic analysis toolset does not scale up to these data sizes.
For rare variant analysis, mixed model mega analysis is more powerful than meta-analysis as mega analysis can
include additional random effects to account for genetic relatedness between all subjects and cross-study
phenotypic, genetic and environmental heterogeneity. However, cross-study mega analysis within the mixed
model is still uncharted territory. We believe mega analysis will spur more creative analysis approaches
provided the needed toolsets are available. In cloud computing “time is money”, and new approaches are
required to solve structural differences in resource allocation and data access compared to local computing.
MMAP (Mixed Models for Analysis of Pedigrees/Populations) is robust mixed model software that has already
supported a published mixed model analysis on a sample size of 90,000 that included dominance variance, and it
includes a cloud-efficient version of mixed model rare variant analysis. The goal of this proposal is to further expand and
improve this toolset to deliver to the research community a flexible, versatile, and comprehensive cross-platform
mixed model toolset scalable to efficient local and cloud analysis of large WGS and omics data. We plan to
implement several new features in our toolset including: 1) Efficient binary genotype file format for optimal
storage of terabyte VCF genotypes. 2) Large-scale modeling of non-additive variation such as dominance,
X-linked, mitochondrial and epistatic effects. 3) Optimized rare variant analysis with flexible integration of annotation and
variant weighting resources. 4) Optimized expression/epigenome-wide association (EWA) analysis. 5)
Comprehensive multi-omics integration into the mixed model as fixed and random effects. 6) Development of
multi-omics simulation software to guide systems biology modeling. 7) Integrating mixed model equations for
prediction from animal breeding. This proposal will deliver to the research community an analysis toolset that will
push research boundaries well beyond additive SNP association to a space filled with complex biological fixed
and random effects models integrating the full spectrum of multi-omics data. We plan to develop a multi-omics
simulation tool to better understand the complex evolutionary processes that shape the complex trait landscape.
Our toolset will be extensively shaped by collaboration with TOPMed working groups to meet analysis priorities
and develop analysis plans. Our toolset will surely evolve in novel and unexpected directions in response to new
ideas and challenges as we dive deeper into this unique data set.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by JEFFREY R O'CONNELL
Elucidating the ancestry-specific genetic and environmental architecture of cardiometabolic traits across All of Us ethnic groups
- Grant number: 10796028
- Fiscal year: 2023
- Funding amount: $584,800
- Project category:
Genome-wide Association in Families: Data Integrity, Design and Methods Issue
- Grant number: 7104529
- Fiscal year: 2006
- Funding amount: $584,800
- Project category:
Genome-wide Association in Families: Data Integrity, Design and Methods Issue
- Grant number: 7246523
- Fiscal year: 2006
- Funding amount: $584,800
- Project category:
Genome-wide Association in Families: Data Integrity, Design and Methods Issue
- Grant number: 7421072
- Fiscal year: 2006
- Funding amount: $584,800
- Project category:
RAPID MULTIPOINT METHODS FOR MAPPING COMPLEX DISEASES
- Grant number: 2864800
- Fiscal year: 1998
- Funding amount: $584,800
- Project category:
RAPID MULTIPOINT METHODS FOR MAPPING COMPLEX DISEASES
- Grant number: 6043142
- Fiscal year: 1998
- Funding amount: $584,800
- Project category:
RAPID MULTIPOINT METHODS FOR MAPPING COMPLEX DISEASES
- Grant number: 6169588
- Fiscal year: 1998
- Funding amount: $584,800
- Project category:
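Similar National Natural Science Foundation of China (NSFC) grants
Research on single-lens frequency extension and computational imaging based on an optical attention modulation mechanism
- Grant number: 62375067
- Approval year: 2023
- Funding amount: CNY 470,000
- Project category: General Program
Research on cross-attention mechanisms in map-matching methods for intelligent vehicle localization
- Grant number: 62373250
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Research on variable-aperture defocus enhancement of single images based on a wavelet cross-attention mechanism
- Grant number: 62301332
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Key technologies for an intelligent EEG feature-extraction chip based on the self-attention mechanism
- Grant number: 62374121
- Approval year: 2023
- Funding amount: CNY 490,000
- Project category: General Program
SAR weak-trace detection method combining complex coherence statistics with a global attention model
- Grant number: 62301403
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund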
Similar overseas grants
Deep Learning Assisted Scoring of Point of Care Lung Ultrasound for Acute Decompensated Heart Failure in the Emergency Department
- Grant number: 10741596
- Fiscal year: 2023
- Funding amount: $584,800
- Project category:
A Modular Framework for Data-Driven Neurogenetics to Predict Complex and Multidimensional Autistic Phenotypes
- Grant number: 10826595
- Fiscal year: 2023
- Funding amount: $584,800
- Project category:
Novel Strategy to Quantitate Delayed Aging by Caloric Restriction
- Grant number: 10594352
- Fiscal year: 2022
- Funding amount: $584,800
- Project category: