Scalable Coalescent Inference for Large Data Sets
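Basic information
- Grant number: 10192760
- Principal investigator:
- Amount: $304,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-09-05 to 2022-06-30
- Project status: Completed
- Source: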
- Keywords: Address; Algorithms; Area; Bayesian Analysis; Biological; Biology; Bison; Cessation of life; Communicable Diseases; Computer software; Computing Methodologies; DNA; DNA Sequence; Data; Data Set; Development; Dimensions; Encapsulated; Ensure; Evolution; Explosion; Frequencies; Genealogical Tree; Genealogy; Genes; Genetic; Genetic Phenomena; Genetic Variation; Genetic study; Goals; Human; Investigation; Label; Mathematics; Methodology; Methods; Modeling; Modernization; Molecular; Molecular Analysis; North America; Phylogenetic Analysis; Population; Population Genetics; Process; Processed Genes; Property; Public Health; Recording of previous events; Research; Research Personnel; Resolution; Sample Size; Sampling; Shapes; Site; Statistical Methods; Statistical Models; Stochastic Processes; Structure; Time; Trees; Uncertainty; ZIKA; base; cancer genomics; genomic data; improved; innovation; large datasets; mathematical model; next generation sequencing; novel; open source; simulation; statistics; theories; tool
Project abstract
Mathematical and statistical modeling of gene genealogies (trees that reflect ancestral relationships among sampled
molecular sequences) is central to many biological fields, including population genetics, phylodynamics of infectious
disease, paleogenomics, phylogenetics, and cancer genomics. Kingman's n-coalescent is a stochastic process of gene
genealogies whose parameters depend on an evolutionary model. Inference of model parameters then contributes to an
understanding of the phenomena that have given rise to the sequences. Though many sophisticated methods have been
developed to date, major statistical and computational challenges remain because the state space of genealogies grows
superexponentially with the number of samples. We are no longer data-limited; instead, we lack computational and
statistical methods for analysis of large-scale emerging genomic data sets. The long-term goal of the researchers is to
develop statistically consistent and computationally efficient coalescent methods for exact inference of evolutionary
parameters from next-generation sequencing datasets. The objective of this research is to apply the notion of
lumpability of Kingman's n-coalescent to address the state-space explosion problem of coalescent methods. The basic
idea is to model a coarser resolution of the underlying genealogy and reduce the cardinality of the hidden state space.
These coarser coalescent models include Tajima's coalescent and the pure-death process coalescent. The specific aims
include (1) prove theorems for coalescent models and provide theoretical and practical tools for addressing
computational challenges when modeling different resolutions or "lumpings" of Kingman's coalescent; (2) develop
scalable methods for inference of evolutionary parameters using different coalescent models; (3) theoretically and
empirically validate the inference methods, applying them in simulations and in molecular sequences from infectious
diseases such as Zika, as well as ancient DNA samples from bison in North America and ancient and modern human
samples; (4) implement the novel methods in open source software, ensuring fast dissemination of the methodology
among researchers. The research is innovative in many distinct ways. First, Tajima's coalescent has not yet been
exploited for inference despite the potential based on the smaller state space. Second, the methods developed here will
allow inference from data sets that have not been exploited before because of computational limitations. Third, we
will not only provide a suite of tools ready for application but we will also provide statistical results supporting our
implementations. Our proposed research on scalable modeling of genealogical trees will be significant in a number of
fields, including the theory of evolutionary trees, statistical inference in population genetics and phylogenetics, and
the analysis of molecular sequences from infectious disease and ancient DNA.
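
To give a sense of how much a coarser resolution shrinks the hidden state space, here is a minimal sketch that compares two tree counts. It rests on standard combinatorial results that are not stated in the abstract itself: the number of ranked, labeled binary genealogies with n tips (the topologies of Kingman's n-coalescent) is n!(n-1)!/2^(n-1), and the number of ranked, unlabeled tree shapes (the topologies of Tajima's coalescent) is the (n-1)-th Euler zigzag number. The function names are hypothetical and chosen for illustration only; this is not the project's software.

```python
from math import factorial


def kingman_ranked_labeled_trees(n: int) -> int:
    """Count ranked, labeled binary genealogies with n tips: n!(n-1)!/2^(n-1)."""
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    # Exact integer division: the product of C(k, 2) for k = 2..n.
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


def euler_zigzag(m: int) -> int:
    """m-th Euler zigzag number (1, 1, 1, 2, 5, 16, 61, ...).

    Uses the Entringer recurrence E(i, k) = E(i, k-1) + E(i-1, i-k),
    with E(0, 0) = 1 and E(i, 0) = 0 for i >= 1; the zigzag number is E(m, m).
    """
    row = [1]
    for i in range(1, m + 1):
        new = [0]
        for k in range(1, i + 1):
            new.append(new[k - 1] + row[i - k])
        row = new
    return row[-1]


def tajima_ranked_tree_shapes(n: int) -> int:
    """Count ranked, unlabeled binary tree shapes with n tips (assumed: zigzag number n-1)."""
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    return euler_zigzag(n - 1)


if __name__ == "__main__":
    for n in (4, 6, 10, 20):
        print(n, kingman_ranked_labeled_trees(n), tajima_ranked_tree_shapes(n))
```

Under these assumptions, four samples give 18 ranked labeled trees but only 2 ranked tree shapes, and the gap widens rapidly with n. Both counts still grow quickly, so the coarser model reduces the state space rather than making it small.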
Project outcomes
Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Statistical Challenges in Tracking the Evolution of SARS-CoV-2.
- DOI: 10.1214/22-sts853
- Published: 2022-05
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Discussion on "Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories" by James R. Faulkner, Andrew F. Magee, Beth Shapiro, and Vladimir N. Minin.
- DOI: 10.1111/biom.13275
- Published: 2020
- Journal:
- Impact factor: 1.9
- Authors: Cappello, Lorenzo; Ghosh, Swarnadip; Palacios, Julia A
- Corresponding author: Palacios, Julia A
A simple derivation of the mean of the Sackin index of tree balance under the uniform model on rooted binary labeled trees.
- DOI: 10.1016/j.mbs.2021.108688
- Published: 2021-12
- Journal:
- Impact factor: 4.3
- Authors: King MC; Rosenberg NA
- Corresponding author: Rosenberg NA
Exact limits of inference in coalescent models.
- DOI: 10.1016/j.tpb.2018.11.004
- Published: 2019
- Journal:
- Impact factor: 1.4
- Authors: Johndrow, James E; Palacios, Julia A
- Corresponding author: Palacios, Julia A
Adaptive Preferential Sampling in Phylodynamics With an Application to SARS-CoV-2.
- DOI: 10.1080/10618600.2021.1987256
- Published: 2022
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Other grants by Julia Palacios
Novel Coalescent Approaches for Studying Evolutionary Processes
- Grant number: 10552480
- Fiscal year: 2023
- Funding amount: $304,800
- Project category:
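Similar grants from the National Natural Science Foundation of China
Highly scalable space-time parallel domain decomposition algorithms for carbon dioxide sequestration and their large-scale applications
- Grant number: 12371366
- Approval year: 2023
- Funding amount: 435,000 CNY
- Project category: General Program
Structure-preserving algorithms for nonlocal Klein-Gordon-Schrödinger equations on unbounded domains
- Grant number: 12301508
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Constrained multi-objective swarm intelligence algorithms based on deep reinforcement learning, with application to multi-area heat and power dispatch
- Grant number: 62303197
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Automated algorithm design for collaborative scheduling of multi-area cellular production lines
- Grant number: 62303204
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Region-weighted non-rigid registration algorithms for constructing three-dimensional target reference data for facial defect restoration
- Grant number:
- Approval year: 2022
- Funding amount: 520,000 CNY
- Project category: General Program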
Similar overseas grants
Fluency from Flesh to Filament: Collation, Representation, and Analysis of Multi-Scale Neuroimaging data to Characterize and Diagnose Alzheimer's Disease
- Grant number: 10462257
- Fiscal year: 2023
- Funding amount: $304,800
- Project category:
Previvors Recharge: A Resilience Program for Cancer Previvors
- Grant number: 10698965
- Fiscal year: 2023
- Funding amount: $304,800
- Project category:
A multicenter study in bronchoscopy combining Stimulated Raman Histology with Artificial intelligence for rapid lung cancer detection - The ON-SITE study
- Grant number: 10698382
- Fiscal year: 2023
- Funding amount: $304,800
- Project category:
Dynamic neural coding of spectro-temporal sound features during free movement
- Grant number: 10656110
- Fiscal year: 2023
- Funding amount: $304,800
- Project category:
HEAR-HEARTFELT (Identifying the risk of Hospitalizations or Emergency depARtment visits for patients with HEART Failure in managed long-term care through vErbaL communicaTion)
- Grant number: 10723292
- Fiscal year: 2023
- Funding amount: $304,800
- Project category: