Collaborative Research: Converging Genomics, Phenomics, and Environments Using Interpretable Machine Learning Models
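Basic information
- Award number: 1940062
- Principal investigator:
- Amount: $483,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-10-01 to 2023-06-30
- Status: Completed
- Source:
- Keywords:
Project abstract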
Mitigating the effects of climate change on public health and conservation calls for a better understanding of the dynamic interplay between biological processes and environmental effects. The state of the art, which has led to many important discoveries, uses numerical or statistical models for making predictions or performing in silico experimentation, but these techniques struggle to capture the nonlinear response of natural systems. Machine learning (ML) methods are better able to cope with nonlinearity and have been used successfully in biological applications, but several barriers remain, including the opaque nature of the algorithm output and the absence of ML-ready data. This project seeks to significantly advance technologies in ML and create a new interdisciplinary field, computational ecogenomics. This will be accomplished by designing ML techniques for encoding heterogeneous genomic and environmental data and mapping them to multi-level phenotypic traits, reducing the amount of necessary training data, and then developing interactive visualizations to better interpret ML models and their outputs. These advances will responsibly and transparently inform policy to maximize resources during this crucial window for planetary health, while revealing underlying biological mechanisms of response to stress and evolutionary pressure.
The long-term vision for this project is to develop predictive analytics for organismal response to environmental perturbations using innovative data science approaches and to change the way scientists think about gene expression and the environment. The goal for this two-year award is to develop a proof of concept for an institute focused on predicting emergent properties of complex systems; an institute that would itself foster the development of many new sub-disciplines.
The core of this activity is developing a machine learning framework capable of predicting phenotypes based on multi-scale data about genes and environments. Available data, ranging from simple vectors to complex images to sequences, will be ingested into this framework by applying proven semantic data integration tools and algorithmic data transformation methods. The central hypothesis of this research is that deep learning algorithms and biological knowledge graphs will predict phenotypes more accurately across more taxa and more ecosystems than do current numerical and traditional statistical modeling methods. The rationale for this project is that a timely investment in data science will push through a bottleneck in life science, accelerating discovery of gene-phenotype-environment relationships and catalyzing a new computational discipline to uncover the complex "rules of life."
This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity, and is jointly supported by HDR and the Division of Biological Infrastructure within the NSF Directorate for Biological Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Using knowledge graphs to infer gene expression in plants
- DOI: 10.3389/frai.2023.1201002
- Published: 2023-06
- Journal:
- Impact factor: 4
- Authors: Thessen, Anne E.; Cooper, Laurel; Swetnam, Tyson L.; Hegde, Harshad; Reese, Justin; Elser, Justin; Jaiswal, Pankaj
- Corresponding author: Jaiswal, Pankaj
Other grants by Patrick Heidorn
SI2-SSE: Visualizing Astronomy Repository Data using WorldWide Telescope Software Systems
- Award number: 1642446
- Fiscal year: 2016
- Funding amount: $483,000
- Award type: Standard Grant
Collaborative Research: Conceptualizing an Institute for Empowering Long Tail Research
- Award number: 1216884
- Fiscal year: 2012
- Funding amount: $483,000
- Award type: Standard Grant
Collaborative Research: BiSciCol Tracker: Towards a tagging and tracking infrastructure for biodiversity science collections
- Award number: 0956271
- Fiscal year: 2010
- Funding amount: $483,000
- Award type: Continuing Grant
Biodiversity and Biocomplexity Informatics: Policy and Implementation, Science versus Citizen Science, to be held in Portland, Oregon, July 16, 2002
- Award number: 0229031
- Fiscal year: 2002
- Funding amount: $483,000
- Award type: Standard Grant
ITR/IM: An Internet Environment for BioDiversity Survey Collaboration and Verification
- Award number: 0113918
- Fiscal year: 2001
- Funding amount: $483,000
- Award type: Standard Grant
Biological Information Browsing Environment
- Award number: 9982849
- Fiscal year: 2000
- Funding amount: $483,000
- Award type: Continuing Grant
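Similar National Natural Science Foundation of China (NSFC) grants
Research on the construction of high-resolution climate risk and photovoltaic energy datasets and on aggregation and sharing technologies
- Award number: 42341206
- Approval year: 2023
- Funding amount: CNY 600,000
- Award type: Special Fund Project
Differentiation and evolution during multi-pulse magma convergence and accumulation in Himalayan leucogranites: a mineralogical tracing study
- Award number: 42372088
- Approval year: 2023
- Funding amount: CNY 530,000
- Award type: General Program
One-step preparation of carbon quantum dot single-atom catalysts using controllable converging shock waves from liquid-phase pulsed discharge
- Award number: 12372332
- Approval year: 2023
- Funding amount: CNY 530,000
- Award type: General Program
Transition and turbulent mixing of variable-density mixing layers in converging geometry
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund
Convergence mechanism and process enhancement of electro-osmotic flow in dredger-fill sludge under a cyclically progressive DC electric field
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund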
Similar overseas grants
Collaborative Research: Converging Design Methodology: Multi-objective Optimization of Resilient Structural Spines
- Award number: 2120683
- Fiscal year: 2021
- Funding amount: $483,000
- Award type: Standard Grant
Collaborative Research: Converging Design Methodology: Multi-objective Optimization of Resilient Structural Spines
- Award number: 2120692
- Fiscal year: 2021
- Funding amount: $483,000
- Award type: Standard Grant
Collaborative Research: Converging Design Methodology: Multi-objective Optimization of Resilient Structural Spines
- Award number: 2120684
- Fiscal year: 2021
- Funding amount: $483,000
- Award type: Standard Grant
Collaborative Research: Converging COVID-19, environment, health, and equity
- Award number: 2037862
- Fiscal year: 2020
- Funding amount: $483,000
- Award type: Standard Grant
Collaborative Research: Converging COVID-19, environment, health, and equity
- Award number: 2037834
- Fiscal year: 2020
- Funding amount: $483,000
- Award type: Standard Grant