CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms

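Basic information

  • Award number:
    2340586
  • Principal investigator:
  • Amount:
    $604,800
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Period:
    2024-06-01 to 2029-05-31
  • Project status:
    Ongoing

Project abstract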

When trying to understand the workings of complex systems (whether it be individual cells or whole ecosystems), large datasets have the potential to provide deep scientific and operational insights. However, there are two major challenges that must be addressed: how to quickly yet rigorously process such large datasets, and how to avoid becoming overconfident in the conclusions reached, given the limitations of the data and knowledge of how such systems work. This research develops a comprehensive framework and set of algorithms for addressing both of these challenges in a general way, so that scientists and other data analysts can use them off-the-shelf, thereby accelerating the acquisition of new knowledge. The work will be developed in the context of two modern application areas of broad interest. The first is to enable biologists to learn about the inner workings of systems that are difficult or impossible to observe directly (such as the internal functioning of cells or the evolutionary history of animal species). The second is to enable ecologists to predict how ecosystems change over periods of time ranging from months to decades, thereby enabling better management of ecosystems and deployment of ecological monitoring efforts. The investigator is working directly with experts in these applications to have an immediate and substantive impact in both areas. In one educational component of the project, the investigator is a core member of the team developing new modern introductory applied statistics courses for undergraduate students at Boston University. The investigator is also writing an accessible textbook on the design and analysis of algorithms for data science, which will be of broad interest to students and researchers in machine learning, data science, statistics, and related fields. 
Despite many empirical successes, a lack of machine-learning methods with rigorous guarantees has resulted in systems that unpredictably perform poorly in real-world settings and therefore cannot be trusted for scientific discovery and safety-critical applications. Hence, there is an urgent need to create learning algorithms that are simultaneously scalable to the large datasets and high-dimensional models typical of machine-learning applications; able to accurately quantify uncertainty to ensure correct decision-making despite model misspecification, distribution shift, and data corruption; and reliable and easy to use for the typical machine-learning practitioner. The primary technical objective of the project is to provide a comprehensive solution to these challenges by developing provably correct subsampling Markov chain Monte Carlo (MCMC) algorithms with automated tuning procedures. The key technical tool is a statistical-scaling-limits approach to establishing statistical and algorithmic foundations for how to tune basic subsampling MCMC algorithms designed for inference in latent variable and Gaussian process models, and for modified subsampling MCMC algorithms that can improve computational efficiency and numerical stability. These theoretical developments will be translated into practical, user-friendly algorithms with diagnostics that inform the user if the theory is applicable to their problem. The theory and algorithms will also be extended to distributionally robust losses such as maximum mean discrepancy. The research program is highly interdisciplinary, drawing on theory and methods from large-scale probabilistic machine learning, statistics, stochastic analysis, stochastic process theory, and numerical analysis. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
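For readers unfamiliar with the term, the following is a minimal sketch of one well-known subsampling MCMC method, stochastic gradient Langevin dynamics (SGLD). It is not taken from the award and is not the project's algorithm. The toy model (unit-variance Gaussian data with a Gaussian prior on the mean), the function names `sgld_gaussian_mean` and `exact_posterior`, and all parameter values are assumptions chosen only so that the result can be compared with a closed-form posterior.

```python
import math
import random


def sgld_gaussian_mean(data, batch_size, step_size, n_steps, prior_var=100.0, seed=0):
    """Draw approximate posterior samples of a Gaussian mean with SGLD.

    Illustrative assumption: x_i ~ N(theta, 1) with prior theta ~ N(0, prior_var).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Subsample a minibatch without replacement instead of using all n points.
        batch = rng.sample(data, batch_size)
        # Unbiased estimate of the log-posterior gradient: the prior term plus
        # the minibatch likelihood term rescaled by n / batch_size.
        grad = -theta / prior_var + (n / batch_size) * sum(x - theta for x in batch)
        # Langevin update: half a step along the gradient plus Gaussian noise.
        theta += 0.5 * step_size * grad + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def exact_posterior(data, prior_var=100.0):
    """Closed-form posterior mean and variance for the same conjugate model."""
    precision = len(data) + 1.0 / prior_var
    return sum(data) / precision, 1.0 / precision


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
    # Discard the first 1000 draws as burn-in.
    draws = sgld_gaussian_mean(data, batch_size=50, step_size=1e-4, n_steps=6000)[1000:]
    est_mean = sum(draws) / len(draws)
    est_var = sum((d - est_mean) ** 2 for d in draws) / len(draws)
    print("SGLD  mean, variance:", est_mean, est_var)
    print("exact mean, variance:", *exact_posterior(data))
```

In a sketch like this, the sample mean usually tracks the exact posterior mean closely, while the sample variance can differ from the exact posterior variance because both the minibatch gradient noise and the finite step size perturb the chain. How to choose the batch size and step size so that the reported uncertainty can be trusted is the kind of tuning question the abstract describes.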

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Jonathan Huggins

A Statistical Learning Theory Framework for Supervised Pattern Discovery
  • DOI:
  • Published:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jonathan Huggins;C. Rudin
  • Corresponding author:
    C. Rudin
Toward a Theory of Pattern Discovery
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
A Framework for Improving the Reliability of Black-box Variational Inference
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Manushi K. V. Welandawe;M. R. Andersen;Aki Vehtari;Jonathan Huggins
  • Corresponding author:
    Jonathan Huggins
Using bagged posteriors for robust inference and model criticism
  • DOI:
    10.1093/sysbio/syab011
  • Published:
    2019-12-15
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jonathan Huggins;Jeffrey W. Miller
  • Corresponding author:
    Jeffrey W. Miller

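Similar grants from the National Natural Science Foundation of China (NSFC)

Scalable and privacy-preserving data-driven distributed optimization methods and their application to demand response
  • Award number:
    72301008
  • Award year:
    2023
  • Funding amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Research on massive low-latency, high-reliability communication based on a scalable cell-free architecture
  • Award number:
    62371039
  • Award year:
    2023
  • Funding amount:
    490,000 CNY
  • Project type:
    General Program
Research on scalable integration methods for single-cell multi-omics data based on unsupervised continual learning
  • Award number:
    62303488
  • Award year:
    2023
  • Funding amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Research on scalable multi-agent cooperation strategies based on reinforcement learning in autonomous driving scenarios
  • Award number:
    62306062
  • Award year:
    2023
  • Funding amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Research on the scalability of service-optimization-oriented hybrid state verification mechanisms in blockchain systems
  • Award number:
    62302202
  • Award year:
    2023
  • Funding amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund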

Similar overseas grants

CAREER: Towards Scalable and Robust Inference of Phylogenetic Networks
  • Award number:
    2144367
  • Fiscal year:
    2022
  • Funding amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Leveraging Combinatorial Structures for Robust and Scalable Learning
  • Award number:
    1845032
  • Fiscal year:
    2019
  • Funding amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Scalable and Robust Dynamic Matching Market Design
  • Award number:
    1846237
  • Fiscal year:
    2019
  • Funding amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Robust and scalable genome-wide phylogenetics
  • Award number:
    1845967
  • Fiscal year:
    2019
  • Funding amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Robust, scalable, reliable machine learning
  • Award number:
    1750286
  • Fiscal year:
    2018
  • Funding amount:
    $604,800
  • Award type:
    Continuing Grant