CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms

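Basic Information

  • Award number:
    2340586
  • Principal investigator:
  • Amount:
    $604,800
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Project period:
    2024-06-01 to 2029-05-31
  • Project status:
    Ongoing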

Project Abstract

When trying to understand the workings of complex systems (whether it be individual cells or whole ecosystems), large datasets have the potential to provide deep scientific and operational insights. However, there are two major challenges that must be addressed: how to quickly yet rigorously process such large datasets, and how to avoid becoming overconfident in the conclusions reached, given the limitations of the data and knowledge of how such systems work. This research develops a comprehensive framework and set of algorithms for addressing both of these challenges in a general way, so that scientists and other data analysts can use them off-the-shelf, thereby accelerating the acquisition of new knowledge. The work will be developed in the context of two modern application areas of broad interest. The first is to enable biologists to learn about the inner workings of systems that are difficult or impossible to observe directly (such as the internal functioning of cells or the evolutionary history of animal species). The second is to enable ecologists to predict how ecosystems change over periods of time ranging from months to decades, thereby enabling better management of ecosystems and deployment of ecological monitoring efforts. The investigator is working directly with experts in these applications to have an immediate and substantive impact in both areas. In one educational component of the project, the investigator is a core member of the team developing new modern introductory applied statistics courses for undergraduate students at Boston University. The investigator is also writing an accessible textbook on the design and analysis of algorithms for data science, which will be of broad interest to students and researchers in machine learning, data science, statistics, and related fields. 
Despite many empirical successes, a lack of machine-learning methods with rigorous guarantees has resulted in systems that unpredictably perform poorly in real-world settings and therefore cannot be trusted for scientific discovery and safety-critical applications. Hence, there is an urgent need to create learning algorithms that are simultaneously scalable to the large datasets and high-dimensional models typical of machine-learning applications; able to accurately quantify uncertainty to ensure correct decision-making despite model misspecification, distribution shift, and data corruption; and reliable and easy to use for the typical machine-learning practitioner. The primary technical objective of the project is to provide a comprehensive solution to these challenges by developing provably correct subsampling Markov chain Monte Carlo (MCMC) algorithms with automated tuning procedures. The key technical tool is a statistical-scaling-limits approach to establishing statistical and algorithmic foundations for how to tune basic subsampling MCMC algorithms designed for inference in latent variable and Gaussian process models, and for modified subsampling MCMC algorithms that can improve computational efficiency and numerical stability. These theoretical developments will be translated into practical, user-friendly algorithms with diagnostics that inform the user if the theory is applicable to their problem. The theory and algorithms will also be extended to distributionally robust losses such as maximum mean discrepancy. The research program is highly interdisciplinary, drawing on theory and methods from large-scale probabilistic machine learning, statistics, stochastic analysis, stochastic process theory, and numerical analysis. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
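For readers unfamiliar with the term, the following is a minimal sketch of one well-known subsampling MCMC method, stochastic gradient Langevin dynamics (SGLD). It is an illustration only and is not taken from the project: the toy model (unit-variance Gaussian data with a Gaussian prior on the mean), the function names, and the step size and batch size are all assumptions chosen for the example. Each step estimates the gradient of the log posterior from a random subsample of the data rather than the full dataset, then adds Gaussian noise.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=5000, batch_size=100, step_size=2e-5,
                       prior_var=100.0, seed=0):
    """Hypothetical SGLD sampler for the mean of unit-variance Gaussian data
    under a N(0, prior_var) prior. Returns the full chain of samples."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Draw a subsample (without replacement) instead of using all the data.
        batch = rng.sample(data, batch_size)
        # Unbiased estimate of the log-posterior gradient from the subsample.
        grad_prior = -theta / prior_var
        grad_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        # Langevin update: half a step along the gradient plus injected noise.
        theta += 0.5 * step_size * (grad_prior + grad_lik)
        theta += math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def exact_posterior_mean(data, prior_var=100.0):
    """Closed-form posterior mean for the same conjugate toy model."""
    return sum(data) / (len(data) + 1.0 / prior_var)


# Synthetic data for illustration: 1000 draws from N(2, 1).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]

samples = sgld_gaussian_mean(data)
kept = samples[1000:]  # discard the first 1000 steps as burn-in
sgld_mean = sum(kept) / len(kept)
print("SGLD estimate:", sgld_mean)
print("Exact posterior mean:", exact_posterior_mean(data))
```

In this sketch the step size and batch size are set by hand. With a larger step size or a smaller batch, the noise in the subsampled gradient can noticeably inflate the spread of the samples relative to the true posterior, which is the kind of tuning question the abstract refers to.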

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Jonathan Huggins

Provably Learning Mixtures of Gaussians and More
  • DOI:
  • Year:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jonathan Huggins
  • Corresponding author:
    Jonathan Huggins
A simple feature-copying approach for long-distance dependencies
  • DOI:
    10.3115/1596374.1596405
  • Year:
    2009
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. Vilain;Jonathan Huggins;Ben Wellner
  • Corresponding author:
    Ben Wellner
Independent versus truncated finite approximations for Bayesian nonparametric inference
  • DOI:
    10.3150/18-bej1020
  • Year:
    2020
  • Journal:
  • Impact factor:
    1.5
  • Authors:
    Tin D. Nguyen;Jonathan Huggins
  • Corresponding author:
    Jonathan Huggins
Scaling Bayesian inference: theoretical foundations and practical methods
  • DOI:
  • Year:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jonathan Huggins
  • Corresponding author:
    Jonathan Huggins
A Targeted Accuracy Diagnostic for Variational Approximations
  • DOI:
    10.48550/arxiv.2302.12419
  • Year:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yu Wang;Mikolaj Kasprzak;Jonathan Huggins
  • Corresponding author:
    Jonathan Huggins

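Similar grants from the National Natural Science Foundation of China (NSFC)

Research on Scalable FPGA Packet Classification Techniques for SmartNICs
  • Award number:
    62372123
  • Year approved:
    2023
  • Amount:
    CNY 500,000
  • Award type:
    General Program
Research on Scalable Modeling and Analysis Techniques for Highly Concurrent Software
  • Award number:
    62302375
  • Year approved:
    2023
  • Amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Research on Efficient and Scalable Randomization-Based Deep Learning Algorithms
  • Award number:
    62376131
  • Year approved:
    2023
  • Amount:
    CNY 510,000
  • Award type:
    General Program
Scalable Optical MIMO Demodulation Chips and Equalizers Incorporating Spatial and Temporal Dimensions
  • Award number:
    62335019
  • Year approved:
    2023
  • Amount:
    CNY 2,250,000
  • Award type:
    Key Program
Research on Massive Low-Latency, High-Reliability Communication Based on a Scalable Cell-Free Architecture
  • Award number:
    62371039
  • Year approved:
    2023
  • Amount:
    CNY 490,000
  • Award type:
    General Program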

Similar overseas grants

CAREER: Towards Scalable and Robust Inference of Phylogenetic Networks
  • Award number:
    2144367
  • Fiscal year:
    2022
  • Amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Scalable and Robust Dynamic Matching Market Design
  • Award number:
    1846237
  • Fiscal year:
    2019
  • Amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Robust and scalable genome-wide phylogenetics
  • Award number:
    1845967
  • Fiscal year:
    2019
  • Amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Leveraging Combinatorial Structures for Robust and Scalable Learning
  • Award number:
    1845032
  • Fiscal year:
    2019
  • Amount:
    $604,800
  • Award type:
    Continuing Grant
CAREER: Robust, scalable, reliable machine learning
  • Award number:
    1750286
  • Fiscal year:
    2018
  • Amount:
    $604,800
  • Award type:
    Continuing Grant