CAREER: Robust, scalable, reliable machine learning

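Basic information

  • Award number:
    1750286
  • Principal investigator:
  • Amount:
    $550,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Period:
    2018-03-15 to 2024-02-29
  • Status:
    Completed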

Project abstract

Machine learning is increasingly deployed in large-scale, mission-critical problems for the purpose of making decisions that affect a vast number of individuals' employment, savings, health, and safety. The potential for machine learning to dramatically impact and change people's lives necessitates that machine learning methods be robust, explainable, and understandable, rather than black-box. This research develops new techniques that are both computationally motivated and theoretically sound for robust machine learning at scale. The work is situated in the context of three modern classes of applications. (1) Economists are interested in analyzing the efficacy of microcredit: small loans to individuals in impoverished areas with the goal of eliminating poverty. (2) Biologists are interested in using single-cell RNA sequencing data to understand cells' relationships and development trajectories. (3) The Internet of Things (IoT) is poised to generate a wealth of complex data across energy readings in buildings, within transportation infrastructure, from vehicles on the road, and from many other sensor sources. The PI is working directly with area experts so as to have immediate, broad impact across application domains. In an educational component of the project, the PI is a core part of developing a new graduate curriculum and degree in statistics, data science, and statistical machine learning at MIT. The methods and applications in this proposal feature in a new course on modern machine learning methods. The PI is also developing a high-school level introduction to machine learning as part of the established Women's Technology Program (WTP).

The issues of robustness and explainability particularly arise in domains with nontrivial spatial and temporal dependencies, where the amount of data is often massive, and where practitioners typically have some expert knowledge about the domain before engaging with a particular dataset. These are precisely the domains where existing machine learning methodologies are less well-developed. The need to bring structural knowledge to bear on the problem suggests the use of Bayesian methods, which can incorporate this knowledge via prior and modeling assumptions. To live up to the promise of these methods, though, practical approaches need to be robust to assumptions as well as to noisy or adversarial data, lest this data change important decisions in ways not understood by the practitioner. This research incorporates advances in statistical physics to assess the sensitivity of a data analysis to assumptions and data values. And to realize the advantages of the proposed robust and understandable machine learning framework, practitioners must face extreme scalability issues, from both a computational perspective and a modeling perspective. On the computational side, this research builds on recent advances from computational geometry to scale to data sets at modern sizes. On the modeling side, note that while small-scale problems exhibit dense spatio-temporal dependencies, large-scale problems tend to be sparser, and practical approaches must reflect this sparsity to be reliable at scale. This work incorporates advances in probability theory to model sparse IoT networks. This proposal is highly interdisciplinary, bringing together ideas from machine learning, statistics, physics, theoretical computer science, probability theory, and systems, and applying these ideas to microcredit, single-cell RNA sequencing, sensor networks, international trade, and industrial applications including customer service at scale.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (18)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Validated Variational Inference via Practical Posterior Error Bounds
  • DOI:
  • Published:
    2019-10-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jonathan Huggins;Mikolaj Kasprzak;Trevor Campbell;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
Confidently Comparing Estimates with the c-value
More for less: predicting and maximizing genomic variant discovery via Bayesian nonparametrics
  • DOI:
    10.1093/biomet/asab012
  • Published:
    2021-02
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Masoero, Lorenzo;Camerlenghi, Federico;Favaro, Stefano;Broderick, Tamara
  • Corresponding author:
    Broderick, Tamara
Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics
  • DOI:
    10.1214/22-ba1309
  • Published:
    2018-10-15
  • Journal:
  • Impact factor:
    4.4
  • Authors:
    Runjing Liu;Ryan Giordano;Michael I. Jordan;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets
  • DOI:
  • Published:
    2021-07-13
  • Journal:
  • Impact factor:
    0
  • Authors:
    Brian L. Trippe;H. Finucane;Tamara Broderick
  • Corresponding author:
    Tamara Broderick

Other publications by Tamara Broderick

Comment: Nonparametric Bayes Modeling of Populations of Networks
Redshift Accuracy Requirements for Future Supernova and Number Count Surveys
  • DOI:
    10.1086/424726
  • Published:
    2004-01-30
  • Journal:
  • Impact factor:
    0
  • Authors:
    D. Huterer;A. Kim;L. Krauss;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
A Swiss Army Infinitesimal Jackknife
  • DOI:
  • Published:
    2018-06-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ryan Giordano;William T. Stephenson;Runjing Liu;Michael I. Jordan;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time
  • DOI:
    10.1109/icma.2015.7237519
  • Published:
    2021-06-23
  • Journal:
  • Impact factor:
    0
  • Authors:
    Raj Agrawal;Tamara Broderick
  • Corresponding author:
    Tamara Broderick
Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box
  • DOI:
    10.48550/arxiv.2304.05527
  • Published:
    2023-04-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ryan Giordano;Martin Ingram;Tamara Broderick
  • Corresponding author:
    Tamara Broderick

Other grants held by Tamara Broderick

Collaborative Research: PPoSS: Planning: Scalable Systems for Probabilistic Programming
  • Award number:
    2029016
  • Fiscal year:
    2020
  • Award amount:
    $550,000
  • Award type:
    Standard Grant
Workshop for Women in Machine Learning
  • Award number:
    1833154
  • Fiscal year:
    2018
  • Award amount:
    $550,000
  • Award type:
    Standard Grant

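Similar grants from the National Natural Science Foundation of China (NSFC)

Molecular mechanism by which symbiotic bacteria of Amphidinium carterae degrade phosphonates to produce an algal growth-promoting effect
  • Award number:
    42306167
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Research on new methods for covert underwater active detection based on composite coded pulse trains
  • Award number:
    61271414
  • Approval year:
    2012
  • Award amount:
    CNY 600,000
  • Award type:
    General Program
Research on semidefinite relaxation and nonconvex quadratically constrained quadratic programming
  • Award number:
    11271243
  • Approval year:
    2012
  • Award amount:
    CNY 600,000
  • Award type:
    General Program
Analysis and design of efficient and robust message authentication codes
  • Award number:
    61202422
  • Approval year:
    2012
  • Award amount:
    CNY 230,000
  • Award type:
    Young Scientists Fund
Research on several problems in network revenue management for civil aviation passenger transport
  • Award number:
    60776817
  • Approval year:
    2007
  • Award amount:
    CNY 200,000
  • Award type:
    Joint Fund Project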

Similar overseas grants

CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms
  • Award number:
    2340586
  • Fiscal year:
    2024
  • Award amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Towards Scalable and Robust Inference of Phylogenetic Networks
  • Award number:
    2144367
  • Fiscal year:
    2022
  • Award amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Leveraging Combinatorial Structures for Robust and Scalable Learning
  • Award number:
    1845032
  • Fiscal year:
    2019
  • Award amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Scalable and Robust Dynamic Matching Market Design
  • Award number:
    1846237
  • Fiscal year:
    2019
  • Award amount:
    $550,000
  • Award type:
    Continuing Grant
CAREER: Robust and scalable genome-wide phylogenetics
  • Award number:
    1845967
  • Fiscal year:
    2019
  • Award amount:
    $550,000
  • Award type:
    Continuing Grant