CAREER: Leveraging Combinatorial Structures for Robust and Scalable Learning

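Basic Information

  • Award number:
    1845032
  • Principal investigator:
  • Amount:
    $550,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-05-01 to 2025-04-30
  • Project status:
    Ongoing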

Project Abstract

The difficulty of searching through a massive amount of data in order to quickly make an informed decision is one of today's most ubiquitous challenges. Many scientific and engineering models feature data with inherently discrete characteristics, where discrete means that the data take values from a finite set. Examples range from phrases in text to objects in an image. Similarly, nearly all aspects of data science involve discrete tasks such as data summarization and model explanation. As computational methods pervade all aspects of science and engineering, it is of great importance to understand which discrete formulations can be solved efficiently and how to do so. Many of these problems are notoriously hard, and even those that are theoretically solvable may be tractable only for small amounts of data. However, the problems of practical interest are often far better behaved and possess inherent structure that allows them to be solved more efficiently. This CAREER award aims to substantially advance the frontiers of large-scale discrete optimization in data science and machine learning by developing fundamentally new algorithms. This project will also provide a number of educational opportunities such as outreach to local high school and middle school students through Yale's Pathways to Science program.

Just as convexity has been a celebrated and well-studied condition under which continuous optimization is tractable, submodularity is a condition under which discrete objectives may be optimized. While current research in submodular optimization has led to fundamental breakthroughs in discrete mathematical programming, there is still a large gap between the theory and the limitations of the existing algorithms used by practitioners in the real world.
In particular, most of the existing submodular optimization methods fail miserably when faced with the numerous sources of uncertainty inherent in machine learning tasks, from noise in the data to variability of the true objective. Moreover, submodularity is too strong an assumption for a variety of novel machine learning applications, necessitating the development of completely new algorithms. In order to lift current provable methods out of the sterile lab environment and scale them into the messy real world, it is important to carefully reexamine their limitations, consider more realistic but less perfect conditions, and develop correspondingly robust yet scalable algorithms. This CAREER project presents a research plan for designing, analyzing, and evaluating new approaches to robust submodular optimization at massive scale, with the goal of solving a broad array of optimization problems of significant practical importance. Furthermore, it addresses generalizations of submodular functions that substantially broaden the applicability of these methods, moving to a realm beyond submodularity. The research directions in this project have deep and far-reaching societal benefits, as robust and scalable computational methods play a central role in nearly every scientific and industrial venture in today's information age. Such advances are expected to play crucial roles in enabling data-driven scientific discoveries, promoting fairness in machine learning, and supporting STEM education by helping these communities handle the computational challenges associated with big data. The results of this project will be broadly disseminated to the greater scientific community through tutorials, workshops, and open-source software.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (34)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Power of Subsampling in Submodular Maximization
  • DOI:
    10.1287/moor.2021.1172
  • Publication date:
    2021-04-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Christopher Harshaw; Ehsan Kazemi; Moran Feldman; Amin Karbasi
  • Corresponding author:
    Amin Karbasi
Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes
Quantized Frank-Wolfe: Faster Optimization, Lower Communication, and Projection Free
  • DOI:
  • Publication date:
    2019-02-17
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mingrui Zhang; Lin Chen; Aryan Mokhtari; Hamed Hassani; Amin Karbasi
  • Corresponding author:
    Amin Karbasi
Fast Neural Kernel Embeddings for General Activations
  • DOI:
    10.48550/arxiv.2209.04121
  • Publication date:
    2022-09-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Insu Han; A. Zandieh; Jaehoon Lee; Roman Novak; Lechao Xiao; Amin Karbasi
  • Corresponding author:
    Amin Karbasi
Parallelizing Thompson Sampling

Other publications by Amin Karbasi

Graph-Constrained Group Testing
  • DOI:
    10.1109/tit.2011.2169535
  • Publication date:
    2010-01-09
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Mahdi Cheraghchi; Amin Karbasi; S. Mohajer; Venkatesh Saligrama
  • Corresponding author:
    Venkatesh Saligrama
Seeing the Unseen Network: Inferring Hidden Social Ties from Respondent-Driven Sampling
  • DOI:
    10.1609/aaai.v30i1.10164
  • Publication date:
    2015-11-13
  • Journal:
  • Impact factor:
    0
  • Authors:
    Lin Chen; Forrest W. Crawford; Amin Karbasi
  • Corresponding author:
    Amin Karbasi
Batched Neural Bandits
  • DOI:
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Quanquan Gu; Amin Karbasi; Khashayar Khosravi; V. Mirrokni; Dongruo Zhou
  • Corresponding author:
    Dongruo Zhou
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning
A DOA estimation method for an arbitrary triangular microphone arrangement

Similar overseas grants

Leveraging high-throughput continuous-flow synthesis of Charge-Altering Releasable Transporter gene delivery vectors to establish structure-function relationships for mRNA delivery
  • Award number:
    9758810
  • Fiscal year:
    2019
  • Funding amount:
    $550,000
  • Project type:
Leveraging high-throughput continuous-flow synthesis of Charge-Altering Releasable Transporter gene delivery vectors to establish structure-function relationships for mRNA delivery
  • Award number:
    10007583
  • Fiscal year:
    2019
  • Funding amount:
    $550,000
  • Project type:
Leveraging Temozolomide to Improve Treatment Efficacy of Immune Checkpoint Blockade in Glioblastoma
  • Award number:
    10458232
  • Fiscal year:
    2016
  • Funding amount:
    $550,000
  • Project type:
Leveraging Temozolomide to Improve Treatment Efficacy of Immune Checkpoint Blockade in Glioblastoma
  • Award number:
    10462415
  • Fiscal year:
    2016
  • Funding amount:
    $550,000
  • Project type:
Leveraging Temozolomide to Improve Treatment Efficacy of Immune Checkpoint Blockade in Glioblastoma
  • Award number:
    10057265
  • Fiscal year:
    2016
  • Funding amount:
    $550,000
  • Project type: