Collaborative Research: RI: Small: Robust Deep Learning with Big Imbalanced Data

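Basic Information

Award number:
2110546
Principal investigator:
Penghang Yin
Amount:
$233,500
Institution:
SUNY at Albany
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2021
Funding country:
United States
Start and end dates:
2021-10-01 to 2024-09-30
Project status:
Completed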

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2110546&HistoricalAwards=false
Keywords:
Collaborative Research RI Small Robust

Project Abstract

This project promotes the progress of science and technology by advancing artificial intelligence (AI) through innovations in scalable and robust computational methods. AI, especially deep learning, has had a transformative impact on industry and has brought quantum leaps in the quality of a wide range of everyday technologies, including face recognition, speech recognition, and machine translation. However, many challenges must still be addressed to accelerate the democratization of AI, including data issues and model issues. This project seeks to advance AI by addressing one critical data issue: data imbalance. This happens when the data collected for training AI models does not contain enough instances representing some property the models are trying to learn. For example, molecules with a certain antibacterial property are far fewer than all possible molecules, which makes predicting antibacterial properties challenging. The goal of this project is to develop algorithms with theoretical guarantees that make AI learn more effectively from big imbalanced data. This project will also contribute to training future professionals in AI and machine learning, including high school students and under-represented undergraduates.

This project investigates a broad family of robust losses for deep learning. The research activities include (i) developing scalable offline stochastic algorithms for solving non-decomposable robust losses that are formulated as min-max or min-min problems; (ii) developing efficient online stochastic algorithms for solving a family of distributionally robust optimization problems that are cast as compositional optimization problems; (iii) developing effective strategies for training deep neural networks by solving the considered non-decomposable robust losses; and (iv) establishing the underlying theory, including optimization and statistical convergence of the proposed algorithms. The algorithms are being evaluated on big imbalanced data such as images, graphs, and texts.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
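
The abstract does not name a specific robust loss, so the following is only an illustrative sketch, assuming the standard KL-regularized form of distributionally robust optimization. It shows how one such loss can be written both as a min-max problem over per-example weights and as a compositional objective (a logarithm applied to an average of exponentials). The function names and the parameter `lam` are hypothetical and are not taken from the project.

```python
import math

def kl_dro_loss(losses, lam):
    """Compositional form (assumed example): lam * log((1/n) * sum_i exp(loss_i / lam))."""
    n = len(losses)
    m = max(losses)  # shift by the max for numerical stability
    return m + lam * math.log(sum(math.exp((l - m) / lam) for l in losses) / n)

def worst_case_weights(losses, lam):
    """Maximizer of the inner problem: p_i proportional to exp(loss_i / lam)."""
    m = max(losses)
    unnorm = [math.exp((l - m) / lam) for l in losses]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def minmax_objective(losses, weights, lam):
    """Inner objective: sum_i p_i * loss_i - lam * KL(p || uniform)."""
    n = len(losses)
    kl = sum(p * math.log(p * n) for p in weights if p > 0)
    return sum(p * l for p, l in zip(weights, losses)) - lam * kl

# Toy per-example losses: the last example stands in for a rare, hard class.
losses = [0.1, 0.2, 3.0]
p = worst_case_weights(losses, lam=1.0)
robust = kl_dro_loss(losses, lam=1.0)
inner = minmax_objective(losses, p, lam=1.0)
```

In this sketch the worst-case weights concentrate on the high-loss example, which is the mechanism by which a robust loss can emphasize under-represented data. The two forms agree at the maximizing weights, and a smaller `lam` pushes the objective toward the maximum loss while a larger `lam` pushes it toward the plain average.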
