CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding
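Basic information
- Award number: 2237640
- Principal investigator:
- Amount: $540,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-03-01 to 2028-02-29
- Project status: Active (not yet completed)
- Source:
- Keywords:
Project abstract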
Unsupervised learning problems are in general significantly more difficult than their supervised counterparts in machine learning. This poses considerable challenges not only in machine learning research but also in education, as nearly all models are NP-hard (with possibly the sole exception of PCA), and the community has been dwelling on algorithms without optimality guarantees for several decades. This project aims to develop a principled framework of minimum volume polytopic embedding that unifies various unsupervised learning problems, such as independent component analysis, dictionary learning, and nonnegative matrix factorization, by treating each problem as embedding the set of data points into a regular polytope such as a simplex, a box, or an orthoplex, guided by a novel matrix volume criterion. The benefit is two-fold: 1) it provides identifiability guarantees with finite samples, and 2) it hinges on the development of algorithms that can optimally solve these NP-hard problems under mild assumptions. The PI's prior work has shown strong identifiability guarantees for the former benefit, while this project will focus on resolving the latter, starting from a Frank-Wolfe algorithmic framework that has shown great empirical success. Furthermore, this project will greatly expand the framework's application domains, including POMDP identification in reinforcement learning, aggregate flexibility in power systems, and deep polytopic word embedding in natural language processing. On the mathematical side, the project also develops extensions to handle nonlinearity and deep representation learning, which have so far been elusive and are expected to have broad impact beyond the project's main focus on theory and algorithm development. Extensive education and outreach plans are laid out to corroborate the research impact and encourage students from all backgrounds to engage in computer science and machine learning research.
In this project we propose a novel framework that transforms all data points into points in a regular polytope (such as a simplex, a box, or an orthoplex), hence the name polytopic embedding, guided by a novel matrix volume optimization criterion. The PI's prior work not only showed strong identifiability guarantees for the latent representation, but also found a wide variety of practical success in applications. This prior success inspires the PI to further investigate this direction, resolve unsettled theoretical challenges, broaden the learning framework, and seek even more application domains. This project will evolve along the following synergistic thrusts. In Thrust 1, a Frank-Wolfe algorithm is designed to solve the NP-hard polytopic embedding problem. Inspired by recent developments in analyzing guaranteed non-convex learning, a promising pathway is laid out to provide provable global optimality guarantees. In Thrust 2, the proposed learning framework will be used to identify an unknown POMDP from observations only, with computational guarantees. Research along this thrust will be applied to healthcare recommendations from medical data. In Thrust 3, the problem of aggregate flexibility in power systems is introduced, which provides an interesting dual interpretation of polytopic embedding. Experiments on real data will validate the performance and expand the framework to handle nonlinear constraints. In Thrust 4, we propose a novel word embedding scheme that offers not only computational guarantees but also semantic interpretation. An extension to a deep polytopic embedding framework is also introduced.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
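The abstract names a Frank-Wolfe framework over three regular polytopes (simplex, box, orthoplex) but gives no algorithmic detail. The sketch below is not the project's algorithm: it is a generic Frank-Wolfe loop on a toy convex objective (squared distance to a target point), swapped in for the non-convex matrix volume criterion, purely to show what the linear minimization step looks like on each polytope. All function names (`lmo_simplex`, `lmo_box`, `lmo_orthoplex`, `frank_wolfe`) and the step size 2/(t+2) are illustrative assumptions.

```python
# Toy Frank-Wolfe over three polytopes. Illustrative only: the objective is a
# convex stand-in, and every name here is hypothetical, not from the project.

def lmo_simplex(grad):
    """Vertex of the probability simplex minimizing <grad, s>."""
    i = min(range(len(grad)), key=lambda k: grad[k])
    return [1.0 if k == i else 0.0 for k in range(len(grad))]

def lmo_box(grad):
    """Vertex of the box [-1, 1]^n minimizing <grad, s>."""
    return [-1.0 if g > 0 else 1.0 for g in grad]

def lmo_orthoplex(grad):
    """Vertex of the orthoplex (unit L1 ball) minimizing <grad, s>."""
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    s[i] = -1.0 if grad[i] > 0 else 1.0
    return s

def frank_wolfe(target, lmo, x0, iters=200):
    """Minimize 0.5 * ||x - target||^2 over the polytope whose vertices `lmo` returns."""
    x = list(x0)
    for t in range(iters):
        grad = [xi - ti for xi, ti in zip(x, target)]  # gradient of the toy objective
        s = lmo(grad)                                  # best vertex for the linearized problem
        gamma = 2.0 / (t + 2.0)                        # standard diminishing step size
        # Convex combination of feasible points, so the iterate never leaves the polytope.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x
```

Because each update is a convex combination of the current iterate and a vertex, no projection is needed. For example, `frank_wolfe([0.2, 0.3, 0.5], lmo_simplex, [1.0, 0.0, 0.0])` returns a point that is still on the simplex and close to the target.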
Project outputs
Journal papers (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Identifiable Bounded Component Analysis Via Minimum Volume Enclosing Parallelotope
- DOI: 10.1109/icassp49357.2023.10095905
- Publication date: 2023-06
- Journal:
- Impact factor: 0
- Authors: Hu, Jingzhou; Huang, Kejun
- Corresponding author: Huang, Kejun
Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees
- DOI: 10.1109/icassp49357.2023.10096076
- Publication date: 2023-06
- Journal:
- Impact factor: 0
- Authors: Sun, Yuchen; Huang, Kejun
- Corresponding author: Huang, Kejun
Global Identifiability of L1-based Dictionary Learning via Matrix Volume Optimization
- DOI:
- Publication date: 2023-12
- Journal:
- Impact factor: 0
- Authors: Hu, Jingzhou; Huang, Kejun
- Corresponding author: Huang, Kejun
Other publications by Kejun Huang
Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm
- DOI:
- Publication date: 2016-11-15
- Journal:
- Impact factor: 0
- Authors: Kejun Huang; Xiao Fu; N. Sidiropoulos
- Corresponding author: N. Sidiropoulos
Phase retrieval using a conjugate symmetric reference
- DOI:
- Publication date: 2017
- Journal:
- Impact factor: 0
- Authors: Kejun Huang; Yonina C. Eldar
- Corresponding author: Yonina C. Eldar
Multi-functional Pd/MOFs@MOFs Confined Core-Shell Catalysts with Wrinkled Surface for Selective Catalysis.
- DOI: 10.1002/asia.202100922
- Publication date: 2021-09-21
- Journal:
- Impact factor: 0
- Authors: Min‐Jie Chen; Ganggang Chang; Li; Kejun Huang; Chun Pu; Dan Li; Yao Yao; Jiaxin Li
- Corresponding author: Jiaxin Li
Unsupervised Learning of Nonlinear Mixtures: Identifiability and Algorithm
- DOI: 10.1109/ieeeconf44664.2019.9048661
- Publication date: 2019-11-01
- Journal:
- Impact factor: 0
- Authors: Bo Yang; Xiao Fu; N. Sidiropoulos; Kejun Huang
- Corresponding author: Kejun Huang
Vulture: VULnerabilities in impuTing drUg REsistance
- DOI: 10.1145/3584371.3612993
- Publication date: 2023-09-03
- Journal:
- Impact factor: 0
- Authors: Aysegül Bumin; Megan Shah; Kejun Huang; Tamer Kahveci
- Corresponding author: Tamer Kahveci
Similar overseas grants
CAREER: Principled yet practical observability for a microservices-based cloud
- Award number: 2340128
- Fiscal year: 2024
- Funding amount: $540,000
- Project type: Continuing Grant
Principled phylogenomic analysis without gene tree estimation
- Award number: 2308495
- Fiscal year: 2023
- Funding amount: $540,000
- Project type: Standard Grant
Principled Reasoning about Dynamical Systems
- Award number: RGPIN-2020-05031
- Fiscal year: 2022
- Funding amount: $540,000
- Project type: Discovery Grants Program - Individual
CRCNS Research Proposal: Collaborative Research: US-German Collaboration toward a biophysically principled network model of transcranial magnetic stimulation (TMS)
- Award number: 10708986
- Fiscal year: 2022
- Funding amount: $540,000
- Project type:
CRCNS Research Proposal: Collaborative Research: US-German Collaboration toward a biophysically principled network model of transcranial magnetic stimulation (TMS)
- Award number: 10610594
- Fiscal year: 2022
- Funding amount: $540,000
- Project type: