CRII: III: Scaling up Distance Metric Learning for Large-scale Ultrahigh-dimensional Data

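Basic Information

Award number: 1463988
Principal investigator: Tianbao Yang
Amount: $174,600
Institution: University of Iowa
Institution country: United States
Award type: Standard Grant
Fiscal year: 2015
Funding country: United States
Period: 2015-03-01 to 2018-02-28
Status: Completed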

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1463988&HistoricalAwards=false
Keywords: CRII III Scaling up Distance

Abstract

This project will research and develop highly scalable stochastic optimization algorithms for distance metric learning (DML) on large-scale ultrahigh-dimensional (LSUD) data. DML is a fundamental problem in machine learning that aims to learn a distance metric under which intra-class variation is small and inter-class variation is large. When the scale and dimensionality of the data are very large, the computational cost of DML becomes prohibitive. Domains that use machine learning techniques, such as computer vision, natural language processing, and bioinformatics, will be directly impacted by this research. One example application is fine-grained image classification, e.g., categorizing different types of flowers or models of vehicles from pictures; this application will serve as one criterion for evaluating the success of the research. The research will enable data scientists to extract more knowledge from massive high-dimensional data, complementing the White House Big Data Initiative to analyze large and complex data sets. Beyond its research impact, this project will facilitate the development of a new machine learning course at the University of Iowa (UI) and contribute to training future professionals in big data analytics. Broader impact will be further achieved through dissemination of results via publications, open-source software, and other channels.

This project addresses the computational challenges of LSUD-DML by scaling up state-of-the-art stochastic gradient descent (SGD) methods. A key computational bottleneck in applying SGD to DML is projecting the updated solution onto a complicated feasible domain at every iteration. The proposed ideas reduce the total cost of projections by (i) constructing and exploiting a low-rank structured stochastic gradient to reduce the cost of each projection, and (ii) dividing iterations into epochs and performing a projection-efficient SGD in each epoch to reduce the number of projections.
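
To make the projection bottleneck concrete, here is a minimal NumPy sketch of plain projected SGD for a Mahalanobis metric M. It assumes, as is common in DML but not stated in the abstract, that the feasible domain is the cone of positive semidefinite (PSD) matrices and that the loss is a triplet hinge loss. The names `project_psd`, `triplet_grad`, and `projected_sgd` and the `project_every_step` flag are hypothetical, and this is not the project's algorithm: the lazy mode simply projects once per epoch to show how the projection count changes. Each active triplet gradient is a difference of two outer products and so has rank at most 2, which illustrates one kind of low-rank structure a stochastic gradient can have.

```python
import numpy as np


def project_psd(M):
    """Euclidean projection of a square matrix onto the PSD cone.

    Symmetrizes M, then clips negative eigenvalues to zero. The full
    eigendecomposition costs O(d^3), which is what makes a projection
    at every SGD step expensive when d is large.
    """
    S = (M + M.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals = np.clip(eigvals, 0.0, None)
    # eigvecs * eigvals scales column j by eigvals[j]: V diag(lam) V^T below.
    return (eigvecs * eigvals) @ eigvecs.T


def triplet_grad(M, xi, xj, xk, margin=1.0):
    """Subgradient w.r.t. M of max(0, margin + d_M(xi, xj) - d_M(xi, xk)),
    where d_M(x, y) = (x - y)^T M (x - y), xj shares xi's class, xk does not.

    When the hinge is active the result is a difference of two outer
    products, so its rank is at most 2.
    """
    a, b = xi - xj, xi - xk
    if margin + a @ M @ a - b @ M @ b <= 0:
        return np.zeros_like(M)
    return np.outer(a, a) - np.outer(b, b)


def projected_sgd(X, triplets, step=0.01, epochs=5,
                  project_every_step=True, seed=0):
    """Hypothetical projected SGD over (i, j, k) index triplets into X.

    project_every_step=True projects after every update; False projects
    only once at the end of each epoch (a naive illustration, not the
    project's epoch-based method). Returns (M, number_of_projections).
    """
    rng = np.random.default_rng(seed)
    M = np.eye(X.shape[1])
    n_proj = 0
    for _ in range(epochs):
        for t in rng.permutation(len(triplets)):
            i, j, k = triplets[t]
            M = M - step * triplet_grad(M, X[i], X[j], X[k])
            if project_every_step:
                M = project_psd(M)
                n_proj += 1
        if not project_every_step:
            M = project_psd(M)
            n_proj += 1
    return M, n_proj
```

Projecting only at epoch ends cuts the number of O(d^3) eigendecompositions from one per update to one per epoch, but this naive version lets the iterates leave the PSD cone between projections and makes no guarantee about them.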

Investigating data-dependent sampling strategies (i.e., selective sampling, importance sampling, and a combination of both) for LSUD-DML will further scale up the proposed methods. This research will provide experimental evidence on the scalability of the proposed algorithms while revealing insights into the proposed techniques and various analytical trade-offs.

For further information, see the project website at: http://homepage.cs.uiowa.edu/~tyng/dml.html.
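
The abstract does not specify the sampling strategies further. As one illustration, the sketch below assumes importance sampling of (i, j, k) triplets with probability proportional to ||xi - xj||^2 + ||xi - xk||^2, which upper-bounds the Frobenius norm of a rank-2 triplet hinge gradient, and it reweights each draw by 1/(n p_t) so the stochastic gradient stays unbiased. The names `importance_probs` and `sample_triplet` and the choice of score are assumptions for illustration only.

```python
import numpy as np


def importance_probs(X, triplets):
    """Hypothetical importance-sampling distribution over (i, j, k) triplets.

    The score ||xi - xj||^2 + ||xi - xk||^2 upper-bounds the Frobenius
    norm of the triplet's hinge gradient a a^T - b b^T (a = xi - xj,
    b = xi - xk), so triplets that can move M the most are drawn more often.
    """
    scores = np.array([np.sum((X[i] - X[j]) ** 2) + np.sum((X[i] - X[k]) ** 2)
                       for i, j, k in triplets])
    total = scores.sum()
    if total <= 0:
        raise ValueError("all triplet scores are zero")
    return scores / total


def sample_triplet(probs, rng):
    """Draw index t with probability probs[t] and return (t, weight).

    Scaling the sampled gradient by weight = 1 / (n * probs[t]) keeps its
    expectation equal to the uniform average over all n triplets.
    """
    n = len(probs)
    t = int(rng.choice(n, p=probs))
    return t, 1.0 / (n * probs[t])
```

In an SGD loop, one would call `sample_triplet` each iteration and multiply that triplet's gradient by the returned weight. A triplet with score zero has a zero gradient, so never drawing it does not bias the estimate. Selective sampling (for example, skipping triplets whose hinge is inactive) could be layered on top, but the abstract does not say how the project combines the two.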

Project Outcomes

Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0

Other publications by Tianbao Yang

Improved bounds for the Nyström method with application to kernel classification

Published: 2013
Journal: IEEE Transactions on Information Theory
Impact factor: 2.5
Authors: Rong Jin; Tianbao Yang; Mehrdad Mahdavi; Yu-Feng Li; Zhi-Hua Zhou
Corresponding author: Zhi-Hua Zhou

Deep AUC Maximization for Medical Image Classification: Challenges and Opportunities

Published: 2021-11
Journal: ArXiv
Impact factor: 0
Authors: Tianbao Yang
Corresponding author: Tianbao Yang

Evolution of the morphological, structural, and molecular properties of gluten protein in dough with different hydration levels during mixing.

DOI: 10.1016/j.fochx.2022.100448
Published: 2022-10-30
Journal: Food Chemistry: X
Impact factor: 6.1
Authors: Ruobing Jia; Mengli Zhang; Tianbao Yang; Meng Ma; Qingjie Sun; Man Li
Corresponding author: Man Li

UV-Light-Induced Dehydrogenative N-Acylation of Amines with 2-Nitrobenzaldehydes to Give 2-Aminobenzamides

DOI: 10.1055/a-1736-4388
Published: 2022-01
Journal: Synthesis
Impact factor: 0
Authors: Dishu Zeng; Tianbao Yang; Niu Tang; Wei Deng; Jiannan Xiang; Shuang-Feng Yin; Nobuaki Kambe; Renhua Qiu
Corresponding author: Renhua Qiu