Computational Foundations of Machine Learning in the Era of Big Data
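Basic information
- Grant number: RGPIN-2017-05032
- Principal investigator:
- Amount: $20,400
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2018
- Funding country: Canada
- Period: 2018-01-01 to 2019-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract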
Machine learning (ML), a field that develops software that improves itself through learning and experience, has been driven largely by the availability of historical data and by the need for efficient, scalable algorithms and supporting theory. Conversely, the success of ML in science, engineering, and commerce, along with technological innovation, has led to unprecedented growth and enthusiasm in big data collection, thereby redefining computational efficiency and inviting system-level solutions. For example, DeepMind's recent AlphaGo system, which beat top human Go players, needed 1900 CPUs and 280 GPUs to carry out its computation. How can computation be balanced with communication in such a vast distributed cluster without compromising system throughput or correctness? On the other hand, a small startup developing a mobile app may not be able to afford the same computational power as Google, and so often has to resort to primitive solutions. How can we build an algorithmic framework for ML that provides "knobs" to adjust the computational load, with an explicit, controllable loss in accuracy? Meeting such diverse computational needs in the big data era has thus become a grand challenge for the ML field.
We attempt to address these computational challenges in ML and big data through three complementary objectives.
(1) Real problems are hard, but also structured. Over the years, the importance of designing statistical methodologies and computational algorithms that exploit structure in the data and the model has become evident. Encouraged by our previous work on sparsity and low-rankness, we propose to investigate two additional structures that are common in ML applications, monotonicity and multi-modality (in the tensor format), and to develop efficient algorithms that benefit from the presence of such structures.
(2) Data is always noisy and full of random fluctuations, which diminishes the need for exact or even high-precision solutions in ML. Approximate computation, if done properly, can significantly reduce computation time in ML. We initiate a systematic study of the tradeoffs of approximate computation in ML: "downgrading" computationally expensive programs to simpler and cheaper ones, "optimally" smoothing nondifferentiable functions, and attaching measures of nonconvexity to nonconvex functions.
(3) Distributed computation has become the norm in handling big datasets. We propose the Bounded Asynchronous Protocol (BAP) to better balance communication and computation in distributed ML systems, and we continue to investigate the speedups and convergence guarantees of typical iterative ML algorithms under BAP, possibly under less stringent convexity or smoothness assumptions.
Our work will further advance the computational theory and practice of ML, and the resulting algorithms and systems will be fundamental for analyzing big datasets with ML methodologies.
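As a rough illustration of the accuracy "knob" that the abstract describes for smoothing nondifferentiable functions, the sketch below replaces |x| with the standard Huber surrogate. This is a textbook example chosen for illustration, not the method proposed in the project; the names huber, max_gap, and the parameter mu are assumptions introduced here. A larger mu gives a smoother function, and the approximation error |x| - huber(x, mu) never exceeds mu/2.

```python
def huber(x: float, mu: float) -> float:
    """Smooth surrogate for |x|: quadratic on [-mu, mu], linear outside.

    Illustrative only; mu is a hypothetical smoothing parameter (mu > 0).
    """
    if abs(x) <= mu:
        return x * x / (2.0 * mu)
    return abs(x) - mu / 2.0


def max_gap(mu: float, n: int = 2001, lo: float = -3.0, hi: float = 3.0) -> float:
    """Largest value of |x| - huber(x, mu) observed on an evenly spaced grid."""
    step = (hi - lo) / (n - 1)
    points = (lo + i * step for i in range(n))
    return max(abs(x) - huber(x, mu) for x in points)


if __name__ == "__main__":
    # Shrinking mu tightens the approximation at the cost of less smoothness.
    for mu in (1.0, 0.1, 0.01):
        print(f"mu={mu}: observed max gap={max_gap(mu):.6f}, bound mu/2={mu / 2}")
```

The bound mu/2 follows directly from the definition: outside [-mu, mu] the gap is exactly mu/2, and inside it equals |x| - x^2/(2 mu), which is largest at |x| = mu.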
Project outputs
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other publications by Yu, Yaoliang
DEVIATE: A Deep Learning Variance Testing Framework
- DOI: 10.1109/ase51524.2021.9678540
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Pham, Hung Viet; Kim, Mijung; Tan, Lin; Yu, Yaoliang; Nagappan, Nachiappan
- Corresponding author: Nagappan, Nachiappan
Petuum: A New Platform for Distributed Machine Learning on Big Data
- DOI: 10.1145/2783258.2783323
- Published: 2015-01-01
- Journal:
- Impact factor: 0
- Authors: Xing, Eric P.; Ho, Qirong; Yu, Yaoliang
- Corresponding author: Yu, Yaoliang
Other grants held by Yu, Yaoliang
Computational Foundations of Machine Learning in the Era of Big Data
- Grant number: RGPIN-2017-05032
- Fiscal year: 2022
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
A Theoretical Foundation and Practical Platform for Adversarial Machine Learning
- Grant number: 543522-2019
- Fiscal year: 2021
- Funding amount: $20,400
- Program: Collaborative Research and Development Grants
Computational Foundations of Machine Learning in the Era of Big Data
- Grant number: RGPIN-2017-05032
- Fiscal year: 2021
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
Computational Foundations of Machine Learning in the Era of Big Data
- Grant number: RGPIN-2017-05032
- Fiscal year: 2020
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
A Theoretical Foundation and Practical Platform for Adversarial Machine Learning
- Grant number: 543522-2019
- Fiscal year: 2020
- Funding amount: $20,400
- Program: Collaborative Research and Development Grants
Computational Foundations of Machine Learning in the Era of Big Data
- Grant number: RGPIN-2017-05032
- Fiscal year: 2019
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
A Theoretical Foundation and Practical Platform for Adversarial Machine Learning
- Grant number: 543522-2019
- Fiscal year: 2019
- Funding amount: $20,400
- Program: Collaborative Research and Development Grants
Computational Foundations of Machine Learning in the Era of Big Data
- Grant number: RGPIN-2017-05032
- Fiscal year: 2017
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
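Similar grants from the National Natural Science Foundation of China
Fundamental theory and key technologies of high-performance rigid-flexible coupled drive spraying robots for large complex components
- Grant number: 52335002
- Approval year: 2023
- Funding amount: 2,300,000 yuan
- Program: Key Program
Basic research on high-efficiency femtosecond-laser fabrication of autofluorescent perovskite microrobots
- Grant number: 62305321
- Approval year: 2023
- Funding amount: 300,000 yuan
- Program: Young Scientists Fund
Fundamental theory and key technologies for autonomous operation of spinal endoscopic surgical robots for nucleus pulposus removal
- Grant number: 62373054
- Approval year: 2023
- Funding amount: 500,000 yuan
- Program: General Program
Fundamental theory of bionic clustering-based creation of highly stable, universal soft grasping robots
- Grant number: 52375030
- Approval year: 2023
- Funding amount: 500,000 yuan
- Program: General Program
Research on fundamental technologies for autonomous operation of flexible bronchoscopy surgical robots
- Grant number: 62303248
- Approval year: 2023
- Funding amount: 300,000 yuan
- Program: Young Scientists Fund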
Similar overseas grants
Longitudinal neural fingerprinting of opioid-use trajectories
- Grant number: 10805031
- Fiscal year: 2023
- Funding amount: $20,400
- Program:
Maternal mHealth blood hemoglobin analysis with informed deep learning
- Grant number: 10566426
- Fiscal year: 2023
- Funding amount: $20,400
- Program:
Domain adaptation approaches to unify established and emerging sequencing technologies
- Grant number: 10643544
- Fiscal year: 2023
- Funding amount: $20,400
- Program:
Neurodevelopment of executive function, appetite regulation, and obesity in children and adolescents
- Grant number: 10643633
- Fiscal year: 2023
- Funding amount: $20,400
- Program: