Distributed Optimization for Machine Learning on Decentralized Data and Features

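Basic Information

Grant number:
RGPIN-2019-04998
Principal investigator:
Niu, Di
Amount:
$29,900
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Duration:
2021-01-01 to 2022-12-31
Project status:
Completed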

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=738000
Keywords:
Distributed Optimization, Machine Learning, Decentralized

Project Abstract

The need to scale up machine learning amid rapid growth in both data volumes and model complexity has sparked broad interest in developing distributed machine learning systems and parallel optimization algorithms. Most existing studies focus on partitioning computation among a tightly coupled cluster of machines, either in a data-parallel fashion to handle large numbers of training samples, or in a model-parallel fashion to handle large models such as deep neural networks. In contrast, the goal of this research program is to glean insights and build models when the dataset used for machine learning (features, samples, labels, or a combination of them) is inherently decentralized and owned by multiple participants or domains. Our long-term vision is to design reliable distributed algorithms and systems that can build models from decentralized data without requiring participants to share original data with each other or with a central site. By effectively leveraging data from other domains, each participant is expected to improve its predictive power over a model built only on its local data, while the joint model built in a decentralized way is expected to approximate the global model that would be obtained if all data were collected centrally. At the same time, the sharing of model parameters among parties should be minimized to preserve privacy and reduce communication overhead. Toward these objectives, we will introduce generic composite model structures that can jointly extract insights from data in different decentralization scenarios, including decentralization by features, by samples, by labels, or by a combination of them. We will design theoretically motivated distributed optimization algorithms to solve these problems, develop effective communication compression techniques to reduce overhead, and study implementation issues for specific applications.
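To make the feature-decentralized scenario concrete, here is a minimal sketch, assuming a linear model whose weights are split into one block per party and a squared loss; the names `Party`, `joint_prediction`, `train_feature_partitioned`, and `mse` are hypothetical, and this is not the program's actual algorithm. Each party keeps its features private and reveals only a scalar local output per sample; the summed outputs form the composite prediction, and each party updates its own weight block using the shared residual (a blockwise SGD step).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Party:
    """Hypothetical participant that owns a private slice of every sample's features."""
    weights: List[float]

    def local_output(self, x: Sequence[float]) -> float:
        # Only this scalar is revealed; the raw feature slice x stays with the party.
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: Sequence[float], residual: float, lr: float) -> None:
        # Gradient of 0.5 * residual**2 w.r.t. this party's block is residual * x.
        self.weights = [w - lr * residual * xi for w, xi in zip(self.weights, x)]


def joint_prediction(parties: List[Party], slices: Sequence[Sequence[float]]) -> float:
    # Composite model: the prediction is the sum of the parties' local outputs.
    return sum(p.local_output(x) for p, x in zip(parties, slices))


def train_feature_partitioned(samples, labels, dims, lr=0.05, epochs=500):
    """samples[n][k] is party k's feature slice for sample n; dims[k] is its width."""
    parties = [Party([0.0] * d) for d in dims]
    for _ in range(epochs):
        for slices, y in zip(samples, labels):
            residual = joint_prediction(parties, slices) - y
            # Blockwise step: every party updates only its own weights.
            for p, x in zip(parties, slices):
                p.update(x, residual, lr)
    return parties


def mse(parties, samples, labels) -> float:
    errors = [(joint_prediction(parties, s) - y) ** 2 for s, y in zip(samples, labels)]
    return sum(errors) / len(errors)


# Usage: party A holds two features, party B holds one (synthetic, noiseless data).
true_a, true_b = [1.0, -2.0], [0.5]
samples: List[Tuple[List[float], List[float]]] = [
    ([1.0, 0.0], [1.0]),
    ([0.0, 1.0], [2.0]),
    ([1.0, 1.0], [0.0]),
    ([2.0, 1.0], [1.0]),
]
labels = [joint_prediction([Party(true_a), Party(true_b)], s) for s in samples]
parties = train_feature_partitioned(samples, labels, dims=[2, 1])
print("learned blocks:", [p.weights for p in parties], "mse:", mse(parties, samples, labels))
```

In this sketch the only values crossing party boundaries are the per-sample local outputs and the residual; raw features and each party's weight block stay local.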
Specifically, our algorithms will be inspired by recent advances on the convergence of the alternating direction method of multipliers (ADMM), stochastic gradient descent (SGD), and proximal SGD in asynchronous and blockwise settings. Our communication compression techniques will be inspired by the opportunity to suppress model parameter transfers in flat regions of the optimization objective, whereas the existing literature mainly considers significance filters for gradients. Finally, we will apply the proposed model architectures and algorithms to real-world applications in cross-domain recommender systems, multi-task natural language understanding, and collaborative mobile edge computing. For each problem, we will design specific model composition and decomposition structures, as well as distributed algorithms, based on its inherent data decentralization pattern.
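The abstract contrasts suppressing parameter transfers in flat regions of the objective with gradient significance filters, but it does not specify a criterion. The sketch below assumes one simple, hypothetical rule: a worker transmits its parameters only when their relative L2 drift since the last transmitted copy exceeds a threshold. `ParameterSyncFilter` and the 1% threshold are illustrative choices, not the program's method.

```python
import math
from typing import List, Optional


class ParameterSyncFilter:
    """Hypothetical sender-side filter that suppresses parameter transfers when the
    parameters have barely moved since the last transmitted copy."""

    def __init__(self, threshold: float = 0.01) -> None:
        self.threshold = threshold  # relative-drift threshold (an assumed value)
        self.last_sent: Optional[List[float]] = None
        self.sent = 0
        self.skipped = 0

    def maybe_send(self, params: List[float]) -> Optional[List[float]]:
        """Return the parameters to transmit, or None if the transfer is suppressed."""
        if self.last_sent is None:
            return self._send(params)
        # L2 norm of the drift since the last transmission, relative to its size.
        drift = math.sqrt(sum((p - q) ** 2 for p, q in zip(params, self.last_sent)))
        scale = math.sqrt(sum(q * q for q in self.last_sent)) or 1.0
        if drift / scale < self.threshold:
            self.skipped += 1
            return None  # the receiver keeps using its (slightly stale) copy
        return self._send(params)

    def _send(self, params: List[float]) -> List[float]:
        self.last_sent = list(params)
        self.sent += 1
        return self.last_sent


# Usage: gradient descent on f(w) = (w - 3)^2 moves quickly at first and then
# slowly as it approaches the flat region around the minimizer.
w = 0.0
filt = ParameterSyncFilter(threshold=0.01)
for step in range(100):
    w -= 0.1 * 2.0 * (w - 3.0)
    filt.maybe_send([w])
print(f"sent={filt.sent}, skipped={filt.skipped}, receiver copy={filt.last_sent}")
```

Because steps shrink as the iterate enters the flat region near the minimizer, most late transfers are suppressed, while after every call the current parameters differ from the receiver's copy by less than the threshold times the norm of that copy.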

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Niu, Di

FDML: A Collaborative Machine Learning Framework for Distributed Features

DOI:
10.1145/3292500.3330765
Published:
2019-01-01
Journal:
KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
Impact factor:
0
Authors:
Hu, Yaochen; Niu, Di; Zhou, Shengping
Corresponding author:
Zhou, Shengping

Random Network Coding in Peer-to-Peer Networks: From Theory to Practice

DOI:
10.1109/jproc.2010.2091930
Published:
2011-03-01
Journal:
PROCEEDINGS OF THE IEEE
Impact factor:
20.6
Authors:
Li, Baochun; Niu, Di
Corresponding author:
Niu, Di

BLCA prognostic model creation and validation based on immune gene-metabolic gene combination.

DOI:
10.1007/s12672-023-00853-6
Published:
2023-12-16
Journal:
DISCOVER ONCOLOGY
Impact factor:
2.2
Authors:
Yue, Shao-Yu; Niu, Di; Liu, Xian-Hong; Li, Wei-Yi; Ding, Ke; Fang, Hong-Ye; Wu, Xin-Dong; Li, Chun; Guan, Yu; Du, He-Xi
Corresponding author:
Du, He-Xi