ASCENT: Collaborative Research: Scaling Distributed AI Systems based on Universal Optical I/O
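Basic information
- Award number: 2023751
- Principal investigator:
- Amount: $325,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-08-15 to 2023-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract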
Our society is rapidly becoming reliant on neural-network-based artificial intelligence computation. New algorithms are invented daily, increasing the memory and computational requirements for both inference and training. This explosive growth has created an enormous demand for distributed machine learning (ML) training and inference. Estimates by OpenAI illustrate the steady growth of computational requirements of 100x every two years since 2012, which is 50x faster than the rate of computation improvements previously enabled by the Moore's Law scaling of the semiconductor industry that we have enjoyed over the last half-century. This new computation demand has been partly met by the rapid development of hardware accelerators and software stacks to support these specialized computations. Hardware accelerators have provided a significant speed-up, but today's training tasks can still take days and even weeks. The reason is that as the number of workers (e.g., compute nodes) increases, the computation time per worker decreases, but the communication requirements between the nodes increase, creating a bottleneck in the interconnect between the compute nodes. Future distributed ML systems will require 1-2 orders of magnitude higher interconnect bandwidth per node, creating a pressing need for entirely new ways to build interconnects for distributed ML systems. This proposal aims to create a new paradigm for scaling distributed ML computation by developing a scalable interconnect solution based on advances in integrated electronics and photonics technology that enable direct node-to-node optical fiber connectivity. The proposed cross-stack, collaborative, multi-disciplinary work will enable the education and training of a unique crop of engineers and scientists who cross the boundaries of machine learning, networking, and electronic-photonic systems and devices, and who are in high demand.
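The scaling argument in the abstract (per-worker compute time falls while inter-node communication grows) can be illustrated with a toy model. The sketch below is not taken from the award: it assumes perfect compute scaling and a bandwidth-bound ring all-reduce of the gradients, and the function name `step_times` and all numbers are hypothetical.

```python
def step_times(n_workers, single_node_compute_s, grad_bytes, link_bytes_per_s):
    """Return (compute_s, comm_s) for one data-parallel training step.

    Assumptions (not from the abstract): compute scales perfectly with the
    number of workers, and gradients are combined with a bandwidth-bound
    ring all-reduce.
    """
    compute_s = single_node_compute_s / n_workers
    # Ring all-reduce: each worker transfers 2*(n-1)/n of the gradient bytes.
    comm_s = 2 * (n_workers - 1) / n_workers * grad_bytes / link_bytes_per_s
    return compute_s, comm_s


if __name__ == "__main__":
    # Hypothetical workload: 8 s of compute on one node, 1 GB of gradients,
    # 10 GB/s of interconnect bandwidth per node.
    for n in (1, 8, 64):
        compute_s, comm_s = step_times(n, 8.0, 1e9, 10e9)
        share = comm_s / (compute_s + comm_s)
        print(f"{n:3d} workers: compute {compute_s:.3f} s, "
              f"comm {comm_s:.3f} s, comm share {share:.0%}")
```

Under these assumed numbers, communication accounts for roughly 61% of the step at 64 workers; a tenfold faster link would bring that share down to roughly 14%, which is the kind of effect the abstract attributes to higher interconnect bandwidth per node.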
The principal investigators have an established track record of direct engagement with high-school students, providing summer internships at the Berkeley Wireless Research Center and MIT's Women's Technology Program, as well as exemplary undergraduate research activities at Boston University. The educational and outreach activities the PIs have put in place will ensure early exposure and continued training of a new generation of leaders in this field, from K-12 through undergraduate and graduate studies to continuing workforce education, with a special focus on underrepresented students.
The interconnect has emerged as the key bottleneck in enabling the full potential of distributed ML. Future ML workloads are likely to require tens of Tbps of bandwidth per device. Ubiquitous deployment of logically connected, physically distributed computation across shelf, rack, and row scale can only be enabled by a new universal I/O that enables socket-to-socket communication at the energy, latency, and bandwidth density of in-package interconnects. No such technology currently exists. Silicon-photonics-based optical I/O has the potential to address this critical challenge, but fundamental advances, from chip manufacturing to routing algorithms, are still needed to ensure the scalability of these interconnect systems. To enable high bandwidth density and energy efficiency, dense wavelength division multiplexing must be used. High-efficiency ring-resonator-based modulators and comb laser sources are needed to enable Tbps rates over each fiber connection and socket bandwidth scaling from tens to hundreds of Tbps. New link architectures such as the proposed laser-forwarded coherent link are needed to enable high-efficiency external centralized comb laser sources with modest (sub-mW) power per wavelength per fiber port.
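The bandwidth figures in the paragraph above follow from simple multiplication across wavelengths and fibers. The sketch below works through that arithmetic; the channel count, per-wavelength data rate, fiber count, and laser power are illustrative assumptions rather than project specifications, and coding overhead and optical losses are ignored.

```python
def fiber_tbps(wavelengths_per_fiber, gbps_per_wavelength):
    """Aggregate data rate of one DWDM fiber, in Tbps."""
    return wavelengths_per_fiber * gbps_per_wavelength / 1000.0


def socket_tbps(fibers, wavelengths_per_fiber, gbps_per_wavelength):
    """Aggregate data rate of a socket with several fiber ports, in Tbps."""
    return fibers * fiber_tbps(wavelengths_per_fiber, gbps_per_wavelength)


def comb_laser_mw(fibers, wavelengths_per_fiber, mw_per_wavelength):
    """Total optical power the comb source delivers to all fiber ports, in mW."""
    return fibers * wavelengths_per_fiber * mw_per_wavelength


if __name__ == "__main__":
    # Hypothetical link: 32 wavelengths per fiber at 32 Gbps each,
    # 16 fiber ports per socket, 0.5 mW per wavelength per port.
    print(f"per fiber:  {fiber_tbps(32, 32):.3f} Tbps")
    print(f"per socket: {socket_tbps(16, 32, 32):.3f} Tbps")
    print(f"comb power: {comb_laser_mw(16, 32, 0.5):.0f} mW")
```

With these assumed values a single fiber carries about 1 Tbps and the socket about 16 Tbps; reaching hundreds of Tbps would require more fibers, more wavelengths, or higher per-wavelength rates.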
The proposed work will also develop new scheduling algorithms, network architectures, and workload parallelism strategies to leverage the bandwidth density and low latency of the universal optical I/O, in order to map large AI workloads with massive datasets to a scalable distributed compute system.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal papers (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Photonic crystal modulator in a CMOS foundry platform
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Al Qubaisi, Kenaish; Onural, Deniz; Gevorgyan, Hayk; Popović, Miloš A.
- Corresponding author: Popović, Miloš A.
Reflectionless standing-wave operation in microring resonators
- DOI: 10.1364/ofc.2022.m3e.3
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Al Qubaisi, Kenaish; Gluhović, Đorđe; Onural, Deniz; Popović, Miloš A.
- Corresponding author: Popović, Miloš A.
Microring Modulators in a New Silicon Photonics-Optimized 45 nm Monolithic Electronics-Photonics SOI CMOS Platform
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Al Qubaisi, Kenaish; Khilo, Anatol; Gevorgyan, Hayk; Popović, Miloš A.
- Corresponding author: Popović, Miloš A.
Photonic resonators with microring-like behavior based on standing wave cavity pairs with opposite-symmetry modes
- DOI: 10.1364/fio.2020.ftu8e.2
- Published: 2020-09
- Journal:
- Impact factor: 0
- Authors: K. A. Qubaisi; M. Popović
- Corresponding author: K. A. Qubaisi; M. Popović
Miniature, highly sensitive MOSCAP ring modulators in co-optimized electronic-photonic CMOS
- DOI: 10.1364/prj.438047
- Published: 2022-01-01
- Journal:
- Impact factor: 7.6
- Authors: Gevorgyan, Hayk; Khilo, Anatol; Popović, Miloš A.
- Corresponding author: Popović, Miloš A.
Other publications by Milos Popovic
The landscape of high-affinity human antibodies against intratumoral antigens
- DOI: 10.1101/2021.02.06.430058
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: G. Rakocevic; I. Glotova; I. de Santiago; B. Ç. Toptas; Milena Popovic; Milos Popovic; D. Leone; A. Stachyra; R. Rozenfeld; Deniz Kural; D. Biasci
- Corresponding author: D. Biasci
Keeping Friends Close, and Their Oil Closer: Rethinking the Role of the Shanghai Cooperation Organization in China's Strive for Energy Security in Kazakhstan
- DOI:
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: Milos Popovic
- Corresponding author: Milos Popovic
Fragile Proxies: Explaining Rebel Defection Against Their State Sponsors
- DOI:
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Milos Popovic
- Corresponding author: Milos Popovic
Managing Internationalized Civil Wars
- DOI: 10.1093/acrefore/9780190228637.013.573
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Erin K. Jenne; Milos Popovic
- Corresponding author: Milos Popovic
Inter-Rebel Alliances in the Shadow of Foreign Sponsors
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Milos Popovic
- Corresponding author: Milos Popovic
Other grants by Milos Popovic
Collaborative Research: FuSe: Collaborative Optically Disaggregated Arrays of Extreme-MIMO Radio Units (CODAeMIMO)
- Award number: 2328946
- Fiscal year: 2023
- Amount: $325,000
- Award type: Continuing Grant
RAISE-EQuIP: Single-Chip, Wall-Plug Photon Pair Source and CMOS Quantum Systems on Chip
- Award number: 1842692
- Fiscal year: 2018
- Amount: $325,000
- Award type: Standard Grant
OP: Collaborative Research: Coherent Integrated Si-Photonic Links
- Award number: 1611086
- Fiscal year: 2016
- Amount: $325,000
- Award type: Standard Grant
OP: Collaborative Research: Coherent Integrated Si-Photonic Links
- Award number: 1701596
- Fiscal year: 2016
- Amount: $325,000
- Award type: Standard Grant
Molding Optical Field Patterns for Highly Efficient Design of Strong-Confinement Photonic Devices
- Award number: 1128709
- Fiscal year: 2011
- Amount: $325,000
- Award type: Standard Grant
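Similar NSFC (National Natural Science Foundation of China) grants
Dynamic coupling of inter-organizational collaboration in engineering projects based on the heterogeneity of the transacting parties
- Award number: 72301024
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Reliability of cooperative NOMA systems for 5G ultra-high-definition mobile video transmission
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Key technologies for guaranteeing the timeliness of information dissemination in cooperative-perception vehicular networks
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Reliability mechanisms and optimization methods for data- and physics-driven shop-floor manufacturing service collaboration
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Institutional innovation for value co-creation in telemedicine collaboration networks promoted by strategic purchasing with medical insurance funds
- Award number:
- Approval year: 2022
- Amount: CNY 450,000
- Award type: General Program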
Similar overseas grants
Collaborative Research: How faithfully are melt embayments wedded to magma ascent?
- Award number: 2221896
- Fiscal year: 2022
- Amount: $325,000
- Award type: Standard Grant
Collaborative Research: SWIFT: Context-aware Spectrum Coexistence dEsign aNd implemenTation in satellite bands (ASCENT)
- Award number: 2245910
- Fiscal year: 2022
- Amount: $325,000
- Award type: Standard Grant
Achieving Equity through SocioCulturally-informed, Digitally-Enabled Cancer Pain managemeNT (ASCENT) Clinical Trial
- Award number: 10539159
- Fiscal year: 2022
- Amount: $325,000
- Award type:
Collaborative Research: SWIFT: Context-aware Spectrum Coexistence dEsign aNd implemenTation in satellite bands (ASCENT)
- Award number: 2128540
- Fiscal year: 2021
- Amount: $325,000
- Award type: Standard Grant
Collaborative Research: SWIFT: Context-aware Spectrum Coexistence dEsign aNd implemenTation in satellite bands (ASCENT)
- Award number: 2128584
- Fiscal year: 2021
- Amount: $325,000
- Award type: Standard Grant