AF: Small: Data Stream Algorithms with Application to Linear Algebra
AF:小:数据流算法及其在线性代数中的应用
基本信息
- 批准号:1815840
- 负责人:
- 金额:$ 43.4万
- 依托单位:
- 依托单位国家:美国
- 项目类别:Standard Grant
- 财政年份:2018
- 资助国家:美国
- 起止时间:2018-06-01 至 2023-05-31
- 项目状态:已结题
- 来源:
- 关键词:
项目摘要
Many important problems in machine learning, scientific computing, and statistics benefit from fast procedures for solving numerical linear algebra problems. At the same time, many large-scale datasets, such as internet search logs, network traffic, and sensor network data, have been studied in data-stream literature. Surprisingly, a number of fast procedures used in numerical linear algebra have been made possible by exploiting tools that were originally developed in the data-stream literature. These developments are based on a technique called sketching, which is a tool for quickly compressing a problem to a smaller version of itself, for which one can then afford to run a much slower procedure on the smaller problem. A major goal of this project is to study foundational problems in the data-stream domain, and develop their connections to problems in numerical linear algebra. The algorithms developed here will be accessible to graduate and undergraduate students from computer science, machine learning, and mathematics, and the investigator plans to integrate the results of the project into a graduate course on algorithms for big data, as well as undergraduate algorithms courses. The investigator is actively working with underrepresented minority and undergraduate researchers on topics directly related to this project.The first main thrust of this project is to develop new techniques for fundamental problems in data streams where the goal is to use minimal amount of memory while running algorithms over one pass on a data stream. Such problems include statistical problems - such as estimating the variance, moments, most frequent items, heavy hitters - for many of which optimal memory bounds are still unknown. Another challenge in this domain is that of processing massive streams for real-world graphs, such as geometric intersection graphs, which arise in cellular networks and scheduling theory. By studying a breadth of problems in different corners of data streams, the investigator plans to develop new techniques and build unexpected applications. The second main thrust of the project is to develop new algorithms and hardness results for fundamental problems in numerical linear algebra, connecting such problems to the study of data streams. CountSketch, which is a low-memory heavy hitters algorithm, paved the way for obtaining time-optimal algorithms for regression. This project will continue to push forward such connections, for example, by studying CountSketch in the context of tensors, which is a new application domain. The project will also investigate many deterministic (instead of randomized) data stream algorithms, and understanding their role in linear algebra. A major goal is to understand the limitations of speedups to linear algebra problems. One particular problem of interest here is to obtain low rank approximation with spectral norms.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
机器学习、科学计算和统计学中的许多重要问题都受益于解决数值线性代数问题的快速过程。与此同时,数据流文献中也研究了许多大规模数据集,例如互联网搜索日志、网络流量和传感器网络数据。令人惊讶的是,通过利用最初在数据流文献中开发的工具,数值线性代数中使用的许多快速过程已经成为可能。这些发展基于一种称为草图的技术,该技术是一种将问题快速压缩为其较小版本的工具,这样人们就可以在较小的问题上运行速度慢得多的程序。该项目的主要目标是研究数据流领域的基础问题,并发展它们与数值线性代数问题的联系。这里开发的算法可供计算机科学、机器学习和数学专业的研究生和本科生使用,研究人员计划将该项目的成果整合到大数据算法研究生课程以及本科生算法课程中。研究人员正在积极与代表性不足的少数族裔和本科生研究人员就与该项目直接相关的主题进行合作。该项目的第一个主要目标是开发解决数据流中基本问题的新技术,其目标是在运行算法时使用最少量的内存数据流上的一次传递。这些问题包括统计问题——例如估计方差、矩、最常见的项目、重磅问题——其中许多问题的最佳内存范围仍然未知。该领域的另一个挑战是处理现实世界图的大量流,例如蜂窝网络和调度理论中出现的几何交叉图。通过研究数据流不同角落的广泛问题,研究人员计划开发新技术并构建意想不到的应用程序。该项目的第二个主旨是为数值线性代数的基本问题开发新的算法和硬度结果,将这些问题与数据流的研究联系起来。 CountSketch 是一种低内存重量级算法,为获得时间最优回归算法铺平了道路。该项目将继续推动这种联系,例如,通过在张量的背景下研究 CountSketch,这是一个新的应用领域。该项目还将研究许多确定性(而不是随机)数据流算法,并了解它们在线性代数中的作用。主要目标是了解线性代数问题加速的局限性。这里令人感兴趣的一个特别问题是获得光谱范数的低阶近似。该奖项反映了 NSF 的法定使命,并且通过使用基金会的智力价值和更广泛的影响审查标准进行评估,被认为值得支持。
项目成果
期刊论文数量(71)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)
WOR and p's: Sketches for ℓp-Sampling Without Replacement
WOR 和 p 的:无需替换的 p 采样草图
- DOI:
- 发表时间:2020-01
- 期刊:
- 影响因子:0
- 作者:Cohen, Edith;Pagh, Rasmus;Woodruff, David P.
- 通讯作者:Woodruff, David P.
LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs
LSF-Join:分布式全对的局部敏感过滤
- DOI:
- 发表时间:2020-01
- 期刊:
- 影响因子:0
- 作者:Rashtchian, Cyrus;Sharma, Aneesh;Woodruff, David P.
- 通讯作者:Woodruff, David P.
A Framework for Adversarially Robust Streaming Algorithms
对抗性鲁棒流算法的框架
- DOI:10.1145/3471485.3471488
- 发表时间:2021-06
- 期刊:
- 影响因子:0
- 作者:Ben;Jayaram, Rajesh;Woodruff, David P.;Yogev, Eylon
- 通讯作者:Yogev, Eylon
Separations for Estimating Large Frequency Moments on Data Streams
估计数据流上大频率矩的分离
- DOI:
- 发表时间:2021-01
- 期刊:
- 影响因子:0
- 作者:Woodruff, David P.;Zhou, Samson
- 通讯作者:Zhou, Samson
A Fast, Provably Accurate Approximation Algorithm for Sparse Principal Component Analysis Reveals Human Genetic Variation Across the World
稀疏主成分分析的快速、可证明准确的近似算法揭示了世界各地的人类遗传变异
- DOI:10.1007/978-3-031-04749-7_6
- 发表时间:2022-01
- 期刊:
- 影响因子:0
- 作者:Chowdhury, Agniva;Bose, Aritra;Zhou, Samson;Woodruff, David P.;Drineas, Petros
- 通讯作者:Drineas, Petros
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
数据更新时间:{{ journalArticles.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ monograph.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ sciAawards.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ conferencePapers.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ patent.updateTime }}
David Woodruff其他文献
Visions in Theoretical Computer Science: A Report on the TCS Visioning Workshop 2020
理论计算机科学的愿景:2020 年 TCS 愿景研讨会报告
- DOI:
- 发表时间:
2021 - 期刊:
- 影响因子:0
- 作者:
Shuchi Chawla;Jelani Nelson;C. Umans;David Woodruff - 通讯作者:
David Woodruff
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Grass:使用结构化稀疏梯度计算高效的低内存 LLM 训练
- DOI:
- 发表时间:
2024 - 期刊:
- 影响因子:0
- 作者:
Aashiq Muhamed;Oscar Li;David Woodruff;Mona Diab;Virginia Smith - 通讯作者:
Virginia Smith
Linear Models for Item Scores: Reliability, Covariance Structure, and Psychometric Inference
项目分数的线性模型:可靠性、协方差结构和心理测量推理
- DOI:
- 发表时间:
1993 - 期刊:
- 影响因子:0
- 作者:
David Woodruff - 通讯作者:
David Woodruff
A Primate Genome Project Deserves High Priority
灵长类动物基因组计划值得高度重视
- DOI:
- 发表时间:
2000 - 期刊:
- 影响因子:56.9
- 作者:
E. McConkey;A. Varki;J. Allman;K. Benirschke;F. Crick;T. Deacon;F. D. de Waal;A. Dugaiczyk;P. Gagneux;M. Goodman;L. Grossman;D. Gumucio;T. Insel;K. Kidd;M. King;K. Krauter;R. Kucherlapati;A. Motulsky;D. Nelson;P. Oefner;George E. Palade;George E. Palade;O. Ryder;C. Stewart;J. Sikela;A. Stone;David Woodruff - 通讯作者:
David Woodruff
SIAM Conference on Optimization
SIAM 优化会议
- DOI:
10.48550/arxiv.2206.08366 - 发表时间:
2019-08-01 - 期刊:
- 影响因子:0
- 作者:
Willliam Cook;Simge Küçükyavuz;Richard Barr;D. Hochbaum;Jill Hardin;David Woodruff;Yongpei Guan;M. Anjos;M. Todd;Polytech. Montreal;A. Ben;Andrew Conn;Todd Munson;Daniel Ralph - 通讯作者:
Daniel Ralph
David Woodruff的其他文献
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
{{ truncateString('David Woodruff', 18)}}的其他基金
Collaborative Research: AF: Small: Exploring the Frontiers of Adversarial Robustness
合作研究:AF:小型:探索对抗鲁棒性的前沿
- 批准号:
2335412 - 财政年份:2024
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
Travel Support for 16th Tri-Annual International Conference on Stochastic Programming (ICSP); Davis, California; 24-28 July 2023
第 16 届三年一度的随机规划国际会议 (ICSP) 的差旅支持;
- 批准号:
2309931 - 财政年份:2023
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
Surface, subsurface and buried interface structure at the atomic scale; pushing the limits of medium energy ion scattering
原子尺度的表面、次表面和埋入界面结构;
- 批准号:
EP/E021786/1 - 财政年份:2006
- 资助金额:
$ 43.4万 - 项目类别:
Research Grant
New insights into complex molecule-surface interactions through local structure determination
通过局部结构测定对复杂分子-表面相互作用的新见解
- 批准号:
EP/D034329/1 - 财政年份:2006
- 资助金额:
$ 43.4万 - 项目类别:
Research Grant
Molecular Evolution and Systematics of Marmosets (Primates: Callithrix)
狨猴的分子进化和系统学(灵长类动物:Callithrix)
- 批准号:
9511194 - 财政年份:1995
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
CRB: Population Viability and Biodiversity Following Rainforest Fragmentation
CRB:雨林破碎化后的人口生存能力和生物多样性
- 批准号:
9300182 - 财政年份:1993
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
CRB: Population Viability of Tropical Forest Vertebrates
CRB:热带森林脊椎动物的种群活力
- 批准号:
9000486 - 财政年份:1990
- 资助金额:
$ 43.4万 - 项目类别:
Continuing Grant
DNA Sequences and Fingerprints from Chimpanzee Hair: A New Approach to Establishing Genetic and Evolutionary Relationships
黑猩猩毛发的 DNA 序列和指纹:建立遗传和进化关系的新方法
- 批准号:
9011896 - 财政年份:1990
- 资助金额:
$ 43.4万 - 项目类别:
Continuing Grant
Genetic Variation and Systematics of Cerion and Biomphalaria(Mollusca: Gastropoda)
Cerion 和 Bimphalaria(软体动物:腹足纲)的遗传变异和系统学
- 批准号:
8500733 - 财政年份:1985
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
Genetics of Host-Parasite Compatibility: Snail Resistance to a Trematode
宿主-寄生虫相容性的遗传学:蜗牛对吸虫的抗性
- 批准号:
8311210 - 财政年份:1984
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
相似国自然基金
基于复杂抽样和时空效应下卫生服务调查数据的小域估计方法研究
- 批准号:82304238
- 批准年份:2023
- 资助金额:30 万元
- 项目类别:青年科学基金项目
融合多源异构数据的小微企业经营风险智能识别与应对策略研究
- 批准号:72301188
- 批准年份:2023
- 资助金额:30 万元
- 项目类别:青年科学基金项目
复杂场景下模型—数据联合驱动的红外小目标检测研究
- 批准号:62303165
- 批准年份:2023
- 资助金额:30 万元
- 项目类别:青年科学基金项目
员工算法规避行为的内涵结构、量表开发及多层次影响机制:基于大(小)数据研究方法整合视角
- 批准号:72372021
- 批准年份:2023
- 资助金额:40 万元
- 项目类别:面上项目
小程序中用户隐私数据的违规泄露行为检测方法
- 批准号:
- 批准年份:2022
- 资助金额:54 万元
- 项目类别:面上项目
相似海外基金
AF:Small: Fundamental Geometric Data Structures
AF:Small:基本几何数据结构
- 批准号:
2203278 - 财政年份:2022
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
Collaborative Research: AF: Small: RUI: Data Science from Economic Foundations
合作研究:AF:小型:RUI:来自经济基础的数据科学
- 批准号:
2218813 - 财政年份:2022
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
Collaborative Research: AF: Small: RUI: Data Science from Economic Foundations
合作研究:AF:小型:RUI:来自经济基础的数据科学
- 批准号:
2218814 - 财政年份:2022
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
AF: Small: New Directions for Parallel Data Structures
AF:小:并行数据结构的新方向
- 批准号:
2103483 - 财政年份:2021
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant
AF: Small: New Directions for Parallel Data Structures
AF:小:并行数据结构的新方向
- 批准号:
2103483 - 财政年份:2021
- 资助金额:
$ 43.4万 - 项目类别:
Standard Grant