AF:Small:Extreme Streaming Problems
AF:小:极端流媒体问题
基本信息
- 批准号:1718432
- 负责人:
- 金额:$ 49.91万
- 依托单位:
- 依托单位国家:美国
- 项目类别:Standard Grant
- 财政年份:2017
- 资助国家:美国
- 起止时间:2017-07-15 至 2020-06-30
- 项目状态:已结题
- 来源:
- 关键词:
项目摘要
The need for modern streaming systems to collect and analyze human activities is enormous, in businesses, government, academia, and society. Businesses can improve their operations and production; governments can improve the participation and satisfaction of citizens; society at large can be more sustainable or safe; etc. However, systems that collect and analyze such streams have enormous challenges of scale. At the highest level, this proposal combines a research, education and outreach plan to address these challenges. This research focusses on developing new algorithmic methods and theoretical understanding of modern streaming problems. There is an extensive theory of streaming algorithms for single streams, or to a limited extent to distributed streams, for one (or few) high cardinality dimensional data and simple frequency based analyses. But there are large gaps in creating streaming algorithms for "extreme" needs in modern data streaming applications, where the dimension of data that is collected is large, multiple streams of collected in distributed vantage points, there is a need to find anomalies in high dimensional spaces, and analyses that are needed are sophisticated including machine learning and other real time decision making tasks. This research develops the algorithmic theory of these extreme streaming problems. In particular, the project develops the algorithmic foundations for using large scale distributed streaming systems and tradeoff quality, certainty, CPU, memory and communication needed to do extreme, streaming, sophisticated analyses. Since modern streaming systems power businesses and deal with behavioral data on users, this work has broad societal impact. The project significantly improves the state of the art in algorithms for modern streaming systems. By providing new, rich algorithmic approaches, the project inspires practitioners in academia and industry to conceive more impactful applications, which are infeasible given the current algorithmic tools.The research program both enables and benefits from an education and outreach program that enhances curriculum, fosters training women and underrepresented minorities. To enable technology transfer the project involves practitioners in streaming systems, for field-testing the methods whenever possible. All publishable results are disseminated in respected academic journals, conferences, and workshops. All code and data sets developed in this work are made openly available to the community via the MassDAL site that already has code that is used by the community for classical streaming problems.Classical streaming algorithms use space sublinear, typically polylogarithmic in the input parameters. When extended to distributed systems, often the focus is on sublinear communication. The research program, here, builds on this algorithmic theory of past 15 years and addresses the modern, extreme needs of streaming applications. The project extends the theory to: many dimensions with large attributes, using far fewer resources in memory, computing and communication; emerging, pipeline models of streaming; more sophisticated analyses from local privacy to deep learning type vector embeddings; etc. The research program addresses fundamental problems. In more detail:(a) Extant streaming algorithms work for one or few dimensions of data of high cardinality. Modern streaming systems collect logs of human activity and have 100s of dimensions, 10s or more of them have very high cardinality. Can one identify the key problems for these domains and develop an algorithmic theory? The PI has identified an effective approach based on graphical modeling of the relationship between the dimensions, and believes this nugget can yield an effective theory.(b) Extant streaming algorithms use polylogarithmic words of memory per analysis when they are considered successful (and lower bounds point to problems for which sublinear space is not sufficient). Modern streaming systems run several orders of magnitude of such analyses, for example, one analysis for each of the millions of users. The project has identified an approach of "frugal" streaming, where algorithms use a small constant number of words, and develops a theory of frugal streaming algorithms, and their limitations.(c) Modern data stream systems allow pipelining, with the stream (modifiable or not) passing through stages, either at individual sites, or across the sites. The project abstracts and develops algorithmic theory of streaming problems with pipelined streaming systems. Preliminary results indicate that this allows algorithms that use time sublinear in the sublinear space used by the solutions, and there is a rich class of path problems that can be solved in these models which are impossible in classical streaming.(d) Modern systems need sophisticated streaming analyses. For example, streaming systems that collect usage data from users need private methods, and combining local differential privacy with streaming methods is an exciting direction; as another example, systems that analyze usage data might rely on embedding data into vectors with semantics, like word2vec and related deep learning methods. These methods need to work with polylogarithmic space for streaming. As another example, rich classes of graph problems are solvable in property testing framework with sublinear samples, can such classes be solved in streaming models too? The project highlights specific research challenges involved in developing streaming algorithms, and develops algorithms with provable performance guarantees on the tradeoff of resources used, approximation ratio, and probability of success. The project empirically evaluates them when possible.One cannot take known statements of problems and hope to solve them using streaming algorithms. One needs to modify the problems a bit to be amenable to streaming. In classical streaming, "heavy hitters" and "few terms" properties helped achieve that. In similar vein, the project identifies certain natural phenomena which helps formulate versions of problems for which extreme streaming solutions can be developed. Contours of this are already seen in using graphical models that limit interactions between dimensions to circumvent high dimensional high cardinality cases, or reusing counters in frugal streaming or sampling the sketch data structure in privacy problems and pipelined streaming, or using the randomness in stream order. This may eventually lead to algorithmic and empirical insights. Overall vision of the project is to provide a principled perspective for design and analysis of streaming algorithms with extreme needs.
企业、政府、学术界和社会对现代流媒体系统收集和分析人类活动的需求是巨大的。企业可以改善其运营和生产;政府可以提高公民的参与度和满意度;整个社会可以更加可持续或更安全;然而,收集和分析此类流的系统面临着巨大的规模挑战。在最高层面上,该提案结合了研究、教育和外展计划来应对这些挑战。这项研究的重点是开发新的算法方法和对现代流媒体问题的理论理解。 对于单个流或在有限程度上分布式流、用于一个(或几个)高基数维度数据和基于简单频率的分析,存在广泛的流算法理论。但是,在现代数据流应用程序中,为“极端”需求创建流算法还存在很大差距,其中收集的数据维度很大,在分布式有利位置收集多个流,需要在高维中查找异常情况所需的空间和分析非常复杂,包括机器学习和其他实时决策任务。 这项研究发展了这些极端流问题的算法理论。特别是,该项目开发了使用大规模分布式流媒体系统的算法基础,并权衡进行极端、流媒体、复杂分析所需的质量、确定性、CPU、内存和通信。由于现代流媒体系统为企业提供动力并处理用户的行为数据,因此这项工作具有广泛的社会影响。该项目显着提高了现代流媒体系统算法的技术水平。通过提供新的、丰富的算法方法,该项目激励学术界和工业界的从业者构想更具影响力的应用程序,而考虑到当前的算法工具,这是不可行的。该研究项目既支持教育和推广计划,也从教育和推广计划中受益,该计划增强了课程,促进了培训妇女和代表性不足的少数群体。为了实现技术转让,该项目涉及流媒体系统的从业者,尽可能对方法进行现场测试。所有可发表的成果都会在受人尊敬的学术期刊、会议和研讨会上传播。这项工作中开发的所有代码和数据集都通过 MassDAL 站点向社区公开提供,该站点已经拥有社区用于解决经典流问题的代码。经典流算法使用空间次线性,通常在输入参数中使用多对数。当扩展到分布式系统时,焦点通常是次线性通信。这里的研究项目建立在过去 15 年的算法理论的基础上,解决了流媒体应用程序的现代极端需求。该项目将该理论扩展到:具有大属性的多维度,使用更少的内存、计算和通信资源;新兴的流媒体管道模型;从本地隐私到深度学习类型向量嵌入的更复杂的分析;等等。该研究计划解决基本问题。更详细地说:(a)现有的流算法适用于高基数数据的一维或几维。现代流媒体系统收集人类活动日志并具有数百个维度,其中十个或更多维度具有非常高的基数。人们能否识别这些领域的关键问题并开发一种算法理论? PI 已经确定了一种基于维度之间关系的图形建模的有效方法,并相信这一点可以产生有效的理论。(b) 现有的流算法在每次分析被认为成功时使用多对数内存字(以及下限)指出次线性空间不足以解决的问题)。现代流媒体系统运行几个数量级的此类分析,例如,对数百万用户中的每一位进行一次分析。该项目确定了一种“节俭”流式传输方法,其中算法使用少量恒定数量的单词,并开发了一种节俭流式传输算法理论及其局限性。(c) 现代数据流系统允许流水线操作,流(可修改)或不)通过各个站点或跨站点的各个阶段。该项目抽象并开发了管道流系统的流问题的算法理论。初步结果表明,这允许算法在解决方案所使用的次线性空间中使用时间次线性,并且可以在这些模型中解决丰富的路径问题,而这在经典流中是不可能的。(d) 现代系统需要复杂的流式分析。例如,收集用户使用数据的流式系统需要私有方法,将本地差分隐私与流式方法相结合是一个令人兴奋的方向;另一个例子,分析使用数据的系统可能依赖于将数据嵌入到具有语义的向量中,例如 word2vec 和相关的深度学习方法。这些方法需要使用多对数空间来进行流处理。另一个例子,丰富的图问题类别可以在具有次线性样本的属性测试框架中解决,这些类别也可以在流模型中解决吗?该项目强调了开发流算法所涉及的具体研究挑战,并开发了在所用资源、近似率和成功概率之间进行权衡时具有可证明性能保证的算法。该项目在可能的情况下对它们进行实证评估。人们不能接受已知的问题陈述并希望使用流算法来解决它们。人们需要稍微修改一下问题才能适应流媒体。在经典流媒体中,“重量级人物”和“很少术语”属性有助于实现这一目标。同样,该项目识别了某些自然现象,这有助于制定可以开发极端流解决方案的问题版本。这种轮廓已经在使用限制维度之间交互的图形模型来规避高维度高基数情况,或者在节俭流中重用计数器,或者在隐私问题和管道流中采样草图数据结构,或者使用流顺序中的随机性中已经看到了。这最终可能会带来算法和经验的见解。该项目的总体愿景是为具有极端需求的流算法的设计和分析提供原则性的视角。
项目成果
期刊论文数量(0)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)
数据更新时间:{{ journalArticles.updateTime }}
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
数据更新时间:{{ journalArticles.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ monograph.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ sciAawards.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ conferencePapers.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ patent.updateTime }}
Shanmugavelayu Muthukrishnan其他文献
Shanmugavelayu Muthukrishnan的其他文献
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
{{ truncateString('Shanmugavelayu Muthukrishnan', 18)}}的其他基金
AitF: FULL: Collaborative Research: Compact Data Structures for Traffic Measurement in Software-Defined Networks
AitF:完整:协作研究:软件定义网络中流量测量的紧凑数据结构
- 批准号:
1535878 - 财政年份:2015
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
BIGDATA: F: DKA: Collaborative Research: Dealing Efficiently with Big Social Network Data
BIGDATA:F:DKA:协作研究:有效处理社交网络大数据
- 批准号:
1447793 - 财政年份:2014
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
AF: Medium: Collaborative Research: Sparse Approximation: Theory and Extensions
AF:媒介:协作研究:稀疏逼近:理论与扩展
- 批准号:
1161151 - 财政年份:2012
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
Workshop on Foundations of Algorithms in the Field
现场算法基础研讨会
- 批准号:
1131447 - 财政年份:2011
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
ICES: Small: Auctions and Optimizations in Ad Exchanges
ICES:小型:广告交易中的拍卖和优化
- 批准号:
1101677 - 财政年份:2011
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
Approximate Distributed Stream Tracking: Enabling the Next Generation of Data-Streaming Applications
近似分布式流跟踪:支持下一代数据流应用程序
- 批准号:
0414852 - 财政年份:2005
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
Collaborative Research: Algorithms for sparse data representations
协作研究:稀疏数据表示算法
- 批准号:
0354690 - 财政年份:2004
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
ITR: Sublinear Algorithms for Massive Data Sets
ITR:海量数据集的次线性算法
- 批准号:
0220280 - 财政年份:2002
- 资助金额:
$ 49.91万 - 项目类别:
Continuing Grant
相似国自然基金
极端降雨下黄土小流域坡改梯和淤地坝对泥沙连通性的影响机理
- 批准号:42377331
- 批准年份:2023
- 资助金额:49 万元
- 项目类别:面上项目
红壤丘陵区旱地和水田对农业小流域N负荷贡献的水文机制研究
- 批准号:41771263
- 批准年份:2017
- 资助金额:63.0 万元
- 项目类别:面上项目
极端环境下红树植物小RNA表观遗传调控转座子的进化基因组学研究
- 批准号:31770246
- 批准年份:2017
- 资助金额:60.0 万元
- 项目类别:面上项目
小尺寸系统自组织临界态的特性研究:准周期行为与极端事件预测
- 批准号:11675096
- 批准年份:2016
- 资助金额:50.0 万元
- 项目类别:面上项目
极端嗜热古菌小分子核酸结合蛋白的研究
- 批准号:31130003
- 批准年份:2011
- 资助金额:300.0 万元
- 项目类别:重点项目
相似海外基金
Beat Extreme: An Interactive, Tailored Text Messaging Program Combining Extreme Weather Alerts with Hyper-localized Resources & Actionable Insights for Addressing Climate Change
Beat Extreme:一款将极端天气警报与超本地化资源相结合的交互式定制短信程序
- 批准号:
10698887 - 财政年份:2023
- 资助金额:
$ 49.91万 - 项目类别:
Collaborative Research: SHF:SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity
合作研究:SHF:SMALL:处理极端异构性的编译-并行化-调度-重定向-重复 (EASER) 范式
- 批准号:
2333895 - 财政年份:2023
- 资助金额:
$ 49.91万 - 项目类别:
Standard Grant
Understanding interactions between minerals and small biopolymers under extreme conditions
了解极端条件下矿物质和小型生物聚合物之间的相互作用
- 批准号:
2870997 - 财政年份:2023
- 资助金额:
$ 49.91万 - 项目类别:
Studentship
Novel Small Molecule Drug Candidate for the Prevention of Bronchopulmonary Dysplasia
预防支气管肺发育不良的新型小分子候选药物
- 批准号:
10698418 - 财政年份:2023
- 资助金额:
$ 49.91万 - 项目类别: