Multilevel Graph-Based Methods for Efficient Data Exploration

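Basic Information

Award number:
2011324
Principal investigator:
Yousef Saad
Amount:
$244,200
Host institution:
University of Minnesota-Twin Cities
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Period:
2020-08-01 to 2023-12-31
Status:
Completed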

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2011324&HistoricalAwards=false
Keywords:
Multilevel Graph Based Methods Efficient

Project Abstract

Graph theory helps scientists and engineers model various types of relations between entities in a set, for example members of a social network or molecules in a chemical compound. Not surprisingly, with the advent of data-based methodologies that work by unraveling and exploiting relations between data items, graph theory tools are finding their way into a very broad range of applications. The primary goal of this project is to examine a class of methods that manipulate graphs, specifically by developing effective multilevel algorithms that take advantage of divide-and-conquer approaches. In multilevel techniques, smaller and smaller graphs are extracted from an original graph with the goal of retaining as much of its intrinsic information as possible. These smaller graphs are then used in place of the original one, resulting in significant gains in performance. This project addresses issues that are highly relevant to many current data-based methodologies and will be applicable across various disciplines. As such, it will help promote interest in problems related to the current shift toward such methodologies, because its research theme blends mathematical methods, algorithmic innovations, and applications. On the educational side, special courses and tutorials will be offered to graduate students from other disciplines who wish to explore research in data sciences. This project will support one graduate student per year for each of the three years.

The rapid expansion of machine learning methodologies into a great variety of disciplines is driving demand for numerical methods that can effectively handle large datasets. Among these methods, those based on graph representations of data figure prominently. The goal of this project is to develop effective multilevel algorithms, rooted in graph-theoretical approaches, for performing various machine learning tasks.
A primary focus of the planned research is "graph coarsening", a technique whereby an original graph is substantially reduced in size by agglomerating nearby nodes, producing a faithful representative of the original graph. The project will exploit a class of methods based on multilevel coarsening, in which coarsening is applied recursively for a few levels. The ultimate goal of a multilevel coarsening approach is to perform the heavy computations on the coarsened graph, which is much smaller, resulting in much faster processing with minimal loss in accuracy. Coarsening is an effective alternative to random sampling, a well-established method that replaces the original data by a subset of its columns or rows selected at random or quasi-randomly. This project will study various coarsening strategies, both empirically and theoretically. For example, coarsening will be studied from the angle of a projection method for approximating eigenvectors. Coarsening methods that try to preserve the eigenvectors exactly will also be studied. Among the many possible applications of graph coarsening, the project will specifically consider its use in speeding up the training of a class of neural networks known as Graph Convolutional Networks (GCNs). A number of other research issues, all under the general theme of graph-based methods, will also be investigated. For example, the project will study how a form of hypergraph coarsening can provide a solution to the "graph sparsification" problem, in which a sparser version of a given graph is sought, or to the "column subset selection problem", which consists of selecting important columns (or rows) from a given data matrix.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
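The abstract does not say which agglomeration rule the project uses. As a minimal sketch, assuming greedy heavy-edge matching (a standard rule from multilevel graph partitioning, in which each node is merged with the unmatched neighbor it shares the heaviest edge with), one level of coarsening and its recursive multilevel application might look like the following. The functions `from_edges`, `heavy_edge_matching`, `coarsen`, and `multilevel_coarsen` are hypothetical names chosen for illustration, and the graph is assumed to be an undirected weighted adjacency map.

```python
from typing import Dict, Iterable, List, Tuple

# Undirected weighted graph: node -> {neighbor: edge weight}; each edge stored in both directions.
Graph = Dict[int, Dict[int, float]]


def from_edges(edges: Iterable[Tuple[int, int, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def heavy_edge_matching(graph: Graph) -> Dict[int, int]:
    """Greedily pair each unmatched node with its heaviest unmatched neighbor.

    Returns a map from fine node to coarse (aggregate) index; nodes with no
    unmatched neighbor form singleton aggregates.
    """
    fine_to_coarse: Dict[int, int] = {}
    next_id = 0
    for node in sorted(graph):  # fixed visiting order keeps the result deterministic
        if node in fine_to_coarse:
            continue
        candidates = [(w, nbr) for nbr, w in graph[node].items()
                      if nbr != node and nbr not in fine_to_coarse]
        fine_to_coarse[node] = next_id
        if candidates:
            _, partner = max(candidates)  # heaviest edge; ties broken by larger node id
            fine_to_coarse[partner] = next_id
        next_id += 1
    return fine_to_coarse


def coarsen(graph: Graph, fine_to_coarse: Dict[int, int]) -> Graph:
    """Contract each aggregate into one node.

    Weights of fine edges between two aggregates are summed; edges inside an
    aggregate are dropped.
    """
    coarse: Graph = {c: {} for c in set(fine_to_coarse.values())}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            cu, cv = fine_to_coarse[u], fine_to_coarse[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0.0) + w
    return coarse


def multilevel_coarsen(graph: Graph, levels: int) -> List[Graph]:
    """Apply coarsening recursively; hierarchy[0] is the original graph."""
    hierarchy = [graph]
    for _ in range(levels):
        current = hierarchy[-1]
        mapping = heavy_edge_matching(current)
        if len(set(mapping.values())) == len(current):
            break  # nothing could be merged, so stop early
        hierarchy.append(coarsen(current, mapping))
    return hierarchy


if __name__ == "__main__":
    path = from_edges((i, i + 1, 1.0) for i in range(7))  # 8-node path, unit weights
    sizes = [len(g) for g in multilevel_coarsen(path, levels=3)]
    print(sizes)  # [8, 4, 2, 1]
```

On an 8-node path with unit weights, each level halves the node count, so three levels produce graphs of 8, 4, 2, and 1 nodes. In matrix terms, if P is the n-by-m 0/1 matrix assigning each fine node to its aggregate and W is the weighted adjacency matrix, the coarse edge weights computed here are the off-diagonal entries of P^T W P.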

Project Outcomes

Journal articles (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Revisiting the (block) Jacobi subspace rotation method for the symmetric eigenvalue problem

DOI:
10.1007/s11075-022-01377-w
Published:
2023
Journal:
Numerical Algorithms
Impact factor:
2.1
Authors:
Saad, Yousef
Corresponding author:
Saad, Yousef

Graph coarsening: from scientific computing to machine learning

DOI:
10.1007/s40324-021-00282-x
Published:
2021-06
Journal:
SeMA Journal
Impact factor:
0
Authors:
Jie Chen; Y. Saad; Zecheng Zhang
Corresponding author:
Jie Chen; Y. Saad; Zecheng Zhang

Other publications by Yousef Saad

Randomized linear solvers for computational architectures with straggling workers

DOI:
Published:
2024
Journal:
Impact factor:
0
Authors:
V. Kalantzis; Yuanzhe Xi; L. Horesh; Yousef Saad
Corresponding author:
Yousef Saad

Efficiently Generalizing Ultra-Cold Atomic Simulations via Inhomogeneous Dynamical Mean-Field Theory from Two- to Three-Dimensions

DOI:
10.1109/hpcmp-ugc.2010.17
Published:
2010
Journal:
2010 DoD High Performance Computing Modernization Program Users Group Conference
Impact factor:
0
Authors:
James Freericks; H. R. Krishnamurthy; Pierre Carrier; Yousef Saad
Corresponding author:
Yousef Saad