III: Small: Partitioning Big Data for the High Performance Computation of Persistent Homology

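Basic Information

  • Grant number:
    1909096
  • Principal investigator:
  • Amount:
    $499,300
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-10-01 to 2023-09-30
  • Project status:
    Completed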

Project Abstract

New insights from machine learning exist across many domains, including, for example, medicine, social media, image processing, biology, and computer and network security. Machine learning is able to process large, high-dimensional data sets that are beyond human capabilities. One emerging method of machine learning is based on a branch of mathematics called topology that is sometimes able to discover knowledge that is not available using conventional methods. The field of topology is concerned with the shape of an object, and Persistent Homology is the critical method in topology used to extract the features of a shape. Persistent Homology will classify an object by the size and number of holes and voids in that object. Unfortunately, computing the Persistent Homology for an object requires significant amounts of memory and long run times that increase exponentially with the number of points that form that object. This project will treat the object formed by the data and subdivide it into smaller regions for the parallel computation of Persistent Homology on each region. The results from the regional analyses will then be assembled together, and any duplicate or missing results will be identified and restored in a post-analysis step. The computation on all of the regions will be completed in substantially less time and with much less total memory than a single computation on the entire data set. Testing of the methods developed will be performed using a variety of synthetic and real-world data. The synthetic data will permit controlled studies on performance and scalability. Real-world data from a variety of sources, and especially data where the small topological features are significant (such as data from brain scans), will be used. This project will propel the application of topology-based analysis to discover new insights and meaningful information from massive high-dimensional data.
An expansion of student training in data mining through topology-based methods will be achieved with the addition of classes, projects (senior projects, MS theses, PhD dissertations, and so on), seminars, and research co-op training experiences. Students at all levels will be impacted, with special emphasis placed on the participation of minority and underrepresented student groups. This project will also participate in the Women in Science and Engineering programs at UC. The project investigators will engage local area K-12 students, international exchange students and researchers at UC's collaborative institutions, UC's Medical School, Cincinnati Children's Medical Center, the Air Force Research Lab, and local industries with information and seminars on this project's investigations and results.
This project proposes to combine the fields of Approximate Computing and Topological Data Analysis to dramatically reduce the computational and memory requirements of using Topological Data Analysis on very large data sets. In particular, this project will develop approximate methods for computing Persistent Homology that dramatically increase the sizes of data sets to which data mining methods based on Topological Data Analysis can be applied. This project expects to increase the size of the input data set that can be analyzed by Topological Data Analysis methods by at least 3-5 orders of magnitude. While approximate methods can introduce error, the features identified by the approximate methods will identify regions of the point cloud where upscaling steps and regional computations of Persistent Homology can be used (in parallel) to establish more precise boundaries of those features. The project will develop algorithmic improvements and formal statements on the correctness, error bounds, and complexities of the algorithms and approximation techniques. These techniques have important implications for the ability to apply Topological Data Analysis techniques to much larger data sets than currently possible.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
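The partition-then-compute idea described in the abstract can be illustrated with a minimal Python sketch. It is not the project's method: it covers only 0-dimensional Persistent Homology (connected components of a Vietoris-Rips filtration, computed with union-find), and the function names `h0_persistence`, `partition_by_x`, and `regional_h0` are hypothetical. The slab partitioner is an assumption chosen for brevity, and `ThreadPoolExecutor` stands in for a real parallel backend.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Return death scales of 0-dimensional features (all births are 0).

    A component dies at the edge length that merges it into another one;
    the single component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return deaths


def partition_by_x(points, n_regions):
    """Hypothetical partitioner: equal-count slabs along the first coordinate."""
    ordered = sorted(points)
    size = max(1, -(-len(ordered) // n_regions))  # ceiling division
    return [ordered[k:k + size] for k in range(0, len(ordered), size)]


def regional_h0(points, n_regions=2):
    """Compute H0 death scales independently on each region."""
    regions = partition_by_x(points, n_regions)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(h0_persistence, regions))


if __name__ == "__main__":
    cloud = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 2)]
    print("regional:", regional_h0(cloud, 2))
    print("global:  ", h0_persistence(cloud))
```

On this toy cloud the global computation reports a component that dies at scale 9, when the two clusters merge. The regional results do not contain it, because that feature spans the region boundary. This is the kind of missing result that the abstract's post-analysis step is meant to identify and restore; the sketch does not attempt that step.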

Project Outcomes

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Persistence Homology of Proximity Hyper-Graphs for Higher Dimensional Big Data
Computation of persistent homology on streaming data using topological data summaries
  • DOI:
    10.1111/coin.12597
  • Publication year:
    2023
  • Journal:
  • Impact factor:
    2.8
  • Authors:
    Moitra, Anindya;Malott, Nicholas O.;Wilsey, Philip A.
  • Corresponding author:
    Wilsey, Philip A.
Fast Computation of Persistent Homology with Data Reduction and Data Partitioning
Topology Preserving Data Reduction for Computing Persistent Homology
Homology-Separating Triangulated Euler Characteristic Curve

Other grants by Philip Wilsey

SI2-SSE: Scalable Big Data Clustering by Random Projection Hashing
  • Grant number:
    1440420
  • Fiscal year:
    2014
  • Award amount:
    $499,300
  • Project type:
    Standard Grant
CSR: Small: Collaborative Research: Combining Static Analysis and Dynamic Run-time Optimization for Parallel Discrete Event Simulation in Many-Core Environments
  • Grant number:
    0915337
  • Fiscal year:
    2009
  • Award amount:
    $499,300
  • Project type:
    Standard Grant

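Similar NSFC Grants

Mechanism of huperzine A-mediated microglial polarization phenotype against ischemic stroke at single-cell resolution
  • Grant number:
    82304883
  • Year approved:
    2023
  • Award amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Role and mechanism of small cysteine-free proteins in regulating the insecticidal activity of biocontrol fungi
  • Grant number:
    32372613
  • Year approved:
    2023
  • Award amount:
    500,000 CNY
  • Project type:
    General Program
Role and mechanism of theranostic PS-Hc@MB combined with training in mediating rehabilitation from cerebral small vessel disease
  • Grant number:
    82372561
  • Year approved:
    2023
  • Award amount:
    490,000 CNY
  • Project type:
    General Program
Mechanism by which the MECOM/HBB pathway mediates abnormal heme metabolism and inhibits ferroptosis of tumor-initiating cells in non-small cell lung cancer
  • Grant number:
    82373082
  • Year approved:
    2023
  • Award amount:
    490,000 CNY
  • Project type:
    General Program
Role and mechanism of FATP2/HILPDA/SLC7A11 axis-mediated lipid metabolic reprogramming of tumor-associated neutrophils in affecting radiotherapy immunity in non-small cell lung cancer
  • Grant number:
    82373304
  • Year approved:
    2023
  • Award amount:
    490,000 CNY
  • Project type:
    General Program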

Similar Overseas Grants

Deciphering mechanisms governing functional partitioning of the C. elegans genome
  • Grant number:
    9207005
  • Fiscal year:
    2014
  • Award amount:
    $499,300
  • Project type:
Deciphering mechanisms governing functional partitioning of the C. elegans genome
  • Grant number:
    8795206
  • Fiscal year:
    2014
  • Award amount:
    $499,300
  • Project type:
Deciphering mechanisms governing functional partitioning of the C. elegans genome
  • Grant number:
    9000710
  • Fiscal year:
    2014
  • Award amount:
    $499,300
  • Project type:
Deciphering mechanisms governing functional partitioning of the C. elegans genome
  • Grant number:
    8612654
  • Fiscal year:
    2014
  • Award amount:
    $499,300
  • Project type:
III: Small: RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis
  • Grant number:
    1116394
  • Fiscal year:
    2011
  • Award amount:
    $499,300
  • Project type:
    Standard Grant