Collaborative Research: OAC: Approximate Nearest Neighbor Similarity Search for Large Polygonal and Trajectory Datasets

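Basic Information

  • Award number:
    2313039
  • Principal investigator:
  • Amount:
    $365,000
  • Institution:
  • Institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Period:
    2023-08-01 to 2026-07-31
  • Project status:
    Ongoing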

Project Abstract

Similarity search is a critical task in data mining. Nearest neighbor similarity search over geometric shapes (polygons and trajectories) is used in domains such as digital pathology, solar physics, and geospatial intelligence. In digital pathology for tumor diagnosis, tissues are represented as polygons, and Jaccard distance (one minus the ratio of the area of intersection to the area of union) is used for similarity comparisons. In solar physics for predicting solar flares, the query object and the dataset are made up of polygons representing solar events. In geospatial intelligence, similarity search is used to geo-locate a shape or a contour in global reference datasets. The current literature, while rich in methods for textual and image datasets, is lacking for geometric datasets. This project will develop scalable similarity search systems for polygonal and trajectory datasets. It will produce benchmark datasets of polygonal queries and responses for the research community and inform data mining techniques that employ similarity primitives. It will help introduce student projects for courses on parallel, distributed, high performance, and data intensive computing, data mining, and spatial computing. It will also train PhD students, including those at a Hispanic Serving Institution.

Given the ever-increasing size of datasets, exact nearest neighbor searches, which require a scan of the entire dataset, quickly become impractical, and this motivates approximate nearest neighbor searches. Traditional methods, such as tree-based indexes, suffer from the curse of dimensionality. Approximate similarity search is required for scalability in processing large numbers of queries, for index construction over big spatial data, and to address the dynamic nature of the data itself. This project will explore approximate similarity search algorithms based on product quantization and locality sensitive hashing (LSH) techniques for datasets at the 10-100 billion scale. It will result in (i) new methods for creating robust signatures of geometric data, based on comprehensive exploration of the performance/accuracy tradeoffs among different encoding schemes, informed by spatial properties of the data and requirements of relevant distance metrics, (ii) scalable coarse quantization techniques to hierarchically organize the polygonal datasets into neighborhoods by preserving hyperspace locality properties, leading to product quantization based scalable systems, and (iii) LSH-based techniques focusing on designing LSH functions for Jaccard distance.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
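As an illustration of the Jaccard distance and the LSH idea mentioned in the abstract, the following is a minimal sketch, not the project's method. It assumes polygons have already been rasterized onto a unit grid (here, simple rectangles), and it uses MinHash, a standard LSH family for Jaccard similarity on sets. All function names are hypothetical.

```python
import hashlib

def rasterize_rect(x0, y0, x1, y1):
    """Cells (x, y) of a unit grid covered by the rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def jaccard_distance(a, b):
    """Exact Jaccard distance: 1 - |A intersect B| / |A union B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here: two empty shapes are identical
    return 1.0 - len(a & b) / union

def _salted_hash(seed, cell):
    """Deterministic 64-bit hash of a cell under a given seed (hypothetical helper)."""
    data = f"{seed}:{cell[0]}:{cell[1]}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(cells, num_hashes=256):
    """MinHash signature of a non-empty cell set: the smallest salted hash per seed."""
    return [min(_salted_hash(seed, c) for c in cells) for seed in range(num_hashes)]

def estimated_jaccard_distance(sig_a, sig_b):
    """The fraction of disagreeing signature positions estimates the Jaccard distance."""
    agree = sum(p == q for p, q in zip(sig_a, sig_b))
    return 1.0 - agree / len(sig_a)

a = rasterize_rect(0, 0, 10, 10)   # 100 cells
b = rasterize_rect(5, 0, 15, 10)   # 100 cells, 50 of them shared with a
exact = jaccard_distance(a, b)     # 1 - 50/150
approx = estimated_jaccard_distance(minhash_signature(a), minhash_signature(b))
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The signatures are fixed-length regardless of polygon size, so comparing two shapes costs one pass over 256 integers instead of a geometric intersection. Longer signatures tighten the estimate at the cost of more hashing.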

Project Outcomes

Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Sushil Prasad

Unsteady Power Law Nanofluid Flow Subjected to Electro-Magnetohydrodynamics with Active and Passive Nanoparticles Flux
  • DOI:
    10.1166/jon.2023.2070
  • Published:
    2023-12-01
  • Journal:
  • Impact factor:
    4.1
  • Authors:
    Shikha Chandel;Shilpa Sood;Sonika Sharma;Sushil Prasad
  • Corresponding author:
    Sushil Prasad
Molecular docking studies of dihydropyridazin-3(2H)-one derivatives as Antifungal, antibacterial and anti-helmintic agents
Numerical Analysis of Williamson-Micropolar Ternary Nanofluid Flow Through Porous Rotatory Surface
  • DOI:
    10.1166/jon.2023.2092
  • Published:
    2023-12-01
  • Journal:
  • Impact factor:
    4.1
  • Authors:
    Diksha Sharma;Shilpa Sood;Archie Thakur;Sushil Prasad
  • Corresponding author:
    Sushil Prasad
Body weights and growth rates in indigenous chicken breeds of India
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Manish K. Singh;Shive Kumar;S. Singh;R. K. Sharma;Anand Krishnan Prakash;Sushil Prasad;Yujuvendra Singh;Deep Narayan Singh
  • Corresponding author:
    Deep Narayan Singh
A Parallel Workflow for Polar Sea-Ice Classification using Auto-labeling of Sentinel-2 Imagery
  • DOI:
    10.48550/arxiv.2403.13135
  • Published:
    2024-03-19
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jurdana Masuma Iqrah;Wei Wang;Hongjie Xie;Sushil Prasad
  • Corresponding author:
    Sushil Prasad

Other grants by Sushil Prasad

Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
  • Award number:
    2321015
  • Fiscal year:
    2023
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Broadening Adoption of Parallel and Distributed Computing in Undergraduate Computer Science and Engineering Curricula
  • Award number:
    2017590
  • Fiscal year:
    2020
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Conceptualization: Planning a Sustainable Ecosystem for Incorporating Parallel and Distributed Computing into Undergraduate Education
  • Award number:
    1924272
  • Fiscal year:
    2019
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Conceptualization: Planning a Sustainable Ecosystem for Incorporating Parallel and Distributed Computing into Undergraduate Education
  • Award number:
    2002649
  • Fiscal year:
    2019
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Early Adopters of Curriculum Initiative in Parallel and Distributed Computing at EduPar-12
  • Award number:
    1238003
  • Fiscal year:
    2012
  • Amount:
    $365,000
  • Project type:
    Standard Grant
A Curriculum Initiative on Parallel and Distributed Computing - Workshop on Parallel and Distributed Computing Education (EduPar-11) and Early Adopter Program
  • Award number:
    1135124
  • Fiscal year:
    2011
  • Amount:
    $365,000
  • Project type:
    Standard Grant
NSF/TCPP Student Travel Awards for IPDPS-2011
  • Award number:
    1138281
  • Fiscal year:
    2011
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Technical Committee on Parallel Processing (TCPP) Student Travel Awards
  • Award number:
    1016907
  • Fiscal year:
    2010
  • Amount:
    $365,000
  • Project type:
    Standard Grant
A Curriculum Initiative on Parallel and Distributed Computing - Toward Core Topics for Undergraduates
  • Award number:
    1048711
  • Fiscal year:
    2010
  • Amount:
    $365,000
  • Project type:
    Standard Grant

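Similar NSFC (National Natural Science Foundation of China) grants

Mechanism by which IGF-1R regulates HIF-1α to promote Th17 cell differentiation in the pathogenesis of thyroid eye disease
  • Award number:
    82301258
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Mechanism by which CTCFL regulates IL-10 to suppress bystander activation of CD4+ CTLs and promote resistance to neoadjuvant immunotherapy in oral squamous cell carcinoma
  • Award number:
    82373325
  • Approval year:
    2023
  • Amount:
    CNY 490,000
  • Project type:
    General Program
Mechanism by which mutations in the RNA splicing factor PRPF31 cause human retinitis pigmentosa
  • Award number:
    82301216
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Role and mechanism of vascular endothelial cells regulating macrophage activation via the E2F1/NF-kB/IL-6 axis in orbital venous malformations
  • Award number:
    82301257
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Cluster regulation and strengthening mechanisms in aluminum alloy matrices based on multi-element interatomic interactions
  • Award number:
    52371115
  • Approval year:
    2023
  • Amount:
    CNY 500,000
  • Project type:
    General Program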

Similar overseas grants

Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403088
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403090
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403313
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402946
  • Fiscal year:
    2024
  • Amount:
    $365,000
  • Project type:
    Standard Grant