SHF: EAGER: HI-HDFS - Holistic I/O optimizations for the Hadoop distributed filesystem

Basic Information

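  • Award Number:
    1747447
  • Principal Investigator:
  • Amount:
    $150,000
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Type:
    Standard Grant
  • Fiscal Year:
    2017
  • Funding Country:
    United States
  • Project Period:
    2017-09-01 to 2018-08-31
  • Project Status:
    Completed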

Project Abstract

File systems and their outdated POSIX "byte stream" interface suffer from an impedance mismatch with the versatile I/O requirements of today's applications. Specifically, the I/O path from the application to the raw storage device is becoming longer and it involves the interplay of intricate software and hardware components. This produces complex aggregate I/O patterns that application developers (often subject matter experts with limited knowledge of how massive concurrency creates I/O bottlenecks) cannot optimize based on intuition alone. File systems that tout their high scalability, such as the Hadoop distributed file system, largely do so by limiting applications to sequential access patterns. The question of whether one can accelerate the I/O performance of the Hadoop distributed file system for analytical applications with complex data models that cannot readily serialize data contiguously for fast sequential access remains open. This project seeks to address this question and build HI-HDFS -- a framework that automatically collects and manages semantically richer I/O metadata to guide object placement in the Hadoop distributed file system. The HI-HDFS framework synthesizes the I/O activity across software components throughout the datacenter in a navigable graph structure to identify application-agnostic motifs in I/O activity. A novel I/O forecasting technique identifies and ameliorates bottlenecks at large scale by inspecting I/O activity from small-scale runs. Overall, the HI-HDFS framework challenges the I/O optimization mantra that manual data placement is the cornerstone of I/O performance and paves the way towards next-generation object-centric storage systems for high-performance computers. The efficacy of this automated approach will be examined on a complex data processing workload from the domain of emergency response which exhibits I/O patterns that are characteristic of modern analytical applications. 
The broader impacts of this work are expected to include open-source prototype implementations as well as pedagogical impact on a cloud computing course for both Computer Science and Data Analytics undergraduate majors at Ohio State.

Project Outputs

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation
  • DOI:
    10.1609/aaai.v33i01.3301265
  • Publication Date:
    2018-11
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Jiankai Sun;Bortik Bandyopadhyay;Armin Bashizade;Jiongqian Liang;P. Sadayappan;S. Parthasarathy
  • Corresponding Author:
    Jiankai Sun;Bortik Bandyopadhyay;Armin Bashizade;Jiongqian Liang;P. Sadayappan;S. Parthasarathy
ArrayBridge: Interweaving Declarative Array Processing in SciDB with Imperative HDF5-Based Programs
ApproxJoin: Approximate Distributed Joins
  • DOI:
    10.1145/3267809.3267834
  • Publication Date:
    2018-10
  • Journal:
  • Impact Factor:
    0
  • Authors:
    D. Quoc;Istemi Ekin Akkus;Pramod Bhatotia;Spyros Blanas;Ruichuan Chen;C. Fetzer;T. Strufe
  • Corresponding Author:
    D. Quoc;Istemi Ekin Akkus;Pramod Bhatotia;Spyros Blanas;Ruichuan Chen;C. Fetzer;T. Strufe
Evaluating Scalability Bottlenecks by Workload Extrapolation
Characterizing I/O optimization opportunities for array-centric applications on HDFS
  • DOI:
    10.1109/hpec.2018.8547529
  • Publication Date:
    2018-09
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Donghe Kang;Vedang Patel;Kalyan Khandrika;Spyros Blanas;Yang Wang;S. Parthasarathy
  • Corresponding Author:
    Donghe Kang;Vedang Patel;Kalyan Khandrika;Spyros Blanas;Yang Wang;S. Parthasarathy

Other Publications by Spyros Blanas

In-Memory Transactions
Query Processing on Gaming Consoles
  • DOI:
  • Publication Date:
    2023
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Wei Cui;Qianxi Zhang;Spyros Blanas;Jesús Camacho;Brandon Haynes;Yinan Li;Ravishankar Ramamurthy;Peng Cheng;Rathijit Sen;Matteo Interlandi
  • Corresponding Author:
    Matteo Interlandi
Engineering Security and Performance with Cipherbase
  • DOI:
  • Publication Date:
    2012
  • Journal:
  • Impact Factor:
    0
  • Authors:
    A. Arasu;Spyros Blanas;Ken Eguro;Manas R. Joglekar;R. Kaushik;Donald Kossmann;Ravishankar Ramamurthy;P. Upadhyaya;R. Venkatesan
  • Corresponding Author:
    R. Venkatesan
GRaSP: generalized range search in peer-to-peer networks
ApproxJoin

Other Grants to Spyros Blanas

SHF: Small: Hyperscaling Data Analytics for High-Performance Computers
  • Award Number:
    1816577
  • Fiscal Year:
    2018
  • Award Amount:
    $150,000
  • Project Type:
    Standard Grant
CRII: III: Declarative array processing for large-scale scientific analyses
  • Award Number:
    1464381
  • Fiscal Year:
    2015
  • Award Amount:
    $150,000
  • Project Type:
    Standard Grant

Similar National Natural Science Foundation of China (NSFC) Grants

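Research on Aspirations and Their Impact on the Income Gap among Rural Residents
  • Grant Number:
    71903117
  • Approval Year:
    2019
  • Award Amount:
    CNY 190,000
  • Project Type:
    Young Scientists Fund
Research on Consumers' Need for Touch and Its Compensation Mechanisms from a Threat-Coping Perspective
  • Grant Number:
    71502075
  • Approval Year:
    2015
  • Award Amount:
    CNY 175,000
  • Project Type:
    Young Scientists Fund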

Similar International Grants

EAGER: A Genome Wide HDR Enhancement Screen in Maize
  • Award Number:
    2409037
  • Fiscal Year:
    2024
  • Award Amount:
    $150,000
  • Project Type:
    Standard Grant
Collaborative Research: EAGER: IMPRESS-U: Groundwater Resilience Assessment through iNtegrated Data Exploration for Ukraine (GRANDE-U)
  • Award Number:
    2409395
  • Fiscal Year:
    2024
  • Award Amount:
    $150,000
  • Project Type:
    Standard Grant
EAGER: Integrating Pathological Image and Biomedical Text Data for Clinical Outcome Prediction
  • Award Number:
    2412195
  • Fiscal Year:
    2024
  • Award Amount:
    $150,000
  • Project Type:
    Standard Grant
EAGER: Generalizing Monin-Obukhov Similarity Theory (MOST)-based Surface Layer Parameterizations for Turbulence Resolving Earth System Models (ESMs)
  • Award Number:
    2414424
  • Fiscal Year:
    2024
  • Award Amount:
    $150,000
  • Project Type:
    Standard Grant
EAGER: Creating a Composite EL Nino Record from the Lowland Neotropics
  • Award Number:
    2417794
  • Fiscal Year:
    2024
  • Award Amount:
    $150,000
  • Project Type:
    Standard Grant