III: Small: Enabling Technology for Best-Effort Data Integration Systems

Basic information

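  • Award number:
    1018792
  • Principal investigator:
  • Amount:
    $499,300
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2010
  • Funding country:
    United States
  • Start and end dates:
    2010-08-15 to 2014-07-31
  • Project status:
    Completed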

Project abstract

Over the past few decades, the problem of data integration (DI) has received significant attention. Much of the initial attention was directed at integrating business data. Such data typically requires exact integration, because anything less is clearly not usable. Today, such exact DI systems continue to play an important role. But they are ill-suited for many emerging domains, such as personal information management, building Web community portals, scientific data management, text management for business intelligence, public safety, and military intelligence analysis. First, they are typically constructed in "one shot," in that the system is substantially unusable until it is completed in all of its envisioned generality. Second, when presented with new data, these systems often incur long delays before making the data available to users. Third, they typically are not designed to benefit from user feedback, even though opportunities for such feedback often exist in today's Web 2.0 world. Fourth, exact DI systems provide little or no assistance in explaining answers to users. In response, this project explores a paradigm shift, from precise DI systems to best-effort ones. Instead of being constructed in one shot, these systems are constructed incrementally. Their data is always queryable in some fashion. They tolerate mistakes in the data, and can leverage user feedback to improve over time. Finally, they can explain their answers to the users, thereby allowing them to understand, verify, and trust query results.

To build best-effort DI systems, researchers will pursue the following technical thrusts. (1) Increasing support for incremental development through the specification and implementation of a declarative, semantically transparent extraction/integration language, together with an effective optimization and execution framework. (2) Leveraging the power of a user community through the design and implementation of techniques that allow users to correct errors in the extraction/integration process as they are encountered, that consistently propagate these corrections throughout the extracted and integrated data, and that use these corrections to improve the quality of extraction/integration modules. (3) Developing and implementing techniques to capture information that will help users reason about the system's data, along with support for exploring the implications of this information. The team will combine the technology to build a prototype end-to-end best-effort DI system and evaluate the system on three real-world applications: the DBLife portal, the GLEON limnology project, and the madison.com Web portal.

This research will be integrated with ongoing efforts in educating students on techniques for extracting and integrating structured data. Inclusion of underrepresented minorities in the projects will be continued. The results from this project will be incorporated into a textbook on data integration to be published in 2010-2011. The project will facilitate the widespread deployment of data integration systems, thus resulting in more effective information management and access for society. It will play an integral part in educating next-generation professional workers and researchers. The research will also help domain scientists in limnology in the context of the GLEON project. It also has the potential to help the developers of madison.com build a system of much greater use to the greater Madison community. Finally, data and system artifacts from the project will be disseminated broadly in the research community to significantly enhance the data management infrastructure for research and education.

Project outcomes

Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

No data available

Data last updated: 2024-06-01

Other grants by AnHai Doan

III: Medium: Enabling Technologies for 21st Century Entity Matching Applications
  • Award number:
    1564282
  • Fiscal year:
    2016
  • Amount:
    $499,300
  • Project type:
    Continuing Grant
EAGER: Discovering Emerging Events in Social Media
  • Award number:
    1143807
  • Fiscal year:
    2011
  • Amount:
    $499,300
  • Project type:
    Continuing Grant
CAREER: Evolving and Self-Managing Data Integration Systems
  • Award number:
    0712836
  • Fiscal year:
    2006
  • Amount:
    $499,300
  • Project type:
    Continuing Grant
CAREER: Evolving and Self-Managing Data Integration Systems
  • Award number:
    0347903
  • Fiscal year:
    2004
  • Amount:
    $499,300
  • Project type:
    Continuing Grant

Similar NSFC grants

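Screening of small-molecule inhibitors targeting Treg-FOXP3 and their role and mechanism in lung cancer immunotherapy
  • Award number:
    32370966
  • Year approved:
    2023
  • Amount:
    CNY 500,000
  • Project type:
    General Program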
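Epigenetic mechanisms by which small-molecule activation of YAP induces chromatin plasticity and promotes cardiac progenitor cell reprogramming
  • Award number:
    82304478
  • Year approved:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund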
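Construction and mechanism of action of microglia-targeting biomimetic glycyrrhizic acid nanoparticles: a new therapeutic strategy for sepsis-associated encephalopathy
  • Award number:
    82302422
  • Year approved:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund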
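Role and mechanism of microglial pyroptosis mediated by the HMGB1/TLR4/Cathepsin B pathway in hypoxic-ischemic encephalopathy in neonatal rats
  • Award number:
    82371712
  • Year approved:
    2023
  • Amount:
    CNY 490,000
  • Project type:
    General Program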
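Role and mechanism of small cysteine-free proteins in regulating the insecticidal activity of biocontrol fungi
  • Award number:
    32372613
  • Year approved:
    2023
  • Amount:
    CNY 500,000
  • Project type:
    General Program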

Similar overseas grants

III: Small: Enabling the Best Utilization of GPUs for In-Memory Data Management Systems
  • Award number:
    1718450
  • Fiscal year:
    2017
  • Amount:
    $499,300
  • Project type:
    Continuing Grant
III: Small: Enabling Declarative Querying and Analytics over Large Dynamic Information Networks
  • Award number:
    1319432
  • Fiscal year:
    2013
  • Amount:
    $499,300
  • Project type:
    Continuing Grant
Microbicide intravaginal ring IND enabling studies
  • Award number:
    8467256
  • Fiscal year:
    2013
  • Amount:
    $499,300
  • Project type:
HCC: III: Small Grant: Enabling the Use of Virtual Worlds for Research and Teaching in Archaeology
  • Award number:
    1018512
  • Fiscal year:
    2010
  • Amount:
    $499,300
  • Project type:
    Continuing Grant
III-COR-Small: Beyond Keyword Search: Enabling Diverse Structured Query Paradigms over Text Databases
  • Award number:
    0811038
  • Fiscal year:
    2008
  • Amount:
    $499,300
  • Project type:
    Continuing Grant