CAREER: Towards Open World Event Knowledge Extraction with Weak Supervision


Basic Information

Project Abstract

Understanding events, such as who did what to whom, when, and where, is one of the fundamental ways humans learn about a changing world. The answers to these questions underpin the key information conveyed in the overwhelming majority, if not all, of language-based communication. However, the current research paradigm has several shortcomings in extracting event knowledge in open-world scenarios. In these scenarios, knowledge extraction is limited to a few large domains (e.g., news or biomedicine) or common languages (e.g., English, Spanish, and Chinese) because it relies heavily on human effort to contextualize data, such as creating large-scale manual annotations or defining schematic templates for a few target event types. This project aims to lay the foundation and establish new paradigms for open-world event knowledge extraction by developing new and more efficient algorithms that extend extraction capability to a wide range of scenarios while requiring minimal human effort. This foundation should provide extensive coverage of different event types and be easily adapted to emerging scenarios.

The success of this project will directly benefit users of intelligent information access systems. For applications that analyze emerging and trending topics and events, such as natural disasters, national elections, protests, and disease outbreaks, the proposed research will not only provide an accurate, abstractive summary of each topic and easy access to it, but also allow analysts to better discover the participants of events and the causes, effects, and temporal order among them, and help uncover more insights.

The technical aims of the project are divided into three thrusts. Thrust 1 develops schema-guided event extraction approaches. It does so by leveraging knowledge from the complex target event schema, such as event type structures (i.e., type names and argument roles), hierarchy, and temporal, causal, and part-whole relations among event types, which provides valuable guidance, especially when few or no annotations are available. While event annotations for most domains and scenarios do not exist and are extremely expensive and time-consuming to obtain, large-scale unlabeled in-domain data are usually accessible. Thus, Thrust 2 develops a suite of novel, more efficient self-training strategies that exploit large-scale unlabeled data through self-supervision. In practice, most domains and scenarios, such as natural disasters or disease outbreaks, do not even have an event type schema. Manually defining an event schema with high coverage is extremely challenging and time-consuming, as it requires background knowledge in both linguistics and the target domain, and humans need to manually examine a large amount of in-domain data to determine the salient event types. Considering these challenges, Thrust 3 explores novel solutions to automatically deduce the target event schema, including event types, the roles of their participants, and their relations, from raw text and to extract event mentions accordingly.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other publications by Lifu Huang

RPI BLENDER TAC-KBP2017 13 Languages EDL System
  • DOI:
  • Published:
    2017
  • Journal:
  • Impact factor:
    0.5
  • Authors:
    Boliang Zhang;Xiaoman Pan;Ying Lin;Tongtao Zhang;Kevin Blissett;Samia Kazemi;Spencer Whitehead;Lifu Huang;Heng Ji
  • Corresponding author:
    Heng Ji
APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models
  • DOI:
    10.18653/v1/2023.emnlp-main.567
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Qifan Wang;Yuning Mao;Jingang Wang;Hanchao Yu;Shaoliang Nie;Sinong Wang;Fuli Feng;Lifu Huang;Xiaojun Quan;Zenglin Xu;Dongfang Liu
  • Corresponding author:
    Dongfang Liu
Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models
ELISA System Description for LoReHLT 2017
  • DOI:
  • Published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Leon Cheung;Thamme Gowda;U. Hermjakob;N. Liu;Jonathan May;Alexandra Mayn;Nima Pourdamghani;Michael Pust;Kevin Knight;Nikolaos Malandrakis;Pavlos Papadopoulos;Anil Ramakrishna;Karan Singla;Victor R. Martinez;Colin Vaz;Dogan Can;Shrikanth S. Narayanan;Kenton Murray;Toan Q. Nguyen;David Chiang;Xiaoman Pan;Boliang Zhang;Ying Lin;Di Lu;Lifu Huang;Kevin Blissett;Tongtao Zhang;O. Glembek;M. Baskar;Santosh Kesiraju;L. Burget;Karel Beneš;I. Szoke;Karel Veselý;Camille Goudeseune;Mark H. Johnson;Leda Sari;Wenda Chen;Angli Liu
  • Corresponding author:
    Angli Liu
Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming
  • DOI:
    10.48550/arxiv.2405.13154
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xinyi Zhang;Pamela J. Wisniewski;Jin;Lifu Huang;Sang Won Lee
  • Corresponding author:
    Sang Won Lee

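Similar grants from the National Natural Science Foundation of China

Role and mechanism of SHP2-regulated plastic conversion of Treg to Th2-like Treg in allergic rhinitis
  • Grant number:
    82301281
  • Year approved:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Numerical simulation study of core-edge compatibility mechanisms in the EAST high poloidal beta operating regime
  • Grant number:
    12375228
  • Year approved:
    2023
  • Funding amount:
    CNY 530,000
  • Project type:
    General Program
Study of cardiac transplant rejection triggered by CXCR5-dependent marginal zone B cells delivering exosomes to follicular dendritic cells
  • Grant number:
    82300460
  • Year approved:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Mechanism by which Dlx2 affects osteogenic differentiation of maxillary process mesenchymal stem cells through regulation of Tspan13
  • Grant number:
    82301008
  • Year approved:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund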

Similar overseas grants

mHealth OAE: Towards Universal Newborn Hearing Screening in Kenya (mTUNE)
  • Grant number:
    10738905
  • Fiscal year:
    2023
  • Funding amount:
    $593,500
  • Project type:
NSF Workshop: Towards an Open Source Model for Data and Metadata Standards
  • Grant number:
    2334483
  • Fiscal year:
    2023
  • Funding amount:
    $593,500
  • Project type:
    Standard Grant
Conference: Pushing Towards Open-Source AI
  • Grant number:
    2335774
  • Fiscal year:
    2023
  • Funding amount:
    $593,500
  • Project type:
    Standard Grant
Collaborative Research: GEO OSE Track 1: Transforming Volcanology towards Open Science in the Cloud with VICTOR
  • Grant number:
    2324749
  • Fiscal year:
    2023
  • Funding amount:
    $593,500
  • Project type:
    Standard Grant
Collaborative Research: GEO OSE Track 1: Transforming Volcanology towards Open Science in the Cloud with VICTOR
  • Grant number:
    2324748
  • Fiscal year:
    2023
  • Funding amount:
    $593,500
  • Project type:
    Standard Grant