CAREER: Learning to Extract Consistent Event Graphs from Long and Complex Documents
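Basic Information
- Award number: 2340435
- Principal investigator:
- Amount: $561,200
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Duration: 2024-05-01 to 2029-04-30
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract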
Documents about real-world events are published daily. The sheer number of such documents makes it very hard for people to read and absorb them all, a phenomenon known as "information overload". Applying computer algorithms that can automatically extract events is a promising solution because they can transform large amounts of text into smaller summaries in the form of structured event knowledge graphs that reveal the relationships between the people, places, and times in the events. Current deep learning-based event extraction techniques mainly focus on extracting event knowledge at the level of individual sentences and are unable to extract a knowledge graph spanning multiple sentences with sufficient accuracy or efficiency. For example, existing techniques would struggle with events described in a long document that has multiple sections. Moreover, these extraction techniques do not capture accurate information about real-life events, because such events typically include nuanced attributes such as causes and effects. The research goal of this CAREER award is to build natural language processing methods for information extraction (IE), using the latest deep learning-based techniques, to construct an event knowledge graph that stores knowledge and improves people's ability to track rapidly evolving event information. In the short term, the project will improve the quality and comprehensiveness of event knowledge graphs. In the long run, the project will entirely transform people's experiences and habits in acquiring event knowledge from various sources. The system to be developed through this award will better support numerous event-oriented tasks that people need to perform, such as future event prediction, event factuality verification, and risk event prevention, all of which have profound impacts on society. Moreover, our work would make fundamental contributions to a wide range of interdisciplinary applications such as statutory reasoning based on legal documents, prediction of disease outbreaks, and biomedical document understanding, all of which currently rely on extremely slow and high-cost methods.
The general technical goal of this project is to address the knowledge gap of event extraction from long and complex documents (as compared to traditional sentence-level extraction) and to do so in an efficient manner. The general goal is divided into three research sub-goals. First, to extract the entirety of event attributes, which is not possible for current models trained on a dataset with a predefined schema, the project introduces a new question-answer generation paradigm that enables a novel representation of events from clusters of documents discussing the same events. The project will leverage document hierarchy information for extracting events, which enforces the validity and broad coverage of event information. Motivated by the fact that current event knowledge construction is inefficient and is impaired by pairwise event-event relation predictions, the second research goal is to develop novel techniques enabling the construction of the event knowledge graph. For this purpose, the investigators propose interleaving targeted retrieval and joint modeling of event arguments and entity-entity relations. This not only enables efficient updating of graphs, but also ensures their global consistency. Finally, the third goal is to adapt to individual information-seeking needs, which current methods do not consider. The project will study schema induction strategies and schema matching algorithms for adapting the event knowledge graph to user preferences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Xinya Du
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
- DOI:
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Jing Gu; Yuwei Fang; Ivan Skorokhodov; Peter Wonka; Xinya Du; Sergey Tulyakov; Xin Eric Wang
- Corresponding author: Xin Eric Wang
Measuring industrial operational efficiency and factor analysis: A dynamic series-parallel recycling DEA model.
- DOI: 10.1016/j.scitotenv.2022.158084
- Publication year: 2022
- Journal:
- Impact factor: 0
- Authors: Lina Zhang; Xinya Du; Yung‐ho Chiu; Q. Pang; Xiao Wang; Qianwen Yu
- Corresponding author: Qianwen Yu
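Similar National Natural Science Foundation of China (NSFC) grants
Robust Feature Learning and Semantic Alignment for Text-Pedestrian Image Cross-Modal Matching
- Award number: 62362045
- Award year: 2023
- Amount: CNY 320,000
- Project type: Regional Science Fund Project
Deep Learning-Based Intelligent Extended-Range Forecasting of Air-Sea Coupling in the South China Sea
- Award number: 42375143
- Award year: 2023
- Amount: CNY 500,000
- Project type: General Program
Co-adaptive Learning of Contact Surfaces and Grasping Strategies for Complex Robotic Manipulation
- Award number: 52305030
- Award year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Identification, Consequences, and Governance of Rumors about Listed Companies on Social Media: A Multimodal Deep Learning Perspective
- Award number: 72302018
- Award year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Design and Hardware Implementation of Ensemble Learning Algorithms under Resource Constraints
- Award number: 62372198
- Award year: 2023
- Amount: CNY 500,000
- Project type: General Program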
Similar overseas grants
SimplePath II: A Cell-free Enzymatic Platform for Cannabinoid Biosynthesis
- Award number: 10755860
- Fiscal year: 2021
- Amount: $561,200
- Project type:
The Glucocorticoid Receptor as Signal Integrator: Studying All Drug Resistance
- Award number: 8473060
- Fiscal year: 2011
- Amount: $561,200
- Project type:
The Glucocorticoid Receptor as Signal Integrator: Studying All Drug Resistance
- Award number: 8299116
- Fiscal year: 2011
- Amount: $561,200
- Project type:
The Glucocorticoid Receptor as Signal Integrator: Studying All Drug Resistance
- Award number: 8278345
- Fiscal year: 2011
- Amount: $561,200
- Project type:
The glucocorticoid receptor as signal integrator: studying ALL drug resistance
- Award number: 8075410
- Fiscal year: 2010
- Amount: $561,200
- Project type: