CISE Research Resources: Discourse Penn Treebank and Multimodal FORM: Development of Two Richly Annotated Corpora
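Basic information
- Grant number: 0224417
- Principal investigator: Aravind K. Joshi
- Amount: $997,800
- Institution: University of Pennsylvania
- Institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2002
- Funding country: United States
- Period: 2002-10-15 to 2006-09-30
- Project status: Completed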
Project abstract
EIA-0224417
Aravind K. Joshi, Mark Liberman
University of Pennsylvania
CISE RR: Discourse Penn Treebank and Multimodal FORM: Development of Two Richly Annotated Corpora

This project provides critical resources for research on discourse modeling and conversational interaction, with the aim of developing new technologies and systems for information retrieval and human-computer interaction. Centering on the construction of annotated corpora, it will build two large-scale resources, one in the discourse domain and one in the dialog domain:
1. Discourse Penn Treebank (DPTB)
2. MultiFORM: Augmenting the FORM corpus with body movements, speech, and intonation

The former project develops a large-scale, reliably annotated corpus that will encode coherence relations associated with discourse connectives, including their argument structure and anaphoric links. It thus exposes a clearly defined level of discourse structure and supports the extraction of a range of inferences associated with discourse connectives. This annotation will be "on top of" the Penn Treebank (PTB) annotations as well as the predicate-argument annotations of PTB (called the Proposition Bank or PropBank). The latter project involves FORM, a corpus of gesture-annotated videos that was designed to be extensible so that it can eventually represent the entire multimodal experience of conversational interaction. The multimodal version of FORM, MultiFORM, will be created by adding body movement, speech and syntactic structure, and intonation.

Large-scale annotated corpora have played a critical role in speech and natural language research by enabling large-scale integration of statistical knowledge (derived from the corpora) with linguistic knowledge (as represented in annotations), leading to scientific and technological advances. Representative examples include robust parsing and automatic extraction of relations and coreferences, and their applications to information extraction, question answering, summarization, and machine translation. PTB, a resource developed a decade ago, is an example of such a resource that has influenced natural language processing worldwide. Because PTB deals with corpora at the sentence level, new large-scale, reliably annotated corpora of discourse and dialog structure are warranted.

Although intellectual and practical connections exist between studies of the structures of discourse and dialog, the initial requirements for resources to study these areas diverge while overlapping in conception. On the discourse side, we need corpora that deal with the kinds of structures found in composed text such as journalistic articles. The dialog side needs to focus on interactions among people and on extemporized rather than pre-composed material.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Aravind Joshi
Cogniac: a discourse processing engine
- Published: 1995
- Impact factor: 0
- Authors: F. B. Baldwin; Aravind Joshi
- Corresponding author: Aravind Joshi
Quantum Circuit Optimization of Arithmetic circuits using ZX Calculus
- DOI: 10.48550/arxiv.2306.02264
- Published: 2023-06-04
- Impact factor: 0
- Authors: Aravind Joshi; Akshara Kairali; Renju Raju; A. Athreya; R. Monica; Sanjay Vishwakarma; Srinjoy Ganguly
- Corresponding author: Srinjoy Ganguly
Intention, Interpretation and the Computational Structure of Language
- Published: 2024-09-13
- Impact factor: 0
- Authors: Matthew Stone; Justine Cassell; Aravind Joshi; Mark Steedman; Vasil Daskalopoulos; David DeVault; Raymond
- Corresponding author: Raymond
Other grants by Aravind Joshi
CI: ADDO-EN: Significant Enhancement of the Existing Penn Discourse Treebank
- Grant number: 1059353
- Fiscal year: 2011
- Amount: $997,800
- Project type: Standard Grant
RI: Exploiting and Exploring Discourse Connectivity: Deriving New Technology and Knowledge from the Penn Discourse Treebank
- Grant number: 0705671
- Fiscal year: 2007
- Amount: $997,800
- Project type: Continuing Grant
Metagrammatical Knowledge for Grammars and Corpora
- Grant number: 0414409
- Fiscal year: 2004
- Amount: $997,800
- Project type: Continuing Grant
ITR: Language, Learning, and Modeling Biological Sequences
- Grant number: 0205456
- Fiscal year: 2002
- Amount: $997,800
- Project type: Continuing Grant
ITR: Mining the Bibliome -- Information Extraction from the Biomedical Literature
- Grant number: 0205448
- Fiscal year: 2002
- Amount: $997,800
- Project type: Continuing Grant
Constructing Science: Materials and Activities for Kindergarten and First-Grade
- Grant number: 9252885
- Fiscal year: 1992
- Amount: $997,800
- Project type: Continuing Grant
Research in Natural Language Processing: Mathematical and Computational Investigations in Constrained Grammatical Formalisms
- Grant number: 9016592
- Fiscal year: 1991
- Amount: $997,800
- Project type: Continuing Grant
Center for Research in Cognitive Science
- Grant number: 8920230
- Fiscal year: 1991
- Amount: $997,800
- Project type: Cooperative Agreement
Natural Language Processing (Computer Research)
- Grant number: 8410413
- Fiscal year: 1984
- Amount: $997,800
- Project type: Continuing Grant
Modelling Interactive Processes: Flexible Communication With Knowledge Bases
- Grant number: 8219196
- Fiscal year: 1983
- Amount: $997,800
- Project type: Continuing Grant
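Similar grants from the National Natural Science Foundation of China (NSFC)
Research on Resource Management Theory and Technology for Intelligent Internet of Vehicles with Integrated Communication, Sensing, and Computing
- Grant number: 62371406
- Year approved: 2023
- Amount: CNY 490,000
- Project type: General Program
Research on Intelligent Organization Methods for Secondary School STEM Course Resources Based on Knowledge Graph Completion and Evidence-Based Evaluation
- Grant number: 62307023
- Year approved: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Research on the Integrated Problem of Lot-Streaming Scheduling and Batch Delivery in Resource-Constrained Flexible Assembly Flow Shops
- Grant number: 52375489
- Year approved: 2023
- Amount: CNY 500,000
- Project type: General Program
Research on Ammonium Persulfate-Mediated Electrochemical Co-conversion and Utilization of Sulfur and Nitrogen Resources in Flue Gas
- Grant number: 22366022
- Year approved: 2023
- Amount: CNY 320,000
- Project type: Regional Science Fund
Research on Coupled Coordinated Development and Risk-Benefit Sharing Mechanisms for Watershed Floodwater Resource Utilization
- Grant number: 52379027
- Year approved: 2023
- Amount: CNY 510,000
- Project type: General Program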
Similar overseas grants
Collaborative Research: CISE-MSI: RPEP: CPS: A Resilient Cyber-Physical Security Framework for Next-Generation Distributed Energy Resources at Grid Edge
- Grant number: 2219733
- Fiscal year: 2022
- Amount: $997,800
- Project type: Standard Grant
Collaborative Research: CISE-MSI: RPEP: CPS: A Resilient Cyber-Physical Security Framework for Next-Generation Distributed Energy Resources at Grid Edge
- Grant number: 2219734
- Fiscal year: 2022
- Amount: $997,800
- Project type: Standard Grant
CISE Research Resources: From High Performance to Low Power: Infrastructure for Ubiquitous Computing
- Grant number: 0130143
- Fiscal year: 2002
- Amount: $997,800
- Project type: Standard Grant
CISE Research Resources: Matching Advanced Visualization and Intelligent Data Mining to High-Performance Experimental Networks
- Grant number: 0224306
- Fiscal year: 2002
- Amount: $997,800
- Project type: Continuing Grant
CISE Research Resources: Programming Environments and Applications for Clusters and Grids
- Grant number: 0224453
- Fiscal year: 2002
- Amount: $997,800
- Project type: Standard Grant