RI: Medium: Broad-Coverage Semantic Parsing: Linguistic Representation Learning from Crowd-Scale Data
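Basic Information
- Award number: 1562364
- Principal investigator:
- Amount: $1.006 million
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-09-01 to 2021-08-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract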
Automated understanding of text is a capability that will advance a wide range of language technologies, including information extraction, question answering, opinion analysis, and translation between languages. Such technologies have been in demand in the intelligence and defense communities for many years, and they now underlie many commercially available information-management tools. This project develops robust algorithms that understand natural language expressions by mapping them to formal representations of their meaning, a technique known as semantic parsing. For semantic parsing to be employed in technologies like those listed above, it needs to overcome the fundamental challenge of broad coverage: the ability to handle any text input, in multiple languages. This project meets this challenge by creating new methods for gathering large repositories of semantically annotated data at greatly reduced cost; these are then used to train much more accurate broad-coverage parsing models. The results of this project include open-source implementations, high-quality annotated corpora on an unprecedented scale, and reusable distributed semantic representations for use by the community of natural language processing researchers and practitioners.
The goal of broad-coverage semantic parsing can only be achieved by simultaneously focusing on new, large-scale sources of data with semantically meaningful annotations and new learning algorithms for inducing models with the representational capacity to make full use of such data. For scalable data collection, this project introduces new techniques that rely on two key complementary insights: (1) any reader who understands a text can answer questions about it, and (2) questions can be constructed whose answers probe any aspect of semantics that needs to be recovered. These observations make it possible to design new data collection techniques that reduce the burden of semantic annotation by providing simple questions and answers about texts. This QA-style annotation can be done for any text in any language, given only native speakers, bypassing the significant effort that currently goes into defining detailed annotation standards. It also allows gathering new datasets on a much larger scale, and for more diverse text types, than ever before.
In addition, the project develops new representation learning techniques that tie together a wide range of semantic annotation styles, including the new crowdsourced ones, in a multitask learning setup. Continuous representations (e.g., of word types) provide a powerful way to share statistical strength across a large vocabulary, many of whose elements are sparsely observed. While past work has emphasized learning word embeddings, this project employs a shared continuous space ("framespace") that can capture abstract frames and roles used in predicate-argument (and logical) semantics. The usefulness of these representations depends on the tasks they are trained to perform, and using multiple related tasks can lead to benefits on all of them, by sharing statistical strength across task-specific representations, across elements of the semantic lexicon, and even across languages.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Noah Smith
THE NORTH ATLANTIC TREATY ORGANIZATION AND UNITED STATES RELATIONSHIP: A STUDY OF ITS DEVELOPMENT AND POSSIBLE FUTURE
- DOI:
- Publication date: 2015
- Journal:
- Impact factor: 0
- Authors: Noah Smith
- Corresponding author: Noah Smith
Buying health: assessing the impact of a consumer-side vegetable subsidy on purchasing, consumption and waste
- DOI:
- Publication date: 2015
- Journal:
- Impact factor: 3.2
- Authors: Noah Smith
- Corresponding author: Noah Smith
Implications for cumulative and prolonged clinical improvement induced by cross-linked hyaluronic acid: An in vivo biochemical/microscopic study in humans.
- DOI: 10.1111/exd.14998
- Publication date: 2024
- Journal:
- Impact factor: 3.6
- Authors: Frank Wang; T. Do; Noah Smith; J. Orringer; Sewon Kang; John J Voorhees; Gary J. Fisher
- Corresponding author: Gary J. Fisher
Biopsy of Suspected Melanoma
- DOI: 10.1007/978-3-319-46029-1_10-1
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: Noah Smith; T. Johnson; J. Kelly; A. Sober; C. Bichakjian
- Corresponding author: C. Bichakjian
How party nationalization conditions economic voting
- DOI: 10.1016/j.electstud.2016.11.014
- Publication date: 2017-06-01
- Journal:
- Impact factor:
- Authors: Scott Morgenstern; Noah Smith; Alejandro Trelles
- Corresponding author: Alejandro Trelles
Other grants by Noah Smith
NSF-BSF: RI: Small: Efficient Transformers via Formal and Empirical Analysis
- Award number: 2113530
- Fiscal year: 2021
- Funding amount: $1.006 million
- Award type: Standard Grant
RI/SES: Conference Proposal: Doctoral Consortium on Text as Data
- Award number: 1830158
- Fiscal year: 2018
- Funding amount: $1.006 million
- Award type: Standard Grant
NSF-BSF: RI: Small: Collaborative Research: Modeling Crosslinguistic Influences Between Language Varieties
- Award number: 1813153
- Fiscal year: 2018
- Funding amount: $1.006 million
- Award type: Continuing Grant
Workshop: Support for a workshop on scientific research applications of natural language technologies
- Award number: 1433108
- Fiscal year: 2014
- Funding amount: $1.006 million
- Award type: Standard Grant
BIGDATA: Small: DA: Big Multilinguality for Data-Driven Lexical Semantics
- Award number: 1251131
- Fiscal year: 2013
- Funding amount: $1.006 million
- Award type: Standard Grant
EAGER: PARTIAL: An Exploratory Study on Practical Approaches for Robust NLP Tools with Integrated Annotation Languages
- Award number: 1352440
- Fiscal year: 2013
- Funding amount: $1.006 million
- Award type: Standard Grant
SoCS: Collaborative Research: Data-Driven, Computational Models for Discovery and Analysis of Framing
- Award number: 1211277
- Fiscal year: 2012
- Funding amount: $1.006 million
- Award type: Standard Grant
CAREER: Flexible Learning for Natural Language Processing
- Award number: 1054319
- Fiscal year: 2011
- Funding amount: $1.006 million
- Award type: Continuing Grant
RI-Small: Probabilistic Models for Structure Discovery in Text
- Award number: 0915187
- Fiscal year: 2009
- Funding amount: $1.006 million
- Award type: Continuing Grant
SGER: Scaling up unsupervised grammar induction
- Award number: 0836431
- Fiscal year: 2008
- Funding amount: $1.006 million
- Award type: Standard Grant
Similar National Natural Science Foundation of China (NSFC) grants
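Plasmon-enhanced optical response in composite low-dimensional topological materials
- Award number: 12374288
- Approval year: 2023
- Funding amount: CNY 520,000
- Award type: General Program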
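Missing medium-sized enterprises from the perspectives of the market for management and intervention in the division of labor: stylized facts, underlying mechanisms, and optimization paths
- Award number: 72374217
- Approval year: 2023
- Funding amount: CNY 410,000
- Award type: General Program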
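Multiscale algorithms and numerical simulation of plasma in tokamak divertors
- Award number: 12371432
- Approval year: 2023
- Funding amount: CNY 435,000
- Award type: General Program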
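Dark matter distribution near intermediate-mass black holes and detection of gravitational-wave echoes from their IMRI systems
- Award number: 12365008
- Approval year: 2023
- Funding amount: CNY 320,000
- Award type: Regional Science Fund Program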
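Physical mechanisms of rapid intensification of asymmetric tropical cyclones under moderate vertical wind shear
- Award number: 42305004
- Approval year: 2023
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund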
Similar overseas grants
RII Track-4:@NASA: Bluer and Hotter: From Ultraviolet to X-ray Diagnostics of the Circumgalactic Medium
- Award number: 2327438
- Fiscal year: 2024
- Funding amount: $1.006 million
- Award type: Standard Grant
Collaborative Research: Topological Defects and Dynamic Motion of Symmetry-breaking Tadpole Particles in Liquid Crystal Medium
- Award number: 2344489
- Fiscal year: 2024
- Funding amount: $1.006 million
- Award type: Standard Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402836
- Fiscal year: 2024
- Funding amount: $1.006 million
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402851
- Fiscal year: 2024
- Funding amount: $1.006 million
- Award type: Continuing Grant
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403122
- Fiscal year: 2024
- Funding amount: $1.006 million
- Award type: Standard Grant