EAGER: Collaborative Research: Scaling Up Discriminative Learning for Natural Language Understanding and Translation

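Basic Information

Award number:
1446996
Principal investigator:
Daniel Gildea
Amount:
approx. $129,100
Institution:
University of Rochester
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Project period:
2014-08-15 to 2016-07-31
Project status:
Completed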

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1446996&HistoricalAwards=false
Keywords:
EAGER Collaborative Research Scaling Up

Project Abstract

This EArly Grant for Exploratory Research aims to improve automatic understanding of natural language by machines, and automatic translation between languages such as Chinese and English. In the realm of understanding, the project develops methods for syntactically and semantically analyzing, or parsing, sentences. Improved parsing can help in accessing the enormous amount of information available in unstructured text on the web and in databases of newspapers and scanned books. Improved translation between languages increases opportunities for trade as well as for dissemination of information generally between nations and cultures. Machine translation is widely used today despite its generally poor quality, and any improvement in quality will improve access to information for millions of people. This project aims to exploit the power of machine learning algorithms that are designed to discriminate between correct and incorrect outputs by numerically optimizing mathematical functions that are defined in terms of the data available for training. Discriminative structured prediction algorithms have witnessed great success in the field of natural language processing (NLP) over the past decade, generally surpassing their generative counterparts. However, there remain two major problems which prevent discriminative methods from scaling to very large datasets: first, they typically assume exact search (over a prohibitively large search space), which is rarely possible in practice for problems such as parsing and translation. Secondly, they normally assume the data is completely annotated, whereas many naturally occurring datasets are only partially annotated: for example a parallel text in machine translation includes the source and target sentence pairs but not the derivation between them. 
As a result of these two problems, the current methods are not taking full advantage of the enormous and ever increasing amount of text data available to us.

This EArly Grant for Exploratory Research (EAGER) aims to:
- Develop a linear-time structured learning framework specifically tailored for inexact search, which hopefully retains theoretical properties of structured learning (e.g. convergence) under exact search.
- Extend this framework to handle latent variables, such as derivations in machine translation, syntactic structures in semantic parsing, and semantic representations in question answering.

If the exploratory extension to latent variable frameworks is successful, it will enable longer-term research to:
- Apply these efficient learning algorithms to discriminative training of machine translation systems over the entire training dataset rather than only on a small development set.
- Apply these efficient learning algorithms to discriminative training for syntactic and semantic parsing, with the goal of scaling up semantic parsing to enable web-scale knowledge extraction.

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Daniel Gildea

Synchronous context-free grammars and optimal linear parsing strategies

DOI:
10.1016/j.jcss.2015.04.003
Publication date:
2015-11-01
Journal:
Journal of Computer and System Sciences
Impact factor:
Authors:
Pierluigi Crescenzi; Daniel Gildea; Andrea Marino; Gianluca Rossi; Giorgio Satta
Corresponding author:
Giorgio Satta