III: Small: Interactive Construction of Complex Query Models

Basic Information

Award number:
1617408
Principal investigator:
James Allan
Amount:
Approximately $516,000
Institution:
University of Massachusetts Amherst
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2016
Funding country:
United States
Period:
2016-07-15 to 2020-06-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1617408&HistoricalAwards=false
Keywords:
III Small Interactive Construction Complex

Abstract

This research program will investigate and implement SearchIE, a search-based approach to information "extraction." SearchIE will allow rapid, personalized, situational identification of types of objects or actions in text, where those types are likely to be useful for a complex search task. Modern search engines often provide some mechanism to indicate that a query keyword matches a document only if it occurs in the name of a person or in a location. To make that possible, annotators found and marked a large number of people names (for example) in text, a machine learning algorithm was applied to learn which low-level features are indicative of the name type, and then a resulting classifier for that type is run across the collection of documents. It is then possible to write a query that means "paris used as a person's name rather than a location." Unfortunately, the existing approaches do not serve searchers interested in novel, unanticipated types - for example, names of whaling ships, officers in Queen Victoria's navy, local watering holes. Such examples cannot be handled currently because the classifiers need to be trained and run ahead of time, an expensive data labeling process that is too daunting for many search tasks. Since on-line information gathering almost always starts with search and frequently involves identifying items of interest in the found text, bringing these two together has the potential to change both substantially. The SearchIE approach makes it possible for someone to build personalized extractors contextualized by their topical interests. The result is that the technology can radically improve online searching for lay persons as well as professionals by significantly reducing the time needed to focus queries into relevant information. It does not appear that the information extraction task has ever been approached directly as a search task. 
SearchIE is unique in bringing an information retrieval (search) mindset to the extraction problem, providing new capabilities that are either impossible or extremely difficult in the traditional "annotate then detect" model of the problem. This project will investigate the fundamental issues raised by the SearchIE approach. What models can best integrate extraction and search in new settings where they can truly happen simultaneously? How can a searcher describe and edit a model for the types of interest? Can an interactively developed model be a springboard into a machine learned model and when is there enough information to do that? Does using topical context to limit the scope of extraction provide the expected accuracy gains using SearchIE's approach? What data structure modifications are needed to fully implement SearchIE so that it is efficient as well as effective? How well does this approach fare on additional standard test collections? Addressing these systems and algorithmic issues is a fundamental problem with the potential to greatly impact both search and extraction. For further information, see the project's web site at http://ciir.cs.umass.edu/research/searchie.

Project Outcomes

Journal papers (4)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A Reinforcement Learning Framework for Relevance Feedback

Other publications by James Allan

Polymorphism in glutathione S-transferase P1 is associated with susceptibility to chemotherapy-induced leukemia

DOI:
10.1073/pnas.191211198
Published:
2001-09-11
Journal:
Proceedings of the National Academy of Sciences of the United States of America
Impact factor:
11.1
Authors:
James Allan;C. Wild;S. Rollinson;E. Willett;A. Moorman;G. Dovey;P. Roddam;E. Roman;R. Cartwright;Gareth J. Morgan
Corresponding author:
Gareth J. Morgan

3‐methyladenine DNA glycosylases: structure, function, and biological importance

DOI:
10.1002/(sici)1521-1878(199908)21:8<668::aid-bies6>3.0.co;2-d
Published:
1999-08-01
Journal:
BioEssays
Impact factor:
4
Authors:
M. D. Wyatt;James Allan;A. Lau;T. Ellenberger;L. Samson
Corresponding author:
L. Samson

Enhancing the thermal conductivity of ethylene-vinyl acetate (EVA) in a photovoltaic thermal collector

DOI:
10.1063/1.4944557
Published:
2016-03-15
Journal:
AIP Advances
Impact factor:
1.6
Authors:
James Allan;H. Pinder;Z. Dehouche
Corresponding author:
Z. Dehouche

A content based approach for discovering missing anchor text for web search

DOI:
10.1145/1835449.1835521
Published:
2010-07-19
Journal:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Impact factor:
0
Authors:
Xing Yi;James Allan
Corresponding author:
James Allan

A Multi-Task Architecture on Relevance-based Neural Query Translation

DOI:
10.18653/v1/p19-1639
Published:
2019-06-17
Journal:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Impact factor:
0
Authors:
Sheikh Muhammad Sarwar;Hamed Bonab;James Allan
Corresponding author:
James Allan