CRI: CI-SUSTAIN: Collaborative Research: Sustaining Lemur Project Resources for the Long-Term
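Basic Information
- Award number: 1822986
- Principal investigator:
- Amount: $376,700
- Host institution:
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-09-01 to 2023-08-31
- Status: Completed
- Source:
- Keywords:
Project Abstract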
For more than a decade, the software, datasets, and online services developed and provided by the Lemur Project have supported and enabled a large body of academic and commercial research on search engines, information retrieval, and other areas of computer science that analyze and process human language. This project makes critical enhancements to the Lemur Project infrastructure, operates that infrastructure for another three years, and positions it for long-term sustainability. As part of these enhancements, the Galago search engine is extended to integrate neural networks and other machine learning methods more tightly. A new dataset, ClueWeb2020, is developed to replace the widely used ClueWeb09 and ClueWeb12 datasets. These investments will support advanced research for the next decade. The advanced search capabilities developed for the project's open-source Indri and Galago search engines, which are widely used in research, are added to the open-source Lucene search engine, which is widely used in industry. New software applications are developed to simplify migration between the Lemur Project search engines and Lucene. These investments advance the state of the art in software that is important to industry and let researchers move their work onto more widely used software. The Lemur Project's research infrastructure has attracted a substantial community of research users because it makes leading-edge research easy to carry out. These enhancements enable researchers in information retrieval and related areas to carry out a much broader range of experiments and to share their results. Research and industrial development supported by the new Lemur Project software will produce a new generation of more capable search engines for a variety of tasks.
The project is organized around three types of activities: sustaining software, sustaining datasets, and operation. The project achieves long-term software sustainability by adding Indri and Galago functionality to the open-source Lucene search engine and by creating integration and migration paths between them. Lucene has large user and volunteer-developer communities, so research done with Galago or Indri will be reproducible in Lucene and more accessible to Lucene's industry users. The project also extends the Galago application programming interface (API) to support the newest developments in neural network (deep learning) document ranking. These technologies are now widely studied and are expected in any state-of-the-art research system. The project broadens the utility of RankLib by adding neural algorithms, which allows better comparison with high-quality learning-to-rank approaches. It also broadens the utility of the Sifaka text mining application by supporting additional document and machine learning formats. The older ClueWeb09 and ClueWeb12 datasets are superseded by a new ClueWeb2020 dataset, which is designed to last a decade and to support research on newer learning-to-rank and neural network (deep learning) ranking algorithms. The project maintains and operates the existing infrastructure in three ways: software maintenance and support, dataset licensing and distribution, and operation of online search services. The new Lemur Project infrastructure supports a broad range of information retrieval research, for example on retrieval models, training learned rankers, using semi-structured knowledge bases, result diversification, query optimization, and distributed search. In particular, it greatly improves support for research on learned and neural (deep learning) ranking algorithms, which have become important research topics in recent years. The ClueWeb datasets are used by a broad human language technologies research community. This project makes enhancements that sustain this infrastructure for the research community for at least the next decade.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

No data available
Data updated: 2024-06-01
Other publications by James Allan
Introduction to topic detection and tracking
- DOI: 10.1007/978-1-4615-0933-2_1
- Published: 2002
- Journal:
- Impact factor: 0
- Authors: James Allan
- Corresponding author: James Allan
Using CrowdLogger for in situ information retrieval system evaluation
- DOI: 10.1145/2513150.2513164
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: H. Feild; James Allan
- Corresponding author: James Allan
A semantic data framework to support data-driven demand forecasting
- DOI: 10.1088/1742-6596/2600/2/022001
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: James Allan; Francesca Mangili; Marco Derboni; Luis Gisler; A. Hainoun; A. Rizzoli; Luca Ventriglia; M. Sulzer
- Corresponding author: M. Sulzer
Reranking search results for sparse queries
- DOI: 10.1145/2063576.2063606
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Elif Aktolga; James Allan
- Corresponding author: James Allan
Recent Experiments with INQUERY
- DOI: 10.21236/ada470554
- Published: 1995
- Journal:
- Impact factor: 0
- Authors: James Allan; Lisa Ballesteros; Jamie Callan; W. Bruce Croft; Zhihong Lu
- Corresponding author: Zhihong Lu
77 publications in total
Other grants held by James Allan
CondensabLe AeRosol from non Ideal Stove Emissions (CLARISE)
- Award number: NE/X000923/1
- Fiscal year: 2023
- Amount: $376,700
- Award type: Research Grant
III: Medium: Collaborative Research: Athena: Learning-oriented Search with Personalized Learning Flows
- Award number: 2106282
- Fiscal year: 2021
- Amount: $376,700
- Award type: Continuing Grant
EAGER: Dynamic Contextual Explanation of Search Results
- Award number: 2039449
- Fiscal year: 2020
- Amount: $376,700
- Award type: Standard Grant
Soot Aerodynamic Size Selection for Optical properties (SASSO)
- Award number: NE/S00212X/1
- Fiscal year: 2018
- Amount: $376,700
- Award type: Research Grant
III: Small: Mirador: Explainable Computational Models for Recognizing and Understanding Controversial Topics Encountered Online
- Award number: 1813662
- Fiscal year: 2018
- Amount: $376,700
- Award type: Standard Grant
I-Corps: Probabilistically Detecting Controversy
- Award number: 1721069
- Fiscal year: 2017
- Amount: $376,700
- Award type: Standard Grant
Megacity Delhi atmospheric emission quantification, assessment and impacts (DelhiFlux) - Manchester
- Award number: NE/P016472/1
- Fiscal year: 2016
- Amount: $376,700
- Award type: Research Grant
Sources and Emissions of Air Pollutants in Beijing (Manchester)
- Award number: NE/N007123/1
- Fiscal year: 2016
- Amount: $376,700
- Award type: Research Grant
III: Small: Interactive Construction of Complex Query Models
- Award number: 1617408
- Fiscal year: 2016
- Amount: $376,700
- Award type: Standard Grant
III: Small: Topical Positioning System (TPS) for Informed Reading of Web Pages
- Award number: 1217281
- Fiscal year: 2012
- Amount: $376,700
- Award type: Standard Grant
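Similar NSFC (National Natural Science Foundation of China) grants
Investigating, based on the "immune-neural" network, the anti-inflammatory mechanism by which eye acupuncture activates MCs in CI/RI rats and targets H3R to regulate "immune surveillance"
- Award number: 82374375
- Approval year: 2023
- Amount: CNY 510,000
- Award type: General Program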
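Mechanism by which ci-Eln promotes the proliferation of hypoxic pulmonary artery smooth muscle cells mediated by its parental gene Eln
- Award number:
- Approval year: 2021
- Amount: CNY 300,000
- Award type: Young Scientists Fund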
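Revealing the molecular mechanism of Wolbachia-induced CI in Drosophila through single-cell transcriptome sequencing
- Award number: 32170497
- Approval year: 2021
- Amount: CNY 580,000
- Award type: General Program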
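Spatiotemporal variation of vertically stratified forest LAI and CI, and their LiDAR remote-sensing retrieval and validation
- Award number: 42171358
- Approval year: 2021
- Amount: CNY 590,000
- Award type: General Program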
Similar international grants
CRI: CI-SUSTAIN: Racket on Alternative Platforms
- Award number: 1823244
- Fiscal year: 2018
- Amount: $376,700
- Award type: Continuing Grant
CRI:CI:SUSTAIN: Next-Generation, Sustainable Infrastructure for the RF-Powered Computing Community
- Award number: 1823148
- Fiscal year: 2018
- Amount: $376,700
- Award type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1823288
- Fiscal year: 2018
- Amount: $376,700
- Award type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1853919
- Fiscal year: 2018
- Amount: $376,700
- Award type: Standard Grant
CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
- Award number: 1823292
- Fiscal year: 2018
- Amount: $376,700
- Award type: Standard Grant