RI: Small: Collaborative Research: Batch Learning from Logged Bandit Feedback

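Basic Information

Award number:
1615706
Principal investigator:
Thorsten Joachims
Amount:
$399,800
Institution:
Cornell University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2016
Funding country:
United States
Project period:
2016-07-01 to 2019-06-30
Project status:
Completed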

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1615706&HistoricalAwards=false
Keywords:
RI Small Collaborative Research Batch

Project Abstract

Log data is one of the most ubiquitous forms of data available, as it can be recorded from a variety of systems (e.g., search engines, recommender systems, ad placement platforms) at little cost. Making huge amounts of log data accessible to learning algorithms provides the potential to acquire knowledge at unprecedented scale. Furthermore, the ability to learn from log data can enable effective machine learning even in systems where manual labeling of training data is not economically viable. Log data, however, provides only partial information -- "contextual-bandit feedback" -- limited to the particular actions taken by the system. The feedback for all the other actions the system could have taken is typically not known. This makes learning from log data fundamentally different from traditional supervised learning, where "correct" predictions together with a loss function provide full-information feedback.

This project tackles the problem of Batch Learning from Bandit Feedback (BLBF) by developing principled learning methods and algorithms that can be trained with logs containing contextual-bandit feedback. First, the project develops the learning theory of BLBF, especially with respect to understanding the use and design of counterfactual risk estimators for BLBF. Second, the project derives new learning methods for BLBF. Past work has already demonstrated that Conditional Random Fields can be trained in the BLBF setting, and the project derives BLBF analogs of other learning methods as well. Third, the project derives scalable training algorithms for these BLBF methods to enable large-scale applications. And, finally, the project validates the methods with real-world data from operational systems.
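To make the idea of a counterfactual risk estimator concrete, the sketch below uses inverse propensity scoring (IPS), a standard estimator for this setting. The abstract does not name a specific estimator, so IPS is an assumption chosen for illustration, and every name here (`ips_estimate`, `make_logs`, `matching_policy`, `uniform_policy`) is hypothetical. The synthetic log assumes a logging policy that picks one of two actions uniformly at random and records only the loss of the action it took.

```python
import random


def ips_estimate(logs, target_policy):
    """Estimate the expected loss of target_policy from logged bandit feedback.

    logs: list of (context, action, loss, propensity) tuples, where
        propensity is the probability with which the logging policy
        chose the logged action.
    target_policy(context, action): probability that the new policy
        would choose that action in that context.
    """
    total = 0.0
    for context, action, loss, propensity in logs:
        # Reweight each observed loss by how much more (or less) likely
        # the target policy is to take the logged action.
        total += loss * target_policy(context, action) / propensity
    return total / len(logs)


def make_logs(n, seed=0):
    """Synthetic log: uniform logging policy over two actions (assumed)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.choice([0, 1])
        action = rng.choice([0, 1])               # logging policy's choice
        loss = 0.0 if action == context else 1.0  # observed for this action only
        logs.append((context, action, loss, 0.5))
    return logs


def matching_policy(context, action):
    """Deterministic policy that always picks the action equal to the context."""
    return 1.0 if action == context else 0.0


def uniform_policy(context, action):
    """Same behavior as the logging policy."""
    return 0.5


logs = make_logs(10_000)
print("matching policy:", ips_estimate(logs, matching_policy))
print("uniform policy: ", ips_estimate(logs, uniform_policy))
```

Under these assumptions the estimate for `matching_policy` is exactly zero, because every logged record either has zero loss or zero probability under that policy, while the estimate for `uniform_policy` is simply the average logged loss. Neither policy was evaluated by deploying it; both numbers come from the same log, which is the point of a counterfactual estimator.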

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Thorsten Joachims

Rankings for Two-Sided Market Platforms

DOI:
Year published:
2020
Journal:
Impact factor:
0
Authors:
Yi;Thorsten Joachims
Corresponding author:
Thorsten Joachims

Localify.org: Locally-focus Music Artist and Event Recommendation

DOI:
Year published:
2023
Journal:
ACM Conference on Recommender Systems
Impact factor:
0
Authors:
Douglas Turnbull;April Trainor;Douglas Turnbull;Elizabeth Richards;Kieran Bentley;Victoria Conrad;Paul Gagliano;Cassandra Raineault;Thorsten Joachims
Corresponding author:
Thorsten Joachims