Frameworks: arXiv as an accessible large-scale open research platform
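Basic information
- Award number: 2311521
- Principal investigator:
- Amount: $4,966,500
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-01-01 to 2028-12-31
- Project status: Ongoing (not yet concluded)
- Source:
- Keywords:
Project abstract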
arXiv is an open-access repository that has played a leading role in disciplines such as computer science, mathematics, and physics for over 30 years. It hosts more than 2 million scientific papers and has a large user community: each month there are approximately 5 million active users and 100 million web accesses. Despite this size and usage, arXiv has very limited search and recommendation functionality. To better serve the arXiv community, this project is building a new generation of search and recommendation functionality while creating a research sandbox that reduces reliance on third-party commercial services. To make arXiv's trove of scientific content accessible to visually impaired readers, support is being added for well-structured HTML alongside PDF. Improved discovery of research results provides broad multidisciplinary benefits across the sciences: researchers waste less time browsing large numbers of irrelevant papers, "unknown unknowns" are revealed, and research accelerates across subject areas through unexpected synergies. Improved recommendation tools that provide unbiased and diverse sources of relevant research results and techniques are urgently needed to break down silos. arXiv will give scientists better mechanisms for learning about important advances, both in their own field of expertise and in adjacent fields.
This project includes four major focus areas: Open A/B Testing, Neural Representations of Scientific Text, arXiv Dynamics, and Security & Privacy. (1) Open A/B Testing turns arXiv into a platform for A/B testing of search and recommendation algorithms. In addition to online A/B testing, offline A/B testing is provided using historical data together with counterfactual estimators of policy rewards.
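The abstract does not say which counterfactual estimator arXiv uses for offline A/B testing. As a minimal sketch, assuming each logged event records the probability (propensity) with which the logging policy chose the item it showed, the standard inverse propensity scoring (IPS) estimator reweights historical rewards to estimate a new policy's mean reward without deploying it. The `LoggedEvent` schema and the `target_prob` callback are hypothetical illustrations, not part of any arXiv API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedEvent:
    """One historical interaction (hypothetical schema)."""
    context: str       # e.g. the user's query
    action: str        # item the logging policy actually showed
    reward: float      # observed reward, e.g. 1.0 for a click, 0.0 otherwise
    propensity: float  # probability the logging policy chose `action`


def ips_estimate(
    log: Sequence[LoggedEvent],
    target_prob: Callable[[str, str], float],
) -> float:
    """Inverse propensity scoring estimate of a target policy's mean reward.

    `target_prob(context, action)` is the probability that the new policy
    would choose `action` in `context`.
    """
    if not log:
        raise ValueError("log must not be empty")
    total = 0.0
    for event in log:
        if event.propensity <= 0.0:
            raise ValueError("logging propensities must be positive")
        # Weight each observed reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy was.
        weight = target_prob(event.context, event.action) / event.propensity
        total += weight * event.reward
    return total / len(log)


if __name__ == "__main__":
    log = [
        LoggedEvent("q1", "a", 1.0, 0.5),
        LoggedEvent("q1", "b", 0.0, 0.5),
        LoggedEvent("q2", "a", 0.0, 0.25),
        LoggedEvent("q2", "b", 1.0, 0.75),
    ]

    def always_a(context: str, action: str) -> float:
        return 1.0 if action == "a" else 0.0

    print(ips_estimate(log, always_a))  # 0.5
```

IPS is unbiased when the logged propensities are correct and positive wherever the target policy can act, but its variance grows when the two policies disagree strongly; self-normalized and doubly robust estimators are common alternatives.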
(2) Neural Representations of Scientific Text provides vector-based representations of scientific text (documents, paragraphs, and sentences) suitable for multiple tasks, including citation, author, title, and keyword prediction. Differentiable search indices are being investigated because they may further improve search performance without requiring incremental re-training. Finally, this work supports the construction of a scientific question-answering system that can also serve as a context-sensitive "chat-bot," letting researchers converse with it and obtain a list of recent publications relevant to their interests. (3) The arXiv Dynamics project investigates how scientific fields grow, shrink, and transform over time. A pattern-recognition system for "trending and emerging arXiv topics" is being created to predict how interesting current and historical articles are to researchers. Ongoing research investigates methods to remove the "rich-get-richer" effect from this model, to correct the model for the effects of users' historical interactions with the system, and to track performance and solicit user feedback as these models change over time. (4) Under Security & Privacy, arXiv's privacy policy is updated so that users are aware of how their (meta)data may be used and of the protections that will be deployed to safeguard their privacy. A "Layer 1" API allows researchers to make coarse-grained queries on anonymized arXiv weblogs, and a "Layer 2" API allows researchers to experiment securely on arXiv metadata and weblogs. Privacy is preserved through a combination of query restrictions and researcher usage agreements.
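The abstract does not specify the encoder or the index behind these vector-based representations. As an illustration only, the sketch below shows the conventional dense-retrieval step such representations enable: a query vector is compared with each document vector by cosine similarity and the closest documents are returned. The toy 3-dimensional vectors and document IDs are hypothetical; a real system would use learned embeddings with many more dimensions and, at arXiv's scale, an approximate nearest-neighbor index rather than this exact brute-force scan. A differentiable search index, which the project also investigates, works differently: a single trained model maps a query directly to document identifiers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Exact (brute-force) search: score every document, keep the k best."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would come from a trained encoder.
    index = {
        "paper-A": [1.0, 0.0, 0.0],
        "paper-B": [0.7, 0.7, 0.0],
        "paper-C": [0.0, 0.0, 1.0],
    }
    for doc_id, score in top_k([1.0, 0.1, 0.0], index, k=2):
        print(doc_id, round(score, 3))
```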
A machine-learning API layer that supports differential privacy is being developed; it allows researchers to investigate the utility of these tools for novel ML-based applications, such as free-form question answering about scientific texts and neural recommender systems.
This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Information and Intelligent Systems in the Directorate for Computer and Information Science and Engineering and the Division of Physics within the Directorate for Mathematical and Physical Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
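The abstract says the machine-learning API layer supports differential privacy but does not name a mechanism. The simplest textbook example is the Laplace mechanism for a counting query, sketched below: noise drawn from Laplace(0, Δ/ε) is added to the true count, where Δ is the query's sensitivity and ε is the privacy budget. The function name `laplace_count` and the example query are hypothetical, and Δ = 1 assumes each user can change the count by at most one. The sketch uses Python's non-cryptographic `random` module, so it is for illustration only; a deployed system should rely on a vetted differential-privacy library.

```python
import random
from typing import Optional


def laplace_count(
    true_count: int,
    epsilon: float,
    sensitivity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Assumes adding or removing one user's records changes the count by at
    most `sensitivity`.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if sensitivity <= 0.0:
        raise ValueError("sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical query: weekly accesses to one subject category.
    print(laplace_count(1234, epsilon=0.5, rng=rng))
```

A smaller ε gives more noise and stronger privacy, and answering several queries about the same data consumes budget: under sequential composition the ε values add up, which is one reason query restrictions matter.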
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Ramin Zabih
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447473
- Fiscal year: 2015
- Funding amount: $4,966,500
- Project category: Standard Grant
RI: Medium: Collaborative Research: Graph Cut Algorithms for Domain-specific Higher Order Priors
- Award number: 1161860
- Fiscal year: 2012
- Funding amount: $4,966,500
- Project category: Continuing Grant
RI-Medium: Collaborative Research: Graph Cut Algorithms for Linear Inverse Systems
- Award number: 0803705
- Fiscal year: 2008
- Funding amount: $4,966,500
- Project category: Standard Grant
Dynamic Contextual Recognition of Moving Objects
- Award number: 9900115
- Fiscal year: 1999
- Funding amount: $4,966,500
- Project category: Standard Grant
Similar international grants
Academic information system that integrates various viewpoints according to users' research skill
- Award number: 19H04421
- Fiscal year: 2019
- Funding amount: $4,966,500
- Project category: Grant-in-Aid for Scientific Research (B)
Research on Preprint Archive as Socio-Technical Interaction Network
- Award number: 21700267
- Fiscal year: 2009
- Funding amount: $4,966,500
- Project category: Grant-in-Aid for Young Scientists (B)
Entwicklung eines Modells zur gemeinschaftlichen Finanzierung der Open Access-Plattform arXiv
Development of a model for the joint financing of the open-access platform arXiv
- Award number: 194934317
- Fiscal year:
- Funding amount: $4,966,500
- Project category: Science Communication, Research Data, eResearch (Scientific Library Services and Information Systems)