Frameworks: arXiv as an accessible large-scale open research platform


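Basic information

  • Award number:
    2311521
  • Principal investigator:
  • Amount:
    $4.9665 million
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Start and end dates:
    2024-01-01 to 2028-12-31
  • Project status:
    Not yet concluded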

Project abstract

arXiv is an open-access repository that has played a leading role in disciplines such as computer science, mathematics and physics for over 30 years. It hosts more than 2 million scientific papers and has a large user community. Each month there are approximately 5 million active users and 100 million web accesses. Despite its size and usage, arXiv has very limited search and recommendation functionality. In order to better serve the arXiv community, this project is building a new generation of search and recommendation functionality and simultaneously creating a research sandbox to reduce reliance on third-party, commercial services. To make arXiv's trove of scientific content accessible to the visually impaired, support is being added for well-structured HTML as well as PDF. Improved discovery of research results provides broad multidisciplinary benefits across areas of science. These include less researcher time wasted browsing through large amounts of irrelevant papers, revelation of "unknown unknowns," and accelerating research across different subject areas through unexpected synergies. Improved recommendation tools, which can provide unbiased and diverse sources of relevant research results and techniques, are urgently needed to break silos. arXiv will provide improved mechanisms for scientists to find out about important advances, both in their own field of expertise and in adjacent fields.

This project includes 4 major focus areas: Open A/B Testing, Neural Representations of Scientific Text, arXiv Dynamics, and Security & Privacy.

(1) Open A/B Testing enables arXiv to become a platform for A/B testing of search and recommendation algorithms. In addition to online A/B testing, offline A/B testing is provided using historical data along with counterfactual estimators for policy rewards.

(2) Neural Representations of Scientific Text provides a vector-based representation of scientific texts (documents, paragraphs, and sentences) appropriate for multiple tasks, including citation, author, title, and keyword prediction. Differentiable search indices are investigated due to their potential to provide additional search performance improvements without requiring incremental re-training. Finally, this supports the construction of a scientific question-answering system which can also be used as a context-sensitive "chat-bot" enabling researchers to converse with and get a list of recent publications relevant to their interests.

(3) The arXiv Dynamics project investigates how scientific fields grow, shrink, and transform over time. A "trending and emerging arXiv topics" pattern recognition system predicts how interesting current and historical articles are to researchers. Research is investigating methods to remove the "rich-get-richer" effect from this model, to correct the model for the effects of the users' historical interactions with the system, and to track performance and solicit user feedback as these models change over time.

(4) Under Security & Privacy, arXiv's privacy policy is updated so that users are aware of how their (meta-)data may be used and the protections that will be deployed to protect their privacy. A "Layer 1" API allows researchers to make coarse-grained queries on anonymized arXiv weblogs, and a "Layer 2" API allows researchers to securely experiment on arXiv metadata and weblogs. Privacy is preserved by a combination of query restrictions and researcher usage agreements. A machine-learning API layer is being developed which supports differential privacy, and allows researchers to investigate the utility of these tools for novel ML-based applications, such as free-form question answering about scientific texts, neural recommender systems, etc.

This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Information and Intelligent Systems in the Directorate for Computer and Information Science and Engineering and the Division of Physics within the Directorate for Mathematical and Physical Sciences. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
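
The abstract mentions offline A/B testing that uses historical data with counterfactual estimators for policy rewards, but does not say which estimators the project uses. As an illustration only, the sketch below shows one standard estimator of this kind, inverse propensity scoring (IPS). The record fields (`context`, `action`, `reward`, `logging_prob`) and the function names are hypothetical and do not come from arXiv or the award.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class LoggedInteraction:
    """One row of historical data (hypothetical schema, not arXiv's)."""
    context: str         # e.g. the query a user issued
    action: str          # the item the logging policy showed
    reward: float        # e.g. 1.0 if the user clicked, else 0.0
    logging_prob: float  # probability the logging policy showed `action`


def ips_estimate(
    logs: Sequence[LoggedInteraction],
    target_prob: Callable[[str, str], float],
) -> float:
    """Estimate the average reward a new (target) policy would have earned.

    Each logged reward is reweighted by how much more (or less) likely the
    target policy is to take the logged action than the logging policy was.
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for row in logs:
        if row.logging_prob <= 0.0:
            raise ValueError("logging probabilities must be positive")
        weight = target_prob(row.context, row.action) / row.logging_prob
        total += weight * row.reward
    return total / len(logs)


# Usage: the logging policy showed "a" or "b" uniformly at random;
# the candidate policy would always show "a".
example_logs = [
    LoggedInteraction("q1", "a", 1.0, 0.5),
    LoggedInteraction("q1", "b", 0.0, 0.5),
]
always_a = lambda context, action: 1.0 if action == "a" else 0.0
estimated_reward = ips_estimate(example_logs, always_a)
```

The estimate is only unbiased when the logging policy gave nonzero probability to every action the target policy might take, which is why the sketch rejects non-positive logging probabilities.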
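
The abstract states that the machine-learning API layer supports differential privacy but gives no mechanism. As a hedged illustration of the general idea, the sketch below applies the classic Laplace mechanism to a counting query over log records. The record format, the function names, and the choice of the Laplace mechanism are assumptions made for this example, not details of the arXiv APIs.

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with the same
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(
    records: Iterable[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Return a noisy count of records matching `predicate`.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Usage with hypothetical weblog rows: (subject category, was_download).
rows = [("cs.LG", True), ("cs.LG", False), ("math.CO", True), ("cs.LG", True)]
noisy = private_count(rows, lambda r: r[0] == "cs.LG", epsilon=1.0,
                      rng=random.Random(0))
```

Smaller values of `epsilon` add more noise and give stronger privacy; answering many queries consumes privacy budget cumulatively, which this single-query sketch does not track.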

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants of Ramin Zabih

BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
  • Award number:
    1447473
  • Fiscal year:
    2015
  • Funding amount:
    $4.9665 million
  • Award type:
    Standard Grant
RI: Medium: Collaborative Research: Graph Cut Algorithms for Domain-specific Higher Order Priors
  • Award number:
    1161860
  • Fiscal year:
    2012
  • Funding amount:
    $4.9665 million
  • Award type:
    Continuing Grant
RI-Medium: Collaborative Research: Graph Cut Algorithms for Linear Inverse Systems
  • Award number:
    0803705
  • Fiscal year:
    2008
  • Funding amount:
    $4.9665 million
  • Award type:
    Standard Grant
Dynamic Contextual Recognition of Moving Objects
  • Award number:
    9900115
  • Fiscal year:
    1999
  • Funding amount:
    $4.9665 million
  • Award type:
    Standard Grant

Similar overseas grants

Academic information system that integrates various viewpoints according to users' research skill
  • Award number:
    19H04421
  • Fiscal year:
    2019
  • Funding amount:
    $4.9665 million
  • Award type:
    Grant-in-Aid for Scientific Research (B)
Research on Preprint Archive as Socio-Technical Interaction Network
  • Award number:
    21700267
  • Fiscal year:
    2009
  • Funding amount:
    $4.9665 million
  • Award type:
    Grant-in-Aid for Young Scientists (B)
Entwicklung eines Modells zur gemeinschaftlichen Finanzierung der Open Access-Plattform arXiv
Development of a model for the joint funding of the open-access platform arXiv
  • Award number:
    194934317
  • Fiscal year:
  • Funding amount:
    $4.9665 million
  • Award type:
    Science Communication, Research Data, eResearch (Scientific Library Services and Information Systems)