CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data

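Basic information

  • Award number:
    1823292
  • Principal investigator:
  • Amount:
    $230,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-08-01 to 2018-10-31
  • Project status:
    Completed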

Project abstract

Access to the scientific and scholarly literature has changed radically in recent decades. Increasingly, researchers and scholars make their publications freely available on the Web. Taking advantage of this opportunity, new scientific search engine tools have been developed, such as Google Scholar, Semantic Scholar, and CiteSeer, now CiteSeerX. CiteSeerX has become one of the most comprehensive and widely used online public resources for the Computer and Information Science and Engineering (CISE) research community. Millions of CiteSeerX Portable Document Format (PDF) documents are indexed by Google. CiteSeerX is unique among digital library search engines. It is open access, almost all of its documents are harvested from the public Web, and users have full-text access to all documents searchable on its website. Moreover, it provides all automatically extracted metadata and citation context via an Open Archives Initiative (OAI) metadata service interface and bulk downloads on a public cloud, all under a Creative Commons license. This service is usually not available from other scholarly search engines. CiteSeerX performs automatic extraction and indexing of tables (in production), figures (developed), and algorithms (developed), capabilities rarely seen in other scholarly search engines. CiteSeerX provides its open source software and architecture on GitHub. At this time none of the other above-mentioned systems release their digital library software.

Utilizing the established CiteSeerX infrastructure, this proposal aims to create a sustainable CiteSeerX system with new data resources and a much larger data collection. We will develop a new system that runs with low operation overhead, without a single point of failure, and that provides quality and enriched data and metadata in portable formats that will be available through accessible user interfaces. We will ingest all freely accessible scientific documents on the Web, currently estimated to be 30 million. CiteSeerX will make available high-quality metadata through an accessible Web User Interface, Application Programming Interface, and data dumps. SeerSuite, the platform on which CiteSeerX is built, will be refactored so as to be an easily deployable and configurable scholarly digital library framework. It will be built on commercial grade open source software. In addition, we will provide searchable semantic metadata, such as key phrases and disambiguated author names, and non-textual content such as data from figures, tables, algorithms, and equations. For long-term sustainability we will explore different monetization models. The result will be a refactored digital library search engine that provides stable, usable, and reliable data services on multiple types of scientific documents built on a portable, maintainable, and self-contained framework that can be deployed for other research document digital collections. Source code will be hosted at https://github.com/SeerLabs. System development and related research will be published in relevant venues and be made publicly available.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Cornelia Caragea

Metadata Repository
  • DOI:
    10.1007/978-0-387-39940-9_3058
  • Year published:
    2009
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cornelia Caragea;Vasant G Honavar;P. Boncz;P. Larson;S. Dietrich;Gonzalo Navarro;B. Thuraisingham;Yan Luo;Ouri E. Wolfson;S. Beitzel;Eric C. Jensen;O. Frieder;C. Jensen;N. Tradisauskas;E. Munson;A. Wun;K. Goda;Stephen E. Fienberg;Jiashun Jin;Guimei Liu;Nick Craswell;T. Pedersen;Cesare Pautasso;M. Moro;S. Manegold;B. Carminati;Marina Blanton;S. Bouchenak;Noël de Palma;Wei Tang;C. Quix;M. Jeusfeld;R. K. Pon;David J. Buttler;W. Meng;P. Zezula;Michal Batko;Vlastislav Dohnal;J. Domingo;Denilson Barbosa;I. Manolescu;Jeffrey Xu Yu;E. Cecchet;Vivien Quéma;Xifeng Yan;G. Santucci;D. Zeinalipour;Panos K. Chrysanthis;A. Deshpande;Carlos Guestrin;S. Madden;C. Leung;R. H. Güting;Amarnath Gupta;Heng Tao Shen;G. Weikum;Ramesh Jain;J. Yu;P. Ciaccia;K. Candan;M. Sapino;C. Meghini;F. Sebastiani;U. Straccia;F. Nack;V. S. Subrahmanian;Maria Vanina Martinez;D. Reforgiato;T. Westerveld;M. Sebillo;G. Vitiello;M. De Marsico;K. Voruganti;C. Parent;S. Spaccapietra;C. Vangenot;E. Zimányi;Prasan Roy;S. Sudarshan;E. Puppo;Peer Kröger;M. Renz;H. Schuldt;Solmaz Kolahi;A. Unwin;W. Cellary
  • Corresponding author:
    W. Cellary

Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning

Semantic Tokenizer for Enhanced Natural Language Processing
  • DOI:
    10.48550/arxiv.2304.12404
  • Year published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sandeep Mehta;Darpan Shah;Ravindra Kulkarni;Cornelia Caragea
  • Corresponding author:
    Cornelia Caragea

A Group-Based Personalized Model for Image Privacy Classification and Labeling

MEDLINE/PubMed
  • DOI:
    10.1007/978-0-387-39940-9_3039
  • Year published:
    2004
  • Journal:
  • Impact factor:
    3.8
  • Authors:
    Cornelia Caragea;V. Honavar;P. Boncz;P. Larson;S. Dietrich;Gonzalo Navarro;Bhavani Thuraisingham;Yan Luo;Ouri E. Wolfson;S. Beitzel;Eric C. Jensen;Ophir Frieder;Christian S. Jensen;N. Tradisauskas;Ethan V. Munson;A. Wun;K. Goda;Stephen E. Fienberg;Jiashun Jin;Guimei Liu;Nick Craswell;T. Pedersen;Cesare Pautasso;M. Moro;S. Manegold;B. Carminati;Marina Blanton;Sara Bouchenak;Noël de Palma;Wei Tang;Christoph Quix;M. Jeusfeld;R. K. Pon;David J. Buttler;W. Meng;P. Zezula;Michal Batko;Vlastislav Dohnal;J. Domingo;Denilson Barbosa;Ioana Manolescu;Jeffrey Xu Yu;Emmanuel Cecchet;Vivien Quéma;Xifeng Yan;G. Santucci;D. Zeinalipour;Panos K. Chrysanthis;Amol Deshpande;Carlos Guestrin;Samuel Madden;Carson Kai;R. H. Güting;Amarnath Gupta;Heng Tao Shen;G. Weikum;Ramesh Jain;Jeffrey Xu Yu;Paolo Ciaccia;K. Candan;M. Sapino;C. Meghini;F. Sebastiani;U. Straccia;F. Nack;V. S. Subrahmanian;Maria Vanina Martinez;D. Reforgiato;T. Westerveld;M. Sebillo;G. Vitiello;Maria De Marsico;K. Voruganti;C. Parent;S. Spaccapietra;Christelle Vangenot;Esteban Zimányi;Prasan Roy;S. Sudarshan;E. Puppo;Peer Kröger;Matthias Renz;H. Schuldt;Solmaz Kolahi;A. Unwin;W. Cellary
  • Corresponding author:
    W. Cellary

Other grants held by Cornelia Caragea

CHS: Small: Collaborative Research: Automating Relevance and Trust Detection in Social Media Data for Emergency Response
  • Award number:
    1903963
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

TWC: Small: Collaborative: Towards Privacy Preserving Online Image Sharing
  • Award number:
    1903714
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
  • Award number:
    1853919
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

BIGDATA: IA: Collaborative Research: Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
  • Award number:
    1741353
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

CAREER: From Data to Knowledge: Extracting and Utilizing Concept Graphs in Online Environments
  • Award number:
    1802358
  • Fiscal year:
    2017
  • Award amount:
    $230,000
  • Project type:
    Continuing Grant

CAREER: From Data to Knowledge: Extracting and Utilizing Concept Graphs in Online Environments
  • Award number:
    1652674
  • Fiscal year:
    2017
  • Award amount:
    $230,000
  • Project type:
    Continuing Grant

III: Small: Collaborative Research: Keyphrase Extraction in Document Networks
  • Award number:
    1813571
  • Fiscal year:
    2017
  • Award amount:
    $230,000
  • Project type:
    Continuing Grant

BIGDATA: IA: Collaborative Research: Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
  • Award number:
    1802284
  • Fiscal year:
    2017
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

TWC: Small: Collaborative: Towards Privacy Preserving Online Image Sharing
  • Award number:
    1814255
  • Fiscal year:
    2017
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

CHS: Small: Collaborative Research: Automating Relevance and Trust Detection in Social Media Data for Emergency Response
  • Award number:
    1814271
  • Fiscal year:
    2017
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

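Similar grants from the National Natural Science Foundation of China

Anti-inflammatory mechanism by which eye acupuncture activates MC to target H3R and regulate "immune surveillance" in CI/RI rats, explored through the "immune-neural" network
  • Award number:
    82374375
  • Year approved:
    2023
  • Award amount:
    CNY 510,000
  • Project type:
    General Program

Mechanism by which ci-Eln promotes the proliferation of hypoxic pulmonary artery smooth muscle cells mediated by its parental gene Eln
  • Award number:
  • Year approved:
    2021
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund

Revealing the molecular mechanism of Wolbachia-induced CI in Drosophila through single-cell transcriptome sequencing
  • Award number:
    32170497
  • Year approved:
    2021
  • Award amount:
    CNY 580,000
  • Project type:
    General Program

Spatiotemporal variation of vertically stratified forest LAI and CI, with LiDAR remote sensing retrieval and validation
  • Award number:
    42171358
  • Year approved:
    2021
  • Award amount:
    CNY 590,000
  • Project type:
    General Program

Spatiotemporal variation of vertically stratified forest LAI and CI, with LiDAR remote sensing retrieval and validation
  • Award number:
  • Year approved:
    2021
  • Award amount:
    CNY 590,000
  • Project type:
    General Program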

Similar overseas grants

CRI: CI-SUSTAIN: Racket on Alternative Platforms
  • Award number:
    1823244
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Continuing Grant

CRI: CI-SUSTAIN: Collaborative Research: Sustaining Lemur Project Resources for the Long-Term
  • Award number:
    1822986
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

CRI:CI:SUSTAIN: Next-Generation, Sustainable Infrastructure for the RF-Powered Computing Community
  • Award number:
    1823148
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
  • Award number:
    1823288
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant

CRI: CI-SUSTAIN: Collaborative Research: CiteSeerX: Toward Sustainable Support of Scholarly Big Data
  • Award number:
    1853919
  • Fiscal year:
    2018
  • Award amount:
    $230,000
  • Project type:
    Standard Grant