III: Large: Collaborative Research: Web Archive Cooperative
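Basic information
- Award number: 1009916
- Principal investigator:
- Amount: $2.3505 million
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-08-01 to 2015-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract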
Web Science is an emerging discipline that studies the Web: how human activity is shaped by Web interactions, how the Web can benefit society, and how Web technologies can be improved. Central to Web Science is access to data that records the history of the Web, as well as data that records human activity (e.g., posed queries, tagged pages, Twitter updates). It is currently very difficult for academic researchers to obtain such Web data because it is hard to locate, is fragmented across diverse sites, and is recorded using inconsistent formats and strategies. This project will build a Web Archive Cooperative (WAC) that will integrate existing archives (repositories of Web data), making it feasible to access large volumes of data in a simplified fashion. The WAC will be a virtual service, providing search facilities and access mechanisms to existing resources. These resources will not just be Web pages, but all types of available Web information, such as query logs, tag annotations, blogs, profiles and Twitter updates. Furthermore, resources will also include the software tools for building and managing Web archives.
The project will explore three goals for a resource discovery service: (1) the manual or automated discovery of entire existing Web-related archives; (2) the selection among known archives of the ones that support a specific research question; and (3) the identification of individual resources from within the selected archives. Tools for characterizing discovered archives, especially for the case where the archive does not provide rich descriptive metadata, will also be developed. Characterization of an archive includes elements such as an estimate of the archive's coverage; particulars of the crawling parameters, such as dates/frequencies, crawl duration, depth, and per-site ceiling on the number of collected pages; content statistics; and link structure.
Mechanisms for integrating diverse archives will be developed, and the mechanisms will be applied to site reconstruction (from various archives) and archive views (a logical fusion of resources from multiple sources). Since integration issues are so challenging, an experimental testbed will be set up with small but diverse resources. The testbed will contain several crawls of the same target sites, each obtained with different crawlers and using different parameters. The testbed will also contain related resources. Storage trading schemes will be developed, allowing members to trade local backup space for remote space. A Web archive replication tool will be developed based on existing notions for self-preserving objects. Alternatives for replica synchronization will be studied.
Workshops to bring together key Web Science researchers will be organized to discuss available resources and impediments to sharing. These workshops will drive research and identify needed tools and protocols. With small groups of participants, challenge problems will be established, e.g., combining a set of Web archives. Reports of these results at future workshops can incentivize others to participate in the WAC. In addition, an Advisory Board of industrial, government, and academic experts has been set up to guide the project. A Summer Institute for Web Science graduate students will be held. At this Institute, students will learn to use the latest tools and will learn from each other's experiences in dealing with Web data. In addition, a one-day workshop will be developed, to be offered at Web Science conferences (WWW, SIGIR, etc.) to educate participants about WAC resources. An undergraduate Web Sciences track for computer science majors will be set up, taking advantage of WAC resources.
The project will have impact in two ways. First, it will provide tools and services that facilitate access to Web resources. Any researcher, from a computer scientist studying efficient Web search, to a social scientist studying how human beliefs are changing today, to a historian studying how the early Web evolved, to a biologist understanding how disease spreads, will benefit from the work. Second, the project motivates students and young researchers to stay in academia. Currently top talent is flowing to industry because only industry has comprehensive Web data, and it is so hard to do significant Web Science at universities. The WAC can provide an alternative, attracting more researchers and teachers to this important area.
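Project outcomes
Number of journal articles (0)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)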
Other publications by Hector Garcia-Molina
Maximizing remote work in flooding-based peer-to-peer systems
- DOI: 10.1016/j.comnet.2005.09.024
- Published: 2006-07-14
- Journal:
- Impact factor:
- Authors: Qixiang Sun; Neil Daswani; Hector Garcia-Molina
- Corresponding author: Hector Garcia-Molina
Assigning textual names to sets of geographic coordinates
- DOI: 10.1016/j.compenvurbsys.2006.02.001
- Published: 2006-07-01
- Journal:
- Impact factor:
- Authors: Mor Naaman; Yee Jiun Song; Andreas Paepcke; Hector Garcia-Molina
- Corresponding author: Hector Garcia-Molina
Other grants by Hector Garcia-Molina
EAGER: InfoCalc, a Spreadsheet Interface to Web Archive Analysis
- Award number: 0941727
- Fiscal year: 2009
- Amount: $2.3505 million
- Project type: Standard Grant
SGER, year II: A Web Sociologist's Workbench
- Award number: 0735129
- Fiscal year: 2007
- Amount: $2.3505 million
- Project type: Standard Grant
CRI: CRD Analysis Toolbenches for Web Archives
- Award number: 0707464
- Fiscal year: 2007
- Amount: $2.3505 million
- Project type: Standard Grant
SGER: A Web Sociologist's Workbench
- Award number: 0624725
- Fiscal year: 2006
- Amount: $2.3505 million
- Project type: Standard Grant
SEI(BIO): Computing Support for Acquisition, Collaborative Curation, and Dissemination in Biodiversity Research
- Award number: 0430448
- Fiscal year: 2004
- Amount: $2.3505 million
- Project type: Continuing Grant
ITR: DataMotion - Dealing With Fast-Moving Data
- Award number: 0324431
- Fiscal year: 2003
- Amount: $2.3505 million
- Project type: Continuing Grant
Large-Scale Web Research Testbed
- Award number: 0322975
- Fiscal year: 2003
- Amount: $2.3505 million
- Project type: Continuing Grant
ITR: From the Web to the Global InfoBase
- Award number: 0085896
- Fiscal year: 2000
- Amount: $2.3505 million
- Project type: Standard Grant
DLI-Phase 2: Stanford InterLib Technologies
- Award number: 9817799
- Fiscal year: 1999
- Amount: $2.3505 million
- Project type: Cooperative Agreement
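Similar grants from the National Natural Science Foundation of China
Control of segregation behavior in back-pressure thixotropic backward-extruded tin bronze based on grain refinement by severe plastic deformation
- Award number: 52365047
- Year approved: 2023
- Amount: 320,000 CNY
- Project type: Regional Science Fund Project
Creep behavior and modeling of UHPC internally cured with high-strength porous aggregate for long-span structures
- Award number: 52308231
- Year approved: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund Project
Miniaturized microscopic imaging with large field of view, high resolution, and extended depth of field based on deep optics
- Award number: 62301293
- Year approved: 2023
- Amount: 100,000 CNY
- Project type: Young Scientists Fund Project
Research on high-energy tunable light sources based on multimode nonlinear effects in gas-filled multipass cells
- Award number: 12374318
- Year approved: 2023
- Amount: 520,000 CNY
- Project type: General Program
Design and synthesis of two-dimensional molybdenum nitride/molybdenum phosphide in-plane heterostructure catalysts and their hydrogen evolution performance at high current density
- Award number: 22379116
- Year approved: 2023
- Amount: 500,000 CNY
- Project type: General Program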
Similar overseas grants
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
- Award number: 2348169
- Fiscal year: 2023
- Amount: $2.3505 million
- Project type: Continuing Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Award number: 2236578
- Fiscal year: 2023
- Amount: $2.3505 million
- Project type: Standard Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Award number: 2236579
- Fiscal year: 2023
- Amount: $2.3505 million
- Project type: Standard Grant
III: Small: Collaborative Research: Cost-Efficient Sampling and Estimation from Large-Scale Networks
- Award number: 2209921
- Fiscal year: 2021
- Amount: $2.3505 million
- Project type: Standard Grant
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
- Award number: 2027170
- Fiscal year: 2020
- Amount: $2.3505 million
- Project type: Cooperative Agreement