II-NEW: Collaborative Research: Spam Processing, Archiving, and Monitoring Community Facility (SPAM Commons)
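Basic Information
- Award number: 0855180
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2009
- Funding country: United States
- Period: 2009-09-01 to 2012-08-31
- Status: Completed
- Source:
- Keywords: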
Project Abstract
In this project, the PIs propose to construct and develop a shared infrastructure to support the collection and maintenance of realistic, large-scale spam data sets, referred to as SPAM Commons. Spam is a problem in many important communications media such as email and the web. A sub-problem of spam, phishing (a form of online pretexting), caused an estimated $3.2B in damages in 2007. The broad impact of effective spam filtering methods can be estimated in billions of dollars across several communications media such as email and the web. Spam has also invaded other media, with concrete attack examples in social networks, the blogosphere, Internet telephony (VoIP), instant messaging, and click fraud. Unfortunately, spam research has been hampered by the lack of published real-world data sets due to concerns about privacy and company intellectual property. The project team develops a shared infrastructure to support the collection and maintenance of realistic, large-scale spam data sets, called the Spam Processing, Archiving, and Monitoring Community Facility (SPAM Commons). The main goals of SPAM Commons are: (1) to facilitate remedial research that will stem the waste and losses caused by spam, and (2) to enable revolutionary research that aims to stop certain kinds of spam attacks altogether. SPAM Commons is divided into a Public Partition and a Protected Partition. The Public Partition is a direct analog of standard corpora for speech and image recognition research, consisting of a systematic and regular collection of both spam and legitimate data in the various communications media, starting from email and web spam, and expanding into other communications media as spam becomes a serious threat in each area and data become available. The Protected Partition consists of a combined data and processing facility that makes private data or near-real-time spam data available for experimental evaluation of spam defense mechanisms in a protected testbed.
Access to such protected data will enable new spam research on real-time evolving spam and real-world data sets that is infeasible today. The intellectual challenges of the SPAM Commons project extend beyond the new research on the spam areas mentioned above that the availability of data sets enables. The construction of both partitions of SPAM Commons includes significant intellectual challenges of its own. First, the isolation of the Protected Partition partially addresses concerns about privacy, which remains a general research problem. Second, useful spam and legitimate data sets require automated distinction of spam from legitimate documents with certainty, which remains an open research question in email, web, and other media. Third, the adversarial and mutual evolution of spam producers and defenders requires continuous collection of fresh data for further study. Finally, the collection and streaming of near-real-time spam data represent research resources currently unavailable to spam researchers. Advances in these areas will spur the growth and evolution of SPAM Commons, which will enable new research on the evolving and growing spam problem. The impact of SPAM Commons data sets on experimental spam research may be similar to the impact of large corpora in disciplines such as speech/image recognition and natural language processing, which achieved a level of scientific result reproducibility and comparability after the use of such corpora became a standard requirement. The proposed data repository will be supported and used by nine university partners (Clayton State, Emory, Georgia Tech, NC A&T, Northwestern, Texas A&M, UC Davis, U. Georgia, UNC Charlotte) and several industry partners (IBM, PureWire, Secure Computing).
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Calton Pu
Approaches for service deployment
- DOI: 10.1002/marc.201500587
- Published: 2024-09-13
- Journal:
- Impact factor: 3.2
- Authors: Qinyi Wu; Calton Pu; Wenchang Yan; Gueyoung Jung; Georgia Tech; Munindar P Singh
- Corresponding author: Munindar P Singh
Collaborative Computing: Networking, Applications and Worksharing
- DOI: 10.1007/978-3-642-03354-4
- Published: 2024-09-13
- Journal:
- Impact factor: 0
- Authors: James Joshi; Elisa Bertino; Calton Pu; H. Ramampiaro
- Corresponding author: H. Ramampiaro
JTangCSB: A Cloud Service Bus for Cloud and Enterprise Application Integration
- DOI: 10.1109/mic.2014.62
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Xingjian Lu; Calton Pu; Zhaohui Wu; Hanwei Chen
- Corresponding author: Hanwei Chen
Buffer overflows: attacks and defenses for the vulnerability of the decade
- DOI: 10.1109/discex.2000.821514
- Published: 2000-01-25
- Journal:
- Impact factor: 0
- Authors: Crispin Cowan; Perry Wagle; Calton Pu; Steve Beattie; Jonathan Walpole
- Corresponding author: Jonathan Walpole
Other grants by Calton Pu
HNDS-I: Collaborative Research: Developing a Data Platform for Analysis of Nonprofit Organizations
- Award number: 2024320
- Fiscal year: 2020
- Amount: $400,000
- Award type: Standard Grant
EAGER: Live Reality: Sustainable and Up-to-Date Information Quality in Live Social Media through Continuous Evidence-Based Knowledge Acquisition
- Award number: 2039653
- Fiscal year: 2020
- Amount: $400,000
- Award type: Standard Grant
RAPID: Tracking and Evaluation of the Coronavirus (COVID-19) Epidemic Propagation by Finding and Maintaining Live Knowledge in Social Media
- Award number: 2026945
- Fiscal year: 2020
- Amount: $400,000
- Award type: Standard Grant
1st US-Japan Workshop Enabling Global Collaborations in Big Data Research; June, 2017, Atlanta, GA
- Award number: 1741034
- Fiscal year: 2017
- Amount: $400,000
- Award type: Standard Grant
RCN: SAVI: Adaptive Management and Use of Resilient Infrastructures in Smart Cities: Support for Global Collaborative Research on Real-Time Analytics of Heterogeneous Big Data
- Award number: 1550379
- Fiscal year: 2015
- Amount: $400,000
- Award type: Standard Grant
EAGER: An Exploratory Study of Multi-Hazard Management through Multi-Source Integration of Physical and Social Sensors
- Award number: 1402266
- Fiscal year: 2014
- Amount: $400,000
- Award type: Standard Grant
CSR: Small: Lightning in Clouds: Detection and Characterization of Very Short Bottlenecks
- Award number: 1421561
- Fiscal year: 2014
- Amount: $400,000
- Award type: Standard Grant
SAVI: EAGER: for Global Research on Applying Information Technology to Support Effective Disaster Management (GRAIT-DM)
- Award number: 1250260
- Fiscal year: 2012
- Amount: $400,000
- Award type: Standard Grant
RAPID: Automating Emergency Data and Metadata Management to Support Effective Short Term and Long Term Disaster Recovery Efforts
- Award number: 1138666
- Fiscal year: 2011
- Amount: $400,000
- Award type: Standard Grant
CSR:Small: Multi-Bottlenecks: What They Are and How to Find Them
- Award number: 1116451
- Fiscal year: 2011
- Amount: $400,000
- Award type: Standard Grant
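Similar NSFC grants
Mechanism by which the novel Y228* mutation of the lysosomal membrane protein LAMP2 promotes abnormal glucose metabolism in cardiomyocytes, leading to Danon disease cardiomyopathy
- Award number: 82360048
- Approval year: 2023
- Amount: 320,000 yuan
- Award type: Regional Science Fund Project
Role and mechanism of a normalized tumor vaccine based on binary reprogramming in neoadjuvant therapy for locally advanced triple-negative breast cancer
- Award number: 32371451
- Approval year: 2023
- Amount: 500,000 yuan
- Award type: General Program
Mechanism of glycosylation at novel sites of steviol glycosides and its application to structural innovation in low-calorie sweeteners
- Award number: 32372277
- Approval year: 2023
- Amount: 500,000 yuan
- Award type: General Program
Chemical synthesis, derivatization, and hypoglycemic activity of baccataxane, a taxane diterpene with a novel skeleton
- Award number: 82373758
- Approval year: 2023
- Amount: 490,000 yuan
- Award type: General Program
Role and mechanism of the novel target ENHO/Adropin in systemic sclerosis, identified through machine learning and multimodal validation
- Award number: 82371818
- Approval year: 2023
- Amount: 490,000 yuan
- Award type: General Program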
Similar overseas grants
Comprehensive mapping of multimodal chromatin state in single cells
- Award number: 10323270
- Fiscal year: 2021
- Amount: $400,000
- Award type:
The Memorial Sloan Kettering Cancer Center SPORE in Leukemia
- Award number: 10474261
- Fiscal year: 2021
- Amount: $400,000
- Award type: