III: Small: Fast Subset Scan for Anomalous Pattern Detection
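Basic information
- Award number: 0916345
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2009
- Funding country: United States
- Period: 2009-08-01 to 2013-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract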
This work will develop new methods for fast and scalable detection of anomalous patterns (subsets of the data that are interesting or unexpected) in massive, multivariate datasets. The focus is on real-world applications, such as detecting an emerging disease outbreak or a pattern of smuggling activity, where the patterns are complex, subtle, and probabilistic, and therefore difficult to spot with existing techniques. The research is based on two key insights. First, the pattern detection problem can be framed as a search over subsets of the data: one defines a measure of the "anomalousness" of a subset and then maximizes this measure over all potentially relevant subsets. Second, it has been discovered that, for many spatial detection methods (including Kulldorff's spatial scan statistic and many recently proposed variants), one can perform an exact search that efficiently maximizes the measure of anomalousness over all subsets of the data. The research team will explore this new combinatorial optimization method, investigate how it can be extended to constrained subset scans and to more general multivariate pattern detection problems, and examine how it can be incorporated into a subset scan framework, enabling the creation of a variety of fast, scalable, and useful methods for anomalous pattern detection.
Intellectual Merit
The research team will develop, implement, and evaluate a general probabilistic framework for efficient detection of anomalous patterns in both spatial and non-spatial datasets. The proposed work will address these challenging and important research questions:
1) How can one define a useful measure of the "anomalousness" of a subset of the data, and efficiently optimize this measure over all subsets to find the most anomalous patterns?
2) What are the necessary and sufficient conditions for a set function F(S) to satisfy the "linear-time subset scanning" (LTSS) property, enabling exact unconstrained optimization of F(S) over all 2^N subsets of N records while requiring only O(N) subsets to be evaluated?
3) How can one extend fast subset scanning methods to general multivariate datasets, and incorporate search constraints such as proximity, connectivity, and self-similarity?
4) How can one deal with uncertainty about the effects of an anomalous pattern by searching over subsets of "input" and "output" attributes as well as subsets of records?
Broader Impact
Development and testing will be prioritized in three areas: 1) early detection of disease outbreaks, 2) detecting illicit container shipments, and 3) identifying anomalous trends in social networks. These applications will demonstrate the value of the methods across a wide spectrum of domains. Through existing collaborations, the algorithms will be incorporated into deployed systems for health and crime surveillance that contribute directly to the public good. The Principal Investigator's lab has over 5 years of history offering free machine learning software, and the software implementations of all algorithms developed through this grant will be made publicly available. The bulk of the funding will go to training graduate students who will become the next generation of researchers to explore new methods for anomalous pattern detection.
Key Words: anomalous patterns; pattern detection; fast subset scan; scan statistics; optimization.
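To make the LTSS idea in question 2 concrete, here is a minimal sketch. It is not taken from the abstract: it assumes an expectation-based Poisson score F(S) = C log(C/B) + B - C (for C > B, else 0), where C and B are the summed counts and baselines of the records in S, and it assumes the priority of record i is the ratio c_i / b_i. The function names and the sample data are hypothetical. Under those assumptions, sorting records by priority and scoring only the N "top-k" prefixes should find the same maximum as an exhaustive search over all 2^N subsets, which the brute-force function lets you check on small inputs.

```python
import math
from itertools import combinations


def score(c_sum, b_sum):
    """Assumed score F(S): expectation-based Poisson log-likelihood ratio."""
    if b_sum <= 0 or c_sum <= b_sum:
        return 0.0
    return c_sum * math.log(c_sum / b_sum) + b_sum - c_sum


def ltss_scan(counts, baselines):
    """Score only the N prefixes of records sorted by priority c_i / b_i.

    Assumes every baseline is strictly positive.
    Returns (best score, sorted indices of the best subset).
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        s = score(c_sum, b_sum)
        if s > best_score:
            best_score, best_k = s, k
    return best_score, sorted(order[:best_k])


def brute_force_scan(counts, baselines):
    """Exhaustive reference search over all non-empty subsets (small N only)."""
    n = len(counts)
    best_score, best_subset = 0.0, []
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            s = score(sum(counts[i] for i in subset),
                      sum(baselines[i] for i in subset))
            if s > best_score:
                best_score, best_subset = s, list(subset)
    return best_score, best_subset


if __name__ == "__main__":
    # Hypothetical observed counts and expected baselines for six records.
    counts = [12, 3, 9, 20, 5, 7]
    baselines = [10, 5, 4, 8, 6, 7]
    print("LTSS:       ", ltss_scan(counts, baselines))
    print("Brute force:", brute_force_scan(counts, baselines))
```

The prefix scan costs one sort plus N score evaluations, whereas the exhaustive search evaluates 2^N - 1 subsets, so the latter is useful only as a correctness check on toy data.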
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Daniel Neill
Identifying Significant Predictive Bias in Classifiers June 2017
- DOI:
- Year published: 2017
- Journal:
- Impact factor: 0
- Authors: Zhe Zhang; Daniel Neill
- Corresponding author: Daniel Neill
Anticorps dirigé contre il-17br
Antibody directed against IL-17BR
- DOI:
- Year published: 2010
- Journal:
- Impact factor: 0
- Authors: A. N. McKenzie; Daniel Neill
- Corresponding author: Daniel Neill
Other grants by Daniel Neill
The impact of thermally-regulated cell wall modifications on Streptococcus pneumoniae pathogenesis
- Award number: MR/X009130/1
- Fiscal year: 2023
- Amount: $500,000
- Award type: Research Grant
FAI: End-To-End Fairness for Algorithm-in-the-Loop Decision Making in the Public Sector
- Award number: 2040898
- Fiscal year: 2021
- Amount: $500,000
- Award type: Standard Grant
CAREER: Machine Learning and Event Detection for the Public Good
- Award number: 0953330
- Fiscal year: 2010
- Amount: $500,000
- Award type: Standard Grant
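Similar NSFC (National Natural Science Foundation of China) grants
Mechanism by which huperzine A mediates microglial polarization phenotypes against ischemic stroke, studied at single-cell resolution
- Award number: 82304883
- Year approved: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund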
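Role and mechanism of small cysteine-free proteins in regulating the insecticidal activity of biocontrol fungi
- Award number: 32372613
- Year approved: 2023
- Amount: CNY 500,000
- Award type: General Program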
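Role and mechanism of the theranostic agent PS-Hc@MB combined with training in mediating rehabilitation from cerebral small vessel disease
- Award number: 82372561
- Year approved: 2023
- Amount: CNY 490,000
- Award type: General Program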
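Mechanism by which the MECOM/HBB pathway in non-small cell lung cancer mediates abnormal heme metabolism and inhibits ferroptosis of tumor-initiating cells
- Award number: 82373082
- Year approved: 2023
- Amount: CNY 490,000
- Award type: General Program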
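Role and mechanism of the FATP2/HILPDA/SLC7A11 axis in mediating lipid metabolic reprogramming of tumor-associated neutrophils and its effect on radiotherapy immunity in non-small cell lung cancer
- Award number: 82373304
- Year approved: 2023
- Amount: CNY 490,000
- Award type: General Program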
Similar overseas grants
EAGER: III: Small: Green Granular Neural Networks with Fast FPGA-based Incremental Transfer Learning
- Award number: 2234227
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
Small Intestine Targeted Fast Acting Oral Insulin Formulation
- Award number: 10385154
- Fiscal year: 2021
- Amount: $500,000
- Award type:
III: Small: Task-aware Materialization for Fast Data Analytics
- Award number: 1910014
- Fiscal year: 2019
- Amount: $500,000
- Award type: Continuing Grant
III: Small: Fast and Efficient Algorithms for Matrix Decompositions and Applications to Human Genetics
- Award number: 1661756
- Fiscal year: 2016
- Amount: $500,000
- Award type: Standard Grant
III: Small: Fast and Efficient Algorithms for Matrix Decompositions and Applications to Human Genetics
- Award number: 1319280
- Fiscal year: 2013
- Amount: $500,000
- Award type: Standard Grant