SaTC: CORE: Medium: Large-Scale Data Driven Anomaly Detection and Diagnosis from System Logs


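Basic Information

Award number:
1801446
Principal investigator:
Robert Ricci
Amount:
$1.1 million
Institution:
University of Utah
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2018
Funding country:
United States
Start and end dates:
2018-08-01 to 2023-07-31
Project status:
Completed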

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1801446&HistoricalAwards=false
Keywords:
SaTC CORE Medium Large Scale

Project Abstract

Detecting unusual and anomalous behavior in computer systems is a critical part of ensuring they are secure and trustworthy. System logs, which record actions taken by programs, are a promising source of data for such anomaly detection. However, existing practices and tools for log analysis require deep expertise, as well as heavy human involvement in both defining and interpreting possible anomalies, which limits their scalability and effectiveness. This project's goal is to improve the state of the art in log-based anomaly detection by developing a framework called DeepLog through (a) advancing natural language processing techniques to extract structured information from a wide variety of log files to support analysis across different data sources and across time, (b) developing new methods to model legitimate workflows and log event sequences over time, (c) adapting machine learning methods to identify deviations from those workflows that represent potential anomalies, and (d) creating tools for system administrators to help them diagnose possible security issues more effectively and efficiently. The work will be integrated into a freely available software package to benefit both other researchers and practicing system administrators, and will be used to support both classroom and research-based educational activities at the investigators' institutions.

Toward log parsing, the team will adapt named entity recognition methods to parse unstructured logs, as well as structured logs where the structure is not pre-defined by, e.g., regular expressions, into structured key-value pairs of log event types and parameters. This data can be seen as a multi-dimensional feature space whose contents are constrained by the execution of the underlying programs and which thus reflects a hidden structure that defines the set of valid, non-anomalous execution sequences. To help articulate this hidden structure, the team will develop long short-term memory (LSTM) neural network models that use both the key and value elements to extract semantically meaningful subsequences of program behavior from data extracted from system runs known to be normal. Once these models are developed using known-good training data, they can be applied to anomaly detection by flagging for consideration new log entries that are unexpected given the current state of the system, logs, and model; they can also be used to infer the underlying workflows and hidden structures described earlier. These models will be improved through online learning methods, administrators' feedback about the seriousness of reported anomalies, and generative adversarial training models which create execution sequences that, though anomalous, hew closely to the hidden structures embedded in the logs and the LSTM-based models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
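The detection idea in the abstract (parse each log line into an event key plus parameters, learn which key normally follows a given history, and flag entries the model does not expect) can be illustrated with a small sketch. This is not the project's DeepLog implementation: the abstract proposes named entity recognition for parsing and an LSTM for sequence modeling, whereas this sketch substitutes a regular expression and a simple frequency table so that it runs with the Python standard library alone. All names here (`parse_line`, `NextKeyModel`, `window`, `top_k`) and the parameter pattern are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern for variable fields (IPv4 addresses, hex values, integers).
# A stand-in for the named entity recognition the abstract proposes.
PARAM_PATTERN = re.compile(r"\b(?:\d+\.\d+\.\d+\.\d+|0x[0-9a-fA-F]+|\d+)\b")


def parse_line(line):
    """Split a raw log line into (event_key, parameters)."""
    params = PARAM_PATTERN.findall(line)
    key = PARAM_PATTERN.sub("<*>", line).strip()
    return key, params


class NextKeyModel:
    """Frequency-table stand-in for an LSTM next-event predictor."""

    def __init__(self, window=2, top_k=1):
        self.window = window  # number of preceding keys used as context
        self.top_k = top_k    # how many likely next keys count as normal
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        """Count which key follows each context in known-normal sequences."""
        for seq in sequences:
            for i in range(self.window, len(seq)):
                history = tuple(seq[i - self.window:i])
                self.counts[history][seq[i]] += 1
        return self

    def is_anomalous(self, history, next_key):
        """Flag next_key if it is not among the top_k expected keys."""
        seen = self.counts.get(tuple(history[-self.window:]))
        if not seen:
            return True  # context never seen in training: flag for review
        likely = [key for key, _ in seen.most_common(self.top_k)]
        return next_key not in likely

    def detect(self, seq):
        """Return the positions in seq whose key is unexpected."""
        return [i for i in range(self.window, len(seq))
                if self.is_anomalous(seq[:i], seq[i])]
```

A model fitted on sequences of event keys from known-good runs can then be applied to new sequences with `detect`, which returns the positions an administrator might want to inspect. Replacing the frequency table with a learned sequence model would keep the same interface while generalizing beyond contexts seen verbatim in training.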


Project Outcomes

Journal papers (8)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A Year of Automated Anomaly Detection in a Datacenter

DOI:
Published:
2020
Journal:
Impact factor:
0
Authors:
Rufaida Ahmed;J. Porter;Abubaker Abdelmutalab;R. Ricci
Corresponding authors:
Rufaida Ahmed;J. Porter;Abubaker Abdelmutalab;R. Ricci

Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning

DOI:
10.18653/v1/2022.acl-long.231
Published:
2022
Journal:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
Impact factor:
0
Authors:
Gupta, Vivek;Zhang, Shuo;Vempala, Alakananda;He, Yujie;Choji, Temma;Srikumar, Vivek
Corresponding author:
Srikumar, Vivek

Learning Constraints for Structured Prediction Using Rectifier Networks

DOI:
10.18653/v1/2020.acl-main.438
Published:
2020-05
Journal:
ArXiv
Impact factor:
0
Authors:
Xingyuan Pan;Maitrey Mehta;Vivek Srikumar
Corresponding authors:
Xingyuan Pan;Maitrey Mehta;Vivek Srikumar

Augmenting Neural Networks with First-order Logic

DOI:
10.18653/v1/p19-1028
Published:
2019-06
Journal:
ArXiv
Impact factor:
0
Authors:
Tao Li;Vivek Srikumar
Corresponding authors:
Tao Li;Vivek Srikumar

Structured Tuning for Semantic Role Labeling

DOI:
10.18653/v1/2020.acl-main.744
Published:
2020-05
Journal:
ArXiv
Impact factor:
0
Authors:
Tao Li;Parth Anand Jawale;M. Palmer;Vivek Srikumar
Corresponding authors:
Tao Li;Parth Anand Jawale;M. Palmer;Vivek Srikumar

Other Publications by Robert Ricci

Avoiding the Ordering Trap in Systems Performance Measurement

DOI:
Published:
2023
Journal:
Proceedings of the USENIX Annual Technical Conference (ATC
Impact factor:
0
Authors:
Dmitry Duplyakin;Nikhil Ramesh;Carina Imburgia;Hamza Fathallah Al Sheikh;Semil Jain;Prikshit Tekta;Aleksander Maricq;Gary Wong;Robert Ricci
Corresponding author:
Robert Ricci

Most Cited Computer Networks Articles

DOI:
Published:
2017
Journal:
Impact factor:
0
Authors:
Luigi Atzori;Antonio Iera;Giacomo Morabito;Michele Nitti;Wenye Wang;Zhuo Lu;M. Berman;Jeffrey S. Chase;Lawrence Landweber;Akihiro Nakao;Max Ott;Dipankar Raychaudhuri;Robert Ricci;I. Seskar;S. Sicari;A. Rizzardi;L. Grieco;A. Coen
Corresponding author:
A. Coen