Research integrity is crucial to ensuring the trustworthiness of scientific discoveries. This work aims to detect misbehavior targeting scientific workflows, a computing paradigm widely used to facilitate scientific collaboration across multiple geographically distributed research sites. We develop a new system called RAMP (Real-Time Aggregated Matrix Profile) for real-time anomaly detection in scientific workflow systems. RAMP builds on an existing time series analysis technique, the Matrix Profile, to detect, in an online manner, anomalous distances among subsequences of event streams collected from scientific workflows. An adaptive uncertainty function dynamically adjusts the anomaly detection model to avoid high false alarm rates. RAMP can also incorporate user feedback on reported anomalies and modify model parameters to improve detection accuracy. Experimental results from applying RAMP to logs generated by DATAVIEW, a scientific workflow platform, show that RAMP identifies a wide range of anomalies with high accuracy in real time, for both interleaved and non-interleaved workflow executions.
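To make the Matrix Profile idea concrete, the following is a minimal sketch of the underlying technique only, not of RAMP itself. It computes a naive batch matrix profile (the z-normalized Euclidean distance from each subsequence to its nearest non-overlapping neighbor) and reports the subsequence with the largest such distance as the most anomalous. The function names `znorm`, `matrix_profile`, and `top_discord`, the window length `m`, and the exclusion-zone size are assumptions chosen for illustration; RAMP's online aggregation, adaptive uncertainty function, and user-feedback loop are not modeled here.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Naive O(n^2 * m) matrix profile (illustrative, not optimized).

    profile[i] is the distance from the subsequence starting at i to its
    nearest neighbor outside a small exclusion zone around i.
    """
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) < excl:
                continue
            best = min(best, math.dist(subs[i], subs[j]))
        profile.append(best)
    return profile


def top_discord(series, m):
    """Return (start index, score) of the subsequence farthest from any neighbor."""
    profile = matrix_profile(series, m)
    idx = max(range(len(profile)), key=profile.__getitem__)
    return idx, profile[idx]


if __name__ == "__main__":
    # Hypothetical numeric event stream: a repeating pattern with one injected spike.
    stream = [0.0, 1.0, 0.0, -1.0] * 12
    stream[22] = 5.0
    print(top_discord(stream, m=4))
```

In this toy stream, every window that avoids the spike has an identical copy one period away, so its profile value is zero, while windows covering the spike have no close match and stand out. A real deployment would need an incremental update as events arrive and a thresholding rule, which is where the adaptive mechanisms described above would come in.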