Overlay networks have emerged as a powerful and flexible platform for developing new disruptive network applications. The attractive characteristics of overlay networks, such as planetary-scale distribution, user-level flexibility (e.g., overlay routing), and manageability, bring new challenges to overlay fault diagnosis, including inaccessible underlying network information, incomplete and inaccurate network status observations, dynamic symptom-fault causality relationships, and multi-layer complexity. To address these challenges, we propose EUDiag, a distributed, user-level overlay fault diagnosis technique based on belief revision. EUDiag passively uses overlay symptoms reported by overlay monitoring agents to correlate and diagnose faults and, whenever necessary, selects the least costly appropriate probing actions to enhance the passive fault reasoning results. EUDiag adapts to changes in highly dynamic overlay networks by incrementally revising user beliefs based on newly observed overlay symptoms. EUDiag can diagnose faults without relying on probabilistic quantification of underlying network faults (e.g., prior fault probabilities). Simulations and experimental studies show that EUDiag can efficiently (e.g., with low latency) and accurately localize the root causes of overlay faults and problems, even when the observed symptoms are incomplete.
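The abstract does not spell out EUDiag's algorithm, so the following is only a minimal sketch of the general idea it describes: revise a belief incrementally as symptoms arrive, without prior fault probabilities, and fall back to the cheapest probe when passive reasoning leaves several candidates. The class `BeliefState`, the symptom-to-fault map `causes`, the `probe_cost` table, and the "most symptoms explained" ranking are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


class BeliefState:
    """Illustrative belief over candidate faults (hypothetical, not EUDiag itself)."""

    def __init__(self, causes, probe_cost):
        self.causes = causes          # symptom -> set of faults that could produce it (assumed model)
        self.probe_cost = probe_cost  # fault -> cost of actively probing that component (assumed)
        self.support = Counter()      # fault -> number of observed symptoms it can explain
        self.ruled_out = set()        # faults cleared by an active probe

    def observe(self, symptom):
        """Passive step: revise the belief with one newly observed symptom."""
        for fault in self.causes.get(symptom, ()):
            self.support[fault] += 1

    def candidates(self):
        """Faults that explain the most observed symptoms and are not ruled out."""
        live = {f: n for f, n in self.support.items() if f not in self.ruled_out}
        if not live:
            return set()
        best = max(live.values())
        return {f for f, n in live.items() if n == best}

    def next_probe(self):
        """Active step: if the belief is ambiguous, pick the cheapest candidate to probe."""
        cands = self.candidates()
        if len(cands) <= 1:
            return None  # passive reasoning was enough; no probe needed
        return min(cands, key=lambda f: (self.probe_cost[f], f))

    def rule_out(self, fault):
        """Record that a probe found this component healthy."""
        self.ruled_out.add(fault)


# Hypothetical overlay: symptom names, fault names, and costs are made up.
causes = {
    "slow_path_A_C": {"link_AB", "link_BC"},
    "timeout_B_D": {"link_BC", "node_D"},
}
probe_cost = {"link_AB": 1.0, "link_BC": 3.0, "node_D": 2.0}

belief = BeliefState(causes, probe_cost)
belief.observe("slow_path_A_C")
print("candidates:", sorted(belief.candidates()), "probe:", belief.next_probe())
belief.observe("timeout_B_D")
print("candidates:", sorted(belief.candidates()), "probe:", belief.next_probe())
```

In this sketch a second symptom narrows the belief without any probing, while a probe is suggested only when several faults remain equally plausible. Counting explained symptoms stands in for the belief-revision machinery purely to keep the example short.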