Owing to the critical role of the domain name system (DNS), its query log data are used for a variety of network monitoring purposes. As network services diversify, these data have become increasingly complex, making it challenging to mine useful information from them. DNS query log data can be regarded as the superposition of two types of communication patterns: groups of domains accessed simultaneously (e.g., ad servers and content delivery network (CDN) servers) and time-series access patterns that reflect user behavior (e.g., access trends during the night). However, previous studies have not focused on extracting both kinds of access pattern hidden in the data. This study proposes a method that extracts both the patterns of accessed domains and the temporal access patterns as user communication behaviors from DNS query log data, and predicts future accesses based on these patterns. The proposed method first aggregates similar fully qualified domain names (FQDNs) associated with the same service. We then present temporal regularized nonnegative tensor factorization (TR-NTF), which extracts both access patterns from a third-order tensor representing the DNS query log data and enables prediction. We evaluate the proposed method on synthetic and real data and demonstrate that it successfully extracts hidden communication patterns and achieves sufficient prediction accuracy.
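To make the idea of factorizing a third-order tensor concrete, the following is a minimal sketch of a plain nonnegative CP factorization with multiplicative updates. It is not TR-NTF: the temporal regularization term and the prediction step are omitted because the abstract does not specify them. The axis layout (users x domain groups x time slots) and all function names are assumptions made for illustration only.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R), (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def unfold(X, mode):
    """Mode-n unfolding; the remaining axes keep their original (C) order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def reconstruct(factors):
    """Rebuild the tensor from its three factor matrices."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def relative_error(X, factors):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)


def ntf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain nonnegative CP factorization (NOT the paper's TR-NTF).

    X is assumed to be a nonnegative array with hypothetical axes
    (users, domain groups, time slots). Returns three nonnegative
    factor matrices, one per axis, each with `rank` columns.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            # Numerator: unfolded data projected onto the other two factors.
            numer = unfold(X, mode) @ khatri_rao(others[0], others[1])
            # Denominator: current factor times the Hadamard product of Grams.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            denom = factors[mode] @ gram + eps
            # Multiplicative update keeps every entry nonnegative.
            factors[mode] *= numer / denom
    return factors
```

In this sketch, each column of the domain-group factor would correspond to a set of domains that tend to be accessed together, and the matching column of the time factor to when that set is accessed. A temporally regularized variant would add a penalty on the time factor, which is where this sketch and the proposed method diverge.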