Streaming data is crucial for applications such as crowdsourcing analytics, behavior studies, and real-time monitoring, but it poses significant privacy risks because of the large and diverse data linked to individuals. In particular, recent efforts to release data streams under the rigorous privacy notion of differential privacy (DP) have encountered unbounded privacy leakage. This limits their applicability to a finite number of time slots ("finite data stream") or forces a relaxation that protects events ("event or $w$-event DP") rather than all of a user's records. A persistent challenge is managing the sensitivity of outputs to inputs when users contribute many activities and data distributions evolve over time. In this paper, we present a novel technique for Differentially Private data streaming over Infinite disclosure (DPI) that bounds the total privacy leakage of each user in infinite data streams while enabling accurate data collection and analysis. We further maximize the accuracy of DPI via a novel boosting mechanism. Finally, extensive experiments across various streaming applications and real datasets (e.g., COVID-19, Network Traffic, and USDA Production) show that DPI maintains high utility for infinite data streams in diverse settings. Code for DPI is available at https://github.com/ShuyaFeng/DPI.