Abstract
Consider the setting where multiple parties each hold a multiset of users, and the task is to estimate the reach (i.e., the number of distinct users appearing across all parties) and the frequency histogram (i.e., the fraction of users appearing a given number of times across all parties). In this work, we introduce a new sketch for this task based on an exponentially distributed counting Bloom filter. We combine this sketch with a communication-efficient multi-party protocol to solve the task in the multi-worker setting. Our protocol provides both differential privacy and security guarantees in the honest-but-curious model and in the presence of large subsets of colluding workers; furthermore, its reach and frequency histogram estimates have provably small error. Finally, we demonstrate the practicality of the protocol by evaluating it on internet-scale audiences.
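To make the two target quantities and the sketching idea concrete, the following is a minimal, non-private sketch under stated assumptions. It first computes the exact reach and frequency histogram that the protocol estimates. It then builds a hypothetical counting Bloom filter whose buckets are chosen with exponentially decaying probability. The abstract does not specify the distribution, so the weights proportional to exp(-a*i/m), the parameters m and a, the SHA-256 hash shared by all parties, and the "max_freq or more" top bucket of the histogram are all illustrative assumptions, not the paper's construction. The example omits differential privacy, secure computation, and the estimator that recovers reach from the sketch.

```python
import bisect
import hashlib
import itertools
import math
from collections import Counter


def reach_and_frequency(parties, max_freq):
    """Exact (ground-truth) reach and frequency histogram over all parties."""
    totals = Counter()
    for multiset in parties:
        totals.update(multiset)  # sum each user's occurrences across parties
    reach = len(totals)  # number of distinct users
    hist = [0.0] * max_freq
    for count in totals.values():
        # Assumption: bucket k-1 holds users seen exactly k times,
        # and the last bucket holds users seen max_freq or more times.
        hist[min(count, max_freq) - 1] += 1
    if reach:
        hist = [h / reach for h in hist]  # convert counts to fractions of users
    return reach, hist


def bucket_cdf(m, a):
    # Hypothetical choice: bucket i has probability proportional to exp(-a * i / m).
    weights = [math.exp(-a * i / m) for i in range(m)]
    total = sum(weights)
    return list(itertools.accumulate(w / total for w in weights))


def bucket_of(user, cdf):
    # Map the user to a uniform value in [0, 1) using a hash shared by all parties.
    digest = hashlib.sha256(str(user).encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    # Inverse-CDF lookup; min() guards against round-off in the last CDF entry.
    return min(bisect.bisect_right(cdf, u), len(cdf) - 1)


def build_sketch(multiset, m=64, a=10.0):
    cdf = bucket_cdf(m, a)
    registers = [0] * m
    for user in multiset:
        registers[bucket_of(user, cdf)] += 1  # count every occurrence, not just the first
    return registers


def merge(sketches):
    # Counting Bloom filters built with the same hash combine by elementwise addition.
    return [sum(column) for column in zip(*sketches)]


# Usage: three parties, each holding a multiset of (hypothetical) user IDs.
parties = [["alice", "bob", "alice"], ["bob", "carol"], ["alice"]]
reach, hist = reach_and_frequency(parties, max_freq=3)
merged = merge([build_sketch(p) for p in parties])
```

The design point this illustrates is that, because every party uses the same hash, summing the parties' registers yields exactly the sketch of the combined multiset. This additive structure is what allows a multi-party protocol to combine the sketches without pooling the raw user lists.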