CIF: Medium: Collaborative Research: Information-theoretic Guarantees on Privacy in the Age of Learning

Basic Information

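Award number:
1900750
Principal investigator:
Flavio Calmon
Amount:
$383,000
Host institution:
Harvard University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2019
Funding country:
United States
Start and end dates:
2019-06-01 to 2024-05-31
Project status:
Completed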

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1900750&HistoricalAwards=false
Keywords:
CIF Medium Collaborative Research Information

Project Abstract

Armed with powerful advances in machine learning, the ability of an interested party to gather personal information from an individual's expanding digital footprint is outstripping anyone's capability to keep their information private. While this aggregated data can have tremendous benefit for consumers and data scientists via technologies built on machine learning and artificial intelligence, this benefit must be tempered with meaningful assurances of privacy for the very people who provided the data in the first place. This project adopts a rigorous information-theoretic approach to give meaningful privacy guarantees while still providing statistical utility. By combining theoretical and data-driven research, this project can inform public policy as well as best practices for industry. The overall goal is to provide any data scientist with a set of tools to guarantee meaningful privacy in practice. To do so, this project explores meaningful measures of privacy leakage in the learning context, characterizes the fundamental tradeoffs between privacy and utility, develops techniques to ensure privacy in realistic settings, and tests these algorithms on publicly available datasets. The project is also committed to broadening participation in computing via two outreach efforts: (i) interactive demonstrations, for middle and high school students, of privacy issues that stem from using social media, given at ASU's annual STEM event, Open Door, and (ii) teaching modules on machine learning (ML) and artificial intelligence (AI), and short courses ("data jams") at ASU via the Young Engineers Shape the World (YESW) summer program and at Harvard; these modules, targeted at female, financially disadvantaged, and Latino and Hispanic students, aim to make a meaningful contribution to growing a diverse STEM workforce by giving students hands-on experience with basic concepts of coding, manipulating datasets, and producing simple visualizations collectively. Outreach efforts will be evaluated using well-understood metrics for assessment of student interest, engagement, and knowledge via ASU's College Research and Evaluation Services Team (CREST).

This project aims to derive a foundational, statistical theory of privacy that builds upon and contributes to modern theoretical advances in information theory and machine learning. The statistical nature of inference (both for legitimate and illegitimate ends) requires a statistical approach to measuring and ensuring privacy and utility. A significant novel element derived from this view is the maximal alpha leakage, a new, tunable measure for information leakage which quantifies the ability of an adversary to learn any function of private data via a parametric class of loss functions. This tunable measure is derived from a rich information-theoretic framework based on Rényi divergence, thereby uniting disparate existing measures under a single framework. Moreover, its operational significance and computational flexibility allow for natural application in machine learning. In the context of these measures, this project studies privacy-utility tradeoffs both theoretically and in a data-driven manner in two distinct settings: (i) releasing datasets in a similar form as the original, with privacy and strict utility guarantees for arbitrary statistical analysis, and (ii) releasing privacy-guaranteed data representations for specific learning tasks. Broader dissemination of the work will go beyond conferences to organizing a privacy workshop in the latter half of the project to enable interdisciplinary interactions and application.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
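The abstract names Rényi divergence as the basis of its tunable leakage measure but does not define it. As a rough illustration only, the sketch below computes the standard Rényi divergence of order alpha between two discrete distributions, D_alpha(P || Q) = log(sum_i p_i^alpha q_i^(1 - alpha)) / (alpha - 1), with the Kullback-Leibler divergence as the alpha = 1 case. The function name `renyi_divergence` and the example distributions are hypothetical and do not come from the project; the code does not implement maximal alpha leakage itself.

```python
import math


def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(P || Q) in nats for discrete distributions.

    Hypothetical helper for illustration; p and q are sequences of
    probabilities over the same finite alphabet, and alpha > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    # Terms with p_i == 0 contribute nothing for any alpha > 0.
    pairs = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]

    if alpha == 1:
        # Order 1 is defined as the limit, which is the KL divergence.
        if any(qi == 0 for _, qi in pairs):
            return math.inf
        return sum(pi * math.log(pi / qi) for pi, qi in pairs)

    if alpha > 1 and any(qi == 0 for _, qi in pairs):
        # P puts mass where Q has none: the divergence is infinite.
        return math.inf

    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in pairs)
    if total == 0:
        # Disjoint supports with alpha < 1.
        return math.inf
    return math.log(total) / (alpha - 1)


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    for order in (0.5, 1, 2):
        print(order, renyi_divergence(p, q, order))
```

Varying `alpha` is what makes the quantity tunable: the divergence is non-decreasing in the order, so larger values of `alpha` put more weight on the outcomes where the two distributions disagree most.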
