Collaborative Research: SaTC: CORE: Small: Differentially Private Data Synthesis: Practical Algorithms and Statistical Foundations
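Basic Information
- Award number: 2247795
- Principal investigator:
- Amount: $300,000
- Institution:
- Institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-07-15 to 2026-06-30
- Status: Ongoing (not yet concluded)
- Source:
- Keywords: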
Project Abstract
Data collected by organizations and agencies are a key resource in today’s information age and fuel a significant part of today's economy. However, disclosing those data poses serious threats to individual privacy. One important approach to using data while protecting privacy is differentially private data synthesis (DPDS): given a private dataset as input, one uses a differentially private algorithm to generate synthetic datasets that are “similar” to the input dataset. While DPDS has received much attention in recent years, our understanding of this topic remains limited. This project takes a multidisciplinary approach to advance scientific understanding of DPDS and to improve practical techniques for it. More specifically, the project’s novelties are as follows. First, it systematically explores the design space of marginal-based DPDS algorithms, which have proven effective in NIST competitions on DPDS, while also drawing on data synthesis techniques developed in related fields (which often do not satisfy DP). Second, it develops statistical theory that is both motivated by the empirical performance of DPDS algorithms and guides the empirical research on these algorithms. The project’s broader significance and importance are as follows. We live in an information economy in which data of all kinds, such as online interactions, medical sensor data, genomic data, and location data, are being collected. Practical techniques that enable the use of these data while protecting individual privacy are crucially needed and will greatly enhance the value of such data. Users will gain increased control over their private information, and society as a whole will benefit from extracting maximal value from aggregated data. The PIs plan to jointly develop and teach a graduate-level course on synthetic data, based on existing research in this area as well as results from this project, and to involve undergraduate students in research.
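To make the DPDS setting concrete, here is a minimal sketch (an illustration only, not the project's algorithm) that releases one marginal, i.e. a count table over a subset of attributes, with the Laplace mechanism, post-processes the noisy counts into a probability distribution, and samples synthetic records from it. It assumes categorical attributes with known, public domains and add/remove-one neighbouring datasets, under which a count table has L1 sensitivity 1. All function names are hypothetical. Real marginal-based algorithms release several low-dimensional marginals and must then combine them; a single table sidesteps that step.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attrs, domains, epsilon, rng):
    """Release an epsilon-DP count table over the attribute subset `attrs`.

    Every cell of the domain gets a noisy count, including empty ones, so the
    set of released cells does not depend on the data. Assumes add/remove-one
    neighbouring datasets: one record changes one cell by 1, so the L1
    sensitivity is 1 and Laplace noise with scale 1/epsilon suffices.
    """
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    cells = product(*(domains[a] for a in attrs))
    return {c: counts[c] + laplace_noise(1.0 / epsilon, rng) for c in cells}


def to_distribution(noisy):
    """Post-process noisy counts: clip negatives to 0 and normalise to sum to 1."""
    clipped = {c: max(v, 0.0) for c, v in noisy.items()}
    total = sum(clipped.values())
    if total == 0:  # every count was clipped; fall back to uniform
        return {c: 1.0 / len(clipped) for c in clipped}
    return {c: v / total for c, v in clipped.items()}


def synthesize(dist, attrs, n, rng):
    """Sample n synthetic records from the post-processed marginal."""
    cells = list(dist)
    draws = rng.choices(cells, weights=[dist[c] for c in cells], k=n)
    return [dict(zip(attrs, c)) for c in draws]


if __name__ == "__main__":
    rng = random.Random(0)
    domains = {"age": ["<30", "30-60", ">60"], "smoker": ["yes", "no"]}
    records = [
        {"age": "<30", "smoker": "no"},
        {"age": "30-60", "smoker": "yes"},
        {"age": ">60", "smoker": "no"},
    ]
    noisy = noisy_marginal(records, ("age", "smoker"), domains, epsilon=1.0, rng=rng)
    print(synthesize(to_distribution(noisy), ("age", "smoker"), n=5, rng=rng))
```

Because clipping, normalising, and sampling only touch the already-noised counts, they are post-processing and consume no additional privacy budget.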
This project has two thrusts. The first thrust aims to develop new marginal-based DPDS algorithms that improve upon the state of the art in empirical evaluations. Its tasks include: performing an in-depth study of the “marginal-to-dataset” problem (how to synthesize a dataset when given a set of marginals); developing and evaluating new approaches for handling numerical attributes; and developing adaptive, automated techniques for selecting marginals so that the dataset synthesized from them captures as much useful information from the input dataset as possible. The second thrust complements the empirical research in the first and aims to develop statistical theory for high-dimensional marginal-based data synthesis algorithms, as well as a general learning-theory framework for evaluating the utility of synthetic data in downstream tasks. The two thrusts are highly complementary and support each other: the experimental study in Thrust 1 will provide insights and directions for the theoretical study in Thrust 2, which in turn will help explain the experimental findings and guide further experimental studies. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
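One classical way to attack the “marginal-to-dataset” step, shown here only as a textbook baseline and not as the approach this project develops, is iterative proportional fitting (IPF): start from a uniform joint distribution, repeatedly rescale it so that each of its marginals matches a target marginal, then sample records from the fitted joint. This sketch assumes small categorical domains (it materialises the full joint table, whose size grows exponentially with the number of attributes) and mutually consistent target marginals given as probabilities; noisy DP marginals are usually not exactly consistent, in which case IPF need not converge to an exact match. Function names are hypothetical.

```python
import random
from itertools import product


def marginal(attrs, joint, marg_attrs):
    """Sum a joint table {cell: prob} down to the attributes in `marg_attrs`."""
    idx = [attrs.index(a) for a in marg_attrs]
    out = {}
    for cell, prob in joint.items():
        key = tuple(cell[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out


def ipf(domains, targets, iters=50):
    """Iterative proportional fitting of a full joint table to target marginals.

    domains: {attr: list of values}
    targets: {tuple_of_attrs: {cell_tuple: target probability}}
    """
    attrs = list(domains)
    cells = list(product(*(domains[a] for a in attrs)))
    joint = {c: 1.0 / len(cells) for c in cells}  # start from the uniform distribution
    for _ in range(iters):
        for marg_attrs, target in targets.items():
            idx = [attrs.index(a) for a in marg_attrs]
            current = marginal(attrs, joint, marg_attrs)
            for c in joint:
                key = tuple(c[i] for i in idx)
                # Rescale so this marginal matches the target; cells whose current
                # marginal is 0 are already 0 and are left unchanged.
                if current[key] > 0:
                    joint[c] *= target.get(key, 0.0) / current[key]
    return attrs, joint


def sample_records(attrs, joint, n, rng):
    """Draw n synthetic records from the fitted joint distribution."""
    cells = list(joint)
    draws = rng.choices(cells, weights=[joint[c] for c in cells], k=n)
    return [dict(zip(attrs, c)) for c in draws]


if __name__ == "__main__":
    domains = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
    targets = {
        ("A", "B"): {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4},
        ("B", "C"): {(0, 0): 0.25, (0, 1): 0.15, (1, 0): 0.35, (1, 1): 0.25},
    }
    attrs, joint = ipf(domains, targets)
    print(marginal(attrs, joint, ("A", "B")))
    print(sample_records(attrs, joint, n=3, rng=random.Random(0)))
```

In the example, the two targets agree on the shared attribute B, so the fitted joint reproduces both 2-way marginals; scaling this step beyond toy domains is where practical marginal-based methods must avoid building the full table.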
Project Outcomes
Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Improving Adversarial Robustness Through the Contrastive-Guided Diffusion Process
- DOI: 10.48550/arxiv.2304.07756
- Published: 2022-10-18
- Journal:
- Impact factor: 0
- Authors: Yidong Ouyang;Liyan Xie;Guang Cheng
- Corresponding author: Guang Cheng
Optimal Convergence Rates of Deep Convolutional Neural Networks: Additive Ridge Functions
- DOI: 10.48550/arxiv.2403.16459
- Published: 2022-02-24
- Journal:
- Impact factor: 0
- Authors: Zhiying Fang;Guang Cheng
- Corresponding author: Guang Cheng
Online Regularization toward Always-Valid High-Dimensional Dynamic Pricing
- DOI: 10.1080/01621459.2023.2284979
- Published: 2023-12
- Journal:
- Impact factor: 3.7
- Authors: Wang, Chi;Wang, Zhanyu;Sun, Will Wei;Cheng, Guang
- Corresponding author: Cheng, Guang
Sparse confidence sets for normal mean models
- DOI:
- Published: 2023-03
- Journal:
- Impact factor: 0
- Authors: Ning, Yang;Cheng, Guang
- Corresponding author: Cheng, Guang
Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms
- DOI:
- Published: 2023-05
- Journal:
- Impact factor: 0
- Authors: Li, Ximing;Wang, Chendi;Cheng, Guang
- Corresponding author: Cheng, Guang
Other publications by Guang Cheng
Accurate compressed traffic detection via traffic analysis using Graph Convolutional Network based on graph structure feature
- DOI: 10.1016/j.comcom.2023.04.031
- Published: 2023-05-01
- Journal:
- Impact factor: 0
- Authors: Nan Fu;Guang Cheng;X. Su
- Corresponding author: X. Su
Private Protocol Traffic Identification Based on Sequence Statistical Fingerprint
- DOI: 10.1109/globecom48099.2022.10000713
- Published: 2022-12-04
- Journal:
- Impact factor: 0
- Authors: Junchen Li;Guang Cheng;Zekun Jing;Haiyang Wei
- Corresponding author: Haiyang Wei
TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Namjoon Suh;Yuning Yang;Din;Qitong Luan;Shirong Xu;Shixiang Zhu;Guang Cheng
- Corresponding author: Guang Cheng
Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility
- DOI: 10.48550/arxiv.2310.19683
- Published: 2024-06-04
- Journal:
- Impact factor: 0
- Authors: Yuantong Li;Guang Cheng;Xiaowu Dai
- Corresponding author: Xiaowu Dai
Community-base Fault Diagnosis Using Incremental Belief Revision
- DOI: 10.1109/nas.2009.24
- Published: 2009-07-09
- Journal:
- Impact factor: 0
- Authors: Yongning Tang;Guang Cheng;Zhiwei Xu;E. Al
- Corresponding author: E. Al
Other awards to Guang Cheng
I-Corps: Trustworthy Synthetic Data Generation
- Award number: 2317549
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant
Conference: UCLA Synthetic Data Workshop
- Award number: 2309349
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: Nonparametric Bayesian Aggregation for Massive Data
- Award number: 1712907
- Fiscal year: 2017
- Amount: $300,000
- Award type: Continuing Grant
Collaborative Research: Semiparametric ODE Models for Complex Gene Regulatory Networks
- Award number: 1418202
- Fiscal year: 2014
- Amount: $300,000
- Award type: Standard Grant
CAREER: Bootstrap M-estimation in Semi-Nonparametric Models
- Award number: 1151692
- Fiscal year: 2012
- Amount: $300,000
- Award type: Continuing Grant
General Semiparametric Inference via Bootstrap Sampling
- Award number: 0906497
- Fiscal year: 2009
- Amount: $300,000
- Award type: Standard Grant
Similar NSFC (National Natural Science Foundation of China) grants
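Identification of biomarkers for targeted-drug sensitivity from tumor pathology images, and associated statistical algorithms
- Grant number: 82304250
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund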
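Mechanism by which butyrate, a metabolite of the gut bacterium Faecalibacterium prausnitzii, inhibits ventricular myocardial ferroptosis and improves age-related cardiac insufficiency
- Grant number: 82300430
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund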
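The effect of social network ties on corporate cash holding decisions: a study of the mechanism based on joint risk resistance
- Grant number: 72302067
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund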
Novel weakly supervised learning methods for image object detection
- Grant number: 62371157
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program
Accuracy of information acquisition in open-domain dialogue systems
- Grant number: 62376067
- Approval year: 2023
- Amount: CNY 510,000
- Project type: General Program
Similar overseas grants
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
- Award number: 2317232
- Fiscal year: 2024
- Amount: $300,000
- Award type: Continuing Grant
Collaborative Research: NSF-BSF: SaTC: CORE: Small: Detecting malware with machine learning models efficiently and reliably
- Award number: 2338302
- Fiscal year: 2024
- Amount: $300,000
- Award type: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Using Intelligent Conversational Agents to Empower Adolescents to be Resilient Against Cybergrooming
- Award number: 2330940
- Fiscal year: 2024
- Amount: $300,000
- Award type: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Using Intelligent Conversational Agents to Empower Adolescents to be Resilient Against Cybergrooming
- Award number: 2330941
- Fiscal year: 2024
- Amount: $300,000
- Award type: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
- Award number: 2317233
- Fiscal year: 2024
- Amount: $300,000
- Award type: Continuing Grant