Collaborative Research: SaTC: CORE: Small: Differentially Private Data Synthesis: Practical Algorithms and Statistical Foundations
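Basic information
- Award number: 2247795
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-07-15 to 2026-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract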
Data collected by organizations and agencies are a key resource in today’s information age and fuel a significant part of today's economy. However, the disclosure of those data poses serious threats to individual privacy. One important approach to using data while protecting privacy is differentially private data synthesis (DPDS). That is, given a private dataset as input, one uses a differentially private algorithm to generate synthetic datasets that are “similar” to the input dataset. While DPDS has received much attention in recent years, our understanding of this topic remains limited. This project takes a multi-disciplinary approach to advance our scientific understanding as well as improve practical techniques for DPDS. More specifically, this project’s novelties are as follows. First, it systematically explores the design space of marginal-based DPDS algorithms, which have proven effective in NIST competitions on DPDS, while also taking insights from data synthesis techniques developed in similar fields (often not satisfying DP). Second, it develops statistical theories that are both motivated by the empirical performance of DPDS algorithms and guide the empirical research on these algorithms. The project’s broader significance and importance are as follows. We are in the information economy. Data of all kinds, such as online interaction, medical sensor data, genomic data, and location data, are being collected. Practical techniques that enable use of these data while protecting individual privacy are crucially needed and will greatly enhance the value of such data. Users will gain from increased control of their private information, and society as a whole will benefit from deriving maximal benefit from aggregated data. The PIs plan to jointly develop and teach a graduate-level course on synthetic data based on existing research in this area as well as research results from this project, and to involve undergraduate students in research.
This project has two thrusts. The first thrust aims to develop new marginal-based DPDS algorithms that improve upon the state of the art in empirical evaluations. The tasks include: performing an in-depth study of the “marginal-to-dataset” problem (how to synthesize a dataset when given a set of marginals); developing and evaluating new approaches for handling numerical attributes; and developing adaptive and automated techniques for selecting marginals so that datasets synthesized with them capture as much useful information from the input dataset as possible. The second thrust complements the empirical research in the first thrust, and aims to develop statistical theory for high-dimensional marginal-based data synthesis algorithms, as well as a general learning theory framework to evaluate the utility of synthetic data in downstream tasks. The two thrusts are highly complementary and support each other. The experimental study in Thrust 1 will provide insights and directions for theoretical studies in Thrust 2, which will help explain the experimental findings as well as guide additional experimental studies. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
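To make the "marginal-to-dataset" idea concrete, here is a minimal Python sketch of the simplest possible case: measure one marginal with noise, then sample synthetic rows from it. It is an illustration only and is not taken from the project. It assumes categorical attributes coded as integers, add-or-remove-one-record neighboring datasets, and the standard Laplace mechanism as the noise source. The function names `noisy_marginal` and `sample_from_marginal` are hypothetical.

```python
import numpy as np


def noisy_marginal(data, cols, domain_sizes, epsilon, rng):
    """Return a Laplace-noised contingency table over the attributes in `cols`.

    `data` is an (n, d) integer array; `domain_sizes[c]` is the number of
    categories of attribute c. All names here are illustrative assumptions.
    """
    shape = tuple(domain_sizes[c] for c in cols)
    counts = np.zeros(shape, dtype=float)
    # Count how many records fall into each cell of the marginal.
    np.add.at(counts, tuple(data[:, c] for c in cols), 1.0)
    # Adding or removing one record changes exactly one cell by 1, so the
    # L1 sensitivity is 1 and Laplace noise of scale 1/epsilon suffices
    # for epsilon-DP under that neighboring definition.
    return counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=shape)


def sample_from_marginal(noisy, n, rng):
    """Post-process a noisy marginal into a distribution and draw n rows."""
    probs = np.clip(noisy, 0.0, None)  # negative noisy counts are set to 0
    total = probs.sum()
    if total <= 0:
        # Degenerate case: fall back to the uniform distribution.
        probs = np.full(noisy.shape, 1.0 / noisy.size)
    else:
        probs = probs / total
    flat = rng.choice(noisy.size, size=n, p=probs.ravel())
    # Map flat cell indices back to one column per attribute.
    return np.column_stack(np.unravel_index(flat, noisy.shape))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    private = rng.integers(0, 3, size=(1000, 2))  # toy stand-in for private data
    table = noisy_marginal(private, [0, 1], [3, 3], epsilon=1.0, rng=rng)
    synthetic = sample_from_marginal(table, 1000, rng)
    print(synthetic[:5])
```

Sampling is pure post-processing of the noisy table, so it consumes no additional privacy budget. Combining many overlapping noisy marginals into one consistent dataset is the harder problem that the first thrust describes, and this sketch does not attempt it.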
Project outputs
Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
FairRR: Pre-Processing for Group Fairness through Randomized Response
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Zeng, Xianli; Ward, Joshua; Cheng, Guang
- Corresponding author: Cheng, Guang
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
- DOI: 10.48550/arxiv.2310.15479
- Published: 2023-10
- Journal:
- Impact factor: 0
- Authors: Namjoon Suh; Xiaofeng Lin; Din-Yin Hsieh; Merhdad Honarkhah; Guang Cheng
- Corresponding authors: Namjoon Suh; Xiaofeng Lin; Din-Yin Hsieh; Merhdad Honarkhah; Guang Cheng
Improving Adversarial Robustness Through the Contrastive-Guided Diffusion Process
- DOI:
- Published: 2022-10
- Journal:
- Impact factor: 0
- Authors: Yidong Ouyang; Liyan Xie; Guang Cheng
- Corresponding authors: Yidong Ouyang; Liyan Xie; Guang Cheng
Sparse confidence sets for normal mean models
- DOI: 10.1093/imaiai/iaad003
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ning, Yang; Cheng, Guang
- Corresponding author: Cheng, Guang
Optimal Convergence Rates of Deep Convolutional Neural Networks: Additive Ridge Functions
- DOI:
- Published: 2022-02
- Journal:
- Impact factor: 0
- Authors: Zhiying Fang; Guang Cheng
- Corresponding authors: Zhiying Fang; Guang Cheng
Other publications by Guang Cheng
PDA-cross-linked beta-cyclodextrin: a novel adsorbent for the removal of BPA and cationic dyes.
- DOI: 10.2166/wst.2020.286
- Published: 2020-06
- Journal:
- Impact factor: 2.7
- Authors: Jianyu Wang; Guang Cheng; Jian Lu; Huafeng Chen; Yanbo Zhou
- Corresponding author: Yanbo Zhou
Identifying Video Resolution from Encrypted QUIC Streams in Segment-combined Transmission Scenarios
- DOI: 10.1145/3651863.3651883
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Yuanjie Zhao; Hua Wu; Liujinhan Chen; Songtao Liu; Guang Cheng; Xiaoyan Hu
- Corresponding author: Xiaoyan Hu
RBAS: A Real-Time User Behavior Analysis System for Internet TV in Cloud Computing
- DOI: 10.1145/2935663.2935664
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: C. Zhu; Guang Cheng; Xiaojun Guo; Yuxiang Wang
- Corresponding author: Yuxiang Wang
Community-base Fault Diagnosis Using Incremental Belief Revision
- DOI: 10.1109/nas.2009.24
- Published: 2009
- Journal:
- Impact factor: 0
- Authors: Yongning Tang; Guang Cheng; Zhiwei Xu; E. Al
- Corresponding author: E. Al
BadGD: A unified data-centric framework to identify gradient descent vulnerabilities
- DOI: 10.48550/arxiv.2405.15979
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: ChiHua Wang; Guang Cheng
- Corresponding author: Guang Cheng
Other grants held by Guang Cheng
Conference: UCLA Synthetic Data Workshop
- Award number: 2309349
- Fiscal year: 2023
- Funding amount: $300,000
- Project type: Standard Grant
I-Corps: Trustworthy Synthetic Data Generation
- Award number: 2317549
- Fiscal year: 2023
- Funding amount: $300,000
- Project type: Standard Grant
Collaborative Research: Nonparametric Bayesian Aggregation for Massive Data
- Award number: 1712907
- Fiscal year: 2017
- Funding amount: $300,000
- Project type: Continuing Grant
Collaborative Research: Semiparametric ODE Models for Complex Gene Regulatory Networks
- Award number: 1418202
- Fiscal year: 2014
- Funding amount: $300,000
- Project type: Standard Grant
CAREER: Bootstrap M-estimation in Semi-Nonparametric Models
- Award number: 1151692
- Fiscal year: 2012
- Funding amount: $300,000
- Project type: Continuing Grant
General Semiparametric Inference via Bootstrap Sampling
- Award number: 0906497
- Fiscal year: 2009
- Funding amount: $300,000
- Project type: Standard Grant
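Similar NSFC (National Natural Science Foundation of China) grants
Research on highly integrated microwave/millimeter-wave antennas supporting two-dimensional millimeter-wave beam scanning
- Award number: 62371263
- Approval year: 2023
- Funding amount: CNY 520,000
- Project type: General Program
Study of tandem Heck/denitrogenative rearrangement reactions of hydrazones
- Award number: 22301211
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Study of synergistic performance regulation and dendrite suppression mechanisms in aqueous zinc-ion batteries
- Award number: 52364038
- Approval year: 2023
- Funding amount: CNY 330,000
- Project type: Regional Science Fund
Pathogenic role and mechanism of TSPYL1 mutations in sudden infant death syndrome, studied with a human serotonergic neuron reporter system
- Award number: 82371176
- Approval year: 2023
- Funding amount: CNY 490,000
- Project type: General Program
Mechanism of FOXO3 m6A methylation-induced trophoblast senescence in the kidney-tonifying treatment of spontaneous miscarriage
- Award number: 82305286
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund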
Similar overseas grants
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
- Award number: 2317232
- Fiscal year: 2024
- Funding amount: $300,000
- Project type: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Using Intelligent Conversational Agents to Empower Adolescents to be Resilient Against Cybergrooming
- Award number: 2330940
- Fiscal year: 2024
- Funding amount: $300,000
- Project type: Continuing Grant
Collaborative Research: NSF-BSF: SaTC: CORE: Small: Detecting malware with machine learning models efficiently and reliably
- Award number: 2338301
- Fiscal year: 2024
- Funding amount: $300,000
- Project type: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
- Award number: 2317233
- Fiscal year: 2024
- Funding amount: $300,000
- Project type: Continuing Grant
Collaborative Research: NSF-BSF: SaTC: CORE: Small: Detecting malware with machine learning models efficiently and reliably
- Award number: 2338302
- Fiscal year: 2024
- Funding amount: $300,000
- Project type: Continuing Grant