Advanced Theory and Methods for Evaluating the Utility and Privacy Risks of Synthetic Health Data

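Basic information

  • Grant number:
    RGPIN-2022-04811
  • Principal investigator:
  • Amount:
    $17,500
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Project period:
    2022-01-01 to 2023-12-31
  • Project status:
    Completed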

Project abstract

Access to health data for secondary purposes remains a challenge because of privacy concerns. Synthetic data generation (SDG) has been proposed to enable data sharing that is believed to have low identification risks because there is no one-to-one mapping to real individuals. However, if the generative models used to generate synthetic data are overfit, or if a dataset is categorical with a small number of possible combinations of values, then real records may be generated.

The adoption of SDG will also depend on demonstrating the utility of the generated data. Utility is broadly defined as the ability to replicate, on synthetic data, the conclusions from the analysis of real data. SDG needs to optimize privacy and utility simultaneously. However, thus far SDG loss functions have largely focused on maximizing utility, and privacy risks are often assessed after the data are generated.

The purpose of this program is to develop a unified privacy framework for SDG, and to evaluate and improve current utility metrics. These results will then be used to define and test a combined loss metric that can be applied to optimize the generation of synthetic data, allowing privacy and utility to be managed simultaneously.

Privacy Evaluation

Our focus in this program will be on identity disclosure conditional on attribute disclosure and membership disclosure. We will develop and validate a unified risk model that integrates identity, attribute, and membership disclosure. Currently there are no privacy models that are directly applicable to longitudinal synthetic datasets. The unified model of disclosure above will be extended to longitudinal data with multiple heterogeneous events per patient. Existing approaches used in the disclosure control literature will be incorporated into the synthetic data privacy model.
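
The concern that an overfit generator, or a small categorical domain, can reproduce real records can be illustrated with a simple check. The following is only a minimal sketch under stated assumptions, not the unified risk model the program proposes: the function name `replicated_fraction`, the tuple record layout, and the toy data are all hypothetical, and exact matching is just one crude indicator of replication.

```python
from typing import Hashable, Sequence, Tuple

# Assumption: each record is a tuple of categorical values (hypothetical layout).
Record = Tuple[Hashable, ...]


def replicated_fraction(real: Sequence[Record], synthetic: Sequence[Record]) -> float:
    """Return the fraction of synthetic records that exactly equal some real record."""
    if not synthetic:
        return 0.0
    real_set = set(real)  # set membership makes each lookup O(1) on average
    matches = sum(1 for record in synthetic if record in real_set)
    return matches / len(synthetic)


# Toy, made-up data: (sex, age band, diagnosis)
real = [
    ("F", "30-39", "asthma"),
    ("M", "40-49", "diabetes"),
    ("F", "50-59", "none"),
]
synthetic = [
    ("F", "30-39", "asthma"),    # identical to a real record
    ("M", "30-39", "none"),
    ("F", "50-59", "none"),      # identical to a real record
    ("M", "40-49", "asthma"),
]

print(replicated_fraction(real, synthetic))  # 0.5
```

A high value on a dataset with few possible value combinations does not by itself show a privacy breach, since matches can arise by chance; it only flags that real records are being generated.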

Utility Evaluation

Utility metrics can serve multiple purposes, such as model optimization and synthetic dataset evaluation to accept or reject specific generated datasets. In this part of the program, current utility metrics will be empirically evaluated. The results will clarify which utility metrics are useful for optimization and which for accepting or rejecting synthesized datasets. There is currently a dearth of work on evaluating the utility of synthetic longitudinal data. Simple approaches such as concordance between k-order Markov chains capture some structural properties, but do not provide measures related to analytic workloads. This program of research will extend and evaluate the utility metrics for longitudinal data.

Risk-Utility Optimization

With appropriately defined privacy and utility metrics, a combined risk-utility measure can be defined and used as an optimization criterion for SDG algorithms. This will ensure that generated synthetic data satisfy both criteria by construction. Such a measure will be evaluated on common SDG algorithms used on health data.
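
The k-order Markov chain concordance mentioned under utility evaluation can be sketched roughly as follows. The abstract does not say how concordance is computed, so this is an assumption: each patient's event sequence contributes k-order transition counts, and concordance is taken to be 1 minus the total variation distance between real and synthetic transition distributions, weighted by how often each context occurs in the real data. The function names and the toy event sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def transition_counts(
    sequences: Sequence[Sequence[str]], k: int
) -> Dict[Tuple[str, ...], Counter]:
    """Count next-event occurrences for every length-k context."""
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts


def markov_concordance(
    real: Sequence[Sequence[str]], synthetic: Sequence[Sequence[str]], k: int = 1
) -> float:
    """Assumed definition: 1 - weighted total variation distance (1.0 = identical)."""
    real_counts = transition_counts(real, k)
    synth_counts = transition_counts(synthetic, k)
    total = sum(sum(c.values()) for c in real_counts.values())
    if total == 0:
        raise ValueError("real sequences are too short for the chosen order k")
    distance = 0.0
    for context, r_next in real_counts.items():
        r_total = sum(r_next.values())
        s_next = synth_counts.get(context, Counter())
        s_total = sum(s_next.values())
        if s_total == 0:
            tv = 1.0  # context never appears in the synthetic data
        else:
            events = set(r_next) | set(s_next)
            tv = 0.5 * sum(
                abs(r_next[e] / r_total - s_next[e] / s_total) for e in events
            )
        distance += (r_total / total) * tv
    return 1.0 - distance


# Toy, made-up longitudinal event sequences (one list per patient)
real_seqs = [["visit", "lab", "rx"], ["visit", "rx"]]
synth_seqs = [["visit", "lab", "rx"], ["visit", "lab"]]

print(markov_concordance(real_seqs, real_seqs))   # identical data gives 1.0
print(markov_concordance(real_seqs, synth_seqs))  # lower when transitions differ
```

As the abstract notes, a score of this kind reflects structural similarity of event orderings only; it says nothing about whether a specific analysis run on the synthetic data would reach the same conclusions.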

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants held by ElEmam, Khaled

Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2020
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2019
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2018
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2017
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2016
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Metrics and methods for the de-identification of health information
  • Grant number:
    186936-2011
  • Fiscal year:
    2015
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Metrics and methods for the de-identification of health information
  • Grant number:
    186936-2011
  • Fiscal year:
    2014
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual
Electronic Health Information
  • Grant number:
    1000216983-2009
  • Fiscal year:
    2014
  • Funding amount:
    $17,500
  • Project category:
    Canada Research Chairs
Electronic Health Information
  • Grant number:
    1000216983-2009
  • Fiscal year:
    2013
  • Funding amount:
    $17,500
  • Project category:
    Canada Research Chairs

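Similar National Natural Science Foundation of China (NSFC) grants

Theory and methods for hybrid physics- and data-driven multimodal visual inspection of complex curved surfaces
  • Grant number:
    52375516
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program
Theory, methods, and applications for evaluating the efficiency of medical consortium systems, accounting for heterogeneity, interactivity, and hierarchy
  • Grant number:
    72371232
  • Approval year:
    2023
  • Funding amount:
    CNY 410,000
  • Project category:
    General Program
Theory and methods for resilient, adaptive mobile information networks for emergency communications
  • Grant number:
    62341103
  • Approval year:
    2023
  • Funding amount:
    CNY 1,500,000
  • Project category:
    Special Fund Program
Theory and methods for intelligent control of offshore wind power engineering vessels under preset trajectory constraints
  • Grant number:
    52301417
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Research on theory and methods of perceptual coding for immersive video with six-degrees-of-freedom interaction
  • Grant number:
    62371081
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project category:
    General Program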

Similar overseas grants

Designing an Ethnodrama Intervention Addressing PrEP Stigma Toward Young Women
  • Grant number:
    10755777
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Project category:
Using Momentary Measures to Understand Physical Activity Adoption and Maintenance among Pacific Islanders in the United States
  • Grant number:
    10737528
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Project category:
Optimizing Use of Advanced Diabetes Technology for Self-Management in Adolescents with Type 1 Diabetes: Integration of Real-Time Glucose and Narrative Data
  • Grant number:
    10569293
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Project category:
Mathematical modeling and molecular imaging to maximize response while minimizing toxicities from systemic therapies in preclinical models of breast cancer
  • Grant number:
    10564905
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Project category:
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Project category:
    Discovery Grants Program - Individual