Advanced Theory and Methods for Evaluating the Utility and Privacy Risks of Synthetic Health Data

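Basic information

  • Grant number:
    RGPIN-2022-04811
  • Principal investigator:
  • Amount:
    $17,500
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Period:
    2022-01-01 to 2023-12-31
  • Status:
    Completed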

Project abstract

Access to health data for secondary purposes remains a challenge because of privacy concerns. Synthetic data generation (SDG) has been proposed to enable data sharing that is believed to have low identification risks because there is no one-to-one mapping to real individuals. However, if the generative models used to generate synthetic data are overfit, or if a dataset is categorical with a small number of possible combinations of values, then real records may be generated.

The adoption of SDG will also depend on demonstrating the utility of the generated data. Utility is broadly defined as the ability to replicate the conclusions from the analysis of real data on synthetic data. SDG needs to simultaneously optimize on privacy and utility. However, thus far SDG loss functions have largely been focused on maximizing utility, and privacy risks are often assessed after the data are generated.

The purpose of this program is to develop a unified privacy framework for SDG, and to evaluate and improve current utility metrics. These results would then be used to define and test a combined loss metric that can be applied to optimize the generation of synthetic data which allows for the simultaneous management of privacy and utility.

Privacy Evaluation

Our focus in this program will be on identity disclosure conditional on attribute disclosure and membership disclosure. We will develop and validate a unified risk model that integrates identity, attribute, and membership disclosure. Currently there are no privacy models that are directly applicable to longitudinal synthetic datasets. The unified model of disclosure above will be extended to longitudinal data with multiple heterogeneous events per patient. Existing approaches used in the disclosure control literature will be incorporated into the synthetic data privacy model.

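
The concern that an overfit generator, or a small categorical value space, may reproduce real records can be illustrated with a simple check. The sketch below is only an illustration and is not the unified risk model the program proposes: it counts the share of synthetic rows that exactly match some real row. The function name `replicated_fraction` and the toy records are hypothetical.

```python
def replicated_fraction(real_rows, synthetic_rows):
    """Fraction of synthetic rows that are exact copies of some real row."""
    if not synthetic_rows:
        return 0.0
    # Tuples are hashable, so exact-match lookups are set membership tests.
    real_set = {tuple(row) for row in real_rows}
    hits = sum(1 for row in synthetic_rows if tuple(row) in real_set)
    return hits / len(synthetic_rows)


# Toy categorical records (sex, age band, diagnosis); values are made up.
real = [
    ("F", "40-49", "diabetes"),
    ("M", "30-39", "asthma"),
    ("F", "60-69", "copd"),
]
synthetic = [
    ("F", "40-49", "diabetes"),  # exact copy of a real record
    ("M", "50-59", "asthma"),
    ("F", "60-69", "copd"),      # exact copy of a real record
    ("M", "30-39", "copd"),
]

print(replicated_fraction(real, synthetic))
```

An exact match is not by itself a disclosure, since common value combinations can be generated by chance; the check only flags where a closer look at identity or membership disclosure may be needed.
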
Utility Evaluation

Utility metrics can serve multiple purposes such as model optimization and synthetic dataset evaluation to accept or reject specific generated datasets. In this part of the program, current utility metrics will be empirically evaluated. The results will clarify which utility metrics are useful for optimization, and synthesized dataset acceptance/rejection. Currently, there has been a dearth of work on evaluating the utility of synthetic longitudinal data. Simple approaches such as concordance between k-order Markov chains capture some structural properties, but do not provide measures related to analytic workloads. This program of research will extend and evaluate the utility metrics for longitudinal data.

Risk-Utility Optimization

With appropriately defined privacy and utility metrics, a combined risk-utility measure can be defined and used as an optimization criterion for SDG algorithms. This will ensure that generated synthetic data satisfy both criteria by construction. Such a measure will be evaluated on common SDG algorithms used on health data.
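
The "concordance between k-order Markov chains" mentioned above can be read in several ways; the abstract does not specify one. The sketch below assumes one simple reading: compare the relative frequencies of every window of k past events plus the next event in the real and synthetic sequences, and report 1 minus the total variation distance. The function names and the toy event sequences are hypothetical, and the score compares joint window frequencies rather than conditional transition probabilities.

```python
from collections import Counter


def window_distribution(sequences, k):
    """Relative frequency of each window of k past events plus the next event."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[tuple(seq[i:i + k + 1])] += 1
    total = sum(counts.values())
    return {window: c / total for window, c in counts.items()}


def markov_concordance(real_seqs, synth_seqs, k=1):
    """1 minus the total variation distance between the window distributions.

    1.0 means the distributions are identical, 0.0 means they do not overlap.
    """
    p = window_distribution(real_seqs, k)
    q = window_distribution(synth_seqs, k)
    if not p or not q:
        raise ValueError("each dataset needs a sequence longer than k events")
    windows = set(p) | set(q)
    tvd = 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in windows)
    return 1.0 - tvd


# Toy per-patient event sequences; event names are made up.
real_seqs = [["visit", "lab", "rx"], ["visit", "rx"]]
synth_seqs = [["visit", "lab", "rx"], ["visit", "lab"]]

print(round(markov_concordance(real_seqs, synth_seqs, k=1), 3))
```

As the abstract notes, a score of this kind captures sequence structure only; it says nothing about whether an analysis run on the synthetic data would reach the same conclusions as one run on the real data.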

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by ElEmam, Khaled

Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2021
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2020
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2019
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2018
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2017
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2016
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Metrics and methods for the de-identification of health information
  • Grant number:
    186936-2011
  • Fiscal year:
    2015
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Metrics and methods for the de-identification of health information
  • Grant number:
    186936-2011
  • Fiscal year:
    2014
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Electronic Health Information
  • Grant number:
    1000216983-2009
  • Fiscal year:
    2014
  • Amount:
    $17,500
  • Program:
    Canada Research Chairs
Electronic Health Information
  • Grant number:
    1000216983-2009
  • Fiscal year:
    2013
  • Amount:
    $17,500
  • Program:
    Canada Research Chairs

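Similar National Natural Science Foundation of China (NSFC) grants

Theory and methods of trans-esophageal conformal thermophysical therapy based on coordinated thermal, electrical, and mechanical regulation
  • Grant number:
    52306105
  • Approval year:
    2023
  • Amount:
    300,000 yuan
  • Program:
    Young Scientists Fund
Theory and methods of perceptual coding of immersive video for six-degree-of-freedom interaction
  • Grant number:
    62371081
  • Approval year:
    2023
  • Amount:
    490,000 yuan
  • Program:
    General Program
Theory and methods of multimodal visual inspection of complex curved surfaces driven jointly by physics and data
  • Grant number:
    52375516
  • Approval year:
    2023
  • Amount:
    500,000 yuan
  • Program:
    General Program
Theory and methods of sparse-model compression of light-field video under high-dimensional structural constraints
  • Grant number:
    62371278
  • Approval year:
    2023
  • Amount:
    500,000 yuan
  • Program:
    General Program
Measurement and fusion methods for incomplete information based on evidence theory
  • Grant number:
    62301439
  • Approval year:
    2023
  • Amount:
    300,000 yuan
  • Program:
    Young Scientists Fund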

Similar overseas grants

Designing an Ethnodrama Intervention Addressing PrEP Stigma Toward Young Women
  • Grant number:
    10755777
  • Fiscal year:
    2023
  • Amount:
    $17,500
  • Program:
Using Momentary Measures to Understand Physical Activity Adoption and Maintenance among Pacific Islanders in the United States
  • Grant number:
    10737528
  • Fiscal year:
    2023
  • Amount:
    $17,500
  • Program:
Optimizing Use of Advanced Diabetes Technology for Self-Management in Adolescents with Type 1 Diabetes: Integration of Real-Time Glucose and Narrative Data
  • Grant number:
    10569293
  • Fiscal year:
    2023
  • Amount:
    $17,500
  • Program:
Mathematical modeling and molecular imaging to maximize response while minimizing toxicities from systemic therapies in preclinical models of breast cancer
  • Grant number:
    10564905
  • Fiscal year:
    2022
  • Amount:
    $17,500
  • Program:
Advanced theory and methods for the de-identification of small cohorts, complex and composed health data
  • Grant number:
    RGPIN-2016-06781
  • Fiscal year:
    2021
  • Amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual