A novel platform for synthetic generation and statistical obfuscation of tabular clinical data, simulated images, and machine-generated text
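Basic information
- Grant number: 10696488
- Principal investigator:
- Amount: $324,600
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-15 to 2024-09-14
- Project status: Completed
- Source: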
- Keywords: Address; Algorithms; Automobile Driving; Back; Behavior; Biomedical Research; Businesses; Clinical; Clinical Data; Complement; Complex; Data; Data Protection; Data Set; Disclosure; Education; Equilibrium; Fast Healthcare Interoperability Resources; Generations; Goals; Hares; Health; Health Care Research; Health Care Sector; Health Insurance Portability and Accountability Act; Healthcare; Image; Industry; Infrastructure; Institution; Legal patent; Magnetic Resonance Imaging; Masks; Medical; Modeling; Nurses; Occupations; Patient Care; Phase; Positron-Emission Tomography; Privacy; Process; Production; Protocols documentation; Records; Regulation; Reporting; Research; Risk; Secure; Services; Small Business Technology Transfer Research; Societies; Socioeconomic Status; Statutes and Laws; Structure; Sum; Techniques; Technology; Testing; Text; Time; big biomedical data; clinical imaging; cost effective; data anonymization; data de-identification; data exchange; data format; data repository; data sharing; desensitization; design; electronic health data; electronic structure; flexibility; innovation; interest; interoperability; non-compliance; novel; predictive modeling; privacy protection; software as a service; structured data; tool; unstructured data; web app; web services
PROJECT SUMMARY
Data is a critical and highly valuable commodity, driving meaningful change in our society, especially when it pertains to patient care and biomedical research. Currently, institutions pay inordinate sums to increase, regain, and complement their data panels. As an extra burden, data legislation and privacy protection regulations introduce barriers to forming effective partnerships between business, clinical, research, and educational organizations. As a result, approximately 80% of medical data today cannot be readily shared because it contains personal, protected, or sensitive information, and it remains unstructured and untapped after it is created. There is a growing and urgent unmet need for technology solutions that balance the interests of research and commercial organizations by supporting flexible general-purpose analytics while guaranteeing privacy protection.
There are no effective mechanisms to enable data sharing without either risking inappropriate release of sensitive information or potentially degrading the information content. The few protocols and algorithms currently available for modeling, processing, interrogating, and ultimately sharing large sensitive data (e.g., thousands to millions of records with thousands of heterogeneous features) all share significant limitations, and their practical use still lags behind research progress. Two major gaps in the data sharing industry are (i) the inability to return de-identified clones of the raw data, and (ii) the lack of the scalability required for production deployments. GrayRain, LLC is an early-stage Software-as-a-Service company developing a novel platform for statistical obfuscation and de-identification of sensitive structured information (numerical and categorical tabular data) and unstructured information (e.g., clinical text, doctors' and nurses' notes, and clinical images such as MRI and PET). The core of GrayRain's technology is the novel patented statistical obfuscation algorithm, DataSifter. The technology proposed in this STTR Phase I application will significantly increase the number of secure data transactions in the healthcare sector and beyond, enabling data sharing with fully controllable risk of identification of any sensitive information, including, but not limited to, PHI (personal health information), demographic information, or socioeconomic status. GrayRain's technology is able to produce de-identified clones of raw tabular data, addressing a major limitation encountered across existing data anonymization protocols. With regard to scalability, the main goal of this STTR Phase I is to establish the feasibility of using GrayRain's technology to accurately and efficiently (that is, scalably) de-identify and share large-scale complex EHR data repositories with a controlled risk of disclosing protected or personal health information.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Ronak Shetty
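Similar National Natural Science Foundation of China (NSFC) grants
Research on identification of targeted-drug sensitivity biomarkers from tumor pathology images and associated statistical algorithms
- Grant number: 82304250
- Year approved: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Research on deepfake detection algorithms driven by multimodal high-level semantics
- Grant number: 62306090
- Year approved: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Research on high-accuracy remote sensing algorithms for sea surface albedo
- Grant number: 42376173
- Year approved: 2023
- Funding amount: 510,000 CNY
- Project category: General Program
Identifying the molecular mechanisms of non-coding-region splicing mutations in amyotrophic lateral sclerosis using novel deep learning algorithms and multi-omics research strategies
- Grant number: 82371878
- Year approved: 2023
- Funding amount: 490,000 CNY
- Project category: General Program
Research on precise segmentation algorithms for cardiac MR images based on deep learning and level set methods
- Grant number: 62371156
- Year approved: 2023
- Funding amount: 500,000 CNY
- Project category: General Program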
Similar overseas grants
Restoring Dexterous Hand Function with Artificial Neural Network-Based Brain-Computer Interfaces
- Grant number: 10680206
- Fiscal year: 2023
- Funding amount: $324,600
- Project category:
A Novel Algorithm to Identify People with Undiagnosed Alzheimer's Disease and Related Dementias
- Grant number: 10696912
- Fiscal year: 2023
- Funding amount: $324,600
- Project category:
Connecting Latinos en Pareja: A Couples-based HIV Prevention Intervention for Latino Male Couples
- Grant number: 10706860
- Fiscal year: 2023
- Funding amount: $324,600
- Project category:
Digital Twin Neighborhoods for Research on Place-Based Health Inequalities in Mid-Life
- Grant number: 10583781
- Fiscal year: 2023
- Funding amount: $324,600
- Project category:
An acquisition and analysis pipeline for integrating MRI and neuropathology in TBI-related dementia and VCID
- Grant number: 10810913
- Fiscal year: 2023
- Funding amount: $324,600
- Project category: