CAREER: New data integration approaches for efficient and robust meta-estimation, model fusion and transfer learning

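Basic Information

  • Award number:
    2337943
  • Principal investigator:
  • Amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Project period:
    2024-06-01 to 2029-05-31
  • Project status:
    Ongoing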

Project Abstract

Statistical science aims to learn about natural phenomena by drawing generalizable conclusions from an aggregate of similar experimental observations. With the recent “Big Data” and “Open Science” revolutions, scientists have shifted their focus from aggregating individual observations to aggregating massive publicly available datasets. This endeavor rests on the hope that combining information from multiple datasets will make findings more robust and generalizable. For example, combining data on rare disease outcomes across the United States can paint a more reliable picture than basing conclusions only on a small number of cases in one hospital. Similarly, combining data on disease risk factors across the United States can distinguish local from national health trends. To date, statistical approaches to these data aggregation objectives have been restricted to simple settings of limited practical utility. In response to this gap, this project develops new methods for aggregating information from multiple datasets in three distinct data integration problems grounded in scientific practice. The developed approaches are intuitive, principled and robust to substantial differences between datasets, and are broadly applicable in the medical, economic and social sciences, among others. In particular, the project will deliver new tools to extract health insights from large electronic health records databases. The project will support undergraduate and graduate student training, course development, and the recruitment and professional mentoring of under-represented minorities in statistics. Further, the project will impact STEM education through a data science teacher training program in underserved communities.
This project develops intuitive, principled, robust and efficient methods in three essential data integration problems: meta-analysis, model fusion and transfer learning.
First, the project delivers a set of meta-analysis methods for privacy-preserving one-shot estimation and inference using a new notion of dataset similarity. The primary novelty of the approach is the joint estimation of both dataset-specific parameters and a combined parameter that bears some similarity to the classic meta-estimator. Second, the project establishes model fusion methods that learn the clustering of similar datasets. Their unique feature is a model fusion that dials data integration along a spectrum from more to less fusion, and therefore does not force the model parameters of clustered datasets to be exactly equal. Third, the project develops flexible and robust transfer learning approaches that leverage historical information to improve statistical efficiency in a target dataset of interest. An important element of these approaches is a flexible specification of the type of models fit to the source datasets. All three sets of methods place a premium on interpretability, statistical efficiency and robustness of the inferential output. The project unifies the three sets of proposed methods under a formal data integration framework formulated around two axioms of data integration. Data integration ideas pervade every field of scientific study in which data are collected, so the research contributes to scientific endeavors in the medical, economic and social sciences, among others.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
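The abstract does not state the project's estimators, so the Python sketch below only illustrates, under stated assumptions, two textbook ideas it alludes to. A common form of the "classic meta-estimator" is the fixed-effect inverse-variance weighted average, computed here by `inverse_variance_pool` from per-site summaries alone, in one shot and without sharing raw data. `shrink_toward_pooled` is a hypothetical quadratic-penalty shrinkage whose tuning parameter `lam` dials between no fusion (`lam = 0`, each dataset keeps its own estimate) and full fusion (large `lam`, every dataset collapses to the pooled value). The names `StudySummary`, `inverse_variance_pool`, `shrink_toward_pooled` and `lam`, and the toy inputs, are illustrative assumptions; this is not the method proposed in the project.

```python
"""Illustrative sketch: classic inverse-variance meta-estimation plus a
tunable shrinkage "fusion dial". Hypothetical names; not the project's method."""
from dataclasses import dataclass


@dataclass
class StudySummary:
    """Summary statistics one site shares instead of its raw data."""
    estimate: float  # dataset-specific point estimate
    variance: float  # estimated sampling variance of that estimate (> 0)


def inverse_variance_pool(studies: list[StudySummary]) -> tuple[float, float]:
    """Fixed-effect meta-estimator: average the estimates with weights 1/variance.

    Returns (pooled_estimate, pooled_variance). Only the shared summaries are
    used, so the combination is computed in a single pass ("one shot").
    """
    if not studies:
        raise ValueError("need at least one study")
    if any(s.variance <= 0 for s in studies):
        raise ValueError("variances must be positive")
    weights = [1.0 / s.variance for s in studies]
    total = sum(weights)
    pooled = sum(w * s.estimate for w, s in zip(weights, studies)) / total
    return pooled, 1.0 / total


def shrink_toward_pooled(studies: list[StudySummary], lam: float) -> list[float]:
    """Hypothetical fusion dial between per-dataset and pooled estimates.

    For each study, returns the minimiser of
        (theta - estimate)**2 / variance + lam * (theta - pooled)**2,
    i.e. (estimate / variance + lam * pooled) / (1 / variance + lam).
    lam = 0 keeps every dataset's own estimate (no fusion); as lam grows,
    every adjusted estimate approaches the pooled value (full fusion).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    pooled, _ = inverse_variance_pool(studies)
    adjusted = []
    for s in studies:
        precision = 1.0 / s.variance
        adjusted.append((precision * s.estimate + lam * pooled) / (precision + lam))
    return adjusted


if __name__ == "__main__":
    # Three made-up sites, for illustration only.
    sites = [StudySummary(1.0, 0.5), StudySummary(2.0, 0.25), StudySummary(4.0, 1.0)]
    pooled, pooled_var = inverse_variance_pool(sites)
    print(pooled)  # 2.0
    print(shrink_toward_pooled(sites, 0.0))  # [1.0, 2.0, 4.0]
    print(shrink_toward_pooled(sites, 2.0))  # [1.5, 2.0, 2.6666666666666665]
```

Replacing `pooled` with an estimate fitted on a source dataset turns the same closed form into a simple shrink-toward-source flavor of transfer learning. Both uses are generic constructions shown for intuition, not the dataset-similarity, clustering, or transfer methods the project develops.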

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Emily Hector

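Similar National Natural Science Foundation of China (NSFC) grants

Prediction of convalescent-phase lung function in COVID-19 patients using heterogeneous data and graph neural networks
  • Award number:
    82302313
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Construction of a multi-omics data fusion model to predict the efficacy of neoadjuvant immunotherapy for colorectal cancer
  • Award number:
    82373431
  • Approval year:
    2023
  • Funding amount:
    CNY 480,000
  • Project type:
    General Program
Risk stratification and prognosis of COVID-19 based on multimodal data fusion and spatiotemporal feature modeling
  • Award number:
    82371958
  • Approval year:
    2023
  • Funding amount:
    CNY 480,000
  • Project type:
    General Program
Changes in urban residents' mobility patterns before and after the COVID-19 pandemic, based on mobile phone signaling big data
  • Award number:
    42301210
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Identification of novel genes affecting tail length in sheep by integrating genome-wide association analysis and transcriptome data
  • Award number:
    32360823
  • Approval year:
    2023
  • Funding amount:
    CNY 320,000
  • Project type:
    Regional Science Fund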

Similar overseas grants

Causes and Downstream Effects of 14-3-3 Phosphorylation in Synucleinopathies
  • Award number:
    10606132
  • Fiscal year:
    2024
  • Funding amount:
    $450,000
  • Project type:
CAREER: New Frontiers of Private Learning and Synthetic Data
  • Award number:
    2339775
  • Fiscal year:
    2024
  • Funding amount:
    $450,000
  • Project type:
    Continuing Grant
Bilirubin Catabolism induces Plasminogen-Activator Inhibitor 1 (PAI-1) worsening Metabolic Dysfunction
  • Award number:
    10750132
  • Fiscal year:
    2024
  • Funding amount:
    $450,000
  • Project type:
Virtual drug screen reveals context-dependent inhibition of cardiomyocyte hypertrophy
  • Award number:
    10678351
  • Fiscal year:
    2023
  • Funding amount:
    $450,000
  • Project type:
2023 Atherosclerosis
  • Award number:
    10675221
  • Fiscal year:
    2023
  • Funding amount:
    $450,000
  • Project type: