Collaborative Research: Use of Random Compression Matrices For Scalable Inference in High Dimensional Structured Regressions

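Basic information

  • Award number:
    2210672
  • Principal investigator:
  • Amount:
    $180,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-06-15 to 2025-05-31
  • Project status:
    Not yet concluded

Project abstract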

As the scientific community moves into a data-driven era, there is an unprecedented opportunity to leverage large-scale imaging, genetic and EHR data to better characterize and understand human disease and so improve treatment and prognosis. Consequently, analysis of such datasets with flexible statistical models has become an enormously active area of research over the last decade. To this end, this project plans to develop a completely new class of methods based on the idea of fitting statistical models to datasets obtained by compressing big data with a well-designed mechanism. The development enables efficient modeling of massive data on an unprecedented scale. While the motivation of the investigators comes primarily from complex modeling and uncertainty quantification of massive biomedical data, the statistical methods are general enough to make important contributions to the related literature in machine learning and environmental sciences. The overarching goal also includes the development of software toolkits to better serve practitioners in related disciplines. Further, the project will provide first-hand training opportunities for graduate and undergraduate students, including women and students from minority communities, in state-of-the-art statistical methodologies and imaging/genetic/EHR data. By disseminating the outcome of the project among high school students in terminology that they can understand, the project can have far-reaching effects in enhancing public scientific literacy about statistics.

Two crucial aspects of modern statistical learning approaches in the era of complex and high dimensional data are accuracy and scale in inference. Modern data are increasingly complex and high dimensional, involving a large number of variables and a large sample size, with complex relationships between different variables. Developing practically efficient (in terms of storage and analysis) and theoretically “optimal” Bayesian high dimensional parametric or nonparametric regression methods to draw accurate inference with valid uncertainties from such complex datasets is an extremely important problem. To offer a general solution to this problem, the investigators will develop approaches based on data compression using a small number of random linear transformations. The approach either reduces the large number of records corresponding to each variable using compression, in which case it maintains feature interpretation for adequate inference, or reduces the dimension of the covariate vector for each sample using compression, in which case the focus is only on prediction of the response. In either case, data compression facilitates storage-efficient, scalable and accurate Bayesian inference/prediction in the presence of high dimensional data with sufficiently rich parametric and nonparametric regression models. An important goal is to establish precise theoretical results on the convergence behavior of the models fitted to compressed data as a function of the number of predictors, the sample size, the properties of the random linear transformations and the features of these models. The approaches will be used to study neurological disorders by combining brain imaging data, genetic data and electronic health records (EHR) data from the UK Biobank database. The project will also contribute on a broader front to advancing interdisciplinary research training and broadening participation in the statistical sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
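
The two compression modes described in the abstract can be illustrated with a minimal sketch. This is not the investigators' method or software: it assumes Gaussian random matrices and a conjugate normal linear model with known variances, and every name in it (compress_rows, compress_columns, posterior_mean, Phi, Psi, tau2, sigma2) is hypothetical and chosen only for illustration.

```python
import numpy as np

def compress_rows(X, y, m, rng):
    """Compress n records down to m < n rows with a random matrix Phi (m x n).

    The columns of X are untouched, so each coefficient still refers to an
    original feature (illustrative assumption: i.i.d. Gaussian entries).
    """
    n = X.shape[0]
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return Phi @ X, Phi @ y

def compress_columns(X, m, rng):
    """Project each p-dimensional covariate vector to m < p dimensions.

    The compressed features are random mixtures of the originals, so this
    mode targets prediction rather than feature interpretation.
    """
    p = X.shape[1]
    Psi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, p))
    return X @ Psi.T, Psi

def posterior_mean(X, y, tau2=1.0, sigma2=1.0):
    """Posterior mean of beta for y ~ N(X beta, sigma2 I), beta ~ N(0, tau2 I)."""
    d = X.shape[1]
    A = X.T @ X / sigma2 + np.eye(d) / tau2
    return np.linalg.solve(A, X.T @ y / sigma2)

# Simulated data (hypothetical sizes, for illustration only).
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

# Mode 1: fewer records, same features.
Xr, yr = compress_rows(X, y, m=400, rng=rng)
beta_hat = posterior_mean(Xr, yr)

# Mode 2: same records, fewer (compressed) features; used for prediction.
Xc, Psi = compress_columns(X, m=10, rng=rng)
gamma_hat = posterior_mean(Xc, y)
y_pred = Xc @ gamma_hat
```

In the first mode the fitted model works with a 400 x 50 design matrix instead of 2000 x 50; in the second it works with 2000 x 10. Both reductions shrink the storage and the linear algebra needed for the posterior computation, which is the trade-off the abstract describes.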

Project outputs

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Distributed Bayesian Inference in Massive Spatial Data
  • DOI:
    10.1214/22-sts868
  • Publication date:
    2023-01
  • Journal:
  • Impact factor:
    5.7
  • Authors:
    Rajarshi Guhaniyogi;Cheng Li;T. Savitsky;Sanvesh Srivastava
  • Corresponding authors:
    Rajarshi Guhaniyogi;Cheng Li;T. Savitsky;Sanvesh Srivastava

Other publications by Rajarshi Guhaniyogi

Bayesian Conditional Density Filtering
Bayesian nonparametric areal wombling for small‐scale maps with an application to urinary bladder cancer data from Connecticut
  • DOI:
    10.1002/sim.7408
  • Publication date:
    2017
  • Journal:
  • Impact factor:
    2
  • Authors:
    Rajarshi Guhaniyogi
  • Corresponding author:
    Rajarshi Guhaniyogi
Approximated Bayesian Inference for Massive Streaming Data
  • DOI:
  • Publication date:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rajarshi Guhaniyogi;R. Willett;D. Dunson
  • Corresponding author:
    D. Dunson
InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bowen Lei;Rajarshi Guhaniyogi;Krishnendu Chandra;Aaron Scheffler;Bani Mallick
  • Corresponding author:
    Bani Mallick
Multivariate bias adjusted tapered predictive process models
  • DOI:
  • Publication date:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rajarshi Guhaniyogi
  • Corresponding author:
    Rajarshi Guhaniyogi

Other grants held by Rajarshi Guhaniyogi

Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
  • Award number:
    2220840
  • Fiscal year:
    2021
  • Amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
  • Award number:
    1854662
  • Fiscal year:
    2019
  • Amount:
    $180,000
  • Project type:
    Standard Grant

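Similar grants from the National Natural Science Foundation of China (NSFC)

Studying DNA knots using single-molecule magnetic tweezers
  • Award number:
    12374216
  • Approval year:
    2023
  • Amount:
    530,000 CNY
  • Project type:
    General Program
Compound effects of the internal features of open spaces on public-life behavior and the mechanisms of user perception
  • Award number:
    52308052
  • Approval year:
    2023
  • Amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Trajectory prediction methods for vulnerable road users jointly driven by spatiotemporal and social interactions
  • Award number:
    52302501
  • Approval year:
    2023
  • Amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Studying coronal heating in solar active regions using three-dimensional radiative magnetohydrodynamic simulations
  • Award number:
    12373054
  • Approval year:
    2023
  • Amount:
    520,000 CNY
  • Project type:
    General Program
A wastewater-based epidemiology study of the use of cancer chemotherapy drugs
  • Award number:
    42307534
  • Approval year:
    2023
  • Amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund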

Similar grants from overseas funders

Collaborative Research: NCS-FR: DEJA-VU: Design of Joint 3D Solid-State Learning Machines for Various Cognitive Use-Cases
  • Award number:
    2319619
  • Fiscal year:
    2023
  • Amount:
    $180,000
  • Project type:
    Continuing Grant
Collaborative Research: BoCP-Design US-Sao Paulo: Land use change, ecosystem resilience and zoonotic spillover risk
  • Award number:
    2225023
  • Fiscal year:
    2023
  • Amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: BoCP-Design US-Sao Paulo: Land use change, ecosystem resilience and zoonotic spillover risk
  • Award number:
    2225022
  • Fiscal year:
    2023
  • Amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: CAS-Climate: Linking Activities, Expenditures and Energy Use into an Integrated Systems Model to Understand and Predict Energy Futures
  • Award number:
    2243099
  • Fiscal year:
    2023
  • Amount:
    $180,000
  • Project type:
    Standard Grant
J-RISE: Relevant Implementation Strategies to Eliminate the social and structural barriers to HIV services among Justice-involved Black men who have sex with men
  • Award number:
    10744578
  • Fiscal year:
    2023
  • Amount:
    $180,000
  • Project type: