CAREER: Statistical Inference in High Dimensions using Variational Approximations
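Basic information
- Award number: 2239234
- Principal investigator:
- Amount: $434.1K
- Institution:
- Institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-07-01 to 2028-06-30
- Status: Ongoing
- Source:
- Keywords:
Project abstract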
Modern data applications routinely involve massive datasets comprising a multitude of observations and features. To facilitate statistical learning in real time, there is an urgent need for principled and computationally efficient statistical methodology. Variational Inference methods have recently emerged as a popular choice in this context. The term "Variational Inference" refers to a general out-of-the-box strategy to develop statistical algorithms for a wide class of problems. For example, these algorithms are used as a sub-routine in text mining, generation of hyper-realistic artificial text and images, machine translation, etc. This approach is extremely attractive due to the computational efficiency of the proposed methods, and their superior practical performance. Despite these advantages, rigorous guarantees for these variational methods are still in a nascent state. This project will develop statistical guarantees for the validity of this approach in diverse settings. Subsequently, these new insights will be exploited to develop novel statistical methodology for modern data applications. The outcome of the proposed research will allow practitioners to deploy Variational Inference methods with confidence. In addition, the outcomes will add a new set of principled, computationally efficient methods to the statistician's toolkit. The PI will interweave his research and teaching throughout the research period and beyond. In particular, the PI will develop new undergraduate/graduate courses focusing on Variational Inference and mentor students (particularly those from under-represented backgrounds) with the aim of introducing them to opportunities in statistics and data science. 
The proposed research and educational activities will broaden participation in STEM generally, and encourage careers in statistics and data science.
This project will study statistical inference based on variational approximations, focusing on three concrete thrusts: (i) statistical inference based on the Naive Mean Field (NMF) approximation for regression models, (ii) the NMF approximation beyond regression, and (iii) advanced mean-field approximations. Under theme (i), the PI will develop empirical Bayes methodology for the high-dimensional linear model, and compare Bayesian variable selection algorithms using the NMF approximation. Theme (ii) will focus on the NMF approximation for Hidden Markov Random Fields and Bayesian Neural Networks. Finally, theme (iii) will focus on certain alternative mean-field approximations. Physicists conjecture that if the numbers of data points and features are both large and comparable, the NMF approximation is no longer accurate; instead, the Thouless-Anderson-Palmer (TAP) approximation, an advanced mean-field approximation, should facilitate Bayes optimal inference. The proposed research will establish this conjecture in the context of high-dimensional linear regression under a proportional asymptotic regime. The theoretical foundations of the proposed methodology will rest on disparate ideas originating in non-linear large deviations (studied in probability and combinatorics), spin glasses (studied in probability and statistical physics) and graphical models. In turn, these ideas will be combined with classical statistical ideas (e.g., nonparametric maximum likelihood) to develop computationally efficient methods for high-dimensional inference. This cross-pollination of ideas will generate independent follow-up research directions in each domain.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
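To make theme (i) of the abstract concrete, the following is a minimal sketch of a Naive Mean Field approximation for Bayesian linear regression. It is an illustration only, not the project's methodology: it assumes a Gaussian prior on the coefficients and a known noise variance (neither is specified in the abstract), chosen because the coordinate updates are then available in closed form. The function names `nmf_cavi_linear` and `exact_posterior` and the parameters `sigma2`, `tau2` and `n_sweeps` are hypothetical.

```python
import numpy as np


def nmf_cavi_linear(X, y, sigma2=1.0, tau2=1.0, n_sweeps=200):
    """Coordinate-ascent NMF for y = X @ beta + noise.

    Assumed model (illustrative): noise ~ N(0, sigma2 * I),
    prior beta_j ~ N(0, tau2) independently.
    The variational family is a product of N(m_j, s2_j) factors.
    """
    n, p = X.shape
    m = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    # Under this model the optimal factor variances do not depend on m.
    s2 = 1.0 / (col_sq / sigma2 + 1.0 / tau2)
    resid = y - X @ m
    for _ in range(n_sweeps):
        for j in range(p):
            # Add back coordinate j so resid equals y minus the other columns' fit.
            resid = resid + X[:, j] * m[j]
            m[j] = s2[j] * (X[:, j] @ resid) / sigma2
            resid = resid - X[:, j] * m[j]
    return m, s2


def exact_posterior(X, y, sigma2=1.0, tau2=1.0):
    """Exact Gaussian posterior mean and covariance for the same model."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma2
    return mean, cov


# Small synthetic example (simulated data, not from the project).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta_true = rng.normal(size=5)
y = X @ beta_true + rng.normal(size=50)

m_nmf, s2_nmf = nmf_cavi_linear(X, y)
mean_exact, cov_exact = exact_posterior(X, y)
print("max |NMF mean - exact mean|:", np.max(np.abs(m_nmf - mean_exact)))
print("NMF variances:  ", s2_nmf)
print("exact variances:", np.diag(cov_exact))
```

In this Gaussian toy case the NMF means agree with the exact posterior means, while the NMF variances are never larger than the exact marginal variances. The sketch says nothing about the empirical Bayes or variable selection priors the project targets, nor about the proportional regime in which the abstract reports that NMF is conjectured to fail.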
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Subhabrata Sen
Tuning Potency of Bioactive Molecules via Polymorphic Modifications: A Case Study.
- DOI:
- Published: 2022
- Journal:
- Impact factor: 4.9
- Authors: Anil Kumar; Jyoti Chauhan; K. Dubey; Subhabrata Sen; P. Munshi
- Corresponding author: P. Munshi
Synthesis of privileged scaffolds by using diversity-oriented synthesis.
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Ramu Surakanti; Sumalatha Sanivarapu; Chiranjeevi Thulluri; P. Iyer; Raghuram S. Tangirala; R. Gundla; Uma Addepally; Y. Murthy; Lakshmi Velide; Subhabrata Sen
- Corresponding author: Subhabrata Sen
Synthesis of tetrahydro-1H-indolo[2,3-b]pyrrolo[3,2-c]quinolones via intramolecular oxidative ring rearrangement of tetrahydro-β-carbolines and their biological evaluation
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: C. Bathula; C. Roma; J. Chauhan; A. R. Fernandes; Subhabrata Sen
- Corresponding author: Subhabrata Sen
Random linear estimation with rotationally-invariant designs: Asymptotics at high temperature
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Yufan Li; Z. Fan; Subhabrata Sen; Yihong Wu
- Corresponding author: Yihong Wu
Network Measures for Chemical Library Design
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: N. Sukumar; Michael P. Krein; G. Prabhu; S. Bhattacharya; Subhabrata Sen
- Corresponding author: Subhabrata Sen
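Similar NSFC (National Natural Science Foundation of China) grants
Non-stationary channel estimation and interference suppression based on structured statistical inference for extremely large aperture multi-antenna systems
- Award number:
- Award year: 2021
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund
Research on logic-enhanced distributed knowledge representation for intelligent reasoning
- Award number: 61876223
- Award year: 2018
- Funding amount: CNY 650,000
- Award type: General Program
Research on learning and inference of unknown relations in large-scale evolving heterogeneous information networks
- Award number: 61876183
- Award year: 2018
- Funding amount: CNY 620,000
- Award type: General Program
Research on statistics-based type inference methods
- Award number: 61872272
- Award year: 2018
- Funding amount: CNY 630,000
- Award type: General Program
Research on multi-layer feature learning and inference techniques based on probabilistic statistical models
- Award number: 61771361
- Award year: 2017
- Funding amount: CNY 620,000
- Award type: General Program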
Similar overseas grants
CAREER: Statistical foundations of particle tracking and trajectory inference
- Award number: 2339829
- Fiscal year: 2024
- Funding amount: $434.1K
- Award type: Continuing Grant
CAREER: Statistical Inference in Observational Studies -- Theory, Methods, and Beyond
- Award number: 2338760
- Fiscal year: 2024
- Funding amount: $434.1K
- Award type: Continuing Grant
CAREER: Distribution-Free and Adaptive Statistical Inference
- Award number: 2338464
- Fiscal year: 2024
- Funding amount: $434.1K
- Award type: Continuing Grant
CAREER: Towards Tight Guarantees of Markov Chain Sampling Algorithms in High Dimensional Statistical Inference
- Award number: 2237322
- Fiscal year: 2023
- Funding amount: $434.1K
- Award type: Continuing Grant
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
- Award number: 2347760
- Fiscal year: 2023
- Funding amount: $434.1K
- Award type: Continuing Grant