CAREER: Distribution-Free and Adaptive Statistical Inference

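Basic information

  • Award number:
    2338464
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Duration:
    2024-01-15 to 2028-12-31
  • Project status:
    Ongoing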

Project abstract

Recent years have witnessed a growing trend across scientific disciplines to embrace complex modeling and black-box machine learning algorithms. Despite the remarkable success of these tools in handling complex data structures and fitting sophisticated regression functions, there remains a substantial gap in integrating rigorous statistical principles into these pipelines. The main difficulty is achieving reliable uncertainty quantification and robust statistical inference without artificially simplifying the complexity inherent in these advanced tools. Most existing frameworks that aim to bridge the gap rely on strong assumptions under which the machine learning algorithm can accurately estimate the data-generating distribution. These assumptions are often hard to justify, especially for modern machine learning algorithms that have yet to be fully understood. This research project aims to develop new frameworks for statistical inference that wrap around any machine learning algorithm or complex model without concern for its failure modes. The resulting methods can address the potential threats to inferential validity caused by black-box machine learning algorithms in a wide range of applied fields, including medicine, healthcare, economics, political science, epidemiology, and climate sciences. Open-source software will also be developed to help applied researchers integrate rigorous statistical inference into their domain-specific modeling workflows without compromising the effectiveness of modern tools in non-inferential tasks. This may further alleviate hesitation in adopting modern machine learning methods and catalyze collaboration between scientific and engineering fields. Throughout the project, the PI will mentor undergraduate and graduate students, equipping them with a solid understanding of statistical principles so that they can become future leaders in the face of rapidly evolving machine learning techniques.
This proposal will focus on distribution-free inference, which is immune to misspecification of parametric models, violation of nonparametric assumptions such as smoothness or shape constraints, and inaccuracy of asymptotic approximations due to limited sample size, high dimensionality, boundary cases, or irregularity. To avoid making uninformative decisions, an ideal distribution-free inference framework should also be adaptive to good modeling: when the model is good, it should be as efficient as frameworks that rely on distributional assumptions. Adaptivity alleviates the tradeoff between robustness and efficiency. The PI will develop distribution-free and adaptive inference frameworks for three specific problems. First, in causal inference, tighter identified sets can be obtained for partially identified causal effects by incorporating pre-treatment covariates. However, existing frameworks for sharp inference require estimating conditional distributions of potential outcomes given covariates. The PI will develop a generic framework based on duality theory that can wrap around any estimates of conditional distributions and make distribution-free and adaptive inference. Second, many target parameters in medicine, political economy, and causal inference can be formulated through extrema of the conditional expectation of an outcome given covariates. In contrast to classical methods that impose distributional assumptions to enable consistent estimation of the conditional expectation, the PI will develop a distribution-free framework for testing statistical null hypotheses and constructing valid confidence intervals on the extrema directly. Finally, the use of complex models and prediction algorithms in time series nowcasting and forecasting presents challenges for reliable uncertainty quantification. To address this, the PI will develop a framework based on model predictive control and conformal prediction that can wrap around any forecasting algorithm and calibrate it to achieve long-term coverage, without any assumptions on the distribution of the time series. The ultimate goal of this research is to bring insights and present a suite of tools to empower statistical reasoning with machine learning and augment machine learning with statistical reasoning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
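
As a rough illustration of what long-term coverage without distributional assumptions can mean for forecasting, the sketch below wraps a naive last-value forecaster with a simple online quantile-tracking update from the conformal prediction literature. This is not the model-predictive-control framework the abstract proposes. The function names, the simulated series, and the parameters `alpha`, `eta`, and `q0` are all assumptions made for illustration.

```python
import math
import random


def online_conformal_intervals(y, forecasts, alpha=0.1, eta=0.5, q0=0.0):
    """Build intervals forecast +/- q, updating q online (hypothetical helper).

    After each step q grows by eta * (1 - alpha) on a miss and shrinks by
    eta * alpha on a hit, which steers the long-run miss rate toward alpha.
    """
    q = q0
    out = []
    for y_t, f_t in zip(y, forecasts):
        lower, upper = f_t - q, f_t + q
        covered = lower <= y_t <= upper
        out.append((lower, upper, covered))
        err = 0.0 if covered else 1.0
        q += eta * (err - alpha)
    return out


def simulate(T=5000, seed=0):
    """Simulate a nonstationary series and naive last-value forecasts."""
    rng = random.Random(seed)
    y, forecasts = [], []
    prev = 0.0
    for t in range(T):
        # The noise scale triples halfway through, so the series is not i.i.d.
        scale = 1.0 + 2.0 * (t > T // 2)
        value = math.sin(t / 50.0) + rng.gauss(0.0, scale)
        forecasts.append(prev)  # forecast = previous observation
        y.append(value)
        prev = value
    return y, forecasts


y, forecasts = simulate()
results = online_conformal_intervals(y, forecasts, alpha=0.1)
coverage = sum(c for _, _, c in results) / len(results)
print(f"empirical coverage: {coverage:.3f}")
```

Because the half-width `q` rises after each miss and shrinks after each hit, the long-run miscoverage rate is pulled toward `alpha` whenever the forecast errors are bounded, regardless of how the series is generated or how poor the forecaster is.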

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Lihua Lei

Total Variation Floodgate for Variable Importance Inference in Classification
  • DOI:
  • Published:
    2023-09-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Wenshuo Wang; Lucas Janson; Lihua Lei; Aaditya Ramdas
  • Corresponding author:
    Aaditya Ramdas
Fabrication and measurement of traceable pitch standard with a big area at trans-scale
  • DOI:
    10.1088/1674-1056/23/9/090601
  • Published:
    2014-07-23
  • Journal:
  • Impact factor:
    1.7
  • Authors:
    Xiao Deng; Tongbao Li; Lihua Lei; Yan Ma; Rui Ma; Weng Junjing; Yuan Li
  • Corresponding author:
    Yuan Li
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
  • DOI:
  • Published:
    2021-10-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    Anastasios Nikolas Angelopoulos; Stephen Bates; E. Candès; Michael I. Jordan; Lihua Lei
  • Corresponding author:
    Lihua Lei
Inference for Synthetic Controls via Refined Placebo Tests
  • DOI:
    10.3982/ecta20720
  • Published:
    2024-01-13
  • Journal:
  • Impact factor:
    0
  • Authors:
    Lihua Lei; Timothy Sudijono
  • Corresponding author:
    Timothy Sudijono
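Functional coefficient moving average model and its application to forecasting China's CPI
  • DOI:
  • Published:
    2016
  • Journal:
  • Impact factor:
    1.4
  • Authors:
    Song Xi Chen; Lihua Lei; Yundong Tu
  • Corresponding author:
    Yundong Tu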

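Similar NSFC (National Natural Science Foundation of China) grants

Carbon allocation strategies of typical plants in the treeline ecotone on the northern slope of the Tianshan Mountains
  • Award number:
    32360367
  • Year approved:
    2023
  • Amount:
    340,000 CNY
  • Project type:
    Regional Science Fund Project
Allocation rules for cooperative games with graph and hypergraph restricted structures and their applications in social networks
  • Award number:
    72371151
  • Year approved:
    2023
  • Amount:
    390,000 CNY
  • Project type:
    General Program
Strategies of monoecious plants for adjusting sex allocation and seed size to adapt to pollen limitation
  • Award number:
    32371560
  • Year approved:
    2023
  • Amount:
    500,000 CNY
  • Project type:
    General Program
Resource pooling and allocation under fairness considerations
  • Award number:
    72301240
  • Year approved:
    2023
  • Amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Task allocation methods for cooperative multi-point visits by multiple surface vessels and multiple AUVs under complex temporal constraints
  • Award number:
    62373255
  • Year approved:
    2023
  • Amount:
    500,000 CNY
  • Project type:
    General Program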

Similar overseas grants

CAREER: Geometric and Combinatorial Methods for Distribution-Free Inference and Dependent Network Data
  • Award number:
    2046393
  • Fiscal year:
    2021
  • Amount:
    $400,000
  • Project type:
    Continuing Grant
Oxysterols in SLOS Neurodevelopment: Pathological Role and Therapy
  • Award number:
    10206211
  • Fiscal year:
    2017
  • Amount:
    $400,000
  • Project type:
Oxysterols in SLOS Neurodevelopment: Pathological Role and Therapy
  • Award number:
    9363788
  • Fiscal year:
    2017
  • Amount:
    $400,000
  • Project type:
INVESTIGATION OF THE VASCULAR RESPONSE IN LYMPH NODE METASTASES
  • Award number:
    8783581
  • Fiscal year:
    2015
  • Amount:
    $400,000
  • Project type:
Quantitative MRSI for the prediction of response to chemoradiation therapy in GBM
  • Award number:
    8591009
  • Fiscal year:
    2014
  • Amount:
    $400,000
  • Project type: