CAREER: Statistical Learning from a Modern Perspective: Over-parameterization, Regularization, and Generalization
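Basic information
- Award number: 2143215
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-09-01 to 2027-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract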
Statistical methods have been a major driving force towards interpretable, actionable, and trustworthy machine learning. However, the existing statistical theory remains highly inadequate in explaining many new phenomena that emerge, and become pervasive in modern machine learning applications. For instance, the prevalence of over-parameterized models (i.e., the ones that have more model parameters than samples) challenges our classical statistical insights about the bias-variance tradeoff; the fact that many learning algorithms exhibit favorable algorithmic regularization to alleviate overfitting is largely beyond the reach of previous statistical literature, and the unconventional shapes of the risk curves in modern applications puzzle many statisticians. Compared to the rich theory developed for classical settings, however, the statistical underpinnings for these curious yet mysterious phenomena remain far from sufficient. Motivated by this, the overarching goal of the project is to enrich the statistical foundation of machine learning by adapting it to contemporary settings, thereby bridging classical statistics and cutting-edge machine learning. In addition, the project will provide valuable opportunities for training students (particularly underrepresented groups) at all levels across multiple disciplines in the STEM field, and will exert scientific and societal impacts on several domains beyond the tasks described herein, including but not limited to neuroscience, online education, and equitable machine learning.
Striving for interpretability and actionable insights, this project plans to revisit multiple classical statistical problems---ranging from minimum-norm interpolation, risk estimation, cross validation, kernel boosting, data-imbalanced classification, to transfer learning---with an emphasis on unveiling new insights for modern yet under-explored regimes.
Several recurring themes include: (i) characterizing precise risk behavior in the face of large model complexity; (ii) reconciling the seemingly conflicting goals of over-parameterization and regularization; (iii) developing algorithm-specific statistical reasoning tools; and (iv) exploring the interplay between regularization and generalization. The project comprises three distinct yet related thrusts: (1) statistical insights for over-parameterization: which explores the prolific interplay between model complexity and out-of-sample performance; (2) algorithmic regularization via early stopping: which aims to develop statistical principles that underlie early stopping; (3) risk (non)-monotonicity with imbalanced data: which is motivated by the non-monotonicity of generalization errors in the sample size and pursues principled debiasing methods to rectify it. The project will develop a suite of statistical insights that can inform cutting-edge machine learning practice, as well as an array of statistical methodologies that will be practically appealing for modern data-driven applications.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Lasso with general Gaussian designs with applications to hypothesis testing
- DOI: 10.1214/23-aos2327
- Published: 2020-07
- Journal:
- Impact factor: 0
- Authors: Michael Celentano; A. Montanari; Yuting Wei
- Corresponding authors: Michael Celentano; A. Montanari; Yuting Wei
Softmax policy gradient methods can take exponential time to converge
- DOI: 10.1007/s10107-022-01920-6
- Published: 2021-02
- Journal:
- Impact factor: 2.7
- Authors: Gen Li; Yuting Wei; Yuejie Chi; Yuantao Gu; Yuxin Chen
- Corresponding authors: Gen Li; Yuting Wei; Yuejie Chi; Yuantao Gu; Yuxin Chen
Derandomizing Knockoffs
- DOI: 10.1080/01621459.2021.1962720
- Published: 2020-12
- Journal:
- Impact factor: 3.7
- Authors: Zhimei Ren; Yuting Wei; E. Candès
- Corresponding authors: Zhimei Ren; Yuting Wei; E. Candès
Other publications by Yuting Wei
Nomogram model on estimating the risk of pressure injuries for hospitalized patients in the intensive care unit.
- DOI: 10.1016/j.iccn.2023.103566
- Published: 2023
- Journal:
- Impact factor: 5.3
- Authors: Lin Han; Yuting Wei; Juhong Pei; Hongyan Zhang; Lin Lv; Hongxia Tao; Qiuxia Yang; Qian Su; Yuxia Ma
- Corresponding author: Yuxia Ma
A flexible PEO-based polymer electrolyte with cross-linked network for high-voltage all solid-state lithium-ion battery
- DOI: 10.1016/j.jmst.2023.10.005
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Nian Wang; Yuting Wei; Shuang Yu; Wenchao Zhang; Xiaoyu Huang; Binbin Fan; Hua Yuan; Yeqiang Tan
- Corresponding author: Yeqiang Tan
Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model
- DOI: 10.48550/arxiv.2208.10458
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Gen Li; Yuejie Chi; Yuting Wei; Yuxin Chen
- Corresponding author: Yuxin Chen
Stabilized finite element methods for miscible displacement in porous media
- DOI: 10.1051/m2an/1994280506111
- Published: 1994
- Journal:
- Impact factor: 0
- Authors: Yuting Wei
- Corresponding author: Yuting Wei
Wheel Loader Duty Cycle Test and Numerical Expression Research
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Naiwei Zou; Yuting Wei; Dashuai Zhou; Yongfeng Miao
- Corresponding author: Yongfeng Miao
Other grants held by Yuting Wei
Collaborative Research: Fine-Grained Statistical Inference in High Dimension: Actionable Information, Bias Reduction, and Optimality
- Award number: 2147546
- Fiscal year: 2022
- Amount: $400,000
- Project type: Continuing Grant
Collaborative Research: Fine-Grained Statistical Inference in High Dimension: Actionable Information, Bias Reduction, and Optimality
- Award number: 2015447
- Fiscal year: 2020
- Amount: $400,000
- Project type: Continuing Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
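Intelligent fault diagnosis of key components in electric vehicle drivetrains based on active statistical transfer learning
- Award number: 52305109
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Deep statistical learning: theoretical foundations and model design
- Award number: 62376028
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program
Zero-empirical-risk memorization learning under the complete statistical learning principle
- Award number: 62366035
- Approval year: 2023
- Amount: CNY 310,000
- Project type: Regional Science Fund
Privacy-preserving statistical analysis and machine learning methods for healthcare data
- Award number: 62372425
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program
Evolutionary statistical learning methods with interpretable semantic coupling
- Award number:
- Approval year: 2022
- Amount: CNY 530,000
- Project type: General Program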
Similar overseas grants
CAREER: New Frameworks for Ethical Statistical Learning: Algorithmic Fairness and Privacy
- Award number: 2340241
- Fiscal year: 2024
- Amount: $400,000
- Project type: Continuing Grant
Identifying and Addressing the Effects of Social Media Use on Young Adults' E-Cigarette Use: A Solutions-Oriented Approach
- Award number: 10525098
- Fiscal year: 2023
- Amount: $400,000
- Project type:
Assessing Native Hawaiian and Pacific Islander Maternal Outcomes and Health Care Experiences
- Award number: 10644888
- Fiscal year: 2023
- Amount: $400,000
- Project type:
Integrative Data Science Approach to Advance Care Coordination of ADRD by Primary Care Providers
- Award number: 10722568
- Fiscal year: 2023
- Amount: $400,000
- Project type:
Digital monitoring of autonomic activity to detect empathy loss in behavioral variant frontotemporal dementia
- Award number: 10722938
- Fiscal year: 2023
- Amount: $400,000
- Project type: