Deep Learning and Random Forests for High-Dimensional Regression
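Basic Information
- Award number: 2054808
- Principal investigator:
- Amount: $158,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-09-01 to 2023-07-31
- Project status: Completed
- Source:
- Keywords: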
Project Abstract
This project aims to investigate two of the most widely used and state-of-the-art methods for high-dimensional regression: deep neural networks and random forests. Despite their widespread implementation, pinning down their theoretical properties has eluded researchers until recently. The proposed research aims to add to the growing body of literature on their analysis, by both developing tools of theoretical value and providing guarantees and guidance for practitioners and applied scientists who use these popular methods frequently in their work.

The success of multi-layer networks has largely been buoyed by their ability to generalize well despite being able to fit most datasets, given enough parameters. This phenomenon is particularly striking when the input dimension is far greater than the available sample size, as is the case with many modern applications in molecular biology, medical imaging, and astrophysics, to name a few. A major component of the proposed work will be to obtain complexity bounds for classes of deep neural networks with controls on the size of their weights, which can then be used to bound generalization error and statistical risk. These complexity bounds reveal the role of complexity penalization, which is based on certain norms of the weights of the network. Motivated by these observations, another stream of the proposed research seeks to provide statistical guarantees of certain complexity penalized estimators and their adaptive properties.

Current theoretical results for random forests are either for stylized versions of those that are used in practice or are asymptotic in nature and it is therefore difficult to determine the quality of convergence as a function of the parameters of the random forest. Furthermore, the setting for the analysis of more practical implementations of random forests is limited to structured, fixed-dimensional regression function classes. Given these restrictions, the first component of the proposal aims to investigate how random forests behave in the high-dimensional regime when the number of predictors grows with the sample size. Another research objective is to isolate and study families of flexible high-dimensional regression functions for which finite sample convergence rates can be established. The final endeavor of this project is to connect popular measures of variable importance to the bias of random forests. Since variable importance measures are used for assessing the role each predictor variable plays in influencing the output, this connection will partially explain why random forests are adaptive to sparsity. The relationship will also help to theoretically motivate variable importance measures as useful tools for model interpretability.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
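To make the abstract's point about variable importance and sparsity concrete, here is a minimal sketch and not the project's own method. It assumes a synthetic sparse regression problem in which only feature 0 affects the response, and it scores each feature by the largest variance reduction achievable with a single split (a decision stump). The function names `variance`, `best_stump_impurity_decrease`, and `make_sparse_data` are hypothetical and chosen only for this illustration.

```python
import random


def variance(values):
    """Mean squared deviation from the mean (the regression impurity used by CART)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_stump_impurity_decrease(x, y):
    """Largest impurity decrease from one split of the form x <= threshold."""
    n = len(y)
    parent = variance(y)
    pairs = sorted(zip(x, y))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        # Weighted average impurity of the two child nodes.
        child = (len(left) * variance(left) + len(right) * variance(right)) / n
        best = max(best, parent - child)
    return best


def make_sparse_data(n=200, d=20, seed=0):
    """Synthetic data (an assumption): y depends on feature 0 only, plus noise."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]
    return X, y


X, y = make_sparse_data()
scores = [
    best_stump_impurity_decrease([row[j] for row in X], y)
    for j in range(len(X[0]))
]
top = max(range(len(scores)), key=scores.__getitem__)
print("feature with the largest impurity decrease:", top)
```

In this toy setting the single relevant feature should receive a much larger score than the 19 irrelevant ones, which is the kind of adaptivity to sparsity the abstract refers to. A full random forest averages such impurity decreases over many trees and many splits, so this stump-level score is only a simplified stand-in.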
Project Outcomes
Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing
- DOI: 10.1109/tit.2020.3025272
- Publication date: 2021-01-01
- Journal:
- Impact factor: 2.5
- Authors: Bu, Zhiqi; Klusowski, Jason M.; Su, Weijie J.
- Corresponding author: Su, Weijie J.
Sharp Analysis of a Simple Model for Random Forests
- DOI:
- Publication date: 2018-05
- Journal:
- Impact factor: 0
- Authors: Jason M. Klusowski
- Corresponding author: Jason M. Klusowski
Nonparametric Variable Screening with Optimal Decision Stumps
- DOI:
- Publication date: 2021
- Journal:
- Impact factor: 0
- Authors: Klusowski, Jason M; Tian, Peter
- Corresponding author: Tian, Peter
Sparse Learning with CART
- DOI:
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Klusowski, Jason M
- Corresponding author: Klusowski, Jason M
Large Scale Prediction with Decision Trees
- DOI: 10.1080/01621459.2022.2126782
- Publication date: 2023
- Journal:
- Impact factor: 3.7
- Authors: Klusowski, Jason M.; Tian, Peter M.
- Corresponding author: Tian, Peter M.
Other grants by Jason Klusowski
CAREER: Statistical Learning with Recursive Partitioning: Algorithms, Accuracy, and Applications
- Award number: 2239448
- Fiscal year: 2023
- Award amount: $158,000
- Project type: Continuing Grant
Deep Learning and Random Forests for High-Dimensional Regression
- Award number: 1915932
- Fiscal year: 2019
- Award amount: $158,000
- Project type: Continuing Grant
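Similar NSFC (National Natural Science Foundation of China) grants
Research on fault-sensing methods for smart distribution networks based on random matrix theory and deep learning
- Award number: 62302034
- Year approved: 2023
- Award amount: CNY 300,000
- Project type: Young Scientists Fund
Stochastic second-order optimization algorithms and theory for modern deep learning models
- Award number: 12301398
- Year approved: 2023
- Award amount: CNY 300,000
- Project type: Young Scientists Fund
Research on efficient and scalable deep learning algorithms based on randomization
- Award number: 62376131
- Year approved: 2023
- Award amount: CNY 510,000
- Project type: General Program
Research on the mechanism by which random consistency affects supervised learning, and strategies for addressing it
- Award number: 62306170
- Year approved: 2023
- Award amount: CNY 300,000
- Project type: Young Scientists Fund
Research on dynamic stochastic vehicle routing optimization based on data mining and reinforcement learning
- Award number: 72301109
- Year approved: 2023
- Award amount: CNY 300,000
- Project type: Young Scientists Fund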
Similar overseas grants
EAGER: IMPRESS-U: Random Matrix Theory and its Applications to Deep Learning
- Award number: 2401227
- Fiscal year: 2024
- Award amount: $158,000
- Project type: Standard Grant
DeepMARA - Deep Reinforcement Learning based Massive Random Access Toward Massive Machine-to-Machine Communications
- Award number: EP/Y028252/1
- Fiscal year: 2024
- Award amount: $158,000
- Project type: Fellowship
Project 1: Deciphering the Dynamic Evolution of the Tumor-Neural Interface
- Award number: 10729275
- Fiscal year: 2023
- Award amount: $158,000
- Project type:
Distortion Correction in Functional MRI with Deep Learning
- Award number: 10647991
- Fiscal year: 2023
- Award amount: $158,000
- Project type: