CAREER: Statistical Learning with Recursive Partitioning: Algorithms, Accuracy, and Applications


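Basic Information

  • Award number:
    2239448
  • Principal investigator:
    Jason Klusowski
  • Amount:
    $450,000
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Period:
    2023-06-01 to 2028-05-31
  • Status:
    Ongoing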

Abstract

As data-driven technologies continue to be adopted and deployed in high-stakes decision-making environments, the need for fast, interpretable algorithms has never been greater. Decision trees, a hierarchically organized data structure, are one such candidate, and they are increasingly used to build predictive or causal models. This trend is spurred by the appealing connection between decision trees and rule-based decision-making, particularly in clinical, legal, and business contexts: the tree structure mimics the sequential way a human user may think and reason, thereby facilitating human-machine interaction. To make them fast to compute, decision trees are commonly constructed with an algorithm called recursive partitioning, in which the decision nodes of the tree are learned from the data in a greedy, top-down manner. The overarching goal of this project is to develop a precise understanding of the strengths and limitations of decision trees based on recursive partitioning and, in doing so, to gain insight into how to improve their performance in practice. Beyond this research impact, high-school, undergraduate, and graduate research assistants will be vertically integrated into the project and will benefit both academically and professionally. Innovative curricula, workshops, and data and methods competitions involving students, academics, and industry professionals will facilitate outreach and encourage participation from a broad audience. This proposal aims to provide a comprehensive study of the statistical properties of greedy recursive partitioning algorithms for training decision trees, carried out in two fundamental contexts. The first thrust of the project will develop a theoretical framework for the analysis of oblique decision trees, in which the split at each decision node is made on a linear combination of the covariates rather than, as in conventional axis-aligned splits, on a single covariate.
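The greedy, top-down construction described above can be sketched in a few lines. This is a minimal, illustrative sketch, not the project's code, assuming squared-error loss, a fixed maximum depth as the only stopping rule, and a user-supplied finite set of candidate split directions w, with every split of the form w·x ≤ t. Passing the standard basis vectors gives conventional axis-aligned (CART-style) splits; passing other unit vectors gives oblique splits. All names here are hypothetical, and `random_directions` is a deliberately naive placeholder for the more sophisticated hyperplane-search tools the abstract alludes to.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


@dataclass
class Node:
    value: float                         # leaf prediction: mean response in this cell
    direction: Optional[Vector] = None   # split direction w (None means leaf)
    threshold: float = 0.0               # points with w . x <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the cell mean (0 for an empty cell)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def best_split(X, y, directions) -> Optional[Tuple[Vector, float]]:
    """One greedy step: over every candidate direction and every threshold between
    consecutive distinct projections, pick the split with the smallest total child
    SSE. Returns None if no split strictly reduces the parent's SSE."""
    best, best_err = None, sse(y)
    for w in directions:
        z = [dot(w, row) for row in X]
        order = sorted(range(len(y)), key=lambda i: z[i])
        for k in range(1, len(order)):
            a, b = z[order[k - 1]], z[order[k]]
            if a == b:
                continue  # tied projections cannot be separated
            err = sse([y[i] for i in order[:k]]) + sse([y[i] for i in order[k:]])
            if err < best_err - 1e-12:
                best, best_err = (w, (a + b) / 2), err
    return best


def grow(X, y, directions, max_depth: int) -> Node:
    """Recursive partitioning: split the cell greedily, then recurse on each child."""
    node = Node(value=sum(y) / len(y))
    split = best_split(X, y, directions) if max_depth > 0 else None
    if split is None:
        return node
    w, t = split
    left = [i for i in range(len(y)) if dot(w, X[i]) <= t]
    right = [i for i in range(len(y)) if dot(w, X[i]) > t]
    if not left or not right:
        return node  # guard against a degenerate split caused by rounding
    node.direction, node.threshold = w, t
    node.left = grow([X[i] for i in left], [y[i] for i in left], directions, max_depth - 1)
    node.right = grow([X[i] for i in right], [y[i] for i in right], directions, max_depth - 1)
    return node


def predict(node: Node, x: Sequence[float]) -> float:
    while node.direction is not None:
        node = node.left if dot(node.direction, x) <= node.threshold else node.right
    return node.value


def axis_directions(d: int) -> List[Vector]:
    """Conventional axis-aligned candidates: the d standard basis vectors."""
    return [[1.0 if i == j else 0.0 for i in range(d)] for j in range(d)]


def random_directions(d: int, m: int, seed: int = 0) -> List[Vector]:
    """Hypothetical oblique candidates: m random unit vectors."""
    rng = random.Random(seed)
    out = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        out.append([vi / norm for vi in v])
    return out
```

On a grid in [0, 1]^2 whose response jumps across the diagonal x1 + x2 = 1, a depth-one tree restricted to `axis_directions(2)` cannot fit the training data exactly, whereas a single oblique split along (1, 1)/√2 can. Exhaustive search over a finite direction set is chosen here only for transparency; it does not scale to the combinatorial search the abstract describes.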
While this methodology has garnered significant attention from the computer science and optimization communities since the mid-1980s, the advantages that oblique trees offer over their axis-aligned counterparts remain justified only empirically, and explanations for their success rest largely on heuristics. To fill this long-standing gap between theory and practice, the PI will investigate how oblique regression trees, constructed by recursively minimizing squared error, can adapt to a rich class of regression models consisting of linear combinations of ridge functions. This provides a quantitative baseline for statisticians to compare and contrast decision trees with other, less interpretable methods that target similar model forms, such as projection pursuit regression and neural networks. Crucially, to address the combinatorial complexity of finding the optimal splitting hyperplane at each decision node, the PI's framework can accommodate many existing computational tools from the literature. A major component of the research derives from connections between recursive partitioning and sequential greedy approximation algorithms for convex optimization problems (e.g., orthogonal greedy algorithms). The second thrust focuses on the delicate pointwise properties of axis-aligned recursive partitioning, with implications for heterogeneous causal effect estimation, where accurate pointwise estimates over the entire support of the covariates are essential for valid inference (e.g., testing hypotheses and constructing confidence intervals). Motivated by simple settings in which decision trees provably fail to achieve optimal performance, the PI will investigate how the signal-to-noise ratio affects the quality of pointwise estimation.
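The abstract connects recursive partitioning to sequential greedy approximation algorithms such as the orthogonal greedy algorithm (OGA). For readers unfamiliar with the OGA, the following is a generic textbook sketch over a finite dictionary of vectors in R^n (the same procedure is often called orthogonal matching pursuit). It does not reproduce the specific correspondence between trees and greedy algorithms that the project studies; one might, hypothetically, think of dictionary elements as functions attached to candidate splits, but that mapping is an assumption here. The function names and the tolerance `tol` are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthogonal_greedy(
    y: Sequence[float],
    dictionary: Sequence[Sequence[float]],
    steps: int,
    tol: float = 1e-12,
) -> Tuple[List[int], List[float]]:
    """Orthogonal greedy algorithm over a finite dictionary of vectors in R^n.

    Each step picks the unused element g maximizing |<r, g>| / ||g|| for the
    current residual r, then refits y by least squares on all selected
    elements (via Gram-Schmidt). Returns (selected indices, final residual).
    """
    residual = [float(v) for v in y]
    basis: List[List[float]] = []  # orthonormal basis for the span of selected elements
    selected: List[int] = []
    for _ in range(steps):
        best_j, best_score = None, tol
        for j, g in enumerate(dictionary):
            norm = math.sqrt(dot(g, g))
            if j in selected or norm == 0.0:
                continue
            score = abs(dot(residual, g)) / norm
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break  # residual is (numerically) orthogonal to every remaining element
        # Gram-Schmidt: strip the components of the new element along the current basis.
        q = [float(v) for v in dictionary[best_j]]
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        q_norm = math.sqrt(dot(q, q))
        if q_norm <= tol:
            break  # new element already lies in the span of the selected ones
        q = [qi / q_norm for qi in q]
        basis.append(q)
        selected.append(best_j)
        # The residual is orthogonal to the old span, so subtracting <r, q> q gives
        # y minus its least-squares projection onto the enlarged span.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return selected, residual
```

For y = (2, 0, 3, 3) and the dictionary {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1)}, the first step selects the third element (normalized correlation 6/√2 ≈ 4.24 versus 2 for the first), the second step selects the first, and the residual is then zero up to rounding. The "orthogonal" refit after every selection is what distinguishes the OGA from a pure greedy algorithm, which only adjusts the coefficient of the newly chosen element.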
While the focus is on estimating causal effects directly with decision trees, the PI will also investigate implications for multistep semiparametric settings, in which preliminary unknown functions (e.g., propensity scores) are estimated with machine learning tools, as well as for conditional quantile regression; both require estimators with high pointwise accuracy.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Jason Klusowski

Deep Learning and Random Forests for High-Dimensional Regression
  • Award number:
    2054808
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Award type:
    Continuing Grant
Deep Learning and Random Forests for High-Dimensional Regression
  • Award number:
    1915932
  • Fiscal year:
    2019
  • Amount:
    $450,000
  • Award type:
    Continuing Grant

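Similar NSFC (National Natural Science Foundation of China) Grants

Deep Statistical Learning: Theoretical Foundations and Model Design
  • Award number:
    62376028
  • Year approved:
    2023
  • Amount:
    CNY 500,000
  • Award type:
    General Program
Distributed Learning of Structured Models: Complexity, Privacy, and Statistical Inference
  • Award number:
    12371291
  • Year approved:
    2023
  • Amount:
    CNY 440,000
  • Award type:
    General Program
Zero-Empirical-Risk Memorization Learning under the Principle of Complete Statistical Learning
  • Award number:
    62366035
  • Year approved:
    2023
  • Amount:
    CNY 310,000
  • Award type:
    Regional Science Fund Program
Statistical and Optimization Research on Efficient Machine Learning for High-Dimensional and Streaming Data
  • Award number:
    72371172
  • Year approved:
    2023
  • Amount:
    CNY 410,000
  • Award type:
    General Program
Intelligent Fault Diagnosis of Key Components in Electric Vehicle Drivetrains Based on Active Statistical Transfer Learning
  • Award number:
    52305109
  • Year approved:
    2023
  • Amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund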

Similar Overseas Grants

CAREER: New Frameworks for Ethical Statistical Learning: Algorithmic Fairness and Privacy
  • Award number:
    2340241
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Award type:
    Continuing Grant
Identifying and Addressing the Effects of Social Media Use on Young Adults' E-Cigarette Use: A Solutions-Oriented Approach
  • Award number:
    10525098
  • Fiscal year:
    2023
  • Amount:
    $450,000
Characterizing the genetic etiology of delayed puberty with integrative genomic techniques
  • Award number:
    10663605
  • Fiscal year:
    2023
  • Amount:
    $450,000
Toward measures and behavioral trials for effective online AUD recovery support
  • Award number:
    10643056
  • Fiscal year:
    2023
  • Amount:
    $450,000
PUFA metabolism for prevention and treatment of TMD pain: an interdisciplinary, translational approach.
  • Award number:
    10820840
  • Fiscal year:
    2023
  • Amount:
    $450,000