Leveraging Structural Information in Regression Tree Ensembles

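Basic Information

  • Award number:
    1712870
  • Principal investigator:
  • Amount:
    $100,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Continuing Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Start and end dates:
    2017-09-01 to 2020-02-29
  • Project status:
    Completed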

Project Abstract

A common task in statistics is prediction; for example, a practitioner may be interested in predicting the presence of a disease given genetic information about an individual. Due to recent advances in data collection, one frequently has access to datasets which contain a massive number of predictors, but with correspondingly few subjects. This setting is generally referred to as the "big P, small n" scenario. Drawing meaningful conclusions under such circumstances is generally impossible unless the underlying data satisfy certain structural assumptions. The simplest such structural assumption is that only a small number of the predictors are relevant; in this setting, finding the useful predictors corresponds to finding a so-called "needle in a haystack." The goal of this project is to construct procedures which adapt to this, and other, structural assumptions. The project will focus on methods based on decision trees, which are flowchart-like structures in which predictions are based on whether the predictors satisfy various rules. Usually an ensemble of decision trees is constructed, and the predictions of the individual trees are averaged. While decision tree ensembles are frequently used with high dimensional data, it is unclear to what extent they adapt to the structural properties of the data. This project will show that, in practice, off-the-shelf decision tree ensembling methods do not adapt to common structural assumptions, and will develop new methods which do. In addition to developing methods with strong theoretical support, this project will support the development of an R package to give practitioners easy access to our methodology.

The PI will develop Bayesian methods for incorporating structural information into tree-based ensemble methods, and establish theoretically the benefit of making use of this additional information. This forms a nonparametric counterpart to the parametric approaches used in linear models, such as the lasso, graphical lasso, or group lasso; Bayesian approaches in the parametric setting include the use of variable selection priors, such as spike-and-slab priors and global-local shrinkage priors. Structural information will be incorporated by modifying the commonly used priors on decision tree ensembles so that the prior is concentrated on models which satisfy the desired structure. The PI will first investigate the theoretical properties of a sparsity inducing prior which is designed to eliminate unnecessary predictors. Sparsity here is obtained by applying a sparsity inducing Dirichlet prior to the a priori probability that a given branch is associated with a given predictor. This prior will be extended to allow for grouped variable selection, in a similar manner to the group lasso, by considering the class of Dirichlet tree priors, and further to accommodate graphical structures in the predictors through sparsity inducing logistic normal priors. Additionally, the PI will develop computationally efficient Markov chain Monte Carlo algorithms to fit the resulting models. Compared to existing methods, these structural priors will be shown to lead to substantial gains in predictive accuracy, and to more accurate scientific discovery.
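The sparsity inducing Dirichlet prior mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the project's implementation: it assumes a symmetric Dirichlet(alpha/P, ..., alpha/P) form for the vector of probabilities that a branch splits on each of P predictors, and the names `draw_split_probabilities` and `count_active_predictors`, the value `alpha=1.0`, and the 0.01 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def draw_split_probabilities(num_predictors, alpha, rng):
    """Draw s ~ Dirichlet(alpha/P, ..., alpha/P) (assumed prior form).

    s[j] is the prior probability that a branch splits on predictor j.
    A small per-coordinate concentration alpha/P pushes most of the
    mass onto a few predictors.
    """
    concentration = np.full(num_predictors, alpha / num_predictors)
    return rng.dirichlet(concentration)


def count_active_predictors(s, threshold=0.01):
    """Count predictors whose split probability exceeds the threshold."""
    return int(np.sum(s > threshold))


rng = np.random.default_rng(0)
P = 1000  # "big P": many candidate predictors

# Draw from the assumed sparse prior.
sparse_s = draw_split_probabilities(P, alpha=1.0, rng=rng)

# Baseline: every predictor equally likely to be chosen for a split.
uniform_s = np.full(P, 1.0 / P)

print("active predictors (sparse prior):", count_active_predictors(sparse_s))
print("active predictors (uniform):     ", count_active_predictors(uniform_s))
print("largest split probability (sparse prior):", sparse_s.max())
```

Under the uniform baseline no predictor exceeds the threshold, whereas a draw from the sparse prior typically places nearly all of its mass on a handful of predictors, which is the behaviour the abstract relies on to eliminate unnecessary predictors.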

Project Outcomes

Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Interaction Detection with Bayesian Decision Tree Ensembles
Bayesian Approaches for Missing Not at Random Outcome Data: The Role of Identifying Restrictions
  • DOI:
    10.1214/17-sts630
  • Publication date:
    2018-05-01
  • Journal:
  • Impact factor:
    5.7
  • Authors:
    Linero, Antonio R.; Daniels, Michael J.
  • Corresponding author:
    Daniels, Michael J.
A Bayesian approach to sequential monitoring of nonlinear profiles using wavelets: Wavelet-Based Bayesian Profile Monitoring
Multi-rubric models for ordinal spatial data with application to online ratings data
  • DOI:
    10.1214/18-aoas1143
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Linero, Antonio R.; Bradley, Jonathan R.; Desai, Apurva
  • Corresponding author:
    Desai, Apurva
Incorporating Grouping Information into Bayesian Decision Tree Ensembles

Other publications by Antonio Linero

Advances in Periodic Difference Equations with Open Problems
  • DOI:
    10.1007/978-3-662-44140-4_6
  • Publication date:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Z. Alsharawi; Jose Cánovas; Antonio Linero
  • Corresponding author:
    Antonio Linero

Other grants by Antonio Linero

CAREER: Foundations for Bayesian Nonparametric Causal Inference
  • Award number:
    2144933
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project category:
    Continuing Grant
Leveraging Structural Information in Regression Tree Ensembles
  • Award number:
    2015636
  • Fiscal year:
    2019
  • Funding amount:
    $100,000
  • Project category:
    Continuing Grant

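Similar grants from the National Natural Science Foundation of China

Research on dynamical correction methods for structural errors in nonlinear models
  • Award number:
    42375059
  • Approval year:
    2023
  • Funding amount:
    CNY 510,000
  • Project category:
    General Program
Water-sensitivity-induced disaster mechanisms of complex structured special soils in high-latitude arid regions, and disaster prevention and control for major engineering projects
  • Award number:
    42330708
  • Approval year:
    2023
  • Funding amount:
    CNY 2,310,000
  • Project category:
    Key Program
In situ cross-linking modification of non-structural carbohydrates in rubberwood and the stepwise protection mechanism
  • Award number:
    32371791
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program
High-throughput preparation of high-performance rare-earth-based magnetocaloric amorphous powder cores and the relationship between structure and properties
  • Award number:
    52301212
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Transmission mechanisms of China's structural monetary policy tools from a heterogeneity perspective
  • Award number:
    72373080
  • Approval year:
    2023
  • Funding amount:
    CNY 400,000
  • Project category:
    General Program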

Similar overseas grants

Leveraging a community-driven approach to address the impact of social determinants of health on structural inequities among Miami-Dade County's intergenerational LGBTQ+ Community
  • Award number:
    10781877
  • Fiscal year:
    2023
  • Funding amount:
    $100,000
  • Project category:
Leveraging Community-Academic Partnerships and Social Networks to Disseminate Vaccine-Related Information and Increase Vaccine Uptake Among Black Individuals with Rheumatic Diseases
  • Award number:
    10442270
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project category:
Leveraging Community-Academic Partnerships and Social Networks to Disseminate Vaccine-Related Information and Increase Vaccine Uptake Among Black Individuals with Rheumatic Diseases
  • Award number:
    10620245
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project category:
Leveraging Structural Information in Regression Tree Ensembles
  • Award number:
    2015636
  • Fiscal year:
    2019
  • Funding amount:
    $100,000
  • Project category:
    Continuing Grant
Leveraging Covariate and Structural Information for Efficient Large-Scale and High-Dimensional Inference
  • Award number:
    1811747
  • Fiscal year:
    2018
  • Funding amount:
    $100,000
  • Project category:
    Standard Grant