Leveraging Structural Information in Regression Tree Ensembles
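Basic information
- Award number: 1712870
- Principal investigator:
- Amount: $100,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2017
- Funding country: United States
- Period: 2017-09-01 to 2020-02-29
- Project status: Completed
- Source:
- Keywords:
Project abstract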
A common task in statistics is prediction; for example, a practitioner may be interested in predicting the presence of a disease given genetic information about an individual. Due to recent advances in data collection, one frequently has access to datasets that contain a massive number of predictors but correspondingly few subjects. This setting is generally referred to as the "big P, small n" scenario. Drawing meaningful conclusions under such circumstances is generally impossible unless the underlying data satisfy certain structural assumptions. The simplest such structural assumption is that only a small number of the predictors are relevant; in this setting, finding the useful predictors corresponds to finding a so-called "needle in a haystack." The goal of this project is to construct procedures which adapt to this, and other, structural assumptions. The project will focus on methods based on decision trees, which are flowchart-like structures in which predictions are based on whether the predictors satisfy various rules. Usually an ensemble of decision trees is constructed, and the predictions of the individual trees are averaged. While decision tree ensembles are frequently used with high-dimensional data, it is unclear to what extent they adapt to the structural properties of the data. This project will show that, in practice, off-the-shelf decision tree ensembling methods do not adapt to common structural assumptions, and will develop new methods which do. In addition to developing methods with strong theoretical support, this project will support the development of an R package to give practitioners easy access to our methodology. The PI will develop Bayesian methods for incorporating structural information into tree-based ensemble methods, and establish theoretically the benefit of making use of this additional information.
This forms a nonparametric counterpart to the parametric approaches used in linear models, such as the lasso, graphical lasso, or group lasso; Bayesian approaches in the parametric setting include the use of variable selection priors, such as spike-and-slab priors and global-local shrinkage priors. Structural information will be incorporated by modifying the commonly used priors on decision tree ensembles so that the prior is concentrated on models which satisfy the desired structure. The PI will first investigate the theoretical properties of a sparsity-inducing prior which is designed to eliminate unnecessary predictors. Sparsity here is obtained by applying a sparsity-inducing Dirichlet prior to the a priori probability that a given branch is associated with a given predictor. This prior will be extended to allow for grouped variable selection, in a similar manner to the group lasso, by considering the class of Dirichlet tree priors, and further to accommodate graphical structures in the predictors through sparsity-inducing logistic normal priors. Additionally, the PI will develop computationally efficient Markov chain Monte Carlo algorithms to fit the resulting models. Compared to existing methods, these structural priors will be shown to lead to substantial gains in predictive accuracy, and to more accurate scientific discovery.
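As a rough illustration of the sparsity-inducing Dirichlet prior on branch-to-predictor probabilities described in the abstract, the sketch below draws a probability vector over P predictors and counts how many predictors carry most of the prior mass. The abstract does not state the prior's parameterization, so the symmetric form Dirichlet(alpha/P, ..., alpha/P), the function names, and the values of P and alpha are all assumptions made for illustration; this is not the project's implementation.

```python
import numpy as np

def sample_split_probs(num_predictors, alpha, rng):
    """Draw s ~ Dirichlet(alpha/P, ..., alpha/P) (assumed parameterization).

    s[j] plays the role of the prior probability that a branch
    splits on predictor j.
    """
    concentration = np.full(num_predictors, alpha / num_predictors)
    return rng.dirichlet(concentration)

def effective_predictors(split_probs, mass=0.95):
    """Smallest number of predictors whose probabilities sum to at least `mass`."""
    sorted_probs = np.sort(split_probs)[::-1]          # largest first
    cumulative = np.cumsum(sorted_probs)
    return int(np.searchsorted(cumulative, mass) + 1)  # first index reaching `mass`

rng = np.random.default_rng(0)
P = 200  # hypothetical number of predictors

# Small total concentration: the draw puts most of its mass on a few predictors.
sparse = sample_split_probs(P, alpha=1.0, rng=rng)

# alpha = P gives Dirichlet(1, ..., 1), a uniform draw from the simplex,
# which spreads mass over many predictors.
dense = sample_split_probs(P, alpha=float(P), rng=rng)

print("predictors covering 95% of mass (sparse):", effective_predictors(sparse))
print("predictors covering 95% of mass (dense): ", effective_predictors(dense))
```

Under this assumed parameterization, a small alpha makes a typical draw concentrate on a handful of predictors, which is the sense in which such a prior favors models that ignore unnecessary predictors.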
Project outputs
Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Interaction Detection with Bayesian Decision Tree Ensembles
- DOI:
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Du, Junliang; Linero, Antonio Ricardo
- Corresponding author: Linero, Antonio Ricardo
Bayesian Approaches for Missing Not at Random Outcome Data: The Role of Identifying Restrictions
- DOI: 10.1214/17-sts630
- Publication date: 2018-05-01
- Journal:
- Impact factor: 5.7
- Authors: Linero, Antonio R.; Daniels, Michael J.
- Corresponding author: Daniels, Michael J.
A Bayesian approach to sequential monitoring of nonlinear profiles using wavelets: Wavelet-Based Bayesian Profile Monitoring
- DOI: 10.1002/qre.2409
- Publication date: 2019
- Journal:
- Impact factor: 2.3
- Authors: Varbanov, Roumen; Chicken, Eric; Linero, Antonio; Yang, Yun
- Corresponding author: Yang, Yun
Multi-rubric models for ordinal spatial data with application to online ratings data
- DOI: 10.1214/18-aoas1143
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: Linero, Antonio R.; Bradley, Jonathan R.; Desai, Apurva
- Corresponding author: Desai, Apurva
Incorporating Grouping Information into Bayesian Decision Tree Ensembles
- DOI:
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Du, Junliang; Linero, Antonio Ricardo
- Corresponding author: Linero, Antonio Ricardo
Other publications by Antonio Linero
Advances in Periodic Difference Equations with Open Problems
- DOI: 10.1007/978-3-662-44140-4_6
- Publication date: 2014
- Journal:
- Impact factor: 0
- Authors: Z. Alsharawi; Jose Cánovas; Antonio Linero
- Corresponding author: Antonio Linero
Other grants by Antonio Linero
CAREER: Foundations for Bayesian Nonparametric Causal Inference
- Award number: 2144933
- Fiscal year: 2022
- Amount: $100,000
- Project type: Continuing Grant
Leveraging Structural Information in Regression Tree Ensembles
- Award number: 2015636
- Fiscal year: 2019
- Amount: $100,000
- Project type: Continuing Grant
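Similar NSFC (National Natural Science Foundation of China) grants
Designing efficient, cellular-context-specific CRISPR-Cas13d gRNAs using intracellular RNA structure information combined with deep learning algorithms
- Award number: 32300521
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Spectral analysis of public transit travel and intelligent generation of network structure based on land-use information
- Award number: 72371021
- Approval year: 2023
- Amount: CNY 400,000
- Project type: General Program
Optimizing multi-trait Bayesian evaluation models using prior information on the genetic architecture of traits
- Award number: 32102505
- Approval year: 2021
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Optimizing multi-trait Bayesian evaluation models using prior information on the genetic architecture of traits
- Award number:
- Approval year: 2021
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Decoding protein-DNA binding specificity using three-dimensional structural information
- Award number:
- Approval year: 2020
- Amount: CNY 580,000
- Project type: General Program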
Similar overseas grants
Leveraging a community-driven approach to address the impact of social determinants of health on structural inequities among Miami-Dade County's intergenerational LGBTQ+ Community
- Award number: 10781877
- Fiscal year: 2023
- Amount: $100,000
- Project type:
Leveraging Community-Academic Partnerships and Social Networks to Disseminate Vaccine-Related Information and Increase Vaccine Uptake Among Black Individuals with Rheumatic Diseases
- Award number: 10442270
- Fiscal year: 2022
- Amount: $100,000
- Project type:
Leveraging Community-Academic Partnerships and Social Networks to Disseminate Vaccine-Related Information and Increase Vaccine Uptake Among Black Individuals with Rheumatic Diseases
- Award number: 10620245
- Fiscal year: 2022
- Amount: $100,000
- Project type:
Leveraging Structural Information in Regression Tree Ensembles
- Award number: 2015636
- Fiscal year: 2019
- Amount: $100,000
- Project type: Continuing Grant
Leveraging Covariate and Structural Information for Efficient Large-Scale and High-Dimensional Inference
- Award number: 1811747
- Fiscal year: 2018
- Amount: $100,000
- Project type: Standard Grant