High-dimensional Data Analysis: Modeling Unobserved Heterogeneity in Data, and Studying Imbalanced Classification Problems
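Basic information
- Grant number: RGPIN-2020-05011
- Principal investigator:
- Amount: $17,500
- Host institution:
- Host institution country: Canada
- Program type: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Period: 2021-01-01 to 2022-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract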
Data science has become a center of attention across a wide range of scientific disciplines, thanks to ever-expanding means of data collection. The unprecedented size and structural complexity of current data in many applications call for computationally efficient and statistically sound methodologies for extracting useful information from such data. Toward this goal, the general theme of my research program is the analysis of high-dimensional data. More specifically, over the five years of this proposal, my short-term objectives are: I) Statistical modeling of heterogeneous high-dimensional data: In applications such as health sciences, engineering and environment, social sciences, and financial econometrics, high-dimensional data often arise from heterogeneous populations consisting of multiple hidden homogeneous sub-populations. Finite mixture of regressions (FMR) and Markov regime-switching autoregressive (MSAR) models provide flexible tools for capturing unobserved heterogeneity in data; the latter models are used for time series data. In practice, fitting such models to a dataset raises three inferential problems: order selection (estimating the number of hidden sub-populations or regimes), variable selection, and so-called post-selection statistical inference, such as hypothesis testing or confidence intervals for the parameters of a data-driven selected model. Despite the wide application of these models, rigorous methodological developments addressing these problems in the growing literature on high-dimensional statistics have been very limited. In my short-term objectives, I will investigate new likelihood-based regularization techniques for order selection in FMR and MSAR, and for variable selection in sparse dynamic FMR and vector MSAR with fixed order and in high-dimensional settings. Establishing such results will pave the way toward post-selection inference problems, which are the subject of my long-term objectives.
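For reference, objective I can be read against the usual textbook form of a finite mixture of regressions. The block below is a sketch under stated assumptions, not a formula from the proposal: Gaussian components are assumed, and the symbols (K, \pi_k, \beta_k, \sigma_k, and the penalty function p_\lambda) are notation introduced here for illustration only.

```latex
% Conditional density of the response y given covariates x
% under a K-component Gaussian finite mixture of regressions (assumed form).
f(y \mid x; \Theta) = \sum_{k=1}^{K} \pi_k \, \phi\!\left(y;\, x^{\top}\beta_k,\, \sigma_k^{2}\right),
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1 .

% A generic penalized log-likelihood for n observations and p covariates.
% p_\lambda is a placeholder penalty, for example the lasso choice p_\lambda(t) = \lambda t.
\tilde{\ell}_n(\Theta) = \sum_{i=1}^{n} \log f(y_i \mid x_i; \Theta)
  \;-\; n \sum_{k=1}^{K} \sum_{j=1}^{p} p_{\lambda}\!\left(|\beta_{kj}|\right).
```

In this notation, order selection means choosing K, and variable selection means deciding which coefficients \beta_{kj} are nonzero. The specific penalties the project studies are not stated in the abstract.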
II) High-dimensional imbalanced classification problems: In applications such as fraud detection, medical diagnosis, or equipment malfunction detection, classification tasks often suffer from both high dimensionality and imbalance in the observed frequency of some classes in the training data. The imbalance arises either from the data collection process or because some classes are genuinely rare in the population. Because data are scarce in the minority class(es), conventional discriminative methods are often biased toward the majority class(es), resulting in much higher misclassification rates for the minority class(es). Imbalanced classification problems are generally hard, so I begin by studying imbalanced linear binary cases. I will investigate the utility of divide-and-conquer techniques coupled with hard-thresholding variable selection methods for correcting the bias of standard linear discriminant analysis against the minority class in high dimensions. I will also study multi-class problems.
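To make the setting in objective II concrete, here is a minimal Python sketch of hard-thresholding variable selection followed by two-class linear discriminant analysis on simulated imbalanced data. It is an illustration under assumptions, not the method proposed in the project: the screening statistic (a two-sample t-type statistic), the threshold value, the ridge term, the simulation design, and all function names (`simulate`, `fit_thresholded_lda`, `predict`) are hypothetical choices made here, and the divide-and-conquer step mentioned in the abstract is not implemented.

```python
import numpy as np


def simulate(n0, n1, p, n_signal, shift, rng):
    """Two Gaussian classes; class 1 differs from class 0 only in the first n_signal features."""
    X0 = rng.standard_normal((n0, p))
    X1 = rng.standard_normal((n1, p))
    X1[:, :n_signal] += shift
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return X, y


def fit_thresholded_lda(X, y, threshold, ridge=1e-3):
    """Keep features whose |t-statistic| exceeds `threshold`, then fit LDA on them."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = X0.shape[0], X1.shape[0]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Per-feature pooled variance and two-sample t-type statistic (assumed screening rule).
    pooled_var = ((n0 - 1) * X0.var(axis=0, ddof=1)
                  + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    t_stat = (mu1 - mu0) / np.sqrt(pooled_var * (1.0 / n0 + 1.0 / n1))

    # Hard thresholding: features below the cutoff are dropped entirely.
    selected = np.flatnonzero(np.abs(t_stat) > threshold)
    if selected.size == 0:
        raise ValueError("threshold removed every feature")

    # Pooled covariance on the selected features, with a small ridge for stability.
    R0 = X0[:, selected] - mu0[selected]
    R1 = X1[:, selected] - mu1[selected]
    cov = (R0.T @ R0 + R1.T @ R1) / (n0 + n1 - 2)
    cov += ridge * np.eye(selected.size)

    # Standard LDA direction and intercept, using empirical class frequencies as priors.
    w = np.linalg.solve(cov, mu1[selected] - mu0[selected])
    b = -0.5 * (mu0[selected] + mu1[selected]) @ w + np.log(n1 / n0)
    return selected, w, b


def predict(model, X):
    """Assign class 1 when the linear discriminant score is positive."""
    selected, w, b = model
    return (X[:, selected] @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
# Imbalanced training set: 200 majority vs 20 minority observations, 500 features.
X_train, y_train = simulate(200, 20, 500, 5, 2.0, rng)
model = fit_thresholded_lda(X_train, y_train, threshold=4.5)

# Balanced test set, so per-class error rates are estimated equally well.
X_test, y_test = simulate(1000, 1000, 500, 5, 2.0, rng)
y_hat = predict(model, X_test)
for label in (0, 1):
    err = np.mean(y_hat[y_test == label] != label)
    print(f"class {label} error rate: {err:.3f}")
```

In this sketch the intercept term `np.log(n1 / n0)` is negative when class 1 is the minority, which shifts the decision boundary so that fewer points are assigned to class 1. That is one simple route by which imbalance in the training frequencies enters the standard rule.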
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Khalili, Abbas
Feature selection in finite mixture of sparse normal linear models in high-dimensional feature space
- DOI: 10.1093/biostatistics/kxq048
- Published: 2011-01-01
- Journal:
- Impact factor: 2.1
- Authors: Khalili, Abbas; Chen, Jiahua; Lin, Shili
- Corresponding author: Lin, Shili
Disseminated Intravascular Coagulation Associated with Large Deletion of Immunoglobulin Heavy Chain
- DOI: 10.18502/ijaai.v20i6.8030
- Published: 2021-12-01
- Journal:
- Impact factor: 1.5
- Authors: Khalili, Abbas; Yadegari, Amir Hosein; Abolhassani, Hassan
- Corresponding author: Abolhassani, Hassan
Autosomal Recessive Agammaglobulinemia: A Novel Non-sense Mutation in CD79a
- DOI: 10.1007/s10875-014-9989-3
- Published: 2014-02-01
- Journal:
- Impact factor: 9.1
- Authors: Khalili, Abbas; Plebani, Alessandro; Aghamohammadi, Asghar
- Corresponding author: Aghamohammadi, Asghar
Order Selection in Finite Mixture Models With a Nonsmooth Penalty
- DOI: 10.1198/016214508000001075
- Published: 2008-12-01
- Journal:
- Impact factor: 3.7
- Authors: Chen, Jiahua; Khalili, Abbas
- Corresponding author: Khalili, Abbas
Order Selection in Finite Mixture Models With a Nonsmooth Penalty
- DOI: 10.1198/jasa.2009.0103
- Published: 2009-03-01
- Journal:
- Impact factor: 3.7
- Authors: Chen, Jiahua; Khalili, Abbas
- Corresponding author: Khalili, Abbas
Other grants by Khalili, Abbas
High-dimensional Data Analysis: Modeling Unobserved Heterogeneity in Data, and Studying Imbalanced Classification Problems
- Grant number: RGPIN-2020-05011
- Fiscal year: 2022
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
High-dimensional Data Analysis: Modeling Unobserved Heterogeneity in Data, and Studying Imbalanced Classification Problems
- Grant number: RGPIN-2020-05011
- Fiscal year: 2020
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Statistical inference in finite mixture of regressions and mixture-of-experts models in high-dimensional spaces, and varying coefficient finite mixture of regression models
- Grant number: RGPIN-2015-03805
- Fiscal year: 2019
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Statistical inference in finite mixture of regressions and mixture-of-experts models in high-dimensional spaces, and varying coefficient finite mixture of regression models
- Grant number: RGPIN-2015-03805
- Fiscal year: 2018
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Statistical inference in finite mixture of regressions and mixture-of-experts models in high-dimensional spaces, and varying coefficient finite mixture of regression models
- Grant number: RGPIN-2015-03805
- Fiscal year: 2017
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Statistical inference in finite mixture of regressions and mixture-of-experts models in high-dimensional spaces, and varying coefficient finite mixture of regression models
- Grant number: RGPIN-2015-03805
- Fiscal year: 2016
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Statistical inference in finite mixture of regressions and mixture-of-experts models in high-dimensional spaces, and varying coefficient finite mixture of regression models
- Grant number: RGPIN-2015-03805
- Fiscal year: 2015
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Model selection and statistical inference in mixture distributions and hidden Markov (regression) models
- Grant number: 386578-2010
- Fiscal year: 2014
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Model selection and statistical inference in mixture distributions and hidden Markov (regression) models
- Grant number: 386578-2010
- Fiscal year: 2013
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
Model selection and statistical inference in mixture distributions and hidden Markov (regression) models
- Grant number: 386578-2010
- Fiscal year: 2012
- Amount: $17,500
- Program type: Discovery Grants Program - Individual
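Similar grants from the National Natural Science Foundation of China
Catastrophic failure mechanism and data-driven stability analysis of mud-filled fractured surrounding rock in underwater tunnels
- Grant number: 52379106
- Year approved: 2023
- Amount: 500,000 CNY
- Program type: General Program
Studying neutron star phase transitions and testing the black hole area law through gravitational-wave data analysis
- Grant number: 12303056
- Year approved: 2023
- Amount: 300,000 CNY
- Program type: Young Scientists Fund
Feature analysis of incomplete industrial time-series data
- Grant number: 62306212
- Year approved: 2023
- Amount: 300,000 CNY
- Program type: Young Scientists Fund
The impact of digital technology adoption on inventors' R&D behavior: a big-data analysis of patent inventors
- Grant number: 72372152
- Year approved: 2023
- Amount: 410,000 CNY
- Program type: General Program
Hybrid knowledge- and data-driven uncertainty analysis and optimization methods for lattice structures with defects
- Grant number: 12302149
- Year approved: 2023
- Amount: 300,000 CNY
- Program type: Young Scientists Fund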
Similar overseas grants
I-Corps: Vision analysis system using inferred three-dimensional data to analyze and correct a user’s pose in relation to 3D space
- Grant number: 2403992
- Fiscal year: 2024
- Amount: $17,500
- Program type: Standard Grant
Oral pathogen - mediated pro-tumorigenic transformation through disruption of an Adherens Junction - associated RNAi machinery
- Grant number: 10752248
- Fiscal year: 2024
- Amount: $17,500
- Program type:
Fluency from Flesh to Filament: Collation, Representation, and Analysis of Multi-Scale Neuroimaging data to Characterize and Diagnose Alzheimer's Disease
- Grant number: 10462257
- Fiscal year: 2023
- Amount: $17,500
- Program type: