New Frontiers of Robust Statistics in the Era of Big Data
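Basic information
- Award number: 2113568
- Principal investigator:
- Amount: $235,900
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-07-01 to 2024-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract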
Modern technologies have facilitated the collection of an unprecedented number of features with complex structures. Although extensive progress has been made towards extracting useful information from massive data, statistical analysis typically assumes that data are drawn without any contamination. In reality, however, the data sets arising in applications such as genomics and medical imaging are usually more inhomogeneous, owing either to the data collection process or to the intrinsic nature of the data. For instance, in gene expression data analysis, outliers frequently arise in microarray experiments due to array chip artifacts such as uneven spray of reagents within arrays. Compared to other recent advances in the era of big data, research on modeling and theoretical foundations for robust procedures under contamination models has fallen behind. To bridge this gap, this project seeks to develop new robust estimation and inference procedures that are rate-optimal for various contamination models, as building blocks to address the modeling, theory and computational challenges. Upon completion, this work will lead to a comprehensive understanding of contamination models and have an immediate impact on various disciplines such as biology, genomics, astronomy and finance. The project also provides training opportunities for undergraduate and graduate students, and is used to enrich courses and outreach educational materials in statistics and data science.
This project aims to address some of the most pressing challenges that are faced by robust procedures in high-dimensional and nonparametric contamination models. Specifically, (I) the research begins with statistical inference of low-dimensional parameters in both increasing-dimensional and high-dimensional regressions under contamination models. The PI will study the influence of the contamination proportion on obtaining root-n consistency results. Robust large-scale simultaneous inference under contamination models is also considered. (II) Next, the PI will revisit some classical nonparametric density estimation problems under both arbitrary and structured contamination distributions. The PI plans to propose rate-optimal procedures and carefully study the effect of contamination on estimation through various model indices, including the contamination proportion, the structure of the contamination and the choice of loss function. (III) The PI will develop a U-type robust covariance estimator under structured contamination models and provide rigorous theoretical guarantees on its rate optimality. This general robust estimator can serve as a building block for establishing many rate-optimal procedures for structured large covariance/precision matrix estimation problems. User-friendly R packages will be developed to implement the proposed methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
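As a rough illustration of why the abstract stresses contamination, the sketch below simulates Huber's contamination model, in which each observation comes from a clean distribution P with probability 1 − ε and from a contaminating distribution Q with probability ε. It is not one of the project's procedures: the choices P = N(0, 1), Q = a point mass at 50, ε = 0.1, and the helper name `sample_contaminated` are all assumptions made only for this example. It compares the sample mean, which the outliers pull away from the true center 0, with the sample median, a classical robust estimator.

```python
import random
import statistics


def sample_contaminated(n, eps, seed=0):
    """Draw n points from (1 - eps) * N(0, 1) + eps * Q.

    Q is taken to be a point mass at 50.0 (an assumption for illustration).
    """
    rng = random.Random(seed)
    return [50.0 if rng.random() < eps else rng.gauss(0.0, 1.0) for _ in range(n)]


# Hypothetical settings: 2000 observations, 10% contamination.
data = sample_contaminated(n=2000, eps=0.1)

mean_est = statistics.fmean(data)      # non-robust: shifted by roughly eps * 50
median_est = statistics.median(data)   # robust: stays near the true center 0

print(f"sample mean:   {mean_est:.3f}")
print(f"sample median: {median_est:.3f}")
```

The median is still slightly biased here, because the contamination is one-sided; quantifying how such bias depends on the contamination proportion is the kind of question the abstract refers to.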
Project outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
High-dimension to high-dimension screening for detecting genome-wide epigenetic and noncoding RNA regulators of gene expression
- DOI: 10.1093/bioinformatics/btac518
- Published: 2022-07
- Journal:
- Impact factor: 5.8
- Authors: Ke, Hongjie; Ren, Zhao; Qi, Jianfei; Chen, Shuo; Tseng, George C.; Ye, Zhenyao; Ma, Tianzhou; Alkan, Can (ed.)
- Corresponding author: Alkan, Can (ed.)
Adaptive minimax density estimation on ℝ^d for Huber’s contamination model
- DOI: 10.1093/imaiai/iaad045
- Published: 2023-11
- Journal:
- Impact factor: 0
- Authors: Zhang, Peiliang; Ren, Zhao
- Corresponding author: Ren, Zhao
Other publications by Zhao Ren
Prediction of shot peen forming effects with single and repeated impacts
- DOI: 10.1016/j.ijmecsci.2018.01.006
- Published: 2018-03-01
- Journal:
- Impact factor: 7.3
- Authors: X. Xiao; X. Tong; L. Yanwei; Zhao Ren; Guoqiang Gao; Yan Li
- Corresponding author: Yan Li
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition
- DOI: 10.21437/interspeech.2020-2635
- Published: 2020-10-25
- Journal:
- Impact factor: 0
- Authors: N. Cummins; Yilin Pan; Zhao Ren; J. Fritsch; Venkata Srikanth Nallanthighal; H. Christensen; D. Blackburn; Björn Schuller; M. Magimai; H. Strik; Aki Härmä
- Corresponding author: Aki Härmä
Lysine 624 of the Amyloid Precursor Protein (APP) Is a Critical Determinant of Amyloid β Peptide Length
- DOI: 10.1074/jbc.m111.274696
- Published: 2011-08-25
- Journal:
- Impact factor: 0
- Authors: T. Kukar; Thomas B. Ladd; P. Robertson; Sean A. Pintchovski; Brenda D. Moore; M. Bann; Zhao Ren; Karen R. Jansen; Kimberly G. Malphrus; S. Eggert; H. Maruyama; B. Cottrell; Pritam Das; G. Basi; E. Koo; T. Golde
- Corresponding author: T. Golde
Frustration recognition from speech during game interaction using wide residual networks
- DOI: 10.1016/j.vrih.2020.10.004
- Published: 2021-02-01
- Journal:
- Impact factor: 0
- Authors: Meishu Song; Adria Mallol; Emilia Parada; Zijiang Yang; Shuo Liu; Zhao Ren; Ziping Zhao; Björn Schuller
- Corresponding author: Björn Schuller
Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition
- DOI: 10.21437/interspeech.2020-2531
- Published: 2020-10-25
- Journal:
- Impact factor: 1.6
- Authors: M. Albes; Zhao Ren; Björn Schuller; N. Cummins
- Corresponding author: N. Cummins
Other grants by Zhao Ren
New Methods and Theory of Statistical Inference for Non-Gaussian Graphical Models
- Award number: 1812030
- Fiscal year: 2018
- Amount: $235,900
- Project type: Standard Grant
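Similar grants from the National Natural Science Foundation of China
Release characteristics and in-situ activation degradation methods of organic pollutants from tar residues at coal fire fronts
- Award number: 52374246
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program
MRI study of the tumor immune barrier at the invasive front of hepatocellular carcinoma
- Award number: 82302155
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Frontier problems in fractal geometry and metric Diophantine approximation
- Award number: 12331005
- Approval year: 2023
- Amount: CNY 1,940,000
- Project type: Key Program
Measuring and preventing greenwashing by ESG funds based on robust nonparametric frontiers
- Award number: 72371100
- Approval year: 2023
- Amount: CNY 410,000
- Project type: General Program
Special seminar: Strategic frontier workshop on PEM hydrogen-electricity energy conversion
- Award number: 22342006
- Approval year: 2023
- Amount: CNY 100,000
- Project type: Special Fund Project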
Similar overseas grants
New Frontiers for Anonymous Authentication
- Award number: DE240100282
- Fiscal year: 2024
- Amount: $235,900
- Project type: Discovery Early Career Researcher Award
Deregulated Infrastructures of Extraction in Rainforest Frontiers
- Award number: EP/Y036174/1
- Fiscal year: 2024
- Amount: $235,900
- Project type: Research Grant
Collaborative Research: AF: Small: Exploring the Frontiers of Adversarial Robustness
- Award number: 2335412
- Fiscal year: 2024
- Amount: $235,900
- Project type: Standard Grant
New Frontiers in Large-Scale Polynomial Optimisation
- Award number: DE240100674
- Fiscal year: 2024
- Amount: $235,900
- Project type: Discovery Early Career Researcher Award
CAREER: New Frontiers of Private Learning and Synthetic Data
- Award number: 2339775
- Fiscal year: 2024
- Amount: $235,900
- Project type: Continuing Grant