New Frontiers of Robust Statistics in the Era of Big Data
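Basic Information
- Award number: 2113568
- Principal investigator:
- Amount: $235,900
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-07-01 to 2024-06-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract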
Modern technologies have facilitated the collection of an unprecedented number of features with complex structures. Although extensive progress has been made in extracting useful information from massive data, statistical analysis typically assumes that the data are drawn without any contamination. In reality, however, data sets arising in applications such as genomics and medical imaging are usually more inhomogeneous, owing either to the data collection process or to the intrinsic nature of the data. For instance, in gene expression data analysis, outliers frequently arise in microarray experiments because of array chip artifacts such as uneven spray of reagents within arrays. Compared with recent advances in big data, research on modeling and theoretical foundations for robust procedures under contamination models has fallen behind. To bridge this gap, this project seeks to develop new robust estimation and inference procedures that are rate-optimal for various contamination models, as building blocks to address the modeling, theory, and computational challenges. Upon completion, this work will lead to a comprehensive understanding of contamination models and have an immediate impact on disciplines such as biology, genomics, astronomy, and finance. The project also provides training opportunities for undergraduate and graduate students, and is used to enrich courses and outreach educational materials in statistics and data science.
This project aims to address some of the most pressing challenges faced by robust procedures in high-dimensional and nonparametric contamination models. Specifically, (I) the research begins with statistical inference of low-dimensional parameters in both increasing-dimensional and high-dimensional regressions under contamination models. The PI will study the influence of the contamination proportion on obtaining root-n consistency results. Robust large-scale simultaneous inference under contamination models is also considered. (II) Next, the PI will revisit some classical nonparametric density estimation problems under both arbitrary and structured contamination distributions. The PI plans to propose rate-optimal procedures and carefully study the effect of contamination on estimation through various model indices, including the contamination proportion, the structure of the contamination, and the choice of loss function. (III) The PI will develop a U-type robust covariance estimator under structured contamination models and provide rigorous theoretical guarantees on its rate optimality. This general robust estimator can serve as a building block for establishing many rate-optimal procedures for structured large covariance/precision matrix estimation problems. User-friendly R packages will be developed to implement the proposed methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
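As a small illustration of the contamination setting the abstract refers to (and not the project's own procedures), the sketch below draws data from a Huber-type mixture, (1 - eps) * N(0, 1) + eps * (point mass at an outlier value), and compares the sample mean with the sample median as estimates of the clean center 0. The function name `sample_huber_contaminated`, the contamination proportion 0.1, the outlier value 50, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics


def sample_huber_contaminated(n, eps, rng, outlier_value=50.0):
    """Draw n points from (1 - eps) * N(0, 1) + eps * point mass at outlier_value.

    Hypothetical helper for illustration only; the point-mass contamination
    is one simple choice of contaminating distribution.
    """
    return [
        outlier_value if rng.random() < eps else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible
data = sample_huber_contaminated(n=2000, eps=0.1, rng=rng)

# The clean distribution is centered at 0, so the absolute value of each
# estimate is its estimation error.
mean_error = abs(statistics.mean(data))
median_error = abs(statistics.median(data))

print(f"mean error:   {mean_error:.3f}")
print(f"median error: {median_error:.3f}")
```

With these settings the mean is pulled toward the outliers by roughly eps times the outlier value, while the median shifts only slightly. This is the kind of dependence on the contamination proportion that the abstract proposes to study in far more general models.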
Project Outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Adaptive minimax density estimation on ℝ^d for Huber's contamination model
- DOI: 10.1093/imaiai/iaad045
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Zhang, Peiliang; Ren, Zhao
- Corresponding author: Ren, Zhao
High-dimension to high-dimension screening for detecting genome-wide epigenetic and noncoding RNA regulators of gene expression
- DOI: 10.1093/bioinformatics/btac518
- Publication date: 2022
- Journal:
- Impact factor: 5.8
- Authors: Ke, Hongjie; Ren, Zhao; Qi, Jianfei; Chen, Shuo; Tseng, George C.; Ye, Zhenyao; Ma, Tianzhou; Alkan, Can (ed.)
- Corresponding author: Alkan, Can (ed.)
Other publications by Zhao Ren
Clapeyron equation and phase equilibrium properties in higher dimensional charged topological dilaton AdS black holes with a nonlinear source
- DOI: 10.1140/epjc/s10052-017-4831-8
- Publication date: 2016-09
- Journal:
- Impact factor: 4.4
- Authors: Li Huai-Fan; Zhao Hui-Hua; Zhang Li-Chun; Zhao Ren
- Corresponding author: Zhao Ren
Tunneling mechanism in higher-dimensional rotating black hole with a cosmological constant in the approach of dimensional reduction
- DOI: 10.1007/s10509-011-0660-7
- Publication date: 2011-03
- Journal:
- Impact factor: 1.9
- Authors: Zhang Li-Chun; Li Huai-Fan; Zhao Ren
- Corresponding author: Zhao Ren
Quantum Statistical Entropy of Black Hole
- DOI: 10.1023/a:1021179316964
- Publication date: 2002
- Journal:
- Impact factor: 0
- Authors: Zhao Ren; Zhang Junfang; Zhang Lichun
- Corresponding author: Zhang Lichun
A new explanation for statistical entropy of charged black hole
- DOI: 10.1007/s11433-013-5167-5
- Publication date: 2013-07
- Journal:
- Impact factor: 0
- Authors: Zhao Ren; Zhang LiChun
- Corresponding author: Zhang LiChun
The EIHW-GLAM Deep Attentive Multi-model Fusion System for Cough-based COVID-19 Recognition in the DiCOVA 2021 Challenge
- DOI:
- Publication date: 2021
- Journal:
- Impact factor: 0
- Authors: Zhao Ren; Yi Chang; Björn Schuller
- Corresponding author: Björn Schuller
Other grants held by Zhao Ren
New Methods and Theory of Statistical Inference for Non-Gaussian Graphical Models
- Award number: 1812030
- Fiscal year: 2018
- Funding amount: $235,900
- Project type: Standard Grant
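Similar National Natural Science Foundation of China (NSFC) grants
Resistance of Plasmodium falciparum to dihydroartemisinin-piperaquine along the China-Myanmar border and exploration of alternative treatment regimens for falciparum malaria in the region
- Award number: 32360118
- Approval year: 2023
- Funding amount: CNY 320,000
- Project type: Regional Science Fund Project
Impact of the carbon border adjustment mechanism on the coordinated development of China's regional economy, society, and environment: a study accounting for heterogeneity in enterprise ownership
- Award number: 72303240
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund Project
Establishment of an epidemiological database of migratory birds, ticks, and tick-borne pathogens in the Chinese border area of the Tumen River basin, and the mechanism by which migratory birds affect the polymorphism of tick-borne pathogens
- Award number: 32360886
- Approval year: 2023
- Funding amount: CNY 320,000
- Project type: Regional Science Fund Project
Localization of radio emission sources in border regions based on prior knowledge of radio wave propagation, and its applications
- Award number: 62361055
- Approval year: 2023
- Funding amount: CNY 320,000
- Project type: Regional Science Fund Project
Inheritance mechanisms and suitable construction models of human settlement units in Tibetan border towns
- Award number: 52308035
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund Project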
Similar overseas grants
Conference: 2024 NanoFlorida Conference: New Frontiers in Nanoscale interactions
- Award number: 2415310
- Fiscal year: 2024
- Funding amount: $235,900
- Project type: Standard Grant
New Frontiers for Anonymous Authentication
- Award number: DE240100282
- Fiscal year: 2024
- Funding amount: $235,900
- Project type: Discovery Early Career Researcher Award
Collaborative Research: AF: Small: Exploring the Frontiers of Adversarial Robustness
- Award number: 2335411
- Fiscal year: 2024
- Funding amount: $235,900
- Project type: Standard Grant
New Frontiers in Large-Scale Polynomial Optimisation
- Award number: DE240100674
- Fiscal year: 2024
- Funding amount: $235,900
- Project type: Discovery Early Career Researcher Award
Mapping the Frontiers of Private Property in Australia
- Award number: DP240100395
- Fiscal year: 2024
- Funding amount: $235,900
- Project type: Discovery Projects