CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
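Basic information
- Award number: 1752614
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-07-01 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract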
In an era of Big Data, computer-intensive statistical inference faces unprecedented challenges and opportunities. High-dimensional and massive data are now emerging in scientific areas including biomedical engineering, environmental science, financial econometrics, array signal processing, and social networks, among many others. An important associated research challenge is to develop efficient methods to extract information and quantify its uncertainty for a large number of variables and measurements. Data-driven statistical inferential procedures for uncertainty quantification via bootstrap methods are often computationally intensive for high-dimensional large-scale datasets. On the computational side, this research project will make use of distributed inference via parallel high-performance computing, which is an essential ingredient for speeding up bootstrap calculations. On the statistical side, this research will introduce a general framework for studying the performance of various bootstrap methods. This research project aims to lead to a comprehensive understanding of the fundamental tradeoff between statistical and computational concerns in quantifying uncertainty for a broad class of inferential procedures, thus providing practical guidance for optimizing statistical accuracy and computational cost in real applications. Both undergraduate and graduate students are involved in the project.
The overarching goal of this research project is to provide new insights and deepen the theoretical understanding of the strengths and fundamental limitations of fully data-dependent inferential procedures (such as bootstraps) in the high-dimensional and massive data framework on two classical problems: i) change point detection and identification; ii) computationally-aware statistical inference for U-statistics. The research aims to develop statistically correct and computationally scalable inferential procedures when the dimension can be larger (or even much larger) than the sample size. In contrast to existing work, the methods under development have strong theoretical guarantees, are robust under mild assumptions, require no tuning, and are easy to parallelize. Of practical interest, the research will develop needed software tools for researchers from disciplines with applications of high-dimensional and nonparametric statistics. Theoretical contributions of the proposed research include establishing new approximation and coupling theorems (under weaker regularity conditions than in the existing literature) in high-dimensional and infinite-dimensional spaces of increasing dimension and complexity, where classical probability tools such as the central limit theorem and extreme value theory are no longer applicable. The mathematical theory is of independent interest and will provide powerful new tools to analyze other statistical procedures on high-dimensional and nonparametric models.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
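As a rough illustration of the computational point made in the abstract (bootstrap replicates are mutually independent, so they can be divided among workers), the following minimal Python sketch splits a percentile bootstrap for a sample mean across a worker pool. It is not the project's method or software. The function names `bootstrap_chunk` and `parallel_bootstrap_ci`, the choice of the mean as the statistic, the percentile interval, and the per-worker seeding scheme are all assumptions made for brevity.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_chunk(data, n_reps, seed):
    """Draw n_reps resamples with replacement and return their means.

    Hypothetical helper: each worker gets its own seeded generator so the
    chunks are reproducible and do not share random state.
    """
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]


def parallel_bootstrap_ci(data, n_reps=2000, n_workers=4, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean, replicates split by worker.

    Assumption: a thread pool is used only to keep the sketch portable.
    For real CPU-bound speedups one would typically use processes or a
    cluster scheduler instead.
    """
    per_worker = n_reps // n_workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(bootstrap_chunk, data, per_worker, seed + i)
            for i in range(n_workers)
        ]
        # Collect in submission order so the result is deterministic.
        reps = sorted(r for f in futures for r in f.result())
    lower = reps[int((alpha / 2) * len(reps))]
    upper = reps[int((1 - alpha / 2) * len(reps)) - 1]
    return lower, upper


if __name__ == "__main__":
    gen = random.Random(1)
    sample = [gen.gauss(0.0, 1.0) for _ in range(200)]
    print(parallel_bootstrap_ci(sample))
```

Because each chunk depends only on the data and its own seed, the same split works unchanged with a process pool or across machines. The only shared step is merging and sorting the replicates at the end.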
Project outcomes
Journal articles (15)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Mean-field nonparametric estimation of interacting particle systems
- DOI:
- Publication date: 2022-05
- Journal:
- Impact factor: 0
- Authors: Rentian Yao; Xiaohui Chen; Yun Yang
- Corresponding authors: Rentian Yao; Xiaohui Chen; Yun Yang
Hanson–Wright inequality in Hilbert spaces with application to $K$-means clustering for non-Euclidean data
- DOI: 10.3150/20-bej1251
- Publication date: 2021
- Journal:
- Impact factor: 1.5
- Authors: Chen, Xiaohui; Yang, Yun
- Corresponding author: Yang, Yun
Diffusion K-means clustering on manifolds: Provable exact recovery via semidefinite relaxations
- DOI: 10.1016/j.acha.2020.03.002
- Publication date: 2021-02-19
- Journal:
- Impact factor: 2.5
- Authors: Chen, Xiaohui; Yang, Yun
- Corresponding author: Yang, Yun
Sketch-and-Lift: Scalable Subsampled Semidefinite Program for K-means Clustering
- DOI:
- Publication date: 2022-01
- Journal:
- Impact factor: 0
- Authors: Yubo Zhuang; Xiaohui Chen; Yun Yang
- Corresponding authors: Yubo Zhuang; Xiaohui Chen; Yun Yang
Cutoff for Exact Recovery of Gaussian Mixture Models
- DOI: 10.1109/tit.2021.3063155
- Publication date: 2021-06-01
- Journal:
- Impact factor: 2.5
- Authors: Chen, Xiaohui; Yang, Yun
- Corresponding author: Yang, Yun
Other publications by Xiaohui Chen
Enhancement of temperature coefficient of resistance (TCR) and Magneto-resistance (MR) in La1–xCaxMnO3:Ag0.2 polycrystalline composites
- DOI: 10.1007/s10971-016-4294-7
- Publication date: 2017-01
- Journal:
- Impact factor: 2.5
- Authors: Fei Jin; Hui Zhang; Xiaohui Chen; Xiang Liu; Qingming Chen
- Corresponding author: Qingming Chen
Conversion of the Channel Covariance in FDD Systems with 3D Antenna Array
- DOI: 10.1109/vtcfall.2018.8690887
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: Min Qin; Yi Zhang; Haichao Wei; Li Chen; Xiaohui Chen; Guo Wei
- Corresponding author: Guo Wei
Nanogel-based scaffolds fabricated for bone regeneration with mesoporous bioactive glass and strontium: In vitro and in vivo characterization
- DOI: 10.1002/jbm.a.35980
- Publication date: 2017
- Journal:
- Impact factor: 0
- Authors: Qiao Zhang; Xiaohui Chen; Shinan Geng; Lingfei Wei; Richard J Miron; Yanbing Zhao; Yufeng Zhang
- Corresponding author: Yufeng Zhang
Identification of pH Neutralization Process Based on the T-S Fuzzy Model
- DOI: 10.1007/978-3-642-23324-1_93
- Publication date: 2011
- Journal:
- Impact factor: 0
- Authors: Xiaohui Chen; Jinpeng Chen; B. Lei
- Corresponding author: B. Lei
An adaptive activity sequencing instrument to enhance e-learning: an integrated application of overlay user model and mathematical programming on the Web
- DOI: 10.1109/iccisci.2019.8716473
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: D. Ahmadaliev; Xiaohui Chen; M. Abduvohidov; Asilbek Medatov; Gulbahor Temirova
- Corresponding author: Gulbahor Temirova
Other grants by Xiaohui Chen
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
- Award number: 2347760
- Fiscal year: 2023
- Funding amount: $400,000
- Project type: Continuing Grant
Developing an MND oral health care pathway and a dynamic toolkit
- Award number: ES/Y008200/1
- Fiscal year: 2023
- Funding amount: $400,000
- Project type: Research Grant
Collaborative Research: Second Order Inference for High-Dimensional Time Series and Its Applications
- Award number: 1404891
- Fiscal year: 2014
- Funding amount: $400,000
- Project type: Continuing Grant
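Similar grants from the National Natural Science Foundation of China (NSFC)
Research on absolute interferometric testing of optical aspheric surfaces based on multiplexed computer-generated holograms (CGH)
- Award number: 62375132
- Approval year: 2023
- Funding amount: CNY 540,000
- Project type: General Program
Research on an intelligent portable microfluidic-chip ELISA platform for the detection of plant viruses
- Award number: 21505061
- Approval year: 2015
- Funding amount: CNY 210,000
- Project type: Young Scientists Fund
Mechanisms by which different levels of mental stress and physical load cause computer-work-related neck pain
- Award number: 81472155
- Approval year: 2014
- Funding amount: CNY 610,000
- Project type: General Program
Rehabilitative effect of computer-assisted open-mindedness (Huoda) therapy on lung cancer and its brain metabolic mechanism
- Award number: 81372488
- Approval year: 2013
- Funding amount: CNY 650,000
- Project type: General Program
Journal of Computer Science and Technology
- Award number: 61224001
- Approval year: 2012
- Funding amount: CNY 200,000
- Project type: Special Fund Project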
Similar overseas grants
Predicting the Absence of Serious Bacterial Infection in the PICU
- Award number: 10806039
- Fiscal year: 2023
- Funding amount: $400,000
- Project type:
Transitions Among Discrete Clinical States During ICU Stays in Patients with SARS-CoV-2 Pneumonia
- Award number: 10537554
- Fiscal year: 2023
- Funding amount: $400,000
- Project type:
Joint longitudinal and survival models for intensive longitudinal data from mobile health studies of smoking cessation
- Award number: 10677935
- Fiscal year: 2023
- Funding amount: $400,000
- Project type:
Integrating Novel Physiological Biomarkers of Feeding Intolerance in Preterm Infants
- Award number: 10739943
- Fiscal year: 2023
- Funding amount: $400,000
- Project type:
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
- Award number: 2347760
- Fiscal year: 2023
- Funding amount: $400,000
- Project type: Continuing Grant