CDS&E: Collaborative Research: Scalable Nonparametric Learning for Massive Data with Statistical Guarantees
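Basic information
- Award number: 2005779
- Principal investigator:
- Amount: $127,600
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-08-01 to 2023-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract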
We now live in the era of data deluge. The sheer volume of the data to be processed, together with the growing complexity of statistical models and the increasingly distributed nature of the data sources, creates new challenges to modern statistics theory. Standard machine learning methods are no longer able to accommodate the computational requirements. They need to be re-designed or adapted, which calls for a new generation of design and theory of scalable learning algorithms for massive data. This project aims to provide a collection of state-of-the-art nonparametric learning tools for big data analysis, which can be directly used by scientists and practitioners and have beneficial impacts on various fields such as biomedicine, health-care, defense and security, and information technology. The deliverables of this project include easy-to-use software packages that will be thoroughly evaluated using a range of application examples. They will directly help scientists to explore and analyze complex data sets. Due to storage and computational bottlenecks, traditional statistical inferential procedures originally designed for a single machine are no longer applicable to modern large datasets. This project aims to design new scalable learning algorithms of wide-ranging nonparametric models for data that are distributed across a large number of multi-core computational nodes, or in a fashion of random sketching if only a single machine is available. The computational limits of these new algorithms will be examined from a statistical perspective. For example, in the divide-and-conquer setup, the number of deployed machines can be viewed as a simple proxy for computing cost. The project aims to establish a sharp upper bound for this number: when the number is below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; otherwise, statistical optimality becomes impossible. 
Related questions will also be addressed in the randomized sketching method in terms of the minimal number of random projections. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
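The divide-and-conquer setup described in the abstract can be illustrated with a minimal sketch. It is not the project's own algorithm or software: it assumes kernel ridge regression as the nonparametric estimator, and the function names, Gaussian bandwidth, penalty `lam`, and simulated data are all hypothetical choices made for illustration. The data are split into `num_machines` random blocks, each block is fitted locally, and the local predictions are averaged, so that `num_machines` plays the role of the computing-cost proxy.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel matrix between 1-D sample arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def fit_krr(x, y, lam=1e-3):
    """Fit kernel ridge regression on one block of data; return a predictor."""
    n = len(x)
    alpha = np.linalg.solve(gaussian_kernel(x, x) + n * lam * np.eye(n), y)
    return lambda x_new: gaussian_kernel(x_new, x) @ alpha

def divide_and_conquer_krr(x, y, num_machines, lam=1e-3, seed=0):
    """Split data into random blocks, fit each block locally, average the fits."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(x)), num_machines)
    local_fits = [fit_krr(x[idx], y[idx], lam) for idx in blocks]
    return lambda x_new: np.mean([f(x_new) for f in local_fits], axis=0)

# Simulated data (an assumption for illustration): noisy sine curve on [0, 1].
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(2000)
grid = np.linspace(0.05, 0.95, 50)
truth = np.sin(2 * np.pi * grid)

# Compare the averaged estimator for different numbers of machines.
for s in (1, 10, 100):
    estimate = divide_and_conquer_krr(x, y, s)
    mse = np.mean((estimate(grid) - truth) ** 2)
    print(f"machines={s:4d}  grid MSE={mse:.4f}")
```

Increasing `num_machines` shrinks each local linear system, which lowers the computing cost per machine. The question raised in the abstract is how large that number can grow before the averaged estimator loses statistical optimality.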
Project outcomes
Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Xin Xing, Long Sha
- Corresponding authors: Xin Xing, Long Sha
Nonparametric distributed learning under general designs
- DOI: 10.1214/20-ejs1733
- Published: 2020-01-01
- Journal:
- Impact factor: 1.1
- Authors: Liu, Meimei;Shang, Zuofeng;Cheng, Guang
- Corresponding author: Cheng, Guang
Distributed adaptive nearest neighbor classifier: algorithm and theory
- DOI: 10.1007/s11222-023-10267-7
- Published: 2021-05
- Journal:
- Impact factor: 2.2
- Authors: Ruiqi Liu;Ganggang Xu;Zuofeng Shang
- Corresponding authors: Ruiqi Liu;Ganggang Xu;Zuofeng Shang
Optimal Nonparametric Inference via Deep Neural Network
- DOI: 10.1016/j.jmaa.2021.125561
- Published: 2019-02
- Journal:
- Impact factor: 0
- Authors: Ruiqi Liu;B. Boukai;Zuofeng Shang
- Corresponding authors: Ruiqi Liu;B. Boukai;Zuofeng Shang
Identification and estimation in panel models with overspecified number of groups
- DOI: 10.1016/j.jeconom.2019.09.008
- Published: 2020-04
- Journal:
- Impact factor: 6.3
- Authors: Liu Ruiqi;Shang Zuofeng;Zhang Yonghui;Zhou Qiankun
- Corresponding author: Zhou Qiankun
Other publications by Zuofeng Shang
Statistica Sinica Preprint No: SS-2022-0057
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Shuoyang Wang;Zuofeng Shang;Guanqun Cao;Jun Liu
- Corresponding author: Jun Liu
Empirical likelihood test for community structure in networks
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Mingao Yuan;Sharmin Hossain;Zuofeng Shang
- Corresponding author: Zuofeng Shang
Sharp detection boundaries on testing dense subhypergraph
- DOI:
- Published: 2021
- Journal:
- Impact factor: 1.5
- Authors: Mingao Yuan;Zuofeng Shang
- Corresponding author: Zuofeng Shang
A Fast Non-Linear Coupled Tensor Completion Algorithm for Financial Data Integration and Imputation
- DOI: 10.1145/3604237.3626899
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: D. Zhou;Ajim Uddin;Zuofeng Shang;C. Sylla;Xinyuan Tao;Dantong Yu
- Corresponding author: Dantong Yu
Testing community structure for hypergraphs
- DOI:
- Published: 2022
- Journal:
- Impact factor: 4.5
- Authors: Mingao Yuan;Ruiqi Liu;Yang Feng;Zuofeng Shang
- Corresponding author: Zuofeng Shang
Other grants held by Zuofeng Shang
Collaborative Research: Nonparametric Bayesian Aggregation for Massive Data
- Award number: 2005746
- Fiscal year: 2019
- Funding amount: $127,600
- Project type: Continuing Grant
CDS&E: Collaborative Research: Scalable Nonparametric Learning for Massive Data with Statistical Guarantees
- Award number: 1821157
- Fiscal year: 2018
- Funding amount: $127,600
- Project type: Standard Grant
Collaborative Research: Nonparametric Bayesian Aggregation for Massive Data
- Award number: 1764280
- Fiscal year: 2017
- Funding amount: $127,600
- Project type: Continuing Grant
Collaborative Research: Nonparametric Bayesian Aggregation for Massive Data
- Award number: 1712919
- Fiscal year: 2017
- Funding amount: $127,600
- Project type: Continuing Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
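Dynamic coupling of inter-organizational collaboration in engineering projects based on the heterogeneity of the transacting parties
- Award number: 72301024
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund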
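Reliability of cooperative NOMA systems for 5G ultra-high-definition mobile video transmission
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund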
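Key technologies for guaranteeing the timeliness of information dissemination in cooperative-perception vehicular networks
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund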
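Reliability mechanisms and optimization methods for data- and physics-driven collaboration of shop-floor manufacturing services
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund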
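Institutional innovation for promoting value co-creation in telemedicine collaboration networks through strategic purchasing by medical insurance funds
- Award number:
- Approval year: 2022
- Funding amount: CNY 450,000
- Project type: General Program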
Similar overseas grants
CDS&E/Collaborative Research: Local Gaussian Process Approaches for Predicting Jump Behaviors of Engineering Systems
- Award number: 2420358
- Fiscal year: 2024
- Funding amount: $127,600
- Project type: Standard Grant
CDS&E/Collaborative Research: Data-Driven Inverse Design of Additively Manufacturable Aperiodic Architected Cellular Materials
- Award number: 2245298
- Fiscal year: 2023
- Funding amount: $127,600
- Project type: Standard Grant
Collaborative Research: CDS&E: Computational Exploration of Electrically Conductive Metal-Organic Frameworks as Cathode Materials in Lithium-Sulfur Batteries
- Award number: 2302618
- Fiscal year: 2023
- Funding amount: $127,600
- Project type: Standard Grant
CDS&E/Collaborative Research: A Symbolic Artificial Intelligence Framework for Discovering Physically Interpretable Constitutive Laws of Soft Functional Composites
- Award number: 2244952
- Fiscal year: 2023
- Funding amount: $127,600
- Project type: Standard Grant
CDS&E/Collaborative Research: A Symbolic Artificial Intelligence Framework for Discovering Physically Interpretable Constitutive Laws of Soft Functional Composites
- Award number: 2244953
- Fiscal year: 2023
- Funding amount: $127,600
- Project type: Standard Grant