BIGDATA: F: DKA: Collaborative Research: Theory and Algorithms for Parallel Probabilistic Inference with Big Data, via Big Model, in Realistic Distributed Computing Environments
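Basic information
- Award number: 1447676
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-09-01 to 2018-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract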
This project develops a new framework that enables machine learning (ML) systems to automatically comprehend and mine massive and complex data via parallel Bayesian inference on large computer clusters. The research has a profound impact on the practice and direction of Big Learning. The developed technologies have a catalytic effect on both ML research and applications: ML scientists are able to rapidly experiment on novel, cutting-edge ML models with minimal programming effort, unhindered by the limitations of single machines. Researchers from other fields, like biology and social sciences, are able to run contemporary advanced ML methods that transcend the capabilities of simple models, yielding new scientific insights on data whose size would otherwise be daunting. Data scientists at small start-ups are able to conduct ML analytics with complex models, putting their capabilities on par with huge companies possessing dedicated engineering and infrastructure teams. Students and beginners are able to witness distributed ML in action with just a few lines of code, driving ML education to new heights. Technically, this research focuses on scaling up and parallelizing Bayesian machine learning, which provides a powerful, elegant and theoretically justified framework for modeling a wide variety of datasets. The research team develops a suite of complementary distributed inference algorithms for hierarchical Bayesian models, which cover most commonly used Bayesian ML methods. The project focuses on combining speed and scalability with theoretical guarantees that allow us to assess the accuracy of the resulting methods, and allow practitioners to make trade-offs between speed and accuracy. 
Rather than focus on a few disconnected models, the project develops techniques applicable to a broad spectrum of hierarchical Bayesian models, resulting in a toolkit of building blocks that can be combined as needed for arbitrary probabilistic models - be they parametric or nonparametric, discriminative or generative. This is in contrast to much existing work on parallel inference, which tends to focus on parallelization in a specific model and cannot be easily extended. The project provides a solid algorithmic foundation for learning on Big Data with powerful models. The research contributes to democratizing advanced and large-scale ML methods for broad applications, by offering the user and developer community a library of general-purpose parallelizable algorithms for working on diverse problems using computer clusters and the cloud, bridging the gap between practical needs from data and basic research in ML.
Project outcomes
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other publications by Eric Xing
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
- DOI:
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Sang Keun Choe; Hwijeen Ahn; Juhan Bae; Kewen Zhao; Minsoo Kang; Youngseog Chung; Adithya Pratapa; W. Neiswanger; Emma Strubell; Teruko Mitamura; Jeff Schneider; Eduard Hovy; Roger Grosse; Eric Xing
- Corresponding author: Eric Xing
An exploratory study of self-supervised pre-training on partially supervised multi-label classification on chest X-ray images
- DOI: 10.1016/j.asoc.2024.111855
- Publication year: 2024
- Journal:
- Impact factor: 8.7
- Authors: Nanqing Dong; Michael Kampffmeyer; Haoyang Su; Eric Xing
- Corresponding author: Eric Xing
Other grants held by Eric Xing
III: Small: Multiple Device Collaborative Learning in Real Heterogeneous and Dynamic Environments
- Award number: 2311990
- Fiscal year: 2023
- Award amount: $500,000
- Project type: Standard Grant
ML Basis for Intelligence Augmentation:Toward Personalized Modeling, Reasoning under Data-Knowledge Symbiosis, and Interpretable Interaction for AI-assisted Human Decision-making
- Award number: 2040381
- Fiscal year: 2021
- Award amount: $500,000
- Project type: Continuing Grant
Collaborative Research: SCH: Trustworthy and Explainable AI for Neurodegenerative Diseases
- Award number: 2123952
- Fiscal year: 2021
- Award amount: $500,000
- Project type: Standard Grant
CNS Core: Small: Toward Globally-Optimal Resource Distribution and Computation Acceleration in Multi-Tenant and Heterogeneous Machine Learning Systems
- Award number: 2008248
- Fiscal year: 2020
- Award amount: $500,000
- Project type: Standard Grant
III: Small: A New Approach to Latent Space Learning with Diversity-Inducing Regularization and Applications to Healthcare Data Analytics
- Award number: 1617583
- Fiscal year: 2016
- Award amount: $500,000
- Project type: Standard Grant
XPS: FULL: Broad-Purpose, Aggressively Asynchronous and Theoretically Sound Parallel Large-scale Machine Learning
- Award number: 1629559
- Fiscal year: 2016
- Award amount: $500,000
- Project type: Standard Grant
III: Small: Collaborative Research: Efficient, Nonparametric and Local-Minimum-Free Latent Variable Models: With Application to Large-Scale Computer Vision and Genomics
- Award number: 1218282
- Fiscal year: 2012
- Award amount: $500,000
- Project type: Continuing Grant
III: Small: Collaborative Research: Using Large-Scale Image Data for Online Social Media Analysis
- Award number: 1115313
- Fiscal year: 2011
- Award amount: $500,000
- Project type: Standard Grant
Collaborative Research: Discovering and Exploiting Latent Communities in Social Media
- Award number: 1111142
- Fiscal year: 2011
- Award amount: $500,000
- Project type: Standard Grant
Indexing, Mining and Modeling Spatio-Temporal Patterns of Gene Expressions
- Award number: 0640543
- Fiscal year: 2007
- Award amount: $500,000
- Project type: Continuing Grant
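Similar NSFC (National Natural Science Foundation of China) grants
Molecular design, synthesis, and anti-HIV activity of DKA-DAPYs as dual HIV-1 reverse transcriptase/integrase inhibitors
- Award number: 21402148
- Approval year: 2014
- Award amount: CNY 250,000
- Project type: Young Scientists Fund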
Similar overseas grants
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
- Award number: 1661760
- Fiscal year: 2016
- Award amount: $500,000
- Project type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
- Award number: 1664720
- Fiscal year: 2016
- Award amount: $500,000
- Project type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447473
- Fiscal year: 2015
- Award amount: $500,000
- Project type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447413
- Fiscal year: 2015
- Award amount: $500,000
- Project type: Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
- Award number: 1447476
- Fiscal year: 2015
- Award amount: $500,000
- Project type: Standard Grant