III: Medium: SimSQL: A Database System Supporting Implementation and Execution of Distributed Machine Learning Codes
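Basic information
- Award number: 1409543
- Principal investigator:
- Amount: $1.2 million
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2014
- Funding country: United States
- Period: 2014-09-01 to 2020-03-31
- Status: Completed
- Source:
- Keywords:
Project abstract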
Statistical machine learning (ML) is a commonly applied framework for analyzing very large data sets. In statistical ML, the goal is to learn a statistical model that can be used to understand the data, find patterns, or make predictions. Thus, many new software systems have been designed to support easy implementation and fast execution of parallel/distributed ML codes over large data sets. Almost all of those systems are "non-relational" in the sense that they use data and programming models that are very different from those of today's relational database management systems. Still, the attractiveness of the relational or database-oriented approach to data processing persists. For example, codes running on top of a database are declarative, so a programmer need only be concerned with what he or she wants, and not how to obtain it. This makes it easier to write codes and get them to run in a distributed environment, and it enables a strong separation between the code and the database's data processing algorithms, storage, hardware, and indexing, and even the database schema. Further, much of the world's structured data sits in relational databases, and extracting anything more than a small subsample of a large data set for external use is typically a non-starter. Being able to execute an ML inference code within the database, using the database engine, would greatly increase the applicability of statistical ML.
This project will perform the fundamental research necessary to make ML-in-the-database a mature technology. All of the ideas developed by the project will be prototyped, evaluated, and distributed within the context of SimSQL, which is a parallel, relational database system augmented with the ability to perform "stochastic analytics". This means that SimSQL has special facilities that allow a user to define special database tables that have simulated data: these are data that are not actually stored in the database, but are produced by calls to statistical distributions. Since tables of simulated data in SimSQL can have recursive dependencies, it is easy to use SimSQL to run stochastic ML inference algorithms (such as MCMC) over "Big Data".
Research tasks include increasing the performance of SimSQL by exploiting the optimization opportunities presented by large-scale, iterative ML computations. They also include expanding the types of ML inference algorithms that can easily be specified in SimSQL's SQL dialect, making SimSQL applicable to various stochastic inference algorithms such as MCMC (Markov Chain Monte Carlo) and Monte Carlo EM (Expectation Maximization). Further, the project will investigate automatically compiling R and BUGS-like ML algorithm specifications into SimSQL SQL.
All of the software developed by the project will be available as open source under the Apache license. The code can be downloaded from (and more information can be found at) http://cmj4.web.rice.edu/SimSQL/SimSQL.html
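The idea of simulated tables with recursive dependencies can be illustrated with a small sketch in plain Python. This is not SimSQL's SQL dialect and not the project's code: it is a minimal Gibbs sampler (one kind of MCMC) in which each "version" of a two-value state is drawn from statistical distributions whose parameters depend on the previous version. The function name `gibbs_chain`, the normal model, and the vague prior constants are all assumptions chosen for illustration.

```python
import random


def gibbs_chain(data, iterations=2000, seed=0):
    """Gibbs sampler for the mean (mu) and precision (tau) of normal data.

    Each new state is produced by calls to statistical distributions whose
    parameters depend on the previous state, mimicking a chain of simulated
    tables with recursive dependencies. Priors are assumed for illustration.
    """
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    mu0, kappa0 = 0.0, 1e-6  # vague normal prior on mu: mean, precision
    a0, b0 = 1e-3, 1e-3      # vague gamma prior on tau: shape, rate
    chain = [{"mu": 0.0, "tau": 1.0}]  # version 0 of the simulated state
    for _ in range(iterations):
        prev = chain[-1]
        # mu | tau, data is normal with the precision computed below
        precision = kappa0 + n * prev["tau"]
        mean = (kappa0 * mu0 + prev["tau"] * total) / precision
        mu = rng.gauss(mean, precision ** -0.5)
        # tau | mu, data is gamma; gammavariate takes a scale, so invert the rate
        squared_error = sum((x - mu) ** 2 for x in data)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / (b0 + squared_error / 2))
        chain.append({"mu": mu, "tau": tau})
    return chain


# Synthetic data with true mean 5 and true standard deviation 2.
data_rng = random.Random(42)
data = [data_rng.gauss(5.0, 2.0) for _ in range(500)]

chain = gibbs_chain(data)
kept = chain[500:]  # discard early states as burn-in
mu_hat = sum(s["mu"] for s in kept) / len(kept)
sigma_hat = sum(s["tau"] ** -0.5 for s in kept) / len(kept)
print(f"posterior mean of mu: {mu_hat:.2f}, of sigma: {sigma_hat:.2f}")
```

In a database setting, each state in `chain` would correspond to a table of simulated values rather than a Python dictionary, and the loop would be expressed declaratively; the sketch only shows the dependency pattern.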
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Christopher Jermaine
Exploring phylogenetic hypotheses via Gibbs sampling on evolutionary networks
- DOI:
- Year published: 2016
- Journal:
- Impact factor: 4.4
- Authors: Yun Yu; Christopher Jermaine; Luay K. Nakhleh
- Corresponding author: Luay K. Nakhleh
The Latent Community Model for Detecting Sybil Attacks in Social Networks
- DOI:
- Year published: 2011
- Journal:
- Impact factor: 0
- Authors: Zhuhua Cai; Christopher Jermaine
- Corresponding author: Christopher Jermaine
Other grants by Christopher Jermaine
Collaborative Research: SHF: Medium: Semantics-Aware Neural Models of Code
- Award number: 2212557
- Fiscal year: 2022
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: CISE-MSI: RPEP: III: celtSTEM Research Collaborative: Catapulting MSI Faculty and Students into Computational Research.
- Award number: 2131294
- Fiscal year: 2021
- Amount: $1.2 million
- Award type: Standard Grant
III: Small: Applying Relational Database Design Principles to Machine Learning System Design
- Award number: 2008240
- Fiscal year: 2020
- Amount: $1.2 million
- Award type: Standard Grant
MLWiNS: Wireless On-the-Edge Training of Deep Networks Using Independent Subnets
- Award number: 2003137
- Fiscal year: 2020
- Amount: $1.2 million
- Award type: Standard Grant
Expeditions: Collaborative Research: Understanding the World Through Code
- Award number: 1918651
- Fiscal year: 2020
- Amount: $1.2 million
- Award type: Continuing Grant
III: Small: Declarative Recursive Computation on a Database System
- Award number: 1910803
- Fiscal year: 2019
- Amount: $1.2 million
- Award type: Standard Grant
ABI Innovation: Algorithms and Models for Distributed Computation of Bayesian Phylogenetics
- Award number: 1355998
- Fiscal year: 2014
- Amount: $1.2 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: Data Mining and Cleaning for Medical Data Warehouses
- Award number: 0964526
- Fiscal year: 2010
- Amount: $1.2 million
- Award type: Continuing Grant
Small: The MCDB Database System for Managing and Modeling Uncertainty
- Award number: 0915315
- Fiscal year: 2009
- Amount: $1.2 million
- Award type: Standard Grant
III-COR-Medium: Design and Implementation of the DBO Database System
- Award number: 1007062
- Fiscal year: 2009
- Amount: $1.2 million
- Award type: Continuing Grant
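Similar grants from the National Natural Science Foundation of China
Plasmon-enhanced optical response in composite low-dimensional topological materials
- Award number: 12374288
- Approval year: 2023
- Amount: 520,000 yuan
- Award type: General Program
Vanishing medium-sized enterprises from the perspectives of the managerial market and intervention in the division of labor: stylized facts, underlying mechanisms, and optimization paths
- Award number: 72374217
- Approval year: 2023
- Amount: 410,000 yuan
- Award type: General Program
Multiscale algorithms and numerical simulation of plasma in tokamak divertors
- Award number: 12371432
- Approval year: 2023
- Amount: 435,000 yuan
- Award type: General Program
Dark matter distribution near intermediate-mass black holes and gravitational-wave echo detection in their IMRI systems
- Award number: 12365008
- Approval year: 2023
- Amount: 320,000 yuan
- Award type: Regional Science Fund Program
Physical mechanisms of rapid intensification of asymmetric tropical cyclones under moderate vertical wind shear
- Award number: 42305004
- Approval year: 2023
- Amount: 300,000 yuan
- Award type: Young Scientists Fund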
Similar overseas grants
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award number: 2321102
- Fiscal year: 2024
- Amount: $1.2 million
- Award type: Standard Grant
RII Track-4:@NASA: Bluer and Hotter: From Ultraviolet to X-ray Diagnostics of the Circumgalactic Medium
- Award number: 2327438
- Fiscal year: 2024
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: Topological Defects and Dynamic Motion of Symmetry-breaking Tadpole Particles in Liquid Crystal Medium
- Award number: 2344489
- Fiscal year: 2024
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402836
- Fiscal year: 2024
- Amount: $1.2 million
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402851
- Fiscal year: 2024
- Amount: $1.2 million
- Award type: Continuing Grant