Theory and practice for exploiting the underlying structure of probability models in big data analysis
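Basic information
- Award number: 1622490
- Principal investigator:
- Amount: $250,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2016
- Funding country: United States
- Duration: 2016-08-01 to 2019-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract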
Ever-increasing use of data-intensive methods in scientific discoveries has led to a paradigm shift in science in recent years. High throughput scientific experiments, routine use of digital sensors, and intensive computer simulations have created a data deluge imposing new challenges on scientific communities to find effective and computationally feasible methods for processing and analyzing very large datasets. Despite many attempts, however, the necessary development of theoretical and computational foundations for big data analysis is lagging far behind. Many existing statistical methods are not capable of handling such data-intensive problems in terms of theoretical foundation as well as computational complexity and scalability. For analyzing high dimensional data with possibly complex structures, this research will offer a set of fundamental solutions using principled statistical methods. The resulting methods will provide a robust framework for big data analysis and allow scientists to use statistical models beyond their current limited applicability. The techniques developed in this project are likely to gain widespread acceptance across a broad spectrum of scientific disciplines, as well as in industry.
The focus of this research is mainly on Bayesian statistics. Many recent methods aim to improve computational efficiency of Bayesian models by approximating the likelihood function using a small subset of data. In contrast, the objective of this research is to explore underlying structures of probability models and exploit these features to design efficient and scalable computational methods and algorithms for Bayesian inference in big data analysis. To this end, (1) the PIs will define and study the structure of probability distributions in order to develop novel geometrically motivated methods for statistical inference; (2) the PIs will develop efficient and scalable computational methods that accurately approximate probability distributions by exploiting their geometric properties; (3) finally, the PIs will apply these methods to real computationally-intensive problems from biological sciences. Due to its interdisciplinary nature, this research is expected to contribute to several fields, including statistics, machine learning, applied mathematics, and data-intensive computing.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Babak Shahbaba
Collaborative Research: HDR DSC: Data Science Training and Practices: Preparing a Diverse Workforce via Academic and Industrial Partnership
- Award number: 2123366
- Fiscal year: 2021
- Funding amount: $250,000
- Award type: Continuing Grant
MODULUS: Data-Driven Mechanistic Modeling of Hierarchical Tissues
- Award number: 1936833
- Fiscal year: 2019
- Funding amount: $250,000
- Award type: Standard Grant
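Similar NSFC (National Natural Science Foundation of China) grants
Interaction, mobility, and rematerialization: a study of participatory meaning-making in cultural heritage practice
- Award number: 42301261
- Approval year: 2023
- Funding amount: 300,000 CNY
- Award type: Young Scientists Fund
Mechanisms of the evolution of teachers' practical knowledge, based on cognitive process mining
- Award number: 62307017
- Approval year: 2023
- Funding amount: 200,000 CNY
- Award type: Young Scientists Fund
Eliminating administrative monopoly, building a unified national market, and corporate financial behavior: a perspective from policy review and enforcement practice
- Award number: 72302086
- Approval year: 2023
- Funding amount: 300,000 CNY
- Award type: Young Scientists Fund
Regulation and practice of community livelihood space in Yunnan nature reserves: a human-land system adaptability perspective
- Award number: 42361037
- Approval year: 2023
- Funding amount: 320,000 CNY
- Award type: Regional Science Fund
An experimental study of multimodal music practice for improving speech and music perception in cochlear implant recipients
- Award number: 82301301
- Approval year: 2023
- Funding amount: 300,000 CNY
- Award type: Young Scientists Fund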
Similar overseas grants
Identifying and exploiting therapeutic vulnerabilities of tumor-host interactions that drive bone-to-meninges breast cancer metastasis
- Award number: 10826488
- Fiscal year: 2023
- Funding amount: $250,000
- Award type:
Exploiting alpha-ketoglutarate-dependent metabolism for therapeutic benefit in acute myeloid leukemia
- Award number: 10684842
- Fiscal year: 2022
- Funding amount: $250,000
- Award type:
Exploiting alpha-ketoglutarate-dependent metabolism for therapeutic benefit in acute myeloid leukemia
- Award number: 10523632
- Fiscal year: 2022
- Funding amount: $250,000
- Award type:
Understanding and exploiting novel therapeutic vulnerabilities of RIT1-driven lung cancer
- Award number: 10211377
- Fiscal year: 2021
- Funding amount: $250,000
- Award type:
Understanding and exploiting novel therapeutic vulnerabilities of RIT1-driven lung cancer
- Award number: 10641671
- Fiscal year: 2021
- Funding amount: $250,000
- Award type: