CAREER: Leveraging Randomization and Structure in Computational Linear Algebra for Data Science
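Basic Information
- Award number: 2338655
- Principal investigator:
- Amount: $649,400
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Project period: 2024-05-01 to 2029-04-30
- Project status: Active (not yet concluded)
- Source:
- Keywords:
Project Abstract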
Data science plays a central role in addressing societal challenges, such as healthcare, climate change, and urban planning. At the core of nearly all developments in algorithms for data science is computational linear algebra, an area that concerns the study of algorithms for solving ubiquitous problems involving matrices and other linear-algebraic objects that are used to represent data. With ever-increasing data sizes, randomization has become a key technique for developing efficient algorithms in computational linear algebra. Yet, there is a significant gap between the theory and practice of these algorithms, which has slowed their practical adoption in data science applications. This project identifies key challenges and puts forward new directions towards providing the algorithmic foundations necessary to ensure that a broad scope of randomized linear algebra algorithms are successfully deployed across computational data science over the next decade. This project leverages fundamental interdisciplinary ideas at the intersection of theoretical computer science, machine learning, statistics, and nonlinear optimization. In addition to developing the theoretical foundations, one of the key aims driving the project is to facilitate ongoing implementation efforts aimed at incorporating randomization into LAPACK, the default computational linear algebra software package in machine learning, engineering, statistics, and scientific computing for the past thirty years. At the core of the project is an integrated education plan focused on helping students to gain an interdisciplinary skillset at the intersection of algorithmic foundations and data science. 
The project also involves outreach to students from three underresourced high schools in Michigan through a collaboration with the university's Engineering Pathways program.
The project's objectives are to close the theory-practice gap in using randomization to design improved algorithms for ubiquitous matrix problems such as matrix multiplication, solving linear systems, and low-rank approximation. The project identifies three major thrusts, namely (1) reformulating optimal matrix sketching via black-box sampling methods; (2) developing randomized iterative refinement algorithms via stochastic optimization; and (3) studying the robustness of randomized numerical linear algebra algorithms in preserving certain structural elements of data. The matrix sketch, i.e., a small randomized approximation of the input data, is a key foundational component of these algorithms. The project aims to develop new algorithmic and theoretical approaches towards ensuring the control and reliability of the output produced by matrix sketching and sub-sampling, which is especially challenging when dealing with randomization and will be critical for successful software integration. Building on these tools, the project pursues new approaches for designing high-precision algorithms for solving linear systems and quadratic problems, by exploring techniques that lie in the unexplored regime between deterministic iterative solvers and stochastic optimization. Finally, the project aims to contribute to a unified understanding of randomized matrix approximation algorithms that preserve the structure of the data, which is essential for feature selection, experimental design, interpretability, and more.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
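To make the notion of a matrix sketch in the abstract concrete, here is a minimal illustration of the generic "sketch-and-solve" idea for least squares. It is not the project's method: the dense Gaussian sketching matrix, the function name `sketch_and_solve`, and the parameter `sketch_size` are all assumptions chosen for illustration.

```python
import numpy as np


def sketch_and_solve(A, b, sketch_size, seed=0):
    """Approximately solve min_x ||A x - b||_2 via a Gaussian sketch.

    Hypothetical helper for illustration only; the sketching
    distribution and the sketch size are assumptions.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Dense Gaussian sketching matrix S (sketch_size x n), scaled so that
    # the expectation of S.T @ S is the identity.
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    # Solve the much smaller sketched problem min_x ||S A x - S b||_2.
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((2000, 10))
    b = A @ np.ones(10) + 0.1 * rng.standard_normal(2000)
    x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
    x_sketch = sketch_and_solve(A, b, sketch_size=200)
    # Ratio of the sketched residual to the optimal residual (always >= 1).
    ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
    print(f"residual ratio: {ratio:.3f}")
```

The sketched problem has only `sketch_size` rows instead of `n`, which is where the speedup comes from. The price is a residual somewhat larger than the optimum, and that loss of accuracy is one face of the theory-practice gap the abstract describes.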
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Michal Derezinski
Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems
- DOI: 10.48550/arxiv.2308.15720
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Younghyun Cho; J. Demmel; Michal Derezinski; Haoyun Li; Hengrui Luo; Michael W. Mahoney; Riley Murray
- Corresponding author: Riley Murray
Fast determinantal point processes via distortion-free intermediate sampling
- DOI:
- Published: 2018-11
- Journal:
- Impact factor: 0
- Authors: Michal Derezinski
- Corresponding author: Michal Derezinski
Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Michal Derezinski
- Corresponding author: Michal Derezinski
Determinantal Point Processes in Randomized Numerical Linear Algebra
- DOI: 10.1090/noti2202
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Michal Derezinski; Michael W. Mahoney
- Corresponding author: Michael W. Mahoney
Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches
- DOI: 10.48550/arxiv.2206.02702
- Published: 2022-06
- Journal:
- Impact factor: 0
- Authors: Michal Derezinski
- Corresponding author: Michal Derezinski