EFFICIENT SPECTRAL APPROACHES FOR FINDING UNDERLYING STRUCTURES IN BIG DATA
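Basic information
- Grant number: 9278252
- Principal investigator:
- Amount: $410,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2016
- Funding country: United States
- Project period: 2016-05-24 to 2019-04-30
- Project status: Completed
- Source: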
- Keywords: Algorithmic Analysis; Algorithms; Architecture; Big Data; Binding; Bioinformatics; Biological; Categories; Cells; ChIP-seq; Clinical Research; Computer software; Confusion; Data; Data Analyses; Data Analytics; Data Discovery; Data Set; Databases; Detection; Development; Dimensions; Employee Strikes; Event; Excision; Future; Genomics; Goals; Knowledge Discovery; Language; Lead; Massive Parallel Sequencing; Mathematics; Memory; Methods; Microprocessor; Nucleotides; Numeric Rating Scale; Pattern; Performance; Randomized; Research; Running; Science; Signal Transduction; Stream; Structure; Techniques; Time; Validation; Variant; base; big biomedical data; computerized data processing; computerized tools; design; experimental study; falls; genome wide association study; genomic data; health care delivery; high dimensionality; improved; insertion/deletion mutation; insight; interest; learning strategy; novel; novel strategies; population stratification; prototype; public health relevance; repository; tool; user-friendly; whole genome
Project abstract
DESCRIPTION (provided by applicant): We recently developed several spectral approaches for analyzing very large genomics datasets or complete databases that fall into the category of big data (BD). The first approach performs SVD or PCA using randomization, which can dramatically accelerate the computation of eigenvectors and eigenvalues relative to the standard Lanczos algorithm implemented in all common software packages. Computing PCA and the SVD more efficiently could revolutionize the innumerable biomedical applications that rely on them, e.g. population stratification in very large GWAS. These algorithms produce higher accuracy than classical (deterministic) methods, enable the processing of data streams that are too large to store, and parallelize easily on multicore microprocessors. Our second novel approach is an unsupervised spectral learning method. It provides new mathematical insights of striking conceptual simplicity for ranking multiple competing algorithms without access to validation data, and for optimally combining this ensemble of algorithms to obtain improved predictions in the absence of ground truth. End users and bioinformaticians must decide which tool to use for their study, and the biological results inferred by different tools are often in disagreement. A tool that lets them optimally rank or combine algorithms for the analysis of genomics data is a practical and efficient way to remove this confusion. Choosing the best-performing algorithm or pipeline is essential, as it can often lead to substantial improvement in the quality of the readout from massively parallel sequencing experiments. Moreover, combining these tools typically yields performance superior to that of the best-performing single algorithm.
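The randomized SVD idea mentioned in the description can be illustrated with a minimal sketch. This is a generic randomized range-finder followed by a small exact SVD, not the applicants' implementation; the function name `randomized_svd` and the parameters `oversample` and `n_power_iter` are assumptions chosen for illustration.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top `rank` singular triplets of A (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)

    # Sample the range of A with a random Gaussian test matrix.
    omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen the captured subspace when the
    # singular values decay slowly; QR keeps the basis well conditioned.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]
```

For PCA, the same routine could be applied to a data matrix whose columns have been mean-centered first; the rows of `Vt` then play the role of principal directions. The only passes over `A` are matrix products, which is what makes this style of algorithm amenable to parallel and streaming settings.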
Our goal is to establish a team focused on providing and disseminating full-blown implementations of spectral BD tools and methods that are broadly applicable across the biomedical sciences, clinical research, and healthcare delivery. Specifically, we will develop scalable PCA and SVD for genomics and biomedical applications, and further advance our spectral method for ranking the performance of competing pipelines and combining them to achieve better predictions without access to validation data. Moreover, we will develop scalable dimensionality reduction techniques for organizing BD from biomedical applications.
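One way a label-free ranking of this kind can work is sketched below. It is a simplified illustration, not the applicants' method as specified in the proposal, and it rests on stated assumptions: binary predictions coded as -1/+1, classifiers that err independently of each other given the true label, and most classifiers being better than chance. Under those assumptions the off-diagonal part of the prediction covariance matrix is approximately rank one, and its leading eigenvector orders the classifiers by accuracy. The names `spectral_rank` and `simulate` are hypothetical.

```python
import numpy as np

def spectral_rank(preds, n_iter=100):
    """Score classifiers from their predictions alone (rows = classifiers).

    Assumes -1/+1 predictions and conditionally independent errors.
    """
    R = np.cov(preds)  # sample covariance between classifiers
    for _ in range(n_iter):
        eigvals, eigvecs = np.linalg.eigh(R)
        v = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        # The diagonal does not follow the rank-one model, so replace it
        # with the current rank-one estimate and refit.
        np.fill_diagonal(R, v ** 2)
    # Resolve the eigenvector sign by assuming most classifiers beat chance.
    return v if v.sum() >= 0 else -v

def simulate(accuracies, n_samples, seed=0):
    """Toy data: balanced hidden labels, independent errors (illustration only)."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n_samples)
    acc = np.asarray(accuracies)[:, None]
    correct = rng.random((len(accuracies), n_samples)) < acc
    return np.where(correct, y, -y), y

preds, _ = simulate([0.9, 0.8, 0.7, 0.6], n_samples=50000)
scores = spectral_rank(preds)
ranking = np.argsort(-scores)  # indices of classifiers, best first
```

The hidden labels returned by `simulate` are never passed to `spectral_rank`; the ranking uses only the agreement structure among the predictions. The scores could also serve as weights in a weighted vote over the classifiers, though how well that works depends on how closely the independence assumption holds.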
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Yuval Kluger
EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA
- Grant number: 10335252
- Fiscal year: 2019
- Funding amount: $410,000
- Project category:
EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA
- Grant number: 9920743
- Fiscal year: 2019
- Funding amount: $410,000
- Project category:
EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA
- Grant number: 9764594
- Fiscal year: 2019
- Funding amount: $410,000
- Project category:
Co-ordination of recombination and allelic exclusion at IgH and Igk loci
- Grant number: 8740626
- Fiscal year: 2014
- Funding amount: $410,000
- Project category:
Role of ATM and RAG in maintaining genome stability during Tcra/d rearrangement.
- Grant number: 8707743
- Fiscal year: 2013
- Funding amount: $410,000
- Project category:
Role of ATM and RAG in maintaining genome stability during Tcra/d rearrangement.
- Grant number: 8513573
- Fiscal year: 2012
- Funding amount: $410,000
- Project category:
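Similar grants from the National Natural Science Foundation of China
Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex nonsmooth optimization problems
- Grant number: 12371308
- Approval year: 2023
- Funding amount: CNY 435,000
- Project category: General Program
Design and hardware implementation of ensemble learning algorithms under resource constraints
- Grant number: 62372198
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Fast algorithms for electromagnetic fields based on physics-informed neural networks
- Grant number: 52377005
- Approval year: 2023
- Funding amount: CNY 520,000
- Project category: General Program
SPH models and efficient algorithms for deformation and flow of saturated sand considering pile-soil-water coupling effects
- Grant number: 12302257
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Ensemble classification algorithms for high-dimensional imbalanced data
- Grant number: 62306119
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund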
Similar overseas grants
Deep Learning Image Analysis Algorithms to Improve Oral Cancer Risk Assessment for Oral Potentially Malignant Disorders
- Grant number: 10805177
- Fiscal year: 2023
- Funding amount: $410,000
- Project category:
Multi-modal Tracking of In Vivo Skeletal Structures and Implants
- Grant number: 10839518
- Fiscal year: 2023
- Funding amount: $410,000
- Project category:
A Multi-Modal Wearable Sensor for Early Detection of Cognitive Decline and Remote Monitoring of Cognitive-Motor Decline Over Time
- Grant number: 10765991
- Fiscal year: 2023
- Funding amount: $410,000
- Project category:
An acquisition and analysis pipeline for integrating MRI and neuropathology in TBI-related dementia and VCID
- Grant number: 10810913
- Fiscal year: 2023
- Funding amount: $410,000
- Project category:
Leveraging artificial intelligence/machine learning-based technology to overcome specialized training and technology barriers for the diagnosis and prognostication of colorectal cancer in Africa
- Grant number: 10712793
- Fiscal year: 2023
- Funding amount: $410,000
- Project category: