High-performance Computing System for Bioinformatics
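Basic information
- Grant number: 7595665
- Principal investigator:
- Amount: $461,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2009
- Funding country: United States
- Project period: 2009-06-01 to 2010-05-31
- Project status: Completed
- Source: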
- Keywords: Accounting; Algorithms; Bioinformatics; Biological; Biological Phenomena; Biomedical Research; Cells; Complication; Computational Biology; Computer Simulation; Computer Systems; Computer software; Computers; Custom; Data; Data Analyses; Data Set; Data Storage and Retrieval; Development; Disasters; Equipment; Evolution; Face; Gene Expression; Genome; Genomics; Growth; High Performance Computing; Individual; Investigation; Methods; Modeling; Noise; Nucleic Acid Regulatory Sequences; Occupations; Operative Surgical Procedures; Organism; Performance; Population; Recovery; Regulator Genes; Request for Proposals; Research; Research Personnel; Resolution; Resources; Running; Scientist; Signal Transduction; Simulate; Software Tools; Speed; System; Systems Biology; The Sun; Time; Tissues; Work; cell type; cluster computing; computerized tools; computing resources; cost; flexibility; instrument; mass spectrometer; new technology; next generation; public health relevance; response; simulation; tool; translational medicine
Project abstract
DESCRIPTION (provided by applicant): The explosive growth of computational biology has made it difficult for research organizations to keep pace with users' demands for ever-increasing computational power. The complications that biologists face come from two developments. First, new technologies generate huge amounts of data. Of course, this makes it possible for biological investigations to broaden their scope to whole genome, cellular, and even organism levels, but at a cost of overtaxing existing methods and resources for data analysis. Second, algorithms and methods of analysis have become more computationally intensive, in part as a response to the opportunities that data richness has brought about and in part to manage the unfortunate signal-to-noise ratio that seems implicit in genomic datasets. Also, the emergence of "systems biology" has led to growing complication in computational work, since systems biology seeks eventually to model biological phenomena in silico. In effect, one of the major -- and indeed the most flexible -- instruments for genome scientists and systems biologists is the high performance computer, because it is an essential tool for making sense of the prodigious amounts of data already coming from high-throughput sequencers, gene expression microarray equipment, mass spectrometers, and the like.
On high-performance cluster computers, many researchers are making use of basic "job-level parallelism" by which a single user may run multiple jobs (or independent sub-parts of jobs) on many hundreds of computers at once. Often, this is in the form of computational "parameter space studies" where the same application is run on tens, hundreds, or thousands of different sets of inputs. Simulating the evolution of regulatory regions, for example, requires multiple runs in which the size and number of short regulatory motifs are tuned. The prediction of gene regulatory networks requires multiple simulations in which different cell types and different tissue regions are modified. Simulations of gene expression dynamics in populations of cells must also be run multiple times in order to account for "cellular noise" and get a comprehensive picture of the phenomena. This need for repeated computations makes cluster computing an attractive approach for these problems.
Our proposal requests 94 power-efficient compute servers and about 8 terabytes of usable high-speed data storage with matched disaster recovery storage. This equipment will be put into operation using Sun Grid Engine, a software application that coordinates computational resources so that individual machines function as one clustered computational instrument. Bioinformatic software tools, as well as custom-made applications, are available for researchers to use on the equipment.
PUBLIC HEALTH RELEVANCE: Next-generation instruments have made acquiring genomic data inexpensive and ever more efficient, and new technologies promise to add greatly to the resolution and richness of data used for biomedical research and for translational medicine. This torrent of data needs equally powerful and flexible tools for analysis and information creation, in effect matching high-throughput data producers with high performance computational tools for analysis. We propose the creation of a well integrated computational system that matches in compute power the prodigious data flows from instruments producing genomic data.
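The "parameter space study" pattern described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the proposal: the parameter grid, the replicate count, and the `simulate` placeholder are all hypothetical. The only cluster-specific assumption is that the job runs as a Grid Engine array job, where the scheduler sets the `SGE_TASK_ID` environment variable to the index of the current task. Each task index selects one independent parameter set, which is the essence of job-level parallelism.

```python
import itertools
import os
import random

# Hypothetical parameter grid: motif sizes and motif counts to sweep over.
MOTIF_SIZES = [6, 8, 10]
MOTIF_COUNTS = [1, 2, 4]
REPLICATES = 5  # repeated runs per setting, to average over stochastic noise


def parameter_sets():
    """Enumerate every (motif_size, motif_count, seed) combination."""
    return list(itertools.product(MOTIF_SIZES, MOTIF_COUNTS, range(REPLICATES)))


def simulate(params):
    """Placeholder for one independent job; a real study would run the application here."""
    motif_size, motif_count, seed = params
    rng = random.Random(seed)  # per-job seed keeps each run reproducible
    score = motif_size * motif_count + rng.gauss(0.0, 1.0)
    return params, score


def run_task(task_id):
    """Run the single job selected by a 1-based task index."""
    return simulate(parameter_sets()[task_id - 1])


if __name__ == "__main__":
    task_id = os.environ.get("SGE_TASK_ID", "")
    if task_id.isdigit():
        # Inside an array job: handle exactly one parameter set.
        print(run_task(int(task_id)))
    else:
        # Outside an array job: run every parameter set one after another.
        for index in range(1, len(parameter_sets()) + 1):
            print(run_task(index))
```

With this grid there are 45 parameter sets, so submitting the script as an array job (for example with Grid Engine's `qsub -t 1-45`) would let the scheduler spread the 45 independent runs across whatever nodes are free. Because the jobs share no state, the same script scales from a laptop to a cluster without changes to the simulation code.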
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Huntington F Willard
Analysis of Human Centromeres using Novel Artificial Chromosome Vectors
- Grant number: 7391601
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Training in the Genome Sciences and the Hemoglobinopathies
- Grant number: 7196328
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Analysis of Human Centromeres using Novel Artificial Chromosome Vectors
- Grant number: 7599187
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Training in the Genome Sciences and the Hemoglobinopathies
- Grant number: 7228255
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Training in the Genome Sciences and the Hemoglobinopathies
- Grant number: 7099078
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Training in the Genome Sciences and the Hemoglobinopathies
- Grant number: 7640611
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Analysis of Human Centromeres using Novel Artificial Chromosome Vectors
- Grant number: 7201561
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Training in the Genome Sciences and the Hemoglobinopathies
- Grant number: 7502696
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Training in the Genome Sciences and the Hemoglobinopathies
- Grant number: 7643451
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
Training in the Genome Sciences and the Hemoglobinopathies
- Grant number: 7871516
- Fiscal year: 2006
- Funding amount: $461,400
- Project category:
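Similar NSFC (National Natural Science Foundation of China) grants
Research on algorithms for improving the quality of m6A-seq data analysis based on deep learning and multiple-instance learning
- Grant number: 61902323
- Approval year: 2019
- Funding amount: CNY 260,000
- Project category: Young Scientists Fund
Research on algorithms for heterogeneity deconvolution of high-throughput data from complex tissues and their applications
- Grant number: 61902061
- Approval year: 2019
- Funding amount: CNY 220,000
- Project category: Young Scientists Fund
Research on matrix factorization algorithms for multi-omics data aimed at oncogene identification
- Grant number: 61902215
- Approval year: 2019
- Funding amount: CNY 270,000
- Project category: Young Scientists Fund
Research on algorithms for predicting key cancer gene modules by integrating gene mutation and differential expression data
- Grant number: 61902390
- Approval year: 2019
- Funding amount: CNY 250,000
- Project category: Young Scientists Fund
Research on efficient algorithms for large-scale protein function prediction
- Grant number: 61872094
- Approval year: 2018
- Funding amount: CNY 650,000
- Project category: General Program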
Similar overseas grants
The microbiome associated with oral Leukoplakia: A multi-omics mechanistic study
- Grant number: 10870268
- Fiscal year: 2023
- Funding amount: $461,400
- Project category:
Linking microbiome genetic variants with cardiovascular phenotypes in 50,000 individuals
- Grant number: 10516693
- Fiscal year: 2022
- Funding amount: $461,400
- Project category:
Integrating Genetic, Neuroimaging, Transcriptomic, and Clinical Risk Factors as Multivariate Predictors of Cognitive Deterioration in Alzheimer's Disease.
- Grant number: 10673857
- Fiscal year: 2022
- Funding amount: $461,400
- Project category:
Integrating Genetic, Neuroimaging, Transcriptomic, and Clinical Risk Factors as Multivariate Predictors of Cognitive Deterioration in Alzheimer's Disease.
- Grant number: 10515569
- Fiscal year: 2022
- Funding amount: $461,400
- Project category:
Integrative genomic and epigenomic analysis of cancer using long read sequencing
- Grant number: 10396074
- Fiscal year: 2021
- Funding amount: $461,400
- Project category: