Bioconductor: an open computing resource for genomics
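Basic information
- Grant number: 7910730
- Principal investigator:
- Amount: $1.0932 million
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2006
- Funding country: United States
- Project period: 2006-09-28 to 2011-09-25
- Project status: Completed
- Source:
- Keywords:
Project abstract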
DESCRIPTION (provided by applicant): The Bioconductor project provides an open resource for the development and distribution of innovative, reliable software for computational biology and bioinformatics. The range of available software is broad and rapidly growing, as are both the user community and the developer community. The project maintains a web portal for delivering software and documentation to end users, as well as an active mailing list. Additional services for developers include a software archive, a mailing list, and assistance and advice on program development and design.
We propose an active development strategy designed to meet new challenges while simultaneously providing user and developer support for existing tools and methods. In particular we emphasize a design strategy that accommodates the imperfect, yet evolving nature of biological knowledge and the relatively rapid development of new experimental technologies. Software solutions must be able to adapt rapidly and to address new problems when they arise.
CRITIQUE 1:
The Bioconductor project began in 2001. In 2002 it was awarded a BISTI grant for three years (2003-2006). During this time the project has expanded and provided support for a worldwide community of researchers. This is a proposal for continued development of Bioconductor, a set of statistical programs specifically tailored to the computational biology community. Bioconductor is composed of over 130 R packages that have been contributed by a large number of developers. The software packages range from state-of-the-art statistical methods, which are typically used in microarray analysis, to annotation tools, plotting functions, GUIs, and sequence alignment and data management packages. Contributions to and usage of Bioconductor are growing rapidly, and the applicants are requesting support to continue its development as well as general logistical support for software distribution and quality assurance. The proposal includes a research component for Bioconductor which will involve the development of analysis techniques. This will include optimization of the R statistical analyses, statistical processing of Affymetrix data, analysis of SNP data, improved standards, data storage, retrieval from NCBI, sequence management, machine learning, web services and distributed computing.
SCIENTIFIC MERIT
The applicants address many issues that are crucial to the success of a large open source project with multiple contributors. Examples of training, scientific publication, documentation and resource development run throughout the proposal. Many tangible examples were given on the usage of the system by the scientific community.
EXPERIMENTAL DESIGN
This is a description of their management workflow for the project, which does a good job of demonstrating the technical excellence brought to the project by this group.
1) Build annotation packages every three months. Integrate changes in annotation source data structure into annotation package building code.
2) Maintain project website, mailing lists, source control archive. Organize web resources for short courses and conferences.
3) Improve existing software.
4) Sustain automated nightly builds. Work with developers whose packages fail to pass QA.
5) Resolve cross-platform issues.
6) Review new submissions. Answer questions on the mailing lists.
7) Use software engineering best practices. Develop unit testing strategies. Design appropriate classes and methods for new data types. Refactor existing code for better interoperability and extensibility.
8) Develop and organize training materials and documentation.
Extensive detail on testing, build procedures, interoperability, quality assurance and project management is given elsewhere in the document. They clearly have dealt with many issues necessary for a project of this size. They state that one of the biggest cost items is supporting this package on multiple platforms. They point out that many contributors focus on a single platform, so much of the core team's work is tracking down cross-platform bugs. This is time well spent, given that the platforms used are in sync with the needs of the greater bioinformatics community.
ORIGINALITY
While a high degree of originality is not a particularly critical element of an open source software development project, there are certainly areas in the proposal that are unique. Most importantly, it is safe to say that no other project has this blend of statistical analysis systems specifically tailored to an important bioinformatics research area that can be deployed on a number of different computer environments.
INVESTIGATOR AND CO-INVESTIGATORS
Dr. Gentleman is the founder and leader of the Bioconductor project. Dr. Gentleman was an Associate Professor in the Department of Biostatistics, Harvard School of Public Health and Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute. In 2004 he became Program Head, Computational Biology, at the Fred Hutchinson Cancer Research Center in Seattle. He has on the order of ten publications relating to Bioconductor or related statistical analysis. He implemented the original versions of the R programming language jointly with another co-founder. He is PI or Investigator on a number of research grants, at least two of which are directly related to this work. He and other members of the proposal have taught a number of courses and given lectures on Bioconductor; the number of these courses certainly indicates significant dedication to the project. A review of the PI and Co-PI activities related to this project is shown in Table 3 on page 42 of the application. The roles and time allocations assigned to each participant appear to be reasonable. Dr. Gentleman will serve as project leader and will manage the programmers, coordinate the project, and investigate new computational methods and approaches.
Dr. Vincent Carey, as co-Principal Investigator, has 20% time allocated for the project. In 2005 he became Associate Professor of Medicine (Biostatistics). Carey is a senior member of the Bioconductor development core. He will improve interoperability to allow Bioconductor reuse of external modules in Java, Perl and other languages, as well as strengthen interfaces between high throughput experimental workflows and machine learning tools, and ontology capture. An administrative assistant will assist Dr. Carey with administrative requirements, including call coordination, manuscript preparation and distribution, scheduling and budget management.
Dr. Rafael Irizarry, as co-PI, will spend 30% effort on the project. Dr. Irizarry has four years of experience developing methods for microarray data analysis and serves in the Department of Biostatistics as faculty liaison to the Johns Hopkins Medical Institution's Microarray Core. He will supervise all efforts to support preprocessing on all platforms and support for microarray-related consortiums such as the ERCC, GEO, and ArrayExpress.
Programmers will be responsible for the project website, managing email lists, maintaining training materials, upgrading software, refactoring and other code enhancements, managing the svn archive, and Bioconductor releases. They will handle checking all submitted packages, developing unit tests, and simplifying downloads, nightly build procedures, cross-platform issues, data technologies as well as integrating resources found in other languages (e.g. large C libraries of routines for string handling, machine learning and so on). Programmers have familiarity with R packages and systems for database management and for parallel and distributed computing. They will be responsible for managing the annotation data including package building and liaising with organism specific and other data providers.
SIGNIFICANCE
Given the scope of the proposal, and the size of the Bioconductor project in general, the request for the above resources is appropriate. There is an excellent mix of grounded project management along with development of newer state-of-the-art techniques that will benefit many members of the bioinformatics community. There is a high probability that funding this project will help to maintain and advance this important community resource.
ENVIRONMENT
The computer infrastructure, and the local departments of the PI and Co-PIs, as well as the work with the larger scientific community are all excellent environments to support this project.
IN SUMMARY
This is a terrific resource. It is a well-managed, large open source project with very well-crafted QA testing, documentation and training. The continuation is a three-year project. Beyond that period, a statement of long-term goals is needed. The PI should articulate the strategic goals, as well as the research motivation, and translate them into an action plan. They should also use that context to describe how they would go about choosing packages that are put into the Bioconductor system; Table 3 listed only the names of the packages made by the applicants, and it could have gone further to give the reader more information on how packages are chosen. A simple example would have been a statement in the document such as: "Given our assessment of the microarray state of the art, we ultimately aim to overlay annotation data, ontological information, and other forms of meta data onto a statistical framework for expression data." The resulting research plan would then justify a five-year project, but the plan was not strong enough in this application.
It should be noted that many of the beneficiaries of this system are not just users who download the system. In many cases a centralized informatics service downloads the system and then performs analysis for other members of the campus or the wider www community. While that type of "success measure" is hard to assess, more effort in this area in subsequent proposals would be helpful.
Project outcomes
Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
rtracklayer: an R package for interfacing with genome browsers.
- DOI:
- Published: 2009-07-15
- Journal:
- Impact factor: 0
- Authors: Lawrence, Michael; Gentleman, Robert; Carey, Vincent
- Corresponding author: Carey, Vincent
Cloud-scale RNA-sequencing differential expression analysis with Myrna.
- DOI:
- Published: 2010
- Journal:
- Impact factor: 12.3
- Authors: Langmead, Ben; Hansen, Kasper D; Leek, Jeffrey T
- Corresponding author: Leek, Jeffrey T
Quantile normalization of single-cell RNA-seq read counts without unique molecular identifiers.
- DOI:
- Published: 2020-07-03
- Journal:
- Impact factor: 12.3
- Authors: Townes, F William; Irizarry, Rafael A
- Corresponding author: Irizarry, Rafael A
Rintact: enabling computational analysis of molecular interaction data from the IntAct repository.
- DOI:
- Published: 2008-04-15
- Journal:
- Impact factor: 0
- Authors: Chiang, Tony; Li, Nianhua; Orchard, Sandra; Kerrien, Samuel; Hermjakob, Henning; Gentleman, Robert; Huber, Wolfgang
- Corresponding author: Huber, Wolfgang
Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model.
- DOI:
- Published: 2019-12-23
- Journal:
- Impact factor: 12.3
- Authors: Townes, F William; Hicks, Stephanie C; Aryee, Martin J; Irizarry, Rafael A
- Corresponding author: Irizarry, Rafael A
Other grants by Martin T Morgan
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 10703230
- Fiscal year: 2021
- Funding amount: $1.0932 million
- Project category:
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 10449603
- Fiscal year: 2021
- Funding amount: $1.0932 million
- Project category:
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 10594231
- Fiscal year: 2021
- Funding amount: $1.0932 million
- Project category:
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 10478123
- Fiscal year: 2021
- Funding amount: $1.0932 million
- Project category:
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 9186264
- Fiscal year: 2014
- Funding amount: $1.0932 million
- Project category:
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 9334747
- Fiscal year: 2014
- Funding amount: $1.0932 million
- Project category:
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 9122328
- Fiscal year: 2014
- Funding amount: $1.0932 million
- Project category:
Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor
- Grant number: 10017896
- Fiscal year: 2014
- Funding amount: $1.0932 million
- Project category:
Bioconductor: An Open Computing Resource for Genomics
- Grant number: 8337802
- Fiscal year: 2006
- Funding amount: $1.0932 million
- Project category:
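Similar NSFC (National Natural Science Foundation of China) grants
Functional study of mRNA downstream open reading frames (dORFs) in spermatogenesis
- Grant number:
- Approval year: 2022
- Funding amount: CNY 540,000
- Project category: General Program
Mechanism of disturbance stability during the deployment of aircraft rotating wings
- Grant number: U2141249
- Approval year: 2021
- Funding amount: CNY 2.6 million
- Project category: Joint Fund Project
Cenozoic volcanism on Okushiri Island, Hokkaido, Japan: constraints on the deep processes of the opening of the Sea of Japan
- Grant number:
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Recruitment of high-level knowledge workers driven by talent policy and two-way open innovation in firms: unpacking the "openness" paradox
- Grant number:
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Opening the black box of economic operation: construction and practice of an "economic deconstruction model" in ecological economics
- Grant number: 71973008
- Approval year: 2019
- Funding amount: CNY 490,000
- Project category: General Program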
Similar overseas grants
Supplement: Enhancing Community Contributions to Bioconductor With Build System Containerization and a GPU for Testing
- Grant number: 10838736
- Fiscal year: 2023
- Funding amount: $1.0932 million
- Project category:
Novel Computational Methods for Microbiome Data Analysis in Longitudinal Study
- Grant number: 10660234
- Fiscal year: 2023
- Funding amount: $1.0932 million
- Project category:
Statistical Power Calculation Framework for Spatially Resolved Transcriptomics Experiments
- Grant number: 10629262
- Fiscal year: 2022
- Funding amount: $1.0932 million
- Project category:
New computational methods to dynamically pinpointing the subregions carrying disease-associated rare variants
- Grant number: 10709565
- Fiscal year: 2022
- Funding amount: $1.0932 million
- Project category:
Statistical Power Calculation Framework for Spatially Resolved Transcriptomics Experiments
- Grant number: 10453133
- Fiscal year: 2022
- Funding amount: $1.0932 million
- Project category: