Bioconductor: an open computing resource for genomics

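Basic information

  • Grant number:
    7921192
  • Principal investigator:
  • Amount:
    $250,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2006
  • Funding country:
    United States
  • Project period:
    2006-09-28 to 2011-07-31
  • Project status:
    Completed

Project abstract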

DESCRIPTION (provided by applicant): The Bioconductor project provides an open resource for the development and distribution of innovative, reliable software for computational biology and bioinformatics. The range of available software is broad and rapidly growing, as are both the user community and the developer community. The project maintains a web portal for delivering software and documentation to end users as well as an active mailing list. Additional services for developers include a software archive, a mailing list, and assistance and advice on program development and design. We propose an active development strategy designed to meet new challenges while simultaneously providing user and developer support for existing tools and methods. In particular, we emphasize a design strategy that accommodates the imperfect yet evolving nature of biological knowledge and the relatively rapid development of new experimental technologies. Software solutions must be able to adapt rapidly and to address new problems when they arise.

CRITIQUE 1: The Bioconductor project began in 2001. In 2002 it was awarded a BISTI grant for three years (2003-2006). During this time the project has expanded and provided support for a worldwide community of researchers. This is a proposal for continued development of Bioconductor, a set of statistical programs specifically tailored to the computational biology community. Bioconductor is composed of over 130 R packages that have been contributed by a large number of developers. The software packages range from state-of-the-art statistical methods typically used in microarray analysis, to annotation tools, plotting functions, and GUIs, to sequence alignment and data management packages. Contributions to and usage of Bioconductor are growing rapidly, and the applicants are requesting support to continue its development as well as general logistical support for software distribution and quality assurance.

The proposal includes a research component for Bioconductor which will involve the development of analysis techniques. This will include optimization of the R statistical analyses, statistical processing of Affymetrix data, analysis of SNP data, improved standards, data storage, retrievals from NCBI, sequence management, machine learning, web services and distributed computing.

SCIENTIFIC MERIT: The applicants address many issues that are crucial to the success of a large open source project with multiple contributors. Examples of training, scientific publication, documentation and resource development run throughout the proposal. Many tangible examples were given of the usage of the system by the scientific community.

EXPERIMENTAL DESIGN: This is a description of their management workflow for the project, which does a good job of demonstrating the technical excellence brought to the project by this group.
1) Build annotation packages every three months. Integrate changes in annotation source data structure into annotation package building code.
2) Maintain project website, mailing lists, source control archive. Organize web resources for short courses and conferences.
3) Improve existing software.
4) Sustain automated nightly builds. Work with developers whose packages fail to pass QA.
5) Resolve cross-platform issues.
6) Review new submissions. Answer questions on the mailing lists.
7) Use software engineering best practices. Develop unit testing strategies. Design appropriate classes and methods for new data types. Refactor existing code for better interoperability and extensibility.
8) Develop and organize training materials and documentation.
Extensive detail on testing, build procedures, interoperability, quality assurance and project management is given elsewhere in the document. They clearly have dealt with many issues necessary for a project of this size. They state that one of the biggest cost items is support for running this package on multiple platforms. They point out that because many contributors focus on a single platform, much of their work is to track down cross-platform bugs. This is time well spent, given that the platforms used are in sync with the needs of the greater bioinformatics community.

ORIGINALITY: While a high degree of originality is not a particularly critical element of an open source software development project, there are certainly areas in the proposal that are unique. Most importantly, it is safe to say that there is not another project which has this blend of statistical analysis systems specifically tailored to an important bioinformatics research area that can be deployed on a number of different computer environments.

INVESTIGATOR AND CO-INVESTIGATORS: Dr. Gentleman is the founder and leader of the Bioconductor project. Dr. Gentleman was an Associate Professor in the Department of Biostatistics, Harvard School of Public Health, and the Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute. In 2004 he became Program Head, Computational Biology, at the Fred Hutchinson Cancer Research Center in Seattle. He has on the order of ten publications relating to Bioconductor or related statistical analysis. He implemented the original versions of the R programming language jointly with another co-founder. He is PI or Investigator on a number of research grants, at least two of which are directly related to this work. He and other members of the proposal have taught a number of courses and given lectures on Bioconductor; the number of these courses certainly indicates significant dedication to the project.

A review of the PI and Co-PI activities related to this project is shown in Table 3 on page 42 of the application. The roles and time allocations assigned to each participant appear to be reasonable.

Dr. Gentleman will serve as project leader and will manage the programmers, coordinate the project, and investigate new computational methods and approaches.

Dr. Vincent Carey, as co-Principal Investigator, has 20% time allocated for the project. In 2005 he became Associate Professor of Medicine (Biostatistics). Carey is a senior member of the Bioconductor development core. He will improve interoperability to allow Bioconductor reuse of external modules in Java, Perl and other languages, as well as strengthen interfaces between high throughput experimental workflows and machine learning tools, and ontology capture. An administrative assistant will assist Dr. Carey with administrative requirements, including call coordination, manuscript preparation and distribution, scheduling and budget management.

Dr. Rafael Irizarry, as co-PI, will spend 30% effort on the project. Dr. Irizarry has four years of experience developing methods for microarray data analysis and serving, in the Department of Biostatistics, as faculty liaison to the Johns Hopkins Medical Institution's Microarray Core. He will supervise all efforts to support preprocessing on all platforms and support for microarray-related consortiums such as the ERCC, GEO, and ArrayExpress.

Programmers will be responsible for the project website, managing email lists, maintaining training materials, upgrading software, refactoring and other code enhancements, managing the svn archive, and Bioconductor releases. They will handle checking all submitted packages, developing unit tests, and simplifying downloads, nightly build procedures, cross-platform issues, and data technologies, as well as integrating resources found in other languages (e.g. large C libraries of routines for string handling, machine learning and so on). Programmers have familiarity with R packages and systems for database management and for parallel and distributed computing. They will be responsible for managing the annotation data, including package building and liaising with organism-specific and other data providers.

SIGNIFICANCE: Given the scope of the proposal and the size of the Bioconductor project in general, the request for the above resources is appropriate. There is an excellent mix of grounded project management along with development of newer state-of-the-art techniques that will benefit many members of the bioinformatics community. There is a high probability that funding this project will help to maintain and advance this important community resource.

ENVIRONMENT: The computer infrastructure, the local departments of the PI and Co-PIs, and the work with the larger scientific community are all excellent environments to support this project.

IN SUMMARY: This is a terrific resource. It is a well-managed large open source project with very well crafted QA testing, documentation and training. Continuation of this is a three-year project. Beyond that period, a statement of long-term goals is needed. The PI should articulate the strategic goals, as well as their research motivation, and translate that into an action plan. They should also use that context to describe how they would go about choosing packages that are put into the Bioconductor system; Table 3 listed only the names of the packages made by the applicants and could have gone further to give the reader more information for choosing packages. A simple example would have been if they stated in the document: "Given our assessment of the microarray state of the art, we ultimately aim to overlay annotation data, ontological information, and other forms of meta data onto a statistical framework for expression data." The resulting research plan would then justify a five-year project, but it was not strong enough in this application.

It should be noted that many of the beneficiaries of this system are not just users that download the system. In many cases a centralized informatics service downloads the system and then performs analysis for other members of the campus or the wider www community. While that type of "success measure" is hard to assess, more effort in this area in subsequent proposals would be helpful.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by MARTIN MORGAN

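Similar NSFC (National Natural Science Foundation of China) grants

Research on dynamic sharing of electronic health records under a zero-trust architecture
  • Grant number:
    72274077
  • Approval year:
    2022
  • Funding amount:
    CNY 450,000
  • Project type:
    General Program
Exploration and practice of informatized management of science fund archival materials
  • Grant number:
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Project type:
Diversity of scuticociliate ciliates (subclass Scuticociliatia) in the estuarine wetlands of Jiaozhou Bay and establishment of archival records
  • Grant number:
  • Approval year:
    2021
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Exploration and practice of standardized archive management based on the full life cycle of fund projects
  • Grant number:
    52142301
  • Approval year:
    2021
  • Funding amount:
    CNY 100,000
  • Project type:
    Special Fund Project
Improving the application performance of electronic health records within medical consortia: influencing factors, driving systems, and governance mechanisms
  • Grant number:
    72164037
  • Approval year:
    2021
  • Funding amount:
    CNY 280,000
  • Project type:
    Regional Science Fund Project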

Similar overseas grants

Analytical Core
  • Grant number:
    10730061
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
AppalTRuST Career Enhancement Core
  • Grant number:
    10665324
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
ComPASS Collective for Community Engagement (C3E)
  • Grant number:
    10903370
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
Multiplexed detection of cell-free M. Tuberculosis DNA and its drug-resistant variants in blood
  • Grant number:
    10639855
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
Resource Section
  • Grant number:
    10773479
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type: