Large Databases of Small Molecules - Drug Development Tool and Public Resource
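Basic information
- Grant number: 10926595
- Principal investigator:
- Amount: $138,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period: to
- Project status: Not concluded
- Source: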
- Keywords: 3-Dimensional; Algorithms; Anniversary; Area; Awareness; Biological; Biological Assay; Books; Cactaceae; Catalogs; Characteristics; Chemical Structure; Chemicals; Collection; Computer Assisted; Computers; Contracts; Custom; Data; Data Set; Databases; Deposition; Developmental Therapeutics Program; Drug Design; Evaluation; Generations; Goals; Informatics; Information Sciences; Internet; Legal patent; Link; Malignant Neoplasms; Methods; Nature; Paper; Pharmacologic Substance; Property; PubChem; Publications; Readability; Records; Research Personnel; Resources; Running; Sampling; Series; Services; Structure; System; Telephone; Time; United States National Institutes of Health; Update; Vendor; Work; Writing; chemical group; chemical synthesis; cloud platform; database structure; design; drug development; improved; informatics tool; insight; next generation; pharmacophore; programs; screening; small molecule; tautomer; tool; tool development; web based interface; web platform; web server; web services; web site; web-based tool
Project abstract
The principal objectives of this project are to make large collections of small molecules available to aid drug development, both in-house and publicly; to advance the fields of chemical structure identification and processing and of unique compound identifier generation; and to provide free chemoinformatics tools for working with such databases. This project started with posting the information in the Open NCI Database on the CADD Group's public web server. Many databases are available to the user, including large vendor catalogs of compounds that can be acquired for screening. Advanced processing is applied to the data, and powerful searching and display capabilities have been implemented. The nature of the resources currently being developed is exemplified by a brief description of one such service: The data in the current Enhanced NCI Web Browser web service comprise data from NCI's Developmental Therapeutics Program (DTP) and additional information with which we have augmented the DTP data sets.
We have subjected the Open NCI Database of about 260,000 compounds to various analyses that help to better understand its characteristics and to put it in perspective relative to other large databases used in computer-aided drug design and chemical information sciences. Various clustering methods have been applied to it to elucidate its diversity, and the results have been compared with those for other databases. The Open NCI Database has been converted into various formats suitable for further processing, including 3D pharmacophore searching. We have also implemented a powerful public search tool for the Open NCI Database with a web interface based on the chemical information toolkit CACTVS. Using just a web browser, the user is able to search about 250,000 structures by more than 600 criteria. We have greatly augmented the original DTP files with numerous additional data fields containing calculated, predicted, or hyperlinked information.
These data have also been made available in directly downloadable format. Links to several additional services for further processing have been implemented. An online 3D pharmacophore search capability has been built which, as far as we are aware, is currently unique on the web. Searchable predictions of more than 550 different biological activities, calculated by the program PASS for most of the quarter-million compounds, have been included in the web service (abstract).
A more recent service is our Chemical Structure Lookup Service (CSLS), available at http://cactus.nci.nih.gov/lookup. CSLS is essentially a "phone book" for small molecules, allowing users to quickly find out in which, if any, of over 100 different databases (both public and commercial), comprising more than 74 million entries, their compounds occur. Updates of both the user interface and the structure and data holdings are underway as of the time of this writing, which will push the number of entries in CSLS beyond the 100 million mark. Part of these projects is the downloading, reformatting, and evaluation, for cancer-related purposes, of the massive set of structure and assay data deposited in PubChem.
The Chemical Identifier Resolver (CIR) is the most heavily used service, with typically several hundred thousand requests per day. CIR works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier. Among others, our NCI/CADD Structure Identifiers, developed in-house, as well as the new Standard InChI and InChIKey identifiers are handled by this service. One of CIR's key features is that it is a programmatic interface into the Chemical Structure Database (CSDB). An update of CSDB has been completed, bringing it to over 360 million original database records representing approximately 128 million unique small-molecule structures.
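Because CIR is described as a programmatic interface that converts one structure identifier into another, a request can be sketched as a simple URL. The example below is a minimal sketch, not the group's own client: the abstract does not give the URL scheme, so the path layout `<base>/<identifier>/<representation>` and the representation names `stdinchikey`, `stdinchi`, and `smiles` are assumptions to be checked against the service's documentation, and the function names `cir_url` and `resolve` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: resolver requests take the form <base>/<identifier>/<representation>.
# The abstract does not state this layout.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a resolver URL, percent-encoding both path segments."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{quote(representation, safe='')}"


def resolve(identifier: str, representation: str, timeout: float = 10.0) -> list[str]:
    """Fetch the response as text and return its non-empty lines.

    Needs network access; a failed lookup surfaces as an exception from urlopen.
    """
    with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Building URLs needs no network access.
print(cir_url("aspirin", "stdinchikey"))
print(cir_url("CC(=O)O", "stdinchi"))
```

Calling `resolve("aspirin", "stdinchikey")` would issue the actual request. Percent-encoding the identifier matters because SMILES strings routinely contain characters such as `=`, `(`, and `#` that are not safe in a URL path.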
Many additional capabilities are planned for CIR, which is increasingly being integrated with other web services and chemoinformatics tools worldwide. CIR will also become increasingly important in the area of publications involving chemical structures, as efforts increase to make inclusion of computer-readable representations of all compounds presented in a paper mandatory. We are working on the next-generation web platform, which will be the basis for a series of new web services and updates of existing services, including the CADD Group's Chemical Structure Lookup Service (CSLS II). The URL of our public web server is https://cactus.nci.nih.gov. Monthly usage of cactus from January 2016 through December 2021 averaged 14 million accesses, i.e., more than 450,000 per day.
We have analyzed a set of 43 million chemical structure records extracted from patent data (EP, US PTO, WO) by the IBM-led consortium of large pharmaceutical companies in the context of the SIIP (Strategic IP Insight Platform) project. The utility OSRA, originally developed by the CADD Group, was used in this project. Part of these data was made available for public use to both PubChem and the CADD Group (see, e.g., http://www-935.ibm.com/services/us/gbs/bao/siip/nih/?sid=0015AFBF08D8F183C1F8E32A430CFFEB).
Efforts to implement a resource that makes affordable chemical synthesis of screening samples available to all NIH researchers were realized in the form of an extension of the contract with ChemNavigator, a formerly independent company that is now part of Sigma-Aldrich (itself since acquired by Merck GmbH), which has implemented the so-called Semi-Custom Synthesis Online Request System (SCSORS).
Our database and chemoinformatics tools are benefiting from the work pertaining to tautomerism, in particular the redesign of the handling of tautomerism for version 2 of the IUPAC InChI identifier. These efforts include our downloadable Tautomer Database.
A recent web tool in this context is the so-called Tautomerizer. Numerous additional downloadable data sets have been made available on the group's web server. The work of creating a database of more than a billion easily synthesizable compounds in the SAVI project is described elsewhere. Efforts to move some of these tools to cloud platforms are being undertaken. The cactus web server has celebrated its 25th anniversary. It is the longest-running freely accessible chemoinformatics website with advanced structure search capabilities in the world. A very significant update of several of the services on our web server is currently underway.
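The usage and database figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is only illustrative. It assumes an average month of 365.25/12 days (the abstract does not say how the daily figure was derived), treats "over 360 million" and "approximately 128 million" as exact values, and uses constant and function names invented for this example.

```python
# Figures quoted in the abstract (rounded there, treated as exact here).
ACCESSES_PER_MONTH = 14_000_000       # average monthly accesses, Jan 2016 to Dec 2021
CSDB_RECORDS = 360_000_000            # "over 360 million" original database records
CSDB_UNIQUE_STRUCTURES = 128_000_000  # "approximately 128 million" unique structures

# Assumption: average month length in days; not stated in the abstract.
DAYS_PER_MONTH = 365.25 / 12


def accesses_per_day(monthly: float, days_per_month: float = DAYS_PER_MONTH) -> float:
    """Convert a monthly access count into an average daily count."""
    return monthly / days_per_month


def records_per_structure(records: int, unique_structures: int) -> float:
    """Average number of source records that map to one unique structure."""
    return records / unique_structures


daily = accesses_per_day(ACCESSES_PER_MONTH)
ratio = records_per_structure(CSDB_RECORDS, CSDB_UNIQUE_STRUCTURES)

print(f"about {daily:,.0f} accesses per day")
print(f"about {ratio:.1f} source records per unique structure")
```

Under these assumptions the daily figure comes out slightly below 460,000, consistent with the abstract's "more than 450,000 per day", and each unique structure is backed by roughly 2.8 source records on average.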
Project outcomes
Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Optical structure recognition software to recover chemical information: OSRA, an open source solution.
- DOI: 10.1021/ci800067r
- Publication date: 2009-03
- Journal:
- Impact factor: 5.6
- Authors: Filippov, Igor V.; Nicklaus, Marc C.
- Corresponding author: Nicklaus, Marc C.
Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on.
- DOI: 10.1186/1758-2946-3-37
- Publication date: 2011-10-14
- Journal:
- Impact factor: 8.6
- Authors: O'Boyle NM; Guha R; Willighagen EL; Adams SE; Alvarsson J; Bradley JC; Filippov IV; Hanson RM; Hanwell MD; Hutchison GR; James CA; Jeliazkova N; Lang AS; Langner KM; Lonie DC; Lowe DM; Pansanel J; Pavlov D; Spjuth O; Steinbeck C; Tenderholt AL; Theisen KJ; Murray-Rust P
- Corresponding author: Murray-Rust P
Tautomerism of Warfarin: Combined Chemoinformatics, Quantum Chemical, and NMR Investigation.
- DOI: 10.1021/acs.joc.5b01370
- Publication date: 2015-10-16
- Journal:
- Impact factor: 0
- Authors: Guasch L; Peach ML; Nicklaus MC
- Corresponding author: Nicklaus MC
Computer tools in the discovery of HIV-1 integrase inhibitors.
- DOI: 10.4155/fmc.10.193
- Publication date: 2010-07
- Journal:
- Impact factor: 4.2
- Authors: Liao C; Nicklaus MC
- Corresponding author: Nicklaus MC
A new approach to radial basis function approximation and its application to QSAR.
- DOI: 10.1021/ci400704f
- Publication date: 2014-03-24
- Journal:
- Impact factor: 5.6
- Authors: Zakharov AV; Peach ML; Sitzmann M; Nicklaus MC
- Corresponding author: Nicklaus MC
Other grants by MARC NICKLAUS
HIV Integrase Modeling and Computer-Aided Inhibitor Deve
- Grant number: 7291875
- Fiscal year:
- Funding amount: $138,500
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor Development
- Grant number: 7965392
- Fiscal year:
- Funding amount: $138,500
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor and Microbicide Development
- Grant number: 10702372
- Fiscal year:
- Funding amount: $138,500
- Project category:
HIV Integrase Modeling and Computer-Aided Inhibitor Development
- Grant number: 7733068
- Fiscal year:
- Funding amount: $138,500
- Project category:
Synthetically Accessible Virtual Inventory (SAVI)
- Grant number: 10926263
- Fiscal year:
- Funding amount: $138,500
- Project category:
Large Databases of Small Molecules - Drug Development Tool and Public Resource
- Grant number: 10703018
- Fiscal year:
- Funding amount: $138,500
- Project category:
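Similar grants from the National Natural Science Foundation of China (NSFC)
Research on convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex, nonsmooth optimization problems
- Grant number: 12371308
- Approval year: 2023
- Funding amount: CNY 435,000
- Project category: General Program
Research on ensemble learning algorithm design and hardware implementation under resource constraints
- Grant number: 62372198
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Research on fast electromagnetic field algorithms based on physics-informed neural networks
- Grant number: 52377005
- Approval year: 2023
- Funding amount: CNY 520,000
- Project category: General Program
Research on SPH models and efficient algorithms for deformation and flow of saturated sand, considering pile-soil-water coupling effects
- Grant number: 12302257
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Research on ensemble classification algorithms for high-dimensional imbalanced data
- Grant number: 62306119
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund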
Similar overseas grants
Optimization of deep learning models by evolutionary computation methods based on analog circuits
- Grant number: 24K15115
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Grant-in-Aid for Scientific Research (C)
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $138,500
- Project category: Continuing Grant