Virtual Approaches to New Chemistries


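Basic Information

Grant number:
10447249
Principal investigator:
BARRY A BUNIN
Amount:
$440K
Host institution:
COLLABORATIVE DRUG DISCOVERY, INC.
Institution country:
United States
Project category:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-06-06 to 2024-05-31
Project status:
Completed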

Source:
https://reporter.nih.gov/project-details/10447249
Keywords:
Abbreviations Address Algorithms Automation Back Biological Biological Assay Categories Characteristics Chemical Structure Chemicals Chemistry Collection Data Databases Descriptor Drug Design Evaluation FAIR principles Generations Goals Human Informatics Internet Learning Module Machine Learning Measures Methodology Modeling National Center for Advancing Translational Sciences Natural Language Processing Natural regeneration Nature Ontology Process Program Development Protocols documentation Quantitative Structure-Activity Relationship Reaction Readability Reagent Recipe Research Personnel Running Scheme Scientist Semantics Solvents Sorting - Cell Movement Structure System Technology Text Update Vendor Visual Work base chemical reaction deep learning design drug development experience instrument interactive tool knowledge base natural language new technology novel preference small molecule stoichiometry success tool vector virtual

Project Summary

Two new virtual chemistry technologies will be added to the NCATS ASPIRE project as separate modules. The first module will enable new chemistries to be modelled and selected using cutting-edge deep machine learning technology and the latest structure/activity data taken directly from instruments. The second module will be a novel informatics system for capturing chemistry-rich data in a semantic template as machine-readable reactions, which will increase the utility of chemical reactions in electronic lab notebooks and allow more precise interrogation and automation of reaction analyses (and their corresponding reaction products).
The deep learning technology in module 1 is based on our new chemically rich vector (CRV) methodology, which compresses information about chemical structures into a vector of 64 numbers with an efficiency that allows the encoding process to be reversed: not only can a CRV be converted back into its original structure with high success (>90% exact match), but a modified CRV can be converted into a structure that is representative of that point in chemical space. CRVs make excellent descriptors for SAR/QSAR iteration because they contain much more chemical information in a small space, which streamlines the automation of structure-activity models relative to conventional descriptors. The resulting models will be used to explore the multi-dimensional space via an interactive visual interface (human-directed) or a back-end algorithm that constantly searches for new and better structures (machine-directed). Both interactive and automated processes will be connected back into the ASPIRE automation cycle so that the proposed structures can be synthesized and measured (hypothesis evaluation and iterative optimization).
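
The machine-directed mode described above can be pictured as a search over 64-number vectors guided by a scoring model. The following is only a minimal sketch under stated assumptions: the abstract does not describe the CRV encoder, the decoder, or the search algorithm, so `toy_activity` is a hypothetical stand-in for a trained structure-activity model, and the greedy random-perturbation loop is a generic technique chosen for illustration, not the project's method. In a real pipeline each improved vector would still need to be decoded back into a structure, which is not shown here.

```python
import random

DIM = 64  # the abstract describes a CRV as a vector of 64 numbers


def toy_activity(vec, target):
    """Hypothetical stand-in for a structure-activity model.

    Returns a higher score the closer `vec` is to `target`
    (negative squared Euclidean distance).
    """
    return -sum((v - t) ** 2 for v, t in zip(vec, target))


def machine_directed_search(start, score, steps=2000, step_size=0.1, seed=0):
    """Greedy random-perturbation search over a descriptor vector.

    Each step adds Gaussian noise to the current best vector and keeps
    the candidate only if its score improves.
    """
    rng = random.Random(seed)
    best = list(start)
    best_score = score(best)
    for _ in range(steps):
        candidate = [v + rng.gauss(0.0, step_size) for v in best]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Illustrative run: the "optimum" is an arbitrary point, not real chemistry.
rng = random.Random(42)
target = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
start = [0.0] * DIM
initial_score = toy_activity(start, target)
best, best_score = machine_directed_search(
    start, lambda v: toy_activity(v, target)
)
print(f"score improved from {initial_score:.3f} to {best_score:.3f}")
```

The human-directed mode would replace the loop with a person choosing which direction to move in the same vector space.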
The second module, machine-readable reactions, draws from our extensive experience developing the BioHarmony Annotator (formerly BioAssay Express), which uses natural language models to assign semantic ontology terms to biological assay protocols, turning them from unstructured text into machine-readable data. Extracting the full content of reactions from protocols and chemical structure diagrams is remarkably difficult given the unstructured nature of text and the abbreviations, shortcuts and assumptions that go into diagrams. It is further complicated by the need to connect the materials in the scheme with the reaction text description (e.g. reagents, solvents, the sequences involved in the recipe, reaction workup, and product characterization). As an alternative, we will modularize the CDD stoichiometric sketcher, which will allow us to extract this data. We will work with NCATS to identify important fields to capture, creating a machine-readable chemical reaction template.
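
To make the idea of a machine-readable reaction template concrete, here is a hedged sketch of what such a record might look like as a data structure. The abstract states that the actual fields are still to be identified with NCATS, so every class and field name below (`Component`, `Step`, `ReactionRecord`, `role`, `equivalents`, and so on) is an assumption for illustration. They loosely follow the items the text lists: reagents, solvents, recipe sequence, workup, and product characterization. The example entry is illustrative only, not a validated procedure.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """One material in the reaction scheme (hypothetical schema)."""
    name: str
    role: str  # e.g. "reactant", "reagent", "solvent", "product"
    smiles: str = ""
    equivalents: Optional[float] = None  # stoichiometric equivalents, if known


@dataclass
class Step:
    """One action in the recipe, kept in sequence order."""
    order: int
    action: str


@dataclass
class ReactionRecord:
    """A single reaction captured as structured, queryable data."""
    components: List[Component] = field(default_factory=list)
    steps: List[Step] = field(default_factory=list)
    workup: str = ""
    characterization: List[str] = field(default_factory=list)

    def by_role(self, role: str) -> List[Component]:
        """Return all components that play the given role."""
        return [c for c in self.components if c.role == role]

    def to_json(self) -> str:
        """Serialize the whole record, including nested items, to JSON."""
        return json.dumps(asdict(self), indent=2)


# Illustrative entry: an acid-catalyzed esterification.
record = ReactionRecord(
    components=[
        Component("acetic acid", "reactant", "CC(=O)O", 1.0),
        Component("ethanol", "reactant", "CCO", 1.0),
        Component("sulfuric acid", "reagent", "OS(=O)(=O)O"),
        Component("toluene", "solvent", "Cc1ccccc1"),
        Component("ethyl acetate", "product", "CCOC(C)=O"),
    ],
    steps=[
        Step(1, "Combine reactants, catalyst and solvent"),
        Step(2, "Heat under reflux"),
    ],
    workup="Wash with aqueous base, dry, and concentrate",
    characterization=["1H NMR"],
)
print(record.to_json())
```

Because the scheme materials and the text-derived fields live in one record, a query such as `record.by_role("solvent")` works without any parsing of free text, which is the kind of precise interrogation the abstract describes.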
