The Dark Reaction Project: A Machine Learning Approach to Materials Discovery

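Basic Information

  • Award number:
    1307801
  • Principal investigator:
    Joshua Schrier
  • Amount:
    $300,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2013
  • Funding country:
    United States
  • Duration:
    2013-09-15 to 2017-08-31
  • Status:
    Completed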

Abstract

Technical Summary: Hydrothermal synthesis of organically templated inorganic solids is an ideal test case for data-driven materials chemistry. Just a few reactants (one or two inorganic components, one or two organic components, and a solvent) and a few reaction conditions (pH, temperature, reaction time) yield a diverse range of products with numerous applications. Despite this simplicity, whether a crystalline product forms depends sensitively on the quantities of reagents and on the reaction conditions, which makes predicting the success or failure of a reaction a demanding test. Moreover, unlike systems such as metal-organic frameworks (MOFs), these compounds involve many different types of intermolecular interactions, which produce highly diverse crystal structures that cannot be predicted a priori. Rather than predicting the final crystal structure, we therefore address the simpler problem of whether a reaction will yield any crystalline product at all.

Our project pursues three strategies. First, we will construct a searchable online repository for "dark reactions": chemical reactions that have been performed and recorded in laboratory notebooks but never reported in the literature. We will begin by putting our own reactions online, then add the reactions of selected experimental collaborators, and finally create a web-accessible public repository for depositing, retrieving, and using reaction information.

Second, we will use these data to train machine-learning models whose predictions increase the success rate of novel reactions. From the experimental reaction data, we use cheminformatics calculations to compute 200 properties of the individual reagents (e.g., van der Waals surface area, polar surface area as a function of pH, number of hydrogen-bond donors and acceptors) and 50 stoichiometric descriptors (e.g., ratios of organic to inorganic components, weighted by hydrogen-bond donors/acceptors as a function of pH). On a preliminary dataset of 506 reactions, we have trained a decision-tree model that predicts whether a crystalline product forms with 87% accuracy. During this project, we will make the improved model publicly available via the web, address weaknesses in the physicochemical model, and integrate it with databases of commercially available starting materials.

Third, we will perform experimental validation: a proof-of-principle synthesis of new compounds, experiments that fill structural holes in the dataset, and engagement with the broader research community to guide experiments in other laboratories, both to synthesize new materials and to address limitations in our model's chemical space. Beyond its impact on this area of materials synthesis, the software architecture we develop will serve as a starting point for others beginning similar projects, and we commit to distributing our work freely by open-sourcing our code under a license that permits free use in academic settings.

Non-technical Summary: Organically templated metal oxide framework compounds have outstanding structural and chemical diversity, which suits them to applications in industrial catalysis, gas separation, and optical engineering. Yet despite several decades of experimental effort, making new examples of these materials remains a time-consuming, trial-and-error process. Most of the chemical reactions performed are deemed "unsuccessful" because they do not yield a crystalline product, and they are never reported in the literature. There is no forum for collecting these experiments and no means of deriving value from them. Nevertheless, these "dark reactions" are valuable because they define the bounds on the reaction conditions needed to produce a product. By providing a searchable online repository for reaction data, we will enable better management and sharing of dark reactions. We will also use these data to train machine-learning (also called statistical learning or data-mining) algorithms that predict the success of reactions ahead of time, and we will test the model's predictions experimentally.

Our project will provide a mechanism for collecting dark reactions and using them to make future reactions more successful, reducing the researcher time and reagent cost needed to synthesize new materials and thereby accelerating the discovery of new materials. First, this directly addresses the call of the White House Office of Science and Technology Policy's 2011 Materials Genome Initiative, specifically finding ways to use computation to bring functional materials to market more quickly. Second, this project will serve as a model for collaboration between chemists and computer scientists that can be transferred directly to a wide range of other disciplines and lines of investigation. Third, we will provide a cohesive, comprehensive, interdisciplinary, and sustained research experience for undergraduate students, contributing to the scientific workforce. Fourth, our outreach activities will foster interest in data-driven techniques, create a network of collaborating laboratories, and provide the software infrastructure to others wishing to start related projects.
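The repository described above treats every notebook entry, including failed ("dark") reactions, as a structured record that can be queried. The sketch below is a minimal illustration under assumed names: the `Reaction` fields and the `search` helper are hypothetical, not the project's actual schema or interface, and the two records are invented toy values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reaction:
    """One laboratory-notebook entry (hypothetical schema for illustration)."""
    reaction_id: str
    inorganics: dict[str, float]  # inorganic reagent -> amount in mmol
    organics: dict[str, float]    # organic reagent -> amount in mmol
    ph: float
    temperature_c: float
    time_h: float
    crystalline: bool             # outcome: did a crystalline product form?


def search(reactions, reagent=None, ph_range=None, crystalline=None):
    """Return the reactions that satisfy every filter that is not None."""
    hits = []
    for r in reactions:
        if reagent is not None and reagent not in r.inorganics and reagent not in r.organics:
            continue
        if ph_range is not None and not (ph_range[0] <= r.ph <= ph_range[1]):
            continue
        if crystalline is not None and r.crystalline != crystalline:
            continue
        hits.append(r)
    return hits


# Toy records; all values are invented for illustration.
reactions = [
    Reaction("R1", {"V2O5": 1.0}, {"piperazine": 0.5}, 3.0, 90.0, 24.0, True),
    Reaction("R2", {"V2O5": 1.0}, {"piperazine": 2.0}, 7.5, 90.0, 24.0, False),
]

# Retrieve the "dark" (non-crystalline) reactions that used V2O5.
dark = search(reactions, reagent="V2O5", crystalline=False)
print([r.reaction_id for r in dark])  # ['R2']
```

Storing failed reactions with the same schema as successful ones is the point of the design: the outcome field is what a later model learns from.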
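The stoichiometric descriptors in the technical summary combine reagent amounts with per-reagent properties. As a hedged sketch, two such descriptors could be computed as follows. The function name and descriptor keys are hypothetical, the donor counts would in the project come from cheminformatics calculations, and this version uses fixed donor counts only, whereas the summary describes weighting by donors/acceptors as a function of pH.

```python
def stoichiometric_descriptors(inorganic_mmol, organic_mmol, hbd_count):
    """Compute two illustrative reaction-level descriptors.

    inorganic_mmol: inorganic reagent -> amount in mmol
    organic_mmol:   organic reagent -> amount in mmol
    hbd_count:      organic reagent -> number of hydrogen-bond donors
                    (fixed values here; the project treats such
                    properties as functions of pH, which this ignores)
    """
    total_inorganic = sum(inorganic_mmol.values())
    if total_inorganic == 0:
        raise ValueError("at least one inorganic component is required")
    total_organic = sum(organic_mmol.values())

    # Plain molar ratio of organic to inorganic components.
    ratio = total_organic / total_inorganic

    # The same ratio with each organic weighted by its donor count;
    # reagents missing from hbd_count contribute 0.
    weighted_organic = sum(
        amount * hbd_count.get(name, 0) for name, amount in organic_mmol.items()
    )
    return {
        "org_inorg_ratio": ratio,
        "hbd_weighted_org_inorg_ratio": weighted_organic / total_inorganic,
    }


# Piperazine has two N-H hydrogen-bond donors; the amounts are invented.
print(stoichiometric_descriptors({"V2O5": 1.0}, {"piperazine": 0.5}, {"piperazine": 2}))
# {'org_inorg_ratio': 0.5, 'hbd_weighted_org_inorg_ratio': 1.0}
```

Each reaction thus becomes a fixed-length numeric vector, which is the form a classifier such as a decision tree expects.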

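The technical summary reports a decision-tree classifier that predicts crystalline vs. non-crystalline outcomes from these descriptors, but it does not describe the model's settings. The following is a minimal, self-contained CART-style sketch (binary threshold splits chosen by Gini impurity, depth-limited), not the authors' implementation; all function names and the six toy reactions are hypothetical. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = crystalline product)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (impurity, feature, threshold) minimizing weighted Gini, or None."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, max_depth=3, min_samples=2):
    """Grow a tree recursively; a leaf is the majority label of its samples."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(y) < min_samples or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:  # every feature is constant here, so no split exists
        return majority
    _, j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            build_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1, min_samples),
            build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, min_samples))


def predict(tree, x):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


def accuracy(tree, X, y):
    """Fraction of reactions whose outcome is predicted correctly."""
    return sum(predict(tree, x) == label for x, label in zip(X, y)) / len(y)


# Toy data: features are [pH, organic/inorganic ratio]; values are invented.
X = [[3.0, 0.5], [3.5, 0.6], [4.0, 0.4], [7.0, 0.5], [7.5, 2.0], [8.0, 1.5]]
y = [1, 1, 1, 0, 0, 0]
tree = build_tree(X, y, max_depth=2)
print(tree)                    # (0, 5.5, 1, 0): split on pH at 5.5
print(accuracy(tree, X, y))    # 1.0 (training accuracy only)
```

Perfect training accuracy on toy data says little about predictive power; a figure comparable to the 87% quoted above has to come from held-out reactions (e.g., a test split or cross-validation), and the summary does not state which validation scheme was used.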
Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Auditing Black-Box Models for Indirect Influence
  • DOI:
    10.1109/icdm.2016.0011
  • Published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Adler, Philip;Falk, Casey;Friedler, Sorelle A.;Rybeck, Gabriel;Scheidegger, Carlos;Smith, Brandon;Venkatasubramanian, Suresh
  • Corresponding author:
    Venkatasubramanian, Suresh

Other publications by Joshua Schrier

Predicting organic thin-film transistor carrier type from single molecule calculations
Research in Physical Chemistry at Primarily Undergraduate Institutions.
Comment on “Comparing the Performance of College Chemistry Students with ChatGPT for Calculations Involving Acids and Bases”
Inducing polarity in [VO₃]ₙⁿ⁻ chain compounds using asymmetric hydrogen-bonding networks
  • DOI:
    10.1016/j.jssc.2012.02.024
  • Published:
    2012-11-01
  • Journal:
  • Impact factor:
  • Authors:
    Matthew D. Smith;Samuel M. Blau;Kelvin B. Chang;Thanh Thao Tran;Matthias Zeller;P. Shiv Halasyamani;Joshua Schrier;Alexander J. Norquist
  • Corresponding author:
    Alexander J. Norquist
Carbon dioxide separation with a two-dimensional polymer membrane.

Other grants by Joshua Schrier

MFB: Accelerating the Discovery of Novel Liposome Formations with Origins-of-Life Insights, Laboratory Automation, and Machine Learning
  • Award number:
    2226511
  • Fiscal year:
    2022
  • Amount:
    $300,000
  • Award type:
    Standard Grant
CDS&E: D3SC: The Dark Reaction Project: A machine-learning approach to exploring structural diversity in solid state synthesis
  • Award number:
    1928882
  • Fiscal year:
    2018
  • Amount:
    $300,000
  • Award type:
    Standard Grant
CDS&E: D3SC: The Dark Reaction Project: A machine-learning approach to exploring structural diversity in solid state synthesis
  • Award number:
    1709351
  • Fiscal year:
    2017
  • Amount:
    $300,000
  • Award type:
    Standard Grant

Similar National Natural Science Foundation of China (NSFC) grants

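Key enzymes and substrate transport mechanisms in the whole-cell catalytic synthesis of 2,5-furandicarboxylic acid
  • Award number:
    22308180
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Program type:
    Young Scientists Fund
Study of the key type I X-ray burst reaction ³⁴Ar(α,p)³⁷K
  • Award number:
    12375146
  • Approval year:
    2023
  • Amount:
    CNY 520,000
  • Program type:
    General Program
Aromatic Claisen rearrangement via ketenimine intermediates
  • Award number:
    22371261
  • Approval year:
    2023
  • Amount:
    CNY 500,000
  • Program type:
    General Program
Unconventional reaction design and catalytic mechanism of Baeyer-Villiger monooxygenases
  • Award number:
    32301043
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Program type:
    Young Scientists Fund
Diamond cutting of microstructured surfaces on reaction-sintered silicon carbide assisted by combined in-situ laser and elliptical vibration
  • Award number:
    52375430
  • Approval year:
    2023
  • Amount:
    CNY 500,000
  • Program type:
    General Program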

Similar international grants

A biologically-inspired, interactive digital device to introduce K12 students to computational neuroscience
  • Award number:
    10706026
  • Fiscal year:
    2023
  • Amount:
    $300,000
  • Award type:
Personalized End of Life Care in Safety-Net hospitals: Implementation of the 3 Wishes Project
  • Award number:
    10736466
  • Fiscal year:
    2023
  • Amount:
    $300,000
  • Award type:
Maglev LVAD with expandable stented inlet and anti-thrombotic coating to improve hemocompatibility
  • Award number:
    10736998
  • Fiscal year:
    2023
  • Amount:
    $300,000
  • Award type:
NIH resubmission Deyu Li - Etheno adductome and repair pathways
  • Award number:
    10659931
  • Fiscal year:
    2023
  • Amount:
    $300,000
  • Award type:
A Low-Cost Wearable Connected Health Device for Monitoring Environmental Pollution Triggers of Asthma in Communities with Health Disparities
  • Award number:
    10601615
  • Fiscal year:
    2023
  • Amount:
    $300,000
  • Award type: