
Machine Learning Force Field Aided Cluster Expansion Approach to Configurationally Disordered Materials: Critical Assessment of Training Set Selection and Size Convergence

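Basic information

DOI:
10.1021/acs.jctc.2c00017
Publication year:
2022
Journal:
J. Chem. Theory Comput.
Impact factor:
--
Corresponding author:
Hong Jiang
CAS journal ranking:
Other
Document type:
--
Authors: Jun-Zhong Xie; Xu-Yuan Zhou; Dong Luan; Hong Jiang
Research area: --
MeSH terms: --
Keywords: --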

Abstract

Cluster expansion (CE) is a powerful theoretical tool to study the configuration-dependent properties of substitutionally disordered systems. Typically, a CE model is built by fitting a few tens or hundreds of target quantities calculated by first-principles approaches. To validate the reliability of the model, a convergence test of the cross-validation (CV) score to the training set size is commonly conducted to verify the sufficiency of the training data. However, such a test only confirms the convergence of the predictive capability of the CE model within the training set, and it is unknown whether the convergence of the CV score would lead to robust thermodynamic simulation results such as order-disorder phase transition temperature T_c. In this work, using carbon-defective MoC_{1-x} as a model system and aided by the machine-learning force field technique, a training data pool with about 13000 configurations has been efficiently obtained and used to generate different training sets of the same size randomly. By conducting parallel Monte Carlo simulations with the CE models trained with different randomly selected training sets, the uncertainty in calculated T_c can be evaluated at different training set sizes. It is found that the training set size that is sufficient for the CV score to converge still leads to a significant uncertainty in the predicted T_c and that the latter can be considerably reduced by enlarging the training set to that of a few thousand configurations. This work highlights the importance of using a large training set to build the optimal CE model that can achieve robust statistical modeling results and the facility provided by the machine-learning force field approach to efficiently produce adequate training data.
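The resampling idea in the abstract (draw several random training sets of the same size from one large pool, fit one model per set, and compare the spread of a downstream quantity with the CV score) can be sketched on synthetic data. This is only a toy illustration, not the authors' workflow: the ±1 "cluster correlation" features, the synthetic interaction coefficients `J_true`, the noise level, and the function names `loo_cv_score` and `spread_vs_size` are all assumptions made here. No Monte Carlo simulation is run, and the spread of the fitted coefficients merely stands in for the spread of the transition temperature.

```python
import numpy as np


def loo_cv_score(X, y):
    """Fit ordinary least squares and return (leave-one-out CV RMSE, coefficients).

    Uses the hat-matrix identity: the leave-one-out residual of sample i
    equals residual_i / (1 - H_ii), so no explicit refitting is needed.
    """
    XtX_inv = np.linalg.pinv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    h_diag = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # diagonal of the hat matrix
    loo_resid = (y - X @ coef) / (1.0 - h_diag)
    return float(np.sqrt(np.mean(loo_resid ** 2))), coef


def spread_vs_size(X_pool, y_pool, sizes, n_repeats, rng):
    """For each training set size, fit models on random subsets of the pool.

    Returns {size: (mean CV score, mean std of fitted coefficients across subsets)}.
    """
    results = {}
    for n in sizes:
        cvs, coefs = [], []
        for _ in range(n_repeats):
            idx = rng.choice(len(y_pool), size=n, replace=False)
            cv, coef = loo_cv_score(X_pool[idx], y_pool[idx])
            cvs.append(cv)
            coefs.append(coef)
        coef_spread = float(np.mean(np.std(np.array(coefs), axis=0)))
        results[n] = (float(np.mean(cvs)), coef_spread)
    return results


# Synthetic pool (assumption): 13000 configurations, 20 toy correlation features.
rng = np.random.default_rng(0)
n_pool, n_clusters = 13000, 20
X_pool = rng.choice([-1.0, 1.0], size=(n_pool, n_clusters))
J_true = rng.normal(scale=0.05, size=n_clusters)               # hypothetical coefficients
y_pool = X_pool @ J_true + rng.normal(scale=0.01, size=n_pool)  # noisy toy energies

results = spread_vs_size(X_pool, y_pool, sizes=[100, 400, 1600], n_repeats=20, rng=rng)
for n, (cv, spread) in results.items():
    print(f"n={n:5d}  mean CV score={cv:.4f}  coefficient spread={spread:.5f}")
```

In this toy setting the CV score sits close to the noise level at every size, while the model-to-model spread of the fitted coefficients keeps shrinking as the training set grows. This is the kind of gap between CV convergence and downstream robustness that the abstract describes.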

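Related funding

Development of first-principles methods and theoretical study of molecular magnetic materials
Grant number:
21873005
Approval year:
2018
Funding amount:
66.0
Project type:
General Program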
Hong Jiang
Mailing address:
--
Affiliation:
--
Email address:
--