The dream of machine learning in materials science is for a model to learn the underlying physics of an atomic system, allowing it to move beyond interpolation of the training set to the prediction of properties that were not present in the original training data. In addition to advances in machine learning architectures and training techniques, achieving this ambitious goal requires a method to convert a 3D atomic system into a feature representation that respects rotational and translational symmetries, varies smoothly under small perturbations, and is invariant under re-ordering of the atoms. The atomic orbital wavelet scattering transform satisfies these requirements by construction and has achieved great success as a featurization method for machine learning energy prediction. Both in small molecules and in the bulk amorphous LiαSi system, machine learning models using wavelet scattering coefficients as features have demonstrated accuracy comparable to density functional theory at a small fraction of the computational cost. In this work, we test how well our LiαSi energy predictor generalizes to properties that were not included in the training set, such as elastic constants and migration barriers. We demonstrate that statistical feature selection methods can reduce over-fitting and yield remarkable accuracy in these extrapolation tasks.