Identifying collective variables (CVs) for chemical reactions is essential for reducing the 3N-dimensional energy landscape to the lower-dimensional basins and barriers of interest. In condensed-phase processes, however, the chemically irrelevant motions of bulk solvent often overwhelm the ability of dimensionality reduction methods to identify the correlated motions that underpin collective variables. Yet solvent can play important indirect or direct roles in reactivity, and much can be lost through treatments that remove or dampen solvent motion. This has been amply demonstrated for principal component analysis (PCA), whereas less is known about the behavior of nonlinear dimensionality reduction methods, e.g., uniform manifold approximation and projection (UMAP), that have recently come into use. Nonlinear methods offer an interesting alternative to linear ones, though often at the expense of interpretability. This work presents distance-attenuated projection methods for atomic coordinates that facilitate the application of both PCA and UMAP to identify collective variables in the presence of explicit solvent and, further, to identify the specific solvent molecules that participate in chemical reactions. The performance of both methods is examined in detail for two reactions in which the explicit solvent plays very different roles within the collective variables. When applied to raw molecular dynamics data in solution, both PCA and UMAP representations are dominated by bulk solvent motions. In contrast, when applied to data preprocessed by our attenuated projection methods, both PCA and UMAP identify the appropriate collective variables (although their sensitivity varies owing to the presence of explicit solvent that results from the projection method). Importantly, this approach allows identification of the specific solvent molecules that are relevant to the CVs, along with their importance.
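The general idea of attenuating coordinates by distance before dimensionality reduction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the projection used in this work: the sigmoidal switching function, its parameters `r0` and `width`, the helper names `attenuate` and `pca`, and the synthetic two-atom "solute" surrounded by noisy "solvent" atoms. PCA is done with a plain NumPy SVD; UMAP is omitted to keep the example dependency-free.

```python
import numpy as np

def attenuate(coords, solute_idx, r0=4.0, width=0.5):
    """Scale each atom's displacement from the solute centroid by a
    sigmoidal weight that decays with distance (hypothetical form)."""
    center = coords[:, solute_idx, :].mean(axis=1, keepdims=True)
    disp = coords - center                       # (frames, atoms, 3)
    dist = np.linalg.norm(disp, axis=2)          # (frames, atoms)
    weights = 1.0 / (1.0 + np.exp((dist - r0) / width))
    weights[:, solute_idx] = 1.0                 # solute is never attenuated
    return (disp * weights[:, :, None]).reshape(len(coords), -1)

def pca(features, n_components=2):
    """PCA via SVD; returns the projections and the unit-norm loadings."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T, vt[:n_components]

def solute_share(loading, solute_idx):
    """Fraction of a component's squared loading carried by solute atoms."""
    per_atom = (loading.reshape(-1, 3) ** 2).sum(axis=1)
    return per_atom[solute_idx].sum() / per_atom.sum()

# Synthetic trajectory: a "bond" stretching from 1 to 3 (arbitrary units)
# plus 50 distant solvent atoms with large uncorrelated thermal noise.
rng = np.random.default_rng(0)
n_frames, n_solvent, solute_idx = 200, 50, [0, 1]
coords = np.zeros((n_frames, 2 + n_solvent, 3))
coords[:, :2, :] = rng.normal(scale=0.01, size=(n_frames, 2, 3))
coords[:, 1, 0] += np.linspace(1.0, 3.0, n_frames)
base = rng.normal(size=(n_solvent, 3))
base *= rng.uniform(15.0, 20.0, size=(n_solvent, 1)) / np.linalg.norm(
    base, axis=1, keepdims=True)
coords[:, 2:, :] = base + rng.normal(scale=1.0, size=(n_frames, n_solvent, 3))

_, load_raw = pca(coords.reshape(n_frames, -1))
_, load_att = pca(attenuate(coords, solute_idx))
share_raw = solute_share(load_raw[0], solute_idx)
share_att = solute_share(load_att[0], solute_idx)
print(f"solute share of PC1: raw={share_raw:.3f}, attenuated={share_att:.3f}")
```

In this toy setting the first principal component of the raw coordinates is carried mostly by solvent noise, whereas after attenuation it is carried almost entirely by the stretching coordinate. The same per-atom squared loadings could, in principle, be used to rank individual solvent atoms that remain within the attenuation range.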