Abstract

Unsupervised classification is becoming an increasingly common method for objectively identifying coherent structures in both observed and modelled climate data. However, most applications of this method require the user to choose in advance the number of classes into which the data are sorted. Typically, a combination of statistical methods and expert judgement is used to choose an appropriate number of classes for a given study; however, it may not be possible to identify a single “optimal” number of classes. In this work, we present a heuristic method, the ensemble difference criterion, for unambiguously determining the maximum number of classes supported by model data ensembles. The method requires that the class definitions be robust across simulated ensembles of the system of interest. As a demonstration, we apply it to the clustering of Southern Ocean potential temperatures in a CMIP6 climate model and show that the data support between four and seven classes of a Gaussian mixture model.
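The abstract does not spell out how the ensemble difference criterion is computed. The sketch below is therefore a hypothetical illustration of the general idea of "robust class definitions across ensemble members", not the authors' criterion. It assumes that class centroids have already been fitted separately for each ensemble member and each candidate number of classes K (for example, Gaussian mixture means). Because labels from independent fits are arbitrary, classes are matched one-to-one before comparison. K is then considered "supported" while the worst-case centroid mismatch between any two members stays within a user-chosen tolerance. The function names `matched_centroid_difference` and `max_supported_classes` and the `tolerance` parameter are all assumptions introduced for illustration.

```python
from itertools import permutations
import math


def matched_centroid_difference(means_a, means_b):
    """Smallest achievable worst-case distance between the class centroids
    of two ensemble members, over all one-to-one matchings of their classes.

    Hypothetical helper: labels from independent fits are arbitrary, so
    classes are matched before comparison. Brute force over permutations
    is only practical for small K (roughly K <= 8).
    """
    if len(means_a) != len(means_b) or not means_a:
        raise ValueError("members need the same, non-zero number of classes")
    best = math.inf
    for perm in permutations(range(len(means_b))):
        # Worst mismatch under this particular matching of class labels.
        worst = max(math.dist(means_a[i], means_b[j]) for i, j in enumerate(perm))
        best = min(best, worst)
    return best


def max_supported_classes(centroids_by_k, tolerance):
    """Largest K (scanning upward) for which every pair of ensemble members
    yields matching centroids to within `tolerance`.

    Hypothetical criterion. `centroids_by_k` maps K to a list of members,
    each member being a list of K centroid vectors.
    """
    supported = None
    for k in sorted(centroids_by_k):
        members = centroids_by_k[k]
        if len(members) < 2:
            raise ValueError("need at least two ensemble members per K")
        diffs = [
            matched_centroid_difference(members[i], members[j])
            for i in range(len(members))
            for j in range(i + 1, len(members))
        ]
        if max(diffs) <= tolerance:
            supported = k
        else:
            break  # class definitions stop being robust at this K
    return supported


# Toy example with two members and made-up 2-D centroids (illustrative only).
centroids_by_k = {
    2: [[[0.0, 1.0], [5.0, 5.0]], [[5.1, 4.9], [0.1, 1.0]]],
    3: [[[0.0, 1.0], [5.0, 5.0], [2.0, 2.0]],
        [[5.1, 4.9], [0.1, 1.0], [3.5, 0.5]]],
}
print(max_supported_classes(centroids_by_k, tolerance=0.5))  # prints 2
```

In this toy case the third class lands in clearly different places in the two members, so only K = 2 passes at a tolerance of 0.5. The bottleneck (worst-case) matching is one design choice among several. A summed or averaged mismatch, or a comparison of full covariances rather than means, would be equally plausible stand-ins.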