To complete the AUROC evaluation discussed in the main paper, we report the corresponding results for the OOD detection setting with CIFAR100 as the in-distribution dataset in Table 1 and Table 2.

Table 1 provides the OOD detection results for the statistics stream. It does not include the Energy-based model [7] because no published results exist for the CIFAR100 setting. The results in Table 1 reinforce the findings from the main submission: the gap between the top performers (HDFF and Gram) and the other methods widens considerably, averaging ≈10% AUROC. Compared with this shift, the difference between HDFF and Gram remains small, and HDFF takes significantly less computational time to attain its results.

Table 2 displays the additional OOD detection results for the training stream. It does not contain NMD [1] because no published results exist for the CIFAR100 setting. As in the main submission, we see that HDFF in combination with other state-of-the-art OOD detectors increases performance across the majority of benchmarks. Specifically, HDFF-1DS outperforms the Spectral Discrepancy Detector [12] in two of the four comparative benchmarks despite requiring 50x less computation to achieve these results. We note that in this CIFAR100 setting, Table 2 shows that HDFF-MLP is weaker on the SVHN and CIFAR10 OOD datasets.

Table 1: Statistics stream, CIFAR100 as the in-distribution dataset (AUROC, %).

| OOD Dataset | HDFF (Ours) | HDFF-Ens (Ours) | Gram [11] | MSP [3] | ML [2] |
|-------------|-------------|-----------------|-----------|---------|--------|
| iSun        | 95.2        | 95.8            | 98.8      | 82.5    | 85.5   |
| TINc        | 93.1        | 93.8            | 98.2      | 83.5    | 86.3   |
| TINr        | 95.4        | 96.0            | 98.5      | 81.6    | 84.3   |
| LSUNc       | 91.7        | 92.5            | 96.0      | 83.9    | 86.5   |
| LSUNr       | 94.5        | 95.3            | 99.3      | 82.7    | 85.5   |
| SVHN        | 99.2        | 99.4            | 99.0      | 86.7    | 90.0   |
| MNIST       | 99.8        | 99.8            | 99.9      | 82.4    | 84.6   |
| KMNIST      | 99.5        | 99.6            | 99.99     | 86.6    | 87.5   |
| FMNIST      | 98.4        | 98.4            | 99.4      | 91.0    | 93.3   |
| DTD         | 92.9        | 93.5            | 97.5      | 78.1    | 79.7   |
| CIFAR10     | 65.7        | 68.2            | 74.2      | 80.9    | 81.5   |
| Average     | 93.2        | 93.8            | 96.4      | 83.6    | 85.9   |
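The ≈10% figure quoted for Table 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the per-dataset AUROC values from Table 1, recomputes the column averages, and takes the gap as the mean of the two top performers (HDFF and Gram) minus the mean of the two baselines (MSP and ML). That definition of the gap is our assumption about how the figure was obtained, and the names `TABLE_1`, `column_means`, and `gap` are hypothetical.

```python
# Per-dataset AUROC (%) copied from Table 1 (statistics stream, CIFAR100 in-distribution).
# Column order: HDFF, HDFF-Ens, Gram, MSP, ML.
METHODS = ["HDFF", "HDFF-Ens", "Gram", "MSP", "ML"]
TABLE_1 = {
    "iSun":    [95.2, 95.8, 98.8, 82.5, 85.5],
    "TINc":    [93.1, 93.8, 98.2, 83.5, 86.3],
    "TINr":    [95.4, 96.0, 98.5, 81.6, 84.3],
    "LSUNc":   [91.7, 92.5, 96.0, 83.9, 86.5],
    "LSUNr":   [94.5, 95.3, 99.3, 82.7, 85.5],
    "SVHN":    [99.2, 99.4, 99.0, 86.7, 90.0],
    "MNIST":   [99.8, 99.8, 99.9, 82.4, 84.6],
    "KMNIST":  [99.5, 99.6, 99.99, 86.6, 87.5],
    "FMNIST":  [98.4, 98.4, 99.4, 91.0, 93.3],
    "DTD":     [92.9, 93.5, 97.5, 78.1, 79.7],
    "CIFAR10": [65.7, 68.2, 74.2, 80.9, 81.5],
}


def column_means(table, methods):
    """Average each method's AUROC over all OOD datasets in the table."""
    n_rows = len(table)
    return {
        method: sum(row[i] for row in table.values()) / n_rows
        for i, method in enumerate(methods)
    }


means = column_means(TABLE_1, METHODS)

# Assumed definition of the gap: mean of the top performers minus mean of the baselines.
top = (means["HDFF"] + means["Gram"]) / 2
baselines = (means["MSP"] + means["ML"]) / 2
gap = top - baselines

for method in METHODS:
    print(f"{method:>8}: {means[method]:.1f}")
print(f"gap (HDFF, Gram vs. MSP, ML): {gap:.1f}")
```

Under this definition the recomputed column averages agree with the Average row of Table 1 to one decimal place, and the gap comes out at roughly 10 AUROC points. Including HDFF-Ens among the top performers would change the result only slightly.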