Singular values of data in matrix form provide insights into the structure of the data, its effective dimensionality, and the choice of hyperparameters for higher-level data analysis tools. However, in many practical applications such as collaborative filtering and network analysis, only a partial observation of the matrix is available. Under such scenarios, we consider the fundamental problem of recovering various spectral properties of the underlying matrix from a sampling of its entries. We propose a framework that first estimates the Schatten $k$-norms of a matrix for several values of $k$ and then uses these as surrogates for estimating spectral properties of interest, such as the spectrum itself or the rank. This paper focuses on the technical challenges of accurately estimating the Schatten norms from a sampling of a matrix. We introduce a novel unbiased estimator based on counting small structures in a graph and provide guarantees that match its empirical performance. Our theoretical analysis shows that Schatten norms can be recovered accurately from a strictly smaller number of samples than is needed to recover the underlying low-rank matrix. Numerical experiments suggest that our approach significantly improves upon a competing approach based on matrix completion methods.
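To make "counting small structures" concrete, here is a minimal sketch for the simplest nontrivial even case, $k = 4$, assuming each entry is observed independently with probability $p$. It is an illustration, not the paper's general estimator, and the function names `schatten_norm` and `estimate_schatten4_power` are hypothetical. Since $\|A\|_4^4 = \sum_i \sigma_i^4 = \operatorname{tr}\big((A^\top A)^2\big)$, the norm expands into a sum over closed walks of length 4 in the bipartite row/column graph whose edges are the matrix entries. A walk that uses $d$ distinct entries is fully observed with probability $p^d$, so reweighting it by $1/p^d$ makes the total unbiased.

```python
import numpy as np


def schatten_norm(A: np.ndarray, k: float) -> float:
    """Schatten k-norm: the l_k norm of the singular values of A."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(sigma ** k) ** (1.0 / k))


def estimate_schatten4_power(M: np.ndarray, p: float) -> float:
    """Unbiased estimate of ||A||_4^4 from a Bernoulli(p) sample of A.

    Assumption: M holds the observed entries of A and zeros elsewhere,
    and each entry was observed independently with probability p.
    Each closed walk i -> j -> i' -> j' -> i contributes
    A_ij * A_i'j * A_i'j' * A_ij'; a walk touching d distinct entries
    is reweighted by 1 / p**d.
    """
    S = M ** 2
    # Walks reusing a single entry four times (i = i', j = j'): d = 1.
    t_single = np.sum(S ** 2)
    # Two distinct entries in the same row (i = i', j != j'): d = 2.
    t_row = np.sum(S.sum(axis=1) ** 2) - t_single
    # Two distinct entries in the same column (j = j', i != i'): d = 2.
    t_col = np.sum(S.sum(axis=0) ** 2) - t_single
    # All length-4 closed walks on the observed entries sum to
    # ||M^T M||_F^2; removing the degenerate walks leaves the genuine
    # 4-cycles (i != i', j != j'), each with d = 4.
    t_cycle = np.sum((M.T @ M) ** 2) - t_single - t_row - t_col
    return t_single / p + (t_row + t_col) / p ** 2 + t_cycle / p ** 4


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical rank-2 test matrix.
    A = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
    p = 0.5
    mask = rng.random(A.shape) < p
    M = np.where(mask, A, 0.0)
    print("true  ||A||_4^4:", schatten_norm(A, 4) ** 4)
    print("estimate       :", estimate_schatten4_power(M, p))
```

Degenerate walks that revisit an entry have to be separated from genuine 4-cycles because a repeated entry is observed only once, which is why the trace is split into terms carrying different powers of $p$. Averaged over many sampling masks, the estimate should approach `schatten_norm(A, 4) ** 4`, while a single draw can be noisy when $p$ is small.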