Single-cell RNA sequencing (scRNA-seq) reveals transcriptome diversity in heterogeneous cell populations because it allows researchers to study gene expression at single-cell resolution. Recent advances in scRNA-seq technology have made it possible to profile tens of thousands of individual cells simultaneously. However, these advances also increase the number of missing values, i.e., dropouts, caused by technical constraints such as amplification failure during the reverse transcription step. The resulting scRNA-seq count data can be extremely sparse, with more than 90% of entries being zeros, which hinders the clustering of cell types. Current imputation methods are not robust when sparsity is this high. In this study, we develop NISC, a Neural Network-based Imputation method for scRNA-seq count data. NISC uses an autoencoder, coupled with a weighted loss function and regularization, to correct dropouts in scRNA-seq count data. A systematic evaluation shows that NISC is an effective imputation approach for sparse scRNA-seq count data and that it outperforms existing imputation methods in cell type identification.
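To make the ingredients named above concrete, the following is a minimal NumPy sketch of an autoencoder reconstruction objective with a weighted loss and L2 regularization. It is not the NISC implementation: the one-hidden-layer architecture, the `log1p` transform, and the weighting scheme (observed zeros, which may be dropouts, are down-weighted by a hypothetical `zero_weight`, while nonzero entries get weight 1) are all illustrative assumptions. The helper names `sparsity`, `autoencoder_forward`, and `weighted_loss` are hypothetical, and training by gradient descent is omitted.

```python
import numpy as np


def sparsity(counts: np.ndarray) -> float:
    """Return the fraction of zero entries in a cells-by-genes count matrix."""
    return float(np.mean(counts == 0))


def autoencoder_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Reconstruct x through one ReLU hidden layer and a linear output layer."""
    hidden = np.maximum(0.0, x @ w_enc + b_enc)  # compressed representation
    return hidden @ w_dec + b_dec  # reconstruction with the same shape as x


def weighted_loss(x, x_hat, params, zero_weight=0.1, l2=1e-3):
    """Weighted mean squared reconstruction error plus an L2 weight penalty.

    Assumption (not the NISC weighting): observed zeros, which may be
    dropouts, get weight `zero_weight`; nonzero entries get weight 1.
    """
    weights = np.where(x == 0, zero_weight, 1.0)
    recon = np.sum(weights * (x - x_hat) ** 2) / x.size
    penalty = l2 * sum(np.sum(p ** 2) for p in params)  # L2 regularization
    return float(recon + penalty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated counts: 100 cells x 50 genes, mostly zeros.
    counts = rng.poisson(0.05, size=(100, 50)).astype(float)
    x = np.log1p(counts)  # log-transform before reconstruction
    w_enc, b_enc = rng.normal(0.0, 0.1, (50, 8)), np.zeros(8)
    w_dec, b_dec = rng.normal(0.0, 0.1, (8, 50)), np.zeros(50)
    x_hat = autoencoder_forward(x, w_enc, b_enc, w_dec, b_dec)
    print("sparsity:", sparsity(counts))
    print("loss:", weighted_loss(x, x_hat, [w_enc, w_dec]))
```

Down-weighting zeros means the model is penalized less for predicting a nonzero value where a zero was observed, which is one plausible way a weighted loss lets an autoencoder fill in dropouts. After training, the reconstruction `x_hat` would serve as the imputed matrix under these assumptions.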