The growing volume of sequencing data available for a wide variety of species can, in principle, be used to detect copy number variations (CNVs) at the population level. However, increasing sample sizes and the diverse complexity of nonhuman genomes challenge the efficiency and robustness of current human-oriented CNV detection methods.
Here, we present CNVcaller, a read-depth method for discovering CNVs in population sequencing data. For complex genomes with thousands of unmapped scaffolds, CNVcaller ran 1–2 orders of magnitude faster than CNVnator and Genome STRiP. CNV detection in 232 goats required only 1.4 days on a single compute node. Additionally, the Mendelian consistency of sheep trios indicated that CNVcaller mitigated the influence of the high proportion of gaps and misassembled duplications in the nonhuman reference genome assembly. Furthermore, multiple evaluations using real sheep and human data indicated that CNVcaller achieved the best accuracy and sensitivity for detecting duplications.
The fast, generalized detection algorithms in CNVcaller overcome prior computational barriers to detecting CNVs in large-scale sequencing data with complex genomic structures. CNVcaller therefore facilitates population genetic analyses of functional CNVs in more species.
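As a rough illustration of the general read-depth idea (not CNVcaller's actual algorithm), the following minimal sketch normalizes per-window read counts by their median and flags windows whose depth ratio suggests a duplication or deletion. The function name `call_cnv_windows`, the thresholds `dup_ratio` and `del_ratio`, and the toy counts are all assumptions made for this example.

```python
from statistics import median


def call_cnv_windows(window_counts, ploidy=2, dup_ratio=1.5, del_ratio=0.5):
    """Label each window 'dup', 'del', or 'normal' from raw read counts.

    Hypothetical helper: thresholds are illustrative, not taken from CNVcaller.
    """
    baseline = median(window_counts)  # assumed to represent the normal copy state
    if baseline <= 0:
        raise ValueError("median read depth must be positive")
    calls = []
    for count in window_counts:
        ratio = count / baseline          # depth relative to the baseline
        copy_number = ploidy * ratio      # naive copy-number estimate
        if ratio >= dup_ratio:
            label = "dup"
        elif ratio <= del_ratio:
            label = "del"
        else:
            label = "normal"
        calls.append((round(copy_number, 2), label))
    return calls


# Toy read counts per fixed-size window (made-up values).
counts = [100, 98, 102, 205, 198, 101, 45, 99]
calls = call_cnv_windows(counts)
labels = [label for _, label in calls]
print(labels)
```

A population-scale method would additionally need to correct for GC content, gaps, and misassembled regions, and to compare depth across samples; this sketch omits all of that.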