It is broadly expected that next-generation sequencing will ultimately yield complete genomes, as exemplified by the latest goat reference genome (ARS1), which is considered one of the most contiguous assemblies in livestock. However, the rich diversity of goat breeds worldwide indicates that a genome from a single individual is insufficient to represent the full genomic content of goats. By comparing nine de novo assemblies from seven sibling species of the domestic goat with ARS1, and using resequencing and transcriptome data from goats for verification, we identified a total of 38.3 Mb of sequence that was absent from ARS1. These pan-sequences contain genic fractions with considerable expression. Using the pan-genome (ARS1 together with the pan-sequences) as the reference genome appreciably improves variant-calling efficacy. As a result of better read mapping quality, 56,657 spurious SNPs per individual were eliminated and 24,414 novel SNPs per individual were recovered on average. The transcriptomic mapping rate also increased by ~1.15%. Our study demonstrates that comparing de novo assemblies from closely related species is an efficient and reliable strategy for finding sequences missing from a reference genome, and that it could be applicable to other species. A pan-genome can serve as an improved reference genome in animals for better exploration of the underlying genomic variation, and it could increase the probability of finding genotype-phenotype associations by supporting a more comprehensive variation database that captures many more differences between individuals. We have constructed a goat pan-genome web interface for data visualization (http://animal.nwsuaf.edu.cn/panGoat).
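The per-individual SNP comparison described in the abstract can be illustrated with a minimal sketch. It assumes that SNPs called against ARS1 and against the pan-genome share ARS1 coordinates (plausible, since the pan-genome is ARS1 plus the added pan-sequences) and that each call is reduced to chromosome, position, and alternate allele. The names `Snp` and `compare_call_sets` are hypothetical, and treating every call that disappears as spurious is a simplification that may not match the study's own criteria.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """A SNP call reduced to the fields needed for comparison (illustrative)."""
    chrom: str
    pos: int
    alt: str


def compare_call_sets(ars1_calls: Iterable[Snp],
                      pan_calls: Iterable[Snp]) -> dict[str, set[Snp]]:
    """Partition one individual's SNP calls by which reference produced them.

    Assumption: calls present only with ARS1 are candidates for spurious
    SNPs, and calls present only with the pan-genome are candidates for
    newly recovered SNPs.
    """
    ars1_set, pan_set = set(ars1_calls), set(pan_calls)
    return {
        "removed": ars1_set - pan_set,    # called with ARS1 only
        "recovered": pan_set - ars1_set,  # called with the pan-genome only
        "shared": ars1_set & pan_set,     # called with both references
    }


# Toy data, not taken from the study.
ars1 = [Snp("chr1", 100, "A"), Snp("chr1", 250, "T"), Snp("chr2", 40, "G")]
pan = [Snp("chr1", 100, "A"), Snp("chr2", 40, "G"), Snp("chr3", 77, "C")]
result = compare_call_sets(ars1, pan)
print({name: len(calls) for name, calls in result.items()})
```

Averaging the sizes of the `removed` and `recovered` sets across individuals would give per-individual figures of the same kind as those reported above.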