Scots pine (Pinus sylvestris) is the most widespread coniferous tree in the boreal forests of Eurasia and has major economic and ecological importance. However, its large and repetitive genome presents a challenge for genome-wide analyses such as association studies and genomic selection. We present a new 50K SNP genotyping array for Scots pine research, breeding programs, and other applications. To select the SNP set, we first genotyped 480 Scots pine samples on a screening array of 407 540 SNPs and identified 47 712 high-quality SNPs for the final array (called ‘PiSy50k’). Here, we provide details of the design and testing of the array, as well as allele frequency estimates from the discovery panel, functional annotation, tissue-specific expression patterns, and expression level information for the SNPs or their corresponding genes, where available. We validated the performance of the PiSy50k array using samples from breeding populations in Finland and Scotland. Overall, 39 678 SNPs (83.2%) showed low error rates (mean = 0.92%). Relatedness estimates based on array genotypes were consistent with the expected pedigrees, and the amount of Mendelian error was negligible. In addition, array genotypes successfully discriminated Scots pine populations of different geographic origins. The PiSy50k array will be a valuable tool for future genetic studies and forestry applications.

Significance statement

Scots pine is an evolutionarily, economically, and ecologically important coniferous species, but its very large genome has limited the study of, for example, the genetic basis of its functional trait variation. We have developed a genotyping array that facilitates Scots pine genetic research and helps link trait variation to genetic polymorphisms and gene expression levels across the genome.