A correlation method was recently used to identify selection-favored ‘optimal’ codons in 675 bacterial genomes. Surprisingly, the identities of these optimal codons were found to track bacterial GC content, leading to the conclusion that selection generally shapes codon usage in the same direction as overall mutation does. Here we raise several concerns about that study and report a thorough comparative analysis of 203 carefully selected bacterial species, which strongly suggests that the previous conclusion is likely an illusion. First, the previous study did not exclude species whose codon usage is under weak or no selection pressure. For these species, as shown in this study, the inferred optimal codon identities are prone to error and follow GC content. Second, the previous study relied on the correlation method alone, without using a second method to test the reliability of the inferred optimal codons. By definition, optimal codons can also be identified by simply comparing codon usage between high- and low-expression genes (the comparison method). When we applied both methods to the selected species, we obtained highly conflicting results, suggesting that at least one method is misleading. Third, we found a critical problem in the correlation method at the step where the bias level of each gene is calculated. Because the background mutation is not accurately defined at this step, part of the mutational effect on codon choice is mistakenly attributed to selection, leading to incorrect and biased optimal codon identities. Finally, considering translational dynamics, optimal codons identified by the comparison method are well explained by tRNA composition, whereas those identified by the correlation method are not. For all of these reasons, we conclude that true optimal codons do not track genomic GC content, and that the correlation method is misleading for identifying optimal codons and is best avoided.
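
To make the comparison method concrete, the following is a minimal Python sketch and not the procedure used in this study. It assumes that sets of high- and low-expression coding sequences are already available, uses a deliberately partial synonymous-codon table (two amino acids only), and calls a codon optimal when its relative synonymous frequency is higher in the high-expression set. The function names and the `min_delta` threshold are hypothetical; a real analysis would cover all amino acids and would likely add a significance test, which the text above does not specify.

```python
from collections import Counter

# Partial synonymous-codon table, for illustration only (assumption:
# a real analysis would use the full genetic code).
SYNONYMOUS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
}


def codon_counts(genes):
    """Count codons over a list of in-frame coding sequences."""
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def relative_freqs(counts, codons):
    """Frequency of each codon among the synonymous codons of one amino acid."""
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}


def optimal_codons_by_comparison(high_genes, low_genes, min_delta=0.0):
    """Return {amino acid: codon} for codons enriched in high-expression genes.

    min_delta is a hypothetical threshold on the frequency difference;
    amino acids with no codon above it are left out of the result.
    """
    high = codon_counts(high_genes)
    low = codon_counts(low_genes)
    result = {}
    for aa, codons in SYNONYMOUS.items():
        f_high = relative_freqs(high, codons)
        f_low = relative_freqs(low, codons)
        delta = {c: f_high[c] - f_low[c] for c in codons}
        best = max(codons, key=lambda c: delta[c])
        if delta[best] > min_delta:
            result[aa] = best
    return result


if __name__ == "__main__":
    # Toy sequences, invented purely to exercise the functions.
    high = ["TTCTTCTTCAAGAAG"]
    low = ["TTTTTTTTCAAAAAG"]
    print(optimal_codons_by_comparison(high, low))
```

Because the sketch compares frequencies within each synonymous family, a codon is flagged only when it is enriched in highly expressed genes relative to weakly expressed ones, rather than because it is common genome-wide.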