Background. Inferring an evolutionary scenario for a gene family is a fundamental problem with applications in both functional and evolutionary genomics. The gene tree/species tree reconciliation approach has been widely used to address this problem, but mostly in a discrete parsimony framework that aims to minimize the number of gene duplications and/or gene losses. Recently, a probabilistic approach based on the classical birth-and-death process has been developed, including efficient algorithms for computing posterior probabilities of reconciliations and for orthology prediction.
Results. In previous work, we described an algorithm for exploring the whole space of gene tree/species tree reconciliations, which we adapt here to efficiently compute the posterior probability of such reconciliations. These posterior probabilities can be computed either exactly or approximately, depending on the size of the reconciliation space. We use this algorithm to analyze the probabilistic landscape of the space of reconciliations for a real data set of fungal gene families and several data sets of synthetic gene trees.
Conclusion. The results of our simulations suggest that, with exact gene trees obtained by a simple birth-and-death process and realistic gene duplication/loss rates, only a very small subset of all reconciliations needs to be explored in order to approximate very closely the posterior probability of the most likely reconciliations. For cases where the posterior probability mass is more evenly dispersed, our method makes it possible to explore the required subspace of reconciliations efficiently.
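The distinction between exact and approximate posterior computation can be illustrated with a minimal sketch. It is not the algorithm of this work: it assumes each reconciliation already has a log-likelihood under some birth-and-death model, that the posterior is proportional to that likelihood, and that the function names and toy scores below are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def exact_posteriors(log_likelihoods):
    """Normalize over the whole (fully enumerated) reconciliation space."""
    z = log_sum_exp(log_likelihoods)
    return [math.exp(v - z) for v in log_likelihoods]

def approximate_posteriors(log_likelihoods, k):
    """Normalize over only the k highest-scoring reconciliations.

    The normalizer omits the unexplored mass, so each value is an
    upper bound on the corresponding exact posterior.
    """
    top = sorted(log_likelihoods, reverse=True)[:k]
    z = log_sum_exp(top)
    return [math.exp(v - z) for v in top]

# Hypothetical log-likelihoods for four reconciliations of one gene tree.
toy = [-10.0, -12.0, -20.0, -25.0]
exact = exact_posteriors(toy)
approx = approximate_posteriors(toy, 2)
```

When most of the probability mass sits on a few reconciliations, as in this toy input, the value obtained from the two best reconciliations is very close to the exact one. When the mass is more evenly dispersed, a larger `k` would be needed for the same accuracy.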