Libraries of microorganisms have served as a cornerstone of therapeutic drug discovery, though the continued re-isolation of known natural product chemical entities remains a significant obstacle to discovery efforts. A major contributing factor to this redundancy is the duplication of bacterial taxa within a library. This duplication can be mitigated with DNA sequencing strategies and/or mass spectrometry-informed bioinformatics platforms, so that the library is built with minimal phylogenetic overlap and, in turn, minimal natural product overlap. IDBac is a MALDI-TOF mass spectrometry-based bioinformatics platform used to assess overlap within collections of environmental bacterial isolates. It allows environmental isolate redundancy to be reduced while considering both phylogeny and natural product production. However, manually selecting isolates for addition to a library during this process has been time-intensive and left to the researcher's discretion. Here, we developed an algorithm that automates the prioritization of hundreds to thousands of environmental microorganisms in IDBac. The algorithm iteratively reduces natural product mass feature overlap within groups of isolates that share high homology of protein mass features. This automation helps minimize human bias and greatly increases efficiency in the microbial strain prioritization process.
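The iterative reduction described above can be illustrated with a minimal sketch. It is not the IDBac implementation: it assumes isolates have already been grouped by protein mass feature similarity, that each isolate's natural product mass features are available as a set of binned m/z values, and that a greedy "most new features first" rule is an acceptable stand-in for the actual selection criterion. The names `prioritize_group`, `prioritize_library`, and the `coverage` parameter are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical input shape: group name -> isolate name -> set of binned
# natural product mass features (integers standing in for m/z bins).
Groups = Dict[str, Dict[str, Set[int]]]


def prioritize_group(features: Dict[str, Set[int]], coverage: float = 1.0) -> List[str]:
    """Greedily pick isolates from one protein-similarity group.

    Isolates are added until the chosen ones represent at least `coverage`
    (a fraction between 0 and 1) of the mass features seen in the group.
    This is an illustrative assumption, not IDBac's documented rule.
    """
    all_features: Set[int] = set().union(*features.values())
    target = coverage * len(all_features)
    covered: Set[int] = set()
    selected: List[str] = []
    remaining = dict(features)

    while remaining and len(covered) < target:
        # Choose the isolate contributing the most features not yet covered;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining isolate is fully redundant
        covered |= gain
        selected.append(best)
    return selected


def prioritize_library(groups: Groups, coverage: float = 1.0) -> Dict[str, List[str]]:
    """Apply the greedy reduction independently within each group."""
    return {name: prioritize_group(members, coverage) for name, members in groups.items()}


# Toy data, invented purely for illustration.
groups: Groups = {
    "group_A": {"iso1": {301, 455, 872}, "iso2": {301, 455}, "iso3": {872, 1024}},
    "group_B": {"iso4": {512}, "iso5": {512}},
}
result = prioritize_library(groups)
```

In this toy input, `iso2` and `iso5` are dropped because their mass features are already represented by other isolates in the same group. Lowering `coverage` below 1.0 trades completeness for a smaller library.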