Finding Protein Sequence Motifs: Methods and Applications
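Basic information
- Grant number: 9555730
- Principal investigator:
- Amount: $319,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Start and end dates: to
- Project status: Not concluded
- Source: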
- Keywords: Actinomyces Infections; Amino Acid Motifs; Amino Acid Sequence; Animals; Antitumor Response; Apoptosis; Archaeal Genome; Architecture; Bacteria; Bacterial Genome; Biological; Capsid Proteins; Censuses; Classification; Clustered Regularly Interspaced Short Palindromic Repeats; Collection; Complex; Custom; DNA; Death Domain; Development; Disease; Dissection; Eukaryota; Evolution; Family; Family member; Generations; Genes; Genome; Genome engineering; Genomics; Goals; Homology Modeling; Human; Individual; Investigation; Lead; Libraries; Life; Methodology; Methods; Mobile Genetic Elements; Nomenclature; Organism; Pattern; Periodicity; Phenotype; Planet Earth; Positioning Attribute; Process; Prokaryotic Cells; Property; Protein Analysis; Protein Family; Protein Structure Initiative; Proteins; RNA Binding; Recruitment Activity; Regulation; Research; Route; SAM Domain; Signal Transduction; Structure; System; Tertiary Protein Structure; Variant; Viral; Viral Genome; Virion; Virus; Work; adaptive immunity; database structure; design; exhaustion; experience; genetic element; genome editing; markov model; microbial; molecular sequence database; novel; nucleoside triphosphatase; polymerization; protein profiling; protein structure; sample fixation; tool; trait
Project abstract
The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI forms the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMMs), profile-against-profile comparison as implemented in the HHSearch method, protein structure comparison methods, homology modeling of protein structure, and genome context analysis were applied extensively and increasingly. Furthermore, custom libraries of protein domain profiles and computational pipelines for novel domain identification have been developed and applied.
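As a rough illustration of the profile idea that underlies PSI-BLAST-style searches, the sketch below builds a log-odds position-specific scoring matrix (PSSM) from a few aligned motif instances and slides it along a query sequence. It is a minimal sketch under stated assumptions, not the project's pipeline: the function names `build_pssm` and `best_window`, the uniform background frequencies, the fixed pseudocount, and the toy motif instances are all hypothetical choices made for this example, and real PSI-BLAST additionally uses sequence weighting, gapped alignment, and iterative database search.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, ungapped motif instances.

    Assumes a uniform background (1/20 per residue), which is a
    simplification chosen for this example.
    """
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in aligned)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Return (start, score) of the best-scoring window, or None if too short."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        # Residues outside the 20 standard amino acids score 0.0.
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy, made-up motif instances and query; not taken from the project.
toy_instances = ["GKSGT", "GKTGS", "GKSGS"]
toy_query = "MAAAGKSGTLLL"
hit = best_window(build_pssm(toy_instances), toy_query)
print(hit)
```

In an iterative search, the sequences found by one pass would be added to the alignment and the matrix rebuilt, which is what lets a profile detect more distant relatives than a single query sequence can.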
The research performed over the last year has led to further progress in the study of the classification, evolution, and functions of several classes of proteins and domains. In particular, we have performed a comprehensive analysis of the relationships among viral capsid proteins. Viruses are the most abundant biological entities on Earth and show remarkable diversity of genome sequences, replication and expression strategies, and virion structures. Evolutionary genomics of viruses has revealed many unexpected connections, but the general scenario(s) for the evolution of the virosphere remain a matter of intense debate among proponents of the cellular regression, escaped genes, and primordial virus world hypotheses. A comprehensive sequence and structure analysis of major virion proteins indicates that they evolved on about 20 independent occasions, and in some of these cases likely ancestors are identifiable among the proteins of cellular organisms. Virus genomes typically consist of distinct structural and replication modules that recombine frequently and can have different evolutionary trajectories. The results of this analysis suggest that, although the replication modules of at least some classes of viruses might descend from primordial selfish genetic elements, bona fide viruses evolved on multiple, independent occasions throughout the course of evolution by the recruitment of diverse host proteins that became major virion components.
In another project, we performed a detailed analysis and classification of the protein domains that comprise the Class 2 CRISPR-Cas systems, the microbial defense machinery that has recently been exploited for the development of a new generation of genome editing tools. Class 2 CRISPR-Cas systems are characterized by effector modules that consist of a single multidomain protein, such as Cas9 or Cpf1. We designed a computational pipeline for the discovery of novel Class 2 variants and used it to identify six new CRISPR-Cas subtypes. The diverse properties of these new systems provide potential for the development of versatile tools for genome editing and regulation. We performed a comprehensive census of Class 2 types and subtypes in complete and draft bacterial and archaeal genomes, outlined evolutionary scenarios for the independent origin of different Class 2 CRISPR-Cas systems from mobile genetic elements, and proposed an amended classification and nomenclature of CRISPR-Cas.
In a separate development, we performed an exhaustive computational dissection of the domain architecture of the SAMD9 family proteins, which are involved in antiviral and antitumor responses in humans. We show that the SAMD9 protein family is represented in most animals and also, unexpectedly, in bacteria, in particular actinomycetes. From the N to the C terminus, the core SAMD9 family architecture includes a DNA/RNA-binding AlbA domain, a variant Sir2-like domain, a STAND-like P-loop NTPase, an array of TPR repeats, and an OB-fold domain with predicted RNA-binding properties. Vertebrate SAMD9 family proteins contain the eponymous SAM domain, which is capable of polymerization, whereas some family members from other animals instead contain homotypic adaptor domains of the DEATH superfamily, known as dedicated components of apoptosis networks. Such complex domain architecture is reminiscent of the STAND superfamily NTPases that are involved in various signaling processes, including programmed cell death, in both eukaryotes and prokaryotes. These findings suggest that SAMD9 is a hub of a novel, evolutionarily conserved defense network that remains to be characterized.
In a more theoretically oriented project, we performed a genomic census and evolutionary analysis of repeat arrays in diverse protein families. Protein repeats are considered hotspots of protein evolution, associated with the acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare the paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. We developed a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterized their evolution. We showed that the horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under the combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
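The horizontal-versus-vertical comparison described above can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions, not the methodology of the project: it uses the plain proportion of differing residues (p-distance) on protein sequences as a stand-in for the codon-based dN/dS estimates named in the abstract, it assumes a perfectly periodic array with a known period, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def split_repeats(sequence, period):
    """Cut a perfectly periodic array into units of length `period`."""
    usable = len(sequence) - len(sequence) % period
    return [sequence[i:i + period] for i in range(0, usable, period)]


def p_distance(a, b):
    """Proportion of positions at which two equal-length units differ."""
    if len(a) != len(b):
        raise ValueError("units must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def horizontal_divergence(units):
    """Mean p-distance over all pairs of repeat units within one protein."""
    pairs = list(combinations(units, 2))
    if not pairs:
        raise ValueError("need at least two repeat units")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def vertical_divergence(units_a, units_b):
    """Mean p-distance between positionally equivalent units of two orthologues."""
    pairs = list(zip(units_a, units_b))
    if not pairs:
        raise ValueError("need at least one pair of units")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy, made-up orthologous repeat arrays with a period of 6 residues.
species_a = "ACDEFG" "ACDKFG" "TCDEFW"
species_b = "ACDEFG" "ACDKFG" "TCDEFY"
units_a = split_repeats(species_a, 6)
units_b = split_repeats(species_b, 6)
print(horizontal_divergence(units_a), vertical_divergence(units_a, units_b))
```

In this toy case the units within one protein differ from each other more than each unit differs from its counterpart in the other species, which is the qualitative pattern the abstract reports. A real analysis would work on aligned codon sequences and estimate synonymous and nonsynonymous rates separately.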
Taken together, these studies expand the known repertoire of protein domains with defined functions and lead to the discovery of novel, biologically important functional systems in diverse organisms, some of which are expected to have practical implications, e.g., in genome engineering. The findings also contribute to the current understanding of the routes of protein evolution.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eugene V Koonin
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
- DOI:
- Publication year: 2018
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Identification and analysis of a novel dephospho-CoA kinase in the hyperthermophilic archaeon Thermococcus kodakarensis (original title: 超好熱性アーキアThermococcus kodakarensisにおける新規dephospho-CoA kinaseの同定および解析)
- DOI:
- Publication year: 2018
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Identification and analysis of a novel archaea-specific dephospho-CoA kinase in the hyperthermophilic archaeon Thermococcus kodakarensis (original title: 超好熱性アーキアThermococcus kodakarensisにおけるアーキア特異的な新規 dephospho-CoA kinaseの同定および解析)
- DOI:
- Publication year: 2019
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Other grants held by Eugene V Koonin
Finding Protein Sequence Motifs: Methods and Application
- Grant number: 6988455
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs: Methods and Application
- Grant number: 6681337
- Fiscal year:
- Funding amount: $319,100
- Project category:
Comparative Analysis of Completely Sequenced Genomes
- Grant number: 7969213
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs: Methods and Applications
- Grant number: 8943217
- Fiscal year:
- Funding amount: $319,100
- Project category:
Comparative Analysis of Completely Sequenced Genomes
- Grant number: 9160910
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs: Methods and Applications
- Grant number: 7594460
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs: Methods and Applications
- Grant number: 7735068
- Fiscal year:
- Funding amount: $319,100
- Project category:
Comparative Analysis of Completely Sequenced Genomes
- Grant number: 6988458
- Fiscal year:
- Funding amount: $319,100
- Project category:
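Similar grants from the National Natural Science Foundation of China
Carbohydrate recognition mechanisms of two C-type lectins containing different key amino acid motifs in the swimming crab Portunus trituberculatus
- Grant number: 31702375
- Year approved: 2017
- Funding amount: CNY 260,000
- Project category: Young Scientists Fund
Molecular mechanism by which mutation of the central tyrosine of CRAC mediates reduced membrane cholesterol sensitivity and transport function of BSEP in the hepatocyte canalicular membrane
- Grant number: 81670580
- Year approved: 2016
- Funding amount: CNY 580,000
- Project category: General Program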
Similar overseas grants
Understanding the origins and mechanisms of aryl hydrocarbon receptor promiscuity
- Grant number: 10679532
- Fiscal year: 2023
- Funding amount: $319,100
- Project category:
Membrane repair as a therapeutic intervention for treating Becker Muscular Dystrophy
- Grant number: 10761285
- Fiscal year: 2023
- Funding amount: $319,100
- Project category:
Deciphering the complex pharmacology of CB1: towards the understanding of a third signaling pathway
- Grant number: 10667865
- Fiscal year: 2023
- Funding amount: $319,100
- Project category:
Leveraging evolutionary analyses and machine learning to discover multiscale molecular features associated with antibiotic resistance
- Grant number: 10658686
- Fiscal year: 2023
- Funding amount: $319,100
- Project category:
Comprehensive identification of E3 ubiquitin ligases that degrade heart, lung, and blood-relevant transcription factors
- Grant number: 10677457
- Fiscal year: 2023
- Funding amount: $319,100
- Project category:
$ 31.91万 - 项目类别: