Finding Protein Sequence Motifs--methods And Applications

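Basic information

  • Grant number:
    8943217
  • Principal investigator:
  • Amount:
    $309,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Project period:
  • Project status:
    Not concluded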

Project abstract

The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, hidden Markov models (HMMs), protein profile-against-profile comparison as implemented in the HHsearch method, protein structure comparison methods, homology modeling of protein structure, and genome context analysis were extensively applied. Over the last year, we made further progress in the study of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we analyzed the evolution and functions of protein domains that are involved in virus-host interactions, from both the host and the virus sides.
The CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized cas genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing, or interference. Here we comprehensively analyze sequences, structures, and genomic neighborhoods of one of the most widespread groups of such genes, which encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold that we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined, but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and "effector" domains, most of which are predicted to possess DNase or RNase activity. Divergent CARF domains are also found in RtcR proteins, sigma-54-dependent regulators of the rtc RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 β-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains, which sense modified nucleotides and nucleotide derivatives generated during virus infection. We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains, which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.
Polintons (also known as Mavericks) and Tlr elements of Tetrahymena thermophila represent two families of large DNA transposons widespread in eukaryotes. We performed a detailed analysis of protein sequences encoded by these transposable elements and showed that both Polintons and Tlr elements encode two key virion proteins: the major capsid protein with the double jelly-roll fold and the minor capsid protein, known as the penton, with the single jelly-roll topology. This observation, along with the previously noted conservation of the genes for the viral genome packaging ATPase and the adenovirus-like protease, strongly suggests that Polintons and Tlr elements combine features of bona fide viruses and transposons. We proposed the name 'Polintoviruses' to denote these putative viruses, which could have played a central role in the evolution of several groups of DNA viruses of eukaryotes. These ongoing studies reveal new aspects of the remarkably diverse repertoire of protein domains involved in virus-host interactions.
As part of our ongoing investigation of the evolution of protein domain architectures, we analyzed the contributions of alternative splicing (AS), alternative transcription initiation (ATI), and alternative transcription termination (ATT) to the evolution of mammalian proteins. Together, AS, ATI, and ATT create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that, contrary to the traditional view, the joint contribution of ATI and ATT to transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection, whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression, and selection patterns. These studies enhance the existing understanding of the evolutionary plasticity of protein domain architecture.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Eugene V Koonin

Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
  • DOI:
  • Year published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Takahiro Shimosaka;Kira S Makarova;Eugene V Koonin;Haruyuki Atomi
  • Corresponding author:
    Haruyuki Atomi
Identification and analysis of a novel dephospho-CoA kinase in the hyperthermophilic archaeon Thermococcus kodakarensis
  • DOI:
  • Year published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Takahiro Shimosaka;Kira S Makarova;Eugene V Koonin;Haruyuki Atomi
  • Corresponding author:
    Haruyuki Atomi
Identification and analysis of a novel archaea-specific dephospho-CoA kinase in the hyperthermophilic archaeon Thermococcus kodakarensis
  • DOI:
  • Year published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Takahiro Shimosaka;Kira S Makarova;Eugene V Koonin;Haruyuki Atomi
  • Corresponding author:
    Haruyuki Atomi

Other grants held by Eugene V Koonin

Finding Protein Sequence Motifs--Methods and Application
  • Grant number:
    6988455
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Application
  • Grant number:
    6681337
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    7969213
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    9160910
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    9555730
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7594460
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7735068
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6111075
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    6988458
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    7316251
  • Fiscal year:
  • Amount:
    $309,900
  • Project category:

Similar overseas grants

Virulence gene regulation in Staphylococcus aureus
  • Grant number:
    8912102
  • Fiscal year:
    2015
  • Amount:
    $309,900
  • Project category:
Molecular mechanism of 4E-binding proteins on heart failure development
  • Grant number:
    8461159
  • Fiscal year:
    2011
  • Amount:
    $309,900
  • Project category:
Molecular mechanism of 4E-binding proteins on heart failure development
  • Grant number:
    8666798
  • Fiscal year:
    2011
  • Amount:
    $309,900
  • Project category:
Molecular mechanism of 4E-binding proteins on heart failure development
  • Grant number:
    8183136
  • Fiscal year:
    2011
  • Amount:
    $309,900
  • Project category:
Role of Cu Transporter Proteins in Atherosclerosis
  • Grant number:
    9590248
  • Fiscal year:
    2011
  • Amount:
    $309,900
  • Project category: