Finding Protein Sequence Motifs: Methods and Applications
Basic information
- Grant number: 10261214
- Principal investigator:
- Amount: $413,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Start and end dates:
- Project status: ongoing
- Source:
- Keywords: 2019-nCoV; Amino Acid Motifs; Amino Acid Sequence; Animals; Antiviral Agents; Archaea; Archaeal Genome; Architecture; Bacteria; Bacterial Genome; Bacteriophages; Binding; Bioinformatics; Biological; Biotechnology; COVID-19 pandemic; Case Fatality Rates; Cells; Censuses; Cessation of life; Classification; Cleaved cell; Clustered Regularly Interspaced Short Palindromic Repeats; Collaborations; Collection; Complement; Computing Methodologies; Conflict (Psychology); Coronavirus; Custom; DNA Restriction Enzymes; DNA biosynthesis; DNA cassette; Defense Mechanisms; Development; Diagnostic; Disease Outbreaks; Dissection; Enzymes; Evolution; Family; Genes; Genome; Genomics; Glycoproteins; Goals; Health; Homology Modeling; Human; Institutes; Intervention; Investigation; Laboratories; Libraries; Machine Learning; Mediating; Methods; Microbe; Middle East Respiratory Syndrome Coronavirus; Mobile Genetic Elements; Modeling; Molecular Analysis; Nuclear Localization Signal; Nucleocapsid Proteins; Oligonucleotides; Parasites; Pathogenicity; Pattern; Periodicity; Polymerase; Population; Positioning Attribute; Prokaryotic Cells; Protein Analysis; Protein Family; Protein Structure Initiative; Proteins; Public Health; RNA Editing; Race; Research; Resources; Ribonucleases; SARS coronavirus; Satellite DNA; Signal Pathway; Signal Transduction; Signal Transduction Pathway; Structure; System; Techniques; Tertiary Protein Structure; Testing; Validation; Viral Proteins; Virulence; Virus; Work; Zoonoses; adaptive immunity; arm; base; comparative; comparative genomics; computational pipelines; cross-species transmission; database structure; deep learning; genome analysis; genome editing; human pathogen; markov model; molecular array; molecular sequence database; novel; nuclease; prognostic; protein profiling; protein structure; reconstitution; response; sensor; tool; transmission process; virus host interaction
Project abstract
The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI forms the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMMs), profile-against-profile comparison as implemented in the HHSearch method, protein structure comparison methods, homology modeling of protein structure, and genome context analysis were applied extensively and increasingly. Furthermore, custom libraries of protein domain profiles and computational pipelines for novel domain identification have been developed and applied. Lately, these methods for protein motif search are being complemented by deep learning methods.
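The profile idea behind PSI-BLAST can be illustrated with a minimal sketch: build a log-odds position-specific scoring matrix (PSSM) from a small ungapped alignment and slide it along a query sequence. This is not the PSI-BLAST implementation; the function names, the uniform background frequency, the pseudocount value, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, ungapped aligned sequences.

    Assumption: a uniform background frequency (1/20) is used, which is a
    simplification of what real profile tools do.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        # Count residues observed in this alignment column.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def best_window(pssm, sequence):
    """Return (score, start) of the best ungapped match, or None if too short."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Unknown residues contribute a neutral score of 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical toy alignment of a short glycine-rich motif.
toy_alignment = ["GKSGT", "GKTGS", "GKSGS"]
toy_pssm = build_pssm(toy_alignment)
print(best_window(toy_pssm, "AAAGKSGTAAA"))
```

In an iterative search, the sequences found with such a matrix would be added to the alignment and the matrix rebuilt; that loop is omitted here.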
During the year under review, we continued and expanded our investigation of the protein domains involved in virus-host interactions, in particular antivirus defense in prokaryotes. Bacteria and archaea are frequently attacked by viruses and other mobile genetic elements and rely on dedicated antiviral defense systems, such as restriction endonucleases and CRISPR, to survive. The enormous diversity of viruses suggests that more types of defense systems exist than are currently known. We developed computational methodology for systematic prediction of genes involved in antivirus defense and comprehensively characterized the domain architectures of the proteins encoded by these genes. These predictions, followed by heterologous reconstitution in collaboration with the laboratory of Dr. Feng Zhang (Broad Institute of MIT and Harvard), yielded 29 widespread antiviral gene cassettes, collectively present in 32% of all sequenced bacterial and archaeal genomes, that mediate protection against specific bacteriophages. These systems incorporate enzymatic activities not previously implicated in antiviral defense, including RNA editing and retron satellite DNA synthesis. In addition, we computationally predicted a diverse set of other putative defense genes that remain to be characterized. These results highlight an immense array of molecular functions that microbes use against viruses.
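A "collectively present" figure of this kind counts each genome once if it carries at least one of the cassettes, so it is not the sum of the per-system fractions. The sketch below shows that calculation on made-up data; the function name, genome identifiers, and system names are hypothetical and do not come from the project.

```python
def defense_prevalence(genome_to_systems):
    """Compute per-system and collective prevalence across genomes.

    genome_to_systems: dict mapping a genome id to the set of defense
    system names predicted in that genome (hypothetical input format).
    Returns (per_system_fraction, collective_fraction).
    """
    n_genomes = len(genome_to_systems)
    if n_genomes == 0:
        raise ValueError("at least one genome is required")
    per_system_counts = {}
    for systems in genome_to_systems.values():
        for name in systems:
            per_system_counts[name] = per_system_counts.get(name, 0) + 1
    per_system_fraction = {
        name: count / n_genomes for name, count in per_system_counts.items()
    }
    # A genome counts once, no matter how many systems it carries.
    with_any = sum(1 for systems in genome_to_systems.values() if systems)
    return per_system_fraction, with_any / n_genomes


# Hypothetical toy data: four genomes, two placeholder systems.
toy = {
    "genome_1": {"system_A"},
    "genome_2": {"system_A", "system_B"},
    "genome_3": set(),
    "genome_4": set(),
}
per_system, collective = defense_prevalence(toy)
print(per_system, collective)
```

In the toy data the per-system fractions add up to 0.75, while the collective fraction is 0.5, because one genome carries both systems.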
We further expanded our study of protein domains involved in antivirus defense by performing a comprehensive computational census of the oligonucleotide-binding domains that are the key components of signal transduction pathways associated with CRISPR-Cas systems of adaptive immunity. CRISPR-associated Rossmann Fold (CARF) and SMODS-associated and fused to various effector domains (SAVED) are key components of cyclic oligonucleotide-based antiphage signaling systems (CBASS) that sense cyclic oligonucleotides and transmit the signal to an effector inducing cell dormancy or death. Most of the CARFs are components of a CBASS built into type III CRISPR-Cas systems, where the CARF domain binds cyclic oligoA (cOA) synthesized by the Cas10 polymerase-cyclase and allosterically activates the effector, typically a promiscuous ribonuclease. Additionally, this signaling pathway includes a ring nuclease, often also a CARF domain (either the sensor itself or a specialized enzyme), that cleaves cOA and mitigates dormancy or death induction. We present a comprehensive census of CARF and SAVED domains in bacteria and archaea, and their sequence- and structure-based classification. There are 10 major families of CARF domains and multiple smaller groups that differ in structural features, association with distinct effectors, and presence or absence of the ring nuclease activity. By comparative genome analysis, we predicted specific functions of CARF and SAVED domains and partitioned the CARF domains into those with both sensor and ring nuclease functions and sensor-only ones. Several families of ring nucleases functionally associated with sensor-only CARF domains were also predicted.
In the incessant host-parasite arms race, viruses evolved multiple anti-defense mechanisms, including diverse anti-CRISPR proteins (Acrs) that specifically inhibit CRISPR-Cas and therefore have enormous potential for application as modulators of genome editing tools. Most Acrs are small and highly variable proteins, which makes their bioinformatic prediction a formidable task. We developed a machine-learning approach for comprehensive Acr prediction. The model shows high predictive power when tested against an unseen test set and was employed to predict 2,500 candidate Acr families. Experimental validation of top candidates revealed two previously unknown Acrs (AcrIC9 and AcrIC10), and three other top candidates were coincidentally identified and found to possess anti-CRISPR activity. These results substantially expand the repertoire of predicted Acrs and provide a resource for experimental Acr discovery.
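Testing a model "against an unseen test set" means that some labeled examples are held out before training and used only for scoring. The abstract does not describe the model or the metrics used, so the sketch below is a generic illustration: the helper names, the 20% split, and the choice of precision and recall are assumptions.

```python
import random


def split_holdout(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall(true_labels, predicted_labels):
    """Compute precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical example: ten placeholder protein identifiers.
train, test = split_holdout([f"protein_{i}" for i in range(10)])
print(len(train), len(test))
print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))
```

For protein data, a random split can overestimate performance when close homologs land on both sides of the split; grouping by family before splitting is a common precaution, but whether it was used here is not stated in the abstract.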
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses an immediate, major threat to public health across the globe. In response, we launched a new research direction aimed at identifying motifs in coronavirus proteins that are likely to be associated with increased virulence in humans and with zoonotic transmission into the human population. We performed an in-depth molecular analysis to reconstruct the evolutionary origins of the enhanced pathogenicity of SARS-CoV-2 and other coronaviruses that are severe human pathogens. Using integrated comparative genomics and machine learning techniques, we identified key genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous deadly coronavirus outbreaks, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), from less pathogenic coronaviruses. These features include enhancement of the nuclear localization signals in the nucleocapsid protein and distinct inserts in the spike glycoprotein that appear to be associated with the high case fatality rate of these coronaviruses as well as the host switch from animals to humans. The identified features could be crucial contributors to coronavirus pathogenicity and possible targets for diagnostics, prognostication, and interventions.
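Classical nuclear localization signals are enriched in the basic residues lysine (K) and arginine (R), so one simple proxy for comparing signal strength between sequences is the highest K/R density in a sliding window. The abstract does not say how enhancement was measured; the function below, its window length of 7, and the toy sequence are illustrative assumptions only.

```python
def max_basic_density(sequence, window=7):
    """Return (fraction, start) for the window richest in K/R residues.

    Ties are resolved in favor of the leftmost window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    basic = [1 if aa in "KR" else 0 for aa in sequence.upper()]
    current = sum(basic[:window])
    best_count, best_start = current, 0
    for start in range(1, len(basic) - window + 1):
        # Update the running count: add the entering residue, drop the leaving one.
        current += basic[start + window - 1] - basic[start - 1]
        if current > best_count:
            best_count, best_start = current, start
    return best_count / window, best_start


# Toy sequence containing the SV40 large T antigen signal PKKKRKV.
print(max_basic_density("MSDNGPKKKRKVAAA"))
```

Comparing this value at aligned positions across related proteins would give a rough picture of where basic charge has accumulated; a real analysis would need alignment-aware positions and a statistical model.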
The research on protein domains and motifs performed during the last year has led to further progress in the study of the classification, evolution, and functions of several classes of proteins and domains, particularly, those involved in host-parasite interactions and other forms of biological conflicts. These findings have potential implications for human health and for developments in biotechnology. Additionally, protein motifs that are likely to contribute to coronavirus pathogenicity have been identified.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eugene V Koonin
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
- DOI:
- Year: 2018
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Identification and analysis of a novel dephospho-CoA kinase in the hyperthermophilic archaeon Thermococcus kodakarensis
- Original title: 超好熱性アーキアThermococcus kodakarensisにおける新規dephospho-CoA kinaseの同定および解析
- DOI:
- Year: 2018
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Identification and analysis of a novel archaea-specific dephospho-CoA kinase in the hyperthermophilic archaeon Thermococcus kodakarensis
- Original title: 超好熱性アーキアThermococcus kodakarensisにおけるアーキア特異的な新規 dephospho-CoA kinaseの同定および解析
- DOI:
- Year: 2019
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Other grants by Eugene V Koonin
Finding Protein Sequence Motifs--methods And Application
- Grant number: 6681337
- Fiscal year:
- Funding amount: $413,100
- Project category:
Finding Protein Sequence Motifs--Methods and Application
- Grant number: 6988455
- Fiscal year:
- Funding amount: $413,100
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 7969213
- Fiscal year:
- Funding amount: $413,100
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 8943217
- Fiscal year:
- Funding amount: $413,100
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 9160910
- Fiscal year:
- Funding amount: $413,100
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7735068
- Fiscal year:
- Funding amount: $413,100
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7594460
- Fiscal year:
- Funding amount: $413,100
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 9555730
- Fiscal year:
- Funding amount: $413,100
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 6988458
- Fiscal year:
- Funding amount: $413,100
- Project category:
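Similar grants from the National Natural Science Foundation of China
Carbohydrate recognition mechanisms of two C-type lectins with different key amino acid motifs in the swimming crab Portunus trituberculatus
- Grant number: 31702375
- Approval year: 2017
- Funding amount: CNY 260,000
- Project category: Young Scientists Fund
Molecular mechanism by which mutation of the central tyrosine of the CRAC motif reduces the membrane-cholesterol sensitivity and transport function of BSEP in the hepatocyte canalicular membrane
- Grant number: 81670580
- Approval year: 2016
- Funding amount: CNY 580,000
- Project category: General Program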
Similar overseas grants
Understanding the origins and mechanisms of aryl hydrocarbon receptor promiscuity
- Grant number: 10679532
- Fiscal year: 2023
- Funding amount: $413,100
- Project category:
Membrane repair as a therapeutic intervention for treating Becker Muscular Dystrophy
- Grant number: 10761285
- Fiscal year: 2023
- Funding amount: $413,100
- Project category:
Deciphering the complex pharmacology of CB1: towards the understanding of a third signaling pathway
- Grant number: 10667865
- Fiscal year: 2023
- Funding amount: $413,100
- Project category:
Leveraging evolutionary analyses and machine learning to discover multiscale molecular features associated with antibiotic resistance
- Grant number: 10658686
- Fiscal year: 2023
- Funding amount: $413,100
- Project category:
Comprehensive identification of E3 ubiquitin ligases that degrade heart, lung, and blood-relevant transcription factors
- Grant number: 10677457
- Fiscal year: 2023
- Funding amount: $413,100
- Project category: