ECOD: Large scale classification of predicted and experimental protein structures
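Basic Information
- Award number: 10659763
- Principal investigator:
- Amount: $344,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-05-01 to 2027-04-30
- Project status: Ongoing
- Source: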
- Keywords: 3-Dimensional, Acceleration, Adopted, Amino Acid Sequence, Artificial Intelligence, Base Sequence, Benchmarking, Biochemical, Biological, Biological Assay, Cataloging, Classification, Code, Collaborations, Communities, Computing Methodologies, Data, Data Set, Data Sources, Databases, Dedications, Deposition, Development, Disease, Electron Microscopy, European, Evolution, Family, Future, Genetic, Growth, Homologous Gene, Human, Infrastructure, Laboratories, Lead, Manuals, Membrane, Methods, Modeling, Molecular Biology, NMR Spectroscopy, Pathogenicity, Pathogenicity Island, Peptide Signal Sequences, Periodicals, Property, Protein Family, Proteins, Science, Sequence Homology, Site, Structure, System, Techniques, Tertiary Protein Structure, Testing, Time, Universities, Update, Vibrio, Virulence Factors, Work, Workload, X-Ray Crystallography, candidate identification, caspase 14, comparative genomics, computational pipelines, data quality, database schema, deep learning, experimental study, human disease, human pathogen, improved, learning network, model organism, novel, pathogen, pathogenic bacteria, protein data bank, protein function, protein structure, protein structure prediction, structural biology, success, three dimensional structure, tool, web interface, web page, web site
Project Summary
Classification of protein domains has historically served to contextualize the 3D structural data collectively generated by experimental structure determination methods such as X-ray crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. Our database, Evolutionary Classification of protein Domains (ECOD), has served the biological community for seven years, cataloguing evolutionary relationships between domains from experimental structures. The recent advent of high-accuracy structure prediction methods, such as AlphaFold (AF) and RoseTTAFold (RF), and the consequent release of 1 million predicted structures in the AlphaFold Database (AFDB) herald a paradigm shift in structural biology and domain classification. The rate of structure deposition is expected to increase a hundred- to a thousand-fold. We propose to take advantage of this revolution and transform ECOD into a comprehensive classification of the entire protein universe using sequence, structure, and functional evidence. By simultaneously classifying experimental and predicted structures of proteins from model organisms and human pathogens, our classification will help the scientific community critically evaluate structure models and use evolutionary information to discover and experimentally characterize protein function.
Classifying AF models challenges the ECOD pipeline through a 50-fold increase in workload and through the significant fraction of non-globular and low-quality regions in the models. Thus, our first Aim is to upgrade ECOD's infrastructure, to develop methods that identify single domains from AF models, and to integrate sequence, structure, and functional-site similarities into our automatic classification. Compared to the current ECOD workflow, which relies on human experts for structure- and function-based classification, these improvements will drastically reduce the need for manual curation and will allow us to achieve our second Aim: classifying domains from over 1 million released AF models into ECOD through a combination of computational pipelines and minimal manual effort (0.25% to 1% of cases). Using the deluge of AF models, the new automatic pipeline, and the expertise of human curators, we expect both to significantly improve ECOD and to evaluate the quality of AF models by (1) covering all known protein families in Pfam, (2) confirming remote homology via evolutionary intermediates, (3) comparing evolutionarily related experimental and predicted structures, and (4) resolving errors and inconsistencies through periodic quality checks.
Finally, in our third Aim, we will take the lead in making functional discoveries for biomedically important proteins classified by ECOD, studying virulence factors (VFs) in bacterial pathogens that are modelled by AFDB or studied by our experimental collaborators, the Orth lab. Fast-evolving VFs have been a challenge both for structure prediction and for functional inference by sequence. We will identify candidate VFs in two dozen bacterial pathogens, obtain their structure models, and infer their function from similarities to known proteins in structure and functional sites. Promising hypotheses will be tested experimentally in the Orth lab through biochemical and genetic assays.
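Aim 1 names non-globular and low-quality regions as a major obstacle to parsing AF models. As an illustration only, and not ECOD's actual method, the Python sketch below trims an AlphaFold model to its confidently predicted segments. It relies on one documented convention: AlphaFold DB files store the per-residue confidence score (pLDDT, 0 to 100) in the B-factor column. The cutoff of 70, the 30-residue minimum segment length, and the function names `ca_plddt` and `confident_segments` are assumptions made for this example.

```python
"""Hypothetical sketch: find confidently predicted segments in an AlphaFold model.

Assumes PDB-format text whose B-factor column holds per-residue pLDDT, as in
AlphaFold DB files. The cutoff and minimum length are illustrative choices.
"""
from typing import List, Tuple


def ca_plddt(pdb_text: str) -> List[Tuple[int, float]]:
    """Return (residue number, pLDDT) for every CA atom, in file order."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, residue
        # number in columns 23-26, B-factor (here pLDDT) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append((int(line[22:26]), float(line[60:66])))
    return values


def confident_segments(
    residues: List[Tuple[int, float]],
    min_plddt: float = 70.0,  # assumed cutoff; AFDB labels pLDDT 70-90 "confident"
    min_length: int = 30,     # assumed minimum length of a kept segment
) -> List[Tuple[int, int]]:
    """Group consecutive residues with pLDDT >= min_plddt into (start, end) ranges."""
    segments = []
    run: List[int] = []
    for resnum, plddt in residues:
        extends_run = not run or resnum == run[-1] + 1
        if plddt >= min_plddt and extends_run:
            run.append(resnum)
            continue
        # The current run has ended: keep it only if it is long enough.
        if len(run) >= min_length:
            segments.append((run[0], run[-1]))
        # A confident residue after a numbering gap starts a new run.
        run = [resnum] if plddt >= min_plddt else []
    if len(run) >= min_length:
        segments.append((run[0], run[-1]))
    return segments


if __name__ == "__main__":
    # Synthetic 60-residue model: residues 1-40 confident, 41-60 low confidence.
    scores = [90.0] * 40 + [35.0] * 20
    lines = [
        f"ATOM  {i:5d}  CA  ALA A{i:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{p:6.2f}"
        for i, p in enumerate(scores, start=1)
    ]
    print(confident_segments(ca_plddt("\n".join(lines))))  # prints [(1, 40)]
```

Filtering on pLDDT alone only discards likely disordered or unreliable stretches. Splitting a confident region into individual domains would typically also draw on inter-residue information, such as AlphaFold's predicted aligned error, which this sketch does not use.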
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Richard Dustin Schaeffer
Similar National Natural Science Foundation of China (NSFC) grants
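Mechanism by which senescent fibroblasts accelerate skin aging by escaping macrophage immune surveillance
- Award number: 82373462
- Approval year: 2023
- Funding amount: CNY 490,000
- Project category: General Program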
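Mechanism by which hyperglycemia accelerates diabetic microvascular disease by activating ribokinase to promote hemoglobin ribosylation
- Award number: 82360165
- Approval year: 2023
- Funding amount: CNY 320,000
- Project category: Regional Science Fund Program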
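Mechanism by which OGT secreted by M2 TAMs accelerates the malignant biological behavior of hepatocellular carcinoma by promoting glycolysis
- Award number: 82360529
- Approval year: 2023
- Funding amount: CNY 320,000
- Project category: Regional Science Fund Program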
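Mechanism by which Prss22 accelerates ADSC-mediated wound healing by regulating uPA
- Award number: 82302807
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund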
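Mechanism by which uremic toxins inhibit mitochondrial biogenesis by activating the environmental sensor receptor AhR, accelerating kidney aging and functional decline
- Award number: 82370695
- Approval year: 2023
- Funding amount: CNY 490,000
- Project category: General Program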
Similar overseas grants
A platform to identify in vivo targets of covalent cancer drugs in 3D tissues
- Award number: 10714543
- Fiscal year: 2023
- Funding amount: $344,400
- Project category:
Cell Intrinsic and Extrinsic Factors Driving Maturation in Human PSC-derived Neurons
- Award number: 10736603
- Fiscal year: 2023
- Funding amount: $344,400
- Project category:
High-resolution extended-depth phase-engineered objectives to accelerate spatial 'omics R&D through computational optics
- Award number: 10761173
- Fiscal year: 2023
- Funding amount: $344,400
- Project category:
Development and Evaluation of Advanced Non-Contrast Perfusion MRI for Monitoring Treatment Response in Brain Metastases
- Award number: 10716949
- Fiscal year: 2023
- Funding amount: $344,400
- Project category:
Engineering 3D Osteosarcoma Models to Elucidate Biology and Inform Drug Discovery
- Award number: 10564801
- Fiscal year: 2023
- Funding amount: $344,400
- Project category: