Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen
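Basic information
- Grant number: 10274223
- Principal investigator:
- Amount: $1.8516 million
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-16 to 2024-08-31
- Project status: Completed
- Source: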
- Keywords: Affinity; Amino Acid Sequence; Antibodies; Antibody Formation; Antibody Specificity; Antibody Therapy; Antigen-Antibody Complex; Antigens; Architecture; B-Cell Antigen Receptor; B-Cell Receptor Binding; B-Lymphocytes; Base Sequence; Binding; Biological; Biology; Cells; Chronic Disease; Code; Computer Models; Computers; Computing Methodologies; Coupled; Data; Data Set; Databases; Degenerative Disorder; Development; Diagnosis; Economics; Electrons; Engineering; Enzymes; Epitopes; Equilibrium; Foundations; Future; Genes; Genomics; Goals; Hour; Immune system; Immunize; Immunoassay; Immunoglobulins; Immunologist; Immunology; Industrialization; Ligands; Light; Link; Machine Learning; Malignant Neoplasms; Measurable; Mechanics; Methods; Microscopic; Modeling; Molecular; Molecular Biology; Molecular Computations; Mus; Nature; Network-based; Neural Network Simulation; Output; Passive Immunotherapy; Phage Display; Phase; Play; Problem Solving; Process; Production; Proteins; Readiness; Reagent; Research Project Grants; SARS-CoV-2 antigen; SARS-CoV-2 spike protein; Savings; Scientist; Series; Specificity; Structural Chemistry; Structural Models; Structure; Surface Plasmon Resonance; System; Testing; Therapeutic; Therapeutic antibodies; Thermodynamics; Time; Training; Vaccines; Validation; Variant; Viral; Viral Antigens; Viral Proteins; Work; base; combat; computer science; data streams; database structure; deep learning; deep neural network; deep sequencing; density; design; experimental study; high dimensionality; in silico; innovation; insight; large datasets; machine learning method; molecular modeling; mouse model; neutralizing antibody; novel; novel virus; pandemic disease; pandemic preparedness; pathogen; physical property; protein structure; quantum; response; scaffold; simulation; single cell sequencing; synthetic antibodies; therapeutic evaluation; three dimensional structure
Project abstract
One of the “holy grails” in immunology is to be able to directly predict tight-binding variable chain antibody
sequences in silico against foreign or non-self “antigenic” proteins. Immunoglobulin chain rearrangement can
potentially encode approximately 10^16 different variants of antibody heavy and light chain sequences. However,
only a small fraction of the sequence space is generally accessed for evolving antibodies against foreign proteins.
The computational challenge is to go from a model of the structure of an antigen to predicting a set of antibody
chain sequences that can bind tightly to the antigen. If this problem were solved, it might be possible to move in less than 24 hours
from the first cryo-electron microscopy structure of a novel viral protein to a set of potent antibody-like
molecular candidates for testing. Towards solving this problem, this project aims to develop a deep learning
architecture that will take as input thermodynamic, quantum mechanical (density functional), and local structure-
based network topographical features of the antigens and their cognate antibodies, and will output their
respective binding affinity constants.
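As a point of reference for the "binding affinity constants" mentioned above, the sketch below shows the standard thermodynamic relation between a dissociation constant K_d and the standard binding free energy, ΔG° = RT ln(K_d / 1 M). It is not part of the proposed architecture, which the abstract does not specify in detail; the function name and the default temperature of 298.15 K are assumptions made for illustration only.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj_per_mol(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kJ/mol) from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); the function name and the default
    temperature are illustrative assumptions, not taken from the project.
    """
    if kd_molar <= 0:
        raise ValueError("kd_molar must be positive")
    return R * temperature_k * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    # Micromolar, nanomolar, and picomolar binders, from weakest to tightest.
    for kd in (1e-6, 1e-9, 1e-12):
        dg = binding_free_energy_kj_per_mol(kd)
        print(f"Kd = {kd:.0e} M -> dG = {dg:.1f} kJ/mol")
```

A smaller K_d (tighter binding) gives a more negative ΔG°, which is why a regression target can be expressed either as an affinity constant or as a free energy.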
We will design a generative adversarial network (GAN), which we think is uniquely suited for regression-based
ML approaches for the immune system, to discover associations between the epitope and the variable chain
features. This approach requires a large data stream of antigen and cognate antibody sequences, which until
recently was difficult to obtain. A recently described single B-cell receptor (BCR) specific tagging method coupled
with single cell deep sequencing (“linking B cell receptor to antigen specificity through sequencing” or LIBRA-
seq) can rapidly isolate and sequence the BCR variable chain coding regions that can bind with high selectivity
to antigenic epitopes.
Towards the specific project goals, in Task 1, LIBRA-seq will be used to rapidly identify and generate candidate
immunoglobulin coding sequences in response to specific linear and nonlinear epitopes (against controls),
chosen through computational/molecular modeling and prioritized with SARS-CoV-2 Spike protein epitopes (but
not restricted to these), injected into a mouse model, to generate large training sets; in Task 2, these training
sets, along with other data sets already available in public databases, will generate a series of structural features
(described above), which will be used to train the GAN; in Task 3, the predicted epitope-antibody interactions
will be validated by direct experiments with synthetic antibody and phage-display systems. Thus, the proposed
strategy combines foundational principles in evolutionary biology, genomics, structural chemistry, and computer
science to solve a general biological engineering problem.
Results from this project are expected to lay the foundations for a rigorously tested and fully automated machine-
learning system that could rapidly generate synthetic antibody candidates from the structure of a novel virus
protein, which could strengthen the rapid response to a future pandemic. The ability to develop targeted
antibody therapies against non-infectious or chronic diseases, and to produce antibody-based industrial
enzymes, would also be dramatically enhanced if this project were successful.
The team: The team-leads of this multi-institutional research project comprise a computer scientist, a protein
crystallographer, an immunologist, and a molecular biologist.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Jeniffer Bertha Hernandez
Other grants by Jeniffer Bertha Hernandez
Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen
- Grant number: 10845715
- Fiscal year: 2021
- Funding amount: $1.8516 million
- Project category:
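Similar NSFC (National Natural Science Foundation of China) grants
De novo enzyme design and molecular evolution of D-amino acid ammonia-lyases based on ancestral sequence reconstruction
- Grant number: 32271536
- Approval year: 2022
- Funding amount: CNY 540,000
- Project category: General Program
Templated cocrystal polymerization for the synthesis of high-molecular-weight sequence-defined poly(amino acid)s
- Grant number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Templated cocrystal polymerization for the synthesis of high-molecular-weight sequence-defined poly(amino acid)s
- Grant number: 22201105
- Approval year: 2022
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
De novo enzyme design and molecular evolution of D-amino acid ammonia-lyases based on ancestral sequence reconstruction
- Grant number:
- Approval year: 2022
- Funding amount: CNY 540,000
- Project category: General Program
Mechanism by which a C-terminal 40-amino-acid insertion sequence enhances the transcriptional efficiency of the bacterial fatty acid metabolism regulator FadR
- Grant number: 82003257
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund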
Similar overseas grants
Strategies for next-generation flavivirus vaccine development
- Grant number: 10751480
- Fiscal year: 2024
- Funding amount: $1.8516 million
- Project category:
Single-molecule protein sequencing by barcoding of N-terminal amino acids
- Grant number: 10757309
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category:
CCR5 determinants for the HIV transmitted founder phenotype
- Grant number: 10760884
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category:
Single-molecule protein sequencing by detection and identification of N-terminal amino acids
- Grant number: 10646060
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category:
GPR160 antibody development for cancer treatment
- Grant number: 10711206
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category: