Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen
Basic information
- Grant number: 10274223
- Principal investigator:
- Amount: $1.8516 million
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-16 to 2024-08-31
- Project status: Completed
- Source:
- Keywords: Affinity; Amino Acid Sequence; Antibodies; Antibody Formation; Antibody Specificity; Antibody Therapy; Antigen-Antibody Complex; Antigens; Architecture; B-Cell Antigen Receptor; B-Cell Receptor Binding; B-Lymphocytes; Base Sequence; Binding; Biological; Biology; Cells; Chronic Disease; Code; Computer Models; Computers; Computing Methodologies; Coupled; Data; Data Set; Databases; Degenerative Disorder; Development; Diagnosis; Economics; Electrons; Engineering; Enzymes; Epitopes; Equilibrium; Foundations; Future; Genes; Genomics; Goals; Hour; Immune system; Immunize; Immunoassay; Immunoglobulins; Immunologist; Immunology; Industrialization; Ligands; Light; Link; Machine Learning; Malignant Neoplasms; Measurable; Mechanics; Methods; Microscopic; Modeling; Molecular; Molecular Biology; Molecular Computations; Mus; Nature; Network-based; Neural Network Simulation; Output; Passive Immunotherapy; Phage Display; Phase; Play; Problem Solving; Process; Production; Proteins; Readiness; Reagent; Research Project Grants; SARS-CoV-2 antigen; SARS-CoV-2 spike protein; Savings; Scientist; Series; Specificity; Structural Chemistry; Structural Models; Structure; Surface Plasmon Resonance; System; Testing; Therapeutic; Therapeutic antibodies; Thermodynamics; Time; Training; Vaccines; Validation; Variant; Viral; Viral Antigens; Viral Proteins; Work; base; combat; computer science; data streams; database structure; deep learning; deep neural network; deep sequencing; density; design; experimental study; high dimensionality; in silico; innovation; insight; large datasets; machine learning method; molecular modeling; mouse model; neutralizing antibody; novel; novel virus; pandemic disease; pandemic preparedness; pathogen; physical property; protein structure; quantum; response; scaffold; simulation; single cell sequencing; synthetic antibodies; therapeutic evaluation; three dimensional structure
Project abstract
ABSTRACT
One of the “holy grails” in immunology is to be able to directly predict tight-binding variable chain antibody sequences in silico against foreign or non-self “antigenic” proteins. Immunoglobulin chain rearrangement can potentially encode approximately 10^16 different variants of antibody heavy and light chain sequences. However, only a small fraction of this sequence space is generally accessed when antibodies evolve against foreign proteins. The computational challenge is to go from a model of the structure of an antigen to a predicted set of antibody chain sequences that can bind tightly to that antigen. If this were solved, it might be possible to move, in less than 24 hours, from the first cryo-electron microscopy structure of a novel viral protein to a set of potent antibody-like molecular candidates ready for testing. Towards solving this problem, this project aims to develop a deep learning architecture that will take as input thermodynamic, quantum mechanical (density functional), and local structure-based network topographical features of the antigens and their cognate antibodies, and will output their respective binding affinity constants.
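As a small illustration of how the binding affinity constants mentioned above connect to the thermodynamic features, the sketch below converts a dissociation constant into a standard binding free energy using the textbook relation ΔG° = RT ln(K_d / 1 M). It is not taken from the project: the function names, the choice of K_d (rather than an association constant) as the affinity measure, and the default temperature of 298.15 K are assumptions made only for this example.

```python
import math

# Molar gas constant in J/(mol*K).
R_J_PER_MOL_K = 8.314462618


def kd_to_delta_g_kj(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kJ/mol) from a dissociation constant (mol/L).

    Uses delta_G = R * T * ln(Kd / 1 M). A smaller Kd (tighter binding)
    gives a more negative free energy. Hypothetical helper for illustration.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_J_PER_MOL_K * temperature_k * math.log(kd_molar) / 1000.0


def delta_g_kj_to_kd(delta_g_kj: float, temperature_k: float = 298.15) -> float:
    """Inverse of kd_to_delta_g_kj: dissociation constant (mol/L) from kJ/mol."""
    return math.exp(delta_g_kj * 1000.0 / (R_J_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    # Compare a nanomolar binder with a micromolar binder.
    for kd in (1e-9, 1e-6):
        print(f"Kd = {kd:.0e} M -> delta_G = {kd_to_delta_g_kj(kd):.2f} kJ/mol")
```

Working on a logarithmic (free energy) scale is a common choice when affinity is a regression target, because measured constants span many orders of magnitude; whether the project does so is not stated in the abstract.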
We will design a generative adversarial network (GAN), which we think is uniquely suited for regression-based ML approaches for the immune system, to discover associations between the epitope and the variable chain features. This approach requires a large data stream of antigen and cognate antibody sequences, which until recently was difficult to obtain. A recently described single B-cell receptor (BCR) specific tagging method coupled with single cell deep sequencing (“linking B cell receptor to antigen specificity through sequencing” or LIBRA-seq) can rapidly isolate and sequence the BCR variable chain coding regions that can bind with high selectivity to antigenic epitopes.
Towards the specific project goals, the work is organized into three tasks. In Task 1, LIBRA-seq will be used to rapidly identify and generate candidate immunoglobulin coding sequences in response to specific linear and nonlinear epitopes (against controls). The epitopes will be chosen through computational/molecular modeling, prioritizing SARS-CoV-2 Spike protein epitopes (but not restricted to these), and injected into a mouse model to generate large training sets. In Task 2, these training sets, along with other data sets already available in public databases, will be used to generate the series of structural features described above, which will in turn be used to train the GAN. In Task 3, the predicted epitope-antibody interactions will be validated by direct experiments with synthetic antibody and phage-display systems. Thus, the proposed strategy applies foundational principles in evolutionary biology, genomics, structural chemistry, and computer science to the solution of a general biological engineering problem.
Results from this project are expected to lay the foundations for a rigorously tested and fully automated machine-learning system that could rapidly generate synthetic antibody candidates from the structure of a novel virus protein, which can enhance the rapid response ability against a future pandemic. The ability to develop targeted antibody therapies against non-infectious or chronic diseases, and to produce antibody-based industrial enzymes, would also be dramatically enhanced if this project were successful.
The team: The team-leads of this multi-institutional research project comprise a computer scientist, a protein crystallographer, an immunologist, and a molecular biologist.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Jeniffer Bertha Hernandez
Other grants by Jeniffer Bertha Hernandez
Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen
- Grant number: 10845715
- Fiscal year: 2021
- Funding amount: $1.8516 million
- Project category:
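Similar grants from the National Natural Science Foundation of China (NSFC)
Templated cocrystal polymerization for the synthesis of high-molecular-weight sequence-defined poly(amino acids)
- Grant number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Novel enzyme design and molecular evolution of D-amino acid ammonia-lyase based on ancestral sequence reconstruction
- Grant number:
- Approval year: 2022
- Funding amount: CNY 540,000
- Project category: General Program
Mechanism by which a C-terminal 40-amino-acid insertion sequence enhances the transcriptional efficiency of the bacterial fatty acid metabolism regulator FadR
- Grant number: 82003257
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Function of the glutaredoxin PsGrx in the adaptation of Antarctic sea-ice bacteria to extreme habitats
- Grant number: 41876149
- Approval year: 2018
- Funding amount: CNY 620,000
- Project category: General Program
Effect of the amino acid transporter LAT1 on the radiosensitivity of nasopharyngeal carcinoma through regulation of the mTOR signaling pathway, and the underlying mechanism
- Grant number: 81702687
- Approval year: 2017
- Funding amount: CNY 200,000
- Project category: Young Scientists Fund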
Similar overseas grants
Strategies for next-generation flavivirus vaccine development
- Grant number: 10751480
- Fiscal year: 2024
- Funding amount: $1.8516 million
- Project category:
Single-molecule protein sequencing by barcoding of N-terminal amino acids
- Grant number: 10757309
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category:
CCR5 determinants for the HIV transmitted founder phenotype
- Grant number: 10760884
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category:
Single-molecule protein sequencing by detection and identification of N-terminal amino acids
- Grant number: 10646060
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category:
GPR160 antibody development for cancer treatment
- Grant number: 10711206
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category: