Developing robust and scalable genomics tools and databases to analyze immune receptor repertoires across diverse populations
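Basic information
- Grant number: 10910354
- Principal investigator:
- Amount: $102,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-02-10 to 2028-01-31
- Project status: Ongoing
- Source: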
- Keywords: Adaptive Immune System; Algorithms; Alleles; Autoimmune Diseases; B-Cell Antigen Receptor; B-Lymphocytes; Benchmarking; Bioinformatics; Biological Assay; Cataloging; Certification; Collection; Communicable Diseases; Communities; DNA amplification; Data; Data Science; Data Set; Databases; Ethnic Population; European; European ancestry; Genes; Genetic; Heterogeneity; High-Throughput Nucleotide Sequencing; Human; Immune; Immune System Diseases; Immune system; Immunogenetics; Immunogenomics; Immunologic Receptors; Individual; Information Dissemination; Information Systems; International; Malignant Neoplasms; Medical; Methods; Neurodegenerative Disorders; Population; Population Heterogeneity; RNA; Reduce health disparities; Sampling; T cell receptor repertoire sequencing; T-Lymphocyte; T-cell receptor repertoire; Technology; Variant; adaptive immune response; aggregation database; bioinformatics tool; cost effective; deep sequencing; ethnic diversity; genetic architecture; genomic tools; health disparity; human disease; immune health; improved; insight; next generation sequencing; novel; receptor; tool; transcriptome sequencing
Abstract
Recent advances in high-throughput sequencing technologies enable cost-effective characterization of the
immune system and provide novel opportunities to study adaptive immune receptor repertoires (AIRR) at the
population scale. In particular, AIRR analysis provides essential insight into the complexity of the immune system
across a large variety of human diseases, including infectious diseases, cancer, autoimmune conditions, and
neurodegenerative diseases. A commonly used assay-based approach (i.e., AIRR-Seq) provides a detailed view
of the adaptive immune system by leveraging deep sequencing of amplified DNA or RNA from the variable
regions of the T and B cell receptor (TCR and BCR) loci. However, the limited number of samples probed by the
AIRR-Seq approach restricts the ability to detect novel population-specific V(D)J gene alleles across ethnically
diverse and admixed populations. Non-targeted next-generation sequencing (NGS), e.g. whole-genome sequencing (WGS), promises to fill
the existing data gap by providing hundreds of thousands of NGS datasets across various ancestry groups.
However, reliable and scalable bioinformatics algorithms have yet to be developed to utilize non-targeted NGS
technologies to assemble novel population-specific alleles that would support effect-size heterogeneity across
ancestries. Comprehensive population-specific allelic immunogenomics reference databases are also lacking.
This void exacerbates existing health disparities, as discoveries in medical immunogenomics continue to
benefit mainly populations of European ancestry. Current state-of-the-art databases were built on
the genetic architecture of individuals of European ancestry and thus fail to capture allelic variation across
diverse populations. Ongoing initiatives by the Adaptive Immune Receptor Repertoire Community (AIRR-C) to
improve the representation of diverse populations in reference databases (e.g. OGRDB and VDJbase) ignore
individuals of non-European ancestry and only incorporate an extremely small number of individuals of European
descent. We propose to utilize a data science approach for studying the variation of the human adaptive immune
system at a truly global scale, improving studies of immunological health and diseases, and reducing health
disparities. In this study, we will develop robust and scalable bioinformatics tools and databases able to leverage
the largest datasets covering individuals of various ancestries composed of over half a million NGS samples
spanning the AIRR-Seq, RNA-Seq, and WGS technologies. We will perform rigorous benchmarking of the
developed bioinformatics methods based on both simulated and real data to demonstrate the feasibility of using
NGS-based approaches to assemble novel V(D)J alleles. The availability of large and ethnically diverse sets of
samples will allow us to discover novel population-specific V(D)J alleles, which will enrich existing
immunogenomics databases with population-specific immune alleles. To promote the dissemination of the
obtained results, the novel alleles and assembled receptor sequences will be shared as an easy-to-use database
with a rich set of functionalities.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by SERGHEI MANGUL
Developing robust and scalable genomics tools and databases to analyze immune receptor repertoires across diverse populations
- Grant number: 10656981
- Fiscal year: 2023
- Amount: $102,100
- Project category:
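Similar grants from the National Natural Science Foundation of China (NSFC)
Identification of targeted-drug sensitivity biomarkers from tumor pathology images and research on statistical algorithms
- Grant number: 82304250
- Approval year: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund
Research on deepfake detection algorithms driven by multimodal high-level semantics
- Grant number: 62306090
- Approval year: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund
Research on high-precision remote sensing algorithms for sea surface albedo
- Grant number: 42376173
- Approval year: 2023
- Amount: CNY 510,000
- Project category: General Program
Identifying the molecular mechanisms of non-coding-region splicing mutations in amyotrophic lateral sclerosis using novel deep learning algorithms and a multi-omics research strategy
- Grant number: 82371878
- Approval year: 2023
- Amount: CNY 490,000
- Project category: General Program
Research on precise segmentation algorithms for cardiac MR images based on deep learning and level-set methods
- Grant number: 62371156
- Approval year: 2023
- Amount: CNY 500,000
- Project category: General Program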
Similar overseas grants
Computational Methods for Analyzing Immunoglobulin Allelic Diversity in B cells
- Grant number: 10751541
- Fiscal year: 2023
- Amount: $102,100
- Project category:
Developing robust and scalable genomics tools and databases to analyze immune receptor repertoires across diverse populations
- Grant number: 10656981
- Fiscal year: 2023
- Amount: $102,100
- Project category:
Machine learning with immunogenetics for the prediction of hematopoietic cell transplant outcomes
- Grant number: 10322105
- Fiscal year: 2021
- Amount: $102,100
- Project category:
Machine learning with immunogenetics for the prediction of hematopoietic cell transplant outcomes
- Grant number: 10534187
- Fiscal year: 2021
- Amount: $102,100
- Project category: