Learn, transfer, generate: Developing novel deep learning models for enhancing robustness and accuracy of small-scale single-cell RNA sequencing studies
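Basic information
- Grant number: 10535708
- Principal investigator:
- Amount: $37,600
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-01-01 to 2023-12-31
- Project status: Completed
- Source: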
- Keywords: Algorithms; Animals; Atlases; Attention; Award; Benchmarking; Biological; Biology; Brain; COVID-19; Cells; Classification; Collection; Communities; Complex; Complication; Computational algorithm; Computer software; Data; Data Set; Dependence; Disease; Emerging Technologies; Explosion; Foundations; Gene Expression; Generations; Genes; Genetic; Genomics; Goals; Heterogeneity; Image Analysis; Intelligence; Knowledge; Laboratories; Learning; Literature; Machine Learning; Malignant Neoplasms; Masks; Mathematical Biology; Measures; Methodology; Methods; Modeling; Nature; Pathology; Patients; Pattern; Performance; Population; Positioning Attribute; Process; Psychological Transfer; Public Health; Pythons; RNA; RNA analysis; Research; Research Personnel; Resolution; Resources; Sampling; Science; Signal Transduction; Small RNA; Structure; Supervision; Technology; Time; Tissues; Training; base; career; cell type; complex biological systems; computer science; computerized tools; cost; deep learning model; deep neural network; design; exosome; experimental study; flexibility; improved; in silico; mathematical learning; novel; open source; polysome profiling; relating to nervous system; simulation; single cell analysis; single-cell RNA sequencing; tool; transcriptome; transcriptome sequencing; transcriptomics; transfer learning; user-friendly
Project abstract
Project Summary. Single-cell RNA-sequencing (scRNAseq) technologies measure transcriptome-wide gene
expression at the single-cell level. In contrast to bulk RNA-sequencing, scRNAseq can elucidate dynamic
expression patterns between different cellular populations. A key problem in scRNAseq studies is the inability
to transfer knowledge between independent sequencing studies directly. As a result, it has been necessary for
researchers to spend a significant amount of time and resources generating massive datasets to enable
meaningful analyses, a process that is costly and often not reproducible. Another transformative technology is
spatial transcriptomics (ST), which provides genetic profiles of cells while containing the positional information
on the sequenced cell. ST has the potential to expand our understanding of cellular heterogeneity, interactions,
and pathology; however, ST is still an emerging technology and is not widely available for many studies.
This proposal will fulfill the unmet need for scalable algorithms that transfer knowledge from existing datasets
to new studies, leveraging learned representations to construct the sequenced tissue's spatial information. I
propose to achieve these goals through the following aims: (1) Transfer knowledge from existing public single-
cell data to new experimental data using a deep neural-attention network, and (2) develop the first spatially-
informed model for generating realistic scRNAseq data. In Aim 1, I will use the "attention" mechanisms (which
have revolutionized many fields in computer science) to learn complex gene dependencies intelligently and
learn important biological features (e.g., marker genes) in a fully self-supervised manner, providing biological
interpretability that is desperately needed. Such a model can be used in many tasks and for datasets with
relatively few samples. The learned knowledge obtained from Aim 1 will be used directly in Aim 2. In Aim 2, I
will build upon our state-of-the-art generative model to generate synthetic data that contains spatial information
(coordinates) of sequenced cells, even when no atlas is available. This model will allow researchers to produce
synthetic data with spatial information and augment sparse and noisy datasets for more robust and accurate
analyses, all possible without the need for additional costly experiments.
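
As a rough illustration of the attention idea mentioned in Aim 1, the sketch below computes scaled dot-product attention weights between the genes of a single cell. It is not the model proposed here: the function name `gene_attention`, the per-gene embedding vectors, and the toy expression values are all assumptions made for the example, and the parameters are random rather than learned.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gene_attention(expression, w_query, w_key):
    """Return a genes x genes matrix of attention weights for one cell.

    expression: per-gene expression values for a single cell.
    w_query, w_key: one embedding vector per gene (hypothetical parameters;
    in a trained model these would be learned, here they are random).
    """
    # Scale each gene's embedding by its expression to form queries and keys.
    queries = [[x * w for w in wq] for x, wq in zip(expression, w_query)]
    keys = [[x * w for w in wk] for x, wk in zip(expression, w_key)]
    dim = len(w_query[0])
    weights = []
    for q in queries:
        # Scaled dot product of this gene's query with every gene's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights.append(softmax(scores))
    return weights


random.seed(0)
n_genes, dim = 4, 3
cell = [0.0, 2.5, 1.0, 4.0]  # toy expression values, assumed for illustration
w_q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_genes)]
w_k = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_genes)]

attn = gene_attention(cell, w_q, w_k)
# Summing each column gives a crude "how much attention does this gene receive" score.
importance = [sum(row[j] for row in attn) for j in range(n_genes)]
```

Each row of `attn` is a probability distribution over genes, so inspecting which genes receive the most weight is one simple way such a model could point toward candidate marker genes. Whether that reading is biologically meaningful would depend on training, which this sketch omits.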
This proposal will support my dissertation research, which will be the foundational body of work for my career
as a researcher in computational genomics. During the tenure of this award, I will receive specialized training
in the underlying mathematics and biology needed for developing frameworks for scRNAseq analysis. I will
contribute to the existing literature by developing novel methodology and creating open-source software,
making our tools and models easily accessible to the broader scientific community. Achieving the proposed
aims will significantly enhance scRNAseq pipelines and analysis, making them more robust and accurate. This
will additionally facilitate the study of smaller datasets, potentially reducing the number of patients and animals
necessary in initial studies.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Abbas-Ali Heydari
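Similar grants from the National Natural Science Foundation of China
Molecular mechanisms underlying the differential assimilation of xylo-oligosaccharides of different degrees of polymerization by Bifidobacterium animalis
- Grant number: 32302789
- Year approved: 2023
- Funding amount: 200,000 yuan
- Project category: Young Scientists Fund
Systematic analysis of the evolutionary mechanisms of adaptive brain-volume reduction in mammals, based on the flat-headed bat group
- Grant number: 32330014
- Year approved: 2023
- Funding amount: 2.15 million yuan
- Project category: Key Program
Investigating the neural basis of animal behavior across different timescales, using Caenorhabditis elegans as an example
- Grant number: 32300829
- Year approved: 2023
- Funding amount: 300,000 yuan
- Project category: Young Scientists Fund
Mechanisms by which urbanization affects host-parasite relationships in soil fauna
- Grant number: 32301430
- Year approved: 2023
- Funding amount: 300,000 yuan
- Project category: Young Scientists Fund
Spatial patterns of trait β-diversity and community assembly of benthic fauna in urban rivers
- Grant number: 32301334
- Year approved: 2023
- Funding amount: 300,000 yuan
- Project category: Young Scientists Fund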
Similar overseas grants
New software tools for differential analysis of single-cell genomics perturbation experiments
- Grant number: 10735033
- Fiscal year: 2023
- Funding amount: $37,600
- Project category:
A three dimensional multimodal cellular connectivity atlas of the mouse hypothalamus
- Grant number: 10719606
- Fiscal year: 2023
- Funding amount: $37,600
- Project category:
Modularly built, complete, coordinate- and template-free brain atlases
- Grant number: 10570256
- Fiscal year: 2022
- Funding amount: $37,600
- Project category:
Leveraging Mammalian Cancers, Platinum-Quality Genome Assemblies, and Large-Scale Data to Identify Mechanisms of Rare Human Cancers
- Grant number: 10334726
- Fiscal year: 2022
- Funding amount: $37,600
- Project category:
Modularly built, complete, coordinate- and template-free brain atlases
- Grant number: 10467697
- Fiscal year: 2022
- Funding amount: $37,600
- Project category: