Developing tools for the unbiased analysis and visualization of scRNA-seq data
Basic information
- Grant number: 10279320
- Principal investigator:
- Amount: $296,700
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-01 to 2025-08-31
- Project status: Not yet concluded
- Source:
- Keywords: Address; Adipose tissue; Algorithms; B-Cell Development; B-Cell Lymphomas; Benchmarking; Binomial Model; Biological; Biological Preservation; Biomedical Research; Candidate Disease Gene; Cells; Collaborations; Communities; Complex; Computational algorithm; Computer software; Data; Data Analyses; Data Set; Development; Diabetes Mellitus; Dimensions; Disease; Gene Expression; Gene Expression Regulation; Genes; Geometry; Goals; Homeostasis; Individual; Knowledge; Lead; Malignant Neoplasms; Measurement; Messenger RNA; Methods; Microscopy; Modeling; Neighborhoods; Noise; Pharmaceutical Preparations; Population; Process; Research Personnel; Resolution; Source; Statistical Models; Structure; Techniques; Testing; Three-Dimensional Image; Tissue-Specific Gene Expression; Tissues; Training; Translating; Variant; Visualization; Work; analysis pipeline; base; cell type; computerized tools; data complexity; data visualization; deep neural network; experimental study; feature selection; genome-wide; high dimensionality; imaging Segmentation; improved; innovation; insight; novel; novel strategies; single-cell RNA sequencing; tool; treatment response
Project abstract
Single-cell RNA sequencing (scRNA-seq) provides genome-wide information about gene expression at the
resolution of individual cells. The unprecedented scope of these data is revolutionizing our understanding of
development and tissue homeostasis as well as diseases like cancer. A major issue with scRNA-seq, however,
is the sheer scale of the data, consisting of ~20,000 gene expression measurements in thousands to millions
of cells. Effective computational approaches are clearly required to translate data of this size and complexity
into actionable biological insights. For instance, scRNA-seq data are approximately 20,000-dimensional, and
as a result all available analysis pipelines rely on multiple dimensionality reduction steps. This usually entails a
combination of linear tools like PCA and non-linear techniques like t-SNE and UMAP. The data are generally
reduced to between 10 and 100 dimensions for data analysis (e.g., clustering into distinct cell types) and to 2-D for
visualization. The problem, however, is that dimensionality reduction can lead to loss of information. We
recently showed that this loss of information is dramatic: for any given cell, over 95% of its neighbors are
changed in the process of dimensionality reduction. This near-complete change in the structure of the data can
introduce significant noise and bias into the analysis, and suggests the critical need for alternative approaches.
The premise of this application is that reducing bias in scRNA-seq data analysis will maximize our ability to
extract meaningful information from the data. In this proposal, we focus on developing new algorithms to
address three specific steps in the typical analysis pipeline: (1) Dimensionality Reduction: Our hypothesis is
that deep neural networks can be explicitly trained to maximize the amount of information that can be retained
for both data analysis and visualization. (2) Feature Selection: Not all genes are equally informative for
downstream analyses, so researchers generally choose a subset of genes based on variation in the
population. We have shown that standard approaches to selecting genes convolve true biological variation with
technical noise from the experiment. We hypothesize that statistical models based on our understanding of
sources of technical noise can be used to select more informative genes. (3) Cell clustering: Clustering the
data to determine cell types is critical, but cells with different identities often form complex, overlapping
geometries in gene expression space that are difficult for existing algorithms to resolve. Our hypothesis is that
new clustering tools, guided by prior knowledge and leveraging innovations in clustering from image
segmentation, can overcome this problem. We will build these new tools and test them against existing
benchmark datasets and novel data generated by our experimental collaborators. We will also integrate these
tools into popular scRNA-seq analysis packages. Successful completion of the proposed work will allow the
field to extract more biologically relevant information from the burgeoning set of scRNA-seq datasets.
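The neighbor-change claim in the abstract can be made concrete with a small sketch. The code below is not the authors' method or benchmark: it assumes Euclidean k-nearest neighbors, plain PCA via SVD, and a synthetic Gaussian matrix standing in for expression data. The function names (knn_indices, pca_reduce, mean_fraction_changed) are hypothetical and chosen only for illustration. It reports, for each cell, the fraction of its k nearest neighbors in the original space that are no longer among its k nearest neighbors after reduction.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k nearest neighbors (Euclidean) of each row."""
    sq_norms = np.sum(points ** 2, axis=1)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dists, np.inf)  # a cell is not counted as its own neighbor
    return np.argsort(dists, axis=1)[:, :k]


def pca_reduce(points: np.ndarray, n_components: int) -> np.ndarray:
    """Project mean-centered data onto its top principal components (SVD-based PCA)."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def mean_fraction_changed(original: np.ndarray, reduced: np.ndarray, k: int) -> float:
    """Average, over cells, of the fraction of original k-NN lost after reduction."""
    before = knn_indices(original, k)
    after = knn_indices(reduced, k)
    changed = []
    for row_before, row_after in zip(before, after):
        shared = len(set(row_before) & set(row_after))
        changed.append(1.0 - shared / k)
    return float(np.mean(changed))


# Synthetic stand-in (assumption): 200 "cells" x 500 "genes" of Gaussian noise,
# not real scRNA-seq counts.
rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 500))

embedding_2d = pca_reduce(cells, 2)
fraction = mean_fraction_changed(cells, embedding_2d, k=10)
print(f"Mean fraction of neighbors changed after 2-D PCA: {fraction:.2f}")
```

On real data the same measurement would be applied to a normalized expression matrix and to whatever embedding the pipeline produces (PCA, t-SNE, UMAP); the choice of k and of distance metric are further assumptions that affect the number reported.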
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Eric J Deeds
Structural and Dynamical Specificity in Intracellular Signaling Networks
- Grant number: 7224411
- Fiscal year: 2007
- Funding amount: $296,700
- Project type:
Structural and Dynamical Specificity on Intracellular Signaling Networks
- Grant number: 7570698
- Fiscal year: 2007
- Funding amount: $296,700
- Project type:
Structural and Dynamical Specificity in Intracellular Signaling Networks
- Grant number: 7361407
- Fiscal year: 2007
- Funding amount: $296,700
- Project type:
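Similar NSFC (National Natural Science Foundation of China) grants
Identification and functional study of novel endocrine factors from adipose tissue
- Grant number: 82330023
- Approval year: 2023
- Funding amount: CNY 2,200,000
- Project type: Key Program
Mechanism by which adipose-derived stem cell exosomal miRNA-299a-3p regulates macrophage Thbs1 to alleviate adipose tissue aging
- Grant number: 82301753
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Mechanism by which TRPV1 channels in perivascular adipose tissue regulate obesity-related hypertension via adiponectin
- Grant number: 82300500
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Mechanism by which reduced SIRT3 expression in diabetic adipose tissue upregulates exosomal miR-146b-5p and promotes renal tubular lipotoxicity
- Grant number: 82370731
- Approval year: 2023
- Funding amount: CNY 490,000
- Project type: General Program
Mechanism by which the CXCL1/CXCR2 signaling axis upregulates Bcl-2 to promote migration of fascia-resident macrophages during in situ regeneration of subcutaneous adipose tissue
- Grant number: 82360615
- Approval year: 2023
- Funding amount: CNY 320,000
- Project type: Regional Science Fund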
Similar overseas grants
Opportunistic Atherosclerotic Cardiovascular Disease Risk Estimation at Abdominal CTs with Robust and Unbiased Deep Learning
- Grant number: 10636536
- Fiscal year: 2023
- Funding amount: $296,700
- Project type:
Image-based risk assessment to identify women at high-risk for breast cancer
- Grant number: 10759110
- Fiscal year: 2023
- Funding amount: $296,700
- Project type:
Trimming the fat with small proteins: Micropeptides in adipogenesis
- Grant number: 10655394
- Fiscal year: 2022
- Funding amount: $296,700
- Project type:
Identifying biomarker signatures of prognostic value for Multisystem Inflammatory Syndrome in Children (MIS-C)
- Grant number: 10320491
- Fiscal year: 2021
- Funding amount: $296,700
- Project type: