Efficient Methods for Dimensionality Reduction of Single-Cell RNA-Sequencing Data
Basic Information
- Grant number: 10356883
- Principal investigator:
- Amount: $51,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-03-16 to 2023-03-15
- Project status: Completed
- Source:
- Keywords: Address; Adopted; Algorithms; Biological; Cells; Code; Collection; Communities; Computer Hardware; Computing Methodologies; Consensus; Data; Data Analyses; Data Set; Development; Dimensions; Disease; Evaluation; Fellowship; Gaussian model; Genes; Hour; Human; Language; Learning; Libraries; Mathematics; Measures; Mentorship; Methods; Modeling; Modernization; Names; Noise; Normal Statistical Distribution; Paper; Physicians; Physiology; Population; Principal Component Analysis; Process; Publishing; RNA; Randomized; Research Personnel; Resolution; Running; Scientist; Speed; Statistical Bias; Statistical Methods; Systematic Bias; Techniques; Technology; Time; Tissues; Training; Variant; Visualization; base; design; dimensional analysis; distributed data; experience; experimental study; high dimensionality; improved; insight; laptop; non-Gaussian model; parallelization; professor; single cell analysis; single-cell RNA sequencing; statistics; supercomputer; theories; tool; transcriptome; transcriptome sequencing
Project Abstract
Project Summary: Efficient Methods for Dimensionality Reduction of Single-Cell RNA-Sequencing Data
Single-cell RNA-sequencing is a revolutionary technology enabling discoveries in human physiology and disease. The datasets generated from single-cell RNA-sequencing experiments are so large that they cannot be analyzed or visualized using traditional statistical methods until they have been shrunk using a technique named "dimensionality reduction." Almost every analysis of single-cell RNA-sequencing data begins by using principal component analysis (PCA) to accomplish dimensionality reduction. However, single-cell RNA-sequencing presents unique challenges that make PCA difficult.
First, these datasets are so large that computing PCA requires specialized hardware and multiple hours. Fast algorithms to approximate PCA have been shown to dramatically speed up this process, but have not proliferated in the single-cell RNA-sequencing community, in part because no parallelized algorithm has been written in the R computing language.
Second, PCA requires the researcher to decide the final desired size of the dataset. Choosing too small a size discards valuable biological insights, while choosing too large a size increases the noise. However, there is no consensus on how to pick the optimal size for single-cell RNA-sequencing, and there is evidence that this size might be systematically underestimated.
Lastly, PCA cannot be applied directly to the count data measured in single-cell RNA-sequencing, so researchers must first apply a preprocessing technique to normalize it. The current standard in the field is to apply the log transform; however, several recent studies have shown that the log transform creates statistical biases in single-cell RNA-sequencing data.
In this fellowship, specifically tailored, fast methods for performing PCA on single-cell RNA-sequencing data will be developed:
1a) A framework to rigorously measure the consequence of changing preprocessing parameters on the final results of several publicly available single-cell RNA-sequencing datasets, to enable experimentation with PCA on single-cell RNA-sequencing data.
1b) An ultra-fast, parallelized implementation of randomized PCA allowing researchers using standard laptops to rapidly perform PCA on single-cell RNA-sequencing data.
2) A technique for rigorously choosing the final size when performing principal component analysis on single-cell RNA-sequencing datasets.
3) A method for transforming single-cell RNA-sequencing data so that it becomes appropriately distributed, enabling proper use of PCA without incurring statistical biases.
This fellowship also includes a detailed training plan with valuable learning experiences for the applicant's development as a physician-scientist who can apply methods from high-dimensional statistics to solving biomedical problems.
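To make the idea of randomized PCA in aim 1b concrete, the following is a minimal sketch of a generic randomized range-finder approach, written in Python with NumPy rather than the R implementation the abstract proposes. It is not the fellowship's method: the function name `randomized_pca`, its parameters, the synthetic Poisson count matrix, and the use of `log1p` as a stand-in for the "current standard" log transform are all assumptions made for illustration.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate PCA of X (cells x genes) with a randomized range finder.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # center each gene
    k = min(n_components + n_oversamples, min(Xc.shape))

    # Sample the column space of Xc with a random Gaussian test matrix.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))

    # Power iterations sharpen the separation between signal and noise.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Exact SVD of the small projected matrix (k x genes).
    B = Q.T @ Xc
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)

    scores = (Q @ U_small)[:, :n_components] * s[:n_components]
    explained_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, Vt[:n_components], explained_var


# Synthetic stand-in for a count matrix: 500 cells, 200 genes, 3 cell groups.
rng = np.random.default_rng(42)
group_rates = rng.gamma(shape=2.0, scale=2.0, size=(3, 200))
groups = rng.integers(0, 3, size=500)
counts = rng.poisson(group_rates[groups])

# Log transform used here only because the abstract names it as the current
# standard; the abstract also notes that it can introduce statistical bias.
X = np.log1p(counts)

scores, components, explained_var = randomized_pca(X, n_components=5)
print(scores.shape, components.shape)
print(explained_var)
```

The number of components (5 here) is chosen arbitrarily; selecting it rigorously is the open problem described in aim 2, and this sketch does not address it.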
Project Outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Quantitative assessment of p16 expression in FNA specimens from head and neck squamous cell carcinoma and correlation with HPV status.
- DOI: 10.1002/cncy.22399
- Published: 2021-05
- Journal:
- Impact factor: 3.4
- Authors: Abi-Raad R; Prasad ML; Gilani S; Garritano J; Barlow D; Cai G; Adeniran AJ
- Corresponding author: Adeniran AJ
Anaplastic Thyroid Carcinoma: Cytomorphologic Features on Fine-Needle Aspiration and Associated Diagnostic Challenges.
- DOI: 10.1093/ajcp/aqab159
- Published: 2022
- Journal:
- Impact factor: 3.5
- Authors: Podany, Peter; Abi-Raad, Rita; Barbieri, Andrea; Garritano, James; Prasad, Manju L; Cai, Guoping; Adeniran, Adebowale J; Gilani, Syed M
- Corresponding author: Gilani, Syed M
RAS mutation and associated risk of malignancy in the thyroid gland: An FNA study with cytology-histology correlation.
- DOI: 10.1002/cncy.22537
- Published: 2022-04
- Journal:
- Impact factor: 3.4
- Authors: Gilani, Syed M.; Abi-Raad, Rita; Garritano, James; Cai, Guoping; Prasad, Manju L.; Adeniran, Adebowale J.
- Corresponding author: Adeniran, Adebowale J.
Other publications by James Michael Garritano
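Similar NSFC Grants (National Natural Science Foundation of China)
Blast-resistance mechanism of gravity dams with composite protective materials under underwater multi-medium coupling
- Grant number: 51779168
- Approval year: 2017
- Funding amount: CNY 590,000
- Project category: General Program
Finding all integer solutions of a class of semi-algebraic systems using numerical computation
- Grant number: 11671377
- Approval year: 2016
- Funding amount: CNY 480,000
- Project category: General Program
Research on the MEE algorithm using pinball loss
- Grant number: 11401247
- Approval year: 2014
- Funding amount: CNY 230,000
- Project category: Young Scientists Fund
Near-real-time simulation of urban waterlogging using path algorithms and pipe-network simplification
- Grant number: 41301419
- Approval year: 2013
- Funding amount: CNY 250,000
- Project category: Young Scientists Fund
Blind channel equalization using the ε-approximation algorithm
- Grant number: 60172058
- Approval year: 2001
- Funding amount: CNY 160,000
- Project category: General Program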
Similar Overseas Grants
Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
- Grant number: 10735564
- Fiscal year: 2023
- Funding amount: $51,800
- Project category:
Unified, Scalable, and Reproducible Neurostatistical Software
- Grant number: 10725500
- Fiscal year: 2023
- Funding amount: $51,800
- Project category:
Toward Accurate Cardiovascular Disease Prediction in Hispanics/Latinos: Modeling Risk and Resilience Factors
- Grant number: 10852318
- Fiscal year: 2023
- Funding amount: $51,800
- Project category:
Applying Computational Phenotypes To Assess Mental Health Disorders Among Transgender Patients in the United States
- Grant number: 10604723
- Fiscal year: 2023
- Funding amount: $51,800
- Project category:
Single viewpoint panoramic imaging technology for colonoscopy
- Grant number: 10580165
- Fiscal year: 2023
- Funding amount: $51,800
- Project category: