Statistical And Computational Methods For Gene Expression and Proteomic Analysis
Basic Information
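- Grant number: 8746528
- Principal investigator:
- Amount: $943,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source: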
- Keywords: Accounting; Address; Adult; Affect; Aging; Alternative Splicing; Bioinformatics; Biological Assay; Biology; Biomedical Research; Blood Platelets; Calendar; Cardiovascular Diseases; Caring; Case-Control Studies; Collaborations; Complete Blood Count; Complex; Computer software; Computing Methodologies; Data; Data Adjustments; Data Collection; Data Set; Data Storage and Retrieval; Databases; Defect; Development; Diabetes Mellitus; Disease; Embryo; Epidemiology; Event; Excision; Exons; Experimental Designs; Eye; Eye diseases; Family; Fluorescence; Functional RNA; Gene Expression; Gene Expression Profile; Gene Expression Profiling; Generations; Genes; Genetic; Genetic Transcription; Genomics; Health; Heart; Human; Individual; Interleukin-6; International; Investigation; Laboratories; Leukocytes; Manuscripts; Measurement; Measures; Medical; Methodology; Methods; MicroRNAs; Microarray Analysis; Modeling; Monitor; Morphologic artifacts; Mus; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; National Institute of Allergy and Infectious Disease; National Institute of Child Health and Human Development; National Institute of Diabetes and Digestive and Kidney Diseases; National Institute of Drug Abuse; National Institute of Neurological Disorders and Stroke; Nature; Neonatal; Osteoporosis; Phase; Phenotype; Photoreceptors; Principal Component Analysis; Proteomics; Publishing; Qualifying; Quality Control; Rattus; Reading; Research Personnel; Resolution; Retrieval; Sampling; Sampling Studies; Series; Smoking; Source; Spliced Genes; Staging; Statistical Methods; Surveys; Techniques; Technology; Testing; Time; United States National Institutes of Health; Update; Validation; Variant; Writing; age related; base; blood lipid; case control; cohort; coronary artery calcification; database of Genotypes and Phenotypes; design; follow-up; high throughput analysis; next generation sequencing; novel; offspring; research study; tool; transcriptome sequencing; transcriptomics; working group
Project Abstract
Gene expression measurement using microarrays or next-generation sequencing is a popular and useful technology for genomic analysis. The large volume of data generated in these experiments poses challenging problems, and quality control and experimental design remain fundamental issues. Analytical techniques are required that account for complex experimental designs and minimize artifacts. This project addresses many of the statistical and bioinformatics problems that remain.
Next-generation sequencing is now a popular means of measuring RNA expression (RNAseq). As with microarrays, many technical and quality-control issues remain challenging, and the shift from continuous measurements (microarray fluorescence intensities) to discrete ones (read counts) raises new statistical problems.
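One common way to move discrete read counts onto a continuous scale is to divide each sample's counts by its library size (counts per million, CPM) and then take log2 with a small pseudocount. The sketch below is a generic illustration, not necessarily the transformation the MSCL Toolbox uses; it assumes a genes × samples count matrix, and the function name `log_cpm` and the pseudocount of 1 are hypothetical choices.

```python
import numpy as np

def log_cpm(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Convert a genes x samples matrix of read counts to log2 counts-per-million.

    Each column (sample) is scaled by its library size (total reads), so samples
    sequenced to different depths become comparable; the pseudocount keeps zero
    counts finite after the log. Assumes no sample has zero total reads.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)          # total reads per sample
    cpm = counts / library_sizes * 1e6          # broadcasts across rows (genes)
    return np.log2(cpm + pseudocount)

# Toy example: 3 genes, 2 samples; sample 2 was sequenced twice as deeply.
counts = np.array([[10, 20],
                   [90, 180],
                   [0, 0]])
print(log_cpm(counts))
```

Because both toy samples have the same relative composition, their log-CPM columns come out identical even though the raw counts differ by a factor of two.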
We develop and test methods for analyzing alternative gene splicing, based on microarray platforms designed specifically for this purpose and, more recently, on RNAseq. Two measurement platforms have been studied: the Affymetrix exon array and the ExonHit junction probe array. For this study we wrote ExonSVD, a special version of our analysis package, the MSCL Toolbox. This statistical technique proved highly efficient at identifying genes undergoing alternative splicing and was less susceptible to the false positives encountered with the earlier ExonANOVA method. The ExonANOVA model has now been tested with RNAseq data in two different studies. It performs well, perhaps better than in the microarray context, because after transformation the data conform more closely to its underlying assumptions of independence and uniform variance.
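The summary does not spell out how ExonANOVA or ExonSVD are formulated. As a hedged illustration of the general ANOVA idea only, the sketch below tests a group × exon interaction for a single gene: a shift that affects all of a gene's exons equally is absorbed by per-sample terms, while an exon-specific shift between groups (as in exon skipping) inflates the interaction. It assumes log-scale exon intensities in a samples × exons matrix, at least two exons, and more samples than groups; the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def _rss_and_df(X, y):
    """Residual sum of squares and residual degrees of freedom of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), len(y) - np.linalg.matrix_rank(X)

def splicing_interaction_test(y, groups):
    """F test for a group x exon interaction in one gene's exon-level data.

    y: (n_samples, n_exons) log-scale exon intensities for a single gene.
    groups: length-n_samples sequence of group labels.
    Reduced model: sample + exon effects. Full model adds one term per (group, exon) cell.
    """
    y = np.asarray(y, dtype=float)
    n_samples, n_exons = y.shape
    levels, group_of_sample = np.unique(np.asarray(groups), return_inverse=True)
    # One row per (sample, exon) observation, matching y.ravel() (row-major).
    sample_idx = np.repeat(np.arange(n_samples), n_exons)
    exon_idx = np.tile(np.arange(n_exons), n_samples)
    cell_idx = group_of_sample[sample_idx] * n_exons + exon_idx
    reduced = np.hstack([np.eye(n_samples)[sample_idx], np.eye(n_exons)[exon_idx]])
    full = np.hstack([reduced, np.eye(len(levels) * n_exons)[cell_idx]])
    yv = y.ravel()
    rss_r, df_r = _rss_and_df(reduced, yv)
    rss_f, df_f = _rss_and_df(full, yv)
    df_num = df_r - df_f                        # (groups - 1) * (exons - 1)
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_f)
    return f_stat, stats.f.sf(f_stat, df_num, df_f)

# Hypothetical data: 6 samples (3 per group), 4 exons.
rng = np.random.default_rng(0)
groups = ["A", "A", "A", "B", "B", "B"]
exon_profile = np.array([5.0, 6.0, 4.0, 5.5])     # typical log intensity per exon
base = exon_profile + rng.normal(0, 0.5, size=(6, 1)) + rng.normal(0, 0.1, size=(6, 4))
spliced = base.copy()
spliced[3:, 2] -= 2.0                             # third exon drops only in group B
print(splicing_interaction_test(spliced, groups))
```

Adding the same constant to every exon of the group B samples leaves the F statistic unchanged; in this formulation, that invariance is what separates a splicing change from an overall change in the gene's expression.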
The Framingham Heart Study SABRe project uses the Affymetrix Exon array, which increases the available transcriptional information by roughly a factor of 10 compared with earlier expression arrays. This large project, which assayed almost 6,000 samples, is now complete; its last phase (the Third Generation cohort, about 3,000 samples) was completed in 2011. In addition to continuous quality-control monitoring of data collection over three calendar years, our lab monitored, and developed corrections for, several important artifacts affecting the data. Adjusting the data for laboratory-measured QC parameters substantially reduced its variation, and principal components analysis made further correction possible. Both raw and adjusted versions of the dataset for the Offspring and Third Generation cohorts have been completed and submitted to dbGaP for distribution to qualified investigators. Analysis of gene expression together with SNP genotypes showed that individual identities could be re-established from expression data alone, to within close family membership. This allowed us to identify and remove about a dozen samples whose identities had apparently been scrambled. In addition, combining expression data with Complete Blood Count (CBC) with Differential results, available for a fraction of the dataset, allowed effective imputation of CBC results for the entire dataset. The imputed values make it possible to adjust expression data for the varying white-blood-cell and platelet composition of the samples, which might otherwise confound expression analysis.
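The two correction steps described above (adjusting for laboratory-measured QC parameters, then correcting with principal components) can be sketched with ordinary least squares and an SVD. This is a minimal illustration under stated assumptions: the actual QC parameters, model form, and number of components used for the SABRe data are not given here, so the single simulated covariate and `k=2` are placeholders, and the function names are hypothetical. Removing leading components trades artifact removal against the risk of also removing real biological signal.

```python
import numpy as np

def adjust_for_covariates(expr, covariates):
    """Remove linear effects of sample-level QC covariates from every gene.

    expr: (n_samples, n_genes) log-scale expression.
    covariates: (n_samples,) or (n_samples, n_covariates) QC measurements.
    Returns the per-gene residuals with each gene's mean added back.
    """
    expr = np.asarray(expr, dtype=float)
    X = np.column_stack([np.ones(len(expr)), covariates])    # intercept + covariates
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)          # one fit per gene column
    return expr - X @ beta + expr.mean(axis=0)

def remove_top_pcs(expr, k):
    """Subtract the first k principal components from the column-centered data."""
    mean = expr.mean(axis=0)
    centered = expr - mean
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]                   # rank-k approximation
    return centered - low_rank + mean

# Hypothetical example: 20 arrays x 50 genes; one placeholder QC covariate
# adds a gene-specific linear trend to the expression values.
rng = np.random.default_rng(0)
qc = rng.normal(size=(20, 1))
expr = rng.normal(8.0, 1.0, size=(20, 50)) + qc * rng.normal(0, 0.5, size=50)
adjusted = adjust_for_covariates(expr, qc)
cleaned = remove_top_pcs(adjusted, k=2)
```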
The Offspring and Third Generation results have now been analyzed together with many phenotype working groups, yielding strong results for phenotypes such as blood lipid levels, IL-6 levels, smoking effects, osteoporosis, diabetes, and cardiovascular disease.
The case-control study (manuscript published) has yielded lists of genes significantly associated with cardiovascular disease (CVD). Pending confirmation by qPCR analysis, many of these newly detected associations will become the subject of a third manuscript.
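The summary does not say which test or multiple-testing correction produced the CVD gene lists. A common generic choice, shown here only as an assumption, is a per-gene Welch t-test followed by Benjamini-Hochberg control of the false discovery rate. The data are simulated, and the function names and the 0.05 FDR threshold are hypothetical.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def case_control_genes(cases, controls, fdr=0.05):
    """Welch t-test for every gene (column); select genes with q <= fdr."""
    _, p = stats.ttest_ind(cases, controls, axis=0, equal_var=False)
    q = benjamini_hochberg(p)
    return np.flatnonzero(q <= fdr), q

# Simulated example: 20 cases, 20 controls, 100 genes; 5 genes truly shifted.
rng = np.random.default_rng(0)
controls = rng.normal(0, 1, size=(20, 100))
cases = rng.normal(0, 1, size=(20, 100))
cases[:, :5] += 3.0
selected, q = case_control_genes(cases, controls)
print(selected)
```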
Together with other investigators, we are analyzing the expression data in combination with genetic data (eQTL analysis) and with microRNA expression data and, owing to the large, homogeneous nature of our dataset, are finding many strong statistical associations. We are comparing our results with those of other groups in a variety of international consortia to validate many of our findings.
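In its simplest form, an eQTL test regresses one gene's expression on the allele dosage of each nearby SNP. The sketch below assumes additive 0/1/2 genotype coding and simulated data. A real analysis of these family-based cohorts would also need covariates and an account of relatedness, which this sketch omits. The function name `eqtl_scan` is hypothetical; `scipy.stats.linregress` is used for each per-SNP fit.

```python
import numpy as np
from scipy import stats

def eqtl_scan(expr, dosages):
    """Test every SNP against one gene's expression by additive linear regression.

    expr: (n_samples,) expression values for a single gene.
    dosages: (n_samples, n_snps) allele counts coded 0, 1, 2.
    Monomorphic SNPs (constant columns) should be filtered out beforehand,
    because a regression on a constant predictor is undefined.
    Returns arrays of slopes (effect per allele copy) and p-values.
    """
    slopes, pvals = [], []
    for j in range(dosages.shape[1]):
        res = stats.linregress(dosages[:, j], expr)
        slopes.append(res.slope)
        pvals.append(res.pvalue)
    return np.array(slopes), np.array(pvals)

# Simulated example: 200 people, 3 SNPs; only SNP 0 affects expression.
rng = np.random.default_rng(0)
dosages = rng.binomial(2, 0.3, size=(200, 3))
expr = 1.0 * dosages[:, 0] + rng.normal(0, 1, 200)
slopes, pvals = eqtl_scan(expr, dosages)
print(slopes, pvals)
```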
The availability of affordable, high-quality software has been one of the bottlenecks in microarray data analysis. To address this need, we have further developed the "MSCL Analyst's Toolbox". This toolbox allows investigators to download Affymetrix microarray data from a central database, normalize and transform the data, inspect it for a variety of outliers or defects, perform a variety of statistical tests to select genes affected in the experiment, and then visualize and classify patterns of gene expression. In collaboration with over forty investigators at NCI, CC, NHLBI, NINDS, NIAID, NHGRI, NICHD, NIA, NIDDK, and NIDA, this tool has been applied to dozens of microarray studies. The Analyst's Toolbox has now been extended to handle RNAseq data, with new data transformations and utility functions.
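The summary does not name the normalization the Toolbox applies. Purely as an example of the "normalize" step, the sketch below shows quantile normalization, a widely used microarray method that forces every array to share the same intensity distribution. It assumes a probes × arrays matrix of log intensities. Ties are broken by sort order rather than averaged, which is a simplification, and the function name is hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix.

    Each column is replaced by the reference distribution (the mean of the
    sorted columns), assigned back in that column's original rank order.
    Ties are broken by sort order, not averaged.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                  # row indices, ascending, per column
    reference = np.sort(x, axis=0).mean(axis=1)    # mean value at each rank
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out

# Small example: 4 probes on 3 arrays.
x = np.array([[5, 4, 3],
              [2, 1, 4],
              [3, 4, 6],
              [4, 2, 8]])
print(quantile_normalize(x))
```

After normalization, sorting any column gives the same reference values, so differences between arrays that come only from overall intensity distributions are removed.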
In a continuing NIH-wide project, we maintain the NIHAGCC, a database for storage, retrieval and analysis of Affymetrix microarray data. Our downloadable tool set (the MSCL Analyst's Toolbox) is now mature, widely tested, and applied in numerous studies. We also maintain a quarterly updated set of annotation files for Affymetrix data, formatted for convenient download and use by our collaborators. Last year, the NIHAGCC was re-hosted on newer server hardware with the high-capacity data storage needed for RNAseq datasets.
In a continuing study of the rat pineal transcriptome, we have found a large number of novel, unannotated, but demonstrably controlled regions of genomic expression, termed non-coding RNAs (ncRNAs), some of which proved to be pseudogenes of highly expressed genes. As results from multiple RNA-seq experiments become available, the list of such novel features has grown to several hundred.
In a collaboration with NHGRI, we are conducting a case-control RNA-seq investigation of transcriptomic differences associated with coronary artery calcification, using ClinSeq study samples. We integrated RNA-seq and microarray data from the same individuals and found changes that were consistent across the two methodologies; these genes are now candidates for follow-up studies.
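The integration method is not described here. One simple way to check cross-platform consistency, sketched under assumptions, is to compute each gene's case-minus-control log2 fold change separately on each platform, correlate the two sets of fold changes, and keep genes that move in the same direction by at least a threshold on both. Both data sets are assumed to be log2-scale and to come from the same individuals; the function names and the 0.5 log2 threshold are hypothetical.

```python
import numpy as np

def log_fold_changes(expr, is_case):
    """Per-gene difference in mean log2 expression, cases minus controls.

    expr: (n_samples, n_genes) log2-scale data from one platform.
    is_case: boolean array of length n_samples.
    """
    is_case = np.asarray(is_case, dtype=bool)
    return expr[is_case].mean(axis=0) - expr[~is_case].mean(axis=0)

def concordance(lfc_seq, lfc_array, min_abs=0.5):
    """Pearson correlation of fold changes across platforms, plus the indices of
    genes changing in the same direction by at least min_abs (log2) on both."""
    r = np.corrcoef(lfc_seq, lfc_array)[0, 1]
    same_sign = np.sign(lfc_seq) == np.sign(lfc_array)
    strong = (np.abs(lfc_seq) >= min_abs) & (np.abs(lfc_array) >= min_abs)
    return r, np.flatnonzero(same_sign & strong)

# Toy fold changes for four genes on each platform.
r, candidates = concordance(np.array([1.0, -0.8, 0.1, 2.0]),
                            np.array([0.9, -0.6, -0.2, 1.5]))
print(r, candidates)
```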
In a collaboration with NEI, we are analyzing the transcriptome of mouse photoreceptors from embryonic through neonatal to later adult stages. This extensive time series, using both the Affymetrix Exon array and RNA-seq in parallel, allows high-resolution analysis at the gene and exon levels and is providing an unparalleled view of the transcriptomic changes accompanying important developmental events (e.g., differentiation, eye opening). The aim is to identify genes involved in mammalian aging that may be relevant to age-related eye diseases in humans.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Peter J. Munson
Other Grants by Peter J. Munson
Statistical And Computational Methods For Molecular Biology And Biomedicine
- Grant number: 8565482
- Fiscal year:
- Amount: $943,800
- Project category:
Statistical And Computational Methods For Gene Expression and Proteomic Analysis
- Grant number: 8148480
- Fiscal year:
- Amount: $943,800
- Project category:
Statistical And Computational Methods For Molecular Biol
- Grant number: 7296867
- Fiscal year:
- Amount: $943,800
- Project category:
Statistical And Computational Methods For Gene Expression and Proteomic Analysis
- Grant number: 8941406
- Fiscal year:
- Amount: $943,800
- Project category:
Statistical And Computational Methods For Molecular Biology And Biomedicine
- Grant number: 7966721
- Fiscal year:
- Amount: $943,800
- Project category:
Statistical And Computational Methods For Gene Expression and Proteomic Analysis
- Grant number: 7966728
- Fiscal year:
- Amount: $943,800
- Project category:
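Similar National Natural Science Foundation of China (NSFC) Grants
Research on Neuromorphic Visual Object Recognition Algorithms Driven by Spatiotemporal Sequences
- Grant number: 61906126
- Year approved: 2019
- Amount: CNY 240,000
- Project category: Young Scientists Fund
Ontology-Driven Spatial-Semantic Modeling of Address Data and Address Matching Methods
- Grant number: 41901325
- Year approved: 2019
- Amount: CNY 220,000
- Project category: Young Scientists Fund
Optimized Design of Address Mapping Tables and Memory Access Optimization for High-Capacity Solid-State Drives
- Grant number: 61802133
- Year approved: 2018
- Amount: CNY 230,000
- Project category: Young Scientists Fund
Memory Safety Defense Techniques Against Objects Targeted by Memory Attacks
- Grant number: 61802432
- Year approved: 2018
- Amount: CNY 250,000
- Project category: Young Scientists Fund
IP-Address-Driven Multipath Routing and Traffic Transmission Control
- Grant number: 61872252
- Year approved: 2018
- Amount: CNY 640,000
- Project category: General Program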
Similar Overseas Grants
Climate Change Effects on Pregnancy via a Traditional Food
- Grant number: 10822202
- Fiscal year: 2024
- Amount: $943,800
- Project category:
Feasibility Trial of a Novel Integrated Mindfulness and Acupuncture Program to Improve Outcomes after Spine Surgery (I-MASS)
- Grant number: 10649741
- Fiscal year: 2023
- Amount: $943,800
- Project category:
NeuroMAP Phase II - Recruitment and Assessment Core
- Grant number: 10711136
- Fiscal year: 2023
- Amount: $943,800
- Project category:
Genetic and Environmental Influences on Individual Sweet Preference Across Ancestry Groups in the U.S.
- Grant number: 10709381
- Fiscal year: 2023
- Amount: $943,800
- Project category:
Human-iPSC derived neuromuscular junctions as a model for neuromuscular diseases.
- Grant number: 10727888
- Fiscal year: 2023
- Amount: $943,800
- Project category: