Applying Bioinformatics to Research in Immune, Muscle, and Bone Diseases
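Basic Information
- Award number: 8559330
- Principal investigator:
- Amount: $589,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source: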
- Keywords: Acids, Address, Affect, Antibodies, Automation, B cell differentiation, Behcet Syndrome, Binding, Bioinformatics, Biomedical Research, Biopsy, Blood specimen, Bone Diseases, CD4 Positive T Lymphocytes, ChIP-seq, Chronic, Collection, Complex, Data, Data Analyses, Data Set, Dentin Formation, Detection, Development, Disease, Epigenetic Process, Experimental Autoimmune Encephalomyelitis, Gene Expression, Genes, Genome, Genomics, Goals, Guidelines, Hair follicle structure, Helper-Inducer T-Lymphocyte, Heterotopic Ossification, Human, IgE, Immune System Diseases, In Vitro, Inflammatory, Interleukin-1, Interleukin-17, Investigation, Java, Length, Lipodystrophy, Mediating, Methods, Modification, Molecular Profiling, Morphogenesis, Mus, Myopathy, National Institute of Arthritis and Musculoskeletal and Skin Diseases, Neonatal, Neural Crest, Nuclear, Osteoblasts, Osteogenesis, Patients, Procedures, Process, Production, Protein Isoforms, Proteins, Psoriasis, Publishing, Pythons, RNA, RNA Sequences, Reading, Regulatory Pathway, Research, Research Project Grants, Retinoids, Ribosomal RNA, Run-On Assays, Running, STAT4 gene, STAT5A gene, Sampling, Scientist, Signal Pathway, Signal Transduction, Skin, Software Tools, Solutions, Staging, Stream, Syndrome, Systemic Lupus Erythematosus, T-Lymphocyte, Temperature, Testing, Th1 Cells, Tissues, Transcript, Transcriptional Activation, base, cluster computing, computerized data processing, design, distributed data, experience, genome-wide, in vivo, next generation, novel, platform-independent, promoter, research study, response, skin disorder, statistics, text searching, tool, transcription factor, tricho-dento-osseous syndrome, user-friendly
Project Abstract
The Biodata Mining and Discovery section has been actively involved in a variety of NIAMS research projects, in particular:
- Studies demonstrating that the FLCN-FNIP complex, which is deregulated in BHD syndrome, is absolutely required for B-cell differentiation
- A study showing that IL-27 priming of T cells controls IL-17 production in trans via induction of PD-L1
- A study establishing a regulatory pathway in which the transcription factor Dlx3 is essential for dentin formation by directly regulating a crucial matrix protein
- An investigation showing that neural crest deletion of Dlx3 recapitulates features of tricho-dento-osseous syndrome
- A study demonstrating that Tfh and Th1 cells share an early transitional stage through STAT4-mediated signaling
- Functional and epigenetic studies revealing the multistep differentiation and plasticity of in vitro-generated and in vivo-derived follicular T helper cells
- A study of homeostatic tissue responses in skin biopsies from NOMID patients with constitutive overproduction of IL-1β
- A project generating data that indicate the Wnt/β-catenin signaling pathway is a target regulated by retinoic acid during hair follicle morphogenesis
- Applying RNA-Seq to SLE: identifying distinct gene expression profiles associated with high levels of autoreactive IgE antibodies in systemic lupus erythematosus
- Genome-wide ChIP-Seq analyses revealing the extent of opportunistic STAT5 binding that does not result in transcriptional activation of neighboring genes
- A study of Dlx3 inactivation in osteoblasts showing defective endochondral bone formation
The major computational approaches and methods developed are highlighted below.
Development of scripts for, and partial automation of, RNA-Seq data processing and analysis
A number of Bash and Python scripts have been developed to facilitate RNA-Seq data processing and analysis. These include scripts for systematically renaming sequence files; counting the QC-passing reads in both the original compressed sequence files and the matching uncompressed ones; generating input files for distributed data analysis runs on NIH's Biowulf cluster; manipulating and organizing files after analysis; creating Genome Browser-viewable files (such as BigWig files); and generating data analysis files for distributed computing from customized specifications tailored to a specific research project. Some of these scripts have been combined to partially automate RNA-Seq data processing and analysis.
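The read-counting step can be illustrated with a minimal Python sketch; it is not the section's actual script. It assumes 4-line FASTQ records with Illumina CASAVA 1.8+ style headers, in which the second colon-separated field after the space is the filter flag (`Y` if the read was filtered out, `N` if it passed). The function names `count_qc_passing_reads` and `counts_match` are hypothetical.

```python
import gzip
from pathlib import Path


def _open_text(path):
    """Open a FASTQ file as text, transparently decompressing .gz files."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "r")


def count_qc_passing_reads(path):
    """Return (total_reads, qc_passing_reads) for one FASTQ file.

    Assumes 4-line FASTQ records and CASAVA 1.8+ headers such as
    '@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG', where the
    filter flag is 'Y' if the read was filtered out and 'N' if it passed.
    """
    total = passing = 0
    with _open_text(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:  # only header lines carry the filter flag
                continue
            total += 1
            fields = line.rstrip("\n").split(" ", 1)
            # fields[1] looks like '1:N:18:ATCACG'; the flag is its second field
            if len(fields) == 2 and fields[1].split(":")[1:2] == ["N"]:
                passing += 1
    return total, passing


def counts_match(compressed_path, uncompressed_path):
    """True if a .fastq.gz file and its uncompressed copy give identical counts."""
    return count_qc_passing_reads(compressed_path) == count_qc_passing_reads(uncompressed_path)
```

For example, `counts_match("sample_R1.fastq.gz", "sample_R1.fastq")` (placeholder file names) returns `True` when both files contain the same number of total and QC-passing reads, which is a quick check that decompression did not truncate a file.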
Exploring and developing methods for GRO-Seq data analysis
Methods have been explored and developed for GRO-Seq (a nuclear run-on assay followed by sequencing), the newest research application of next-generation sequencing. Procedures have been developed and tested to remove ribosomal RNA sequences, align the rRNA-free sequences to a genome, generate strand-specific Genome Browser-viewable files, and identify statistically significant strand-specific peaks that mark actively transcribed transcripts. In addition, a Python script has been developed that classifies all identified transcripts into groups relative to known genomic annotations, such as annotated, anti_gene-body, anti_promoter, divergent, and intergenic.
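The transcript classification step can be sketched as follows. This is a hypothetical Python illustration: the category labels come from the text above, but the rules separating them (a ±500 bp promoter window around the TSS, a 5 kb upstream window for divergent transcripts, and the priority order used when a transcript relates to several genes) are assumptions made for the example, not the rules of the original script. Coordinates are 0-based and half-open.

```python
from dataclasses import dataclass

# Assumed priority: when a transcript relates to several genes,
# the earliest label in this list wins.
PRIORITY = ["annotated", "anti_promoter", "anti_gene-body", "divergent", "intergenic"]


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def _overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def relation(tx, gene, promoter_window=500, divergent_window=5000):
    """Classify one transcript relative to one gene (assumed rules)."""
    if tx.chrom != gene.chrom:
        return "intergenic"
    if tx.strand == gene.strand:
        # Same-strand transcript overlapping the gene counts as the annotated gene.
        return "annotated" if _overlaps(tx.start, tx.end, gene.start, gene.end) else "intergenic"
    # Opposite-strand transcript: locate the gene's transcription start site.
    tss = gene.start if gene.strand == "+" else gene.end - 1
    if _overlaps(tx.start, tx.end, tss - promoter_window, tss + promoter_window + 1):
        return "anti_promoter"
    if _overlaps(tx.start, tx.end, gene.start, gene.end):
        return "anti_gene-body"
    # Region upstream of the TSS, i.e. head-to-head with the gene.
    if gene.strand == "+":
        up_start, up_end = tss - divergent_window, tss
    else:
        up_start, up_end = tss + 1, tss + 1 + divergent_window
    if _overlaps(tx.start, tx.end, up_start, up_end):
        return "divergent"
    return "intergenic"


def classify(tx, genes, **windows):
    """Return the highest-priority label for a transcript across all genes."""
    best = "intergenic"
    for gene in genes:
        label = relation(tx, gene, **windows)
        if PRIORITY.index(label) < PRIORITY.index(best):
            best = label
    return best
```

Checking every gene for every transcript keeps the sketch short; with genome-wide annotations, an indexed or sorted interval lookup would be the natural replacement.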
An approach to investigate the effect of sequencing depth and read length on RNA-Seq
This approach was developed to address several unanswered yet critical questions for RNA-Seq, such as how sequencing depth and read length affect gene detection, and how they affect splice-junction and isoform determination. It involves pooling one billion single-end 93-base reads from normal human blood samples and systematically randomly sampling from this pool to generate multiple sequencing collections of varying depth at a given read length. These collections are then analyzed with both common RNA-Seq data analysis procedures and specially designed, customized analysis solutions. Preliminary results from this approach show that longer reads detect more junctions, which may help isoform determination, whereas the effect of read length (93 vs. 50 bases) on known-gene detection is not significant. They also show that at an RPKM (reads per kilobase of transcript per million total tags) cutoff of 1, 20 million 50-base reads detect 95% of the transcripts detectable at 500 million reads, whereas 50 million 50-base reads are needed to reach the same 95% detection rate at an RPKM cutoff of 0.05. Detailed sequencing depth vs. transcript detection rate data have been calculated, providing a practical guideline for choosing target coverage and read length when designing an RNA-Seq experiment.
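The subsampling design and the detection-rate measure can be sketched in Python as below. This is an illustrative assumption, not the section's pipeline: each read is represented only by the name of the transcript it was assigned to, every read in a subsample is treated as mapped, and RPKM uses the standard definition RPKM = 10^9 × C / (N × L), where C is the transcript's read count, N the total number of mapped reads, and L the transcript length in bases. The function names are hypothetical.

```python
import random
from collections import Counter


def rpkm(counts, lengths, total_reads):
    """Standard RPKM: 1e9 * count / (transcript length in bases * total mapped reads)."""
    return {g: counts.get(g, 0) * 1e9 / (lengths[g] * total_reads) for g in lengths}


def detected(read_genes, lengths, cutoff):
    """Transcripts with at least one read and RPKM >= cutoff."""
    if not read_genes:
        return set()
    values = rpkm(Counter(read_genes), lengths, len(read_genes))
    return {g for g, v in values.items() if v > 0 and v >= cutoff}


def detection_rate(pool, lengths, depth, cutoff, reference=None, seed=0):
    """Fraction of reference-detectable transcripts still detected after
    randomly sampling `depth` reads (without replacement) from `pool`.

    If `reference` is None, the transcripts detectable in the full pool are used.
    """
    rng = random.Random(seed)
    subsample = rng.sample(pool, depth)
    if reference is None:
        reference = detected(pool, lengths, cutoff)
    if not reference:
        return 0.0
    return len(detected(subsample, lengths, cutoff) & reference) / len(reference)
```

Passing a `reference` set computed at a larger depth (for example, 500 million reads) reproduces the kind of comparison reported above. At real scale, holding a billion reads in a Python list is impractical, so a production workflow would subsample FASTQ or alignment files on disk instead.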
Further development and testing of the Peak Assignment and Profile Search Tool (PAPST)
Building on our extensive experience in analyzing ChIP-Seq data, we developed PAPST, which combines several of the most useful data analysis methods we developed previously with a unique feature of its own: an easy-to-use, fast profile search of ChIP-Seq data for genes with specific transcription factor binding and epigenetic modifications. Systematically analyzing post-peak-calling ChIP-Seq data is a major challenge, not only because suitable software tools are currently lacking, but equally because the few existing tools are largely inaccessible to the lab scientists who are ultimately responsible for interpreting the peak-calling results. PAPST was developed for post-peak-calling ChIP-Seq data analysis in response to this challenge.

With a few mouse clicks and within seconds, PAPST allows a user to identify genes with specific transcription factor (TF) binding and/or epigenetic modification co-localization profiles. This novel feature answers questions such as "Which genes have TF1 and TF2 binding and epigenetic mark A in their promoters, and epigenetic marks B and C in their gene bodies?" Other quick PAPST analyses include peak distribution statistics across gene-centered genomic regions and the number of overlapping peaks for all pairwise sample comparisons. PAPST can also generate microarray-style, gene-centered quantitative ChIP-Seq data with a single mouse click; these data can then be combined with RNA-Seq or microarray data, if available, to facilitate further downstream analysis.

PAPST is a platform-independent, Java-based desktop application that is very user-friendly and requires no special computational expertise. For advanced users, PAPST can also serve as a general genomic-interval-based search tool to rapidly screen any set of coordinate-defined genomic features, such as genes or a set of TF binding peaks, against any other coordinate-defined genomic features in any combination.
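The co-localization query can be illustrated with a minimal Python sketch. PAPST itself is a Java application whose internals are not described here, so everything below is an assumption: promoters are taken as TSS ± 1 kb, the gene body as the rest of the gene, peaks as 0-based half-open (chromosome, start, end) tuples, and `build_profiles` and `query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def _overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def gene_regions(gene, promoter_window=1000):
    """Assumed gene-centered regions: promoter = TSS +/- window, body = rest of gene."""
    tss = gene.start if gene.strand == "+" else gene.end - 1
    promoter = (max(0, tss - promoter_window), tss + promoter_window + 1)
    if gene.strand == "+":
        body = (min(promoter[1], gene.end), gene.end)
    else:
        body = (gene.start, max(promoter[0], gene.start))
    return {"promoter": promoter, "gene_body": body}


def build_profiles(genes, peaks_by_sample, promoter_window=1000):
    """Map each gene to {region: set of samples with a peak in that region}.

    peaks_by_sample: {sample name: [(chrom, start, end), ...]}, e.g. one entry
    per TF or histone-mark peak file.
    """
    profiles = {}
    for gene in genes:
        regions = gene_regions(gene, promoter_window)
        profile = {region: set() for region in regions}
        for sample, peaks in peaks_by_sample.items():
            for chrom, start, end in peaks:
                if chrom != gene.chrom:
                    continue
                for region, interval in regions.items():
                    if _overlaps((start, end), interval):
                        profile[region].add(sample)
        profiles[gene.name] = profile
    return profiles


def query(profiles, required):
    """Genes whose regions contain peaks from every required sample."""
    return sorted(
        name for name, profile in profiles.items()
        if all(samples <= profile[region] for region, samples in required.items())
    )
```

A call such as `query(profiles, {"promoter": {"TF1", "TF2", "markA"}, "gene_body": {"markB", "markC"}})` mirrors the example question above. The brute-force loop is for clarity only; a genome-scale tool would index peaks by position.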
PAPST has been tested using a published ChIP-Seq data set of multiple transcription factors.
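Pairwise peak-overlap counts across samples, one of the PAPST outputs described in this section, can be sketched in Python as follows. This is a hypothetical illustration rather than PAPST's algorithm: peaks are 0-based half-open (chromosome, start, end) tuples, and a peak in sample A counts as overlapping if it shares at least one base with any peak in sample B, so the count is directional. Sorting each chromosome's peaks by start and keeping a running maximum of end coordinates lets each lookup be answered with one binary search.

```python
import bisect
from itertools import accumulate, permutations


def _index_peaks(peaks):
    """Per chromosome: sorted start coordinates and running maximum of end coordinates."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_end)
    return index


def count_overlapping(peaks_a, peaks_b):
    """Number of peaks in peaks_a that overlap at least one peak in peaks_b."""
    index = _index_peaks(peaks_b)
    count = 0
    for chrom, start, end in peaks_a:
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        # Peaks in b at positions < k all start before this peak ends ...
        k = bisect.bisect_left(starts, end)
        # ... and at least one of them overlaps if the largest end exceeds this start.
        if k > 0 and max_end[k - 1] > start:
            count += 1
    return count


def pairwise_overlaps(peaks_by_sample):
    """Directional overlap counts for every ordered pair of samples."""
    return {
        (a, b): count_overlapping(peaks_by_sample[a], peaks_by_sample[b])
        for a, b in permutations(sorted(peaks_by_sample), 2)
    }
```

Both directions are reported because one broad peak in sample B can overlap several narrow peaks in sample A, so the two counts can differ.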
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other grants by Massimo Gadina
High Throughput Next Generation Sequencing: supports genomics and epigenomics research in muscle, skin, bone and autoimmune diseases.
- Award number: 10496410
- Amount: $589,800
Translational Immunology research: a support for clinical immunological research
- Award number: 8344974
- Amount: $589,800
Animal care: supporting research on pathogenesis and treatment of autoimmunity
- Award number: 8345004
- Amount: $589,800
Animal care: supporting research on autoimmune, inflammatory and muscle diseases
- Award number: 8940198
- Amount: $589,800
Applying Bioinformatics to Research in Immune, Muscle, and Bone Diseases
- Award number: 8940203
- Amount: $589,800
Flow cytometry support to research in immune, skin, muscle and bone diseases
- Award number: 9563188
- Amount: $589,800
Animal care: supporting research on pathogenesis and treatment of autoimmunity
- Award number: 7970351
- Amount: $589,800
Applying Bioinformatics to Research in Immune, Muscle, and Bone Diseases
- Award number: 7732838
- Amount: $589,800
Animal care: supporting research on autoimmune, inflammatory and muscle diseases
- Award number: 10267583
- Amount: $589,800
Animal care: supporting research on pathogenesis and treatment of autoimmunity
- Award number: 8158460
- Amount: $589,800
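Similar National Natural Science Foundation of China (NSFC) grants
Research on neuromorphic visual object recognition algorithms driven by spatiotemporal sequences
- Award number: 61906126
- Year approved: 2019
- Amount: CNY 240,000
- Project category: Young Scientists Fund
Ontology-driven spatial semantic modeling of address data and address matching methods
- Award number: 41901325
- Year approved: 2019
- Amount: CNY 220,000
- Project category: Young Scientists Fund
Optimized design of address mapping tables and memory access optimization for high-capacity solid-state drives
- Award number: 61802133
- Year approved: 2018
- Amount: CNY 230,000
- Project category: Young Scientists Fund
Research on IP-address-driven multipath routing and traffic transmission control
- Award number: 61872252
- Year approved: 2018
- Amount: CNY 640,000
- Project category: General Program
Research on memory safety defense techniques targeting memory attack objects
- Award number: 61802432
- Year approved: 2018
- Amount: CNY 250,000
- Project category: Young Scientists Fund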
Similar overseas grants
Climate Change Effects on Pregnancy via a Traditional Food
- Award number: 10822202
- Fiscal year: 2024
- Amount: $589,800
Executive functions in urban Hispanic/Latino youth: exposure to mixture of arsenic and pesticides during childhood
- Award number: 10751106
- Fiscal year: 2024
- Amount: $589,800
Designing novel therapeutics for Alzheimer’s disease using structural studies of tau
- Award number: 10678341
- Fiscal year: 2023
- Amount: $589,800
Functional, structural, and computational consequences of NMDA receptor ablation at medial prefrontal cortex synapses
- Award number: 10677047
- Fiscal year: 2023
- Amount: $589,800
The transcriptional control of vascular calcification in disease
- Award number: 10647475
- Fiscal year: 2023
- Amount: $589,800