Mining the Sequence of the Human Genome for Important Sequence Features
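Basic Information
- Grant No.: 7594316
- Principal Investigator:
- Amount: $252,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source: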
- Keywords: Affect; Algorithms; Animal Model; Atopic Dermatitis; Base Sequence; Caenorhabditis elegans; Code; Collaborations; Computer Analysis; Computer software; Computers; Data; Data Analyses; Evolution; Exons; Gene Structure; Genes; Genetic; Genome; Genomics; Goals; Health; Human; Human Genome; Individual; Length; Location; Macaca mulatta; Maps; Metagenomics; Methodology; Methods; Microbe; Mining; Molecular Biology; Nucleotides; Numbers; Patients; Pattern; Plants; Play; Process; Property; Proteins; Public Domains; RNA, Ribosomal, 16S; Research Personnel; Research Project Grants; Retroviral Vector; Retroviridae; Role; Sampling; Sequence Analysis; Site; Skin; Techniques; Time; Vertebrates; Whole-Genome Shotgun Sequencing; Writing; computerized tools; gene function; gene therapy; genome sequencing; improved; microbial; novel; programs; research study; size; technology development
Project Summary
The last few years have seen a dramatic increase in the number of publicly available complete genome sequences and annotations. At the same time, technology developments now allow individual labs to efficiently perform experiments that generate tens of thousands of data points. This massive increase in data means that some lab projects are no longer tractable for an individual biologist; instead, they require large-scale data analysis best handled by a computer programmer. The projects described here focus on developing methodologies to integrate sequence, annotation, and experimentally generated data so that bench biologists can quickly and easily obtain results from their large-scale experiments.
The goal of one such project is to use the publicly available sequences and annotations to develop automated tools for the computational characterization of experimentally identified genomic sequences. The first step in the process is to align each sequence to the reference genome assembly to determine its genomic location. Existing programs suffice for most sequences, but we have developed a novel set of algorithms to map short sequences of fewer than 25 nucleotides. These programs can map tens of thousands of sequences in only a few minutes, even when mismatches are allowed. The second step is to compare the coordinates of the mapped sequences with the coordinates of a variety of genome annotations. Using this approach, we can assign putative functions to the experimentally identified sequences based on their proximity to known sequence features. To provide statistical rigor for the analysis, we have also developed a pipeline that characterizes sequences picked at random from the genome, so that the observed associations can be compared against a random background.
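The abstract does not describe how the short-sequence mapping algorithms work. As an illustration only, the sketch below uses standard pigeonhole seeding, which is an assumption and not necessarily the authors' method: if a read is cut into m + 1 disjoint seeds, any placement with at most m mismatches leaves at least one seed matching the genome exactly, so exact lookups in a k-mer index find every candidate start, which is then verified by counting mismatches. The function names `build_index` and `map_read` are hypothetical, and only the forward strand is searched.

```python
from collections import defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the genome to the list of positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read(read: str, genome: str, index: dict[str, list[int]],
             k: int, max_mismatches: int) -> list[tuple[int, int]]:
    """Return sorted (start, mismatches) hits on the forward strand.

    Pigeonhole seeding: the read is cut into max_mismatches + 1 disjoint
    seeds of length k. A placement with at most max_mismatches mismatches
    leaves at least one seed mismatch-free, so exact seed lookups find every
    candidate start, which is then verified base by base.
    """
    n_seeds = max_mismatches + 1
    if n_seeds * k > len(read):
        raise ValueError("read is too short for this seed length and mismatch budget")
    hits = set()
    for j in range(n_seeds):
        offset = j * k
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset  # implied start of the whole read
            if start < 0 or start + len(read) > len(genome):
                continue
            window = genome[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    genome = "TTTTACGTACGGATCCTTTT"
    idx = build_index(genome, k=6)
    # One mismatch (C->A at read position 5) relative to genome[4:16].
    print(map_read("ACGTAAGGATCC", genome, idx, k=6, max_mismatches=1))  # [(4, 1)]
```

A practical mapper would also search the reverse complement and use a compact index such as a suffix array instead of a Python dictionary; the dictionary keeps the seeding logic readable on toy inputs.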
We are applying the above methods to a number of research projects. One example is determining whether retroviruses and retroviral vectors integrate randomly into the host genome during retroviral gene therapy. With Dr. Fabio Candotti's lab at NHGRI, we have determined the integration sites in a patient treated in a retroviral gene therapy trial. We are now determining whether any of these integrations could disrupt gene function and thereby affect the patient's health, and whether the pattern of integration sites changes in the years following gene therapy. We are also collaborating on a similar project with Dr. Cynthia Dunbar of NHLBI, whose lab is pursuing retroviral gene therapy in rhesus macaques (Macaca mulatta) with the eventual goal of improving retroviral gene therapy techniques for humans.
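For the integration-site question, comparing mapped coordinates with gene annotations and benchmarking against randomly picked positions can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in these studies: genes are hypothetical 0-based, half-open `(chrom, start, end)` intervals, integration sites are `(chrom, position)` pairs, and random control sites are drawn uniformly across the genome with a seeded `random.Random`. The empirical p-value is the fraction of random site sets whose genic fraction is at least the observed one, with a +1 correction so it is never exactly zero.

```python
import bisect
import random


def merge_genes(genes):
    """Merge overlapping half-open (chrom, start, end) intervals per chromosome.

    Returns {chrom: (starts, ends)} with sorted, non-overlapping intervals.
    """
    by_chrom = {}
    for chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def in_gene(merged, chrom, pos):
    """True if pos lies inside any merged gene interval on chrom."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def genic_fraction(sites, merged):
    """Fraction of (chrom, pos) sites that fall inside a gene."""
    return sum(in_gene(merged, chrom, pos) for chrom, pos in sites) / len(sites)


def random_sites(chrom_sizes, n, rng):
    """Draw n positions uniformly over the genome (chromosomes weighted by length)."""
    chroms = list(chrom_sizes)
    picks = rng.choices(chroms, weights=[chrom_sizes[c] for c in chroms], k=n)
    return [(c, rng.randrange(chrom_sizes[c])) for c in picks]


def empirical_p_value(sites, merged, chrom_sizes, n_trials=1000, seed=0):
    """Compare the observed genic fraction with random control sets of equal size."""
    rng = random.Random(seed)
    observed = genic_fraction(sites, merged)
    at_least_as_high = sum(
        genic_fraction(random_sites(chrom_sizes, len(sites), rng), merged) >= observed
        for _ in range(n_trials)
    )
    return observed, (at_least_as_high + 1) / (n_trials + 1)
```

Uniform random sites are the simplest null model; a realistic control set may also need to match properties such as genome mappability, which this sketch ignores.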
In a separate collaboration, we are working with Dr. Julie Segre's lab in the Genetics and Molecular Biology Branch (GMBB) to characterize skin microbes using genomic methods. This project involves sequencing the 16S ribosomal RNA (rRNA) gene from the resident microbes of the skin. Our objectives are to (1) characterize the baseline microbial diversity of the skin; (2) analyze changes in microbial diversity in an animal model and in human patients with atopic dermatitis; and (3) use 16S rRNA sampling of the skin microflora to select appropriate representative species for whole-genome shotgun sequencing and, ultimately, metagenomic studies. My group has written programs to prepare the 16S rRNA sequences for analysis and is collaborating on the computational analysis using existing public-domain software.
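The abstract does not say which diversity measures are used. One common way to summarize a 16S rRNA survey is taxon richness together with the Shannon index, H = -Σ p_i ln p_i, where p_i is the fraction of reads assigned to taxon i. The sketch below assumes each read has already been assigned a taxon label by some classifier; the function names and the labels in the example are hypothetical placeholders, not data from this project.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    counts = list(counts)
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def summarize_sample(taxon_labels):
    """Return (richness, Shannon index) for one sample's per-read taxon labels."""
    counts = Counter(taxon_labels)
    return len(counts), shannon_index(counts.values())


if __name__ == "__main__":
    # Placeholder labels standing in for classifier output on 16S reads.
    reads = ["taxon_A", "taxon_B", "taxon_A", "taxon_C"]
    print(summarize_sample(reads))
```

Comparing these two numbers between healthy skin and atopic dermatitis samples is one simple way to express objective (2); richer comparisons would use the public-domain tools the abstract mentions.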
The completion of the human and other genome sequencing projects also makes comprehensive analyses of gene structure possible. With Dr. Lawrence Brody of NHGRI, we are exploring the role of exon size in protein evolution. We are expanding our initial analysis to computationally characterize the lengths and other properties of all protein-coding exons in a representative set of fully sequenced genomes, including vertebrates, D. melanogaster, C. elegans, and plants. Our goal is to better understand how large exons came to exist in our genomes, how they evolve, and what role, if any, they play.
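The per-genome exon survey can be illustrated with a small summary function. This sketch assumes exon coordinates are 0-based, half-open (BED-style) `(start, end)` pairs, and `large_threshold` of 1,000 bp is a hypothetical cutoff chosen for illustration, since the abstract does not define what counts as a large exon. Besides length, it reports the fraction of exons whose length is a multiple of three, one example of an "other property" computable from coordinates.

```python
import statistics


def exon_lengths(exons):
    """Lengths of exons given as 0-based, half-open (start, end) pairs."""
    lengths = []
    for start, end in exons:
        if end <= start:
            raise ValueError(f"invalid exon interval: ({start}, {end})")
        lengths.append(end - start)
    return lengths


def summarize_exons(lengths, large_threshold=1000):
    """Summary statistics for one genome's exon lengths.

    large_threshold is a hypothetical cutoff for calling an exon 'large'.
    """
    n = len(lengths)
    return {
        "n_exons": n,
        "median_length": statistics.median(lengths),
        "max_length": max(lengths),
        "fraction_large": sum(length >= large_threshold for length in lengths) / n,
        # For coding-only coordinates, a length divisible by 3 means the exon
        # can be skipped without shifting the downstream reading frame.
        "fraction_multiple_of_3": sum(length % 3 == 0 for length in lengths) / n,
    }
```

Running `summarize_exons` on each genome in the representative set gives directly comparable rows, which is the kind of cross-species table the exon-size analysis would start from.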
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Andreas Baxevanis
NHGRI/DIR Bioinformatics and Scientific Programming Core
- Grant No.: 8750737
- Amount: $252,100
NHGRI/DIR Bioinformatics and Scientific Programming Core
- Grant No.: 10910770
- Amount: $252,100
Comparative Genomic Studies on the Evolution of Morphological Complexity
- Grant No.: 10691105
- Amount: $252,100
NHGRI/DIR Bioinformatics and Scientific Programming Core
- Grant No.: 8350237
- Amount: $252,100
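Similar NSFC (National Natural Science Foundation of China) grants
Integrated remote-sensing retrieval algorithms for multiple components of shortwave radiation at the surface and top of atmosphere
- Grant No.: 42371342
- Year approved: 2023
- Amount: CNY 520,000
- Project category: General Program
Integrated optimization model and dual decomposition algorithm for flexible train timetables on high-speed railways
- Grant No.: 72361020
- Year approved: 2023
- Amount: CNY 270,000
- Project category: Regional Science Fund Program
Algorithm design and analysis for stochastic density functional theory
- Grant No.: 12371431
- Year approved: 2023
- Amount: CNY 435,000
- Project category: General Program
Risk-identification algorithms and proactive intervention methods for heavy trucks on expressways based on holographic traffic data
- Grant No.: 52372329
- Year approved: 2023
- Amount: CNY 490,000
- Project category: General Program
Efficient solution algorithms for adversarial team games with imperfect information
- Grant No.: 62376073
- Year approved: 2023
- Amount: CNY 510,000
- Project category: General Program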
Similar overseas grants
Dynamic neural coding of spectro-temporal sound features during free movement
- Grant No.: 10656110
- Fiscal year: 2023
- Amount: $252,100
Germline Genetic Modifiers of Radiation Response
- Grant No.: 10741022
- Fiscal year: 2023
- Amount: $252,100
Small Molecule Therapeutics for Sickle Cell Anemia
- Grant No.: 10601679
- Fiscal year: 2023
- Amount: $252,100
High-throughput Flow Culture of 3D Human PKD Models for Therapeutic Screening
- Grant No.: 10649222
- Fiscal year: 2023
- Amount: $252,100
Time-resolved laser speckle contrast imaging of resting-state functional connectivity in neonatal brain
- Grant No.: 10760193
- Fiscal year: 2023
- Amount: $252,100