Mining the Sequence of the Human Genome for Important Sequence Features
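Basic Information
- Grant number: 7594316
- Principal investigator:
- Amount: $252,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period (start to end):
- Project status: Not concluded
- Source: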
- Keywords: Affect; Algorithms; Animal Model; Atopic Dermatitis; Base Sequence; Caenorhabditis elegans; Code; Collaborations; Computer Analysis; Computer software; Computers; Data; Data Analyses; Evolution; Exons; Gene Structure; Genes; Genetic; Genome; Genomics; Goals; Health; Human; Human Genome; Individual; Length; Location; Macaca mulatta; Maps; Metagenomics; Methodology; Methods; Microbe; Mining; Molecular Biology; Nucleotides; Numbers; Patients; Pattern; Plants; Play; Process; Property; Proteins; Public Domains; RNA, Ribosomal, 16S; Research Personnel; Research Project Grants; Retroviral Vector; Retroviridae; Role; Sampling; Sequence Analysis; Site; Skin; Techniques; Time; Vertebrates; Whole-Genome Shotgun Sequencing; Writing; computerized tools; gene function; gene therapy; genome sequencing; improved; microbial; novel; programs; research study; size; technology development
Project Abstract
The last few years have seen a dramatic increase in the number of publicly available complete genome sequences and annotations. At the same time, researchers have been taking advantage of technology developments that allow individual labs to efficiently perform experiments that generate tens of thousands of data points. This massive increase in data means that some lab projects are no longer tractable by individual biologists but, rather, require large-scale data analysis capabilities best handled by a computer programmer. The projects described here focus on developing methodologies to integrate sequence, annotation, and experimentally generated data so that bench biologists can quickly and easily obtain results for their large-scale experiments.
The goal of one such project is to take advantage of the publicly available set of sequences and annotations to develop automated tools for the computational characterization of experimentally identified genomic sequences. The first step in the process is to align each sequence to the reference genome assembly to determine its genomic location. Existing programs suffice for most sequences, but we have developed a novel set of algorithms to map short sequences of fewer than 25 nucleotides. These programs can map tens of thousands of sequences in only a few minutes, even allowing for mismatches. The second step of the process is to compare the coordinates of the sequences to the coordinates of a variety of genome annotations. Using this approach, we can assign putative functions to the experimentally identified sequences based on their proximity to known sequence features. To provide statistical rigor for the analysis, we have developed a pipeline to characterize sequences picked at random from the genome.
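The abstract does not describe how its short-sequence mapping algorithms work, so the following is only a minimal sketch of one generic way to place a short sequence on a reference while allowing mismatches: the pigeonhole seed-and-verify idea. If a read may contain at most k substitutions, splitting it into k + 1 pieces guarantees that at least one piece matches exactly, so exact seed hits can be verified by Hamming distance. The function names, the in-memory string genome, and the substitution-only error model are assumptions for illustration, not the project's method.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_short_read(read: str, genome: str, max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Return sorted (position, mismatches) pairs for every placement of
    `read` in `genome` with at most `max_mismatches` substitutions.

    Hypothetical helper: splits the read into max_mismatches + 1 seeds,
    finds exact seed occurrences, then verifies the full read at each
    implied position.
    """
    n = len(read)
    pieces = max_mismatches + 1
    if n < pieces:
        raise ValueError("read is too short for the requested mismatch budget")

    # Seed boundaries: pieces of (nearly) equal length covering the whole read.
    bounds = [i * n // pieces for i in range(pieces + 1)]
    hits: Dict[int, int] = {}

    for start, end in zip(bounds, bounds[1:]):
        seed = read[start:end]
        pos = genome.find(seed)
        while pos != -1:
            candidate = pos - start  # where the full read would begin
            if 0 <= candidate <= len(genome) - n and candidate not in hits:
                distance = hamming(read, genome[candidate:candidate + n])
                if distance <= max_mismatches:
                    hits[candidate] = distance
            pos = genome.find(seed, pos + 1)

    return sorted(hits.items())


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    print(map_short_read("ACGTTTGA", reference, max_mismatches=1))
    print(map_short_read("ACGTTAGA", reference, max_mismatches=1))
```

A genome-scale tool would replace the repeated `str.find` scans with a precomputed index and would also search the reverse strand; both are omitted here to keep the sketch short.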
We are applying the above methods to a number of research projects. One example is to determine whether retroviruses and retroviral vectors integrate randomly into the host genome during the process of retroviral gene therapy. With Dr. Fabio Candotti's lab at NHGRI, we have determined the integration sites in a patient treated in a retroviral gene therapy trial. We are in the process of determining whether any of these integrations could disrupt gene function and thereby affect the patient's health, as well as whether the pattern of integration sites changes in the years after gene therapy. We are also collaborating on a similar project with Dr. Cynthia Dunbar of NHLBI. Her lab is pursuing retroviral gene therapy in rhesus macaques (Macaca mulatta), with the eventual goal of improving techniques for retroviral gene therapy in humans.
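The question of whether integration sites are random can be framed as a comparison against sites drawn at random from the genome. The sketch below is one simple way to do that, not the pipeline used in the project: it counts how many observed sites fall inside annotated features and compares that count with counts from uniformly sampled positions. The single-chromosome coordinate model, the half-open intervals, and all function names are assumptions made for illustration.

```python
import bisect
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed)


def count_sites_in_features(sites: Sequence[int], features: Sequence[Interval]) -> int:
    """Count sites that fall inside any feature.

    `features` must be sorted by start and non-overlapping.
    """
    starts = [start for start, _ in features]
    count = 0
    for site in sites:
        i = bisect.bisect_right(starts, site) - 1  # last feature starting at or before site
        if i >= 0 and site < features[i][1]:
            count += 1
    return count


def random_control(n_sites: int, genome_length: int, features: Sequence[Interval],
                   n_trials: int = 1000, seed: int = 0) -> List[int]:
    """Feature-hit counts for `n_trials` sets of uniformly random sites."""
    rng = random.Random(seed)
    return [
        count_sites_in_features(
            [rng.randrange(genome_length) for _ in range(n_sites)], features
        )
        for _ in range(n_trials)
    ]


def empirical_p_value(observed: int, null_counts: Sequence[int]) -> float:
    """One-sided empirical p-value with a +1 correction so it is never zero."""
    at_least = sum(c >= observed for c in null_counts)
    return (at_least + 1) / (len(null_counts) + 1)


if __name__ == "__main__":
    genes = [(100, 200), (500, 600)]          # toy annotation
    observed_sites = [150, 199, 200, 550, 50]  # toy integration sites
    observed = count_sites_in_features(observed_sites, genes)
    null = random_control(len(observed_sites), 1000, genes)
    print(observed, empirical_p_value(observed, null))
```

A small p-value would suggest that the observed sites land in features more often than uniformly random positions do. A real analysis would also need to handle multiple chromosomes, unmappable regions, and strand, which this sketch ignores.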
We are collaborating with Dr. Julie Segre's lab in the Genetics and Molecular Biology Branch (GMBB) to characterize skin microbes using genomic methods. This project involves sequencing the gene for the 16S ribosomal RNA (rRNA) subunit from resident microbes of the skin. Our objectives are to (1) characterize the baseline microbial diversity of the skin; (2) analyze the changes in microbial diversity of an animal model and human patients with atopic dermatitis; and (3) characterize the skin microflora by 16S rRNA sampling to pick appropriate representative species for whole-genome shotgun sequencing and, ultimately, metagenomic studies. My group has written programs to prepare the 16S rRNA sequence for analysis, and is collaborating in the computational analysis using existing public domain software.
The completion of the human and other genome sequencing projects also makes it possible to perform comprehensive analyses of gene structure. With Dr. Lawrence Brody of NHGRI, we are exploring the role of exon size in protein evolution. We are expanding our initial analysis to computationally characterize the lengths and other properties of all protein-coding exons from a representative set of fully sequenced genomes, including vertebrates, D. melanogaster, C. elegans, and plants. Our goal is to gain a greater understanding of how large exons came to exist in our genomes, how they evolve, and what role, if any, they play.
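As a rough illustration of what characterizing exon lengths can involve, the sketch below summarizes exon sizes from annotation-style coordinates. It assumes 1-based inclusive coordinates (as in GFF-style annotation files) and an arbitrary 1,000-nucleotide cutoff for a "large" exon; the abstract gives neither the input format nor a size threshold, so both are placeholders.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

Exon = Tuple[str, int, int]  # (gene_id, start, end), 1-based inclusive (assumed)


def exon_lengths(exons: Sequence[Exon]) -> List[int]:
    """Length of each exon; inclusive coordinates mean end - start + 1."""
    return [end - start + 1 for _, start, end in exons]


def summarize_exons(exons: Sequence[Exon], large_threshold: int = 1000) -> Dict[str, float]:
    """Basic size statistics, including the fraction of exons at or above
    `large_threshold` nucleotides (a hypothetical cutoff)."""
    lengths = exon_lengths(exons)
    if not lengths:
        raise ValueError("no exons supplied")
    return {
        "count": len(lengths),
        "median": median(lengths),
        "max": max(lengths),
        "fraction_large": sum(n >= large_threshold for n in lengths) / len(lengths),
    }


if __name__ == "__main__":
    toy_exons = [
        ("g1", 1, 100),
        ("g1", 201, 350),
        ("g2", 10, 1509),
        ("g2", 2000, 2119),
    ]
    print(summarize_exons(toy_exons))
```

Running the same summary per genome would give a simple cross-species comparison of exon size distributions.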
Project Outcomes
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other grants by Andreas Baxevanis
NHGRI/DIR Bioinformatics and Scientific Programming Core
- Grant number: 8750737
- Fiscal year:
- Funding amount: $252,100
- Project category:
NHGRI/DIR Bioinformatics and Scientific Programming Core
- Grant number: 10910770
- Fiscal year:
- Funding amount: $252,100
- Project category:
Comparative Genomic Studies on the Evolution of Morphological Complexity
- Grant number: 10691105
- Fiscal year:
- Funding amount: $252,100
- Project category:
NHGRI/DIR Bioinformatics and Scientific Programming Core
- Grant number: 8350237
- Fiscal year:
- Funding amount: $252,100
- Project category:
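Similar National Natural Science Foundation of China (NSFC) grants
Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex nonsmooth optimization problems
- Grant number: 12371308
- Approval year: 2023
- Funding amount: CNY 435,000
- Project category: General Program
Design and hardware implementation of ensemble learning algorithms under resource constraints
- Grant number: 62372198
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Fast algorithms for electromagnetic fields based on physics-informed neural networks
- Grant number: 52377005
- Approval year: 2023
- Funding amount: CNY 520,000
- Project category: General Program
SPH models and efficient algorithms for deformation and flow of saturated sand considering pile-soil-water coupling effects
- Grant number: 12302257
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Ensemble classification algorithms for high-dimensional imbalanced data
- Grant number: 62306119
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund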
Similar overseas grants
Dynamic neural coding of spectro-temporal sound features during free movement
- Grant number: 10656110
- Fiscal year: 2023
- Funding amount: $252,100
- Project category:
Germline Genetic Modifiers of Radiation Response
- Grant number: 10741022
- Fiscal year: 2023
- Funding amount: $252,100
- Project category:
Small Molecule Therapeutics for Sickle Cell Anemia
- Grant number: 10601679
- Fiscal year: 2023
- Funding amount: $252,100
- Project category:
High-throughput Flow Culture of 3D Human PKD Models for Therapeutic Screening
- Grant number: 10649222
- Fiscal year: 2023
- Funding amount: $252,100
- Project category:
Time-resolved laser speckle contrast imaging of resting-state functional connectivity in neonatal brain
- Grant number: 10760193
- Fiscal year: 2023
- Funding amount: $252,100
- Project category: