Genome analysis based on the integration of DNA sequence and shape
Basic information
- Grant number: 8632246
- Principal investigator:
- Amount: $334,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-02-01 to 2018-01-31
- Project status: Completed
- Source:
- Keywords: Affect; Affinity; Algorithms; BHLH Protein; Base Pairing; Base Sequence; Benchmarking; Binding; Binding Sites; Biological Process; ChIP-on-chip; ChIP-seq; Characteristics; Communities; Computational algorithm; DNA; DNA Binding; DNA Databases; DNA Methylation; DNA Sequence; DNA Structure; DNA-Binding Proteins; DNase-I Footprinting; Data; Data Analyses; Databases; Deoxyribonuclease I; Development; Drosophila genus; Embryonic Development; Family; Functional RNA; Gene Expression Regulation; Genetic Transcription; Genome; Genome Scan; Genomics; Geometry; Goals; Guanine + Cytosine Composition; Helix-Turn-Helix Motifs; Human; Hybrids; Hydroxyl Radical; In Vitro; Internet; Lead; Length; Letters; Linear Regressions; Machine Learning; Malignant Neoplasms; Measurement; Measures; Methods; Methylation; Mining; Minor Groove; Modeling; Molecular Biology; NMR Spectroscopy; Nucleotides; Pilot Projects; Play; Process; Property; Protein Binding; Protein Family; Proteins; Publishing; Quantitative Trait Loci; Relative (related person); Resolution; Role; Scanning; Sequence Analysis; Shapes; Signal Transduction; Single Nucleotide Polymorphism; Site; Specificity; Structure; System; Techniques; Technology; Testing; Training; Validation; Variant; Width; X-Ray Crystallography; Yeasts; base; design; flexibility; genetic evolution; genome analysis; genome wide association study; genome-wide; homeodomain; human disease; in vivo; insight; member; novel; novel strategies; public health relevance; research study; three dimensional structure; tool; transcription factor; vector
Project abstract
Title: Genome analysis based on the integration of DNA sequence and shape
PI: Rohs, Remo (USC); Co-I: Noble, William Stafford (UW); Co-I: Tullius, Thomas D. (BU)
PROJECT SUMMARY
Current techniques for genome analysis are mainly based on the one-dimensional DNA sequence, composed
of the letters A, C, G, and T. However, proteins recognize DNA as a three-dimensional (3D) object. Nuances in
DNA shape at single nucleotide resolution play a crucial role in the binding specificity of transcription factors
(TFs), including those involved in embryonic development and human cancer. This project involves the
development of a battery of tools for genome analysis, through the integration of information derived from the
DNA sequence and the 3D structure of DNA, or "DNA shape". The basis for these novel tools is a high-
throughput (HT) method for the prediction of multiple features of local DNA shape at the genomic scale. Data
will be made available to the community in the UCSC Genome Browser track format through a web server
interface. These tools will enable users to analyze the shape of any number or length of DNA sequences,
including whole genomes and the effect of DNA methylation. HT shape predictions will be validated based on
X-ray crystallography, NMR spectroscopy, and hydroxyl radical cleavage data. Predictions will be combined
with ORChID, an ENCODE project that infers DNA minor groove geometry from hydroxyl radical cleavage
experiments. The HT method will be used to study how paralogous TFs select different target sites in vivo
despite sharing core-binding motifs or having similar binding properties in vitro. To study this question, we will
investigate the effect of flanking sequences on multiple structural features of TF binding sites (TFBSs). The
initial focus of this study will be homeodomains and basic helix-loop-helix (bHLH) TFs. Other protein families
will later be included and used to construct a comprehensive TFBS database that provides shape features for
binding motifs derived from JASPAR and other motif databases. Structural effects of single nucleotide
polymorphisms (SNPs) will also be analyzed. Some SNPs are associated with deleterious functions, whereas
others have no apparent effect. The HT shape prediction method will be used to predict the function of SNPs in
non-coding regions based on DNA shape. We will correlate quantitative effects of SNPs on DNA structure with
expression quantitative trait loci (eQTLs) and genome-wide association study (GWAS) signals, to develop a
predictive tool for the functional effect of SNPs. The HT shape prediction approach will be used to design DNA
sequences with different AT/GC contents but similar shapes. The relative contributions of sequence and shape
to binding will be tested with analytic models including multiple linear regression (MLR) and support vector
regression (SVR). For systems in which the integration of sequence and shape proves advantageous, novel
motif finding tools will be developed based on an extended alphabet that combines sequence with informative
structural features, selected by machine learning and feature selection approaches. Sequence+shape motifs
will be tested by motif scanning, compared to sequence-only motifs, and integrated into the MEME Suite. The
goal of this sequence-shape integration is to increase the accuracy of finding in vivo TFBSs in the genome.
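
The idea of combining sequence with shape features, as described in the summary, can be illustrated with a minimal sketch. It is not the project's method: the function names, the 5-nucleotide window, and in particular `toy_shape_score` (which simply returns the A/T fraction of a window) are assumptions made for illustration. A real pipeline would substitute predicted structural features for that placeholder.

```python
# Sketch only: the shape function below is a placeholder, not the project's
# high-throughput shape prediction method.

BASES = "ACGT"
WINDOW = 5  # hypothetical sliding-window length for the shape stand-in


def one_hot(sequence):
    """Encode each nucleotide as four 0/1 values in A, C, G, T order."""
    features = []
    for base in sequence:
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def toy_shape_score(window):
    """Placeholder shape value: fraction of A/T bases in the window.

    A real pipeline would return a predicted structural feature here.
    """
    return sum(base in "AT" for base in window) / len(window)


def shape_profile(sequence):
    """Slide a window along the sequence, one value per full window."""
    return [
        toy_shape_score(sequence[i:i + WINDOW])
        for i in range(len(sequence) - WINDOW + 1)
    ]


def sequence_plus_shape(sequence):
    """Concatenate one-hot sequence features with the shape profile."""
    sequence = sequence.upper()
    if set(sequence) - set(BASES):
        raise ValueError("sequence must contain only A, C, G, T")
    return one_hot(sequence) + shape_profile(sequence)


if __name__ == "__main__":
    site = "CACGTGAATT"  # made-up example site, not taken from the project
    vector = sequence_plus_shape(site)
    print(len(vector))
```

A vector built this way could be passed to a regression model such as MLR or SVR. Fitting one model on the one-hot part alone and another on the concatenated vector is one way to compare the contributions of sequence and shape.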
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Remo Rohs
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 10431863
- Fiscal year: 2019
- Amount: $334,300
- Project category:
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 10650775
- Fiscal year: 2019
- Amount: $334,300
- Project category:
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 10189652
- Fiscal year: 2019
- Amount: $334,300
- Project category:
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 9975181
- Fiscal year: 2019
- Amount: $334,300
- Project category:
Genome analysis based on the integration of DNA sequence and shape
- Grant number: 8795204
- Fiscal year: 2014
- Amount: $334,300
- Project category:
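Similar grants from the National Natural Science Foundation of China (NSFC)
Potential and regulatory mechanisms of antigen-nonspecific B cells entering germinal centers and achieving affinity maturation
- Grant number: 32370941
- Approval year: 2023
- Amount: CNY 500,000
- Project category: General Program
Virtual screening and affinity maturation of camel-derived single-domain antibodies against small-molecule pesticide and veterinary drug residues using computational biology: the Alxa Bactrian camel of Inner Mongolia as an example
- Grant number: 32360190
- Approval year: 2023
- Amount: CNY 340,000
- Project category: Regional Science Fund Project
Peptide-MHC affinity prediction based on multi-feature fusion for immunotherapy biomarker identification
- Grant number: 62302277
- Approval year: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund
Discovery of novel selective small-molecule OGG1 inhibitors against rheumatoid arthritis using an intracellular protein affinity-labeling strategy
- Grant number: 82304698
- Approval year: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund
Drug-target binding affinity prediction for multi-scenario applications
- Grant number: 62371403
- Approval year: 2023
- Amount: CNY 490,000
- Project category: General Program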
Similar overseas grants
High-throughput thermodynamic and kinetic measurements for variant effects prediction in a major protein superfamily
- Grant number: 10752370
- Fiscal year: 2023
- Amount: $334,300
- Project category:
Small Molecule Therapeutics for Sickle Cell Anemia
- Grant number: 10601679
- Fiscal year: 2023
- Amount: $334,300
- Project category:
Quantifying proteins in plasma to democratize personalized medicine for patients with type 1 diabetes
- Grant number: 10730284
- Fiscal year: 2023
- Amount: $334,300
- Project category:
Center for comprehensive proteogenomic data analysis
- Grant number: 10440579
- Fiscal year: 2022
- Amount: $334,300
- Project category:
Center for comprehensive proteogenomic data analysis
- Grant number: 10644013
- Fiscal year: 2022
- Amount: $334,300
- Project category:
$ 33.43万 - 项目类别: