Genome analysis based on the integration of DNA sequence and shape
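Basic information
- Grant number: 8795204
- Principal investigator:
- Amount: $304,700
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-02-01 to 2018-01-31
- Project status: Completed
- Source: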
- Keywords: Affect; Affinity; Algorithms; BHLH Protein; Base Pairing; Base Sequence; Benchmarking; Binding; Binding Sites; Biological Process; ChIP-on-chip; ChIP-seq; Characteristics; Communities; Computational algorithm; DNA; DNA Binding; DNA Databases; DNA Methylation; DNA Sequence; DNA Structure; DNA-Binding Proteins; DNase-I Footprinting; Data; Data Analyses; Databases; Deoxyribonuclease I; Development; Drosophila genus; Embryonic Development; Family; Gene Expression Regulation; Genetic Transcription; Genome; Genome Scan; Genomics; Geometry; Goals; Guanine + Cytosine Composition; Health; Helix-Turn-Helix Motifs; Human; Hybrids; Hydroxyl Radical; In Vitro; Internet; Lead; Length; Letters; Linear Regressions; Machine Learning; Malignant Neoplasms; Measurement; Measures; Methods; Methylation; Mining; Minor Groove; Modeling; Molecular Biology; NMR Spectroscopy; Nucleotides; Pilot Projects; Play; Process; Property; Protein Binding; Protein Family; Proteins; Publishing; Quantitative Trait Loci; Relative (related person); Resolution; Role; Scanning; Sequence Analysis; Shapes; Signal Transduction; Single Nucleotide Polymorphism; Site; Specificity; Structure; System; Techniques; Technology; Testing; Training; Untranslated RNA; Validation; Variant; Width; X-Ray Crystallography; Yeasts; base; design; flexibility; genetic evolution; genome analysis; genome wide association study; genome-wide; homeodomain; human disease; in vivo; insight; member; novel; novel strategies; research study; three dimensional structure; tool; transcription factor; vector
Project abstract
DESCRIPTION (provided by applicant): Current techniques for genome analysis are mainly based on the one-dimensional DNA sequence, composed of the letters A, C, G, and T. However, proteins recognize DNA as a three-dimensional (3D) object. Nuances in DNA shape at single-nucleotide resolution play a crucial role in the binding specificity of transcription factors (TFs), including those involved in embryonic development and human cancer. This project involves the development of a battery of tools for genome analysis, through the integration of information derived from the DNA sequence and the 3D structure of DNA, or "DNA shape". The basis for these novel tools is a high-throughput (HT) method for the prediction of multiple features of local DNA shape at the genomic scale. Data will be made available to the community in the UCSC Genome Browser track format through a web server interface. These tools will enable users to analyze the shape of any number or length of DNA sequences, including whole genomes and the effect of DNA methylation. HT shape predictions will be validated against X-ray crystallography, NMR spectroscopy, and hydroxyl radical cleavage data. Predictions will be combined with ORChID, an ENCODE project that infers DNA minor groove geometry from hydroxyl radical cleavage experiments. The HT method will be used to study how paralogous TFs select different target sites in vivo despite sharing core-binding motifs or having similar binding properties in vitro. To study this question, we will investigate the effect of flanking sequences on multiple structural features of TF binding sites (TFBSs). The initial focus of this study will be homeodomain and basic helix-loop-helix (bHLH) TFs. Other protein families will later be included and used to construct a comprehensive TFBS database that provides shape features for binding motifs derived from JASPAR and other motif databases. Structural effects of single nucleotide polymorphisms (SNPs) will also be analyzed.
Some SNPs are associated with deleterious functions, whereas others have no apparent effect. The HT shape prediction method will be used to predict the function of SNPs in non-coding regions based on DNA shape. We will correlate quantitative effects of SNPs on DNA structure with expression quantitative trait loci (eQTLs) and genome-wide association study (GWAS) signals, to develop a predictive tool for the functional effect of SNPs. The HT shape prediction approach will be used to design DNA sequences with different AT/GC contents but similar shapes. The relative contributions of sequence and shape to binding will be tested with analytic models including multiple linear regression (MLR) and support vector regression (SVR). For systems in which the integration of sequence and shape proves advantageous, novel motif-finding tools will be developed based on an extended alphabet that combines sequence with informative structural features, selected by machine learning and feature selection approaches. Sequence+shape motifs will be tested by motif scanning, compared to sequence-only motifs, and integrated into the MEME Suite. The goal of this sequence-shape integration is to increase the accuracy of finding in vivo TFBSs in the genome.
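Comparing sequence-only and sequence+shape models with MLR or SVR presupposes a feature encoding for each binding site. The following is a minimal sketch of one possible encoding, not the project's actual method: it assumes one-hot encoding of the bases and per-position shape values supplied by the caller. The function names (`one_hot`, `encode`) and the numbers in `toy_shape` are hypothetical placeholders, not output of the HT shape prediction method.

```python
from typing import List, Optional, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Encode a DNA sequence as 4 binary indicators per position (A, C, G, T)."""
    vec: List[float] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def encode(seq: str,
           shape: Optional[Sequence[Sequence[float]]] = None) -> List[float]:
    """Return a sequence-only vector, or a sequence+shape vector if `shape` is given.

    `shape` holds one row per sequence position; each row lists that
    position's shape feature values (hypothetical, caller-supplied).
    """
    vec = one_hot(seq)
    if shape is not None:
        if len(shape) != len(seq):
            raise ValueError("shape must have one row per sequence position")
        for row in shape:
            vec.extend(float(v) for v in row)
    return vec


# Placeholder shape values: one made-up number per position,
# for illustration only (not real predictions).
toy_shape = [[5.1], [4.8], [4.2], [5.6]]

seq_only = encode("ACGT")              # 4 positions x 4 indicators
seq_shape = encode("ACGT", toy_shape)  # plus 1 shape value per position
print(len(seq_only), len(seq_shape))   # 16 20
```

Fitting the same regression model once on `seq_only`-style vectors and once on `seq_shape`-style vectors, then comparing held-out accuracy, is one way the relative contribution of shape could be assessed under these assumptions.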
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Remo Rohs
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 10431863
- Fiscal year: 2019
- Amount: $304,700
- Project type:
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 10650775
- Fiscal year: 2019
- Amount: $304,700
- Project type:
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 10189652
- Fiscal year: 2019
- Amount: $304,700
- Project type:
Quantitative Modeling of Transcription Factor-DNA Binding
- Grant number: 9975181
- Fiscal year: 2019
- Amount: $304,700
- Project type:
Genome analysis based on the integration of DNA sequence and shape
- Grant number: 8632246
- Fiscal year: 2014
- Amount: $304,700
- Project type:
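Similar National Natural Science Foundation of China (NSFC) grants
Computational-biology-based virtual screening and affinity maturation of camel-derived single-domain antibodies against small-molecule pesticide and veterinary drug residues: the Alxa Bactrian camel of Inner Mongolia as a case study
- Grant number: 32360190
- Approval year: 2023
- Amount: CNY 340,000
- Project type: Regional Science Fund Project
Discovery of novel selective small-molecule OGG1 inhibitors against rheumatoid arthritis using an intracellular protein affinity-labeling strategy
- Grant number: 82304698
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund Project
Drug-target binding affinity prediction methods based on multi-scale representation and cross-modal semantic matching
- Grant number: 62302456
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund Project
Framework nucleic acid multivalent artificial antibodies with enhanced target-cell affinity for the treatment of drug-resistant tumors
- Grant number: 32301185
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund Project
Potential and regulatory mechanisms of antigen-nonspecific B cells entering germinal centers and achieving affinity maturation
- Grant number: 32370941
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program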
Similar overseas grants
Small Molecule Therapeutics for Sickle Cell Anemia
- Grant number: 10601679
- Fiscal year: 2023
- Amount: $304,700
- Project type:
High-throughput thermodynamic and kinetic measurements for variant effects prediction in a major protein superfamily
- Grant number: 10752370
- Fiscal year: 2023
- Amount: $304,700
- Project type:
Quantifying proteins in plasma to democratize personalized medicine for patients with type 1 diabetes
- Grant number: 10730284
- Fiscal year: 2023
- Amount: $304,700
- Project type:
Center for comprehensive proteogenomic data analysis
- Grant number: 10440579
- Fiscal year: 2022
- Amount: $304,700
- Project type:
Center for comprehensive proteogenomic data analysis
- Grant number: 10644013
- Fiscal year: 2022
- Amount: $304,700
- Project type: