Composition Patterns in Nucleotide Sequences
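Basic information
- Grant number: 0413463
- Amount: $86,600
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2003
- Funding country: United States
- Duration: 2003-09-01 to 2004-06-30
- Project status: Completed
Project abstract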
PI: Gary Benson
Proposal Number: 0073081
Institution: Mt. Sinai School of Medicine
Project Summary
This project is an investigation of computational problems that arise for a new type of discrete pattern in DNA sequences, the composition pattern.
Composition is a vector quantity describing the frequency of occurrence of each alphabet letter in a particular string. Let $S$ be a string over an alphabet $\Sigma$. Then $C(S) = (f_1, f_2, \ldots, f_{|\Sigma|})$ is the composition of $S$, where, for each letter $i \in \Sigma$, $f_i$ is the fraction of the characters in $S$ that are $i$. A composition pattern is a string $P = r_1 r_2 \cdots r_p$, where each $r_i$ represents a composition region, i.e. a substring of homogeneous composition which differs from that of its surrounding regions. Note that the order of letters in $r_i$ is irrelevant, as it has no effect on the composition of $r_i$.
To date, algorithms which characterize DNA functional sites have concentrated primarily on identifying what this proposal terms position-specific patterns, such as the consensus sequence or the more flexible, but less specific, weight-matrix-based pattern profile. Unfortunately, position-specific patterns are usually not selective enough to distinguish actual occurrences of a feature from false positives. Too often, when these patterns are used to search for unknown matches, one to several orders of magnitude more false positives than true positives are obtained.
The composition pattern is a new approach which embraces an important physical property, the potential for variation in structural conformation (shape) of the DNA double helix, yet does so in the context of a type of discrete pattern which has apparently not been previously explored by the algorithmic community. DNA crystallization studies support the idea that certain dinucleotide base-steps confer specific types of flexibility. Further evidence is provided by studies of intrinsically curved and 'kinkable' DNA.
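As a concrete illustration of the composition vector $C(S)$ defined in the summary above, the following minimal Python sketch computes it for a DNA string. The function name `composition` and the fixed alphabet `ACGT` are assumptions made for this example only; the summary leaves the alphabet general and does not prescribe an implementation.

```python
from collections import Counter

# Assumed alphabet for illustration; the summary allows any alphabet.
DNA_ALPHABET = "ACGT"


def composition(s, alphabet=DNA_ALPHABET):
    """Return C(s): the fraction of characters in s equal to each alphabet letter.

    Characters outside `alphabet` count toward the length of s but are not
    reported, so the fractions may then sum to less than 1.
    """
    if not s:
        raise ValueError("composition is undefined for an empty string")
    counts = Counter(s)  # missing letters are reported by Counter as 0
    return tuple(counts[ch] / len(s) for ch in alphabet)


# Fractions are listed in the order A, C, G, T.
print(composition("AACG"))  # (0.5, 0.25, 0.25, 0.0)

# The order of letters does not affect the composition.
print(composition("AACG") == composition("GACA"))  # True
```

The second call shows the property noted in the summary: two strings that are permutations of each other have the same composition.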
Based on these observations, it is suggested here that, for conformational flexibility, the order of nucleotides in a sequence may be less important than the effect that certain nucleotide or dinucleotide base-step biases impart on the sequence as a whole. In support of this assertion is an accumulating body of evidence of important DNA features whose unifying characteristic is composition bias rather than position-specific information.
This research project encompasses algorithm development for three related problem areas which form the basis for understanding the functional importance of composition variation in nucleotide sequences and the detection of composition patterns. These areas are:
- Pattern matching. A composition pattern and a sequence are given. Find all occurrences of the pattern in the sequence. Occurrences may be exact or approximate.
- Pattern detection. A sequence or set of sequences is given. Find all recurring composition patterns. Occurrences may be exact or approximate. The patterns are not specified or only partially specified.
- Sequence segmentation. A sequence is given. Partition it into statistically distinct regions of homogeneous composition.
These problems have theoretical interest in their own right, independent of biology, and have received almost no attention from the algorithmic community.
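To make the pattern matching problem more tangible, here is a minimal Python sketch for the simplest case: a pattern consisting of a single composition region of known length. It slides a window along the sequence, updates letter counts incrementally, and reports windows whose composition lies within a tolerance of the target. The function name `find_region_matches`, the fixed window length, the `ACGT` alphabet and the L1 distance are all assumptions chosen for illustration; the summary does not describe the project's actual algorithms, and full composition patterns are sequences of several regions.

```python
def find_region_matches(seq, target, window, tol=0.0, alphabet="ACGT"):
    """Return start positions of length-`window` substrings of `seq` whose
    composition is within `tol` (L1 distance) of `target`.

    `target` is a tuple of fractions, one per letter of `alphabet`.
    tol=0.0 gives exact matching; tol>0 gives approximate matching.
    """
    if window <= 0 or window > len(seq):
        return []
    index = {ch: k for k, ch in enumerate(alphabet)}
    counts = [0] * len(alphabet)

    # Letter counts for the first window.
    for ch in seq[:window]:
        if ch in index:
            counts[index[ch]] += 1

    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one: drop the letter leaving, add the letter entering.
            leaving = seq[start - 1]
            entering = seq[start + window - 1]
            if leaving in index:
                counts[index[leaving]] -= 1
            if entering in index:
                counts[index[entering]] += 1
        dist = sum(abs(c / window - t) for c, t in zip(counts, target))
        if dist <= tol + 1e-12:  # small slack for floating-point rounding
            hits.append(start)
    return hits


# Look for 4-letter regions that are half A and half T (order A, C, G, T).
print(find_region_matches("GGGGATATATCCCC", (0.5, 0.0, 0.0, 0.5), 4))  # [4, 5, 6]
```

Because the counts are updated rather than recomputed, each window costs time proportional to the alphabet size, not to the window length. Choosing the window length and the tolerance is left open here, as is the harder case of matching several adjacent regions.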
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Gary Benson
Exact Distribution of a Spaced Seed Statistic for DNA Homology Detection
- Year published: 2008
- Impact factor: 0
- Authors: Gary Benson; Denise Y. F. Mak
- Corresponding author: Denise Y. F. Mak
An Alphabet Independent Approach to Two-Dimensional Pattern Matching
- DOI: 10.1137/s0097539792226321
- Year published: 1994
- Impact factor: 0
- Authors: A. Amir; Gary Benson; Martín Farach
- Corresponding author: Martín Farach
Minimal entropy probability paths between genome families
- Year published: 2004
- Impact factor: 1.9
- Authors: C. Ahlbrandt; Gary Benson; W. Casey
- Corresponding author: W. Casey
Evaluating distance functions for clustering tandem repeats.
- Year published: 2005
- Impact factor: 0
- Authors: Suyog Rao; Alfredo Rodriguez; Gary Benson
- Corresponding author: Gary Benson
Efficient two-dimensional compressed matching
- DOI: 10.1109/dcc.1992.227453
- Year published: 1992
- Impact factor: 0
- Authors: A. Amir; Gary Benson
- Corresponding author: Gary Benson
Other grants held by Gary Benson
REU Site: Bioinformatics Research and Interdisciplinary Training Experience in Analysis and Interpretation of Information-Rich Biological Data Sets (REU-BRITE)
- Grant number: 1949968
- Fiscal year: 2020
- Funding amount: $86,600
- Project type: Standard Grant
REU Site: Bioinformatics Research and Interdisciplinary Training Experience in Analysis and Interpretation of Information-Rich Biological Data Sets (REU-BRITE)
- Grant number: 1559829
- Fiscal year: 2016
- Funding amount: $86,600
- Project type: Continuing Grant
III: Small: Bit-Parallel Algorithms for Sequence Alignment and Applications in Detecting Human Genetic Variation and Bacterial Strain Typing
- Grant number: 1423022
- Fiscal year: 2014
- Funding amount: $86,600
- Project type: Continuing Grant
III: Small: Algorithms for Tandem Repeat Variant Discovery Using Next Generation Sequencing Data
- Grant number: 1017621
- Fiscal year: 2010
- Funding amount: $86,600
- Project type: Continuing Grant
IGERT: Integrating Computational Science into Research in Biological Networks
- Grant number: 0654108
- Fiscal year: 2007
- Funding amount: $86,600
- Project type: Continuing Grant
SEI(BIO): DNA Inverted Repeats: Sensitive Detection Methods and Research Database
- Grant number: 0612153
- Fiscal year: 2006
- Funding amount: $86,600
- Project type: Standard Grant
TRDB: A Multi-genome Database of Tandem Repeats
- Grant number: 0413462
- Fiscal year: 2003
- Funding amount: $86,600
- Project type: Continuing Grant
TRDB: A Multi-genome Database of Tandem Repeats
- Grant number: 0090789
- Fiscal year: 2001
- Funding amount: $86,600
- Project type: Continuing Grant
Composition Patterns in Nucleotide Sequences
- Grant number: 0073081
- Fiscal year: 2000
- Funding amount: $86,600
- Project type: Standard Grant
CAREER: Tandem Repeats: Sequence Comparison and Search Algorithms
- Grant number: 9623532
- Fiscal year: 1996
- Funding amount: $86,600
- Project type: Continuing Grant
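Similar NSFC (National Natural Science Foundation of China) grants
Mechanism by which the Arabidopsis nucleotidyl transferases HESO1 and URT1 cooperate in mRNA degradation
- Grant number: 31900396
- Year approved: 2019
- Funding amount: 250,000 CNY
- Project type: Young Scientists Fund
Identification and functional study of the cGAS/STING disease-resistance immune signaling pathway in the Hong Kong oyster
- Grant number: 31902404
- Year approved: 2019
- Funding amount: 250,000 CNY
- Project type: Young Scientists Fund
Design of drug molecules that bidirectionally regulate pattern recognition receptors (PRRs) to reverse tumor chemotherapy resistance
- Grant number: 81803358
- Year approved: 2018
- Funding amount: 210,000 CNY
- Project type: Young Scientists Fund
Mechanism of action of the nucleotide sugar transporter ROCK1/EBS8 in endoplasmic reticulum protein quality control in Arabidopsis
- Grant number: 31600996
- Year approved: 2016
- Funding amount: 210,000 CNY
- Project type: Young Scientists Fund
Role of NOD2 in the pathogenesis of lupus nephritis and its potential signaling pathways
- Grant number: 81671617
- Year approved: 2016
- Funding amount: 570,000 CNY
- Project type: General Program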
Similar overseas grants
Genetics of early childhood obesity and its clinical implications
- Grant number: 10445160
- Fiscal year: 2016
- Funding amount: $86,600
Genetics of early childhood obesity and its clinical implications
- Grant number: 9757772
- Fiscal year: 2016
- Funding amount: $86,600
Genetics of early childhood obesity and its clinical implications
- Grant number: 10013209
- Fiscal year: 2016
- Funding amount: $86,600
High-resolution map of human germline mutation patterns and inference of mutagenic mechanisms
- Grant number: 9083570
- Fiscal year: 2016
- Funding amount: $86,600
Do Genotype Patterns Predict Weight Loss Success for Low Carb vs. Low Fat Diets?
- Grant number: 8723168
- Fiscal year: 2012
- Funding amount: $86,600