III: Small: Algorithms for Tandem Repeat Variant Discovery Using Next Generation Sequencing Data
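Basic information
- Award number: 1017621
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-08-15 to 2015-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract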
A tandem repeat (TR) is any pattern of nucleotides that occurs as repeating, consecutive copies along a DNA molecule. Often, the pattern copies are not identical. A TR can be polymorphic, that is, it can differ across individuals in a population: 1) the number of copies may be different, 2) the arrangement of non-identical copies may be different, and 3) the copies may contain different small mutations. TR variants are known to affect important biological processes, such as chromatin structure, gene plasticity, gene expression, and disease states, so their discovery is crucial for correctly understanding complex bio-molecular interactions. While a conservative estimate suggests that 100,000 human TRs may be polymorphic, until recently, genome-wide study of TR polymorphism, in humans and other organisms, has been too difficult and costly, with the result that the true extent of polymorphism and its effects are unknown.

New genome sequencing technologies offer the first real opportunity to fill in the details of TR diversity. These technologies sequence millions of high quality, short DNA fragments in a single experiment. Current sequencing projects are producing many billions of reads rich in TR variant information. Yet current read mapping algorithms, which attempt to assign each read to its proper location on the reference genome, are not designed to detect TR variants.

This project has three central goals: 1. Algorithm Development; 2. Genome Studies; 3. Variation Curation in a public database. Strategies will be developed to accurately and efficiently map TR-containing reads to reference genome TR loci. Anticipated algorithmic developments include: 1) Optimization of tree-based alignment, for use when millions of short, disjoint sequences must be aligned to each other. The reads and references can each be merged into separate Patricia tree data structures and alignment computed between tree nodes, thereby eliminating redundant computation in the prefixes of the two sequence sets. 2) Production of space-saving Burrows-Wheeler transforms (BWT) of the most redundant tree parts by employing approximate shortest common superstrings (SCS) for the two sequence sets. 3) Development of an efficient Four-Russians style block computation for edit distance alignment in the trees by exploiting redundancy inherent in the small alphabet and block input scores. 4) Development of a bounding computation for edit distance based on efficient, bit-register computation of longest common subsequence (LCS) alignment. 5) Parallelization of all algorithms for further efficiency with multi-core processors, Single Instruction, Multiple Data (SIMD) bit-register computations, and highly parallel graphics processing units (GPUs).

Data from six recently published whole human genomes, two human centenarian genomes, and the 1000 Genomes Project will be analyzed to discover TR variants. An internet-accessible, public database and analysis platform for curation and display of TR variants will be developed.

The TR variant discovery software and all data sets produced will directly enhance the infrastructure for TR diversity research in genome biology, genome evolution, and comparative genomics. The software and data will be freely available to the research community through a high capacity website maintained in the PI's lab at Boston University. The PI will participate in a variety of activities that link research and education and support participation by members of underrepresented groups, including provision of opportunities in research for graduate and undergraduate students, participation in high school enrichment and curriculum development projects, and editorship of an international journal engaged in the dissemination of bioinformatics research.
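The tree-based alignment idea in the abstract (merge sequences into a tree so that work on shared prefixes is done once) can be illustrated with a small sketch. This is not the project's implementation. It makes simplifying assumptions: it uses a plain, uncompressed trie rather than a Patricia tree, merges only the reads, and aligns them against a single reference string with unit-cost edit distance. The names `TrieNode`, `build_trie`, and `edit_distances` are hypothetical.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """Node of a plain (uncompressed) trie; a stand-in for a Patricia tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.read_ids: List[int] = []  # indices of reads that end at this node


def build_trie(reads: List[str]) -> TrieNode:
    """Merge all reads into one trie so that common prefixes share nodes."""
    root = TrieNode()
    for read_id, read in enumerate(reads):
        node = root
        for base in read:
            node = node.children.setdefault(base, TrieNode())
        node.read_ids.append(read_id)
    return root


def edit_distances(reads: List[str], reference: str) -> List[int]:
    """Unit-cost edit distance of every read to `reference`.

    One dynamic-programming row is computed per trie node, so a prefix
    shared by many reads is aligned only once.
    """
    result = [0] * len(reads)
    root = build_trie(reads)
    # Row for the empty prefix: reaching reference[:j] costs j insertions.
    first_row = list(range(len(reference) + 1))
    stack: List[Tuple[TrieNode, List[int]]] = [(root, first_row)]
    while stack:
        node, row = stack.pop()
        for read_id in node.read_ids:
            result[read_id] = row[-1]
        for base, child in node.children.items():
            new_row = [row[0] + 1]  # delete every base of the read prefix
            for j, ref_base in enumerate(reference, start=1):
                cost = 0 if ref_base == base else 1
                new_row.append(min(new_row[j - 1] + 1,   # insertion
                                   row[j] + 1,           # deletion
                                   row[j - 1] + cost))   # match or substitution
            stack.append((child, new_row))
    return result


print(edit_distances(["ACGT", "ACGA", "ACT"], "ACGT"))  # [0, 1, 1]
```

In this example the three reads contain 11 bases in total, but the trie has only 6 non-root nodes, so 6 rows are computed instead of 11. The abstract goes further by also merging the references into a tree and aligning tree nodes against tree nodes.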
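The fourth algorithmic item, bounding edit distance through a bit-register LCS computation, can be sketched in a similar spirit. The code below uses a well-known bit-parallel recurrence for LCS length, with Python's arbitrary-precision integers standing in for hardware bit registers. The way the bounds are derived here is an assumption for illustration: with unit costs, the edit distance d between strings of lengths m and n satisfies max(m, n) - LCS <= d <= m + n - 2*LCS. The abstract does not state which bound the project uses, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence via bit-parallel updates.

    Bit i of `v` tracks position i of `a`; after processing all of `b`,
    the number of zero bits among the low len(a) bits is the LCS length.
    """
    m = len(a)
    if m == 0:
        return 0
    full = (1 << m) - 1
    # For each character, a bit mask of the positions where it occurs in `a`.
    match_mask: Dict[str, int] = {}
    for i, ch in enumerate(a):
        match_mask[ch] = match_mask.get(ch, 0) | (1 << i)
    v = full
    for ch in b:
        u = v & match_mask.get(ch, 0)
        # Since u is a subset of v's bits, v - u equals v & ~mask.
        v = ((v + u) | (v - u)) & full
    return m - bin(v).count("1")


def edit_distance_bounds(a: str, b: str) -> Tuple[int, int]:
    """Lower and upper bounds on unit-cost edit distance derived from LCS."""
    lcs = lcs_length(a, b)
    lower = max(len(a), len(b)) - lcs      # every unmatched base needs an edit
    upper = len(a) + len(b) - 2 * lcs      # delete and insert all unmatched bases
    return lower, upper


print(lcs_length("AGCAT", "GAC"))            # 2
print(edit_distance_bounds("AGCAT", "GAC"))  # (3, 4)
```

Each character of `b` costs a constant number of word operations when `a` fits in one machine register, which is what makes an LCS-based bound cheap enough to use as a filter before a full edit-distance alignment.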
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Gary Benson
Exact Distribution of a Spaced Seed Statistic for DNA Homology Detection
- DOI:
- Year published: 2008
- Journal:
- Impact factor: 0
- Authors: Gary Benson; Denise Y. F. Mak
- Corresponding author: Denise Y. F. Mak
An Alphabet Independent Approach to Two-Dimensional Pattern Matching
- DOI: 10.1137/s0097539792226321
- Year published: 1994
- Journal:
- Impact factor: 0
- Authors: A. Amir; Gary Benson; Martín Farach
- Corresponding author: Martín Farach
Minimal entropy probability paths between genome families
- DOI:
- Year published: 2004
- Journal:
- Impact factor: 1.9
- Authors: C. Ahlbrandt; Gary Benson; W. Casey
- Corresponding author: W. Casey
Evaluating distance functions for clustering tandem repeats.
- DOI:
- Year published: 2005
- Journal:
- Impact factor: 0
- Authors: Suyog Rao; Alfredo Rodriguez; Gary Benson
- Corresponding author: Gary Benson
Efficient two-dimensional compressed matching
- DOI: 10.1109/dcc.1992.227453
- Year published: 1992
- Journal:
- Impact factor: 0
- Authors: A. Amir; Gary Benson
- Corresponding author: Gary Benson
Other grants held by Gary Benson
REU Site: Bioinformatics Research and Interdisciplinary Training Experience in Analysis and Interpretation of Information-Rich Biological Data Sets (REU-BRITE)
- Award number: 1949968
- Fiscal year: 2020
- Amount: $500,000
- Project type: Standard Grant
REU Site: Bioinformatics Research and Interdisciplinary Training Experience in Analysis and Interpretation of Information-Rich Biological Data Sets (REU-BRITE)
- Award number: 1559829
- Fiscal year: 2016
- Amount: $500,000
- Project type: Continuing Grant
III: Small: Bit-Parallel Algorithms for Sequence Alignment and Applications in Detecting Human Genetic Variation and Bacterial Strain Typing
- Award number: 1423022
- Fiscal year: 2014
- Amount: $500,000
- Project type: Continuing Grant
IGERT: Integrating Computational Science into Research in Biological Networks
- Award number: 0654108
- Fiscal year: 2007
- Amount: $500,000
- Project type: Continuing Grant
SEI(BIO): DNA Inverted Repeats: Sensitive Detection Methods and Research Database
- Award number: 0612153
- Fiscal year: 2006
- Amount: $500,000
- Project type: Standard Grant
Composition Patterns in Nucleotide Sequences
- Award number: 0413463
- Fiscal year: 2003
- Amount: $500,000
- Project type: Standard Grant
TRDB: A Multi-genome Database of Tandem Repeats
- Award number: 0413462
- Fiscal year: 2003
- Amount: $500,000
- Project type: Continuing Grant
TRDB: A Multi-genome Database of Tandem Repeats
- Award number: 0090789
- Fiscal year: 2001
- Amount: $500,000
- Project type: Continuing Grant
Composition Patterns in Nucleotide Sequences
- Award number: 0073081
- Fiscal year: 2000
- Amount: $500,000
- Project type: Standard Grant
CAREER: Tandem Repeats: Sequence Comparison and Search Algorithms
- Award number: 9623532
- Fiscal year: 1996
- Amount: $500,000
- Project type: Continuing Grant
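Similar NSFC (National Natural Science Foundation of China) grants
Employee algorithm aversion behavior: conceptual structure, scale development, and multilevel influence mechanisms, from the perspective of integrating big (small) data research methods
- Award number: 72372021
- Year approved: 2023
- Amount: CNY 400,000
- Project type: General Program
Variational models and fast algorithms for magnetic resonance image processing based on spherical constraints and wavelet frame regularization
- Award number: 12301545
- Year approved: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Screening of gut microbiome network biomarkers for type 2 diabetes based on the spectral graph wavelet transform algorithm
- Award number: 82204161
- Year approved: 2022
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Screening of gut microbiome network biomarkers for type 2 diabetes based on the spectral graph wavelet transform algorithm
- Award number:
- Year approved: 2022
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Construction of a composite-sensing-mode electronic nose and intelligent algorithms for predicting immunotherapy efficacy in non-small cell lung cancer
- Award number: 62176220
- Year approved: 2021
- Amount: CNY 570,000
- Project type: General Program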
Similar overseas grants
Statistical Methods and Validation Analyses for the Integration of External Data in Clinical Trials
- Award number: 10589150
- Fiscal year: 2021
- Amount: $500,000
- Project type:
Statistical Methods and Validation Analyses for the Integration of External Data in Clinical Trials
- Award number: 10386822
- Fiscal year: 2021
- Amount: $500,000
- Project type:
III: Small: Stochastic Algorithms for Large Scale Data Analysis
- Award number: 2131335
- Fiscal year: 2021
- Amount: $500,000
- Project type: Continuing Grant
III: Small: Collaborative Research: Algorithms, systems, and theories for exploiting data dependencies in crowdsourcing
- Award number: 2007941
- Fiscal year: 2020
- Amount: $500,000
- Project type: Standard Grant