POWRE: Combining Data Mining and Information Visualization Techniques with a Molecular Biology Sequence Similarity Database System

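Basic information

  • Award number:
    9753283
  • Principal investigator:
  • Amount:
    $70,600
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    1998
  • Funding country:
    United States
  • Duration:
    1998-01-01 to 1999-12-31
  • Status:
    Completed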

Project abstract

The main objective of this project is to help genome researchers elucidate patterns and clusters in large amounts of biological data. For genome researchers who want to compare gene or protein sequences with the sequences within one genome or across genomes, this task involves executing hundreds of thousands of similarity searches that produce text output. This project involves the development of two specific software tools for visualizing and exploring the similarity data in a database of biological sequence similarity results.

The first tool will be an Interactive Categorization Tool. This tool will display attributes of selected similarity database objects in a 2D scatterplot and enable dynamic manipulation of the display, so that the genome researcher can explore the attributes of similarities and categorize the similarities based on those attributes. For example, the genome researcher will be able to vary the input parameters of a function for computing the strength of each detected similarity and display a plot in which the color of each point shows the strength of that similarity and the position of each point is given by score and statistical significance on the X and Y axes. The tool will enable genome researchers to dynamically manipulate the generation of higher-level concepts or categories for detected similarities (strong, marginal, and weak similarities, as opposed to individual similarities with particular values of score and statistical significance, which are more difficult to compare). This will allow them to categorize hits as orthologous or paralogous, based on various attributes of the detected similarities. Score and p-value are not the only attributes that can be used: the system is general enough that other attributes, such as percent identity, percent conserved, and length of alignment, among others, could be used in functions.
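The strength-based categorization described above can be illustrated with a minimal sketch. The abstract does not give the strength function, so the weighted mix of score and -log10(p-value), the normalization constants (max_score, max_log_p), the cutoffs, and the colors below are all hypothetical choices made for illustration; the names Similarity, strength, categorize, and scatter_points are likewise assumptions. The project planned Java applets; Python is used here only for brevity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Similarity:
    """One detected similarity (hit) between a query and a subject sequence."""
    query: str
    subject: str
    score: float    # alignment score (X axis of the scatterplot)
    p_value: float  # statistical significance (Y axis of the scatterplot)


def strength(hit, score_weight=0.5, max_score=1000.0, max_log_p=100.0):
    """Hypothetical strength in [0, 1]: weighted mix of score and -log10(p)."""
    score_part = min(max(hit.score, 0.0) / max_score, 1.0)
    # Clamp p away from zero so that log10 is always defined.
    log_p = -math.log10(max(hit.p_value, 1e-300))
    p_part = min(max(log_p, 0.0) / max_log_p, 1.0)
    return score_weight * score_part + (1.0 - score_weight) * p_part


def categorize(hit, strong_cutoff=0.6, weak_cutoff=0.2, **strength_params):
    """Map a hit to a higher-level category; cutoffs are assumed values."""
    s = strength(hit, **strength_params)
    if s >= strong_cutoff:
        return "strong"
    if s >= weak_cutoff:
        return "marginal"
    return "weak"


# Assumed color coding for the three categories.
COLORS = {"strong": "red", "marginal": "orange", "weak": "gray"}


def scatter_points(hits, **params):
    """Return (x, y, color) triples: score, p-value, and category color."""
    return [(h.score, h.p_value, COLORS[categorize(h, **params)]) for h in hits]
```

Changing a parameter and recomputing the points mirrors the dynamic manipulation the abstract describes: for instance, calling categorize with a lower strong_cutoff moves hits from the marginal to the strong category without touching the underlying data.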
Thus, genome researchers can conduct exploration at different stages of the genome comparison research process.

The second tool will be a Cluster Exploration Tool. Using the results from data mining techniques that cluster like sequences together, genome researchers will be able to visualize the similarities among the sequences in the clusters. For example, the tool can be used for a cluster of new unknown sequences that were found to be similar to members of a group of known sequences. The new sequences can be positioned as nodes on the left in a bipartite graph, and the known sequences that they are similar to can be positioned along the right. Lines drawn between the nodes, colored differently based on the strength of the hits, will enable the researcher to visualize the connectedness of the sequences in the cluster. Details about each sequence and each similarity in the cluster can be obtained from the DBMS. This will enable genome researchers to study groups of orthologous or paralogous sequences.

A key feature of these tools is that they will be 'thin' clients (often referred to as applets) that communicate with the underlying DBMS via queries formulated visually by the genome researchers. The use of Java-based components for these tools will enable them to be easily used and shared by the bioinformatics community and the genome research community. The development of these tools will demonstrate the feasibility of the thin-client approach that is the hallmark of the network computing architecture philosophy.
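The bipartite view of the Cluster Exploration Tool described above can be sketched as a small layout routine. This is only an illustration under stated assumptions: the input format (new sequence id, known sequence id, strength in [0, 1]), the unit-square coordinates, the strength cutoffs, and the names edge_color and bipartite_layout are all hypothetical and do not come from the project itself.

```python
def edge_color(strength, strong_cutoff=0.6, weak_cutoff=0.2):
    """Assumed color coding of a hit by strength (cutoffs are hypothetical)."""
    if strength >= strong_cutoff:
        return "red"
    if strength >= weak_cutoff:
        return "orange"
    return "gray"


def bipartite_layout(hits, x_left=0.0, x_right=1.0):
    """Place new sequences on the left and known sequences on the right.

    hits: iterable of (new_id, known_id, strength) tuples.
    Returns (left, right, edges), where left and right map each node id to
    an (x, y) position and edges is a list of (start, end, color) triples.
    """
    hits = list(hits)
    new_ids = sorted({new_id for new_id, _, _ in hits})
    known_ids = sorted({known_id for _, known_id, _ in hits})

    def column(ids, x):
        # Spread the nodes evenly between y = 0 and y = 1.
        step = 1.0 / max(len(ids) - 1, 1)
        return {node: (x, i * step) for i, node in enumerate(ids)}

    left = column(new_ids, x_left)
    right = column(known_ids, x_right)
    # One line per hit, colored by the strength of that hit.
    edges = [(left[n], right[k], edge_color(s)) for n, k, s in hits]
    return left, right, edges
```

A drawing layer would render each edge as a line between the two positions; counting the edges that touch a node gives a simple measure of how connected that sequence is within the cluster.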

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Elizabeth Shoop

Hands-on parallel & distributed computing with Raspberry Pi devices and clusters
  • DOI:
    10.1016/j.jpdc.2024.104996
  • Published:
    2025-02-01
  • Journal:
  • Impact factor:
  • Authors:
    Elizabeth Shoop; Suzanne J. Matthews; Richard Brown; Joel C. Adams
  • Corresponding author:
    Joel C. Adams

Other grants held by Elizabeth Shoop

Collaborative Research: CS in Parallel: Scaling an Incremental Modular Approach to Injecting Parallel Computing Throughout CS Curricula
  • Award number:
    1225796
  • Fiscal year:
    2012
  • Funding amount:
    $70,600
  • Award type:
    Standard Grant
Collaborative Research: CCLI-Responding to manycore: A strategy for injecting parallel computing education throughout the computer science curriculum
  • Award number:
    0941962
  • Fiscal year:
    2010
  • Funding amount:
    $70,600
  • Award type:
    Standard Grant
Into the Community: Changing Perceptions and Increasing Participation in Computer Science
  • Award number:
    0850106
  • Fiscal year:
    2009
  • Funding amount:
    $70,600
  • Award type:
    Standard Grant

Similar grants from the National Natural Science Foundation of China

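Research on real-world person re-identification techniques incorporating data characteristic analysis
  • Award number:
    62301346
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Reconstruction of satellite chlorophyll fluorescence data combining tensor representation and deep learning
  • Award number:
    42371364
  • Approval year:
    2023
  • Funding amount:
    CNY 470,000
  • Award type:
    General Program
Research on interpretable representation learning algorithms incorporating the clinical decision-making process on multimodal critical-care big data
  • Award number:
    62302413
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Self-supervised few-shot meta-learning object detection incorporating open datasets
  • Award number:
    62306183
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Long-term causal effect inference and decision-making based on data combination
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund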

Similar overseas grants

Combining Qualitative and Quantitative AI data for mobility
  • Award number:
    10080158
  • Fiscal year:
    2023
  • Funding amount:
    $70,600
  • Award type:
    Collaborative R&D
Combining job mobility patterns and vacancy data to better measure labour market opportunities and skill mismatch
  • Award number:
    ES/X011887/1
  • Fiscal year:
    2023
  • Funding amount:
    $70,600
  • Award type:
    Research Grant
A multicenter study in bronchoscopy combining Stimulated Raman Histology with Artificial intelligence for rapid lung cancer detection - The ON-SITE study
  • Award number:
    10698382
  • Fiscal year:
    2023
  • Funding amount:
    $70,600
  • Award type:
Combining Molecular Simulations and Biophysical Methods to Characterize Conformational Dynamics of the HIV-1 Envelope Glycoprotein
  • Award number:
    10749273
  • Fiscal year:
    2023
  • Funding amount:
    $70,600
  • Award type:
Computer-assisted diagnosis of ear pathologies by combining digital otoscopy with complementary data using machine learning
  • Award number:
    10564534
  • Fiscal year:
    2023
  • Funding amount:
    $70,600
  • Award type: