POWRE: Combining Data Mining and Information Visualization Techniques with a Molecular Biology Sequence Similarity Database System
Basic information
- Award number: 9753283
- Principal investigator: Elizabeth Shoop
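- Amount: $70,600
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 1998
- Funding country: United States
- Project period: 1998-01-01 to 1999-12-31
- Project status: Completed
- Source:
- Keywords: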
Project abstract
The main objective of this project is to help genome researchers elucidate patterns and clusters in large amounts of biological data. For genome researchers who want to compare gene or protein sequences to the sequences within one genome or across genomes, this task involves executing hundreds of thousands of similarity searches that produce text output. This project involves the development of two specific software tools for visualizing and exploring the similarity data in a database of biological sequence similarity results. The first tool will be an Interactive Categorization Tool. This tool will display attributes of selected similarity database objects in a 2D scatterplot and enable dynamic manipulation of the display, so that the genome researcher can explore the attributes of similarities and categorize the similarities based on those attributes. For example, the genome researcher will be able to vary the input parameters of a function for computing the strength of each detected similarity and display a plot in which the color of each point shows the strength of the similarity and the position of each point is given by score and statistical significance on the X and Y axes. The tool will enable genome researchers to dynamically manipulate the generation of higher-level concepts or categories for detected similarities (strong, marginal, and weak similarities, as opposed to individual similarities with particular values of score and statistical significance, which are more difficult to compare). This will let them categorize hits as orthologous or paralogous, based on various attributes of the detected similarities. Score and p-value are not the only attributes that can be used: the system is general enough that other attributes, such as percent identity, percent conserved, and length of alignment, among others, could be used in functions.
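The categorization idea described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Hit` record, the weighted-sum `strength` formula, the cutoff values, and the color names are all assumptions chosen for illustration, since the abstract does not specify the actual function or thresholds.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One detected similarity (hypothetical record layout)."""
    query: str
    subject: str
    score: float    # alignment score, plotted on the X axis
    p_value: float  # statistical significance, plotted on the Y axis


# Assumed color coding for the three categories named in the abstract.
COLORS = {"strong": "red", "marginal": "orange", "weak": "gray"}


def strength(hit, score_weight=1.0, significance_weight=1.0):
    """Assumed strength function: weighted sum of score and -log10(p-value)."""
    neg_log_p = -math.log10(max(hit.p_value, 1e-300))  # clamp to avoid log(0)
    return score_weight * hit.score + significance_weight * neg_log_p


def categorize(hit, strong_cutoff=150.0, weak_cutoff=60.0, **weights):
    """Map a hit to a higher-level category using adjustable cutoffs."""
    s = strength(hit, **weights)
    if s >= strong_cutoff:
        return "strong"
    if s >= weak_cutoff:
        return "marginal"
    return "weak"


def plot_point(hit, **params):
    """Return (x, y, color) for one scatterplot point."""
    y = -math.log10(max(hit.p_value, 1e-300))
    return (hit.score, y, COLORS[categorize(hit, **params)])


# Made-up example hits, not real search results.
hits = [
    Hit("q1", "s1", score=200.0, p_value=1e-50),
    Hit("q2", "s2", score=80.0, p_value=1e-5),
    Hit("q3", "s3", score=30.0, p_value=0.1),
]
categories = [categorize(h) for h in hits]
print(categories)
```

With the default parameters the three example hits fall into the strong, marginal, and weak categories respectively. Calling `categorize(hits[1], score_weight=2.0)` moves the second hit from marginal to strong, which mirrors how varying the function's input parameters would recolor points in the scatterplot.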
This generality lets genome researchers conduct exploration at different stages of the genome comparison research process. The second tool will be a Cluster Exploration Tool. Using the results from data mining techniques that cluster like sequences together, genome researchers will be able to visualize the similarities among the sequences in the clusters. For example, the tool can be used for a cluster of new unknown sequences that were found similar to members of a group of known sequences. The new sequences can be positioned as nodes on the left in a bipartite graph, and the known sequences that they are similar to can be positioned along the right. Lines drawn between the nodes, colored differently based on the strength of the hits, will enable the researcher to visualize the connectedness of the sequences in the cluster. Details about each sequence and each similarity in the cluster can be obtained from the DBMS. This will enable genome researchers to study groups of orthologous or paralogous sequences. A key feature of these tools is that they will be 'thin' clients (often referred to as applets) that communicate with the underlying DBMS via queries formulated visually by the genome researchers. The use of Java-based components for these tools will enable them to be easily used and shared by the bioinformatics community and the genome research community. The development of these tools will demonstrate the feasibility of the thin-client approach that is the hallmark of the network computing architecture philosophy.
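The bipartite layout described for the Cluster Exploration Tool can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(new_sequence, known_sequence, category)` tuple format, the `EDGE_COLORS` mapping, and the sequence identifiers are hypothetical, and the sketch assumes new and known sequences never share an identifier.

```python
from collections import defaultdict

# Assumed color coding for edge strength; the abstract names no colors.
EDGE_COLORS = {"strong": "red", "marginal": "orange", "weak": "gray"}


def build_bipartite(similarities):
    """Split a cluster into left (new) and right (known) nodes plus colored edges.

    `similarities` is a list of (new_sequence, known_sequence, category) tuples.
    """
    left = sorted({new for new, _, _ in similarities})
    right = sorted({known for _, known, _ in similarities})
    edges = [(new, known, EDGE_COLORS[cat]) for new, known, cat in similarities]
    return left, right, edges


def connectedness(edges):
    """Count how many edges touch each node, a simple measure of connectedness."""
    degree = defaultdict(int)
    for new, known, _color in edges:
        degree[new] += 1
        degree[known] += 1
    return dict(degree)


# Made-up cluster: two new sequences matched against two known ones.
cluster = [
    ("new1", "knownA", "strong"),
    ("new1", "knownB", "weak"),
    ("new2", "knownA", "marginal"),
]
left, right, edges = build_bipartite(cluster)
degrees = connectedness(edges)
print(left, right, edges, degrees)
```

A drawing layer would place the `left` nodes in one column and the `right` nodes in another, then draw each edge in its color. In the tool described above, the per-sequence details would come from the DBMS rather than from an in-memory list.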
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Elizabeth Shoop
Collaborative Research: CS in Parallel: Scaling an Incremental Modular Approach to Injecting Parallel Computing Throughout CS Curricula
- Award number: 1225796
- Fiscal year: 2012
- Funding amount: $70,600
- Award type: Standard Grant
Collaborative Research: CCLI-Responding to manycore: A strategy for injecting parallel computing education throughout the computer science curriculum
- Award number: 0941962
- Fiscal year: 2010
- Funding amount: $70,600
- Award type: Standard Grant
Into the Community: Changing Perceptions and Increasing Participation in Computer Science
- Award number: 0850106
- Fiscal year: 2009
- Funding amount: $70,600
- Award type: Standard Grant
Similar National Natural Science Foundation of China (NSFC) grants
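Reconstruction of satellite chlorophyll fluorescence data by combining tensor representation and deep learning
- Award number: 42371364
- Approval year: 2023
- Funding amount: CNY 470,000
- Award type: General Program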
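Self-supervised few-shot meta-learning object detection incorporating open datasets
- Award number: 62306183
- Approval year: 2023
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund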
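Theory and methods combining projection and sampling for visual analytics of high-dimensional data
- Award number:
- Approval year: 2022
- Funding amount: CNY 540,000
- Award type: General Program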
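Data-driven study of consumer choice behavior incorporating domain knowledge, and retail decision optimization
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund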
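Predicting juxtacrine communication in the tumor microenvironment at the single-cell level by combining weighted hypergraphs and spatial transcriptomics data
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund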
Similar overseas grants
Combining Qualitative and Quantitative AI data for mobility
- Award number: 10080158
- Fiscal year: 2023
- Funding amount: $70,600
- Award type: Collaborative R&D
Combining job mobility patterns and vacancy data to better measure labour market opportunities and skill mismatch
- Award number: ES/X011887/1
- Fiscal year: 2023
- Funding amount: $70,600
- Award type: Research Grant
A multicenter study in bronchoscopy combining Stimulated Raman Histology with Artificial intelligence for rapid lung cancer detection - The ON-SITE study
- Award number: 10698382
- Fiscal year: 2023
- Funding amount: $70,600
- Award type:
Combining Molecular Simulations and Biophysical Methods to Characterize Conformational Dynamics of the HIV-1 Envelope Glycoprotein
- Award number: 10749273
- Fiscal year: 2023
- Funding amount: $70,600
- Award type:
Computer-assisted diagnosis of ear pathologies by combining digital otoscopy with complementary data using machine learning
- Award number: 10564534
- Fiscal year: 2023
- Funding amount: $70,600
- Award type: