Turning Data into Whole Cell Ontology Models for Functional Analysis

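Basic Information

Grant number:
8951600
Principal investigator:
Michael Harris Kramer
Amount:
about $39,300
Host institution:
UNIVERSITY OF CALIFORNIA, SAN DIEGO
Host institution country:
United States
Project category:
Fiscal year:
2014
Funding country:
United States
Project period:
2014-09-01 to 2017-08-31
Project status:
Completed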

Source:
https://reporter.nih.gov/project-details/8951600
Keywords:
Algorithms Antineoplastic Agents Bioinformatics Biological Biological Process Biology Cell Death Cell model Cell physiology Cells Cluster Analysis Code Collection Coupled DNA Repair Data Data Analyses Data Quality Data Set Disease Drug Targeting Future Gene Cluster Gene Proteins Generic Drugs Genes Genome Goals Hand Human Individual Intervention Knock-out Learning Lighting Machine Learning Malignant Neoplasms Manuals Methods Modeling Molecular Mutate Mutation Ontology Organism Pharmaceutical Preparations Phosphotransferases Processed Genes Proteins Research Research Personnel Ribosomes Saccharomyces cerevisiae Subgroup System Tissues Update Work Yeasts base biological information processing cancer cell cell type computerized tools design experimental analysis functional group gene function genome-wide improved killings novel public health relevance research study synthetic biology tool

Project Abstract

DESCRIPTION (provided by applicant): A holy grail of bioinformatics is the creation of whole-cell models with the ability to enhance human understanding and facilitate discovery. To this end, a successful and widely used effort is the Gene Ontology (GO), a massive project to manually annotate genes into terms describing molecular functions, biological processes and cellular components and to provide relationships between terms, e.g., capturing that "small ribosomal subunit" and "large ribosomal subunit" come together to make "ribosome". GO is widely used to understand the function of a gene or group of genes. Unfortunately, GO is limited by the effort required to create and update it by hand. It exists only for well-studied organisms, and even then in only one generic form per organism, with limited overall genome coverage and a bias towards well-studied genes and functions. It is not possible to learn about an uncharacterized gene or discover a new function using GO, and one cannot quickly assemble an ontology model for a new organism, let alone a specific cell type or disease state. This proposed research will change this state of affairs. Already, work has shown that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to computationally infer an ontology whose coverage and power are equivalent to those of the manually curated GO Cellular Component ontology. Still, this first attempt was limited in the types of experimental data used and in its ability to infer the more generally useful Biological Process ontology. Here, machine learning approaches will be applied to integrate many types of experimental data into ontology model construction and to analyze the type of biological information provided by each experiment, revealing those experiments most informative for capturing Biological Process information.
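To make the idea of inferring an ontology from interaction data more concrete, the following is a minimal sketch, not the method used in the work the abstract refers to. It assumes that the interaction evidence has already been reduced to a pairwise gene similarity score, and it uses plain average-linkage agglomerative clustering, which yields a tree rather than the directed acyclic graph a real ontology would need. The gene names, scores, and the functions `average_similarity` and `infer_hierarchy` are all hypothetical.

```python
from itertools import combinations

# Hypothetical pairwise similarity scores between genes (0..1), standing in
# for evidence integrated from interaction networks. Illustrative values only.
SIMILARITY = {
    frozenset({"geneA", "geneB"}): 0.9,
    frozenset({"geneC", "geneD"}): 0.8,
    frozenset({"geneA", "geneC"}): 0.3,
    frozenset({"geneA", "geneD"}): 0.2,
    frozenset({"geneB", "geneC"}): 0.3,
    frozenset({"geneB", "geneD"}): 0.2,
}


def average_similarity(cluster_x, cluster_y, similarity):
    """Mean similarity over all cross-cluster gene pairs (missing pairs count as 0)."""
    pairs = [(g, h) for g in cluster_x for h in cluster_y]
    total = sum(similarity.get(frozenset({g, h}), 0.0) for g, h in pairs)
    return total / len(pairs)


def infer_hierarchy(genes, similarity):
    """Average-linkage agglomerative clustering; each merge becomes a 'term'.

    Returns a list of (term_name, member_genes, child_names) tuples, ordered
    from the most specific term to the root.
    """
    # Start with one singleton cluster per gene, named after the gene itself.
    clusters = {gene: frozenset({gene}) for gene in genes}
    terms = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        left, right = max(
            combinations(sorted(clusters), 2),
            key=lambda pair: average_similarity(
                clusters[pair[0]], clusters[pair[1]], similarity
            ),
        )
        name = f"term{len(terms) + 1}"
        members = clusters.pop(left) | clusters.pop(right)
        terms.append((name, members, (left, right)))
        clusters[name] = members
    return terms


hierarchy = infer_hierarchy(["geneA", "geneB", "geneC", "geneD"], SIMILARITY)
for name, members, children in hierarchy:
    print(name, sorted(members), "<-", children)
```

Each merge plays the role of an inferred term whose members are the genes beneath it. A realistic pipeline would also have to decide how different data types are weighted into the similarity score, which is the part the abstract proposes to address with machine learning.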
Furthermore, the data-to-ontology paradigm explored here, which starts from high-throughput experimental data, will be used to develop a computational tool to highlight novel types of hypotheses that are inaccessible to current methods for analyzing such data. Preliminary work has shown GO to be useful for prediction of synthetic lethal pairs of genes, i.e., genes that are individually non-essential but when knocked out together cause cell death. Given the high mutation rate in cancer, these pairs provide potential cancer drug targets, as a drug may target a gene product that is now essential in the mutated cancer cells but not in other cells, thereby killing only cancer cells. Because data-driven ontologies are not as hindered by issues with bias and coverage and are specifically designed to capture only functional relationships, this proposal will explore the idea that data-driven ontologies will be better suited to help predict synthetic lethal pairs than GO. To this end, algorithms will be developed to construct a data-driven ontology of yeast DNA repair and to use this ontology to predict synthetic lethal pairs of genes. Overall, this proposal will develop the computational and experimental roadmap to construct a whole-cell model of gene function (an ontology) and use the model to discover useful biology (synthetic lethal pairs).
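As a rough illustration of how an ontology could feed synthetic lethal pair prediction, the sketch below ranks pairs of non-essential genes by the size of the smallest term that contains both. This heuristic is an assumption made here for illustration and is not the algorithm described in the proposal. The term names, gene names, essential-gene set, and the functions `smallest_shared_term` and `rank_candidate_pairs` are hypothetical.

```python
from itertools import combinations

# Hypothetical ontology: term name -> set of annotated genes. Names and
# memberships are illustrative, not taken from GO or from the proposal.
ONTOLOGY = {
    "dna_repair": {"g1", "g2", "g3", "g4"},
    "repair_subprocess_x": {"g1", "g2"},
    "repair_subprocess_y": {"g3", "g4"},
}
ESSENTIAL = {"g4"}  # hypothetical genes that are lethal when knocked out alone


def smallest_shared_term(gene_a, gene_b, ontology):
    """Return (size, term) for the smallest term containing both genes, or None."""
    shared = [
        (len(members), term)
        for term, members in ontology.items()
        if gene_a in members and gene_b in members
    ]
    return min(shared) if shared else None


def rank_candidate_pairs(ontology, essential):
    """Rank pairs of non-essential genes by specificity of their shared term."""
    genes = sorted(set().union(*ontology.values()) - essential)
    scored = []
    for gene_a, gene_b in combinations(genes, 2):
        hit = smallest_shared_term(gene_a, gene_b, ontology)
        if hit is not None:
            size, term = hit
            scored.append((size, gene_a, gene_b, term))
    # Smaller shared term first: a more specific shared function.
    return sorted(scored)


candidates = rank_candidate_pairs(ONTOLOGY, ESSENTIAL)
for size, gene_a, gene_b, term in candidates:
    print(f"{gene_a}-{gene_b}: shared term {term} (size {size})")
```

Essential genes are filtered out first because, by the definition in the abstract, a synthetic lethal pair consists of genes that are individually non-essential. The ranking is only a way to prioritize candidates. Any predicted pair would still need experimental confirmation.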
