Algorithmic approaches to systems biology, data integration, and evolution

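Basic information

  • Grant number:
    10268080
  • Principal investigator:
  • Amount:
    $1.3852 million
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Start and end dates:
  • Project status:
    Ongoing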

Project abstract

My group continued to develop and apply computational methods that utilize and integrate large data sets, with a focus on gene regulation and diseases. We also developed new methods to analyze data produced by new high-throughput technologies and experimental techniques, such as single-cell gene expression and HT-SELEX data. In our studies we use a variety of algorithmic techniques, including Integer Linear Programming (ILP) and other optimization strategies, as well as machine learning approaches, including Hidden Markov Models and deep learning. Within this general area, the main focus of my group is on developing new computational methods that use large cancer-related datasets (e.g., TCGA and ICGC) to obtain insights into the etiology of cancer. Together with our experimental collaborators, we also use new experimental data to obtain novel insights into fundamental biological processes.
Much of the group's effort during this reporting period has been devoted to studying mutational patterns in cancer genomes. Specifically, throughout their lifetime, individuals acquire somatic mutations that might eventually lead to cancer. These mutations often display characteristic patterns known as mutational signatures. Understanding the relation between these patterns and their causes can provide important insights into tumorigenesis in general and environmental contributions to cancer in particular. The two fundamental questions in this area are (i) what is the best way to characterize these mutation patterns, and (ii) how can such patterns of somatic mutations be leveraged to understand the mutagenic processes shaping the human genome?
One of the most challenging obstacles to a full characterization of mutational patterns is that these patterns are the end effect of several interplaying factors, including carcinogenic exposures and potential deficiencies of the DNA repair mechanism. Separating these factors is nontrivial, and current methods typically do not attempt such a separation, assuming instead a linear combination model. Yet, to fully understand the nature of each signature, it is important to disambiguate the atomic components that contribute to the final signature. As the first step in this direction, we recently introduced a new descriptor of mutational signatures, DNA Repair FootPrint (RePrint) (1). Our work demonstrated, for the first time, that it is possible to identify signatures that include a common DNA repair deficiency independently of the other mutagenic processes that contribute to the composite signature. We validated the method with published mutational signatures from cell lines targeted with CRISPR-Cas9-based knockouts of DNA repair genes.
The second line of research related to mutational signatures is the identification of the mutagenic processes underlying them. To investigate the genetic aberrations associated with mutational signatures, we took a network-based approach that treats mutational signatures as cancer phenotypes. Specifically, our analysis aimed to answer the following two complementary questions: (i) which functional pathways have gene expression activities that correlate with the strengths of mutational signatures, and (ii) are there pathways whose genetic alterations might have led to specific mutational signatures? To identify mutated pathways, we adopted a recently developed optimization method based on integer linear programming. Analyzing a breast cancer dataset, we identified pathways associated with mutational signatures at both the expression and the mutation level.
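
The linear combination model mentioned above treats a tumor's mutation counts as a non-negative weighted sum of signatures. The following is a minimal sketch of that standard model only, not of RePrint. The four-channel toy signatures, the counts, and the function name `fit_exposures` are all hypothetical (real signatures usually use many more mutation channels), and the multiplicative update is just one simple way to keep the weights non-negative.

```python
def fit_exposures(signatures, counts, n_iter=200, eps=1e-12):
    """Estimate non-negative exposures e with counts ~ sum_k e[k] * signatures[k].

    Uses multiplicative updates for least squares with the signatures held
    fixed, so exposures that start positive never become negative.
    """
    k = len(signatures)
    # b[i] = <signature_i, counts>
    b = [sum(s * c for s, c in zip(sig, counts)) for sig in signatures]
    # gram[i][j] = <signature_i, signature_j>
    gram = [[sum(x * y for x, y in zip(si, sj)) for sj in signatures]
            for si in signatures]
    e = [1.0] * k  # positive starting point
    for _ in range(n_iter):
        ge = [sum(gram[i][j] * e[j] for j in range(k)) for i in range(k)]
        e = [e[i] * b[i] / (ge[i] + eps) for i in range(k)]
    return e


# Hypothetical toy data: two signatures over four mutation channels.
TOY_SIGNATURES = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
# Counts built as 30 * signature 0 + 10 * signature 1.
TOY_COUNTS = [22.0, 4.0, 4.0, 10.0]

exposures = fit_exposures(TOY_SIGNATURES, TOY_COUNTS)
print([round(x, 3) for x in exposures])
```

Because the toy counts were constructed from known weights, the recovered exposures should come out close to 30 and 10. The sketch also shows the limitation the text points to: the fit reports how much each composite signature contributes, but says nothing about the separate factors (exposure versus repair deficiency) inside a signature.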
Our analysis captured important differences in the etiology of the APOBEC-related signatures and the two clock-like signatures. In particular, it revealed that clustered and dispersed APOBEC mutations may be caused by different mutagenic processes. In addition, our analysis elucidated differences between the two age-related signatures: one of the signatures is correlated with the expression of cell cycle genes, while the other has no such correlation but shows patterns consistent with exposure to environmental/external processes. This work investigated, for the first time, a network-level association of mutational signatures and dysregulated pathways. The identified pathways and subnetworks provide novel insights into the mutagenic processes that the cancer genomes might have undergone and important clues for developing personalized drug therapies (2). In addition, we collaborated with Roded Sharan's group at TAU to provide a first probabilistic model of mutational signatures that accounts for context dependency and strand coordination (3). Finally, we started research leveraging the concept of mutational signatures to study the relationship between smoking and the expression of ACE2 and other proteins known to be involved in the entry of the coronavirus SARS-CoV-2 into the host cell.
We also continued our research on methods to construct gene regulatory networks (GRNs). These networks describe regulatory relationships between transcription factors (TFs) and their target genes. Following the development of the NetREX (Network Reprogramming using EXpression) technique for constructing a context-specific GRN given context-specific expression data and a context-agnostic prior network (reported last year), we developed NetREX-CF. The important novelty of NetREX-CF is its ability to deal with missing data.
Specifically, NetREX-CF is a reconstruction approach that brings together a modern machine learning strategy (a Collaborative Filtering model) and a biologically justified model of gene expression (a model based on sparse Network Component Analysis). Collaborative Filtering (CF) is able to overcome the incompleteness of the prior knowledge and make edge recommendations for building the GRN. Complementing CF, we use sparse Network Component Analysis (NCA) to validate the recommended edges. Finally, we combine these two approaches using a novel data integration method. Our preliminary results show that the new approach drastically outperforms the currently leading GRN reconstruction methods. This work has been selected for oral presentation at RECOMB 2020, and the manuscript is in preparation.
My group also continues to develop software for public use, including AptaBlocks Online, a web-based toolkit for the in silico design of RNA complexes (4), and JUDY, a flexible pipeline for diverse types of bioinformatics analysis (5). We also provided computational expertise and analysis of the specialized sequencing data, mRNA display, that our collaborators used to compare the performance of linear, monocyclic, and bicyclic libraries (6).
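
To illustrate how collaborative filtering can recommend edges that are missing from an incomplete prior network, here is a minimal sketch using item-based neighborhood filtering on a binary TF-by-gene matrix. This is not the NetREX-CF model, whose details are not given here. The toy prior `TOY_PRIOR`, the function names, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_edges(prior, tf_index):
    """Rank genes not yet linked to a TF by similarity to its known targets.

    prior[t][g] is 1 if the prior network has an edge TF t -> gene g, else 0.
    A candidate gene scores higher when its set of regulators resembles the
    regulators of genes the TF is already known to target.
    """
    n_genes = len(prior[0])
    # Column g lists which TFs regulate gene g in the prior.
    columns = [[row[g] for row in prior] for g in range(n_genes)]
    scores = {}
    for g in range(n_genes):
        if prior[tf_index][g]:
            continue  # edge already known, nothing to recommend
        scores[g] = sum(
            cosine(columns[g], columns[h]) * prior[tf_index][h]
            for h in range(n_genes)
            if h != g
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical prior: 3 TFs (rows) by 4 genes (columns).
TOY_PRIOR = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # the edge from TF 1 to gene 1 is plausibly missing
    [0, 0, 1, 1],
]

ranked = recommend_edges(TOY_PRIOR, tf_index=1)
print(ranked)
```

In this toy prior, gene 1 shares a regulator with gene 0, which TF 1 already targets, so gene 1 should rank first for TF 1. In the approach described above, such recommendations would then be checked against expression data rather than accepted directly.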

Project outcomes

Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Teresa Przytycka

Combinatorial and graph theoretical approach to systems biology and mol. evo.
  • Grant number:
    8943247
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Combinatorial and graph theoretical approach to systems biology and mol. evo.
  • Grant number:
    8558125
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Algorithmic approaches to systems biology, data integration, and evolution
  • Grant number:
    10927048
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Combinatorial and graph theoretical approach to systems biology and mol. evo.
  • Grant number:
    7969252
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Combinatorial and graph theoretical approach to systems biology and mol. evo.
  • Grant number:
    8344970
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Algorithmic approaches to systems biology, data integration, and evolution
  • Grant number:
    9555743
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Algorithmic approaches to systems biology, data integration, and evolution
  • Grant number:
    10018681
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Combinatorial and graph theoretical approach to systems biology and mol. evo.
  • Grant number:
    8149615
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Combinatorial and graph theoretical approach to systems biology and mol. evo.
  • Grant number:
    7735092
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:
Algorithmic approaches to systems biology, data integration, and evolution
  • Grant number:
    10688922
  • Fiscal year:
  • Funding amount:
    $1.3852 million
  • Project category:

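Similar NSFC (National Natural Science Foundation of China) grants

Long-term, specific modification of functional plasticity in the adult animal visual system using a novel paired visual-electrical stimulation paradigm
  • Grant number:
    32371047
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program
Bridging the digital divide for older adults: decision processes, objective barriers, and coping strategies in older adults' adoption of digital technology
  • Grant number:
    72303205
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Improving ablation rate measurement by suppressing fluid motion and using a dual-energy-spectrum method
  • Grant number:
    12305261
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Transformer-based tunnel lining crack detection using multiple sparse self-attention mechanisms
  • Grant number:
    62301339
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Policy incentives, information transmission, and mechanisms for increasing farm households' adoption of rooftop photovoltaic technology
  • Grant number:
    72304103
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund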

Similar overseas grants

Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
  • Grant number:
    10735564
  • Fiscal year:
    2023
  • Funding amount:
    $1.3852 million
  • Project category:
Charge-Based Brain Modeling Engine with Boundary Element Fast Multipole Method
  • Grant number:
    10735946
  • Fiscal year:
    2023
  • Funding amount:
    $1.3852 million
  • Project category:
Accelerating genomic analysis for time critical clinical applications
  • Grant number:
    10593480
  • Fiscal year:
    2023
  • Funding amount:
    $1.3852 million
  • Project category:
A visualization interface for BRAIN single cell data, integrating transcriptomics, epigenomics and spatial assays
  • Grant number:
    10643313
  • Fiscal year:
    2023
  • Funding amount:
    $1.3852 million
  • Project category:
METEOR-Data Synthesis and Transfer (METEOR-DST)
  • Grant number:
    10715025
  • Fiscal year:
    2023
  • Funding amount:
    $1.3852 million
  • Project category: