Algorithmic approaches to systems biology, data integration, and evolution
Basic information
- Grant number: 10018681
- Amount: $1.4261 million
- Host institution country: United States
- Funding country: United States
- Project status: Ongoing (not concluded)
- Keywords: Address, Algorithmic Software, Algorithms, Area, Attention, Benchmarking, Binding, Biological, Biology, Cancer Etiology, Cells, Communities, Complex, Computer Simulation, Computing Methodologies, DNA, DNA Binding, DNA Double Strand Break, DNA Maintenance, DNA Repair, DNA Sequence, Data, Data Analyses, Data Set, Dependence, Detection, Disease, Drosophila genus, Elements, Emerging Technologies, Escherichia coli, Evolution, Explosion, Exposure to, Gene Dosage, Gene Expression, Gene Expression Regulation, Genes, Genetic, Genome, Genomics, Genotype, Gold, Graph, Growth, Growth Factor Gene, Guidelines, In Vitro, Joints, Language, Linear Programming, Machine Learning, Malignant Neoplasms, Mathematics, Measures, Methods, Modeling, Molecular, Mutation, Noise, Oncogenes, Pathway Analysis, Pathway interactions, Patients, Pattern, Phenotype, Play, Population, Probability, Process, Property, RNA, Regulator Genes, Regulatory Element, Research, Role, Shapes, Signal Transduction, Single base substitution, Somatic Mutation, Structural Genes, Structure, System, Systems Biology, Techniques, Technology, The Cancer Genome Atlas, biological systems, cancer genome, carcinogenicity, cohesion, computerized tools, data integration, deep learning, experimental study, gene function, graph theory, high throughput technology, human disease, improved, in vivo, insight, malignant breast neoplasm, markov model, personalized medicine, protein protein interaction, response, sex, single cell analysis, theories, tool, transcription factor, tumorigenesis
Project abstract
My group continued to develop and apply computational methods that use and integrate large data sets, with a focus on gene regulation and disease. We also developed new methods to analyze data produced by emerging high-throughput technologies and experimental techniques, such as single-cell gene expression and HT-SELEX data. Our studies use a variety of algorithmic techniques, including Integer Linear Programming (ILP) and other optimization strategies, as well as machine learning approaches such as Hidden Markov Models and deep learning.
Large data sets provide an important window on human diseases (1). Within this general area, the main focus of my group is on developing new computational methods that use large cancer-related datasets (e.g., TCGA and ICGC) to obtain insights into the etiology of cancer. Following our previous studies on uncovering cancer drivers and pathways, we shifted our attention toward uncovering and studying mutational signatures inferred from the properties of passenger mutations. Specifically, in addition to the mutations that confer a growth advantage, cancer genomes accumulate a large number of somatic mutations resulting from normal DNA damage and repair processes, as well as from carcinogenic exposures or cancer-related aberrations of the DNA maintenance machinery. Knowing the activity of the mutational processes shaping a cancer genome may provide insight into tumorigenesis and personalized therapy. It is therefore important to characterize the signatures of active mutational processes in each patient from the pattern of single base substitutions. However, mutational processes do not act uniformly on the genome, which leads to statistical dependencies among neighboring mutations. To account for such dependencies, we developed SigMa, the first sequence-dependent model for mutation signatures. We applied SigMa to characterize genomic and other factors that influence the activity of mutation signatures in breast cancer (2).
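The abstract describes SigMa only as a sequence-dependent model. Since Hidden Markov Models are among the techniques listed above, the sketch below illustrates, as an assumption, how an HMM can capture dependencies among neighboring mutations: hidden states are signatures, observations are the six pyrimidine-referenced substitution classes, and Viterbi decoding assigns each mutation (in genomic order) to its most likely signature. The signature names `SigA`/`SigB` and all probabilities are hypothetical toy values, not fitted parameters, and this is not the published SigMa model. The widely used 96-category scheme also conditions on the flanking bases; that is omitted here for brevity.

```python
import math

# Six single-base-substitution classes, using the pyrimidine (C or T) reference convention.
SBS_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_class(ref: str, alt: str) -> str:
    """Collapse a substitution onto its pyrimidine-reference class (e.g. G>T becomes C>A)."""
    if ref in "GA":  # purine reference: report the substitution on the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a sequence of observations (log-space Viterbi)."""
    # best[s]: log-probability of the best path ending in state s; paths[s]: that path.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    paths = {s: [s] for s in states}
    for o in obs[1:]:
        new_best, new_paths = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + math.log(trans_p[p][s]))
            new_best[s] = best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    last = max(states, key=lambda s: best[s])
    return paths[last]


# Hypothetical two-signature toy model: "SigA" favours C>T, "SigB" favours C>G and C>A.
STATES = ["SigA", "SigB"]
START = {"SigA": 0.5, "SigB": 0.5}
# Assumption: "sticky" transitions encode that neighbouring mutations tend to share a process.
TRANS = {"SigA": {"SigA": 0.9, "SigB": 0.1}, "SigB": {"SigA": 0.1, "SigB": 0.9}}
EMIT = {
    "SigA": {"C>A": 0.05, "C>G": 0.05, "C>T": 0.7, "T>A": 0.05, "T>C": 0.1, "T>G": 0.05},
    "SigB": {"C>A": 0.3, "C>G": 0.4, "C>T": 0.1, "T>A": 0.1, "T>C": 0.05, "T>G": 0.05},
}

# Mutations listed in genomic order as (reference base, alternate base).
mutations = [("C", "T"), ("G", "A"), ("C", "T"), ("C", "G"), ("G", "C"), ("C", "A")]
observed = [sbs_class(r, a) for r, a in mutations]
print(observed)
print(viterbi(observed, STATES, START, TRANS, EMIT))
```

With sticky transitions, isolated mutations are pulled toward the signature of their neighbors, which is the kind of dependency a position-independent (mixture) model cannot express.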
We continued our research on methods to construct gene regulatory networks (GRNs), which describe regulatory relationships between transcription factors (TFs) and their target genes. Computational methods for inferring GRNs typically combine evidence across different conditions and therefore infer context-agnostic networks. In contrast, we developed Network Reprogramming using EXpression (NetREX), a method that constructs a context-specific GRN from context-specific expression data and a context-agnostic prior network. NetREX remodels the prior network to obtain the topology that best explains the expression data. Because NetREX relies on the prior network topology, we also developed PriorBoost, a method that evaluates a prior network in terms of its consistency with the expression data. We validated NetREX and PriorBoost using the "gold standard" E. coli GRN from the DREAM5 network inference challenge, and then used NetREX to construct sex-specific Drosophila GRNs. On all measures applied, these networks outperformed networks obtained with other methods, indicating that NetREX is an important milestone toward building more accurate GRNs (3).
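NetREX's actual optimization is not described in this abstract. As a hypothetical illustration of "remodeling a prior network to best explain expression data", the sketch scores a network by how well each target gene's expression is fit (ordinary least squares) by the expression of its regulators, plus a per-edge penalty `lam`, and greedily toggles single TF-to-target edges starting from the prior. A PriorBoost-like check then compares the prior's fit with that of random networks of the same size. The function names, the penalty, the greedy search, and the z-score are all assumptions for illustration, not the NetREX or PriorBoost algorithms.

```python
import numpy as np

# Hypothetical helpers illustrating prior-network remodeling; not the NetREX or PriorBoost code.


def gene_fit(expr, target, regulators):
    """Residual sum of squares of a least-squares fit of one target gene on its regulators."""
    y = expr[target]
    if not regulators:
        return float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    X = np.column_stack([expr[r] for r in regulators] + [np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def network_score(expr, edges, targets, lam):
    """Lower is better: total residual error over all targets plus a penalty of lam per edge."""
    total = 0.0
    for t in targets:
        regs = sorted(tf for tf, tgt in edges if tgt == t)
        total += gene_fit(expr, t, regs)
    return total + lam * len(edges)


def remodel(expr, prior, tfs, targets, lam, max_rounds=10):
    """Hypothetical greedy search from the prior: toggle single edges while the score improves."""
    edges = set(prior)
    score = network_score(expr, edges, targets, lam)
    candidates = [(tf, t) for tf in tfs for t in targets]
    for _ in range(max_rounds):
        improved = False
        for e in candidates:
            trial = edges ^ {e}  # removes e if present, adds it otherwise
            s = network_score(expr, trial, targets, lam)
            if s < score:
                edges, score, improved = trial, s, True
        if not improved:
            break
    return edges, score


def prior_consistency(expr, prior, tfs, targets, n_perm=200, seed=0):
    """Hypothetical PriorBoost-like z-score: how much better the prior fits than random networks."""
    rng = np.random.default_rng(seed)
    candidates = [(tf, t) for tf in tfs for t in targets]
    observed = network_score(expr, prior, targets, lam=0.0)
    null = []
    for _ in range(n_perm):
        idx = rng.choice(len(candidates), size=len(prior), replace=False)
        null.append(network_score(expr, {candidates[i] for i in idx}, targets, lam=0.0))
    null = np.array(null)
    return float((null.mean() - observed) / (null.std() + 1e-12))


# Toy data: two TFs, two targets, 50 samples; the true edges are tf1->g1 and tf2->g2.
rng = np.random.default_rng(1)
expr = {"tf1": rng.normal(size=50), "tf2": rng.normal(size=50)}
expr["g1"] = 2.0 * expr["tf1"] + 0.1 * rng.normal(size=50)
expr["g2"] = -1.5 * expr["tf2"] + 0.1 * rng.normal(size=50)
tfs, targets = ["tf1", "tf2"], ["g1", "g2"]
prior = {("tf1", "g1"), ("tf1", "g2")}  # one correct edge, one wrong edge

edges, final_score = remodel(expr, prior, tfs, targets, lam=1.0)
print(sorted(edges))
print(prior_consistency(expr, prior, tfs, targets))
```

In this toy run the wrong prior edge `tf1->g2` is dropped and `tf2->g2` is added. The edge penalty is what keeps the search from adding every edge that reduces the residual by a small, chance amount.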
Related to gene regulation, we also studied the principles of DNA binding by TFs. Several recent lines of evidence suggested that both DNA sequence and DNA shape contribute to TF binding. However, a compelling question remained unanswered: in the absence of any sequence similarity to the binding motif, can DNA shape still increase binding probability? To address this question, we developed Co-SELECT, a computational approach for analyzing the results of in vitro HT-SELEX experiments for TF-DNA binding. Specifically, Co-SELECT leverages the motif-free sequences present in late HT-SELEX rounds; their enrichment among weak binders allows Co-SELECT to detect evidence for the role of DNA shape features in TF binding. Our approach revealed that, even in the absence of the sequence motif, TFs have a propensity to bind DNA molecules whose shape is consistent with motif-specific binding. This provided the first direct evidence that the shape features accompanying preferred sequence motifs also confer an advantage for sequence non-specific binding (4).
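Co-SELECT's exact statistics are not given in this abstract. As a hedged illustration of the underlying logic, the sketch discards every read that contains the motif or a one-mismatch variant of it on either strand, and then compares how often a shape-related feature occurs among the remaining motif-free reads in a late HT-SELEX round versus the initial round. The motif `CACGTG`, the toy reads, and the shape proxy (a run of four or more A/T bases, used here as a crude stand-in for a narrow minor groove) are all hypothetical; a real analysis would use predicted DNA shape features.

```python
import math
import re

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMP)[::-1]


def has_motif(seq: str, motif: str, max_mismatch: int = 1) -> bool:
    """True if any window on either strand is within max_mismatch substitutions of the motif."""
    k = len(motif)
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - k + 1):
            mismatches = sum(a != b for a, b in zip(strand[i:i + k], motif))
            if mismatches <= max_mismatch:
                return True
    return False


def at_run(seq: str, min_len: int = 4) -> bool:
    """Hypothetical shape proxy: a run of at least min_len consecutive A/T bases."""
    return re.search(f"[AT]{{{min_len},}}", seq) is not None


def motif_free_enrichment(round0, round_late, motif, feature, pseudo=1.0):
    """Log2 ratio of the feature's frequency among motif-free reads, late round vs. initial round."""
    def freq(reads):
        free = [r for r in reads if not has_motif(r, motif)]
        hits = sum(feature(r) for r in free)
        return (hits + pseudo) / (len(free) + 2 * pseudo)  # pseudocounts avoid log(0)
    return math.log2(freq(round_late) / freq(round0))


MOTIF = "CACGTG"  # hypothetical example motif
round0 = ["GCGCGGCCGCGG", "GGCCGCGCGGCC", "CGCGGCGCCGGC", "GCGCAAAAGCGC", "GGCACGTGGGCC"]
round_late = ["GCGCAAAAGCGC", "GGAAAAAGGCGG", "CCGAAAAGCCGC", "GCGCGGCCGCGG",
              "GGCACGTGGGCC", "TTCACGTGAAGG"]
print(motif_free_enrichment(round0, round_late, MOTIF, at_run))
```

A positive log ratio means the feature became more common among motif-free reads as selection proceeded, which is the kind of signal that points to sequence-independent contributions such as shape.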
We also continued to develop methods for analyzing data produced by emerging technologies. Given the explosion of single-cell gene expression data, we focused on developing new computational tools for this type of data. In particular, identifying subpopulations of cells within single-cell experiments and comparing such subpopulations across experiments are among the most frequently performed analyses of single-cell data, yet this important task still lacked a fully satisfactory computational solution. To address this need, we introduced single-cell subpopulations comparison (scPopCorn), a computational method that leverages information from all input datasets to perform these two tasks simultaneously by optimizing a joint objective function. The optimization involves a measure of the cohesiveness of a cell population which, combined with Google's personalized PageRank approach, guides subpopulation detection, while a measure of cell-to-cell similarity guides the mapping between subpopulations. scPopCorn not only outperforms currently used approaches but also introduces mathematical concepts that can serve as stepping stones for improving other tools (5).
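To illustrate how personalized PageRank can guide subpopulation detection, the sketch below works on a cell-cell similarity matrix `W`: it computes a PageRank vector that restarts at a seed cell, ranks cells by degree-normalized score, and returns the prefix of that ranking (a "sweep cut") with the lowest conductance. Conductance is used here as an assumed stand-in for a cohesiveness measure; scPopCorn's own cohesiveness measure, joint objective, and cross-dataset mapping are not reproduced, and the two-cluster toy graph is hypothetical.

```python
import numpy as np


def personalized_pagerank(W, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for p = alpha * e_seed + (1 - alpha) * p @ P, P the random-walk matrix of W."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    restart = np.zeros(n)
    restart[seed] = 1.0
    p = restart.copy()
    for _ in range(max_iter):
        p_next = alpha * restart + (1 - alpha) * (p @ P)
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def conductance(W, members):
    """Weight of edges leaving the set divided by the smaller side's volume (lower = more cohesive)."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(members)] = True
    cut = W[mask][:, ~mask].sum()
    return cut / min(W[mask].sum(), W[~mask].sum())


def sweep_cut(W, seed, alpha=0.15):
    """Lowest-conductance prefix of cells ranked by degree-normalized personalized PageRank."""
    p = personalized_pagerank(W, seed, alpha)
    order = np.argsort(-p / W.sum(axis=1))
    best, best_phi = None, np.inf
    for k in range(1, len(order)):  # proper, non-empty prefixes only
        phi = conductance(W, order[:k])
        if phi < best_phi:
            best, best_phi = set(order[:k].tolist()), phi
    return best, best_phi


# Hypothetical toy data: two groups of 4 similar cells joined by one weak similarity (0.1).
W = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
W[3, 4] = W[4, 3] = 0.1
print(sweep_cut(W, seed=0))
```

Ranking by `p / degree` rather than raw `p` keeps high-degree cells from dominating the sweep simply because random walks visit them often.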
We also provided computational expertise and analysis for specialized sequencing data generated by our collaborators for in vivo probing of quadruplex structures (6) and DNA double-strand breaks (7).
Finally, we participated in a community DREAM challenge to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology, and cancer-gene networks. This community challenge established biologically interpretable benchmarks, tools, and guidelines for molecular network analysis aimed at studying human disease biology (8).
Project outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other grants by Teresa Przytycka
Combinatorial and graph theoretical approach to systems biology and mol. evo.
- Grant number: 8943247
- Amount: $1.4261 million
Combinatorial and graph theoretical approach to systems biology and mol. evo.
- Grant number: 8558125
- Amount: $1.4261 million
Algorithmic approaches to systems biology, data integration, and evolution
- Grant number: 10927048
- Amount: $1.4261 million
Combinatorial and graph theoretical approach to systems biology and mol. evo.
- Grant number: 7969252
- Amount: $1.4261 million
Combinatorial and graph theoretical approach to systems biology and mol. evo.
- Grant number: 8344970
- Amount: $1.4261 million
Algorithmic approaches to systems biology, data integration, and evolution
- Grant number: 9555743
- Amount: $1.4261 million
Combinatorial and graph theoretical approach to systems biology and mol. evo.
- Grant number: 7735092
- Amount: $1.4261 million
Combinatorial and graph theoretical approach to systems biology and mol. evo.
- Grant number: 8149615
- Amount: $1.4261 million
Algorithmic approaches to systems biology, data integration, and evolution
- Grant number: 10688922
- Amount: $1.4261 million
Algorithmic approaches to systems biology, data integration, and evolution
- Grant number: 10268080
- Amount: $1.4261 million
Similar overseas grants
Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
- Grant number: 10735564
- Fiscal year: 2023
- Amount: $1.4261 million
An acquisition and analysis pipeline for integrating MRI and neuropathology in TBI-related dementia and VCID
- Grant number: 10810913
- Fiscal year: 2023
- Amount: $1.4261 million
Wearable Wireless Respiratory Monitoring System that Detects and Predicts Opioid Induced Respiratory Depression
- Grant number: 10784983
- Fiscal year: 2023
- Amount: $1.4261 million
Leveraging artificial intelligence/machine learning-based technology to overcome specialized training and technology barriers for the diagnosis and prognostication of colorectal cancer in Africa
- Grant number: 10712793
- Fiscal year: 2023
- Amount: $1.4261 million
A visualization interface for BRAIN single cell data, integrating transcriptomics, epigenomics and spatial assays
- Grant number: 10643313
- Fiscal year: 2023
- Amount: $1.4261 million