Measuring functional similarity between transcriptional enhancers using deep learning
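Basic information
- Award number: 10302539
- Principal investigator:
- Amount: $367,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-01 to 2024-08-31
- Project status: Completed
- Source:
- Keywords: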
Project abstract
PROJECT SUMMARY
Understanding transcriptional regulation remains a major task in molecular biology. Enhancers are
genetic elements that regulate when and where genes are expressed and their expression levels. These elements
are hard to discover because their locations and orientations are not constrained with respect to their target genes.
Several diseases and susceptibility to certain diseases are linked to mutations and variants in enhancers. Multiple
experimental and computational methods have been developed for locating enhancers. Computational methods
are better suited to handling the large number of genomes now being sequenced because they are faster, cheaper,
and less labor-intensive than experimental methods. Despite many available computational tools, we lack a
sophisticated tool that can measure similarity in the enhancer activity of a pair of sequences. We propose here
utilizing Deep Artificial Neural Networks (DANNs) to develop such a tool. The long-term objective of this project is
to decipher the code governing gene regulation with the following specific aims: (i) design a computational tool for
measuring enhancer-enhancer similarity, (ii) validate up to 96 putative enhancers experimentally, (iii) understand
enhancer grammar, and (iv) annotate enhancers in more than 50 insect genomes. To achieve these aims, a novel
application of DANNs is proposed. Current tools utilize DANNs to answer a yes-no question: does a sequence
have similar activity to the tissue-specific enhancers comprising a particular training set of known enhancers?
These approaches require training a separate network on each tissue, leading to inconsistent performance on
different tissues. Instead, here we use a DANN to answer a related but different question: does this sequence
have similar enhancer activity to a single known tissue-specific enhancer? This deep network should perform
consistently on different cell types because it is trained on pairs of sequences (not individual sequences, as is the
case in the available tools) representing all tissues for which there are known enhancers. The DANN is trained
to recognize sequence pairs with similar enhancer activities and those with dissimilar activities including (i) two
enhancers active in two different tissues, (ii) one enhancer and a random genomic sequence, and (iii) two random
genomic sequences. The tool outputs a score between 0 and 1, indicating how similar the enhancer activities
of the two sequences are. Using a much simpler machine learning algorithm than DANNs, we demonstrate that
pairs with similar enhancer activities can be separated from pairs of random genomic sequences or pairs of
one enhancer and a random genomic sequence with very high accuracy. The new tool has many important
potential applications including consistent annotation of enhancers across cell types and related species. Our tool
can annotate enhancers active in a cell type that has a small number of known enhancers, and it can annotate
enhancers in related genomes when there is a set of known enhancers demarcated in one of them. Discovering
new transcription factor binding sites is another potential application. Studying enhancer “design principles” and
the effects of variants can be facilitated using the proposed tool. Such applications will advance our field.
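The pair-based training set described in the summary can be illustrated with a minimal sketch. It is not the project's implementation: the function name `build_training_pairs`, the toy sequences, and the tissue names are hypothetical, and treating "two enhancers active in the same tissue" as the similar class is an assumption inferred from the three dissimilar classes the summary lists.

```python
import random
from itertools import combinations


def build_training_pairs(enhancers_by_tissue, background, seed=0):
    """Return (seq_a, seq_b, label) tuples; label 1 = similar, 0 = dissimilar.

    Hypothetical helper: enhancers_by_tissue maps a tissue name to a list of
    enhancer sequences, and background is a list of random genomic sequences.
    """
    rng = random.Random(seed)
    pairs = []
    tissues = sorted(enhancers_by_tissue)

    # Similar class (assumed): two enhancers active in the same tissue.
    for tissue in tissues:
        for a, b in combinations(enhancers_by_tissue[tissue], 2):
            pairs.append((a, b, 1))

    # Dissimilar class (i): two enhancers active in two different tissues.
    for t1, t2 in combinations(tissues, 2):
        for a in enhancers_by_tissue[t1]:
            for b in enhancers_by_tissue[t2]:
                pairs.append((a, b, 0))

    # Dissimilar class (ii): one enhancer and one random genomic sequence.
    for tissue in tissues:
        for a in enhancers_by_tissue[tissue]:
            pairs.append((a, rng.choice(background), 0))

    # Dissimilar class (iii): two random genomic sequences.
    for a, b in combinations(background, 2):
        pairs.append((a, b, 0))

    rng.shuffle(pairs)
    return pairs


# Toy data, invented purely for illustration.
enhancers = {"wing": ["ACGTAC", "ACGTTC"], "eye": ["GGGCCC", "GGGACC"]}
background = ["TTTTAA", "ATATAT", "CATCAT"]
pairs = build_training_pairs(enhancers, background)
n_similar = sum(label for _, _, label in pairs)
print(len(pairs), n_similar)
```

Because every tissue contributes pairs to one shared training set, a single model trained on it would see all tissues at once, which is the property the summary relies on for consistent performance. A real pipeline would also need to balance the classes and handle enhancers active in more than one tissue, neither of which this sketch attempts.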
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Evaluation of metric and representation learning approaches: Effects of representations driven by relative distance on the performance.
- DOI: 10.1109/imsa58542.2023.10217475
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Garza, Anthony B.; Garcia, Rolando; Halfon, Marc S.; Girgis, Hani Z.
- Corresponding author: Girgis, Hani Z.
Other publications by Hani Z. Girgis
Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models
- DOI: 10.1101/449801
- Publication year: 2018
- Journal:
- Impact factor: 0
- Authors: Alfredo Velasco; Benjamin T. James; Vincent D. Wells; Hani Z. Girgis
- Corresponding author: Hani Z. Girgis
Characterizing the epigenetic signatures of the human regulatory elements: A pilot study
- DOI: 10.1101/059394
- Publication year: 2016
- Journal:
- Impact factor: 0
- Authors: S. L. Clement; Hani Z. Girgis
- Corresponding author: Hani Z. Girgis
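Similar National Natural Science Foundation of China (NSFC) grants
Research on AI-driven marketing models and consumer behavior
- Award number: 72332006
- Year approved: 2023
- Amount: CNY 1,650,000
- Project category: Key Program
Identification and analysis of cotton phenotypic information based on "AI algorithms + high-precision remote sensing data"
- Award number: 32360436
- Year approved: 2023
- Amount: CNY 320,000
- Project category: Regional Science Fund Project
Building AI models for risk prediction and source tracing of Staphylococcus aureus and enterotoxin A in pasteurized milk
- Award number: 32302241
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund
Mechanisms influencing employee AI identification and the underlying mechanisms of proactive employee behavior in AI work settings at manufacturing firms
- Award number: 72362025
- Year approved: 2023
- Amount: CNY 270,000
- Project category: Regional Science Fund Project
Molecular design of extractive distillation solvents based on atomic contributions and AI
- Award number: 22308037
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund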
Similar overseas grants
Correlating structure and function in KATP channel isoforms
- Award number: 10629412
- Fiscal year: 2022
- Amount: $367,900
- Project category:
Therapeutic Targeting a Non-Hodgkin Lymphoma Driver Using AI
- Award number: 10585717
- Fiscal year: 2022
- Amount: $367,900
- Project category:
Mechanism-based Targeting of the RNA Processing Machinery of SARS-CoV-2
- Award number: 10457978
- Fiscal year: 2021
- Amount: $367,900
- Project category:
Mechanism-based Targeting of the RNA Processing Machinery of SARS-CoV-2
- Award number: 10240146
- Fiscal year: 2021
- Amount: $367,900
- Project category:
Mechanism-based Targeting of the RNA Processing Machinery of SARS-CoV-2
- Award number: 10671628
- Fiscal year: 2021
- Amount: $367,900
- Project category: