Effective prediction of microRNAs in the face of class imbalance
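Basic information
- Grant number: RGPIN-2016-06179
- Principal investigator:
- Amount: $16,000
- Host institution:
- Host institution country: Canada
- Project type: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Project period: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract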
MicroRNA (miRNA) are short expressed genomic sequences that encode small RNA molecules which adopt a “hairpin” secondary structure. Computational prediction of miRNA is important, since miRNA are now believed to disrupt or otherwise control the expression of 60-90% of mammalian genes. Sequence-based de novo prediction of miRNA is difficult because of acute class imbalance: for each true miRNA within a genome, we expect 1000 pseudo-miRNA (i.e. genomic regions producing miRNA-like hairpin structures). Effective miRNA prediction systems must therefore have extremely high specificity (i.e. the ability to reject pseudo-miRNA) while retaining high recall (i.e. the ability to correctly detect true miRNA).
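To make the arithmetic behind this requirement concrete, the following minimal sketch computes the expected precision from recall, specificity, and the ratio of pseudo- to true miRNA. The function name `expected_precision` and the example operating points (recall 0.9, specificity 0.99 and 0.9999) are illustrative assumptions, not figures from the proposal; only the 1000:1 ratio comes from the abstract.

```python
def expected_precision(recall: float, specificity: float, neg_per_pos: float) -> float:
    """Expected precision given one true positive example per `neg_per_pos` negatives.

    All arguments are hypothetical inputs chosen for illustration.
    """
    true_pos = recall                               # expected hits per real miRNA
    false_pos = (1.0 - specificity) * neg_per_pos   # expected false alarms per real miRNA
    if true_pos + false_pos == 0:
        raise ValueError("classifier makes no positive predictions")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # 1000 pseudo-miRNA per true miRNA, as stated in the abstract.
    for spec in (0.99, 0.9999):
        p = expected_precision(recall=0.9, specificity=spec, neg_per_pos=1000)
        print(f"specificity={spec}: precision={p:.3f}")
```

Under these assumed operating points, 99% specificity gives a precision of roughly 8%, whereas 99.99% specificity brings it to about 90%, which is why specificity has to be extremely high at this imbalance.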
We have recently introduced the Species-specific miRNA Prediction (SMIRP) framework for training highly effective species-specific miRNA prediction systems. When SMIRP is applied to three popular miRNA prediction methods, we observe significant improvements in precision (i.e. the proportion of predictions expected to be true miRNA) while maintaining the high recall rates achieved by the original methods. We propose to extend our research in three key areas:
1) Existing miRNA prediction methods perform well on canonical pre-miRNA, but are not well-suited for high-throughput annotation of entire genomes. Therefore, new classification techniques will be developed which optimally differentiate between real and pseudo-miRNA sequences within predicted hairpin structures. This will include the development of novel methods to compute general-purpose information-rich DNA/RNA descriptors. In addition to miRNA prediction, these descriptors will benefit other nucleic acid classification problem domains.
2) With the increasing availability of transcriptomic data, there is a need and an opportunity to develop an integrated miRNA discovery pipeline that leverages both next-generation sequencing (NGS) read patterns and powerful sequence-based methods such as SMIRP. We will develop and apply advanced machine learning approaches to optimally combine NGS- and sequence-based approaches, improving our ability to discover novel miRNA of potential importance to human health.
3) Contributions will also be made in the broader field of machine learning in the presence of extreme class imbalance where many classic performance metrics, such as ROC curves, become inappropriate as they do not adequately reflect the impact of false positive predictions. To address this and other issues, we will develop novel performance metrics for cases of acute class imbalance. While these new metrics will find immediate application in the development of miRNA prediction tools, they will also be widely applicable to other problem domains within bioinformatics and beyond.
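The point in item 3 about ROC curves can be illustrated with a small sketch. It is not one of the novel metrics the proposal describes; it only uses the standard definitions of true positive rate, false positive rate, and precision, and the confusion-matrix counts below are made-up numbers chosen so that both datasets share the same per-class error rates.

```python
def rates_from_counts(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float, float]:
    """Return (TPR, FPR, precision) from confusion-matrix counts."""
    tpr = tp / (tp + fn)          # recall, the y-axis of a ROC curve
    fpr = fp / (fp + tn)          # 1 - specificity, the x-axis of a ROC curve
    precision = tp / (tp + fp)    # fraction of positive predictions that are correct
    return tpr, fpr, precision


# Hypothetical counts: same classifier behaviour, different class ratios.
balanced = rates_from_counts(tp=90, fn=10, fp=1, tn=99)            # 1:1
imbalanced = rates_from_counts(tp=90, fn=10, fp=1000, tn=99000)    # 1:1000

if __name__ == "__main__":
    for name, (tpr, fpr, prec) in (("1:1", balanced), ("1:1000", imbalanced)):
        print(f"{name:>7}  TPR={tpr:.2f}  FPR={fpr:.2f}  precision={prec:.3f}")
```

Both datasets map to the same ROC point (TPR 0.90, FPR 0.01), yet precision falls from about 0.99 to about 0.08 once negatives outnumber positives 1000 to 1, so the ROC view hides the cost of false positives.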
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Green, James
Citations and science
- DOI: 10.1007/s11096-017-0539-y
- Published: 2017-10-01
- Journal:
- Impact factor: 2.4
- Authors: van Mil, J. W. Foppe; Green, James
- Corresponding author: Green, James
Internet use in an orthopaedic outpatient population
- DOI: 10.1097/bco.0b013e31828e542b
- Published: 2013-05-01
- Journal:
- Impact factor: 0.3
- Authors: Baker, Joseph F.; Green, James; Mulhall, Kevin J.
- Corresponding author: Mulhall, Kevin J.
Critical Role of the Virus-Encoded MicroRNA-155 Ortholog in the Induction of Marek's Disease Lymphomas
- DOI: 10.1371/journal.ppat.1001305.s001
- Published: 2011-01-01
- Journal:
- Impact factor: 0
- Authors: Green, James; Petherbridge, Lawrence; Kgosana, Lydia
- Corresponding author: Kgosana, Lydia
Child pedestrian casualties and deprivation
- DOI: 10.1016/j.aap.2010.10.016
- Published: 2011-05-01
- Journal:
- Impact factor: 5.9
- Authors: Green, James; Muir, Helen; Maher, Mike
- Corresponding author: Maher, Mike
Quality and Variability of Patient Directions in Electronic Prescriptions in the Ambulatory Care Setting.
- DOI: 10.18553/jmcp.2018.17404
- Published: 2018-07
- Journal:
- Impact factor: 2.1
- Authors: Yang, Yuze; Ward-Charlerie, Stacy; Dhavle, Ajit A.; Rupp, Michael T.; Green, James
- Corresponding author: Green, James
Other grants by Green, James
Reciprocal Perspective Machine Learning to Identify Relationships in Sparse Biological Networks
- Grant number: RGPIN-2021-04184
- Fiscal year: 2022
- Amount: $16,000
- Project type: Discovery Grants Program - Individual
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2022-04761
- Fiscal year: 2022
- Amount: $16,000
- Project type: Discovery Grants Program - Individual
Unobtrusive neonatal patient monitoring using video and pressure data
- Grant number: 543940-2019
- Fiscal year: 2021
- Amount: $16,000
- Project type: Collaborative Research and Development Grants
Reciprocal Perspective Machine Learning to Identify Relationships in Sparse Biological Networks
- Grant number: RGPIN-2021-04184
- Fiscal year: 2021
- Amount: $16,000
- Project type: Discovery Grants Program - Individual
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2016-04946
- Fiscal year: 2021
- Amount: $16,000
- Project type: Discovery Grants Program - Individual
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2016-04946
- Fiscal year: 2020
- Amount: $16,000
- Project type: Discovery Grants Program - Individual
Unobtrusive neonatal patient monitoring using video and pressure data
- Grant number: 543940-2019
- Fiscal year: 2020
- Amount: $16,000
- Project type: Collaborative Research and Development Grants
Effective prediction of microRNAs in the face of class imbalance
- Grant number: RGPIN-2016-06179
- Fiscal year: 2019
- Amount: $16,000
- Project type: Discovery Grants Program - Individual
Unobtrusive neonatal patient monitoring using video and pressure data
- Grant number: 543940-2019
- Fiscal year: 2019
- Amount: $16,000
- Project type: Collaborative Research and Development Grants
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2016-04946
- Fiscal year: 2019
- Amount: $16,000
- Project type: Discovery Grants Program - Individual
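Similar National Natural Science Foundation of China (NSFC) grants
Theoretical predictions of the three-dimensional structure distribution functions of hadrons
- Grant number: 12375080
- Year approved: 2023
- Amount: 520,000 CNY
- Project type: General Program
Next-to-next-to-leading-order anti-kT and similar jet functions, and predictions for forward jet production in pA collisions
- Grant number: 12175016
- Year approved: 2021
- Amount: 630,000 CNY
- Project type: General Program
Preparation of the theoretically predicted three-dimensional carbon allotrope T-carbon and in-depth experimental study of its physical properties
- Grant number:
- Year approved: 2020
- Amount: 580,000 CNY
- Project type: General Program
Theoretical modeling uncertainties of the nuclear empirical mean field and optimization of its predictive power
- Grant number: 11975209
- Year approved: 2019
- Amount: 600,000 CNY
- Project type: General Program
Next-to-next-to-leading-order predictions for several processes with jets at the LHC
- Grant number: 11775023
- Year approved: 2017
- Amount: 600,000 CNY
- Project type: General Program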
Similar overseas grants
The cardiovascular consequences of sleep apnea plus COPD (Overlap syndrome)
- Grant number: 10733384
- Fiscal year: 2023
- Amount: $16,000
- Project type:
Identifying multimodal biomarkers for autologous serum tears in the treatment of chronic postoperative ocular pain
- Grant number: 10794761
- Fiscal year: 2023
- Amount: $16,000
- Project type:
Digital Multiplexed Analysis of Circulating Nucleic Acids in Small-Volume Blood Specimens
- Grant number: 10467839
- Fiscal year: 2022
- Amount: $16,000
- Project type:
Lung Cancer Early Detection and Immunotherapy Response Prediction and Monitoring with an Exo-PROS Liquid Biopsy Assay
- Grant number: 10665754
- Fiscal year: 2022
- Amount: $16,000
- Project type: