Opening the Black Box of Machine Learning Models
Basic Information
- Grant number: 10020414
- Principal investigator:
- Amount: $388,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-07-01 to 2023-06-30
- Project status: Completed
- Source:
- Keywords: Address; Basic Science; Big Data; Biological; Biological Process; Biology; Biomedical Research; Complex; Computing Methodologies; Data; Dependence; Development; Disease; Effectiveness; Future; Gene Expression; Genes; Healthcare; Intervention; Knowledge; Lead; Learning; Linear Models; Machine Learning; Measurement; Methodology; Modeling; Modernization; Molecular; Molecular Biology; Molecular Genetics; Outcome; Patient-Focused Outcomes; Phenotype; Research; Sampling; Selection Criteria; Signal Transduction; Statistical Models; Techniques; Technology; Training; Translating; biomarker discovery; clinical practice; clinically translatable; computer framework; deep learning; experimental study; feature selection; high dimensionality; innovation; inquiry-based learning; molecular marker; novel; precision medicine; predictive modeling; success; therapeutic target
Project Summary
Biomedical data is vastly increasing in quantity, scope, and generality, expanding opportunities to discover
novel biological processes and clinically translatable outcomes. Machine learning (ML), a key technology in
modern biology that addresses these changing dynamics, aims to infer meaningful interactions among variables
by learning their statistical relationships from data consisting of measurements on variables across samples.
Accurate inference of such interactions from big biological data can lead to novel biological discoveries,
therapeutic targets, and predictive models for patient outcomes. However, a greatly increased hypothesis space,
complex dependencies among variables, and opaque “black-box” ML models pose difficult, open challenges.
To meet these challenges, we have been developing innovative, rigorous, and principled ML techniques to infer
reliable, accurate, and interpretable statistical relationships in various kinds of biological network inference problems,
pushing the boundaries of both ML and biology.
Fundamental limitations of current ML techniques leave many future opportunities to translate inferred
statistical relationships into biological knowledge, as exemplified in a standard biomarker discovery problem –
an extremely important problem for precision medicine. Biomarker discovery using high-throughput molecular
data (e.g., gene expression data) has significantly advanced our knowledge of molecular biology and genetics.
The current approach attempts to find a set of features (e.g., gene expression levels) that best predict a phenotype
and use the selected features, or molecular markers, to determine the molecular basis for the phenotype.
However, the low success rates of replication in independent data and of reaching clinical practice indicate three
challenges posed by the current ML approach. First, high dimensionality, hidden variables, and feature correlations
create a discrepancy between predictability (i.e., statistical associations) and true biological interactions; we need
new feature selection criteria to make the model better explain rather than simply predict phenotypes. Second,
complex models (e.g., deep learning or ensemble models) can more accurately describe intricate relationships
between genes and phenotypes than simpler, linear models, but they lack interpretability. Third, analyzing
observational data without conducting interventional experiments does not prove causal relations.
To address these problems, we propose an integrated machine learning methodology for learning interpretable models
from data that will: 1) select interpretable features likely to provide meaningful phenotype explanations, 2) make
interpretable predictions by estimating the importance of each feature to a prediction, and 3) iteratively validate
and refine predictions through interventional experiments. For each challenge, we will develop a generalizable
ML framework that focuses on different aspects of model interpretability and will therefore be applicable to any
formerly intractable, high-impact healthcare problems. We will also demonstrate the effectiveness of each ML
framework for a wide range of topics, from basic science to disease biology to bedside applications.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants by Su-In Lee
Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets
- Grant number: 10347341
- Fiscal year: 2019
- Funding amount: $388,800
- Project category:
Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets
- Grant number: 10132962
- Fiscal year: 2019
- Funding amount: $388,800
- Project category:
Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets
- Grant number: 10613437
- Fiscal year: 2019
- Funding amount: $388,800
- Project category:
Application of Data Sciences in Traumatic Brain Injury
- Grant number: 9685513
- Fiscal year: 2018
- Funding amount: $388,800
- Project category:
Opening the Black Box of Machine Learning Models
- Grant number: 10224845
- Fiscal year: 2018
- Funding amount: $388,800
- Project category:
Opening the Black Box of Machine Learning Models
- Grant number: 10437684
- Fiscal year: 2018
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10042623
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10670111
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10260483
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10438909
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
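Similar NSFC (National Natural Science Foundation of China) Grants
Research on Fundamental Scientific Problems in Supercritical Kerosene Combustion in Advanced Aero-Engines
- Grant number: 52336006
- Approval year: 2023
- Funding amount: CNY 2.3 million
- Project category: Key Program
Research on Fundamental Scientific Problems and Key Materials for Flow Boiling Technology in Extreme High-Temperature Environments
- Grant number: 52333015
- Approval year: 2023
- Funding amount: CNY 2.3 million
- Project category: Key Program
Research on Precise Structural Control of Nitrogen-Containing Heterocyclic Ligand Polymers and Fundamental Scientific Problems at the Surfaces and Interfaces of Functional Coating Materials
- Grant number:
- Approval year: 2022
- Funding amount: CNY 530,000
- Project category: General Program
Research on Fundamental Scientific Problems in Multi-Property Synergy of Encapsulation Materials for High-Temperature, High-Voltage SiC Power Devices
- Grant number: 52272001
- Approval year: 2022
- Funding amount: CNY 540,000
- Project category: General Program
Research on Fundamental Scientific Problems and Key Technologies for Lightweight, Low-Ripple, Highly Reliable Direct-Drive Permanent Magnet Motor Systems
- Grant number: 52237002
- Approval year: 2022
- Funding amount: CNY 2.69 million
- Project category: Key Program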
Similar Overseas Grants
Computational Ophthalmology and Biomedical Informatics
- Grant number: 10709404
- Fiscal year: 2023
- Funding amount: $388,800
- Project category:
Pulmonary Hypertension: State of the Art and Therapeutic Opportunities
- Grant number: 10682118
- Fiscal year: 2023
- Funding amount: $388,800
- Project category:
2023 Chemical Imaging Gordon Research Conferences
- Grant number: 10605394
- Fiscal year: 2023
- Funding amount: $388,800
- Project category:
Longitudinal neural fingerprinting of opioid-use trajectories
- Grant number: 10805031
- Fiscal year: 2023
- Funding amount: $388,800
- Project category: