Integrating genomic and clinical data to predict disease phenotypes using heterogeneous ensembles
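Basic information
- Grant number: 10218766
- Principal investigator:
- Amount: $540,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-06-01 to 2025-03-31
- Project status: Not yet concluded
- Source: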
- Keywords: Address, Algorithms, Asthma, Automobile Driving, Caring, Characteristics, Clinical, Clinical Data, Computational algorithm, Computer software, Computing Methodologies, Data, Data Collection, Data Set, Disease, Disease Outcome, Docking, Effectiveness, Electronic Health Record, Encapsulated, Exercise, Genomics, Goals, Health, Individual, Inflammatory Bowel Diseases, Institution, Laboratories, Learning, Malignant Neoplasms, Medical, Medical Imaging, Medical center, Methods, Modality, Modeling, Molecular, Molecular Profiling, Multiomic Data, Patients, Performance, Phenotype, Physicians, Population, Recurrence, Research Personnel, Risk, Sampling, Structure, Technology, Testing, The Cancer Genome Atlas, Validation, Variant, Work, advanced disease, base, clinical phenotype, cohort, data format, data integration, deep learning, design, disease phenotype, diverse data, feature selection, flexibility, genomic data, heterogenous data, improved, individual patient, innovation, insight, member, multiple datasets, multiple omics, multitask, novel, novel strategies, outreach, patient population, personalized medicine, personalized predictions, precision medicine, predictive modeling, programs, rapid growth, repository, scale up, transcriptomics, vector
PROJECT SUMMARY
Genomic and other “omic” profiles hold immense potential for advancing personalized/precision medicine by
enabling the accurate prediction of disease phenotypes or outcomes for individual patients, which can be used
by a clinician to design an appropriate plan of care. However, despite this potential, the actual impact of these
omic profiles on disease phenotype prediction may be limited by the fact that even large cohorts collecting
these data do not cover large enough numbers of individuals. In contrast, a variety of clinical data types, such
as laboratory tests and physician notes, are routinely collected and studied for a much larger number of
patients undergoing treatment for such diseases at medical centers. The abundance of these clinical data, and
their complementarity with multi-omic data, offer an opportunity to advance personalized medicine by
integrating these disparate types of data. However, this disparity in data formats, namely several omic profiles
being structured, and several clinical data types, such as physician notes, being unstructured, poses
challenges for this integration. An associated challenge due to this disparity is that different classes of
computational methods are likely to be the most effective for predicting disease phenotypes from these clinical
and omics datasets. These challenges pose barriers for current data integration methods to address this
problem. Here, we propose an innovative approach to this integration by assimilating diverse base phenotype
predictors inferred from individual clinical and omics datasets into heterogeneous ensembles. These
ensembles, which have shown promise for several other computational genomics problems, can aggregate an
unrestricted number and variety of base predictors, which is ideal for this integration problem. Specifically, we
describe how existing heterogeneous ensemble methods for single datasets can be transformed and advanced
to address the multiple clinical and omic dataset integration problem. In particular, we detail novel algorithms
for improving these integrative ensembles by modeling and incorporating the inherent patient and dataset
heterogeneity in these datasets. We also propose novel algorithms for leveraging the inherent complementarity
among clinical and omic datasets, as well as an innovative approach for handling expected missing data, both
with the goal of making ensemble phenotype predictors more accurate and applicable to patient cohorts. To
assess the performance of this novel suite of data integration-oriented heterogeneous ensembles, we will
validate their effectiveness for predicting asthma and Inflammatory Bowel Disease phenotypes in substantial
patient cohorts with diverse omics and clinical datasets. We will publicly release efficient software
implementations of the methods developed in this project to enable others to carry out similar analyses with
other diverse data collections. Successful accomplishment of the proposed work will contribute to the
advancement of personalized medicine through accurate individualized prediction of disease phenotypes.
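
As a rough illustration of the idea summarized above, the following Python sketch trains two different kinds of base predictor on each data modality and averages their scores over whichever modalities a patient actually has. It is not the project's method or software: the function names, the two toy base predictors, the simple averaging rule, and the example numbers are all assumptions made purely for illustration.

```python
from statistics import mean

def centroid_predictor(X, y):
    """Base predictor 1 (illustrative): score by closeness to the positive-class centroid."""
    c_pos = [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == 1])]
    c_neg = [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == 0])]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        if d_pos + d_neg == 0:
            return 0.5
        return d_neg / (d_pos + d_neg)  # near the positive centroid -> score near 1
    return predict

def stump_predictor(X, y, feature=0):
    """Base predictor 2 (illustrative): a one-feature threshold rule."""
    pos_mean = mean(x[feature] for x, lab in zip(X, y) if lab == 1)
    neg_mean = mean(x[feature] for x, lab in zip(X, y) if lab == 0)
    cut = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1

    def predict(x):
        return 1.0 if sign * (x[feature] - cut) > 0 else 0.0
    return predict

def fit_ensemble(datasets, y):
    """Train both base predictor types per modality, skipping patients with missing rows."""
    models = {}
    for name, X in datasets.items():
        kept = [(x, lab) for x, lab in zip(X, y) if x is not None]
        Xs = [x for x, _ in kept]
        ys = [lab for _, lab in kept]
        models[name] = [centroid_predictor(Xs, ys), stump_predictor(Xs, ys)]
    return models

def predict_ensemble(models, patient):
    """Average base scores over the modalities available for this patient."""
    scores = [
        model(patient[name])
        for name, base_models in models.items()
        if patient.get(name) is not None
        for model in base_models
    ]
    return mean(scores) if scores else None

# Hypothetical toy cohort: None marks a modality missing for that patient.
datasets = {
    "omics": [[5.0, 1.0], [4.5, 1.2], None, [1.0, 4.0], [1.2, 4.4], [0.8, 3.9]],
    "clinical": [[0.9], [0.8], [0.95], [0.2], None, [0.1]],
}
labels = [1, 1, 1, 0, 0, 0]
models = fit_ensemble(datasets, labels)

both = predict_ensemble(models, {"omics": [4.8, 1.0], "clinical": [0.85]})
clinical_only = predict_ensemble(models, {"omics": None, "clinical": [0.15]})
print(both, clinical_only)
```

Averaging is the simplest possible aggregation rule; a learned meta-predictor (stacking) over the base scores would be a natural alternative, and the sketch makes no attempt to model patient or dataset heterogeneity.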
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Gaurav Pandey
Multi-modal data integration to identify kinase substrates
- Grant number: 10659156
- Fiscal year: 2022
- Funding amount: $540,000
- Project category:
Multi-modal data integration to identify kinase substrates
- Grant number: 10451941
- Fiscal year: 2022
- Funding amount: $540,000
- Project category:
Integrating genomic and clinical data to predict disease phenotypes using heterogeneous ensembles
- Grant number: 10589827
- Fiscal year: 2021
- Funding amount: $540,000
- Project category:
Integrating genomic and clinical data to predict disease phenotypes using heterogeneous ensembles
- Grant number: 10409755
- Fiscal year: 2021
- Funding amount: $540,000
- Project category:
Boosting the Translational Impact of Scientific Competitions by Ensemble Learning
- Grant number: 8864679
- Fiscal year: 2015
- Funding amount: $540,000
- Project category:
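Similar grants from the National Natural Science Foundation of China
Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex, nonsmooth optimization problems
- Grant number: 12371308
- Year approved: 2023
- Funding amount: 435,000 CNY
- Project category: General Program
Design and hardware implementation of ensemble learning algorithms under resource constraints
- Grant number: 62372198
- Year approved: 2023
- Funding amount: 500,000 CNY
- Project category: General Program
Fast algorithms for electromagnetic fields based on physics-informed neural networks
- Grant number: 52377005
- Year approved: 2023
- Funding amount: 520,000 CNY
- Project category: General Program
SPH model and efficient algorithms for deformation and flow of saturated sand considering pile-soil-water coupling effects
- Grant number: 12302257
- Year approved: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Ensemble classification algorithms for high-dimensional imbalanced data
- Grant number: 62306119
- Year approved: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund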
Similar overseas grants
Developing a Childhood Asthma Risk Passive Digital Marker
- Grant number: 10571461
- Fiscal year: 2023
- Funding amount: $540,000
- Project category:
3D genome organization of the Ets1-Fli1 locus controls allergic responses
- Grant number: 10654172
- Fiscal year: 2023
- Funding amount: $540,000
- Project category:
Revealing the role of blood microbiome in childhood asthma
- Grant number: 10590805
- Fiscal year: 2023
- Funding amount: $540,000
- Project category:
Development of an all-in-one soft wearable device for accurate lung function detection and asthma diagnosis
- Grant number: 10726175
- Fiscal year: 2023
- Funding amount: $540,000
- Project category:
Identifying pediatric asthma subtypes using novel privacy-preserving federated machine learning methods
- Grant number: 10713424
- Fiscal year: 2023
- Funding amount: $540,000
- Project category: