Data science tools to identify robust exposure-phenotype associations for precision medicine
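Basic information
- Grant number: 10653214
- Principal investigator:
- Amount: $620,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-10 to 2026-06-30
- Project status: Active (not yet concluded)
- Source: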
- Keywords: Address; All of Us Research Program; Big Data; Biological; Biological Factors; Biological Markers; Cardiology; Catalogs; Cohort Studies; Communities; Complex; Country; Data; Data Science; Data Set; Demographic Factors; Deposition; Diabetes Mellitus; Diet and Nutrition; Disadvantaged; Disease; Disparity; Environment; Environmental Exposure; Environmental Risk Factor; Epidemiology; Etiology; Exhibits; Goals; Health; Heart Diseases; Human; Incidence; Lead; Libraries; Link; Literature; Machine Learning; Malignant Neoplasms; Measurement; Measures; Meta-Analysis; Metadata; Methods; Modeling; National Health and Nutrition Examination Survey; National Institute of Environmental Health Sciences; Observational Study; Phenotype; Pollution; Population; Population Heterogeneity; Process; Reproducibility; Research Design; Research Personnel; Resources; Risk Factors; Role; Sample Size; Sampling; Testing; Time; Translations; United States National Institutes of Health; Variant; analytical method; biobank; cohort; data resource; deep learning; disease disparity; disease phenotype; disorder risk; environmental health disparity; feature selection; genetic risk factor; health difference; health disparity; hypercholesterolemia; machine learning method; novel; phenome; precision medicine; scale up; tool; vibration
Project Summary/Abstract
Phenotypic variability across demographically diverse populations is driven by environmental factors. The
overall goal of this proposal is to deploy data science approaches to drive discovery of associations between
exposures (E) and phenotypes (P) in demographically diverse populations. We lack data science methods to
associate, replicate, and prioritize exposure variables of the exposome (E) in phenotypes (P) and disease
incidence (D), required for the delivery of precision medicine. Observational studies are fraught with four unsolved
data science challenges. First, (1) E-based studies are limited to associating a few hypothesized exposure-
phenotype (E-P) pairs at a time, leading to a fragmented literature of environmental associations. Second, machine
learning (ML) approaches for feature selection and prediction hold promise; however, (2) most extant E-based
cohorts contain missing data, challenging the use of ML to detect complex E-P associations. Third, (3) biases
such as confounding and study design influence associations and hinder translation. Fourth, (4) there are few
well-powered data resources that systematically document longitudinal E-P and E-D associations across
massive precision medicine. It is a challenge to systematically associate a number of exposures in multiple
phenotypes and replicate these associations across cohorts (Aim 1). The “vibration of effects”, or the degree
to which associations change as a function of study design (e.g., analytic method, sample size) and model
choice, is a hidden bias in observational studies (Aim 2). Third, an outstanding question is the degree to which
environmental differences lead to health disparities. To address these challenges and gaps, we propose three aims. Aim
1: Develop and test machine learning methods to associate multiple environmental exposure indicators with
multiple phenotypes: EP-WAS. We hypothesize that exposures will explain a significant amount of variation in
phenotype in populations, and we will deposit all data and models in a novel EP-WAS Catalog. Aim 2: Quantitate
how study design influences associations between exposure biomarkers and phenotype. We will scale up,
extend, and test a method called “vibration of effects” (VoE) to measure how study criteria influence the
stability of associations (how reproducible associations are as a function of analytic choice). Aim 3: Leverage
EP-WAS and VoE to disentangle biological, demographic, and environmental influences on phenotypic
disparities in hypercholesterolemia. We will deploy EP-WAS and VoE packaged libraries in the largest cohort
study to partition phenotypic variation across demographic groups in factors for hypercholesterolemia. We will
equip the biomedical community with data science approaches for robust data-driven discovery and
interpretation of exposure-phenotype factors in observational datasets, required for the identification of
environmental health disparities. For the first time, investigators will ascertain the collective role of the
environment in heart disease at scale just in time for the All of Us program.
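
The abstract describes VoE as the degree to which an association changes with analytic choice. As a rough illustration only, the sketch below refits one exposure-phenotype linear model under every subset of adjustment covariates and reports the spread of the exposure coefficient. It is not the project's VoE library: the function names (`ols_coef`, `vibration_of_effects`, `summarize`), the covariate names, the synthetic data, and the choice of a simple min/max range as the summary are all assumptions made for this example.

```python
from itertools import combinations

import numpy as np


def ols_coef(y, columns):
    """Return OLS coefficients for y ~ intercept + columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def vibration_of_effects(y, exposure, covariates):
    """Estimate the exposure coefficient under every subset of adjusters.

    covariates: dict mapping a (hypothetical) covariate name to a 1-D array.
    Returns a dict mapping each adjustment set (tuple of names) to the
    exposure coefficient from that model.
    """
    names = sorted(covariates)
    estimates = {}
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            cols = [exposure] + [covariates[n] for n in subset]
            estimates[subset] = ols_coef(y, cols)[1]  # index 1 = exposure
    return estimates


def summarize(estimates):
    """Spread of the exposure estimate across model specifications."""
    values = np.array(list(estimates.values()))
    return {
        "n_models": len(values),
        "min": float(values.min()),
        "max": float(values.max()),
        "range": float(values.max() - values.min()),
        "sign_flips": bool(values.min() < 0 < values.max()),
    }


# Synthetic data (assumed for illustration): "age" confounds the
# exposure-phenotype association, "income" is unrelated noise.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(size=n)
income = rng.normal(size=n)
exposure = 0.8 * age + rng.normal(size=n)
phenotype = 0.5 * exposure + 1.0 * age + rng.normal(size=n)

estimates = vibration_of_effects(
    phenotype, exposure, {"age": age, "income": income}
)
summary = summarize(estimates)
```

In this toy setup, models that omit `age` overstate the exposure effect, so the range in `summary` is wide even though the true coefficient is fixed at 0.5. A real analysis would also vary the analytic method and sample, as the abstract notes, and would track p-values alongside effect sizes.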
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by ARJUN KUMAR MANRAI
Other grants by ARJUN KUMAR MANRAI
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Grant number: 10705899
- Fiscal year: 2022
- Amount: $620,200
- Project category:
Precision Cardiovascular Medicine for Multi-Ethnic Populations
- Grant number: 10582991
- Fiscal year: 2022
- Amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Grant number: 10874056
- Fiscal year: 2021
- Amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Grant number: 10487388
- Fiscal year: 2021
- Amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Grant number: 10095924
- Fiscal year: 2021
- Amount: $620,200
- Project category:
Precision Cardiovascular Medicine for Multi-Ethnic Populations
- Grant number: 9917879
- Fiscal year: 2018
- Amount: $620,200
- Project category:
Similar overseas grants
All of Us Research Program Trans-America Consortium of the HCSRN
- Grant number: 10871074
- Fiscal year: 2023
- Amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Grant number: 10705899
- Fiscal year: 2022
- Amount: $620,200
- Project category:
California Partnership for Personalized Nutrition
- Grant number: 10669429
- Fiscal year: 2022
- Amount: $620,200
- Project category:
California Partnership for Personalized Nutrition
- Grant number: 10386527
- Fiscal year: 2021
- Amount: $620,200
- Project category:
California Partnership for Personalized Nutrition
- Grant number: 10540243
- Fiscal year: 2021
- Amount: $620,200
- Project category: