Data science tools to identify robust exposure-phenotype associations for precision medicine
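Basic information
- Award number: 10653214
- Principal investigator:
- Amount: $620,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-10 to 2026-06-30
- Project status: Ongoing
- Source: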
- Keywords: Address; All of Us Research Program; Big Data; Biological; Biological Factors; Biological Markers; Cardiology; Catalogs; Cohort Studies; Communities; Complex; Country; Data; Data Science; Data Set; Demographic Factors; Deposition; Diabetes Mellitus; Diet and Nutrition; Disadvantaged; Disease; Disparity; Environment; Environmental Exposure; Environmental Risk Factor; Epidemiology; Etiology; Exhibits; Goals; Health; Heart Diseases; Human; Incidence; Lead; Libraries; Link; Literature; Machine Learning; Malignant Neoplasms; Measurement; Measures; Meta-Analysis; Metadata; Methods; Modeling; National Health and Nutrition Examination Survey; National Institute of Environmental Health Sciences; Observational Study; Phenotype; Pollution; Population; Population Heterogeneity; Process; Reproducibility; Research Design; Research Personnel; Resources; Risk Factors; Role; Sample Size; Sampling; Testing; Time; Translations; United States National Institutes of Health; Variant; analytical method; biobank; cohort; data resource; deep learning; disease disparity; disease phenotype; disorder risk; environmental health disparity; feature selection; genetic risk factor; health difference; health disparity; hypercholesterolemia; machine learning method; novel; phenome; precision medicine; scale up; tool; vibration
Project abstract
Project Summary/Abstract
Phenotypic variability across demographically diverse populations is driven by environmental factors. The
overall goal of this proposal is to deploy data science approaches to drive discovery of associations between
exposures (E) and phenotypes (P) in demographically diverse populations. We lack data science methods to
associate, replicate, and prioritize exposure variables of the exposome (E) in phenotypes (P) and disease
incidence (D), which are required for the delivery of precision medicine. Observational studies are fraught
with 4 unsolved data science challenges. First, (1) E-based studies are limited to associating a few
hypothesized exposure-phenotype (E-P) pairs at a time, leading to a fragmented literature of environmental
associations. Second, machine learning (ML) approaches for feature selection and prediction hold promise;
however, (2) most extant E-based cohorts contain missing data, challenging the use of ML to detect complex
E-P associations. Third, (3) biases, such as confounding and study design, influence associations and hinder
translation. Fourth, (4) there are few well-powered data resources that systematically document longitudinal
E-P and E-D associations across massive precision medicine. It is a challenge to systematically associate a
number of exposures in multiple phenotypes and replicate these associations across cohorts (Aim 1). The
“vibration of effects”, or the degree to which associations change as a function of study design (e.g.,
analytic method, sample size) and model choice, is a hidden bias in observational studies (Aim 2). Third, an
outstanding question is the degree to which environmental differences lead to health disparities. To address
these challenges and gaps, we propose the following aims. Aim 1: Develop and test machine learning methods to
associate multiple environmental exposure indicators with multiple phenotypes: EP-WAS. We hypothesize that
exposures will explain a significant amount of variation in phenotype in populations, and we will deposit all
data and models in a novel EP-WAS Catalog. Aim 2: Quantitate how study design influences associations between
exposure biomarkers and phenotype. We will scale up, extend, and test a method called “vibration of effects”
(VoE) to measure how study criteria influence the stability of associations (how reproducible associations
are as a function of analytic choice). Aim 3: Leverage EP-WAS and VoE to disentangle biological, demographic,
and environmental influences on phenotypic disparities in hypercholesterolemia. We will deploy EP-WAS and VoE
packaged libraries in the largest cohort study to partition phenotypic variation across demographic groups in
factors for hypercholesterolemia. We will equip the biomedical community with data science approaches for
robust data-driven discovery and interpretation of exposure-phenotype factors in observational datasets,
required for the identification of environmental health disparities. For the first time, investigators will
ascertain the collective role of the environment in heart disease at scale, just in time for the All of Us
program.
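As an illustration of the "vibration of effects" idea described in the abstract (how much an exposure-phenotype association changes with model choice), the following is a minimal sketch and not the project's actual VoE library. It assumes simulated data, a linear model, and model choice limited to which covariates are adjusted for. The names `ols_coefficients` and `vibration_of_effects` and the variables `age` and `income` are hypothetical, introduced only for this example.

```python
import itertools
import random


def ols_coefficients(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def vibration_of_effects(exposure, phenotype, covariates):
    """Fit one linear model per subset of adjustment covariates and
    return the exposure coefficient estimated under each subset."""
    names = sorted(covariates)
    estimates = {}
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            # Design matrix columns: intercept, exposure, chosen covariates
            rows = [
                [1.0, e] + [covariates[c][i] for c in subset]
                for i, e in enumerate(exposure)
            ]
            estimates[subset] = ols_coefficients(rows, phenotype)[1]
    return estimates


# Simulated cohort (assumption): age confounds the exposure-phenotype
# association, income is unrelated, and the true exposure effect is 0.5.
rng = random.Random(0)
n = 500
age = [rng.gauss(0, 1) for _ in range(n)]
income = [rng.gauss(0, 1) for _ in range(n)]
exposure = [0.8 * a + rng.gauss(0, 1) for a in age]
phenotype = [0.5 * e + 1.0 * a + rng.gauss(0, 1) for e, a in zip(exposure, age)]

estimates = vibration_of_effects(exposure, phenotype, {"age": age, "income": income})
spread = max(estimates.values()) - min(estimates.values())
for subset, beta in estimates.items():
    print(subset, round(beta, 3))
print("range of estimates:", round(spread, 3))
```

In this toy setting the estimate moves substantially depending on whether the confounder is adjusted for, and the range of estimates across specifications is one simple summary of that instability. A real analysis would also vary sample definitions and analytic methods, as the abstract notes.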
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by ARJUN KUMAR MANRAI
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Award number: 10705899
- Fiscal year: 2022
- Award amount: $620,200
- Project category:
Precision Cardiovascular Medicine for Multi-Ethnic Populations
- Award number: 10582991
- Fiscal year: 2022
- Award amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Award number: 10874056
- Fiscal year: 2021
- Award amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Award number: 10487388
- Fiscal year: 2021
- Award amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Award number: 10095924
- Fiscal year: 2021
- Award amount: $620,200
- Project category:
Precision Cardiovascular Medicine for Multi-Ethnic Populations
- Award number: 9917879
- Fiscal year: 2018
- Award amount: $620,200
- Project category:
Similar overseas grants
All of Us Research Program Trans-America Consortium of the HCSRN
- Award number: 10871074
- Fiscal year: 2023
- Award amount: $620,200
- Project category:
Data science tools to identify robust exposure-phenotype associations for precision medicine
- Award number: 10705899
- Fiscal year: 2022
- Award amount: $620,200
- Project category:
California Partnership for Personalized Nutrition
- Award number: 10669429
- Fiscal year: 2022
- Award amount: $620,200
- Project category:
California Partnership for Personalized Nutrition
- Award number: 10386527
- Fiscal year: 2021
- Award amount: $620,200
- Project category:
California Partnership for Personalized Nutrition
- Award number: 10540243
- Fiscal year: 2021
- Award amount: $620,200
- Project category: