Integrative data science approaches for rare disease discovery in health records
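Basic information
- Grant number: 10626148
- Principal investigator:
- Amount: $236,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-06-01 to 2025-05-31
- Project status: Ongoing
- Source: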
- Keywords: Acceleration; Adult; Affect; American; Award; Basic Science; Behavioral; Bioinformatics; Clinical; Clinical Data; Clinical Research; Computing Methodologies; Consensus; Data; Data Science; Data Set; Detection; Diagnosis; Diagnostic; Diagnostics Research; Disease; Economic Burden; Electronic Health Record; Environment; Faculty; Family; Frequencies; Genes; Genetic; Genomics; Genotype; Goals; Healthcare; Healthcare Systems; Individual; Informatics; Knowledge; Machine Learning; Manuscripts; Markov Chains; Medical; Medical Genetics; Medicine; Mental disorders; Mentors; Mentorship; Methods; Mining; Modeling; Molecular; Names; Natural Language Processing; Natural Language Processing pipeline; Ontology; Outcome; Pacific Northwest; Patient Recruitments; Patients; Pattern; Persons; Phase; Phenotype; Population; Positioning Attribute; Prevalence; Principal Investigator; Rare Diseases; Recording of previous events; Research; Research Personnel; Standardization; Symptoms; System; Testing; Training; Universities; Validation; Visualization; Vocabulary; Washington; Work; accurate diagnosis; biomedical data science; biomedical informatics; career; causal variant; clinical data warehouse; clinical decision-making; cohort; diagnostic accuracy; disease phenotype; early onset disorder; exome sequencing; gene discovery; genomic data; health care delivery; health data; health record; improved; member; multimodal data; novel; open source; patient health information; phenotypic data; prototype; psychologic; rare condition; rare genetic disorder; recruit; skills; software development; support tools; tool; trait
Project abstract
ABSTRACT: There are nearly 7,000 diseases that have a prevalence of only one in 2,000 individuals or less.
Yet, such rare diseases are estimated to collectively affect over 300 million people worldwide, representing a
significant healthcare concern. Although rare diseases have predominantly genetic origins, nearly half of them
do not manifest symptoms until adulthood and frequently confound discovery and diagnosis. Even in the case
of early onset disorders, the sheer number of possible diagnoses can often overwhelm clinicians. As a result,
rare diseases are often diagnosed with delay, misdiagnosed or even remain undiagnosed, not only disrupting
patient lives but also hindering progress on our understanding of such diseases. Data science methods that
mine large-scale retrospective health record data for phenotypic information will aid in timely and accurate
diagnoses of rare diseases, especially when combined with additional data types, and thus have significant
real-world impact. This proposal will integrate electronic health record (EHR) data sets with publicly available
vocabularies and ontologies, and genomic data for the improved identification and characterization of patients
with rare diseases, using approaches from machine learning, natural language processing (NLP) and basic
bioinformatics. The work has three specific aims and will be carried out in two phases. During the mentored
phase, the principal investigator (PI) will develop data-driven methods to extract standardized concepts related
to rare diseases from clinical notes and infer the occurrence of each disease (Aim 1). He will also develop data
science approaches to compare and contrast longitudinal patterns associated with patients' journeys through
the healthcare system when seeking a diagnosis for a rare disease, and aid in clinical decision-making by
leveraging these patterns (Aim 2). During the independent phase (Aim 3), computational methods will be
developed for the integrated modeling and analysis of genotypic (from Aim 3) and phenotypic information (from
Aims 1 and 2). Cohorts to be sequenced will cover diseases for which causal genes or disease definitions are
unclear (discovery), as well as those for which these are well known (validation). This work will be carried out
under the mentorship of four faculty members with complementary expertise in biomedical informatics, data
science, NLP, and rare disease genomics at the University of Washington, which offers the largest medical
system in the Pacific Northwest (four million EHRs), world-renowned researchers in medical genetics, and a
robust data science environment. In addition, under the direction of the mentoring team, the PI will complete advanced
coursework, receive training in translational bioinformatics and clinical research informatics, submit
manuscripts, and seek an independent research position. This proposal will yield preliminary results for
subsequent studies on data-driven phenotyping and enable the realization of the PI's career goals by providing
him with the necessary training to build on his machine learning and basic bioinformatics expertise to transition
into an independent investigator in biomedical data science.
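
As a rough illustration of the kind of concept extraction described in Aim 1, the following minimal sketch matches phrases from a small vocabulary against the text of a clinical note. It is not the project's method: the vocabulary, the concept IDs (`TOY:0001` and so on), and the function name `extract_concepts` are all hypothetical, and a real pipeline would draw on a public ontology and handle negation, abbreviations, and misspellings.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary: surface phrase -> made-up concept ID.
# These IDs are placeholders, not entries from any real ontology.
TOY_VOCAB = {
    "seizure": "TOY:0001",
    "seizures": "TOY:0001",
    "muscle weakness": "TOY:0002",
    "hearing loss": "TOY:0003",
}


def extract_concepts(note, vocab=None):
    """Count whole-phrase, case-insensitive vocabulary matches in one note.

    Negation ("no hearing loss") is NOT handled; every mention is counted.
    """
    vocab = TOY_VOCAB if vocab is None else vocab
    text = note.lower()
    counts = Counter()
    for phrase, concept_id in vocab.items():
        # Word boundaries keep "seizure" from also matching inside "seizures".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[concept_id] += len(re.findall(pattern, text))
    # Unary plus drops concepts with a zero count.
    return +counts


if __name__ == "__main__":
    note = "Patient reports seizures and muscle weakness. Seizure last week."
    print(extract_concepts(note))
```

Mapping several surface forms to one concept ID is what makes the extracted concepts "standardized": downstream steps can count or compare concepts per patient regardless of how a clinician phrased them.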
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Vikas Rao Pejaver
Other grants by Vikas Rao Pejaver
Integrative data science approaches for rare disease discovery in health records
- Grant number: 10541283
- Fiscal year: 2022
- Funding amount: $236,500
- Project category:
Integrative data science approaches for rare disease discovery in health records
- Grant number: 9884791
- Fiscal year: 2019
- Funding amount: $236,500
- Project category:
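Similar grants from the National Natural Science Foundation of China (NSFC)
Pathological mechanism by which platelet factor 4 (PF4) affects the Th17/Treg balance by regulating glycolysis in CD4+ T lymphocytes in adult immune thrombocytopenia (ITP)
- Grant number: 82370133
- Approval year: 2023
- Funding amount: 490,000 CNY
- Project category: General Program
Effects and mechanisms of attachment-related episodic simulation on adult attachment security
- Grant number:
- Approval year: 2022
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Cohort study of the effects and mechanisms of lifestyle and genetic background on lifespan and mortality at different life stages in adults
- Grant number:
- Approval year: 2021
- Funding amount: 560,000 CNY
- Project category: General Program
An integrated study of tuberculosis development in adults and children: the influence of bacterial strains and the surrounding microbiome
- Grant number: 81961138012
- Approval year: 2019
- Funding amount: 1,000,000 CNY
- Project category: International (Regional) Cooperation and Exchange Program
Cognitive neural mechanisms by which statistical learning affects adult learning of Chinese as a second language
- Grant number: 31900778
- Approval year: 2019
- Funding amount: 240,000 CNY
- Project category: Young Scientists Fund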
Similar overseas grants
The Proactive and Reactive Neuromechanics of Instability in Aging and Dementia with Lewy Bodies
- Grant number: 10749539
- Fiscal year: 2024
- Funding amount: $236,500
- Project category:
Elucidation of contributions of telomere damage and non-cell autonomy to the pathophysiology of Friedreich ataxia using a zebrafish model
- Grant number: 10723485
- Fiscal year: 2023
- Funding amount: $236,500
- Project category:
Plasma neurofilament light chain as a potential disease monitoring biomarker in Wolfram syndrome
- Grant number: 10727328
- Fiscal year: 2023
- Funding amount: $236,500
- Project category:
Evaluating the impacts of sea level rise on migration and wellbeing in coastal communities
- Grant number: 10723570
- Fiscal year: 2023
- Funding amount: $236,500
- Project category:
Randomized clinical trial to test the efficacy of a smartphone app for smoking cessation for nondaily smokers
- Grant number: 10715401
- Fiscal year: 2023
- Funding amount: $236,500
- Project category: