Interactive machine learning methods for clinical natural language processing
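Basic information
- Grant number: 8818096
- Principal investigator:
- Amount: $558,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-05-31 to 2018-09-28
- Project status: Completed
- Source: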
- Keywords: Abbreviations; Active Learning; Address; Adoption; Algorithms; Attention; Biomedical Research; Classification; Clinical; Clinical Data; Clinical Informatics; Clinical Research; Cognitive; Communities; Data; Data Set; Development; Disease; Educational workshop; Electronic Health Record; Face; Goals; Grant; Human; Hybrids; Knowledge; Label; Learning; Linguistics; Machine Learning; Manuals; Medical; Methodology; Methods; Modeling; Names; Natural Language Processing; Patients; Pattern; Performance; Pharmaceutical Preparations; Physicians; Process; Research; Research Personnel; Research Priority; Resources; Sampling; Solutions; Source; Specific qualifier value; Statistical Methods; Statistical Models; System; Technology; Testing; Text; Time; United States National Library of Medicine; base; clinical application; clinical phenotype; cohort; computer human interaction; computerized; cost; experience; improved; model development; novel; open source; statistics; success; tool; usability
Project abstract
DESCRIPTION (provided by applicant): Growing deployment of electronic health record (EHR) systems has made massive amounts of clinical data available electronically. However, much of the detailed clinical information about patients is embedded in narrative text and is not directly accessible to computerized clinical applications. Therefore, natural language processing (NLP) technologies, which can unlock information in narrative documents, have received great attention in the medical domain. Current state-of-the-art NLP approaches often involve building probabilistic models. However, the wide adoption of statistical methods in clinical NLP faces two grand challenges: 1) the lack of large annotated clinical corpora; and 2) the lack of methodologies that can efficiently integrate linguistic and domain knowledge with statistical learning. High-performance statistical NLP methods rely on large-scale, high-quality annotations of clinical text, but creating large annotated clinical corpora is time-consuming and costly because it often requires manual review by physicians. Moreover, the medical domain is knowledge intensive. To achieve optimal performance, probabilistic models need to leverage medical domain knowledge. Therefore, methods that can efficiently integrate domain and expert knowledge with machine learning processes to quickly build high-quality probabilistic models with minimum annotation cost would be highly desirable for clinical text processing.
In this study, we propose to investigate interactive machine learning (IML) methods to address the above challenges in clinical NLP. An IML system builds a classification model in an iterative process: it actively selects informative samples for annotation based on models built from previously annotated samples, thus reducing the annotation cost of model development. More importantly, an IML system also incorporates human input into the learning process (e.g., an expert can specify important features for a classification task based on domain knowledge). Thus, IML is an ideal framework for efficiently integrating rule-based (via domain experts specifying features) and statistics-based (via different learning algorithms) approaches to clinical NLP. To achieve our goal, we propose three specific aims. In Aim 1, we plan to investigate different aspects of IML for word sense disambiguation, including developing new active learning algorithms and conducting cognitive usability analysis for efficient feature annotation by users. To demonstrate the broad uses of IML, in Aim 2 we further extend IML approaches to two other important clinical NLP classification tasks: named entity recognition and clinical phenotyping. Finally, in Aim 3 we propose to disseminate the IML methods and tools to the biomedical research community.
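The iterative sample-selection loop described in the abstract can be illustrated with a minimal sketch. This is not the project's implementation: the Naive Bayes model, the least-confidence selection rule, the toy "pt" abbreviation data, and all function names (`train_nb`, `predict_proba`, `uncertainty`, `active_learning`) are assumptions chosen for illustration, and the `oracle` callback stands in for a human annotator. The expert feature-specification step is not modeled here.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled):
    """Train a tiny multinomial Naive Bayes model from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {label: posterior probability} using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        # +1 reserves probability mass for tokens never seen in training.
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        score = math.log(n / total)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}


def uncertainty(model, tokens):
    """Least-confidence score: 1 minus the highest posterior probability."""
    return 1.0 - max(predict_proba(model, tokens).values())


def active_learning(seed, pool, oracle, rounds):
    """Retrain, query the most uncertain pool item, label it, and repeat."""
    labeled = list(seed)
    pool = list(pool)
    queried = []
    for _ in range(min(rounds, len(pool))):
        model = train_nb(labeled)
        pick = max(pool, key=lambda toks: uncertainty(model, toks))
        pool.remove(pick)
        labeled.append((pick, oracle(pick)))  # oracle simulates the annotator
        queried.append(pick)
    return train_nb(labeled), queried


# Hypothetical toy data: disambiguating the abbreviation "pt".
seed = [
    (("pt", "admitted", "with", "fever"), "patient"),
    (("pt", "referral", "for", "gait", "training"), "physical_therapy"),
]
pool = [
    ("pt", "admitted", "with", "cough"),
    ("pt", "gait", "exercises", "daily"),
    ("pt", "seen", "today"),
]
gold = {
    pool[0]: "patient",
    pool[1]: "physical_therapy",
    pool[2]: "patient",
}

model, queried = active_learning(seed, pool, gold.get, rounds=2)
print(queried)
print(predict_proba(model, ("pt", "admitted", "with", "fever")))
```

Only two of the three pool items are sent to the simulated annotator in this run, which is the sense in which selecting informative samples can reduce annotation cost.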
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by HUA XU
Leveraging Longitudinal Data and Informatics Technology to Understand the Role of Bilingualism in Cognitive Resilience, Aging and Dementia
- Grant number: 10583170
- Fiscal year: 2023
- Funding amount: $558,400
- Project category:
Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD
- Grant number: 10501245
- Fiscal year: 2022
- Funding amount: $558,400
- Project category:
Engagement and outreach to achieve a FAIR data ecosystem for the BICAN
- Grant number: 10523908
- Fiscal year: 2022
- Funding amount: $558,400
- Project category:
Interactive machine learning methods for clinical natural language processing
- Grant number: 9132834
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8077875
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 7866149
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8589822
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8305149
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
An in-silico method for epidemiological studies using Electronic Medical Records
- Grant number: 8110041
- Fiscal year: 2009
- Funding amount: $558,400
- Project category:
An in-silico method for epidemiological studies using Electronic Medical Records
- Grant number: 7726747
- Fiscal year: 2009
- Funding amount: $558,400
- Project category:
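Similar National Natural Science Foundation of China (NSFC) grants
Identification, effects, and mechanisms of business leaders' positive psychological strengths: a study from the follower perspective
- Grant number: 71872117
- Approval year: 2018
- Funding amount: CNY 480,000
- Project category: General Program
Cognitive neural mechanisms by which positive background stimuli affect learning and memory
- Grant number: 31470980
- Approval year: 2014
- Funding amount: CNY 800,000
- Project category: General Program
Research on integrated SVM incremental learning mechanisms for large-scale spam filtering
- Grant number: 60970081
- Approval year: 2009
- Funding amount: CNY 310,000
- Project category: General Program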
Similar overseas grants
Diversity Supplement to Structure/Function of Transcription Complex Regulation to Support Predoctoral Student Christiana Binkley
- Grant number: 10351034
- Fiscal year: 2020
- Funding amount: $558,400
- Project category:
Oscillons in Wakefulness and in Sleep: Discrete Structure of Hippocampal Brain Rhythms
- Grant number: 10596532
- Fiscal year: 2019
- Funding amount: $558,400
- Project category:
Oscillons in Wakefulness and in Sleep: Discrete Structure of Hippocampal Brain Rhythms
- Grant number: 10226824
- Fiscal year: 2019
- Funding amount: $558,400
- Project category:
Oscillons in Wakefulness and in Sleep: Discrete Structure of Hippocampal Brain Rhythms
- Grant number: 10395559
- Fiscal year: 2019
- Funding amount: $558,400
- Project category:
Intranasal Nanodelivery of Oxytocin to Treat Morphine Addiction in HIV Patients by Gene Editing
- Grant number: 9411292
- Fiscal year: 2017
- Funding amount: $558,400
- Project category: