Interactive machine learning methods for clinical natural language processing
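Basic information
- Grant number: 8818096
- Principal investigator:
- Amount: $558,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-05-31 to 2018-09-28
- Project status: Completed
- Source: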
- Keywords: Abbreviations; Active Learning; Address; Adoption; Algorithms; Attention; Biomedical Research; Classification; Clinical; Clinical Data; Clinical Informatics; Clinical Research; Cognitive; Communities; Data; Data Set; Development; Disease; Educational workshop; Electronic Health Record; Face; Goals; Grant; Human; Hybrids; Knowledge; Label; Learning; Linguistics; Machine Learning; Manuals; Medical; Methodology; Methods; Modeling; Names; Natural Language Processing; Patients; Pattern; Performance; Pharmaceutical Preparations; Physicians; Process; Research; Research Personnel; Research Priority; Resources; Sampling; Solutions; Source; Specific qualifier value; Statistical Methods; Statistical Models; System; Technology; Testing; Text; Time; United States National Library of Medicine; base; clinical application; clinical phenotype; cohort; computer human interaction; computerized; cost; experience; improved; model development; novel; open source; statistics; success; tool; usability
Project abstract
DESCRIPTION (provided by applicant): Growing deployments of electronic health record (EHR) systems have made massive amounts of clinical data available electronically. However, much of the detailed clinical information about patients is embedded in narrative text and is not directly accessible to computerized clinical applications. Therefore, natural language processing (NLP) technologies, which can unlock information in narrative documents, have received great attention in the medical domain. Current state-of-the-art NLP approaches often involve building probabilistic models. However, the wide adoption of statistical methods in clinical NLP faces two grand challenges: 1) the lack of large annotated clinical corpora; and 2) the lack of methodologies that can efficiently integrate linguistic and domain knowledge with statistical learning. High-performance statistical NLP methods rely on large-scale, high-quality annotations of clinical text, but creating large annotated clinical corpora is time-consuming and costly because it often requires manual review by physicians. Moreover, the medical domain is knowledge-intensive. To achieve optimal performance, probabilistic models need to leverage medical domain knowledge. Therefore, methods that can efficiently integrate domain and expert knowledge with machine learning processes to quickly build high-quality probabilistic models with minimum annotation cost would be highly desirable for clinical text processing.
In this study, we propose to investigate interactive machine learning (IML) methods to address the above challenges in clinical NLP. An IML system builds a classification model iteratively: it actively selects informative samples for annotation based on models built on previously annotated samples, thus reducing the annotation cost of model development. More importantly, an IML system also incorporates human input into the learning process (e.g., an expert can specify important features for a classification task based on domain knowledge). Thus, IML is an ideal framework for efficiently integrating rule-based (via domain experts specifying features) and statistics-based (via different learning algorithms) approaches to clinical NLP. To achieve our goal, we propose three specific aims. In Aim 1, we plan to investigate different aspects of IML for word sense disambiguation, including developing new active learning algorithms and conducting cognitive usability analysis for efficient feature annotation by users. In Aim 2, to demonstrate the broad uses of IML, we will extend IML approaches to two other important clinical NLP classification tasks: named entity recognition and clinical phenotyping. Finally, in Aim 3, we propose to disseminate the IML methods and tools to the biomedical research community.
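The iterative sample-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the project's method or software: it assumes a toy multinomial Naive Bayes classifier, least-confidence sampling as the selection rule, and an invented example of disambiguating the abbreviation "RA". All function names, the sample sentences, and the `annotate` stand-in for a human expert are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled):
    """Train a multinomial Naive Bayes model from (tokens, label) pairs."""
    label_counts = Counter(label for _, label in labeled)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_proba(model, tokens):
    """Return {label: probability} using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        # The extra +1 reserves probability mass for unseen tokens.
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        score = math.log(n / total)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    peak = max(log_scores.values())
    exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}

def most_uncertain(model, pool):
    """Least-confidence sampling: index of the pool item whose top
    predicted class has the lowest probability."""
    return min(range(len(pool)),
               key=lambda i: max(predict_proba(model, pool[i]).values()))

def active_learning_loop(seed, pool, annotate, rounds):
    """Repeatedly retrain, pick the most uncertain sample, and ask the
    annotator (a callable standing in for a human expert) to label it."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(rounds, len(pool))):
        model = train_nb(labeled)
        tokens = pool.pop(most_uncertain(model, pool))
        labeled.append((tokens, annotate(tokens)))
    return train_nb(labeled), labeled

# Hypothetical toy data: two senses of the abbreviation "RA".
SEED = [
    ("ra joint pain swelling".split(), "rheumatoid arthritis"),
    ("ra pressure catheter echo".split(), "right atrium"),
]
POOL = [
    "ra joint swelling methotrexate".split(),
    "ra echo catheter dilated".split(),
    "ra noted on exam".split(),
]

def annotate(tokens):
    """Stand-in for the expert; always answers 'right atrium' here."""
    return "right atrium"

final_model, labeled = active_learning_loop(SEED, POOL, annotate, rounds=1)
print("Sample sent for annotation:", " ".join(labeled[-1][0]))
```

In this toy run, the sentence sharing no distinguishing words with either seed example is the one sent to the annotator, which is the behavior uncertainty-based selection is meant to produce. Expert-specified features, the other form of human input mentioned in the abstract, are not modeled in this sketch.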
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by HUA XU
Leveraging Longitudinal Data and Informatics Technology to Understand the Role of Bilingualism in Cognitive Resilience, Aging and Dementia
- Grant number: 10583170
- Fiscal year: 2023
- Funding amount: $558,400
- Project category:
Engagement and outreach to achieve a FAIR data ecosystem for the BICAN
- Grant number: 10523908
- Fiscal year: 2022
- Funding amount: $558,400
- Project category:
Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD
- Grant number: 10501245
- Fiscal year: 2022
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8077875
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8589822
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Interactive machine learning methods for clinical natural language processing
- Grant number: 9132834
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 7866149
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8305149
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
An in-silico method for epidemiological studies using Electronic Medical Records
- Grant number: 7726747
- Fiscal year: 2009
- Funding amount: $558,400
- Project category:
An in-silico method for epidemiological studies using Electronic Medical Records
- Grant number: 8298614
- Fiscal year: 2009
- Funding amount: $558,400
- Project category:
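Similar grants from the National Natural Science Foundation of China (NSFC)
Research on coordinated control strategies for urban electric vehicle charging and driving behavior and for transportation and power distribution networks, based on stigmergy learning
- Grant number: 62363022
- Approval year: 2023
- Funding amount: CNY 320,000
- Project category: Regional Science Fund Project
Research on joint scene and target recognition methods for SAR images based on active transfer learning
- Grant number: 62301250
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund Project
In situ dynamic hyperspectral monitoring and active learning optimization of quantum dot optical films
- Grant number: 22305015
- Approval year: 2023
- Funding amount: CNY 200,000
- Project category: Young Scientists Fund Project
Research on new active learning techniques for medical image processing tasks
- Grant number: 82372097
- Approval year: 2023
- Funding amount: CNY 480,000
- Project category: General Program
Research on intelligent fault diagnosis of key components in electric vehicle drivetrains based on active statistical transfer learning
- Grant number: 52305109
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund Project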
Similar overseas grants
Diversity Supplement to Structure/Function of Transcription Complex Regulation to Support Predoctoral Student Christiana Binkley
- Grant number: 10351034
- Fiscal year: 2020
- Funding amount: $558,400
- Project category:
Oscillons in Wakefulness and in Sleep: Discrete Structure of Hippocampal Brain Rhythms
- Grant number: 10596532
- Fiscal year: 2019
- Funding amount: $558,400
- Project category:
Oscillons in Wakefulness and in Sleep: Discrete Structure of Hippocampal Brain Rhythms
- Grant number: 10395559
- Fiscal year: 2019
- Funding amount: $558,400
- Project category:
Oscillons in Wakefulness and in Sleep: Discrete Structure of Hippocampal Brain Rhythms
- Grant number: 10226824
- Fiscal year: 2019
- Funding amount: $558,400
- Project category:
Intranasal Nanodelivery of Oxytocin to Treat Morphine Addiction in HIV Patients by Gene Editing
- Grant number: 9411292
- Fiscal year: 2017
- Funding amount: $558,400
- Project category: