POET: Consolidated, Comprehensive Clinical Text Preprocessing

Basic Information

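  • Grant number:
    7570254
  • Principal investigator:
  • Amount:
    $169,300
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2008
  • Funding country:
    United States
  • Project period:
    2008-09-30 to 2010-08-31
  • Project status:
    Completed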

Project Abstract

DESCRIPTION (provided by applicant): As electronic health records (EHRs) expand into clinical settings, interest in mining the data they contain has grown correspondingly, both for research and for clinical decision support. Informaticists are increasingly studying ways to mine EHR textual content. This trend matters because clinical text contains a wealth of information that is not represented anywhere else in the EHR. However, a low-level text-as-data issue presents a significant obstacle to the widespread use of available medical NLP systems: hand-typed clinical narratives in EHRs are usually ungrammatical; short or telegraphic in style; full of abbreviations, acronyms, and misspellings; formatted in templated or pseudo-tabular form; and contain embedded non-text, such as lists of laboratory values cut and pasted from elsewhere in the EHR. As we show in the Preliminary Studies section, this makes high-level processing by popular tools such as MedLEE and MetaMap effectively useless for all but a few "clean" document types, such as discharge summaries and consult reports (e.g., pathology or radiology reports). This in turn explains why so little has been published about what is certainly the preponderance of clinical texts: those that are not as well-behaved lexically and syntactically as a discharge summary.

In this application we distinguish clinical narratives (e.g., a progress note) from biomedical narratives (e.g., a PubMed abstract). We are interested in texts that arise in the clinical or research setting and that clinicians and researchers compose directly in a computer system. We propose to build and publish a tool called POET (Parsable Output Extracted from Text). POET will accept unstructured textual documents and return structured linguistic equivalents that are, to the extent possible, parsable by higher-level NLP engines.
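The low-level problems listed above can at least be detected line by line before a parser ever sees the text. The following is a hypothetical Python sketch, not POET's method: the regular expressions and the 50% numeric-token threshold in the illustrative `classify_line` function are assumptions chosen to separate templated `Label: value` lines and pasted laboratory values from telegraphic prose.

```python
import re

# Hypothetical heuristics (not POET's published method) for flagging
# low-level noise in a clinical note, one line at a time.
TEMPLATE_RE = re.compile(r"^\s*[A-Za-z][A-Za-z /]{0,30}:\s*\S")  # e.g. "Allergies: NKDA"
NUMERIC_TOKEN_RE = re.compile(r"^[<>]?\d+(\.\d+)?(/\d+)?%?$")     # e.g. "4.2", "120/80", ">10"


def classify_line(line: str) -> str:
    """Label a line as 'blank', 'non_text', 'template', or 'prose'."""
    tokens = line.split()
    if not tokens:
        return "blank"
    numeric = sum(1 for t in tokens if NUMERIC_TOKEN_RE.match(t))
    # Lines that are at least half numbers look like lab values pasted from elsewhere.
    if numeric / len(tokens) >= 0.5:
        return "non_text"
    # A short label followed by a colon and a value looks like a template field.
    if TEMPLATE_RE.match(line):
        return "template"
    return "prose"


note = """Allergies: NKDA
Na 138 K 4.2 Cl 101 HCO3 24
pt c/o SOB x 2d, denies CP"""

for line in note.splitlines():
    print(classify_line(line), "|", line)
```

A fuller preprocessor would route each class to its own handler (rewriting template lines, setting non-text aside) instead of only labeling it; the thresholds here would need tuning against real notes.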
POET will have a modular, extensible architecture based on open-source platforms and resources (e.g., Java, Perl, UMLS, NegEx, the Stanford Parser, HL7 Clinical Document Architecture, caGRID). To implement POET, we will collect, program, and evaluate both published and novel algorithms for acronym/abbreviation resolution, spelling correction, template and pseudo-table rewriting, and removal of embedded non-text. To test POET, we will use a large corpus of cross-discipline clinical note types (e.g., medical, nursing, pharmacy), as well as two kinds of clinical research text: MedWatch reports and IRB adverse event reports. POET's development will combine best practices from the literature with new research conducted as part of the project. To validate the fidelity of POET processing, we plan a formal analysis of information loss and information gain before and after processing. To ensure broad access, POET will be released under an open-source license. Finally, we plan to assess the feasibility of offering POET as a Web service for remote processing.
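Two of the algorithm families listed above, abbreviation resolution and spelling correction, can be sketched as interchangeable text-to-text stages, which is one simple way to get a modular, extensible shape. This is a minimal Python sketch under stated assumptions: the stage functions, the toy `ABBREVIATIONS` and `LEXICON` tables, and the `run_pipeline` helper are all hypothetical; the abbreviation lookup ignores context (real clinical abbreviations are often ambiguous); and spelling correction uses the standard library's `difflib.get_close_matches` as a stand-in, not as the method POET adopts.

```python
import difflib
import re
from typing import Callable, List

# Every stage maps text to text, so stages can be added, dropped, or reordered.
Stage = Callable[[str], str]

# Hypothetical toy resources; a real system would draw on curated sources
# (e.g. UMLS-derived abbreviation senses and a clinical lexicon).
ABBREVIATIONS = {"pt": "patient", "c/o": "complains of", "sob": "shortness of breath"}
LEXICON = {"patient", "denies", "chest", "pain", "fever", "cough"}


def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations (case-insensitive) with one fixed expansion.

    Context-free lookup: ambiguous abbreviations are not disambiguated here.
    """
    def expand(match: re.Match) -> str:
        token = match.group(0)
        return ABBREVIATIONS.get(token.lower(), token)

    return re.sub(r"[A-Za-z/]+", expand, text)


def correct_spelling(text: str, cutoff: float = 0.75) -> str:
    """Replace unknown words of 4+ letters with their closest lexicon entry, if any."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        if len(word) < 4 or word.lower() in LEXICON:
            return word
        close = difflib.get_close_matches(word.lower(), sorted(LEXICON), n=1, cutoff=cutoff)
        return close[0] if close else word

    return re.sub(r"[A-Za-z]+", fix, text)


def run_pipeline(text: str, stages: List[Stage]) -> str:
    """Apply each stage in order, feeding each one's output to the next."""
    for stage in stages:
        text = stage(text)
    return text


DEFAULT_STAGES: List[Stage] = [expand_abbreviations, correct_spelling]

note = "pt c/o SOB, denies fevr"
print(run_pipeline(note, DEFAULT_STAGES))
# → patient complains of shortness of breath, denies fever
```

Because every stage has the same `str -> str` signature, a template-rewriting or non-text-removal stage could be added by appending one function to the list. Order still matters: running spelling correction first could rewrite an abbreviation that happens to resemble a lexicon word.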

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by JOHN F. HURDLE

University of Utah Biomedical Informatics Training Grant Supplement
  • Grant number:
    9380137
  • Fiscal year:
    2016
  • Funding amount:
    $169,300
  • Project category:
POET-2: High-performance computing for advanced clinical narrative preprocessing
  • Grant number:
    8326648
  • Fiscal year:
    2011
  • Funding amount:
    $169,300
  • Project category:
POET-2: High-performance computing for advanced clinical narrative preprocessing
  • Grant number:
    8182025
  • Fiscal year:
    2011
  • Funding amount:
    $169,300
  • Project category:
POET: Consolidated, Comprehensive Clinical Text Preprocessing
  • Grant number:
    7689273
  • Fiscal year:
    2008
  • Funding amount:
    $169,300
  • Project category:
POET: Consolidated, Comprehensive Clinical Text Preprocessing
  • Grant number:
    7847940
  • Fiscal year:
    2008
  • Funding amount:
    $169,300
  • Project category:
Statistical NLP Analysis of Cross-discipline Clinical Text
  • Grant number:
    6836781
  • Fiscal year:
    2004
  • Funding amount:
    $169,300
  • Project category:
Statistical NLP Analysis of Cross-discipline Clinical Text
  • Grant number:
    6944955
  • Fiscal year:
    2004
  • Funding amount:
    $169,300
  • Project category:
University of Utah Biomedical Informatics Training Grant
  • Grant number:
    8681515
  • Fiscal year:
    1997
  • Funding amount:
    $169,300
  • Project category:
University of Utah Biomedical Informatics Training Grant
  • Grant number:
    8261299
  • Fiscal year:
    1997
  • Funding amount:
    $169,300
  • Project category:
University of Utah Biomedical Informatics Training Grant
  • Grant number:
    9086432
  • Fiscal year:
    1997
  • Funding amount:
    $169,300
  • Project category:

Similar Overseas Grants

A Double-Blinded Comparison of the Accuracy of ShuntCheck, a Non-Invasive Device
  • Grant number:
    8057207
  • Fiscal year:
    2011
  • Funding amount:
    $169,300
  • Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
  • Grant number:
    8077875
  • Fiscal year:
    2010
  • Funding amount:
    $169,300
  • Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
  • Grant number:
    7866149
  • Fiscal year:
    2010
  • Funding amount:
    $169,300
  • Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
  • Grant number:
    8305149
  • Fiscal year:
    2010
  • Funding amount:
    $169,300
  • Project category:
POET: Consolidated, Comprehensive Clinical Text Preprocessing
  • Grant number:
    7689273
  • Fiscal year:
    2008
  • Funding amount:
    $169,300
  • Project category: