A Document Processing System
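Basic Information
- Grant number: 8149592
- Principal investigator:
- Amount: $176,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source:
- Keywords: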
Project Abstract
A system of C++ programs has been developed to find closely related documents in MEDLINE and to perform machine learning on sets of documents. The system has several distinctive features: 1) It is built on a number of C++ classes and is highly modular, so alterations to the system are relatively simple to make. 2) The system currently processes PubMed data by extracting it from the Sybase repositories through a C++ interface to Sybase. Changing the interface portion of the system would allow it to be applied to any large database consisting of discrete textual records. 3) Data processed by the system is stored as compressed file structures, etc. These structures are updatable, so new data can be added continually as it becomes available. 4) Documents are compared with each other using a Bayesian form of analysis. 5) The latest work on the system has added the ability to generate themes using an EM algorithm approach. The code has also recently been multithreaded, and memory-mapping capabilities have been added to speed up processing.
The system described here is now used not only to process all of MEDLINE for our research purposes, but also by other groups inside and outside the NLM to produce related documents for arbitrary pieces of text. The system has also been used to mine email communications for the NLM help desk.
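The abstract does not give the formula behind its Bayesian document comparison, so the following is only a minimal sketch of the general idea, under stated assumptions. It swaps in a smoothed log-odds term weight of the Robertson-Sparck Jones kind as a stand-in and scores two documents by summing the weights of the terms they share. The names `Document`, `Corpus`, `add`, `weight`, `score`, and `most_related` are hypothetical and do not come from the system described above.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; not taken from the system described above.
using Document = std::set<std::string>;  // a document reduced to its distinct terms

class Corpus {
public:
    // Add a document and update document frequencies incrementally,
    // so new records can be appended at any time.
    void add(const Document& doc) {
        docs_.push_back(doc);
        for (const auto& term : doc) ++doc_freq_[term];
    }

    // Assumed weight: smoothed log-odds of a term being absent versus present
    // in a corpus document. Rare terms score high; terms found in more than
    // half of the documents score below zero.
    double weight(const std::string& term) const {
        const auto it = doc_freq_.find(term);
        const double df =
            (it == doc_freq_.end()) ? 0.0 : static_cast<double>(it->second);
        const double n = static_cast<double>(docs_.size());
        return std::log((n - df + 0.5) / (df + 0.5));
    }

    // Relatedness of two documents: sum of the weights of their shared terms.
    double score(const Document& a, const Document& b) const {
        double total = 0.0;
        for (const auto& term : a) {
            if (b.count(term) != 0) total += weight(term);
        }
        return total;
    }

    // Index of the highest-scoring document other than `query`.
    // Returns `query` itself when the corpus holds no other document.
    std::size_t most_related(std::size_t query) const {
        std::size_t best = query;
        double best_score = -std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < docs_.size(); ++i) {
            if (i == query) continue;
            const double s = score(docs_[query], docs_[i]);
            if (s > best_score) {
                best_score = s;
                best = i;
            }
        }
        return best;
    }

    const Document& at(std::size_t i) const { return docs_[i]; }
    std::size_t size() const { return docs_.size(); }

private:
    std::vector<Document> docs_;
    std::unordered_map<std::string, std::size_t> doc_freq_;
};
```

In this sketch, calling `most_related(i)` after a series of `add` calls returns the index of the document that shares the most distinctive terms with document `i`. Keeping the frequency table incremental mirrors the abstract's point that the stored structures are updatable, though the real storage format is not described in the source.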
Project Outputs
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

No data available
Data last updated: 2024-06-01
Other Grants by Willy Wilbur
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
- Grant number: 8344960
- Fiscal year:
- Amount: $176,300
- Project category:
General and Semi-supervised Machine Learning Applied to Bioinformatics
- Grant number: 8558105
- Fiscal year:
- Amount: $176,300
- Project category:
Natural Language Processing Techniques To Enhance Information Access.
- Grant number: 8943224
- Fiscal year:
- Amount: $176,300
- Project category:
PubMed Query Log Analysis and Use in Access Enhancement
- Grant number: 7969244
- Fiscal year:
- Amount: $176,300
- Project category:
Automatic Bayesian Methods In Text Retrieval
- Grant number: 8149591
- Fiscal year:
- Amount: $176,300
- Project category:
General and Semi-supervised Machine Learning Applied to Bioinformatics
- Grant number: 8149602
- Fiscal year:
- Amount: $176,300
- Project category:
General and Semi-supervised Machine Learning Applied to Bioinformatics
- Grant number: 8344948
- Fiscal year:
- Amount: $176,300
- Project category:
Similar NSFC (National Natural Science Foundation of China) Grants
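Research on a Hybrid Brain-Computer Interface Based on Motor Imagery and Visual Attention and Its Application to Email Communication
- Grant number: 61365013
- Approval year: 2013
- Amount: 450,000 yuan
- Project category: Regional Science Fund Project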
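Research on the Theory of Actionable Email and Its Intelligent Applications
- Grant number: 60673015
- Approval year: 2006
- Amount: 210,000 yuan
- Project category: General Program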
Similar Overseas Grants
RHODE ISLAND CHILDREN'S EQUITY AND DEVELOPMENT STUDY (ENRICHED) - PHASES 1 AND 2
- Grant number: 10923049
- Fiscal year: 2023
- Amount: $176,300
- Project category:
Development of a novel visualization, labeling, communication and tracking engine for human anatomy.
- Grant number: 10761060
- Fiscal year: 2023
- Amount: $176,300
- Project category:
DATA MANAGEMENT FOR CANCER DIAGNOSIS PROGRAM ACTIVITIES
- Grant number: 10849595
- Fiscal year: 2023
- Amount: $176,300
- Project category:
The Unvarnished Truth: Pursuing Health Equity by Correcting Disinformation Targeting African Americans about the FDA's Proposed Ban on Menthol Cigarettes and Flavored Cigars
- Grant number: 10740281
- Fiscal year: 2023
- Amount: $176,300
- Project category:
SCIENTIFIC AND PROGRAM SUPPORT SERVICES
- Grant number: 10942909
- Fiscal year: 2023
- Amount: $176,300
- Project category: