Textpresso, an information retrieval and extraction system for biological literature
Basic information
- Grant number: 7047977
- Amount: $300,000
- Country of host institution: United States
- Fiscal year: 2006
- Funding country: United States
- Project period: 2006-03-23 to 2009-01-31
- Project status: Completed
Project abstract
DESCRIPTION (provided by applicant):
An information retrieval and extraction system that processes the full text of biological papers will be developed. A prototype system has been in operation at WormBase for over a year, used by C. elegans researchers as well as WormBase biological curators, and has recently been implemented for yeast at SGD. The system, called Textpresso, separates text into sentences, labels words and phrases according to an ontology (an organized lexicon), and allows queries to be performed on a database of labeled sentences. The current ontology comprises 37 categories of terms, such as "gene," "regulation," "method," etc. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators at identifying sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a threefold increase in search efficiency.
This system will be further developed in three ways. First, the core system will be refined and altered to allow expansion to multiple domains of interest, e.g., model organisms, human disease. Simple modifications to the system and website functionality will be made, including synonyms, search phrases, and case sensitivity. A software package for local installation will be supported. The project team will maintain the Textpresso site (www.textpresso.org), which will include C. elegans and pilot systems, but a software package will be available for installation of Textpresso at local sites, e.g., SGD, FlyBase, etc. Second, the ontology will be structured somewhat more deeply and lexica expanded for organism- and field-specific terms. Third, algorithms for information extraction will be implemented. One approach will be the implementation of similarity measures using categories (high-level nodes) of the Textpresso ontology to reduce the dimensionality of associated vector spaces. A second approach will be the development of hidden Markov models to fill slots of a fact template based on the marked-up text. Information extracted will be presented to the user or expert curator.
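The labeling and similarity ideas described above can be illustrated with a minimal sketch. It is not the Textpresso implementation: the mini-lexicon, the example sentences, and every function name are hypothetical, and only the category names "gene," "regulation," and "method" come from the abstract. The sketch splits text into sentences, labels tokens with categories, runs a category-based query, and compares sentences with cosine similarity in a vector space that has one dimension per category rather than one per word.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon: term -> category. The category names come from
# the abstract; the terms are illustrative and not Textpresso's actual lexica.
LEXICON = {
    "lin-3": "gene",
    "let-23": "gene",
    "unc-54": "gene",
    "activates": "regulation",
    "represses": "regulation",
    "rnai": "method",
    "microscopy": "method",
}
CATEGORIES = ["gene", "regulation", "method"]


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentence(sentence):
    """Return (token, category) pairs for tokens found in the lexicon."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence.lower())
    return [(t, LEXICON[t]) for t in tokens if t in LEXICON]


def category_vector(sentence):
    """Count labels per category: one dimension per category, not per word."""
    counts = Counter(cat for _, cat in label_sentence(sentence))
    return [counts[c] for c in CATEGORIES]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(sentences, required):
    """Return sentences that contain a term from every required category."""
    hits = []
    for s in sentences:
        cats = {cat for _, cat in label_sentence(s)}
        if set(required) <= cats:
            hits.append(s)
    return hits


# Toy text for illustration only.
text = ("lin-3 activates let-23 in the vulva. "
        "unc-54 represses a toy target. "
        "Worms were imaged by microscopy.")
sentences = split_sentences(text)

for s in search(sentences, ["gene", "regulation"]):
    print(s, category_vector(s))
```

With 37 categories, each sentence would map to a 37-dimensional vector regardless of vocabulary size, which is the dimensionality reduction the abstract refers to. A real splitter would also need to handle abbreviations such as "C. elegans", which this naive regex would break apart.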
Public Description: The quality and pace of research depend upon rapid access to published information. This project will provide researchers with a search engine that rapidly gives them the detailed, technical information they want by indexing the complete text of research articles.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by PAUL Warren STERNBERG
Curation at scale: Integrating AI into community curation
- Grant number: 10621338
- Fiscal year: 2021
- Funding amount: $300,000
Curation at scale: Integrating AI into community curation
- Grant number: 10344771
- Fiscal year: 2021
- Funding amount: $300,000
Bipartite gene expression system for C. elegans genetic and neural circuit analysis
- Grant number: 9437389
- Fiscal year: 2017
- Funding amount: $300,000
Textpresso, information retrieval and extraction system for biological literature
- Grant number: 7347569
- Fiscal year: 2006
- Funding amount: $300,000
Textpresso, information retrieval and extraction system for biological literature
- Grant number: 7212077
- Fiscal year: 2006
- Funding amount: $300,000
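Similar grants from the National Natural Science Foundation of China
Regulatory mechanisms of multi-locus chromatin interactions and the coordinated transcription of multiple genes
- Grant number: 32370691
- Approval year: 2023
- Funding amount: 500,000 CNY
- Project category: General Program
Colorectal tumor formation in APC-knockout cynomolgus monkeys and its interaction with MDSCs
- Grant number: 32371190
- Approval year: 2023
- Funding amount: 500,000 CNY
- Project category: General Program
Molecular mechanisms of the development and progression of EBV-associated gastric cancer, explored through virus-host genome interactions
- Grant number: 82303931
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Molecular mechanism by which the zinc finger protein ZBTB20 regulates gene transcription through interaction with the thyroid hormone receptor
- Grant number: 32271162
- Approval year: 2022
- Funding amount: 540,000 CNY
- Project category: General Program
Multi-gene interactions studied through artificial genome rearrangement
- Grant number: 22208241
- Approval year: 2022
- Funding amount: 200,000 CNY
- Project category: Young Scientists Fund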
Similar overseas grants
Modeling PIEZO associated diseases in Caenorhabditis elegans: from genetics to mechanism
- Grant number: 10866791
- Fiscal year: 2023
- Funding amount: $300,000
The role of DOT1L methyltransferase in controlling the noncoding transcriptome
- Grant number: 10809451
- Fiscal year: 2023
- Funding amount: $300,000
Elucidation of mitochondrial mechanisms critical to mediating PFAS neurotoxicity
- Grant number: 10805097
- Fiscal year: 2023
- Funding amount: $300,000
Viral vector-mediated gene activation to facilitate large-scale genetic analysis in Caenorhabditis elegans.
- Grant number: 10572507
- Fiscal year: 2023
- Funding amount: $300,000
Genetic and molecular regulation of experience-dependent structural plasticity
- Grant number: 10562121
- Fiscal year: 2023
- Funding amount: $300,000