BIGDATA: Collaborative Research: F: Streaming Architecture for Continuous Entity Linking in Social Media
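Basic information
- Award number: 1546480
- Principal investigator:
- Amount: $783,300
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-01-01 to 2020-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract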
A large fraction of the ever-growing internet content is found in social media such as (micro)blogs. Users access it to both form and share their opinions about events and people, election preferences, product and brand recommendations. This situation provides opportunities to create added layers of data mining and analysis regarding users' views on developing events, products, services, or government actions; at the same time, it raises challenges for Entity Linking (EL) in social media. EL is the task of linking an extracted mention to a specific definition of the entity. The definition of an entity is usually a pointer to a Web page that defines the entity. Information extraction from social media generally faces many challenging issues due to: message volume, message speed (Twitter alone generates over 500 million messages per day), variety, free-form language, lack of context, large reference variation and language diversity. Hashtags are an essential part of the ethos of social networks. They are used to denote brands, events, people, social rallies, etc. The hashtag disambiguation problem is to detect synonymous hashtags and recognize the polysemic ones. For example, the hashtag '#BHaram' refers to the entity 'Boko Haram', defined at Wikipedia page en.wikipedia.org/wiki/Boko_Haram or at National Counterterrorism Center Web page www.nctc.gov/site/groups/boko_haram.html. The purpose of this project is to perform EL in social media. This work will benefit multiple segments of society that rely on applications using data from microblog systems, such as targeted monitoring of Twitter and Facebook to collect and understand users' opinions about a recent product or a world event; data aggregation (e.g., reviews about products and services); and data mining for early crisis detection and response as well as national security. This project is one more step towards addressing the government's latest initiative of fighting crime using big data.
The goals of this project are to research algorithms to detect in near real-time those pieces of text in messages that reference entities, Web pages that describe entities, and to link entity references to Web pages and across microblog systems so that together a broad, more complete characterization of each entity can be automatically generated. The proposed approaches are based on innovative techniques that include: incremental, iterative message analysis; smart indexing techniques with live updates to support fast incremental entity reference detection; computationally light soft-clustering of messages to improve entity reference detection; and fast incremental K-partite graph clustering. The resulting artifacts (e.g., software tools) will be made available to benefit researchers in academe and industry. Distribution of free, open-source software for implementing the techniques developed will enhance existing research infrastructure. The project will support and train at least three PhD students, as well as involve undergraduate students in research at Temple University and Binghamton University. The project web site (http://cis.temple.edu/~edragut/projects/nimel.htm) includes more information on the project, software, datasets, educational materials, and publications.
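To make the hashtag disambiguation task described in the abstract concrete, here is a minimal Python sketch. It is not the project's method: the alias table, the context keywords, the `#jaguar` example, and all function names are hypothetical, and simple keyword overlap stands in for the incremental indexing and clustering techniques the abstract proposes. Only the `#BHaram` to Boko Haram mapping comes from the abstract.

```python
# Hypothetical alias table: normalized hashtag -> candidate entity pages.
# Only the '#BHaram' entry is taken from the abstract; the rest is invented.
ALIASES = {
    "bharam": ["en.wikipedia.org/wiki/Boko_Haram"],
    "bokoharam": ["en.wikipedia.org/wiki/Boko_Haram"],
    "jaguar": [
        "en.wikipedia.org/wiki/Jaguar",
        "en.wikipedia.org/wiki/Jaguar_Cars",
    ],
}

# Hypothetical context keywords used to choose among candidates.
CONTEXT = {
    "en.wikipedia.org/wiki/Boko_Haram": {"nigeria", "attack", "militants"},
    "en.wikipedia.org/wiki/Jaguar": {"cat", "jungle", "wildlife"},
    "en.wikipedia.org/wiki/Jaguar_Cars": {"car", "engine", "drive"},
}


def normalize(hashtag):
    """Strip the leading '#' and lowercase the tag."""
    return hashtag.lstrip("#").lower()


def candidates(hashtag):
    """Return the candidate entity pages for a hashtag (possibly empty)."""
    return ALIASES.get(normalize(hashtag), [])


def is_polysemous(hashtag):
    """A hashtag is polysemous here if it has more than one candidate entity."""
    return len(candidates(hashtag)) > 1


def are_synonymous(tag_a, tag_b):
    """Two hashtags are synonymous here if both resolve to the same single entity."""
    a, b = set(candidates(tag_a)), set(candidates(tag_b))
    return len(a) == 1 and a == b


def link(hashtag, message):
    """Link a hashtag to the candidate whose keywords best overlap the message."""
    cands = candidates(hashtag)
    if not cands:
        return None
    words = set(message.lower().split())
    # On ties, max() keeps the first candidate listed in ALIASES.
    return max(cands, key=lambda e: len(CONTEXT.get(e, set()) & words))


if __name__ == "__main__":
    print(link("#BHaram", "militants attack in nigeria"))
    print(are_synonymous("#BHaram", "#BokoHaram"))
    print(is_polysemous("#jaguar"))
    print(link("#jaguar", "new engine for my car"))
```

A static dictionary like this would not cope with the message volume and reference variation the abstract describes; it only shows what "synonymous" and "polysemic" hashtags mean in terms of mention-to-page links.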
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Eduard Dragut
Proto-OKN Theme 1: Knowledge Graph to Support Evaluation and Development of Climate Models
- Award number: 2333789
- Fiscal year: 2023
- Funding amount: $783,300
- Project type: Cooperative Agreement
NSF Convergence Accelerator Track F: America's Fourth Estate at Risk: A System for Mapping the (Local) Journalism Life Cycle to Rebuild the Nation's News Trust
- Award number: 2137846
- Fiscal year: 2021
- Funding amount: $783,300
- Project type: Standard Grant
III: Medium: Collaborative Research: Extracting and Linking AI Artifacts
- Award number: 2107213
- Fiscal year: 2021
- Funding amount: $783,300
- Project type: Continuing Grant
BIGDATA: F: Collaborative Research: Collective Mining of Vertical Social Communities
- Award number: 1838145
- Fiscal year: 2018
- Funding amount: $783,300
- Project type: Standard Grant
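Similar grants from the National Natural Science Foundation of China (NSFC)
Research on the relationships among team human capital hierarchy types, team collaboration processes, and team effectiveness outcomes in the context of digital intelligence
- Award number: 72372084
- Approval year: 2023
- Funding amount: 400,000 CNY
- Project type: General Program
Research on intelligent planning and interactive collaboration for craniomaxillofacial surgical robot-assisted distraction osteogenesis in hemifacial microsomia
- Award number:
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project type:
Research on key technologies of multi-agent manufacturing systems for autonomous cognition and crowd-intelligence collaboration
- Award number: 52305539
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project type: Young Scientists Fund
Research on integrated methods for multi-collaborative green information sensing and intelligent response decision-making in the large-scale Internet of Things
- Award number: 62371149
- Approval year: 2023
- Funding amount: 490,000 CNY
- Project type: General Program
Research on a concurrent charging model and its service mechanism for large-scale sensor networks with multi-UAV collaboration
- Award number: 62362017
- Approval year: 2023
- Funding amount: 320,000 CNY
- Project type: Regional Science Fund Project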
Similar overseas grants
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
- Award number: 2348159
- Fiscal year: 2023
- Funding amount: $783,300
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
- Award number: 2308649
- Fiscal year: 2022
- Funding amount: $783,300
- Project type: Standard Grant
BigData:IA:Collaborative Research: TIMES: A tensor factorization platform for spatio-temporal data
- Award number: 2034479
- Fiscal year: 2020
- Funding amount: $783,300
- Project type: Standard Grant
BIGDATA: Collaborative Research: F: Holistic Optimization of Data-Driven Applications
- Award number: 2027516
- Fiscal year: 2020
- Funding amount: $783,300
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
- Award number: 1934319
- Fiscal year: 2019
- Funding amount: $783,300
- Project type: Standard Grant