RI: Small: DaRE: Detection and Recognition of Euphemisms
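Basic Information
- Award number: 2226006
- Principal investigator:
- Amount: $564,100
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Duration: 2023-01-01 to 2025-12-31
- Status: Ongoing
- Source:
- Keywords: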
Project Abstract
To fully understand human language, machines need to be able to recognize and interpret expressions that contain hidden meanings. This project concentrates on euphemisms: mild or indirect phrases used in place of harsher or more offensive ones. Euphemisms are often used to mask profanity or to refer politely to sensitive topics such as death, sex, religion, disability, or personal relationships. People use euphemisms all the time, e.g., 'negative patient outcome', 'between jobs', 'financially fortunate', 'correctional facility', 'friendly fire', or 'sunshine unit'. Different cultures and languages use different euphemisms, and euphemisms change over time. Machines that process human language do not yet understand euphemisms. This project is devoted to making machines understand euphemisms in different languages, thereby contributing to improving the capabilities of artificial intelligence. Additional benefits include interesting new generalizations about the nature of euphemisms and the training of a diverse cadre of undergraduate and graduate students in highly practical work on a difficult interdisciplinary problem. Montclair State University, a Hispanic-Serving Institution, is known for its diverse student population and a large proportion of first-generation college students. The university places great emphasis on justice and inclusivity in academia, and this project is no exception.
Detecting and interpreting figurative language is a rapidly growing area in Natural Language Processing (NLP). Unfortunately, the processing of euphemisms has so far been largely absent from NLP. The project addresses two problems: 1) the design of algorithms for detecting and interpreting euphemisms, and 2) the interpretability of black-box neural models, by creating a series of new datasets and tasks that explore the embedding space of transformer language models for euphemism recognition.
The key insights are: 1) euphemistic expressions and their paraphrased counterparts differ in the strength of the sentiment they convey; 2) whether an expression receives a euphemistic or a non-euphemistic interpretation depends on context; 3) euphemisms are vaguer than the taboo expressions they replace. The experiments test which linguistic properties of euphemisms deep learning approaches capture, and why. The resulting algorithm can detect new euphemisms, not previously recorded in dictionaries, without human intervention. Computational work on euphemisms is important for furthering our understanding of how the strategic use of language can bias people's perceptions of important and highly contentious actions, and it may point to ways to de-bias language models. This work also helps reveal which topics are controversial or sensitive in a specific culture. Applying the algorithm to diachronic data and detecting changes in euphemism usage leads to a better understanding of cultural change. The corpora produced are useful for answering questions at the intersection of AI, NLP, linguistics, cultural anthropology, and social psychology. The range of languages covered provides a natural way to make interesting linguistic observations about euphemisms. Since euphemisms are a form of verbal behavior, finding a way to detect and interpret them automatically may lead to a better understanding of human behavior in general.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
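The first key insight in the abstract (a euphemism and its plain paraphrase differ in sentiment strength) can be illustrated with a toy comparison. The sketch below is not the project's detection algorithm; it assumes a tiny hand-written valence lexicon (`TOY_VALENCE`, with scores invented purely for illustration) and hypothetical helper names `sentiment_strength` and `is_softer`. Sentiment strength is taken, as an assumption, to be the mean absolute valence of the phrase's tokens that appear in the lexicon.

```python
"""Toy illustration: a euphemism should carry weaker sentiment than its plain paraphrase."""

# Hypothetical valence scores in [-1, 1]; invented for illustration only,
# not taken from the project or any published sentiment resource.
TOY_VALENCE = {
    "unemployed": -0.6,
    "between": 0.0,
    "jobs": 0.1,
    "prison": -0.7,
    "correctional": -0.2,
    "facility": 0.0,
}


def sentiment_strength(phrase: str, lexicon: dict[str, float] = TOY_VALENCE) -> float:
    """Mean absolute valence of the phrase's lexicon tokens (0.0 if none are known)."""
    scores = [abs(lexicon[tok]) for tok in phrase.lower().split() if tok in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def is_softer(euphemism: str, paraphrase: str) -> bool:
    """True if the euphemism conveys weaker sentiment than its plain paraphrase."""
    return sentiment_strength(euphemism) < sentiment_strength(paraphrase)


if __name__ == "__main__":
    # Euphemism/paraphrase pairs built from examples mentioned in the abstract.
    pairs = [("between jobs", "unemployed"), ("correctional facility", "prison")]
    for euph, plain in pairs:
        print(f"{euph!r} vs {plain!r}: softer={is_softer(euph, plain)}")
```

A context-free lexicon like this cannot capture the second insight: the same phrase (e.g., 'between jobs') can be used literally in some sentences, so a realistic detector would need to model the surrounding context.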
Project Outputs
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A Report on the Euphemisms Detection Shared Task
- DOI: 10.48550/arxiv.2211.13327
- Published: 2022-11
- Journal:
- Impact factor: 0
- Authors: Patrick Lee; Anna Feldman; J. Peng
- Corresponding author(s): Patrick Lee; Anna Feldman; J. Peng
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Ghosh, Debanjan
- Corresponding author(s): Ghosh, Debanjan
NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification
- DOI: 10.48550/arxiv.2305.10971
- Published: 2023-05
- Journal:
- Impact factor: 0
- Authors: Iyanuoluwa Shode; David Ifeoluwa Adelani; J. Peng; Anna Feldman
- Corresponding author(s): Iyanuoluwa Shode; David Ifeoluwa Adelani; J. Peng; Anna Feldman
Other publications by Anna Feldman
WordPrep: Word-based Preposition Prediction Tool
- DOI: 10.1109/bigdata47090.2019.9005608
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Pooja Bhagat; A. Varde; Anna Feldman
- Corresponding author(s): Anna Feldman
Experiments in Cross-Language Morphological Annotation Transfer
- DOI: 10.1007/11671299_4
- Published: 2006
- Journal:
- Impact factor: 0
- Authors: Anna Feldman; Jirka Hana; Chris Brew
- Corresponding author(s): Chris Brew
Evaluating and automating the annotation of a learner corpus
- DOI: 10.1007/s10579-013-9226-3
- Published: 2013
- Journal:
- Impact factor: 2.7
- Authors: Alexandr Rosen; Jirka Hana; Barbora Stindlová; Anna Feldman
- Corresponding author(s): Anna Feldman
Legend at ArAIEval Shared Task: Persuasion Technique Detection using a Language-Agnostic Text Representation Model
- DOI: 10.48550/arxiv.2310.09661
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: O. E. Ojo; O. O. Adebanji; Hiram Calvo; Damian O. Dieke; Olumuyiwa E. Ojo; S.E. Akinsanya; Tolulope O. Abiola; Anna Feldman
- Corresponding author(s): Anna Feldman
Linguistic Fingerprints of Internet Censorship: the Case of SinaWeibo
- DOI: 10.1609/aaai.v34i01.5381
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Kei Yin Ng; Anna Feldman; Jing Peng
- Corresponding author(s): Jing Peng
Other grants by Anna Feldman
Workshop on Natural Language Processing for Internet Freedom
- Award number: 1828199
- Fiscal year: 2018
- Amount: $564,100
- Award type: Standard Grant
Student Support at the North American Association for Computational Linguistics Workshop on Computational Methods for Analysis of Narrative
- Award number: 1523285
- Fiscal year: 2015
- Amount: $564,100
- Award type: Standard Grant
RI: Small: RUI: AIR: Automatic Idiom Recognition
- Award number: 1319846
- Fiscal year: 2013
- Amount: $564,100
- Award type: Standard Grant
Undergraduate Research: Cross-Lingual Approaches to Morphosyntactic Tagging
- Award number: 1033275
- Fiscal year: 2010
- Amount: $564,100
- Award type: Continuing Grant
RI:EAGER: A Montclair Group in Cognitive and Computational Aspects of Language and Speech Processing: An Exploration
- Award number: 1048406
- Fiscal year: 2010
- Amount: $564,100
- Award type: Standard Grant
RI: Small: RUI: Resource-light Morphosyntactic Tagging of Morphologically Complex Languages
- Award number: 0916280
- Fiscal year: 2009
- Amount: $564,100
- Award type: Standard Grant
Workshop on Computational Approaches to Linguistic Creativity - Element 7495
- Award number: 0906244
- Fiscal year: 2009
- Amount: $564,100
- Award type: Standard Grant
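Similar NSFC (National Natural Science Foundation of China) grants
Role and mechanism of theranostic PS-Hc@MB combined with training in mediating rehabilitation from cerebral small vessel disease
- Award number: 82372561
- Approval year: 2023
- Amount: CNY 490,000
- Award type: General Program
Mechanism by which the MECOM/HBB pathway in non-small cell lung cancer mediates abnormal heme metabolism and inhibits ferroptosis of tumor-initiating cells
- Award number: 82373082
- Approval year: 2023
- Amount: CNY 490,000
- Award type: General Program
Role and mechanism of cerebral small vessel disease in gait disorders of Parkinson's disease, investigated through cholinergic cortical projection fibers
- Award number: 82301663
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Upper-bound estimates for small prime solutions of Diophantine equations
- Award number: 12301005
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Role and mechanism of the P2X7 receptor in olfactory bulb microglia in Parkinson's disease-like changes arising in allergic rhinitis
- Award number: 82371119
- Approval year: 2023
- Amount: CNY 490,000
- Award type: General Program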
Similar international grants
Powering Small Craft with a Novel Ammonia Engine
- Award number: 10099896
- Fiscal year: 2024
- Amount: $564,100
- Award type: Collaborative R&D
"Small performances": investigating the typographic punches of John Baskerville (1707-75) through heritage science and practice-based research
- Award number: AH/X011747/1
- Fiscal year: 2024
- Amount: $564,100
- Award type: Research Grant
人工知能に基づく非線形高次元小標本データ解析とその社会的応用 (Nonlinear high-dimensional small-sample data analysis based on artificial intelligence and its social applications)
- Award number: 24K14847
- Fiscal year: 2024
- Amount: $564,100
- Award type: Grant-in-Aid for Scientific Research (C)
Fragment to small molecule hit discovery targeting Mycobacterium tuberculosis FtsZ
- Award number: MR/Z503757/1
- Fiscal year: 2024
- Amount: $564,100
- Award type: Research Grant
Bacteriophage control of host cell DNA transactions by small ORF proteins
- Award number: BB/Y004426/1
- Fiscal year: 2024
- Amount: $564,100
- Award type: Research Grant