RI: Small: DaRE: Detection and Recognition of Euphemisms


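Basic Information

Award number:
2226006
Principal investigator:
Anna Feldman
Amount:
About $564,100
Host institution:
Montclair State University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Duration:
2023-01-01 to 2025-12-31
Project status:
Ongoing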

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2226006&HistoricalAwards=false
Keywords:
RI Small DaRE Detection Recognition

Project Abstract

To fully understand human language, machines need to be able to recognize and interpret expressions that contain hidden meanings. This project concentrates on euphemisms, mild or indirect phrases used in place of harsher or more offensive ones. Euphemisms are often used to mask profanity or refer to sensitive topics such as death, sex, religion, disability, or personal relationships in a polite way. People use euphemisms all the time, e.g., 'negative patient outcome', 'between jobs', 'financially fortunate', 'correctional facility', 'friendly fire', or 'sunshine unit'. Different cultures/languages use different euphemisms. Euphemisms change over time. Machines that process human language do not understand euphemisms yet. This project is devoted to making machines understand euphemisms in different languages, and therefore contributing to improving the capabilities of artificial intelligence. Additional benefits include interesting new generalizations about the nature of euphemisms and the training of a diverse cadre of undergraduate and graduate students in highly practical work on a difficult interdisciplinary problem. Montclair State University, a Hispanic Serving Institution, is known for its diverse student population and a large proportion of first-generation college students. Montclair State University puts great emphasis on justice and inclusivity in academia. This project is not an exception.

Detecting and interpreting figurative language is a rapidly growing area in Natural Language Processing (NLP). Unfortunately, the processing of euphemisms is lacking in NLP thus far. The project addresses the following: 1) algorithm design for detecting and interpreting euphemisms, and 2) interpretability of black-box neural models by creating a series of new datasets and tasks that explore the embedding space of transformer language models for euphemism recognition. The key insights are 1) euphemistic expressions and their paraphrased counterparts differ in the strength of the sentiment they convey; 2) euphemistic and non-euphemistic interpretation is context-sensitive; 3) euphemisms are vaguer than the taboo expressions they substitute. The experiments test what linguistic properties of euphemisms the deep learning approaches capture and why. The algorithm developed can detect new euphemisms, not previously recorded in dictionaries, without human intervention. The computational work on euphemisms is important to further the understanding of how strategic use of language can bias people's perceptions of important and highly contentious actions and perhaps find ways how to de-bias language models. This work on euphemisms helps understand what topics are controversial or sensitive in a specific culture. Applying the algorithm to diachronic data and detecting the change in euphemism usage leads to a better understanding of culture changes. The corpora produced are useful for answering questions at the intersection of AI, NLP, linguistics, cultural anthropology, and social psychology. The range of languages provides a natural way of making interesting linguistic observations about euphemisms. Since euphemisms are a form of verbal behavior, finding a way to detect and interpret euphemisms automatically may lead to a better understanding of human behavior in general.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.


Project Outputs

Journal articles (6)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A Report on the Euphemisms Detection Shared Task

DOI:
10.48550/arxiv.2211.13327
Publication date:
2022-11
Journal:
ArXiv
Impact factor:
0
Authors:
Patrick Lee;Anna Feldman;J. Peng
Corresponding authors:
Patrick Lee;Anna Feldman;J. Peng

Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

DOI:
Publication date:
2022
Journal:
Association for Computational Linguistics
Impact factor:
0
Authors:
Ghosh, Debanjan
Corresponding author:
Ghosh, Debanjan

NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification

DOI:
10.48550/arxiv.2305.10971
Publication date:
2023-05
Journal:
ArXiv
Impact factor:
0
Authors:
Iyanuoluwa Shode;David Ifeoluwa Adelani;J. Peng;Anna Feldman
Corresponding authors:
Iyanuoluwa Shode;David Ifeoluwa Adelani;J. Peng;Anna Feldman


Other publications by Anna Feldman

WordPrep: Word-based Preposition Prediction Tool

DOI:
10.1109/bigdata47090.2019.9005608
Publication date:
2019
Journal:
2019 IEEE International Conference on Big Data (Big Data)
Impact factor:
0
Authors:
Pooja Bhagat;A. Varde;Anna Feldman
Corresponding author:
Anna Feldman

Experiments in Cross-Language Morphological Annotation Transfer

DOI:
10.1007/11671299_4
Publication date:
2006
Journal:
BioMed Research International
Impact factor:
0
Authors:
Anna Feldman;Jirka Hana;Chris Brew
Corresponding author:
Chris Brew

Evaluating and automating the annotation of a learner corpus

DOI:
10.1007/s10579-013-9226-3
Publication date:
2013
Journal:
Language Resources and Evaluation
Impact factor:
2.7
Authors:
Alexandr Rosen;Jirka Hana;Barbora Stindlová;Anna Feldman
Corresponding author:
Anna Feldman

Legend at ArAIEval Shared Task: Persuasion Technique Detection using a Language-Agnostic Text Representation Model

DOI:
10.48550/arxiv.2310.09661
Publication date:
2023
Journal:
Impact factor:
0
Authors:
O. E. Ojo;O. O. Adebanji;Hiram Calvo;Damian O. Dieke;Olumuyiwa E. Ojo;S.E. Akinsanya;Tolulope O. Abiola;Anna Feldman
Corresponding author:
Anna Feldman