Euphemisms are an indispensable communicative device in language use, and their study has long been an active topic in linguistics. In natural language processing, however, euphemisms have not yet been studied. Drawing on existing print dictionaries, and using corpus retrieval combined with manual judgment by experts, this paper takes a first step toward building a Chinese euphemism language resource containing more than 63,000 corpus entries. Euphemisms are classified according to the needs of natural language processing tasks, in combination with their dictionary definitions. The paper also proposes a method that uses the contexts of euphemisms in the same category to assist annotation. In experiments, semantic discrimination accuracy reaches 89.71% for euphemisms with simple semantics and 74.65% for semantically complex euphemisms that belong to multiple categories, providing preliminary evidence that computer-assisted manual annotation is a feasible way to construct euphemism language resources.