FreeTxt: supporting bilingual free-text survey and questionnaire data analysis


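Basic information

Grant number:
AH/W004844/1
Principal investigator:
Dawn Knight
Amount:
approximately $102,800
Host institution:
CARDIFF UNIVERSITY
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to (no data)
Project status:
Completed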

Source:
https://gtr.ukri.org/projects?ref=AH%2FW004844%2F1
Keywords:
FreeTxt supporting bilingual free text

Project abstract

In a modern consumer-led culture, obtaining and responding to qualitative feedback (often free-text comments or written feedback) is embedded in professional practice in many walks of life. Surveys are used, for example, in staff development, professional training, product design and testing, and in various forms of service provision across the public and private sectors. Surveys and questionnaires often produce a combination of quantitative and qualitative data. Quantitative forms, such as rating scales (e.g. Likert scale responses), multiple-choice questions and rank-order questions, can be quantified with ease and analysed in a systematic and often automated way. By contrast, more qualitative questions, which prompt open-ended, free-text responses, pose a more difficult challenge for the analyst, as does, in the tourism and heritage sector, written feedback on exhibitions, events and/or historical sites posted on social media channels or websites including Trip Advisor and Trust Pilot. Tackling written, text-based feedback often requires a more labour-intensive and manual approach to analysis. This challenge is compounded where feedback is presented in both English and Welsh, as is often the case in Wales, which represents the largest bilingual community in the UK. The successful analysis of bilingual data relies on the workforce having the appropriate linguistic expertise to process it.
While a range of sophisticated digital tools for the analysis of text-based data is available, particularly for researchers working in academia and in marketing and public relations contexts, many of these resources are not necessarily affordable, quick and easy to use, or accessible to non-expert users. Specifically, these tools currently do not fully support the task of systematically processing free-text responses in Welsh.
This project aims to bridge this gap by building the novel 'FreeTxt' toolkit, which is designed to support the analysis and visualisation of multiple forms of open-ended, free-text data in both English and Welsh. FreeTxt will draw on existing open-source bilingual corpus-based utilities and methodologies, repackaging these and taking them in a new direction so that they are relevant to new audiences and user groups. We will work closely with project partners Cadw and National Trust Wales to co-design, co-construct and test FreeTxt to ensure that the resource is fit for purpose and fairly and consistently meets the needs of Welsh and English-language responses.
Existing tools that we will draw on include those developed as part of the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh). These include CorCenCC's semantic taggers (i.e. meaning-based categorisations of individual words and phrases) and part-of-speech (POS) taggers (i.e. grammar-based categorisations of individual words and phrases, e.g. nouns, verbs), the associated tagsets for the Welsh language, and corpus functionalities for querying language, amongst others. These tools will be integrated into a user-friendly online interface into which users can paste or upload their texts in order to search for patterns of meaning that emerge in survey responses and feedback; to see which words are most often used in relation to a given theme, place or topic; and to understand what visitors particularly enjoyed about a service or attraction and what they think could be improved.
The final version of the tool will be made freely available and will be adaptable in terms of who can use it and when. It will contain generic analysis features that enable it to be used by any public and/or professional company or institution dealing with varying datasets of qualitative survey data, and it will be of relevance to academic researchers analysing and visualising survey data. The accessibility and usability of this tool will help provide a direct route to potential impact.
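
As a rough illustration of one feature described in the abstract, seeing which words are most often used in relation to a given theme, the following minimal Python sketch counts the words that co-occur with a theme word across free-text responses. It is not FreeTxt's implementation and does not use the CorCenCC taggers; the function names, the tiny stop-word list and the sample responses are all hypothetical and exist only for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (English and Welsh) for
# illustration only; a real analysis would use much fuller lists.
STOP_WORDS = {
    "the", "a", "an", "and", "was", "were", "is", "but", "to", "of", "in",
    "y", "yr", "ac", "roedd", "yn", "i", "o",
}


def tokenize(text):
    """Lowercase a response and split it into runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def words_near_theme(responses, theme, top_n=5):
    """Return the most frequent words in responses that mention `theme`."""
    counts = Counter()
    for response in responses:
        tokens = tokenize(response)
        if theme in tokens:
            # Count every other content word in a response that mentions the theme.
            counts.update(
                t for t in tokens if t != theme and t not in STOP_WORDS
            )
    return counts.most_common(top_n)


# Invented sample feedback, not project data.
responses = [
    "The castle was beautiful but the cafe was expensive",
    "Beautiful castle and friendly staff",
    "Parking was difficult",
    "Roedd y castell yn hardd",
]

print(words_near_theme(responses, "castle"))
print(words_near_theme(responses, "castell"))
```

The sketch treats each response as a bag of words, so it only shows raw co-occurrence. The semantic and part-of-speech tagging described in the abstract would allow grouping by meaning or grammatical category instead of by surface form.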
