Identifying & Classifying Bias in Cultural Heritage Catalogues: Applying Natural Language Processing to University of Edinburgh Archival Descriptions
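Basic Information
- Grant number: 2356289
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United Kingdom
- Project category: Studentship
- Fiscal year: 2020
- Funding country: United Kingdom
- Duration: 2020 to (no data)
- Project status: Completed
- Source:
- Keywords: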
Project Abstract
The objective of this project is to develop a context-informed approach to bias detection, carried out as a series of case studies beginning with the University of Edinburgh's Archive. Motivated by separate yet related strands of research in Natural Language Processing (NLP) and Cultural Heritage, the project identifies an opportunity to improve large-scale, automated bias detection. Taking a cross-disciplinary approach, the project applies NLP and data visualisation to archival descriptions. NLP approaches such as topic modelling and sentiment analysis will be used to analyse and classify the language of the Archive's descriptions. Because bias is context-dependent, data visualisation is a suitable way to present the results of the NLP analysis. Interactive data visualisations will place the results in their associated geographic areas and time periods, enabling people to see how Archive items are associated with different types of bias. The project will propose a visualisation framework for presenting bias in human language content; to the author's knowledge, no such framework has yet been proposed. Rather than eliminating bias, the project seeks to identify and classify it, arguing that bias deserves a place in cultural heritage institutions.

Bias, though problematic when one-sided, is informative when presented transparently. Bias communicates the perspective of specific groups of people during specific periods in history; recording historical biases informs our understanding of how society has evolved and of the various perspectives that have existed on a topic [1]. Identifying different types of bias helps researchers understand how representative their dataset is: the presence of more types of bias suggests a more representative dataset.
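The abstract names topic modelling and sentiment analysis as the project's NLP methods and says results will be presented by time period. As a much simpler stand-in, the Python sketch below flags explicit, term-level bias in descriptions with a lexicon lookup and counts flagged descriptions per decade, the kind of table a timeline visualisation could read. It is an illustration under stated assumptions, not the project's method: the `Description` record, the `flag_terms` and `counts_by_decade` functions, and the two-entry `LEXICON` are all hypothetical, and a lexicon lookup cannot detect implicit bias (missing information).

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Description:
    """Hypothetical catalogue record: free-text description plus a year."""
    text: str
    year: int


# Hypothetical, illustrative lexicon (single lowercase words -> bias category).
# A real term list would need to be curated with archivists, not hard-coded.
LEXICON = {"spinster": "gendered", "primitive": "racialised"}


def flag_terms(text: str, lexicon: dict[str, str]) -> set[str]:
    """Return the bias categories whose lexicon words occur in `text`."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {category for word, category in lexicon.items() if word in tokens}


def counts_by_decade(records: list[Description],
                     lexicon: dict[str, str]) -> dict[int, dict[str, int]]:
    """Count flagged descriptions per decade and category for a timeline view."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for record in records:
        decade = record.year // 10 * 10  # e.g. 1935 -> 1930
        for category in flag_terms(record.text, lexicon):
            counts[decade][category] += 1
    # Decades with no flagged descriptions are simply omitted.
    return {decade: dict(c) for decade, c in sorted(counts.items())}


if __name__ == "__main__":
    records = [
        Description("Letters of a spinster schoolteacher", 1935),
        Description("Photographs of a primitive village", 1938),
        Description("Minutes of the University Court", 1972),
        Description("Diary of a spinster living in Leith", 1974),
    ]
    print(counts_by_decade(records, LEXICON))
    # {1930: {'gendered': 1, 'racialised': 1}, 1970: {'gendered': 1}}
```

Matching single lowercase tokens keeps the sketch short, but it misses multi-word phrases and ignores how a word is used, which is one reason the abstract calls for a context-informed approach; a fuller pipeline would replace `flag_terms` with a trained classifier.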
This project seeks to develop techniques for identifying and classifying bias that will bring value to cultural heritage institutions and the public they serve, making bias transparent in human language content ranging from archival descriptions to social media posts.

The project begins developing bias-detection technology with a case study of free-text, human-written archival descriptions. Cataloguers first wrote archival descriptions on paper in the 1930s, and then in databases from the 1970s onwards. Explicitly, the language of archival descriptions reflects its historical context, using terms today considered racist, sexist or otherwise inappropriately biased. Implicitly, missing information about certain groups of people in archival descriptions reflects historical biases. These explicit and implicit types of bias can also be found in textual data beyond cultural heritage catalogues, such as newspapers and social media posts. As a result, while improving the transparency of the Archive's descriptions, the outcomes of this project could also inform research on returning representative search results [5], implementing fair algorithms [2], and identifying bias in social media [3, 4].

References
1. Holterhoff, K. (2017). "From Disclaimer to Critique: Race and the Digital Image Archivist." Digital Humanities Quarterly 11.3. URL: http://digitalhumanities.org:8081/dhq/vol/11/3/000324/000324.html
2. IEEE. (2016). Ethically Aligned Design: A Vision for Prioritizing Human Wellbeing with Artificial Intelligence and Autonomous Systems. Version 1. URL: http://standards.ieee.org/develop/indconn/ec/autonomous%20systems.html (12.05.2018)
3. Recasens, M., Danescu-Niculescu-Mizil, C., Jurafsky, D. (2013). "Linguistic Models for Analyzing and Detecting Biased Language." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 1650-1659.
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications
Interactive comment on “Source sector and region contributions to BC and PM2.5 in Central Asia” by
- DOI:
- Publication date: 2014
- Journal:
- Impact factor: 0
- Author:
- Corresponding author:
Vortex shedding analysis of flows past forced-oscillation cylinder with dynamic mode decomposition
- DOI: 10.1063/5.0153302
- Publication date: 2023-05-01
- Journal:
- Impact factor: 4.6
- Author:
- Corresponding author:
Observation of a resonant structure near the Dₛ⁺Dₛ⁻ threshold in the B⁺ → Dₛ⁺Dₛ⁻K⁺ decay
- DOI: 10.1103/physrevd.102.016005
- Publication date: 2024-09-14
- Journal:
- Impact factor: 0
- Author:
- Corresponding author:
Observations of Rapid Disk-Jet Interaction in the Microquasar GRS 1915+105 (accepted for publication in The Astrophysical Journal)
- DOI:
- Publication date: 2024-09-14
- Journal:
- Impact factor: 0
- Author:
- Corresponding author:
The Evolutionary Significance of Phenotypic Plasticity
- DOI:
- Publication date: 2024-09-14
- Journal:
- Impact factor: 0
- Author:
- Corresponding author:
Other Grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Funding amount: --
- Project category: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Development of a new solid tritium breeder blanket
- Grant number: 2908923
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Landscapes of Music: The more-than-human lives and politics of musical instruments
- Grant number: 2889655
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Cosmological hydrodynamical simulations with calibrated non-universal initial mass functions
- Grant number: 2903298
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant number: 2908918
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
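Similar NSFC (National Natural Science Foundation of China) Grants
Development of object categorisation processing in infants and toddlers and its brain mechanisms
- Grant number: 32300860
- Year approved: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Research on key technologies for multi-label classification of short videos
- Grant number: 62371330
- Year approved: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Integrative taxonomy of mole crickets (Gryllotalpidae) in China
- Grant number: 32300375
- Year approved: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Taxonomy and phylogeny of Selenastraceae in China
- Grant number: 32370219
- Year approved: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Annual land-cover time-series classification methods coupling continuous change detection and classification
- Grant number: 42301534
- Year approved: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund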
Similar Overseas Grants
Constructing and Classifying Pre-Tannakian Categories
- Grant number: 2401515
- Fiscal year: 2024
- Funding amount: --
- Project category: Standard Grant
Classifying and Understanding Remedies in Comparative Labour Law
- Grant number: EP/Y036875/1
- Fiscal year: 2024
- Funding amount: --
- Project category: Research Grant
Efficient and effective methods for classifying massive time series data
- Grant number: DP240100048
- Fiscal year: 2024
- Funding amount: --
- Project category: Discovery Projects
Classifying and localising future cancerous lesions
- Grant number: 2895295
- Fiscal year: 2023
- Funding amount: --
- Project category: Studentship
Development and Validation of an Equitable Computable Phenotype for Classifying Pediatric Sleep Deficiency in Electronic Health Records
- Grant number: 10724442
- Fiscal year: 2023
- Funding amount: --
- Project category: