CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
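Basic information
- Award number: 1343530
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2013
- Funding country: United States
- Period: 2013-02-01 to 2017-05-31
- Status: Completed
- Source:
- Keywords:
Project abstract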
Linguistic code switching (LCS) is the practice of switching back and forth between the shared languages of bilingual or multilingual speakers. This phenomenon is particularly prevalent in geographic regions with linguistic boundaries or where there are large immigrant groups. Various levels of language (phonological, morphological, syntactic, semantic and discourse-pragmatic) may be implicated in LCS in different language pairs and/or genres. Computational algorithms trained for a single language quickly break down when the input includes LCS. A major barrier to research on LCS in computational linguistics (CL) has been the lack of large, accurately annotated corpora of LCS data. In this project, a large repository of LCS data is collected and a large annotation infrastructure is developed. The data is consistently annotated in different modalities (speech and text), at various levels of linguistic granularity, and across different language pairs reflecting different linguistic typologies (Standard Arabic and Dialectal Arabic, Arabic-English, Spanish-English, Chinese-English, Hindi-English). The focus of the effort is on intra-sentential LCS.
This infrastructure and unified large LCS data resource are eagerly awaited by the CL research community, since annotated LCS data provides a natural test-bed for adaptive learning algorithms and the handling of diverse data sources, as well as a framework for genuine multilingual processing. It will also be of benefit to sociolinguistic and theoretical linguistic researchers, and provide a platform for collaborative interdisciplinary research. Finally, research on LCS helps overcome biases against multilingual speakers by demonstrating the creativity of such speakers in exploiting their verbal repertoires. Such a result is particularly important for K-12 education and testing policies in the USA with its diverse immigrant population.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Mona Diab
Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
- DOI: 10.5121/csit.2024.141006
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: R. Salama; Abdou Youssef; Mona Diab
- Corresponding author: Mona Diab
Improving Coherence of Language Model Generation with Latent Semantic State
- DOI:
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Amanda Askell; Yuntao Bai; Anna Chen; Dawn Drain; Deep Ganguli; T. Henighan; Andy Jones; Benjamin Mann; Nova Dassarma; Nelson El; Zac Hatfield; Danny Hernandez; John Kernion; Kamal Ndousse; Catherine Olsson; Dario Amodei; Tom Brown; J. Clark; Sam Mc; Chris Olah; Jared Kaplan; Nick Ryder; Jared D Subbiah; Prafulla Kaplan; A. Dhariwal; P. Neelakantan; Girish Shyam; Amanda Sastry; Sandhini Askell; Ariel Agarwal; Herbert; Gretchen Krueger; R. Child; Aditya Ramesh; Daniel M. Ziegler; Jeffrey Wu; Christopher Winter; Mark Hesse; Eric Chen; Mateusz Sigler; Scott teusz Litwin; Benjamin Gray; Jack Chess; Christopher Clark; Sam Berner; Alec McCandlish; Ilya Radford; Sutskever Dario; Amodei; Joshua Maynez; Shashi Narayan; Bernd Bohnet; Kurt Shuster; Spencer Poff; Moya Chen; Douwe Kiela; Shane Storks; Qiaozi Gao; Yichi Zhang; Joyce Chai; Niket Tandon; Keisuke Sakaguchi; Bhavana Dalvi; Dheeraj Rajagopal; Peter Clark; Michal Guerquin; Kyle Richardson; Eduard H. Hovy; A. Dataset; Rowan Zellers; Ari Holtzman; Matthew E. Peters; Roozbeh Mottaghi; Aniruddha Kembhavi; Ali Farhadi; Chunting Zhou; Graham Neubig; Jiatao Gu; Mona Diab; Francisco Guzmán; Luke Zettlemoyer
- Corresponding author: Luke Zettlemoyer
Investigating Cultural Alignment of Large Language Models
- DOI:
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: Badr AlKhamissi; Muhammad N. ElNokrashy; Mai AlKhamissi; Mona Diab
- Corresponding author: Mona Diab
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
- DOI:
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: Aashiq Muhamed; Oscar Li; David Woodruff; Mona Diab; Virginia Smith
- Corresponding author: Virginia Smith
Empirical Evaluation of Topic Zero- and Few-Shot Learning for Stance Dissonance Detection
- DOI:
- Year: 2021
- Journal:
- Impact factor: 0
- Authors: Emily Allaway; Malavika Srikanth; Kathleen McK; Samuel R. Bowman; Gabor Angeli; Christopher Potts; Daniel Cer; Mona Diab; Eneko Agirre; Iñigo Lopez
- Corresponding author: Iñigo Lopez
Other grants held by Mona Diab
CI-P: Towards the Creation of a Unified Repository for MultiLingual and CrossLingual Multiword Expressions
- Award number: 1513116
- Fiscal year: 2015
- Award amount: $400,000
- Award type: Standard Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1205556
- Fiscal year: 2012
- Award amount: $400,000
- Award type: Standard Grant
Collaborative Research: CI-P: Creation of an annotated repository of multilingual and multigenre code switched data for several language pairs
- Award number: 0958440
- Fiscal year: 2010
- Award amount: $400,000
- Award type: Standard Grant
SGER: Automatic Processing of Natural Language Code Switching
- Award number: 0749062
- Fiscal year: 2007
- Award amount: $400,000
- Award type: Standard Grant
Similar overseas grants
CI-ADDO-NEW: Collaborative Research: Development of DARwIn Humanoid Robots for Research, Education and Outreach
- Award number: 1564417
- Fiscal year: 2015
- Award amount: $400,000
- Award type: Continuing Grant
CI-ADDO-NEW: Collaborative Research: A Repository for Annotating Multilingual Code Switched Data
- Award number: 1462142
- Fiscal year: 2014
- Award amount: $400,000
- Award type: Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
- Award number: 1305215
- Fiscal year: 2013
- Award amount: $400,000
- Award type: Standard Grant
CI-ADDO-NEW: ASTERIX: A Community Software Platform for Big Data Research, Analysis, and Management
- Award number: 1305253
- Fiscal year: 2013
- Award amount: $400,000
- Award type: Standard Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
- Award number: 1305319
- Fiscal year: 2013
- Award amount: $400,000
- Award type: Standard Grant