Language Preservation 2.0: Crowdsourcing Oral Language Documentation using Mobile Devices

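Basic Information

Award number:
1160639
Principal investigator:
Mark Liberman
Amount:
$101,500
Host institution:
University of Pennsylvania
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2012
Funding country:
United States
Start and end dates:
2012-07-01 to 2014-12-31
Project status:
Completed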

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1160639&HistoricalAwards=false
Keywords:
Language Preservation 2.0 Crowdsourcing Oral

Project Abstract

Language Preservation 2.0

The purpose of this pilot project is to demonstrate the feasibility of a new approach to documenting endangered languages. To allow wide-ranging investigation of a language even after it is no longer spoken, we need the equivalent of the million words of extant biblical Hebrew texts, or the five million words of extant classical Latin. But for endangered languages without a significant culture of literacy, diverse text collections on this scale seem out of reach. Given typical speaking rates of about 10,000 word-equivalents per hour, a hundred hours of recorded speech (conversations, narratives, or oral histories) would give us the equivalent of a million words of text. With community involvement, hundreds of hours of such recordings are easily within reach.

However, transcribing such large audio collections is a daunting task, given the small number of literate native speakers and the time-consuming nature of such transcription, which can take 200 hours of work for every hour of audio. We propose to solve this problem by substituting re-speaking and verbal translation: one or more native speakers repeat each phrase of a recording, speaking slowly and carefully, and then translate it into a better-documented language.

The utility of translated passages as a way to analyze otherwise-unknown languages has been demonstrated many times, starting with the Rosetta Stone. For us this aspect of the task is easier, since at least a grammatical sketch will generally be available. Our goal in this project is to demonstrate the utility of re-speaking. We believe that linguists starting out with relatively little knowledge of a language can produce phonetic transcriptions good enough to support subsequent analysis into coherent texts, through a process analogous to (but easier than) the one that allowed earlier generations of scholars to learn to read ancient Egyptian or Sumerian.
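The scale argument in the abstract comes down to two multiplications. The minimal Python sketch below uses only the rough rates quoted above (about 10,000 word-equivalents per hour of speech, and up to 200 hours of transcription work per hour of audio) to show why a hundred hours of recordings approximates a classical-language corpus, while conventional transcription of the same recordings would cost on the order of 20,000 hours. The function names are hypothetical, and the rates are the abstract's approximate figures, not measurements.

```python
# Back-of-envelope estimate using the approximate figures quoted in the abstract.
# Function names are hypothetical; the rates are rough, not measured values.
WORDS_PER_HOUR = 10_000    # typical speaking rate cited in the abstract
TRANSCRIPTION_RATIO = 200  # hours of transcription work per hour of audio, as cited


def corpus_words(hours_of_speech: float) -> float:
    """Approximate text-equivalent size (in words) of a recorded speech corpus."""
    return hours_of_speech * WORDS_PER_HOUR


def transcription_effort(hours_of_speech: float) -> float:
    """Approximate person-hours needed to transcribe the corpus conventionally."""
    return hours_of_speech * TRANSCRIPTION_RATIO


if __name__ == "__main__":
    for hours in (100, 500):
        print(f"{hours} h of speech ~ {corpus_words(hours):,.0f} words, "
              f"~ {transcription_effort(hours):,.0f} h to transcribe")
```

Run as a script, it reports about 1,000,000 words and 20,000 transcription hours for 100 hours of speech, and about 5,000,000 words and 100,000 hours for 500 hours. The second case matches the classical-Latin figure cited above, and the transcription column is the cost that re-speaking is meant to avoid.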

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Mark Liberman

Dimensions of Speech and Language Disturbance in Psychosis and Computational Linguistic Markers

DOI:
10.1016/j.biopsych.2022.02.144
Published:
2022-05-01
Journal:
Conference abstract
Impact factor:
Authors:
Sunny Tang;Katrin Hänsel;Yan Cong;Sarah Berretta;Sunghye Cho;Amir Nikzad;Aarush Mehta;Sameer Pradhan;James Fiumara;Mark Liberman
Corresponding author:
Mark Liberman

CLiFF Notes: Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

DOI:
Published:
1995
Journal:
arXiv.org
Impact factor:
0
Authors:
Norm Badler;F. B. Baldwin;Nicola J. Bessell;Eric Brill;Sharon Cote;Barbara Di Eugenio;Alexis Dimitriadis;Jon Freeman;Christopher W. Geib;A. Gertner;Daniel Hardt;Michael Hegarty;Shyam Kapur;Jonathan Kaye;Michael H. Kelly;Libby Levison;Mark Liberman;D. R. Mani;Mitch Marcus;Michael B. Moore;Michael Niv;Charles L. Ortiz;Jong Cheol Park;Sandeep Prasada Scott
Corresponding author:
Sandeep Prasada Scott

/l/ Variation in American English: A Corpus

DOI:
Published:
2012
Journal:
Impact factor:
0
Authors:
Jiahong Yuan;Mark Liberman
Corresponding author:
Mark Liberman

Looking Back, Moving Forward: Why Underlying Representations?

DOI:
Published:
Journal:
Impact factor:
0
Authors:
Larry M. Hyman;Jeffrey Heinz;Sharon Inkelas;Keith Johnson;Mark Liberman
Corresponding author:
Mark Liberman

Decreased Speech Coherence Captured by Novel Natural Language Processing Methods in Two Cohorts of Individuals With Schizophrenia

DOI:
10.1016/j.biopsych.2020.02.971
Published:
2020-05-01
Journal:
Conference abstract
Impact factor:
Authors:
Sunny Tang;Reno Kriz;Sunghye Cho;João Sedoc;Suh Jung Park;Jenna Harowitz;Mahendra Bhati;Raquel Gur;Daniel Wolf;Mark Liberman
Corresponding author:
Mark Liberman