CI-NEW: NIEUW: Novel Incentives and Workflows in Linguistic Data Collection and Annotation
Basic Information
- Award number: 1730377
- Principal investigator: Mark Liberman
- Amount: $1,218,500
- Institution country: United States
- Program type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Duration: 2017-07-15 to 2023-06-30
- Status: Completed
Project Abstract
Language touches every aspect of human life. People speak and write to manage relationships from the personal to the international, to gather and provide information, and to negotiate, influence, and inspire. Scientists in every field use language to communicate their findings. Although researchers have worked for six decades on processing language by computer, only in the past several years have their efforts produced technologies mature enough to affect the lives of average citizens. Today, some of the most fortunate people use computers to search the vast archives of the Internet, to translate material from languages they do not understand into languages they do, and to interact with smart devices by issuing natural language commands and queries and receiving responses in kind. Despite the growth and promise of human language technologies, they are available for only a tiny portion of the world's approximately 7,000 languages and, even then, for only a limited range of situations. The reason is that the most successful approaches to developing these technologies require vast amounts of spoken or written language material annotated with human judgments about its interpretation. Such resources are lacking for most languages and for many types of situations, even for languages of international importance, including English. This Research Infrastructure project will address this shortage by supporting the language technology research community in using novel incentives and alternative workflows that greatly expand the methods used to date for collecting and annotating language data. The resulting resources will support research and development on a wider range of language technologies, leading to the creation and deployment of applications for an increasingly broad range of languages and situations.
Even a brief look at user behavior on social media, online games, citizen science, and public-good initiatives shows that many people around the world are willing to devote, collectively, vast amounts of effort when given appropriate motivation and effective tools. This project will harness some of the immense people-power that drives such activities and direct it toward developing language resources that help computers learn to process language. Specifically, the project team will develop a software toolkit, guided by the needs of language technology researchers, for creating online activities that yield language resources. The activities will include games, citizen science tasks, and tools for language professionals, grouped into a series of portals that appeal to different populations of users. The project will build and maintain the database and web servers, with redundancy, load balancing, and failover, that run the principal instance of all the activities, and an open-source release of the software will let other researchers build their own instances independently. Finally, the data resulting from this project will be shared under the least restrictive terms possible to further support language technology research and development worldwide.
Project Outcomes
- Journal articles: 5
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Using Games to Augment Corpora for Language Recognition and Confusability
- DOI: 10.21437/interspeech.2021-1611
- Published: 2021
- Impact factor: 0
- Authors: Cieri, Christopher; Fiumara, James; Wright, Jonathan
- Corresponding author: Wright, Jonathan
LanguageARC: Developing Language Resources Through Citizen Linguistics
- Published: 2020
- Impact factor: 0
- Authors: Fiumara, James; Cieri, Christopher; Wright, Jonathan; Liberman, Mark
- Corresponding author: Liberman, Mark
LanguageARC – a tutorial
- Published: 2020
- Impact factor: 0
- Authors: Cieri, Christopher; Fiumara, James
- Corresponding author: Fiumara, James
Reflections on 30 Years of Language Resource Development and Sharing
- Published: 2022
- Impact factor: 0
- Authors: Cieri, Christopher; Liberman, Mark
- Corresponding authors: Cieri, Christopher; Liberman, Mark
Proceedings of the Workshop on Citizen Linguistics in Language Resource Development (CLLRD 2020)
- Published: 2020
- Impact factor: 0
- Authors: Fiumara, James; Cieri, Christopher; Liberman, Mark; Callison-Burch, Chris
- Corresponding author: Callison-Burch, Chris
Other Publications by Mark Liberman
Dimensions of Speech and Language Disturbance in Psychosis and Computational Linguistic Markers
- DOI: 10.1016/j.biopsych.2022.02.144
- Published: 2022-05-01
- Authors: Sunny Tang; Katrin Hänsel; Yan Cong; Sarah Berretta; Sunghye Cho; Amir Nikzad; Aarush Mehta; Sameer Pradhan; James Fiumara; Mark Liberman
- Corresponding author: Mark Liberman
CLiFF Notes: Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
- Published: 1995
- Impact factor: 0
- Authors: Norm Badler; F. B. Baldwin; Nicola J. Bessell; Eric Brill; Sharon Cote; Barbara Di Eugenio; Alexis Dimitriadis; Jon Freeman; Christopher W. Geib; A. Gertner; Daniel Hardt; Michael Hegarty; Shyam Kapur; Jonathan Kaye; Michael H. Kelly; Libby Levison; Mark Liberman; D. R. Mani; Mitch Marcus Michael; B. Moore; Michael Niv; Charles L. Ortiz; Jong Cheol Park; Sandeep Prasada Scott
- Corresponding author: Sandeep Prasada Scott
/l/ Variation in American English: A Corpus
- Published: 2012
- Impact factor: 0
- Authors: Jiahong Yuan; Mark Liberman
- Corresponding author: Mark Liberman
Looking Back, Moving Forward: Why Underlying Representations?
- Impact factor: 0
- Authors: Larry M. Hyman; Jeffrey Heinz; Sharon Inkelas; Keith Johnson; Mark Liberman
- Corresponding author: Mark Liberman
Decreased Speech Coherence Captured by Novel Natural Language Processing Methods in Two Cohorts of Individuals With Schizophrenia
- DOI: 10.1016/j.biopsych.2020.02.971
- Published: 2020-05-01
- Authors: Sunny Tang; Reno Kriz; Sunghye Cho; João Sedoc; Suh Jung Park; Jenna Harowitz; Mahendra Bhati; Raquel Gur; Daniel Wolf; Mark Liberman
- Corresponding author: Mark Liberman
Other Grants by Mark Liberman
Language Preservation 2.0: Crowdsourcing Oral Language Documentation using Mobile Devices
- Award number: 1160639
- Fiscal year: 2012
- Amount: $1,218,500
- Program type: Standard Grant
Prosodic Systems in New Guinea: Integrating computational and typological approaches to linguistic analysis
- Award number: 0951651
- Fiscal year: 2010
- Amount: $1,218,500
- Program type: Standard Grant
Collaborative Research: OLAC: Accessing the World's Language Resources
- Award number: 0723357
- Fiscal year: 2007
- Amount: $1,218,500
- Program type: Continuing Grant
ITR-SCOTUS: A Resource for Collaborative Research in Speech Technology, Linguistics, Decision Processes and the Law
- Award number: 0325739
- Fiscal year: 2003
- Amount: $1,218,500
- Program type: Continuing Grant
Electronic Materials for Natural Language Research
- Award number: 9113530
- Fiscal year: 1991
- Amount: $1,218,500
- Program type: Standard Grant
Similar NSFC (National Natural Science Foundation of China) Grants
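Chemical Synthesis, Derivatization, and Glucose-Lowering Activity of Baccataxane, a Taxane Diterpenoid with a Novel Skeleton
- Award number: 82373758
- Award year: 2023
- Amount: 490,000 CNY
- Program type: General Program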
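Characteristics, Immunopathogenic Mechanisms, and Clinical Outcomes of Inflammatory Cells in Elderly Patients with Severe COVID-19
- Award number: 82370019
- Award year: 2023
- Amount: 650,000 CNY
- Program type: General Program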
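Experimental Study Identifying CD69 as a Novel Surface Marker of Leukemia Stem Cells in Juvenile Myelomonocytic Leukemia
- Award number: 82370146
- Award year: 2023
- Amount: 490,000 CNY
- Program type: General Program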
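Predicting Breast Cancer Sensitivity to Neoadjuvant Therapy Using Magnetic Resonance APT Imaging
- Award number: 82302153
- Award year: 2023
- Amount: 300,000 CNY
- Program type: Young Scientists Fund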
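Molecular Mechanisms by Which CXCL12+ CAFs Induce the Phenotypic Features and Spatial Localization of Tertiary Lymphoid Structures in Pancreatic Cancer after Neoadjuvant Chemotherapy
- Award number: 82373296
- Award year: 2023
- Amount: 490,000 CNY
- Program type: General Program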
Similar International Grants
Assessment of new fatigue capable titanium alloys for aerospace applications
- Award number: 2879438
- Fiscal year: 2027
- Amount: $1,218,500
- Program type: Studentship
Development of a new solid tritium breeder blanket
- Award number: 2908923
- Fiscal year: 2027
- Amount: $1,218,500
- Program type: Studentship
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348998
- Fiscal year: 2025
- Amount: $1,218,500
- Program type: Standard Grant
New approaches to training deep probabilistic models
- Award number: 2613115
- Fiscal year: 2025
- Amount: $1,218,500
- Program type: Studentship
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348999
- Fiscal year: 2025
- Amount: $1,218,500
- Program type: Standard Grant