Professional to Plain Language Neural Translation: A Path Toward Actionable Health Information


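Basic Information

Award number:
10349319
Principal investigator:
Trevor Cohen
Amount:
approximately $190,400
Host institution:
UNIVERSITY OF WASHINGTON
Host institution country:
United States
Project category:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-03-01 to 2024-02-29
Project status:
Completed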

Project Abstract

Health literacy is key to making well-informed health decisions that improve outcomes. However, while the peer-reviewed clinical literature contains valuable information to guide health decisions, it is generally written for an audience of healthcare professionals. Even in the context of good general literacy, medical jargon and the complex structure of professional language make this information especially hard to interpret. While efforts have been made to summarize some of this literature in plain language to make it accessible to the general public, these efforts depend on human expertise. This approach cannot scale to match the rapid pace at which new findings emerge in the literature. Thus, there is an urgent unmet need for automated methods to enhance the accessibility of the canonical biomedical literature to the general public. This problem can be framed as a type of translation problem, between the language of healthcare professionals and that of healthcare consumers. The proposed research builds on recent advances in deep learning stemming from neural sequence-to-sequence models, which were originally evaluated in machine translation tasks. In our recent work, we showed these models can be effectively adapted to the task of translating between abstracts in the Cochrane Database of Systematic Reviews (CDSR) and corresponding professionally-authored plain language summaries. The resulting automatically-generated summaries outperformed those from other models in their alignment with professionally-authored summaries. Furthermore, in a pilot user evaluation in which participants were blinded as to summary provenance, the generated summaries were generally judged favorably compared with their expert-authored counterparts. In the proposed research we will develop this line of research further, by evaluating the utility of additional pre-training and auxiliary fine-tuning tasks as a means to improve the quality of generated summaries.
We will also customize the models concerned to enhance their factual accuracy and readability using novel auxiliary training objectives and post-processing procedures. We will evaluate our methods against robust baseline models in system-centric evaluations of content alignment with reference summaries, readability, and factual correctness. Using Mechanical Turk, we will conduct user-centric evaluations of the ease with which summaries from best-performing models can be understood, as compared with CDSR expert-authored plain language summaries. These evaluations will consider both perceived interpretability and actual comprehension, with the latter evaluated using sets of multiple-choice questions to probe comprehension, recall, and learning. In doing so, the proposed research will advance the state of the art in automated simplification and summarization of the biomedical literature for consumption by the general public.
