This paper concerns the evaluation of a workspace architecture for generating natural language descriptions, including methods for evaluating both its output and its own self-evaluation. We present preliminary results from the evaluation of an early iteration of the architecture operating in the weather domain. This domain is not typically seen as creative, but it provides a simple testbed for the architecture and the evaluation methodology. The program does not yet match humans in fluency of language, factual correctness, or completeness in describing the input, but human judges did find the program’s output easier to read than human-generated texts. Planned improvements to the program, also described in the paper, will incorporate self-monitoring and better self-evaluation, with the aim of producing descriptions that are more fluently written and more accurate.