When information is abundant, it becomes increasingly difficult to fit nuggets of knowledge into a single coherent picture. Complex stories spaghetti into branches, side stories, and intertwining narratives; search engines, our most popular navigational tools, are limited in their capacity to explore such complex stories.
We propose a methodology for creating structured summaries of information, which we call metro maps. Our proposed algorithm generates a concise structured set of documents that maximizes coverage of salient pieces of information. Most importantly, metro maps explicitly show the relations among retrieved pieces in a way that captures story development.
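The abstract does not spell out how "coverage of salient pieces of information" is measured or optimized. The sketch below is a minimal illustration under stated assumptions. It assumes each document is represented as a set of weighted "concepts" and that coverage is the total weight of concepts appearing in at least one selected document. It then picks documents greedily, each time taking the one with the largest marginal gain. This is the standard greedy heuristic for weighted maximum coverage. Because that objective is monotone submodular, the greedy choice comes within a factor of 1 - 1/e of the best possible set of size k. This is not presented as the authors' algorithm: it ignores the map structure (lines and their relations) that metro maps add on top of coverage. The function and data names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_coverage(docs: Dict[str, Set[str]],
                    weights: Dict[str, float],
                    k: int) -> List[str]:
    """Hypothetical sketch: select up to k documents that greedily maximize
    the total weight of covered concepts (weighted maximum coverage)."""
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        best_doc: Optional[str] = None
        best_gain = 0.0
        for name, concepts in docs.items():
            if name in chosen:
                continue
            # Marginal gain: weight of concepts this document would newly cover.
            gain = sum(weights.get(c, 0.0) for c in concepts - covered)
            if gain > best_gain:
                best_doc, best_gain = name, gain
        if best_doc is None:  # no remaining document adds any coverage
            break
        chosen.append(best_doc)
        covered |= docs[best_doc]
    return chosen


if __name__ == "__main__":
    # Toy corpus: documents mapped to the (assumed) concepts they mention.
    docs = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"w"}}
    weights = {"x": 1.0, "y": 2.0, "z": 1.0, "w": 0.5}
    print(greedy_coverage(docs, weights, 2))
```

With k = 2 the sketch first picks "a", which covers x and y (weight 3). It then picks "b", whose only new concept is z (gain 1.0). "c" is skipped because its gain is only 0.5.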
The overarching theme of this work is formalizing the characteristics of good maps and providing efficient algorithms (with theoretical guarantees) to optimize them. Moreover, because information needs vary from person to person, we integrate user interaction into our framework, allowing users to alter the maps to better reflect their interests. Pilot user studies on real-world datasets show that the method produces maps that help users acquire knowledge efficiently. We believe that metro maps could be powerful tools for any Web user, scientist, or intelligence analyst trying to process large amounts of data.