Word embeddings, which represent words as dense feature vectors, are widely used in natural language processing. In their seminal paper on word2vec, Mikolov and colleagues showed that a feature space created by training a word prediction network on a large text corpus encodes semantic information that supports analogy by vector arithmetic, e.g., "king" minus "man" plus "woman" yields a vector closest to "queen". To help novices appreciate this idea, people have sought effective graphical representations of word embeddings. We describe a new interactive tool for visually exploring word embeddings. The tool lets users define semantic dimensions by specifying opposed word pairs: for example, gender is defined by pairs such as boy/girl and father/mother, and age by pairs such as father/son and mother/daughter. Words are plotted as points in a zoomable and rotatable 3D space, where the third "residual" dimension encodes distance from the hyperplane defined by all the opposed word vectors with age and gender subtracted out. The tool also lets users visualize vector analogies by drawing the vector from "man" to "king" and a parallel vector from "woman" to "king-man+woman", which lies closest to "queen". Visually browsing the embedding space and experimenting with the tool can make word embeddings more intuitive. We include a series of experiments that teachers can use to help K-12 students appreciate the strengths and limitations of this representation.
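The two operations described above, analogy by vector arithmetic and projection onto user-defined semantic dimensions, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation. The 3-d vectors in `EMB` are invented toy values rather than real word2vec output. The helper names (`analogy`, `axis`, `project`) are hypothetical. Taking the residual to be the distance from the plane spanned by the gender and age axes is one plausible reading of the "residual" dimension and may differ from the tool's exact definition.

```python
import math

# Toy 3-d vectors invented for illustration only. Real word2vec
# embeddings are learned from text and have hundreds of dimensions.
EMB = {
    "man":      [1.0, 0.8, 0.0],
    "woman":    [-1.0, 0.8, 0.0],
    "boy":      [1.0, -1.0, 0.0],
    "girl":     [-1.0, -1.0, 0.0],
    "father":   [1.0, 1.0, 0.3],
    "mother":   [-1.0, 1.0, 0.3],
    "son":      [1.0, -0.8, 0.3],
    "daughter": [-1.0, -0.8, 0.3],
    "king":     [1.0, 0.8, 2.0],
    "queen":    [-1.0, 0.8, 2.0],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def scale(a, s):
    return [x * s for x in a]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def unit(a):
    return scale(a, 1.0 / norm(a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def analogy(a, b, c):
    """Return the word nearest (by cosine) to a - b + c, excluding the inputs."""
    target = add(sub(EMB[a], EMB[b]), EMB[c])
    candidates = [w for w in EMB if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMB[w], target))

def axis(pairs):
    """Unit vector along the mean difference of the opposed word pairs."""
    diffs = [sub(EMB[p], EMB[q]) for p, q in pairs]
    mean = [sum(col) / len(diffs) for col in zip(*diffs)]
    return unit(mean)

def project(word, x_axis, y_axis):
    """Return (x, y, residual) coordinates of a word.

    The y axis is orthogonalized against the x axis (Gram-Schmidt).
    The residual is the length of what remains after both components
    are subtracted out (an assumed definition, see the lead-in).
    """
    y_orth = unit(sub(y_axis, scale(x_axis, dot(y_axis, x_axis))))
    v = EMB[word]
    x = dot(v, x_axis)
    y = dot(v, y_orth)
    rest = sub(sub(v, scale(x_axis, x)), scale(y_orth, y))
    return x, y, norm(rest)

# Semantic dimensions defined by the opposed pairs named in the text.
gender = axis([("boy", "girl"), ("father", "mother")])
age = axis([("father", "son"), ("mother", "daughter")])

print(analogy("king", "man", "woman"))  # queen
print(project("queen", gender, age))
```

With real embeddings the analogy target rarely coincides exactly with a vocabulary word, which is why the lookup returns the nearest neighbor and excludes the three input words.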