Task-Driven Comparison of Topic Models

IEEE Transactions on Visualization and Computer Graphics, Volume 22, Number 1, pages 320--329, January 2016
Topic modeling, a method of statistically extracting thematic content from a large collection of texts, is used for a wide variety of tasks within text analysis. Though there are a growing number of tools and techniques for exploring single models, comparisons between models are generally reduced to a small set of numerical metrics. These metrics may or may not reflect a model's performance on the analyst's intended task, and can therefore be insufficient to diagnose what causes differences between models. In this paper, we explore task-centric topic model comparison, considering how we can both provide detail for a more nuanced understanding of differences and address the wealth of tasks for which topic models are used. We derive comparison tasks from single-model uses of topic models, which predominantly fall into the categories of understanding topics, understanding similarity, and understanding change. Finally, we provide several visualization techniques that facilitate these tasks, including buddy plots, which combine color and position encodings to allow analysts to readily view changes in document similarity.

BibTeX references

  author       = "Alexander, Eric and Gleicher, Michael",
  title        = "Task-Driven Comparison of Topic Models",
  journal      = "IEEE Transactions on Visualization and Computer Graphics",
  number       = "1",
  volume       = "22",
  pages        = "320--329",
  month        = "jan",
  year         = "2016",
  note         = "Proceedings VAST 2015",
  ee           = "http://ieeexplore.ieee.org/xpl/login.jsp?tp=\&arnumber=7194832\&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7194832",
  doi          = "10.1109/TVCG.2015.2467618",
  projecturl   = "http://vep.cs.wisc.edu/comparingTopicModels/",
  url          = "http://graphics.cs.wisc.edu/Papers/2016/AG16b"

