Serendip: Topic Model-Driven Visual Exploration of Text Corpora

Eric Alexander, Joe Kohlmann, Robin Valenza, Michael Witmore, Michael Gleicher
Proceedings of the 2014 IEEE Conference on Visual Analytics Science and Technology (VAST), pages 173-182, October 2014
Exploration and discovery in a large text corpus requires investigation at multiple levels of abstraction, from a zoomed-out view of the entire corpus down to close-ups of individual passages and words. At each of these levels, there is a wealth of information that can inform inquiry--from statistical models, to metadata, to the researcher's own knowledge and expertise. Joining all this information together can be a challenge, and there are issues of scale to be combatted along the way. In this paper, we describe an approach to text analysis that addresses these challenges of scale and multiple information sources, using probabilistic topic models to structure exploration through multiple levels of inquiry in a way that fosters serendipitous discovery. In implementing this approach into a tool called Serendip, we incorporate topic model data and metadata into a highly reorderable matrix to expose corpus level trends; extend encodings of tagged text to illustrate probabilistic information at a passage level; and introduce a technique for visualizing individual word rankings, along with interaction techniques and new statistical methods to create links between different levels and information types. We describe example uses from both the humanities and visualization research that illustrate the benefits of our approach.

BibTeX reference

  author       = "Alexander, Eric and Kohlmann, Joe and Valenza, Robin and Witmore, Michael and Gleicher, Michael",
  title        = "Serendip: Topic Model-Driven Visual Exploration of Text Corpora",
  booktitle    = "Proceedings of the 2014 IEEE Conference on Visual Analytics Science and Technology (VAST)",
  pages        = "173--182",
  month        = "October",
  year         = "2014",
  publisher    = "IEEE",
  ee           = "http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7042493\&tag=1",
  doi          = "10.1109/VAST.2014.7042493",
  url          = "http://graphics.cs.wisc.edu/Papers/2014/AKVWG14"
