TextDNA was inspired by conversations with Robin Valenza (University of Wisconsin-Madison) that illuminated analogies between the ways in which evolutionary biologists read genetic sequences and literary scholars read texts. Drawing on these early analogies, we generated new models for dealing with text collections at scale. TextDNA builds on the original configurable colorfield approach proposed in Sequence Surveyor, with metrics specifically adapted to text analysis and new interaction techniques that help connect high-level patterns to individual words.

Read more about TextDNA: D.A. Szafir, D. Stuffer, Y. Sohail, & M. Gleicher. “TextDNA: Visualizing Word Usage Patterns with Configurable Colorfields.” Computer Graphics Forum. 35 (3), 2016. (In the Proceedings of the 2016 Eurographics/IEEE Conference on Visualization)

TextDNA was made possible by support from the Andrew W. Mellon Foundation grant for the Visualizing English Print project and Comparisons (NSF Award IIS-1162037). It is released under a BSD license.


Credit for TextDNA belongs to Danielle Albers Szafir, who originally developed the program and website documentation. The TextDNA user interface was further developed by Yusef Sohail under the direction of Szafir. The two fine-tuned the system for raw text in consultation with Deidre Stuffer, who provided documentation for raw text manipulation and generated the test dataset with documentation to help users learn TextDNA functions. Erin Winter provided scripts to generate CSV datasets from plaintext input for the public release.


Danielle Albers Szafir is an Assistant Professor and Founding Faculty Member of Information Science, Affiliate Professor of Computer Science, and Fellow in the Institute of Cognitive Science at the University of Colorado Boulder. Her research sits at the intersection of information visualization, human-computer interaction, data science, computer graphics, and cognitive science. Through this work, she develops interactive technologies that allow people to explore large collections of data in fields ranging from biology to the humanities.

Yusef Sohail is currently pursuing an undergraduate degree in Computer Science at the University of Wisconsin-Madison, where he contributed to the development of the TextDNA user interface. Outside of academics, he works on projects featuring some combination of real-time graphics, networking, and game engine architecture.

Deidre Stuffer is an English graduate student and research assistant for the Visualizing English Print project at the University of Wisconsin-Madison. She specializes in 18th-century British literature and digital humanities. Her work focuses on digital corpus research and curation and on documenting visualization tools.

Erin Winter is a first-year graduate student in the University of Wisconsin-Madison’s Computer Sciences program. She has spent three years on digital humanities projects, and her research interests include databases, machine learning, and human-computer interaction.

Email danielle.szafir@colorado.edu for more information.