TextDNA | UW Graphics Group

Display Overview

TextDNA combines several linked displays to support visual analysis. The image above shows the default TextDNA screen, which appears when a dataset is first loaded. The components are as follows:

Primary Display: An overview display of the data. Sequences map to rows; elements map to colors within the rows.
Zoom Window: A detailed view of a specific blocked region within a row as selected by the user.
Label Bar: The names of the sequences represented by each row. Names are aligned with the sequence rows that they represent.
Histogram: A distribution of the matching sets of elements by frequency within the data. Elements are sorted along the x-axis according to the number of elements in the matching set and colored with respect to the color parameters of the primary display. The height of an element bar represents the average frequency of the element sets represented by the bar.
Navigation Pane: This panel contains a variety of options for interacting with and exploring the data in the primary display.

Navigation Pane

Aggregation Type

TextDNA offers two block encodings which display different statistical properties of the colors of the elements represented by the block.

Averaging: Each block in the sequences of the display pane is represented as a solid color representing the average value of the "Color By" parameter across elements in the block.
Color Weaving: Each block represents the approximate distribution of values of the "Color By" parameter across elements in the block. Each pixel is mapped to the value of one element. Once every element has been mapped to a pixel, the elements are randomized and mapped to the remaining pixels. The randomization repeats until the block is filled.
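
The two encodings can be illustrated with a minimal Python sketch. This is not TextDNA's own implementation: the function names, the use of plain numbers for "Color By" values, and the seeded shuffle are all assumptions made for illustration.

```python
import random

def average_block(values):
    """Averaging: the block gets one value, the mean over its elements."""
    return sum(values) / len(values)

def weave_block(values, n_pixels, seed=0):
    """Color weaving (illustrative): fill n_pixels with element values.

    The first pass assigns one pixel per element in the given order.
    Every later pass shuffles the elements and keeps filling pixels
    until the block is full.
    """
    order = list(values)
    if not order:
        return []
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    pixels = []
    while len(pixels) < n_pixels:
        pixels.extend(order[: n_pixels - len(pixels)])
        rng.shuffle(order)
    return pixels
```

With three elements and seven pixels, `weave_block` shows every element once, then again in a shuffled order, and uses one more element for the last pixel, so the pixel colors approximate the distribution of values in the block.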

Color By

This list defines the property of an element that is mapped to the element color. There are five default property colorings. "Sequence Co-Occurrence" and "Rank in Reference" are split coloring properties: a portion of the elements is mapped to one ramp while the remainder is mapped to another. They require you to choose both the primary "Color Scheme" and the "Secondary Ramp."

Word Rank: the ordinal position of a word in a sequence; for raw text, it is the reading order position within the collection; for ranked text, it is the relative commonality of the word in the collection
Word Frequency: how often the word appears in the dataset as a whole; for raw text, it is the number of instances of the word within the corpus; for ranked text, it is the number of sequences the word appears in
Sequence Co-Occurrence: the specific set of sequences that a word occurs in
Rank in Reference: colors elements in a dataset according to their color scheme in a specific sequence
count: the frequency of each word within the raw text of the underlying corpus
tag: optional, user-specified input for text-tagging
tfidf: colors data according to the importance of a word to the corpus. This adjusts for the fact that certain words (e.g., stop words) occur more frequently than others in a corpus: the tfidf value increases with the number of times a word is used in a document and is offset by the frequency of the word in the overall corpus.
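
As a rough illustration of the tfidf property, the sketch below uses the common textbook weighting: term frequency within a document multiplied by the logarithm of the inverse document frequency. The exact formula TextDNA applies is not stated here, so the weighting, the function name `tfidf`, and the list-of-words input format are assumptions.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {word: score} dict per document.

    Each document is a list of words. Textbook weighting (an assumption):
    score = (count in document / document length) * log(N / document frequency)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

Under this weighting, a word that appears in every document (such as a stop word) scores zero, while a word concentrated in a single document scores higher.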

Order By

This menu section sets how elements are ordered within a sequence. TextDNA provides four ordering options.

Word Rank: the ordinal position of a word in a sequence; for raw text, it is the reading order position within the collection; for ranked text, it is the relative commonality of the word in the collection
Word Frequency: how often the word appears in the dataset as a whole; for raw text, it is the number of instances of the word within the corpus; for ranked text, it is the number of sequences the word appears in
Sequence Co-Occurrence: the specific set of sequences that a word occurs in
Word Count: the frequency of each word within the raw text of the underlying corpus
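
To make the ordering options concrete, here is a small Python sketch of ordering by Word Frequency for ranked text, where frequency is the number of sequences a word appears in. The function names and the dictionary-of-lists input are hypothetical, and keeping the original order for ties is an assumption rather than documented TextDNA behavior.

```python
from collections import Counter

def sequence_frequency(sequences):
    """Count, for each word, how many sequences it appears in."""
    counts = Counter()
    for words in sequences.values():
        counts.update(set(words))  # a word counts once per sequence
    return counts

def order_by_frequency(words, counts):
    """Most frequent words first; the stable sort keeps the original
    order among words with equal frequency (an assumption)."""
    return sorted(words, key=lambda w: -counts[w])
```

For example, given `{"a": ["x", "y", "z"], "b": ["y", "z"], "c": ["z"]}`, the word "z" appears in three sequences, so ordering sequence "a" by frequency places "z" first and "x" last.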

Match On

This menu selection specifies behavior in the display pane, specifically indicating how you wish to compare elements across sequences upon mouseover of element blocks. Options are Word, Rank, Frequency, Sequence Co-Occurrence, and Count.

Color Scheme

This dropdown contains color ramps that are mapped to elements. By default, TextDNA offers a variety of sequential and diverging color ramps, along with a greyscale.

Secondary Ramp

This dropdown specifies the secondary color ramp for split color encodings. If the same ramp is specified in the "Color Scheme" and "Secondary Ramp," split encodings will use the first half of the ramp as the "Color Scheme" and the second half as the "Secondary Ramp." The ramps are the same as those offered in the "Color Scheme" list.
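
The same-ramp rule can be sketched as follows. Representing a ramp as a list of color strings, the function name `split_ramps`, and giving the extra color of an odd-length ramp to the second half are all assumptions for illustration.

```python
def split_ramps(primary, secondary):
    """Return the (primary, secondary) ramps used by a split encoding.

    If both selections are the same ramp, the first half serves as the
    primary ramp and the second half as the secondary ramp.
    """
    if primary == secondary:
        mid = len(primary) // 2
        return primary[:mid], primary[mid:]
    return primary, secondary
```

Selecting a four-color ramp twice therefore yields two two-color ramps, while selecting two different ramps leaves both unchanged.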

Preset Configurations

This menu selection applies common Order By and Color By configurations. Presets are named for the type of data the configurations allow users to explore.

Co-Occurrence of Ordered Words

Color By: Word Rank & Order By: Sequence Co-Occurrence
For seeing word usage patterns in an entire dataset.

Term Order for Common Words

Color By: Sequence Co-Occurrence & Order By: Word Rank
For seeing what the most common words are and where they sequentially occur.

Frequency of Ordered Words

Color By: Word Frequency & Order By: Word Rank
For seeing the most popular words within each sequence in relation to their popularity in the dataset as a whole.

Frequency of Co-Occurring Words

Color By: Word Frequency & Order By: Sequence Co-Occurrence
For seeing the popularity of the most common words in the dataset.

Words To Highlight

Entering a word or a comma-separated list of words in this box and pressing Enter de-emphasizes blocks where the input words do not occur, leaving blocks that contain matching words bright.

Words To Color

Entering a word or a comma-separated list of words in this box and pressing Enter turns blocks whose words do not match the input grey and maps a contrasting color to blocks/pixels whose words match the input.
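
Both word boxes accept the same comma-separated input. The Python sketch below shows one way such input could be parsed and applied; it is not TextDNA's code. The function names, the two color constants, and the case-insensitive matching are assumptions.

```python
GREY = "#bbbbbb"    # hypothetical color for non-matching words
ACCENT = "#d62728"  # hypothetical contrasting color for matching words

def parse_words(text):
    """Split comma-separated input into a set of lower-cased words."""
    return {w.strip().lower() for w in text.split(",") if w.strip()}

def highlight_blocks(blocks, text):
    """Words To Highlight: True for blocks that stay bright,
    False for blocks that are de-emphasized."""
    wanted = parse_words(text)
    return [any(w.lower() in wanted for w in block) for block in blocks]

def color_words(words, text):
    """Words To Color: contrasting color for matches, grey otherwise."""
    wanted = parse_words(text)
    return [ACCENT if w.lower() in wanted else GREY for w in words]
```

With the input `humour, satire`, a block containing "humour" stays bright under highlighting, and under coloring that word receives the contrasting color while the other words turn grey.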

Zoom Window

Left clicking on a block in the display pane shows the elements represented by the block in the zoom window.

You can interact with the elements in the zoom window by hovering over them.

If you left click on a word in the zoom window, TextDNA maps the behavior of the element across sequences. Here is the behavior of the word "humour."

Right-Click Behavior

Right-clicking on a block within a sequence provides several ways to interact with the GUI:

Set as Color Reference: colors the blocks of the entire corpus according to the color scheme of a selected sequence
Set as Order Reference: orders the words within an entire corpus according to the order of a selected sequence
Show Words: generates a window that displays the words mapped to a selected block and its properties
Clear Line Markings: removes lines mapping word behavior from the main display
Clear Zoom Window: clears the contents of the zoom window
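
As an illustration of "Set as Order Reference," the sketch below reorders one sequence to follow the word order of a selected reference sequence. The function name is hypothetical, and placing words that are absent from the reference at the end, in their original order, is an assumption; the actual handling in TextDNA is not described here.

```python
def order_by_reference(sequence, reference):
    """Reorder `sequence` to follow the word order of `reference`.

    Words missing from the reference are moved to the end and keep
    their original relative order (an assumption).
    """
    position = {}
    for index, word in enumerate(reference):
        position.setdefault(word, index)  # first occurrence wins
    return sorted(sequence, key=lambda w: position.get(w, len(reference)))
```

Applying the same function to every sequence in the corpus lines up shared words in the reference's order, which makes differences between sequences easier to compare.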