Figure 0: A Simple Example

This figure gives a simple example of an "explainers diagram" so that you can make sense of the more complicated ones in the paper.

An explainer is a function that maps each object in a set (each one is a row of the data, if you like to think that way) to a number. It projects the set onto a single axis.

In this example, the set of objects is Shakespeare's 36 plays. The function was designed to measure the "comedicness" of a play - that is, plays that are comedies should have higher values than plays that are not. For this example, we have chosen a very simple function: a linear function that has only 2 variables (it selects two columns of the data matrix) and has unit coefficients. With such a simple function, we should not be surprised that it gets a few things "wrong" (that is, a non-comedy has a higher value than a comedy). The data (and the function) will be explained later.

The diagram consists of a number of parts.

The SVG files work in any browser with good SVG support. However, the HTML pages with embedded SVG seem to render incorrectly in browsers other than Chrome. For some reason, the text gets messed up (probably a CSS issue).

The leftmost part is a list of the plays in order of comedicness (so the play with the highest value is at the top, and the one with the lowest value is at the bottom). Here, the colors denote genre (green is comedy, yellow is history, purple is tragedy, and red is the "late plays"). Yes, there should be a caption labeling these colors. Some example things to notice:

You can see that, generally, greens are at the top (so comedies are actually more comedic according to this function).
There are two tragedies (Othello and Romeo and Juliet) mixed in with the comedies. If you know these plays, this probably shouldn't surprise you.
Down at the bottom of the list, you can see things that are the "least comedic" according to this function.
For 36 objects, this stack of blocks (since the list is drawn that way) is already getting really tall. For larger data sets, we will almost always put a few objects on each row (so the blocks are in reading order). In fact, often we omit the text labels on the blocks (so you need to hover over the block to see what it is - clearly this doesn't work in a printed paper).
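
The ordering and the "few objects on each row" layout can be sketched in a few lines of Python. This is only an illustration: the function name `rank_and_wrap` and the `per_row` parameter are my own, and this is not the code that drew the figure.

```python
def rank_and_wrap(scores, per_row=1):
    """Order object names by descending score, then split them into rows.

    scores  -- dict mapping an object name to its explainer value
    per_row -- how many blocks to place on each row (1 gives the tall stack)
    """
    # Highest value first, so the most "comedic" object ends up at the top.
    ordered = sorted(scores, key=scores.get, reverse=True)
    # Chunk the ordered names so the blocks read left to right, top to bottom.
    return [ordered[i:i + per_row] for i in range(0, len(ordered), per_row)]
```

With `per_row=1` you get the single tall stack shown here; a larger `per_row` gives the reading-order grid used for bigger data sets.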

The next column is a stack of spline curves, each connecting one of the plays on its left to a position on the right. The right-hand side indicates the value of the function (ranging from the minimum value to the maximum value). So each of the little curves connects an object's rank position to its value. The vertical scale (of the right side) is the same scale that is used for the next two columns (histogram and boxplot). Reading this, you should be able to notice:

The two most comedic plays are a good amount more comedic (by the measure of this function) than the next one. At the other end, the least comedic play is a pretty big step below the next one up, and the bottom 3 are clearly separated from the 4th.
There are no numbers (even though this is a quantitative axis). Because the function is pretty much unit-less (it can be scaled and shifted), I prefer to save the space. The top is the maximum and the bottom is the minimum (except when I draw the diagrams upside down). In more complicated diagrams, space is at a premium.
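
To make the two ends of each curve concrete, here is a minimal Python sketch. The function names, the default height, and the choice to attach each curve at the middle of its block are my assumptions for illustration, not the code that drew the figure. It uses SVG-style coordinates, where y grows downward, so the maximum value lands at the top.

```python
def value_to_y(value, vmin, vmax, height=100.0):
    """Map a function value to a vertical position: vmax -> 0 (top), vmin -> height."""
    if vmax == vmin:
        # Degenerate case: every object has the same value, so use the middle.
        return height / 2
    return (vmax - value) / (vmax - vmin) * height


def connector_endpoints(scores, height=100.0):
    """Return {name: (left_y, right_y)} for the curve of each object.

    left_y  -- middle of the object's block in the ranked list (equal-height blocks)
    right_y -- position of the object's value on the shared vertical scale
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    vmin, vmax = min(scores.values()), max(scores.values())
    block = height / len(ordered)
    return {
        name: ((rank + 0.5) * block, value_to_y(scores[name], vmin, vmax, height))
        for rank, name in enumerate(ordered)
    }
```

Because only the minimum and maximum are used to set the scale, shifting or rescaling the function leaves every position unchanged, which is why the axis can go without numbers.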

The next column is a stacked histogram showing the distribution of the data over the scale. For 36 items, this isn't so interesting - but it's here for completeness. There are fixed-size buckets ranging from the minimum value to the maximum value (here there are only 7 buckets, to make the histogram look a little less empty). The vertical scale is the same as the right side of the connecting curves. What you should be able to see:

The top buckets are mainly "green".
Othello just missed being put into the top bucket (which would have made the all-green top look a little less compelling). The reason to point this out is to again emphasize the connections that the curves make between elements and positions (and therefore histogram buckets).
The distribution doesn't look much like any particular statistical distribution. If there were something to learn from the distribution, we would be able to see it in the histogram.
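
The bucketing described above can be sketched as follows. This is a minimal Python illustration under my own assumptions (the name `stacked_histogram`, and putting the maximum value into the top bucket rather than a bucket of its own); it is not the code behind the figure.

```python
from collections import Counter


def stacked_histogram(values, labels, bins=7):
    """Count objects per equal-width bucket between min and max, split by class label.

    Returns a list of Counters; index 0 is the lowest bucket, index bins-1 the highest.
    """
    vmin, vmax = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal, to avoid dividing by zero.
    width = (vmax - vmin) / bins or 1.0
    counts = [Counter() for _ in range(bins)]
    for value, label in zip(values, labels):
        # The maximum would fall just past the last bucket, so clamp it into the top one.
        index = min(int((value - vmin) / width), bins - 1)
        counts[index][label] += 1
    return counts
```

Each Counter holds the per-class counts for one bucket, which is what gets drawn as stacked colored segments.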

The last part of the display consists of 3 modified boxplots. I emphasize modified because their parts do not have the standard meaning. The line represents the range of the data (the whiskers are at the min and max). The boxes represent the inner quartiles, and the strip represents the median. The leftmost boxplot shows the distribution of the entire data set. It is always shown in gray. The other two boxplots show the two classes (comedies and non-comedies). In cases where the classes have colors, the boxplot is colored. So the rightmost one is green, since comedies are green. The middle boxplot (for non-comedies) is uncolored, since there is no single "non-comedy" color (the non-comedies have 3 different colors). Some things you can pick out from these boxplots:

You can get a sense of how much the classes overlap. Here, the inner quartile ranges do not overlap.
You can see the range of the different parts of the data. Again, the scale of the vertical axis is shared with the other parts of the diagram. So you can read across to see that the minimum comedy (bottom whisker of the rightmost boxplot) is "All's Well That Ends Well."
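
The five numbers behind each modified boxplot, and the overlap check mentioned above, can be sketched in Python. The function names are mine, and the quartile rule is an assumption: the text does not say which one the figure uses, so I picked the "inclusive" method from Python's `statistics` module.

```python
import statistics


def box_summary(values):
    """Summary for a modified boxplot: whiskers at min/max, box at the quartiles."""
    # Assumption: "inclusive" quartiles; other quartile rules give slightly different boxes.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def boxes_overlap(a, b):
    """True if the boxes (q1..q3) of two summaries share any part of the scale."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]
```

You would call `box_summary` once on all the values (the gray boxplot) and once per class, then use `boxes_overlap` on the two class summaries.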

At the bottom there is some text describing the explainer's performance and what variables it depends on.

The top line lists various performance metrics. Here it gives the nth score (the non-threshold metric from the paper) and the mcc score (the Matthews correlation coefficient, a common metric for judging classifiers in machine learning).

nth score: If you take all pairs of objects, how often would the metric be right? Pairs of two objects from the same class have no right answer, so they are not counted. So this is the percentage of pairings of one object from each class (e.g. a comedy and a non-comedy) where the one in the positive class has a higher value than the one in the negative class.
mcc score: The Matthews correlation coefficient is a normalized score from -1 to 1, where 1 means perfect performance and 0 means chance performance. It takes into account the distribution of the data, so chance means how many you would get right if you always chose the larger class.
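
Both scores are easy to sketch in Python. Treat this as a rough illustration under stated assumptions rather than the paper's implementation: the function names are mine, ties count as wrong in the pairwise score, and the mcc needs a threshold to turn values into class predictions, which I simply pass in because the text does not say how it is chosen.

```python
import math


def pairwise_score(pos, neg):
    """Fraction of (positive, negative) pairs where the positive object scores higher."""
    # Assumption: a tie counts as a miss.
    wins = sum(1 for p in pos for n in neg if p > n)
    return wins / (len(pos) * len(neg))


def mcc(pos, neg, threshold):
    """Matthews correlation coefficient when values above `threshold` are called positive."""
    tp = sum(1 for p in pos if p > threshold)   # positives correctly called positive
    fn = len(pos) - tp                          # positives missed
    fp = sum(1 for n in neg if n > threshold)   # negatives wrongly called positive
    tn = len(neg) - fp                          # negatives correctly called negative
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes
    # (for example, when everything is assigned to one class).
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Note that a threshold low enough to call every object positive gives an mcc of 0 under this convention, which matches the "always choose one class" reading of chance.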

The lowest lines describe the linear function. This one is -1 * inclusive + 1 * metadiscourse. Inclusive and Metadiscourse are features (columns) in the data. For this example, these features come from Docuscope (see references in the paper), which basically counts the number of phrases in the document that are of this "type" (where the "types" are rhetorical categories defined by the creators of the Docuscope system).
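
As a small Python sketch, the function above looks like this. The coefficients are the ones stated in the text; the rows `play_a` and `play_b` and their feature values are made up purely for illustration and are not Docuscope output.

```python
# Hypothetical rows: the real feature values come from Docuscope counts.
rows = {
    "play_a": {"inclusive": 2.0, "metadiscourse": 5.0},
    "play_b": {"inclusive": 4.0, "metadiscourse": 3.0},
}

# Coefficients as stated in the text: -1 * inclusive + 1 * metadiscourse.
COEFFICIENTS = {"inclusive": -1, "metadiscourse": 1}


def explainer(features):
    """Project one row of the data onto a single axis with a sparse linear function."""
    return sum(weight * features[name] for name, weight in COEFFICIENTS.items())


# One number per object; sorting by this number gives the ranked list in the diagram.
scores = {name: explainer(features) for name, features in rows.items()}
```

Only the columns named in `COEFFICIENTS` are read, which is what "selects two columns of the data matrix" amounts to.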

If you're curious about how I chose this explainer, over all the other "comedicness" explainers, you should look at the explanation of the scagnostics plot in Figure 0s.

For some further practice with this, you might want to move on to Figure 0b or Figure 4.

Scaling Up

These diagrams get more complicated in a few different ways:

Often, you want to look at multiple ones side by side. Sometimes, this is to look at explanations of different concepts (see Figure 0b for an example related to this one). Sometimes, you want to look at several different explainers of the same concept to compare them (see Figure 4 for an example related to this one).
When there are more than a few dozen objects, stacking them with their names starts to break down. There are examples of these kinds of scalings in the figures. For example, Figure 1 considers 140 cities, a few figures (like Fig 6) consider 180 acts of Shakespeare (each of the 36 plays has 5 acts), the novels example (Figure N) considers 343 novels, and the thousand text example considers 1080 historical texts.
The functions themselves get more complicated. Generally, we like to look at functions with few variables and integer coefficients. But sometimes, we need to look at more complicated functions (see Figure 4). The diagrams don't do much in terms of helping you interpret the functions.
The patterns displayed and the questions people ask get more complicated. This is a natural consequence of the first three things. At this point, I can't claim that the explainers diagrams work very well for these more complicated cases.