Explainers Supplementary Material

Figure 0b: Simple Genre Classifiers

This is an example that is also not in the paper. It's a simple one to help you get used to looking at explainer diagrams, and to give a sense of how we can make explainers for different concepts. It builds on Figure 0, so you might want to start there.

As we mentioned, Shakespeare's plays are categorized into 4 genres (comedies, histories, tragedies, and late plays).

The data here is again the "Docuscope" measurements of each play. So each play gets turned into a vector of 115 numbers by counting the kinds of words it contains (and normalizing).
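To make the "count and normalize" step concrete, here is a minimal sketch in Python. It is not Docuscope itself: the tiny word-to-category dictionary is made up for illustration (the category names are borrowed from this page, but which words belong to them is an assumption), and dividing by the total word count is only one plausible way to normalize.

```python
from collections import Counter

# Hypothetical toy dictionary (word -> category). The real Docuscope
# dictionary is far larger; these entries are invented for illustration.
TOY_CATEGORIES = {
    "we": "Inclusive",
    "our": "Inclusive",
    "now": "Immediacy",
    "here": "Immediacy",
}
CATEGORY_ORDER = ["Inclusive", "Immediacy"]


def category_vector(text):
    """Turn a text into one number per category: count / total words."""
    words = text.lower().split()
    counts = Counter(TOY_CATEGORIES[w] for w in words if w in TOY_CATEGORIES)
    total = len(words)
    if total == 0:
        return [0.0] * len(CATEGORY_ORDER)
    # Assumed normalization: divide each count by the text length.
    return [counts[c] / total for c in CATEGORY_ORDER]


vec = category_vector("we are here and our friends are here now")
print(vec)
```

With 115 categories instead of 2, the same idea gives each play its vector of 115 numbers.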

Explainers are made for each genre. The explainer for genre X tries to give higher values to all of the Xs than to the non-Xs. For each genre, we can make many different explainers. Here we have chosen (for each genre) one of the simplest: two-variable linear functions with small integer coefficients. The four "explainer diagrams" are put side by side.
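As a rough illustration of what "a two-variable linear function with small integer coefficients" can look like, here is a brute-force sketch in Python. It is not the method used to build the explainers in the figure: the scoring rule (the fraction of X/non-X pairs that are ordered correctly), the coefficient range, and the data are all assumptions made up for this example.

```python
from itertools import combinations, product


def pair_score(values, is_target):
    """Fraction of (X, non-X) pairs in which the X gets the higher value."""
    pos = [v for v, t in zip(values, is_target) if t]
    neg = [v for v, t in zip(values, is_target) if not t]
    wins = sum(1 for p in pos for n in neg if p > n)
    return wins / (len(pos) * len(neg))


def best_two_variable_explainer(rows, is_target, max_coef=2):
    """Try every pair of variables with nonzero integer coefficients."""
    n_vars = len(rows[0])
    coefs = [c for c in range(-max_coef, max_coef + 1) if c != 0]
    best = None
    for i, j in combinations(range(n_vars), 2):
        for a, b in product(coefs, repeat=2):
            # Candidate explainer: a * variable_i + b * variable_j
            values = [a * r[i] + b * r[j] for r in rows]
            score = pair_score(values, is_target)
            if best is None or score > best[0]:
                best = (score, i, j, a, b)
    return best


# Made-up data: 6 items, 3 variables each; the first 3 items are "X".
rows = [
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.4],
    [0.7, 0.3, 0.6],
    [0.3, 0.8, 0.5],
    [0.2, 0.7, 0.4],
    [0.4, 0.9, 0.6],
]
is_target = [True, True, True, False, False, False]

score, i, j, a, b = best_two_variable_explainer(rows, is_target)
print(f"explainer: {a} * var{i} + {b} * var{j}  (score {score:.2f})")
```

A search like this usually finds several candidates that score about equally well, which is one way to see why many different explainers can be made for the same genre.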

The colors are green for comedy, yellow for history, purple for tragedy, and red for late plays.

Here, you should quickly notice that there are 4 explainer diagrams. In the leftmost one, the greens are generally above the other colors (since it is a comedy explainer). Similarly, the yellows are generally higher in the second one (which tries to put histories at the top), and so on.

The modified boxplots are drawn a little bit differently: for each explainer, there are 5 modified boxplots, one for the entire data set (gray) and one for each genre (colored appropriately). I have no good reason for doing this.

You can use this diagram for some practice in reading explainer diagrams. Notice how King Henry the 4th is an outlier, being extremely non-comedic, or how Othello often gets mixed in with the comedies (even when the explainer is not trying to explain comedicness).

Unless you are a literature scholar and familiar with Docuscope, it's unlikely that the variables will mean much to you. But basically, this tells a domain scholar that comedies are generally high in words in the "Inclusive" category and low in words in the "Immediacy" category. (Just having one of these properties isn't as distinguishing as having both.) To figure out what this might really mean, the scholar would go back and look at the texts. For example, since we know that King Henry the 4th is really non-comedic, it's probably a good place to start looking for those properties (or really the opposite of them). We have built other tools for connecting these kinds of explainers back to the texts (see the Correll et al. 2011 paper).

The fact that the kinds of words used in different genres are different is interesting to literature scholars (at least the ones we collaborate with).