Explainers Supplementary Material

Figure 0s: Scagnostics of the simple example

This is a "scagnostics" display of the explainers generated for the simple example of Figure 0. In the example, we created an explainer for the concept of "comedicness" - by finding a function that valued comedies (generally) higher than non-comedies. The particular one chosen is a simple one (two variables, unit weights). How did we pick that one?
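To make the setup concrete, here is a minimal sketch of what such an explainer is: a linear scoring function over a movie's feature vector, here with unit weights on two features, as in the Figure 0 explainer. The feature values and the tiny dataset below are hypothetical placeholders, not the paper's data.

```python
# Minimal sketch (not the paper's code): an "explainer" here is a linear
# scoring function over a movie's feature vector.
import numpy as np

def explainer_score(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Score each row (movie) by a weighted sum of its features."""
    return features @ weights

# Two hypothetical features with unit weights, as in the Figure 0 explainer.
weights = np.array([1.0, 1.0])

# Tiny made-up dataset: rows are movies, columns are the two chosen features.
movies = np.array([
    [0.9, 0.8],   # a comedy
    [0.7, 0.9],   # a comedy
    [0.2, 0.3],   # not a comedy
    [0.4, 0.1],   # not a comedy
])
is_comedy = np.array([True, True, False, False])

scores = explainer_score(movies, weights)
# A good explainer of "comedicness" puts comedies above non-comedies:
print(scores[is_comedy].min() > scores[~is_comedy].max())  # True here
```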

There are many possible explainers of "comedicness." Choosing the best one involves a tradeoff between correctness (actually putting comedies above non-comedies), simplicity (by various measures of how simple the functions are), and a few other criteria (which don't apply as much to this example).

The way we navigate this tradeoff is to generate a large number of explainers and pick the ones that seem to be the sweet spots.
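One way to think about "sweet spots" is as non-dominated candidates: explainers that no other candidate beats on both correctness and simplicity at once. The sketch below illustrates that idea on made-up scores; the candidate names and numbers are hypothetical, and this is not necessarily the exact selection procedure used.

```python
# Sketch of picking "sweet spot" explainers as a Pareto front over
# correctness (mcc, higher is better) and simplicity (nvars, fewer is better).
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    mcc: float      # correctness
    nvars: int      # simplicity

def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep candidates not beaten on both axes by some other candidate."""
    front = []
    for c in cands:
        dominated = any(
            other.mcc >= c.mcc and other.nvars <= c.nvars
            and (other.mcc > c.mcc or other.nvars < c.nvars)
            for other in cands
        )
        if not dominated:
            front.append(c)
    return front

cands = [
    Candidate("best-1var", 0.7, 1),
    Candidate("best-2var", 0.95, 2),
    Candidate("ok-2var", 0.8, 2),     # dominated by best-2var
    Candidate("best-3var", 1.0, 3),
]
print(pareto_front(cands))  # the sweet-spot candidates
```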

For this example, we used the greedy algorithm (described in the paper) to generate approximately 2000 functions of 1-4 variables. We then quantized these at various levels (so that they have integer coefficients). This gives us several thousand explainers to choose from.
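As an illustration of what quantization does to a candidate explainer, here is a minimal sketch that scales a real-valued weight vector so its largest coefficient has a given magnitude and then rounds to integers. This is one plausible reading of "quantized at various levels," not necessarily the exact scheme used; the weights are made up.

```python
# Sketch of coefficient quantization for a linear explainer.
import numpy as np

def quantize(weights: np.ndarray, level: int) -> np.ndarray:
    """Return integer coefficients whose largest magnitude is `level`."""
    scale = level / np.max(np.abs(weights))
    return np.round(weights * scale).astype(int)

w = np.array([0.62, -0.31, 0.11])   # hypothetical real-valued weights
for level in (1, 2, 3, 5, 10):
    print(level, quantize(w, level))
# e.g. level 1 -> [1 0 0], level 10 -> [10 -5 2]
```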

To get a sense of this collection of explainers, we use a "scagnostics" style plot. I use scare quotes since it's in the spirit of scagnostics, not the actual metrics proposed by Tukey (or Wilkinson and colleagues). In this kind of display, each explainer function is a point in high-dimensional space, where the axes are different metrics we might apply. Here, I am just using 5 metrics (since it's hard to view more than 5 dimensions): 3 for correctness (mcc, nth, and margin), and 2 for simplicity (number of variables and quantization level).
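To make concrete how an explainer becomes a point in this metric space, here is a minimal sketch, assuming each explainer is an integer-weighted linear function scored against the comedy/non-comedy labels. Only mcc is computed here; nvars and qlevel are known from how the explainer was constructed, the "nth" metric is as defined in the paper, and margin is sketched further below. The threshold choice, data, and weights are hypothetical.

```python
# Sketch: turn one explainer into a row of metrics (nvars, qlevel, mcc).
import numpy as np
from sklearn.metrics import matthews_corrcoef

def explainer_metrics(weights: np.ndarray, qlevel: int,
                      X: np.ndarray, is_comedy: np.ndarray) -> dict:
    scores = X @ weights
    # A simple threshold at the midpoint of the score range; the paper may
    # choose the threshold differently.
    thresh = (scores.max() + scores.min()) / 2
    return {
        "nvars": int(np.count_nonzero(weights)),
        "qlevel": qlevel,
        "mcc": matthews_corrcoef(is_comedy, scores > thresh),
    }

# Tiny hypothetical example: 6 movies, 4 features, one 2-variable explainer.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
is_comedy = np.array([True, True, True, False, False, False])
print(explainer_metrics(np.array([1, 1, 0, 0]), 1, X, is_comedy))
```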

Showing this in a scatterplot matrix leads to a bit of a mess. We have thousands of points (each point is a function, and it appears in every plot of the matrix), so there is horrendous overdraw. The color of the points represents the number of variables: red is 1, purple 2, blue 3, cyan 4. Lower numbered colors are always drawn on top of higher numbered ones (so if you see a blue dot, there might be cyan dots hiding underneath it).
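For reference, here is a sketch of how such a display can be produced, using a randomly generated metric table as a stand-in for the real collection of explainers. The colors follow the legend above, and groups with more variables are drawn first so the lower-numbered colors end up on top. The panel ordering here is arbitrary and may not match the actual figure.

```python
# Sketch of the scatterplot matrix, colored by number of variables.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical metric table standing in for the real explainer collection.
rng = np.random.default_rng(1)
n = 2000
table = pd.DataFrame({
    "mcc":    rng.uniform(0, 1, n),
    "nth":    rng.uniform(0, 1, n),
    "margin": rng.uniform(0, 0.3, n),
    "nvars":  rng.integers(1, 5, n),
    "qlevel": rng.choice([1, 2, 3, 5, 10], n),
})

metrics = ["mcc", "nth", "margin", "nvars", "qlevel"]
colors = {1: "red", 2: "purple", 3: "blue", 4: "cyan"}

fig, axes = plt.subplots(len(metrics), len(metrics), figsize=(10, 10))
for i, ym in enumerate(metrics):
    for j, xm in enumerate(metrics):
        ax = axes[i, j]
        for nv in (4, 3, 2, 1):                # 4-variable (cyan) drawn first
            sub = table[table["nvars"] == nv]
            ax.scatter(sub[xm], sub[ym], s=3, color=colors[nv])
        if i == len(metrics) - 1:
            ax.set_xlabel(xm)
        if j == 0:
            ax.set_ylabel(ym)
plt.tight_layout()
plt.show()
```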

What can you learn from this?

First, you can see that the system tried a few different levels of quantization (1, 2, 3, 5, 10).

Start by looking at the mcc (Matthews correlation coefficient) vs. nvars (number of variables) plot (4th row, 3rd one in). Remember that an mcc score of 1 is perfect. Notice that the best 1 variable explainer gets about 0.7, and the best 2 variable explainer gets close to 1. The best 3 and 4 variable explainers get perfect scores.

If you look carefully (it's very hard to see in this scatterplot) at the qlevel (quantization level) vs. mcc plot, you can see there is a blue dot in the upper right corner. In order to achieve a perfect mcc score with 3 variables (blue), you need a quantization level of 10. Notice there's a cyan dot at (5,1) - so there is a 4 variable explainer that gets a perfect score with a quantization level of 5. (There is probably also a cyan dot at (10,1), but it's covered by the blue dot.)
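These readings are easier to check numerically than by squinting at the scatterplots. The sketch below does so on a small stand-in table whose rows are chosen to match the readings described above; with the real metric table, the same two groupbys would report the actual values.

```python
# Check the plot readings: best mcc per number of variables, and the smallest
# quantization level that reaches a perfect score for each variable count.
import pandas as pd

# Stand-in table; replace with the real explainer metrics.
table = pd.DataFrame({
    "nvars":  [1, 2, 2, 3, 3, 4, 4],
    "qlevel": [1, 2, 3, 5, 10, 5, 10],
    "mcc":    [0.7, 0.9, 0.97, 0.9, 1.0, 1.0, 1.0],
})

# Best correctness per number of variables (the mcc-vs-nvars panel).
print(table.groupby("nvars")["mcc"].max())

# Smallest quantization level that reaches a perfect score, per nvars
# (the qlevel-vs-mcc panel: the blue dot at (10,1), the cyan dot at (5,1)).
perfect = table[table["mcc"] == 1.0]
print(perfect.groupby("nvars")["qlevel"].min())
```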

If you look at the margin plots, you can see that the 4 variable explainers can get much larger margins (expressed as a portion of the overall range) than the explainers with fewer variables. You might care about this if you care about machine learning theory (the theory of SVMs is based on maximizing margins), or if you want to make pictures like Figure 4.
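For concreteness, here is a minimal sketch of the margin as it is used here: the gap between the lowest-scored comedy and the highest-scored non-comedy, expressed as a portion of the overall score range. The scores and labels are made up.

```python
# Sketch: margin of an explainer, as a portion of the overall score range.
import numpy as np

def normalized_margin(scores: np.ndarray, is_comedy: np.ndarray) -> float:
    gap = scores[is_comedy].min() - scores[~is_comedy].max()
    return gap / (scores.max() - scores.min())

scores = np.array([5.0, 4.0, 3.5, 2.0, 1.0, 0.0])
is_comedy = np.array([True, True, True, False, False, False])
print(normalized_margin(scores, is_comedy))  # 0.3: a comfortably separated set
```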

So which choice is best? It depends! The best 2 variable explainer is really simple, but gets a few movies wrong. So it was a good choice for Figure 0. If we really cared about perfect correctness, the 3 variable explainer might be a good choice, but it requires a more complicated set of coefficients. A 4 variable explainer can get the same perfect performance with bigger margins and simpler coefficients - but 4 variables might be too complicated for interpretation.