Explainers Supplementary Material

Figure N (not in paper): Novels

Section 6.2 describes some experiments using Docuscope data for a collection of 343 18th- and 19th-century novels. Space precluded including the pictures in the paper. We include some here, primarily to show some of the issues in trying to portray this kind of data.

The explainers are trained for particular authors. For this first figure, the 9 novels of Jane Austen are the positive training examples, and the other 334 novels are the negative set. There are 7 explainers with 3 variables and small integer weights that perfectly distinguish Austen's novels from the rest. Only two of those 7 are illustrated here; they are followed by some simpler explainers that perform well, including the best single variable and the best pair of variables.
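To make "perfectly distinguish" concrete, here is a minimal sketch of how an explainer with small integer weights could be scored and checked. It assumes an explainer is a weighted sum of a few variables and that it separates the classes when every positive novel scores above every negative one. The variable names (`var_a`, `var_b`, `var_c`), the weights, and the feature values are hypothetical placeholders, not Docuscope variables or values from the experiments.

```python
# Hypothetical sketch: names, weights, and data are illustrative only.

def score(weights, features):
    """Weighted sum of the variables named in the explainer."""
    return sum(w * features[name] for name, w in weights.items())

def separates(weights, positives, negatives):
    """True if every positive item scores strictly above every negative item."""
    lowest_positive = min(score(weights, f) for f in positives)
    highest_negative = max(score(weights, f) for f in negatives)
    return lowest_positive > highest_negative

# Toy stand-ins for the author's novels (positives) and the rest (negatives).
positives = [
    {"var_a": 3.0, "var_b": 1.0, "var_c": 2.0},
    {"var_a": 2.5, "var_b": 0.5, "var_c": 1.5},
]
negatives = [
    {"var_a": 1.0, "var_b": 2.0, "var_c": 1.0},
    {"var_a": 2.75, "var_b": 1.0, "var_c": 0.5},
    {"var_a": 1.5, "var_b": 0.0, "var_c": 1.0},
]

three_variable = {"var_a": 2, "var_b": -1, "var_c": 1}  # small integer weights
single_variable = {"var_a": 1}

print(separates(three_variable, positives, negatives))
print(separates(single_variable, positives, negatives))
```

On this toy data the 3-variable explainer separates the two groups while the single variable does not, which mirrors the trade-off between simplicity and correctness that the figure shows.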

Here, purple blocks are novels written by Austen and green blocks are novels written by others. Hover over a block to see the information about that novel.

For Charles Dickens, no 3-variable explainer distinguishes his novels from the others without error. In fact, the simplest error-free linear classifier that the SVM parameter search finds has 22 variables.

The explainers still perform well enough that they are probably capturing the essence of Dickens' writing. In fact, many of the "most Dickensian" novels (according to these explainers) were written by authors thought to have been trying to emulate Dickens' style.

The low MCC (Matthews correlation coefficient) scores are probably an artifact of the optimization that selects the threshold: it optimizes for the best accuracy, not the best MCC. The nth score is not based on the threshold.
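The effect can be reproduced on a small made-up example. The sketch below assumes a novel is predicted positive when its score is at or above the threshold, and it picks the threshold from the observed scores. The scores, the 3-versus-17 class split, and the function names are invented for illustration; this is not the optimization code used in the experiments.

```python
import math

def confusion(scores, labels, threshold):
    """Count (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for s, is_positive in zip(scores, labels):
        predicted = s >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_threshold(scores, labels, metric):
    """Try each observed score as a threshold and keep the best by `metric`."""
    return max(sorted(set(scores)),
               key=lambda t: metric(*confusion(scores, labels, t)))

# Invented, heavily imbalanced data: 3 positives among 20 items.
pos_scores = [9, 5, 4]
neg_scores = [8, 7, 6, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = pos_scores + neg_scores
labels = [True] * len(pos_scores) + [False] * len(neg_scores)

t_acc = best_threshold(scores, labels, accuracy)
t_mcc = best_threshold(scores, labels, mcc)
print(t_acc, accuracy(*confusion(scores, labels, t_acc)), mcc(*confusion(scores, labels, t_acc)))
print(t_mcc, accuracy(*confusion(scores, labels, t_mcc)), mcc(*confusion(scores, labels, t_mcc)))
```

With so few positives, the accuracy-optimal threshold gives up two of the three positives to avoid false positives, which costs little accuracy but lowers the MCC. A threshold chosen for MCC accepts a few false positives to recover all the positives.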