Science Writing and Style in VEP’s Super Science Corpus
This image shows a scatterplot of the Super Science corpus (each dot represents a text) and highlights the Philosophy of Science category:
This category contains texts by several famous early modern scientists (or scientific theorists) concerned with questions of methodology, morality and science as a system of knowledge: figures like Francis Bacon, Robert Boyle and Margaret Cavendish. If we look at some of the DocuScope Language Action Types (LATs) that feature in these texts, we find categories such as Confidence, Uncertainty, Question and Contingency, types of language that serve discursive writing. That these texts cluster mostly on the right-hand side of the graph suggests that something differentiates their style from that of texts grouped in other areas of the scatterplot.
For comparison, this image highlights texts from the Materia Medica/Medical Recipe Books category:
The texts in the upper left quadrant exhibit LATs such as Motions, Sense Property and Imperative, types of language that describe objects and actions and give instructions. What we see, then, is a divide in the PCA space between procedural and discursive writing.
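The positions in the scatterplot are principal component analysis (PCA) coordinates, which we assume here are derived from each text's LAT frequencies. As a rough illustration of how such a projection can pull procedural and discursive texts apart, here is a minimal Python sketch. It assumes a matrix with one row per text and one column per LAT, holding relative frequencies; the function name `pca_2d` and the toy values are hypothetical, and the VEP tools may normalise or scale the data differently.

```python
import numpy as np

def pca_2d(freqs):
    """Project texts (rows) onto the first two principal components.

    Hypothetical input: one row per text, one column per LAT,
    each value a relative frequency of that LAT in that text.
    """
    X = np.asarray(freqs, dtype=float)
    # Centre each LAT column so the components describe variation around the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:2]
    # Coordinates of each text on PC1 and PC2 (the scatterplot positions).
    scores = Xc @ axes.T
    return scores, axes

# Invented toy frequencies for the LATs Imperative, Sense Property, Private Thinking.
toy = [
    [0.050, 0.040, 0.005],  # recipe-like text 1
    [0.060, 0.050, 0.004],  # recipe-like text 2
    [0.005, 0.010, 0.050],  # discursive text 1
    [0.004, 0.012, 0.060],  # discursive text 2
]
scores, axes = pca_2d(toy)
print(scores)
```

Because the sign of a principal component is arbitrary, what matters is that the two recipe-like rows land on one side of PC1 and the two discursive rows on the other, not which side is "left".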
The frequent occurrence of Subjective Perception (i.e. observation that tells us as much about the perceiver as the perceived) and Private Thinking is also a feature of the right side of the PCA space. To get a more concrete idea of how far ‘subjectivity’ is characteristic of a particular genre, like Philosophy of Science, we can create graphs that compare the frequency of individual LATs across genres:
The graph shows that Private Thinking occurs most frequently in Philosophy of Science, followed by the closely related Science-Religion category.
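A comparison like this amounts to averaging one LAT's frequency within each genre and ranking the genres. The Python sketch below shows that calculation under stated assumptions: each text is a hypothetical dictionary with a `genre` label and a `lats` mapping of LAT names to relative frequencies, the function name `rank_genres_by_lat` is our own, and the genre names and values are invented placeholders rather than VEP results.

```python
from collections import defaultdict
from statistics import mean

def rank_genres_by_lat(texts, lat):
    """Return (genre, mean frequency of `lat`) pairs, highest mean first.

    Hypothetical input: each text is {"genre": str, "lats": {lat_name: frequency}}.
    A text in which the LAT never occurs counts as frequency 0.0.
    """
    by_genre = defaultdict(list)
    for text in texts:
        by_genre[text["genre"]].append(text["lats"].get(lat, 0.0))
    averages = {genre: mean(values) for genre, values in by_genre.items()}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)

# Invented placeholder data, not drawn from the Super Science corpus.
toy_texts = [
    {"genre": "Genre A", "lats": {"Private Thinking": 0.030}},
    {"genre": "Genre A", "lats": {"Private Thinking": 0.040}},
    {"genre": "Genre B", "lats": {"Private Thinking": 0.020}},
    {"genre": "Genre B", "lats": {"Private Thinking": 0.025}},
    {"genre": "Genre C", "lats": {"Private Thinking": 0.010}},
    {"genre": "Genre C", "lats": {}},
]
ranking = rank_genres_by_lat(toy_texts, "Private Thinking")
print(ranking)
```

Averaging per genre keeps a genre with many texts from dominating simply by size; a fuller comparison would also look at the spread within each genre, since a single outlier can raise a mean.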
The scatterplot can also tell us interesting things about style. For example, in the Philosophy of Science scatterplot above, the outlier marked with a triangle is René Descartes’ Meditations. To get a sense of what may be drawing this text away from the rest of the corpus, we can look at the DocuScope-tagged HTML file of the text.
View the DocuScope file here: http://vep.cs.wisc.edu/corpora/docuscope_v321/non-dramatic/a35750headed_Docuscope.html. When opened, the file looks like this:
The column on the left of the page lists the LATs in order of their frequency within the text. Clicking on a LAT highlights every instance of it in the text.
Descartes’ Meditations scores highly on LATs such as Private Thinking, First Person, Self-Disclosure and Uncertainty, types of language that deal with subjectivity, inner thought and the disclosure of personal opinion.
The notion of subjectivity raises some interesting questions about the scientific texts drawn to the right of the graph, compared with the rest of the corpus. For example, does this indicate a more modern or individualistic approach to the study of science? Is Descartes leading the way with a hyper-reflexive mode of philosophical writing? If so, how do we square emerging scientific ideas of objectivity and ‘matters of fact’ with the subjective perception of the individual who reports these facts?
By Alan Hogarth