Flu. What is Thisby? A wandring knight? Quin. It is the Lady, that Pyramus must loue. Fl. Nay faith: let not me play a womã: I haue a beard cõming. Quin. Thats all one: you shall play it in a Maske: and you may speake as small as you will. Bott. And I may hide my face, let me play Thisby to: Ile speake in a monstrous little voice; Thisne, Thisne, ah Pyramus my louer deare, thy Thysby deare, & Lady deare. Qu. No, no: you must play Pyramus: & Flute, you Thysby.

Language change has been the focus of my research for over twenty years now, so when I taught second semester linguistics at Saint John’s University, I was very much looking forward to teaching a unit focused on change.  I had been working to replace constructed examples with real data, so I took a tip from my natural language processing colleague Dr. Wei Xu and turned to SparkNotes.

I first encountered SparkNotes when I was teaching French Language and Culture, and I assigned all of my students to write a book report on a work of French literature, or a book about French language or culture.  I don’t remember the details, but at times I had reason to suspect that one or another of my students was copying summary or commentary information about their chosen book from SparkNotes rather than writing their own.

When I was in high school, my classmates would make use of similar information for their book reports.  The rule was that you could consult the Cliffs Notes for help understanding the text, but you weren’t allowed to simply copy the Cliffs Notes.

Modern Text

Who’s Thisbe? A knight on a quest?

Thisbe is the lady Pyramus is in love with.

No, come on, don’t make me play a woman. I’m growing a beard.

That doesn’t matter. You’ll wear a mask, and you can make your voice as high as you want to.

In that case, if I can wear a mask, let me play Thisbe too! I’ll be Pyramus first: “Thisne, Thisne!”—And then in falsetto: “Ah, Pyramus, my dear lover! I’m your dear Thisbe, your dear lady!”

No, no. Bottom, you’re Pyramus.—And Flute, you’re Thisbe.

When I discovered SparkNotes I noticed that for some older authors – Shakespeare, of course, but even Dickens – they not only offered summaries and commentary, but translations of the text into contemporary English.  It was this feature I drew on for the unit on language change.

While I was developing and teaching this second semester intro linguistics course at Saint John’s, I was also working as a linguistic annotator for an information extraction project in the NYU Computer Science Department.  I met a doctoral student, Wei Xu, who was studying a number of interesting corpora, including Twitter, hip-hop and SparkNotes. Wei graduated in 2014, and is now Assistant Professor of Computer Science and Engineering at Ohio State.

Wei had realized that the modern translations on SparkNotes and eNotes, combined with the original Shakespearean text, formed a parallel corpus, a collection of texts in one language variety that are paired with translations in another language variety.  Parallel corpora, like the Canadian Hansard Corpus of French and English parliamentary debates, are used in translation studies, including for training machine translation software. Wei used the SparkNotes/eNotes parallel Shakespeare corpus to generate Shakespearean-style paraphrases of contemporary movie lines, among other things.
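For readers who want to see what a parallel corpus looks like as data, here is a minimal sketch in Python using three exchanges from the excerpt quoted in this post.  The `AlignedPair` type and the `length_ratio` helper are my own illustrative names, not anything taken from Wei's work or from SparkNotes, and the alignment was done by hand.

```python
from typing import NamedTuple


class AlignedPair(NamedTuple):
    """One unit of a parallel corpus: the same speech in two language varieties."""
    speaker: str
    original: str  # spelling as printed in the Quarto
    modern: str    # SparkNotes "Modern Text" rendering


# Hand-aligned pairs copied from the excerpt in this post.
PAIRS = [
    AlignedPair(
        "Flute",
        "What is Thisby? A wandring knight?",
        "Who’s Thisbe? A knight on a quest?",
    ),
    AlignedPair(
        "Quince",
        "It is the Lady, that Pyramus must loue.",
        "Thisbe is the lady Pyramus is in love with.",
    ),
    AlignedPair(
        "Quince",
        "Thats all one: you shall play it in a Maske: and you may speake as small as you will.",
        "That doesn’t matter. You’ll wear a mask, and you can make your voice as high as you want to.",
    ),
]


def length_ratio(pair: AlignedPair) -> float:
    """Words in the modern version divided by words in the original."""
    return len(pair.modern.split()) / len(pair.original.split())


if __name__ == "__main__":
    for pair in PAIRS:
        print(f"{pair.speaker}: {length_ratio(pair):.2f}")
```

Even a crude measure like the length ratio hints at why such pairs are useful: the modern version often needs more words to say the same thing.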

When it came time to teach the unit on language change at Saint John’s, I found a few small exercises that asked students to compare older literary excerpts with modern translations; the Language Files had one such exercise featuring a short Chaucer passage.  Given the constraints of this being one unit in a survey course, it made sense to focus on the language of instruction, English.  In general, though, when working with corpora I prefer to look at larger segments, ideally an entire text but at minimum a full page.

I realized that I could cover all the major areas of language change – phonological, morphological, syntactic, semantic and pragmatic – with these texts.  Linguists have been able to identify phonological changes from changes in spelling.  For example, Chaucer’s spelling of “when” as “whan” indicates that we typically put our tongues in a higher place in our mouths when pronouncing the vowel of that word than people did in the fourteenth century.

When teaching Shakespeare to college students it is common to use texts with standardized spelling, and this spelling modernization is even practiced with some nineteenth century authors.  But we now have access to scans of Shakespeare’s work as it was first published in his lifetime or shortly after his death, with the spellings chosen by those printers.  Similarly, we have access to the first editions of most works through digitization projects like Google Books.

With this in mind, I created exercises to explore language change.  For a second semester intro course the students learned a lot from a simple scavenger hunt: compare a passage from the SparkNotes translation of Shakespeare with the Quarto, find five differences, and specify whether they are phonological, morphological, syntactic, semantic or pragmatic.  In more advanced courses, students could compare differences more systematically.
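One way a more systematic comparison might start is with an automatic word-level diff.  The sketch below uses Python's standard `difflib` module; the `tokenize` and `candidate_differences` functions are hypothetical helpers I am introducing for illustration, and the tokenizer is deliberately crude.  It only lists where the two versions diverge.  Deciding whether a given difference is phonological, morphological, syntactic, semantic or pragmatic is still the student's job.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z’']+", text.lower())


def candidate_differences(original: str, modern: str) -> list[tuple[str, str, str]]:
    """Return (operation, original words, modern words) for each non-matching span."""
    a, b = tokenize(original), tokenize(modern)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete' or 'insert'
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


if __name__ == "__main__":
    quarto = "What is Thisby? A wandring knight?"
    sparknotes = "Who’s Thisbe? A knight on a quest?"
    for tag, old, new in candidate_differences(quarto, sparknotes):
        print(f"{tag:8} {old!r} -> {new!r}")
```

A diff like this works best on short, closely aligned pairs.  Where the modern version restructures the whole sentence, the output is mostly one large replacement, which is itself a useful cue that something syntactic is going on.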

This comparison is the kind of thing that we always do when we read an old text: compare older spellings and wordings with the forms we would expect from a more modern text.  Wei Xu showed us that the translations and spelling changes in SparkNotes and eNotes can be used for a more explicit comparison, because they are written down based on the translators’ and editors’ understanding of what modern students will find difficult to read.

As I have detailed in my forthcoming book, Building a Representative Theater Corpus, we must be careful not to generalize universal statements, including statements about prevalence, to the language as a whole.  This is especially problematic when we are looking at authors who appealed to elite audiences, but it applies to Shakespeare and Dickens as well.  Existential observations, such as that Shakespeare used bare not (“let not me”) in one instance where SparkNotes used do-support (“don’t make me”), are much safer.

My students seemed to learn a lot from this technique.  I hope some of you find it useful in your classrooms!

    This certainly ought to provide fascinating comparisons for (for lack of a better term) CALPs – but a better parallel corpus could perhaps be found for BICs; have any historians modernized old court transcriptions perhaps?

