The Digital Parisian Stage is now on GitHub

For the past five years I’ve been working on a project, the Digital Parisian Stage, that aims to create a representative sample of nineteenth-century Parisian theater. I’ve made really satisfying progress on the first stage, 1800 through 1815, which corresponds to the first volume of Charles Beaumont Wicks’s catalog, The Parisian Stage (1950). Of the initial one-percent sample (31 plays), I have obtained 24, annotated 15 and discarded three for length, for a current total of twelve plays.

The Théâtre de la Porte Saint-Martin. Watercolor and gouache by Jean-Baptiste Lallemand

At conferences like the Keystone Digital Humanities Conference and the American Association for Corpus Linguistics, I’ve presented results showing that these twelve plays cover a much wider and more innovative range of language than the four plays from this period in the FRANTEXT corpus, a sample drawn fifty years ago based on a “principle of authority.”

Just looking at declarative sentence negation, I found that in the FRANTEXT corpus the playwrights negate declarative sentences with the ne … pas construction 49 percent of the time. In the twelve randomly sampled plays, the playwrights used ne … pas 75 percent of the time to negate declarative sentences. Because this was a representative sample, I even have a p-value below 0.01, based on a chi-square goodness-of-fit test!
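For readers who want to see what a test like this involves, here is a minimal sketch in Python. It assumes one plausible setup, which may not be exactly the one I used: the FRANTEXT rate (49 percent) serves as the expected proportion, and the counts from the random sample are the observations. The counts in the example are hypothetical placeholders, not the study's data, and the function name `chi_square_gof` is made up for illustration.

```python
import math


def chi_square_gof(observed, expected_props):
    """Return (statistic, p_value) for a two-category goodness-of-fit test."""
    if len(observed) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch handles exactly two categories (1 degree of freedom)")
    total = sum(observed)
    # Expected counts if the sample followed the baseline proportions.
    expected = [total * p for p in expected_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)), so no statistics library is needed.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts, not the study's data: 200 negated declaratives,
# 150 of them (75 percent) using ne ... pas.
observed = [150, 50]
# Baseline proportions from the FRANTEXT figure quoted above (49 percent).
expected_props = [0.49, 0.51]

statistic, p_value = chi_square_gof(observed, expected_props)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3g}")
```

The real p-value depends on the actual number of negated sentences in each corpus, so the printed figures here only illustrate the mechanics.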

This seems like a good point to release the twelve texts that I have OCRed and cleaned to the public. I have uploaded them to GitHub as HTML files. In this I have been partly inspired by the work of Alex Gil, now my colleague at Columbia University.

You can read them for your own entertainment (Jocrisse-maître et Jocrisse-valet is my favorite), stage your own production of them (I’ll buy tickets!) or use them as data for your scientific investigations. I hope that you will also consider contributing to the repository, by checking for errors in the existing texts, adding new texts from the catalog, or converting them to a different format like TEI or Markdown.
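If you want to use the texts as data, a first step is usually to pull the plain text out of the HTML. The sketch below does this with Python's standard library only. It rests on assumptions I have not checked against the repository: that the files sit in one directory of a local clone, end in `.html`, and are encoded as UTF-8. The names `TextExtractor`, `html_to_text` and `load_corpus` are made up for this example.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def load_corpus(directory):
    """Map each .html file name in `directory` to its extracted text.

    Assumes UTF-8 files in a single directory (an unverified assumption).
    """
    return {
        path.name: html_to_text(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.html"))
    }
```

After cloning the repository, something like `load_corpus("theatredeparis")` would give you a dictionary of play texts, provided the layout matches the assumptions above. This drops all markup, so stage directions and speaker names end up mixed in with the dialogue; a real study would probably want to keep them apart.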

If you do use them in your own studies, please don’t forget to cite me along the lines given below, or even to contact me to discuss co-authorship!

Grieve-Smith, Angus B. (2016). The Digital Parisian Stage Corpus. GitHub. https://github.com/grvsmth/theatredeparis
