The Digital Parisian Stage is now on GitHub

For the past five years I’ve been working on a project, the Digital Parisian Stage, that aims to create a representative sample of nineteenth-century Parisian theater. I’ve made really satisfying progress on the first stage, 1800 through 1815, which corresponds to the first volume of Charles Beaumont Wicks’s catalog, The Parisian Stage (1950). Of the initial one-percent sample (31 plays), I have obtained 24, annotated 15 and discarded three for length, for a current total of twelve plays.

At conferences like the Keystone Digital Humanities Conference and the American Association for Corpus Linguistics, I’ve presented results showing that these twelve plays cover a much wider and more innovative range of language than the four plays from this period in the FRANTEXT corpus, a sample drawn fifty years ago based on a “principle of authority.”

Just looking at declarative sentence negation, I found that in the FRANTEXT corpus the playwrights negate declarative sentences with the ne … pas construction 49 percent of the time. In the twelve randomly sampled plays, the playwrights used ne … pas 75 percent of the time to negate declarative sentences. Because this was a representative sample, I even have a p value below 0.01, based on a chi-square goodness of fit test!
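For readers who want to see the shape of that test, here is a minimal sketch of a chi-square goodness-of-fit calculation using only the Python standard library. The counts are hypothetical, invented to match the 75 percent figure; they are not the real tallies from the corpus. The function names are also just illustrative. The FRANTEXT proportions (49 percent ne … pas, 51 percent other) serve as the expected distribution.

```python
import math


def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic for observed counts
    against a list of expected proportions."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(chi2):
    """Upper-tail p value for a chi-square statistic with one degree of freedom."""
    # With df = 1, P(X > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# HYPOTHETICAL counts, chosen only so that 150 / 200 = 75 percent.
observed = [150, 50]            # [ne ... pas, other negation] in the random sample
expected_props = [0.49, 0.51]   # FRANTEXT proportions, used as the null hypothesis

chi2 = chi_square_gof(observed, expected_props)
p = p_value_one_df(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```

With two categories there is one degree of freedom, which is why the closed-form erfc shortcut works here. A table with more categories would need a general chi-square distribution function instead.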

This seems like a good point to release the twelve texts that I have OCRed and cleaned to the public. I have uploaded them to GitHub as HTML files. In this I have been partly inspired by the work of Alex Gil, now my colleague at Columbia University.

You can read them for your own entertainment (Jocrisse-maître et Jocrisse-valet is my favorite), stage your own production of them (I’ll buy tickets!) or use them as data for your scientific investigations. I hope that you will also consider contributing to the repository, by checking for errors in the existing texts, adding new texts from the catalog, or converting them to a different format like TEI or Markdown.
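If you want to use the HTML files as data, a first step is usually to pull the plain text out of the markup. Below is a rough sketch using Python’s built-in html.parser module. I am making no assumptions here about how the files in the repository are actually tagged: the sample string is made up for illustration, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PlayTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside head, script or style."""

    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside the skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = PlayTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Made-up sample, NOT taken from the corpus.
sample = (
    "<html><head><title>Exemple</title></head>"
    "<body><h1>Scène I</h1><p>Je ne sais pas.</p></body></html>"
)
print(html_to_text(sample))
```

To run this on a real file, you would read it into a string first (for example with pathlib’s read_text, passing the file’s encoding) and hand that string to html_to_text. Separating speaker names and stage directions from dialogue would take more work, and depends on how each file is marked up.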

If you do use them in your own studies, please don’t forget to cite me along the lines given below, or even to contact me to discuss co-authorship!

Grieve-Smith, Angus B. (2016). The Digital Parisian Stage Corpus. GitHub. https://github.com/grvsmth/theatredeparis

Nobody’s Boy

I got a paper rejected from a generativist conference a few years ago. A generativist friend of mine said, “Why did you bother submitting your paper to that conference? You knew they were going to reject it.” I said, “Well, the conference was in town, so I figured I’d send something in anyway.”

My friend proceeded to tell me a story from her early grad school days about reviewing papers for her school’s signature conference. She sat down one evening with Professor Big Deal, who glanced through the stack of anonymous submissions and sorted them one by one into piles. “This is from one of Professor X’s students, and this is from one of Professor Y’s students. Here’s another from Professor X’s group. This must be Professor Z.” She continued like this until all the papers were sorted, and then as I recall she had some formula for allocating time to each professor and their students.

I think about this a lot, because I’m not a Student Of anyone in particular. On paper I may look like a student of Professor Bigshot, and that’s probably how my paper got accepted to a conference where Professor Bigshot was a keynote speaker. But I’m not really a Student Of Professor Bigshot. I didn’t ask her to be on my committee. And I know she doesn’t think of me as a Student Of hers, because she was sitting in front of me later in that conference, and walked out of the room right before it was my turn to present my paper.

My relationship with my actual advisor is Complicated, but suffice it to say that we don’t work in the same subfield of linguistics, and I’m tied to the New York area, where she doesn’t have the pull to get me a job anyway. My relationships with my other committee members are problematic in various ways. I’m on good terms with plenty of other linguists, but since I’m not their Student their loyalty to me is always secondary.

Even if my friend’s story about Professor Big Deal is an egregious outlier, it is still a regular occurrence to see professors co-authoring and co-presenting papers with their students, making introductions and writing letters. If you know me professionally, I can pretty much guarantee that we were not introduced by Professor Bigshot, or by any member of my committee. If you’ve seen me present my research, or read it anywhere, or hired me, it’s entirely through my own hard work. I have not had any of the advantages that come with being a Student Of anyone.

You could say that it’s my fault for not choosing the right advisors, or for the problems in my relationships with my advisors. In my defense I would argue that most of the problems in these relationships came from my putting two things ahead of my own progress on the PhD: my wife’s progress on the tenure track, and keeping my kid out of daycare ten hours a day. But even if you disagree, does that mean that I deserve to be a second-class citizen in the field?

I know I’m not the only academic orphan out there. Maybe we should get together and found a Home for Orphaned Linguists, where we can hope to someday be adopted by professors with generous allocations of reassigned time, who will co-author with us and introduce us and attend our talks. Some day…