A tool for annotating corpora

My dissertation focused on the evolution of negation in French, and I’ve continued to study this change. In order to track the way that negation was used, I needed to collect a corpus of texts and annotate them. I developed a MySQL database to store the annotations (and later the texts themselves) and a suite of PHP scripts to annotate the texts and store them in the database. I then developed another suite of PHP scripts to query the database and tabulate the data in a form that could be imported into Microsoft Excel or a more specialized statistics package like SPSS.
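
To make the store-then-tabulate workflow concrete, here is a minimal sketch in Python. It is not the actual system, which uses MySQL and PHP: SQLite stands in for the database, and the table name `annotation`, its columns, the function names, and the sample rows are all hypothetical placeholders made up for illustration.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a few placeholder annotations.

    Assumption: one row per annotated clause. The real schema is not
    described in the post, so this layout is purely illustrative.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        " id INTEGER PRIMARY KEY,"
        " text_title TEXT NOT NULL,"
        " century INTEGER NOT NULL,"
        " clause TEXT NOT NULL,"
        " negation_type TEXT NOT NULL)"
    )
    # Placeholder rows, not real corpus data.
    rows = [
        ("Text A", 12, "clause 1", "ne"),
        ("Text A", 12, "clause 2", "ne...pas"),
        ("Text B", 16, "clause 3", "ne...pas"),
        ("Text B", 16, "clause 4", "ne...pas"),
    ]
    conn.executemany(
        "INSERT INTO annotation (text_title, century, clause, negation_type)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def tabulate(conn):
    """Count annotations per century and negation type."""
    cur = conn.execute(
        "SELECT century, negation_type, COUNT(*)"
        " FROM annotation"
        " GROUP BY century, negation_type"
        " ORDER BY century, negation_type"
    )
    return cur.fetchall()


def to_csv(rows):
    """Render the counts as CSV text that a spreadsheet can import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["century", "negation_type", "count"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(tabulate(build_demo_db())))
```

Keeping the annotations in a relational table means the counting can be left to a single `GROUP BY` query, and the CSV output can then be opened in Excel or loaded into a statistics package.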

I am continuing to develop these scripts. Since finishing my dissertation, I have added the ability to load the entire text into the database and revamped the front end with AJAX to streamline the workflow. The new front end works pretty well on a tablet, and even on a smartphone, when there’s a stable internet connection, but I’d like to add the ability to annotate offline, on a workstation or a mobile device. I also need to redo the scripts that query the database and generate reports. Here’s what the annotation screen currently looks like:


I’ve put many hours of work into this annotation system, and it works so well for me that it’s a shame I’m the only one who uses it. It would take some work to adapt it for other projects, but I’m interested in doing that. If you think this system might work for your project, please let me know (grvsmth@panix.com) and I’ll give you a closer look.

Just So stories in French negation

Just So stories take their name from Rudyard Kipling’s book of the same name, which contains stories like “How the Rhinoceros Got his Skin.” In that one, the rhino’s skin starts out tight, but after he takes it off to swim, a man puts crumbs in it to take revenge on the rhino for eating his cake. When the rhino puts his skin back on, it itches so much that he loosens it up with all his scratching. Presumably something similar happened with basset hounds.

Illustration: Joseph M. Gleeson

These stories can be fun, especially for kids who ask “why?” and won’t take “I don’t know” for an answer. They’re entertaining, but they’re not science and they’re not history. Even if they’re broadly consistent with a scientific theory, if they’re not based on actual data, they’re just fiction.

This is different from the normal simplification that happens in scientific explanations. We know that the Earth is not a perfectly round sphere, that it bulges out a little at the equator. Sometimes it’s enough to think of the world as round, and nobody needs to worry about oblate spheroids.

The main difference is that scientific simplification removes distracting detail from the raw data to allow the bigger picture to be seen more clearly, but Just So stories add detail that doesn’t exist in the data, and may actually create a picture that doesn’t exist. This is why, as science, they are so dangerous.

Linguistics is certainly no stranger to Just So stories. The most famous may be the old chestnut that the Eskimos have a hundred (or a thousand, or…) words for snow. This has long been used to illustrate the effect of environment on language, even though Geoffrey Pullum famously showed it to be false in 1989.

Just So stories are also found in the history of French negation, the subject of my dissertation. There is a story that you will find in almost every article or book discussing the evolution of negation. Here’s the version from Detges and Waltereit (2002):

As a standard example of grammaticalization, consider the French negation ne … pas. A lexical item, the Latin full noun passus ‘step’, has turned into a grammatical item, the Modern French negation marker pas.

(3) a. Before grammaticalization: Latin
non vado   passum
NEG go:1SG step:ACC
'I don't go a step'

b. After grammaticalization: Modern French
je ne vais   pas
'I don't go'

Reading this, I assumed that Detges and Waltereit had some attestations of non vado passum in Latin. That’s the way science works, and history too. We do experiments to collect data, and we base our stories of the past on documents and artifacts. In historical linguistics we have what people wrote, and we have reconstructions. Because the reconstructions are less reliable as evidence, we mark them with asterisks.

I was all ready to repeat this story as I told the history of French negation. In fact, one of my professors suggested that I look for evidence of pas being initially restricted to verbs of motion, then gradually used with a broader and broader range of verbs. I did look, but I discovered that it’s just a story. We don’t have any evidence that anyone ever wrote non vado passum, other than linguists talking about grammaticization.

What I did find was this excellent three-part opus on Romance negation by Alfred Schweighäuser, published in 1851-52, digitized to PDF by Google Books and extracted for your convenience here (section 1, section 2, section 3). In section 3 (Part 2), he takes you on a very thorough tour of all the expressions that have been used to “supplement” negation in Latin and its descendants over the years. After spending some time discussing ne … pas, he concludes:

Observons toutefois que cette modification apportée au sens du mot pas est antérieure aux plus anciens monuments de la langue. Si haut que nous remontions dans le cours des siècles, les textes ne nous montrent jamais cette négation explétive que privée de l’article, et jointe indifféremment à des verbes de toute signification.

Let us note in any case that this modification made to the sense of the word pas is earlier than the most remote works of the language. No matter how far back we look across the centuries, the texts only ever show us this expletive negation shorn of its article and combined indifferently with verbs from any semantic field.

One thing I find remarkable about this is that these aspects of language change were known and studied 161 years ago. And yet it was only a year later, in 1853, that P.L.J.B. Gaussin gave us our first citation of non vado passum:

Nous avons encore à parler d’une dernière modification que quelques mots subissent : elle a lieu lorsque, par suite d’un emploi très-fréquent, ils ne deviennent que de simples formes grammaticales. C’est un fait que nous aurons l’occasion de vérifier en polynésien ; nous en trouvons d’ailleurs de nombreux exemples dans nos langues d’Europe : on connaît l’origine des négations françaises pas et point ; on a d’abord dit non vado passum ou passu, je ne vais d’un pas ; non video punctum, je ne vois un point. Pas et point, par un usage devenu de plus en plus général, n’ont plus été par la suite que de simples signes grammaticaux.

We have yet to discuss one last modification that certain words undergo. It happens when, in the course of very frequent usage, they are transformed into simple grammatical forms. This is a fact that we will have the opportunity to confirm in Polynesian; we also find many examples in our European languages. We know the origin of the French negations pas and point: people first said non vado passum or passu, I am not going one step, non video punctum, I do not see one point. Pas and point, by virtue of more and more general usage, have become nothing more than simple grammatical signs.

Schweighäuser and Gaussin perfectly illustrate the difference between history and Just So stories. Schweighäuser combs through Latin and Old French texts in detail to find all the different ways that the words are used. His wealth of detail is perfectly appropriate for his task, but the story could be told to outsiders in a compelling way by simply omitting some of that detail. There are many examples of this kind of semantic broadening with other constructions; those could have been used instead. But Gaussin doesn’t do that. He just makes stuff up.

It is obviously silly to single out Detges and Waltereit for this Just So story, since it came from Gaussin, and has been handed down ever since. But other than a brief mention in 1907, it was dormant until Lüdtke (1980) revived it. It seems to have been most widely propagated by Paolo Ramat in 1987.

Looking back on this, I appreciate my professor’s invitation to re-examine this story rather than simply repeating it. We should do that with all of our standard stories, to find out which ones are supported by the data, and which are Just So.

Two changes in French negation

I realized today that I hadn’t yet blogged about my dissertation, The Spread of Change in French Negation. That’s too bad, because I like my dissertation topic. It’s fun, and it’s interesting.

You may see here, from time to time, posts about my dissertation research. I’ll try to make them accessible to anyone, not just the specialized audience that I wrote the dissertation for. If you have a reaction or a question I hope you’ll comment or send me an email. If there’s anything you don’t understand, please tell me, because I mean for this blog to be easy to understand.

When I studied French in high school, I learned the standard line: that to negate a sentence you put ne before the verb and pas after it: Je sais becomes Je ne sais pas. But then my teachers were smart enough to show me a movie that aimed for authentic language. Diva, the 1981 action film, features a moped chase in the Paris Métro, and a pair of grumpy hitmen. One of the gangsters is a man of few words, but he repeatedly takes the time to say that he doesn’t like whatever’s at hand. And in one scene with cars, he says, “J’aime pas les bagnoles.” In case our French wasn’t good enough, we had the subtitle: I don’t like cars.

I laughed, I repeated the line, mimicking Dominique Pinon’s terse delivery. Then I realized: what happened to the ne? The other lines where the hitman declared his dislike for elevators and other burdensome features of the environment were also missing the ne. And years later when I went to live in Paris and walk through the same métro stations, I heard lots of negation with the pas only, no ne. I learned to negate my own sentences with just a casual pas after the verb, because when in Paris, do as the Parisians do.

Another six years later, in a class on Frequency Effects in Language Change, Joan Bybee asked us to pick a change for our term project. I chose to look at French negation. I was sure the story of the missing ne would turn out to be a compelling one.

I was right. It was so compelling that there was already a big literature on it. Worse, because ne-dropping had only recently made its way into mainstream media, the data were hard for me to get in time for a term paper. But as I looked further back in time, I discovered an earlier change. This one had also been studied a lot, but not quite as much, and there was plenty of data. This was the original addition of pas to the ne. Or, as I was to find out, the large increase in the use of ne … pas.

Want to read the rest of the story? Stay tuned to this blog. If you can’t wait, go read my dissertation. Oh, and ask if you have questions!