A tool for annotating corpora

My dissertation focused on the evolution of negation in French, and I’ve continued to study this change. To track how negation was used, I needed to collect a corpus of texts and annotate them. I developed a MySQL database to store the annotations (and later the texts themselves) and a suite of PHP scripts for annotating the texts and saving the annotations to the database. I then developed another suite of PHP scripts to query the database and tabulate the data in a form that could be imported into Microsoft Excel or a more specialized statistics package like SPSS.
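To give a rough idea of the query-and-tabulate step, here is a minimal sketch in Python rather than the PHP the real scripts are written in. Everything in it is an assumption made for illustration: the `annotation` table, its columns, the placeholder rows and the function names are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own. It counts annotations per text and negation type and writes the counts as CSV, which Excel and SPSS can both import.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical annotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        " id INTEGER PRIMARY KEY,"
        " text_title TEXT,"
        " text_year INTEGER,"
        " negation_type TEXT)"
    )
    # Made-up placeholder rows, not real corpus data.
    rows = [
        ("Text A", 1500, "ne alone"),
        ("Text A", 1500, "ne ... pas"),
        ("Text B", 1700, "ne ... pas"),
        ("Text B", 1700, "ne ... pas"),
    ]
    conn.executemany(
        "INSERT INTO annotation (text_title, text_year, negation_type)"
        " VALUES (?, ?, ?)",
        rows,
    )
    return conn


def tabulate_csv(conn):
    """Count annotations per text and negation type; return CSV text."""
    cursor = conn.execute(
        "SELECT text_title, text_year, negation_type, COUNT(*) AS n"
        " FROM annotation"
        " GROUP BY text_title, text_year, negation_type"
        " ORDER BY text_year, negation_type"
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["text_title", "text_year", "negation_type", "n"])
    writer.writerows(cursor)  # each result row becomes one CSV line
    return out.getvalue()


if __name__ == "__main__":
    print(tabulate_csv(build_demo_db()))
```

Doing the counting in SQL with `GROUP BY` keeps the export step small: the script only has to write out rows the database has already aggregated.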

I am continuing to develop these scripts. Since finishing my dissertation, I have added the ability to load the entire text into the database and revamped the front end with AJAX to streamline the workflow. The new front end works well on a tablet, and even on a smartphone, when there’s a stable internet connection, but I’d like to add the ability to annotate offline, on a workstation or a mobile device. I also need to redo the scripts that query the database and generate reports. Here’s what the annotation screen currently looks like:

I’ve put many hours of work into this annotation system, and it works so well for me that it’s a shame I’m the only one who uses it. It would take some work to adapt it for other projects, but I’m interested in doing that. If you think this system might work for your project, please let me know (grvsmth@panix.com) and I’ll give you a closer look.
