The Parisian Stage Corpus

It is 1810, and you are strolling along the Grands Boulevards of Paris. It feels as if the whole city, perhaps even all of France, has had the same idea and come out to stroll, to see people and to be seen. What do you hear?

You arrive at a theater, show a ticket for a new play, and go in. The play begins. What do you hear from the stage? Which voices, what language?

The Parisian Stage Corpus project seeks to answer this last question, on the assumption that the answer will also shed light on the first one. It draws on the work of the scholar Beaumont Wicks and on resources such as Google Books and the Gallica project of the Bibliothèque nationale de France to build a corpus that is truly representative of the language of the Parisian theater.

Some corpora are built on a “principle of authority,” which tends to put the voices of aristocrats and the upper bourgeoisie in the foreground. The Parisian Stage Corpus corrects this bias by relying on a randomly drawn sample. By incorporating popular theater in this way, it allows the language of the working classes, as represented on stage, to take its place in the linguistic picture of the period.

The first phase of construction, covering the years 1800 to 1815, has already contributed to some interesting findings. For example, in the Parisian Stage Corpus (CSP), 75% of sentence negations use the construction ne … pas, but in the four plays from the same period included in the FRANTEXT corpus, ne … pas accounts for only 49% of sentence negations.
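As a rough illustration of how a share like that 75% figure is computed, here is a minimal Python sketch. It assumes plain-text play files and uses a crude regular expression that only catches ne … pas and ne … point within a few words of each other. It is not the annotation procedure used for the corpus, and because it ignores other sentence negations (such as ne alone), its percentages would only approximate the published ones.

```python
import re
from collections import Counter

# Crude heuristic (an assumption, not the corpus method): "ne" or "n'" followed,
# within at most three intervening words, by the post-verbal negator "pas" or "point".
NEGATION = re.compile(
    r"\bn(?:e\s+|['’])(?:\S+\s+){0,3}?(pas|point)\b",
    re.IGNORECASE,
)

def negator_counts(text: str) -> Counter:
    """Count candidate ne … pas and ne … point constructions in a text."""
    return Counter(match.group(1).lower() for match in NEGATION.finditer(text))

def share(counts: Counter, negator: str) -> float:
    """Proportion of all counted negations that use the given negator."""
    total = sum(counts.values())
    return counts[negator] / total if total else 0.0

sample = "Je n'ai pas de raisons. Je n'ai point de raisons. Il ne vient pas."
counts = negator_counts(sample)
print(counts["pas"], counts["point"], round(share(counts, "pas"), 2))  # 2 1 0.67
```

A regex like this will also pick up nouns such as "un pas" or "un point" when they happen to follow a ne, which is one reason hand annotation remains necessary.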

In 2016 I created a repository on GitHub and began putting the texts from the first phase there in HTML format. You can read them for fun (Jocrisse-Maître et Jocrisse-Valet amused me in particular), stage them (I’ll buy tickets) or use them in your own research. You might also like to contribute to the repository by correcting errors in the texts, adding new texts from the catalog, or converting the texts into new formats such as TEI or Markdown.
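For anyone who wants to try the Markdown conversion, here is a minimal sketch using Python’s standard html.parser module. It assumes, hypothetically, that scene headings are marked up as h1 to h3 elements and speeches as p elements. The actual markup in the repository may differ, and inline formatting such as italics is simply dropped.

```python
from html.parser import HTMLParser

class PlayToMarkdown(HTMLParser):
    """Collect block-level text and prefix headings with Markdown '#' marks."""

    # Hypothetical mapping from HTML block tags to Markdown prefixes.
    BLOCKS = {"h1": "# ", "h2": "## ", "h3": "### ", "p": ""}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive as plain text
        self.lines = []
        self._prefix = None  # Markdown prefix of the block we are inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:
            self._prefix = self.BLOCKS[tag]
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.BLOCKS and self._prefix is not None:
            # Collapse runs of whitespace, including line breaks in the source.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.lines.append(self._prefix + text)
            self._prefix = None

    def handle_data(self, data):
        if self._prefix is not None:  # ignore whitespace between blocks
            self._buffer.append(data)

def html_to_markdown(html: str) -> str:
    parser = PlayToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)

print(html_to_markdown("<h2>Scène I</h2>\n<p>JOCRISSE.  Bonjour,\n <em>monsieur</em>.</p>"))
```

A TEI conversion would follow the same pattern, emitting elements such as sp and speaker instead of Markdown prefixes.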

In January 2018 I created the spectacles_xix bot on Twitter. Every day it posts the descriptions of the plays that premiered on that day exactly two hundred years earlier.

Feel free to use this corpus in your research, but please don’t forget to cite me, or even contact me to discuss possible collaborations!

Why do people make ASL translations of written documents?

My friend Josh was puzzled to see that the City of New York offers videos of some of its documents, translated from the original English into American Sign Language, on YouTube. I didn’t know of a good, short explainer online, and nobody responded when I asked for one on Twitter, so I figured I’d write one up.

The short answer is that ASL and English are completely different languages, and knowing one is not much help in learning the other. While some deaf people are able to lipread, speak and write fluent English, this is generally because they have some combination of residual hearing, talent, privilege and interest in language. Many deaf people need to sign for daily conversation, even if they grew up with hearing parents.

It is incredibly difficult to learn to read and write a language that you can’t speak, hear, sign or see. As part of my training in sign linguistics I spent time with two deaf fifth-grade students in an elementary school in Albuquerque. These were bright, curious children, and they spent hours every day practicing reading, writing, speaking and even listening – they both had cochlear implants.

After visiting these kids several times, talking with them in ASL and observing their reading and writing, I realized that at the age of eleven they did not understand how writing is used to communicate. I asked them to simply pass notes to each other, the way that hearing kids did well before fifth grade. They did each write things on paper that made the other laugh, but when I tried giving them specific messages and asking them to pass those messages on in writing, they had no idea what I was asking for.

These kids are in their thirties now, and they may well be able to read and write English fluently. At least one had a college-educated parent who was fluent in both English and ASL, which helps a lot. Other factors that help are the family’s income level and a general skill with languages. Many deaf people have none of these advantages, and consequently never develop much skill with English.

In principle, the City could even print some of these documents in ASL. Several writing systems have been created for sign languages, some more complete than others. For a variety of reasons, they haven’t caught on in Deaf communities, so using one of them would not help the City get the word out about school closures.

The reasons that the City government provides videos in ASL are thus that ASL is a completely different language from English, many deaf people do not have the exceptional language skills necessary to read a language they don’t really speak, and the vast majority of deaf people don’t read ASL.

On this day in Parisian theater

Since I first encountered The Parisian Stage, I’ve been impressed by the completeness of Beaumont Wicks’s life’s work: from 1950 through 1979 he compiled a list of every play performed in the theaters of Paris between 1800 and 1899. I’ve used it as the basis for my Digital Parisian Stage corpus, currently a one percent sample of the first volume (Wicks 1950), available in full text on GitHub.

Last week I had an idea for another project. Science requires both qualitative and quantitative research, and I’ve admired Neil Freeman’s @everylotnyc Twitter bot as a project that conveys the diversity of the underlying data and invites deep, qualitative exploration.

In 2016, with Timm Dapper, Elber Carneiro and Laura Silver, I forked Freeman’s everylotbot code to create @everytreenyc, a random walk through the New York City Parks Department’s 2015 street tree census. Every three hours during New York’s waking hours, the bot tweets information about a tree from the database, using a template written by Laura that may also include topical, whimsical sayings.
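In practice, the bot’s “random walk” amounts to repeated uniform random sampling from the census. Here is a minimal Python sketch of that idea. It is not the bot’s actual code: it assumes each tree is a dictionary whose field names (health, spc_common, address, boroname) are modeled on the census columns, and the template is a hypothetical stand-in for Laura’s.

```python
import random

# Hypothetical template; the bot's real template (written by Laura) is richer.
TEMPLATE = "A {health} {species} at {address}, {borough}."

def pick_tree(trees: list, rng: random.Random) -> dict:
    """Draw one tree uniformly at random, independently of earlier draws."""
    return rng.choice(trees)

def compose_tweet(tree: dict) -> str:
    """Fill the template from one census record (field names are assumptions)."""
    return TEMPLATE.format(
        health=tree["health"].lower(),
        species=tree["spc_common"],
        address=tree["address"].title(),
        borough=tree["boroname"],
    )

# Hypothetical sample records, for illustration only.
trees = [
    {"health": "Good", "spc_common": "London planetree", "address": "100 MAIN ST", "boroname": "Queens"},
    {"health": "Fair", "spc_common": "pin oak", "address": "25 ELM AVE", "boroname": "Bronx"},
]
print(compose_tweet(pick_tree(trees, random.Random())))
```

Because every draw is independent and uniform, each tree in the database has the same chance of appearing in any given tweet, which is what lets a follower treat the feed as a representative sample.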

Recently I’ve encountered a lot of anniversaries. Many are connected to the centenary of World War I, but some are more random: I just listened to an episode of la Fabrique de l’histoire about François Mitterrand’s letters to his mistress that was promoted with the fact that he was born in 1916, one hundred years before that episode aired, even though he did not start writing those letters until 1962.

There are lots of “On this day” blogs and Twitter feeds, such as those of the History Channel and the New York Times, and even specialized feeds like @ThisDayInMETAL. There are #OnThisDay and #otd hashtags, and in French #CeJourLà. The “On this day” feeds have two things in common: they tend to be hand-curated, and they jump around from year to year. For April 13, 2014, the @CeJourLa feed tweeted events from 1849, 1997, 1695 and 1941, in that order.

Two weeks ago I was at the Annual Convention of the Modern Language Association, describing my Digital Parisian Stage corpus, and I realized that in the Parisian Stage there were plays being produced exactly two hundred years ago. I thought of the #OnThisDay feeds and @everytreenyc, and realized that I could create a Twitter bot to pull information about plays from the database and tweet them out. A week later, @spectacles_xix sent out its first automated tweet, about the play la Réconciliation par ruse.

@spectacles_xix runs on Pythonanywhere in Python 3.6, and accesses a MySQL database. It uses Mike Verdone’s Twitter API client. The source is open on GitHub.

Unlike other feeds, including one from the French Ministry of Culture that just tweeted about the anniversary of the première of Rostand’s Cyrano de Bergerac, @spectacles_xix will not be curated, and it will not jump around from year to year. It will tweet every play that premièred in 1818, in order, until the end of the year, and then move on to 1819. On days when no plays premièred, like January 16, @spectacles_xix will not tweet.
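The daily selection can be sketched as a date calculation plus a database query. This is a minimal, hypothetical Python version: it uses an in-memory SQLite database instead of the bot’s MySQL database, and the table and column names (plays, title, theater, premiere) are illustrative. The sketch also shows why the bot stays silent on some days. If no play premièred on the anniversary date, the query returns nothing and nothing is tweeted. A leap day needs special handling, since 29 February two hundred years earlier may not exist (1800 was not a leap year).

```python
import datetime
import sqlite3

YEARS_BACK = 200

def anniversary(today: datetime.date, years_back: int = YEARS_BACK):
    """Same month and day, `years_back` years earlier; None if that date never existed."""
    try:
        return today.replace(year=today.year - years_back)
    except ValueError:  # e.g. 29 February 2000 -> 1800 was not a leap year
        return None

def tweets_for(conn: sqlite3.Connection, today: datetime.date) -> list:
    """One message per play that premièred on the anniversary date, in catalog order."""
    day = anniversary(today)
    if day is None:
        return []
    rows = conn.execute(
        "SELECT title, theater FROM plays WHERE premiere = ? ORDER BY id",
        (day.isoformat(),),
    )
    return [f"{title} ({theater}), {day.isoformat()}" for title, theater in rows]

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (id INTEGER PRIMARY KEY, title TEXT, theater TEXT, premiere TEXT)")
conn.executemany(
    "INSERT INTO plays (title, theater, premiere) VALUES (?, ?, ?)",
    [("Play A", "Theater X", "1818-01-15"), ("Play B", "Theater Y", "1818-01-15")],
)
for message in tweets_for(conn, datetime.date(2018, 1, 15)):
    print(message)
```

Storing dates as ISO strings keeps the comparison simple in both SQLite and MySQL. A native DATE column would work the same way.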
I have a couple of ideas about more features to add, so stay tuned!

And we mean really every tree!

When Timm, Laura, Elber and I first ran the @everytreenyc Twitter bot almost a year ago, we knew that it wasn’t actually sampling from a list that included every street tree in New York City. The Parks Department’s 2015 Tree Census was a huge undertaking, and was not complete by the time they organized the Trees Count! Data Jam last June. There were large chunks of the city missing, particularly in southern and eastern Queens.

The bot software itself was not a bad job for a day’s work, but it was still a hasty patch job on top of Neil Freeman’s original Everylotbot code. I hadn’t updated the readme file to reflect the changes we had made. It was running on a server in the NYU Computer Science Department, which is currently my most precarious affiliation.

On April 28 I received an email from the Parks Department saying that the census was complete, and the final version had been uploaded to the NYC Open Data Portal. It seemed like a good opportunity to upgrade.

Over the past two weeks I’ve downloaded the final tree database, installed everything on Pythonanywhere, streamlined the code, added a function to deal with Pythonanywhere’s limited scheduler, and updated the readme file. People who follow the bot might have noticed a few extra tweets over the past couple of days as I did final testing, but I’ve removed the cron job at NYU, and @everytreenyc is now up and running in its new home, with the full database, a week ahead of its first birthday. Enjoy the dérive!
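One way to cope with a limited scheduler is sketched below. This is a hypothetical illustration, not the bot’s actual code. It assumes the scheduler simply runs the script once an hour, and the script itself decides whether the current New York hour is a tweeting hour. The first and last hours are assumptions standing in for New York’s waking hours. The zoneinfo module requires Python 3.9 or later, plus time zone data on the host.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

FIRST_HOUR = 7   # assumed first tweeting hour, New York time
LAST_HOUR = 22   # assumed last tweeting hour, New York time
EVERY = 3        # hours between tweets

def is_tweet_hour(ny_hour: int) -> bool:
    """True if an hourly run at this New York hour should post a tweet."""
    return FIRST_HOUR <= ny_hour <= LAST_HOUR and (ny_hour - FIRST_HOUR) % EVERY == 0

def current_ny_hour() -> int:
    """The current hour in New York; zoneinfo handles daylight saving time."""
    return datetime.now(ZoneInfo("America/New_York")).hour

print([hour for hour in range(24) if is_tweet_hour(hour)])  # [7, 10, 13, 16, 19, 22]
```

In the hourly task, the script would call is_tweet_hour(current_ny_hour()) and exit early when it returns False. Keeping the decision in plain code avoids depending on what the host’s scheduler can express.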

The Photo Roster, a web app for Columbia University faculty

Since July 2016 I have been working as Associate Application Systems in the Teaching and Learning Applications group at Columbia University. I have developed several apps, including this Photo Roster, an LTI plugin to the Canvas Learning Management System.

The back end of the Photo Roster is written in Python and Flask. The front end uses JavaScript with jQuery to filter the student listings and photos, and to create a flash card app to help instructors learn their students’ names.
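A small drill loop shows how a flash card feature like this can work. This is a hypothetical Python sketch, not the Photo Roster’s jQuery code. Students are shown in shuffled order, and any name the instructor misses goes back to the end of the deck until it is recalled. The answer callback and the max_attempts safeguard are assumptions of the sketch.

```python
import random
from collections import deque
from typing import Callable

def drill(names: list, answer: Callable[[str], bool], rng: random.Random,
          max_attempts: int = 1000) -> int:
    """Quiz every name until each has been recalled once; return the number of cards shown."""
    deck = deque(rng.sample(names, len(names)))  # shuffled copy, original list untouched
    shown = 0
    while deck and shown < max_attempts:
        name = deck.popleft()
        shown += 1
        if not answer(name):
            deck.append(name)  # missed: see this student again later
    return shown

# Hypothetical instructor who forgets "Ben" the first time only.
missed_once = set()

def answer(name: str) -> bool:
    if name == "Ben" and name not in missed_once:
        missed_once.add(name)
        return False
    return True

print(drill(["Ada", "Ben", "Cy"], answer, random.Random(0)))  # 4
```

Sending missed cards to the back of the deck, rather than repeating them immediately, puts a little spacing between attempts.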

This is the third generation of the Photo Roster tool at Columbia. The first generation, for the Prometheus LMS, was famously scraped by Mark Zuckerberg when he extended Facebook to Columbia. To prevent future release of private student information, this version uses SAML and OAuth2 to authenticate users and securely retrieve student information from the Canvas API, and Oracle SQL to store and retrieve the photo authorizations.
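To make the Canvas side concrete, here is a minimal sketch of how a Python back end might request a course’s students from the Canvas REST API, using only the standard library. It assumes an OAuth2 access token has already been obtained (the SAML login and OAuth2 flow are not shown). The host and token below are placeholders. Canvas paginates list responses and advertises the next page in a Link header, so the sketch includes a small parser for that header. No request is actually sent.

```python
import urllib.parse
import urllib.request
from typing import Optional

def students_request(base_url: str, course_id: int, token: str,
                     per_page: int = 50) -> urllib.request.Request:
    """Build (but do not send) a request for the students enrolled in a course."""
    query = urllib.parse.urlencode({"enrollment_type[]": "student", "per_page": per_page})
    url = f"{base_url}/api/v1/courses/{course_id}/users?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

def next_page_url(link_header: str) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    for part in link_header.split(","):
        url_part, *params = part.split(";")
        if any(param.strip() == 'rel="next"' for param in params):
            return url_part.strip()[1:-1]  # strip the surrounding < >
    return None

req = students_request("https://canvas.example.edu", 123, "PLACEHOLDER_TOKEN")
print(req.full_url)
```

Splitting the Link header on commas assumes the URLs themselves contain no commas, which is good enough for a sketch. A production client would use a proper header parser.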

It would be a release of private student information if I showed you the Roster live, so I created a demo class with famous Columbia alumni, and used a screen recorder to make this demo video. Enjoy!

Online learning: Definitely possible

There’s been a lot of talk over the past several years about online learning. Some people sing its praises without reservation. Others claim that it doesn’t work at all. I have successfully learned over the internet and I have successfully taught over the internet. It can work very well, but it requires a commitment on the part of the teacher and the learner that is not always present. In this series of posts I will discuss what has worked well and what hasn’t in my experience, specifically in teaching linguistics to undergraduate speech pathology majors.

Online learning is usually contrasted with an ideal classroom model where the students engage in two-way oral conversation, exercises and assessment with the instructor and each other, face to face in real time. In practice there are already deviations from this model: one-way lectures, independent and group exercises, asynchronous homeworks, take-home exams. The questions are really whether the synchronous or face-to-face aspects can be completely eliminated, and whether the internet can provide a suitable medium for instruction.

The first question was answered hundreds of years ago, when the first letter was exchanged between scholars. Since then people have learned a great deal from each other, via books and through the mail. My great-uncle Doc learned embalming through a correspondence course, and made a fortune as one of the few providers of Buddhist funerals in San Jose. So we know that people can learn without face-to-face, synchronous or two-way interaction with teachers.

What about the internet? People are learning a lot from each other over the internet. I’ve learned how to assemble a futon frame and play the cups over the internet. A lot of the core ideas about social science that inform my work today I learned in a single independent study course I took over email with Melissa Axelrod in 1999.

My most dramatic exposure to online learning was from 2003 through 2006. I read the book My Husband Betty, and discovered that the author, Helen Boyd, had an online message board for readers to discuss her book (set up by Betty herself). The message board would send me emails whenever someone posted, and I got drawn into a series of discussions with Helen and Betty, as well as Diane S. Frank, Caprice Bellefleur, Donna Levinsohn, Sarah Steiner and a number of other thoughtful, creative, knowledgeable people.

A lot of us knew a thing or two about gender and sexuality already, but Helen, having read widely and done lots of interviews on those topics, was our teacher, and would often start a discussion by posting a question or a link to an article. Sometimes the discussion would get heated, and eventually I was kicked off and banned. But during those three years I learned a ton, and I feel like I got a Master’s level education in gender politics. Of course, we didn’t pay Helen for this besides buying her books, so I’m glad she eventually got a full-time job teaching this stuff.

So yes, we can definitely learn things over the internet. But are official online courses an adequate substitute for – or even an improvement over – in-person college classes? I have serious doubts, and I’ll cover them in future posts.

Shelter from the tweetstorm

It’s happened to me too: I’m angry, or upset, or excited about something. I go on Twitter. I’ve got stuff to say. It’s more than will fit in the 140-character limit, but I don’t have the time or energy to write a blog post. So I just write a tweet. And then another, and another.

I’ve seen other people doing this, and I’m fine with it. But for a while now I’ve seen people doing something more planned, numbering their tweets. Many people try to predict how many tweets are going to be in a particular rant, and often fail spectacularly along the lines of Monty Python’s Spanish Inquisition sketch. Some people are clearly composing the whole thing ahead of time, as a unit. Sometimes they’re not even excited, just telling a story. It’s developing into a genre: the tweetstorm.

I get why people are reluctant to blog in these cases. If you’re already on Twitter and you want to write something longer, you have to switch to a different window, maybe log in, come up with a picture to grab people’s attention. Assuming you already have an account on a blogging platform. It doesn’t help that Twitter sees some of these as competitors and drags its feet on integrating them. And yes, mobile blogging apps still leave a lot to be desired, especially if you’ve got an intermittent connection like on the train.

People also tend to be drawn in more easily one tweet at a time, like Beorn meeting the dwarves in The Hobbit. Maybe they don’t feel in the mood for reading something longer, or opening a web browser.

There may also be an aspect of live performance for the tweetstormer and the people who happen to be on Twitter while the storm is passing over, and the thread functions as an inferior archive of the performance, like concert videos. I can understand that too, but it’s a pain for the rest of us.

The problem is that Twitter sucks as a platform for reading longform pieces, or even medium-form ones. Yes, I know they’ve introduced “threading” features to make it easier to follow conversations. That doesn’t mean it’s easy to follow a single person’s multi-tweet rant. Combine that with other people replying in the middle of the “storm” and the original tweeter taking time in the middle to respond to them, and people using the quote feature and replying to quotes and quoting replies, and it gets really chaotic. If I bother to take the time, usually at the end it turns out it’s not worth it.

In terms of Bad Things on Twitter this is nowhere near the level of harassment and death threats, or even people livetweeting Netflix videos. But please, just go write a blog post and post a link. I promise I’ll read it.

What’s worse is that people are encouraging each other to do it. It’s one thing to get outraged on Twitter, or even to see someone else get outraged on Twitter and tell your followers to go check it out. It’s another when you know the whole thing is planned and you tell everyone to Read This. Now.

I get that you think it’s interesting, but that’s not enough for me. Tell me why, and let me decide if it’s worth my time to go reading through all those tweets in reverse chronological order. Better yet, storify that shit and tweet me the URL.

You know what would be even better? Tell that other tweeter, “What an awesome thread! It would make an even better blog post. Do you have a blog?”


At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc twitter feed.

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
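Going “down the list in order” is easy to sketch, and it contrasts with the random sampling of @everytreenyc. In this hypothetical Python version (not Freeman’s code), the bot remembers the last tax ID it posted, and each run asks the database for the next larger one. The table layout is an assumption, and an in-memory SQLite database stands in for the real data. If the tax ID is the ten-digit borough-block-lot (BBL) number, whose first digit is the borough code (1 for Manhattan, 2 for the Bronx, and so on), then numeric order also sends the bot through the boroughs one at a time, Manhattan first.

```python
import sqlite3
from typing import Optional, Tuple

def next_lot(conn: sqlite3.Connection, last_bbl: int) -> Optional[Tuple[int, str, str]]:
    """The lot with the smallest tax ID greater than the last one posted, or None when done."""
    return conn.execute(
        "SELECT bbl, address, borough FROM lots WHERE bbl > ? ORDER BY bbl LIMIT 1",
        (last_bbl,),
    ).fetchone()

# Hypothetical sample lots, inserted out of order on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lots (bbl INTEGER PRIMARY KEY, address TEXT, borough TEXT)")
conn.executemany(
    "INSERT INTO lots VALUES (?, ?, ?)",
    [(2000010001, "1 Example Ave", "Bronx"),
     (1000010001, "1 Example St", "Manhattan"),
     (1000010002, "2 Example St", "Manhattan")],
)

last = 0  # a real bot would persist this between runs
while (lot := next_lot(conn, last)) is not None:
    print(*lot)
    last = lot[0]
```

Only one number of state needs to survive between runs, which makes this approach robust to restarts.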

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.

Social scientists, and recently even humanists, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced back at least as far as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places — these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a TinyLetter newsletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase, “the map is not the territory”:

Streetview and the property database are both widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did. I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. It is the property we know, and yet we realize how much we don’t know it.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on GitHub, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.

Printing differences and material issues in Google Books

I am looking forward to presenting my Digital Parisian Stage corpus and the exciting results I’ve gotten from it so far at the conference of the American Association for Corpus Linguistics at Iowa State in September. In the meantime I’m continuing to process texts, working towards a one percent sample from the Napoleonic period (Volume 1 of the Wicks catalog).

One of the plays in my sample is les Mœurs du jour, ou l’école des femmes, a comedy by Collin-Harleville (also known as Jean-François Collin d’Harleville). I ran the initial OCR on a PDF scanned for the Google Books project. For reasons that will become clear, I will refer to it by its Google Books ID, VyBaAAAAcAAJ. When I went to clean up the OCR text, I discovered that it was missing pages 2-6. I emailed the Google Books team about this, and got the following response:


I’m guessing “a material issue” means that those pages were missing from the original paper copy, but I didn’t even bother emailing until the other day, since I found another copy in the Google Books database, with the ID kVwxUp_LPIoC.

Comparing the OCR text of VyBaAAAAcAAJ with the PDF of kVwxUp_LPIoC, I discovered some differences in spelling. For example, throughout the text, words that end in the old-fashioned spelling -ois or -oit in VyBaAAAAcAAJ are spelled with the more modern -ais or -ait in kVwxUp_LPIoC. There are also differences in the way “Madame” is abbreviated (“Mad.” vs. “M.me”), in which accented letters keep their accents when set in small caps, and in pagination. Here is the entirety of Act III, Scene X in each copy:


Act III, Scene X in copy VyBaAAAAcAAJ

Act III, Scene X in copy kVwxUp_LPIoC

My first impulse was to look at the front matter and see if the two copies were identified as different editions or different printings. Unfortunately, they were almost identical, with the most notable differences being that VyBaAAAAcAAJ has an œ ligature in the title, while kVwxUp_LPIoC is signed by the playwright and marked as being a personal gift from him to an unspecified recipient. Both copies give the exact same dates: the play was first performed on the 7th of Thermidor in year VIII and published in the same year (1800).
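Because plays from this period are dated in the French Republican calendar, converting dates is a routine step. Here is a minimal Python sketch. Each of the twelve regular months has 30 days, so a date falls (month index × 30 + day − 1) days after the first day of the year. The only year filled in below is year VIII, whose first day, 1 Vendémiaire, fell on 23 September 1799. Other years begin on different Gregorian dates and would need to be added. The sketch also ignores the complementary days at the end of the year.

```python
import datetime

MONTHS = ["Vendémiaire", "Brumaire", "Frimaire", "Nivôse", "Pluviôse", "Ventôse",
          "Germinal", "Floréal", "Prairial", "Messidor", "Thermidor", "Fructidor"]

# Gregorian date of 1 Vendémiaire for each Republican year; only year VIII is filled in.
YEAR_START = {8: datetime.date(1799, 9, 23)}

def republican_to_gregorian(day: int, month: str, year: int) -> datetime.date:
    """Convert a date in one of the twelve 30-day months to the Gregorian calendar."""
    if not 1 <= day <= 30:
        raise ValueError("the twelve regular months have 30 days")
    offset = MONTHS.index(month) * 30 + (day - 1)
    return YEAR_START[year] + datetime.timedelta(days=offset)

print(republican_to_gregorian(7, "Thermidor", 8))  # 1800-07-26
```

As a check, 18 Brumaire VIII, the date of Napoleon’s coup, comes out as 9 November 1799.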

The Google Books metadata indicate that kVwxUp_LPIoC was digitized from the Lyon Public Library, while VyBaAAAAcAAJ came from the Public Library of the Netherlands. The other copies I have found in the Google Books database, OyL1oo2CqNIC from the National Library of Naples and dPRIAAAAcAAJ from Ghent University, appear to be the same printing as kVwxUp_LPIoC, as does the copy from the National Library of France.

Since the -ais and M.me spellings are closer to the forms used in France today, we might expect that kVwxUp_LPIoC and its cousins are from a newer printing. But in Act II, Scene XI I came across a difference that concerns negation, the variable that I have been studying for many years. The decadent Parisians Monsieur Basset and Madame de Verdie question whether marriage should be eternal. Our hero Formont replies that he has no reason not to remain with his wife forever. In VyBaAAAAcAAJ he says, “je n’ai pas de raisons,” while in kVwxUp_LPIoC he says “je n’ai point de raisons.”

Act III, Scene XI (page 75) in VyBaAAAAcAAJ

Act III, Scene XI (page 78) in kVwxUp_LPIoC

In my dissertation study I found that the relative use of ne … point had already peaked by the nineteenth century, and was being overtaken by ne … pas. If this play fits the pattern, the use of the more conservative pattern in kVwxUp_LPIoC goes against the more innovative -ais and M.me spellings.

I am not an expert in French Revolutionary printing (if anyone knows a good reference or contact, please let me know!). My best guess is that kVwxUp_LPIoC and the other -ais/M.me/ne … point copies are from a limited early run, some copies of which were given to the playwright to give away, while VyBaAAAAcAAJ, with its -ois, Mad. and ne … pas, is from a larger, slightly later printing.

In any case, it is clear that I should pick one copy and make my text consistent with it. Since VyBaAAAAcAAJ is incomplete, I will use dPRIAAAAcAAJ. I will try to double-check all the spellings and wordings, but at the very least I will check every example of negation against dPRIAAAAcAAJ as I annotate it.

Introducing Selected Birthdays

If, like me, you have an Android phone, you probably use Google Calendar. I like the way it integrates with my contacts so that I can schedule events with people. I like the idea of it integrating with my Google+ contacts to automatically create a calendar of birthdays that I don’t want to miss. There’s a glitch in that, but I’ve created a new app to get around it, called Selected Birthdays.

The glitch is that the built-in Birthdays calendar has three options: show your Google Contacts, show your contacts and the people in your Google+ circles, or show nothing. I have a number of contacts who are attractive and successful people, but I’m sorry to say I have no interest in knowing when their birthdays are. Natasha Lomas has even stronger feelings.

Google doesn’t let you change the built-in Birthdays calendar, but it does let you create a new calendar and fill it with the birthdays that interest you. My new web app, Selected Birthdays, automates that process. It goes through your contacts, finds the ones who have shared their birthdays with you, and gives you a checklist. You decide whose birthdays to include, and Selected Birthdays will create a new calendar with those birthdays. It’ll also give you the option of hiding Google’s built-in birthday calendar.
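The heart of that process, turning the checked contacts into calendar events, can be sketched in Python, although the app itself is written in JavaScript. This hypothetical version only builds event bodies in the shape the Google Calendar API v3 uses for all-day recurring events: start and end dates (the end date is exclusive) and a yearly RRULE. It does not call the API. The contact records and the placeholder year for birthdays shared without a year are assumptions.

```python
import datetime

PLACEHOLDER_YEAR = 2000  # assumption: used when a birthday is shared without a year (2000 is a leap year)

def birthday_event(name: str, month: int, day: int, year=None) -> dict:
    """An all-day event that repeats every year on the contact's birthday."""
    start = datetime.date(year if year is not None else PLACEHOLDER_YEAR, month, day)
    end = start + datetime.timedelta(days=1)  # all-day end dates are exclusive
    return {
        "summary": f"{name}'s birthday",
        "start": {"date": start.isoformat()},
        "end": {"date": end.isoformat()},
        "recurrence": ["RRULE:FREQ=YEARLY"],
        "transparency": "transparent",  # don't mark the user as busy
    }

def selected_birthday_events(contacts: list, selected: set) -> list:
    """Events for the checked contacts who have shared a birthday."""
    return [
        birthday_event(contact["name"], *contact["birthday"])
        for contact in contacts
        if contact.get("birthday") and contact["name"] in selected
    ]

# Hypothetical contacts: (month, day) or (month, day, year).
contacts = [
    {"name": "Ada", "birthday": (12, 10, 1815)},
    {"name": "Ben", "birthday": (3, 1)},
    {"name": "Cy"},  # no birthday shared
]
events = selected_birthday_events(contacts, {"Ada", "Cy"})
print([event["summary"] for event in events])  # ["Ada's birthday"]
```

Each body would then be inserted into the new calendar with the API’s events insert method.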

I wrote the Selected Birthdays app in JavaScript with the Google+ and Google Calendar APIs. Ian Jones was a big help in recommending the moment.js library, which I used to manipulate dates. Bootflat helped me add a bit of visual style.

For the app to work you’ll have to authorize it to read your contacts and write your calendars. For your privacy, the app communicates directly between your browser and Google’s server; once you download it there is no further contact with my server. There is no way for me to see or edit your contacts or calendars. You can verify that in the source code.

Please let me know if you have any comments, questions or suggestions. I have also made the code available on GitHub for free under the Apache License, if you want to build on it. A number of people have said they wish they had an app like this for Facebook. If enough of you repeat that, I’ll look into it!