On this day in Parisian theater

Since I first encountered The Parisian Stage, I’ve been impressed by the completeness of Beaumont Wicks’s life’s work: from 1950 through 1979 he compiled a list of every play performed in the theaters of Paris between 1800 and 1899. I’ve used it as the basis for my Digital Parisian Stage corpus, currently a one percent sample of the first volume (Wicks 1950), available in full text on GitHub.

Last week I had an idea for another project. Science requires both qualitative and quantitative research, and I’ve admired Neil Freeman’s @everylotnyc Twitter bot as a project that conveys the diversity of the underlying data and invites deep, qualitative exploration.

In 2016, with Timm Dapper, Elber Carneiro and Laura Silver I forked Freeman’s everylotbot code to create @everytreenyc, a random walk through the New York City Parks Department’s 2015 street tree census. Every three hours during normal New York active time, the bot tweets information about a tree from the database, in a template written by Laura that may also include topical, whimsical sayings.

Recently I’ve encountered a lot of anniversaries. Many of them are connected to the centenary of the First World War, but some are more random: I just listened to an episode of la Fabrique de l’histoire about François Mitterrand’s letters to his mistress that was promoted with the fact that he was born in 1916, one hundred years before the episode aired, even though he did not start writing those letters until 1962.

There are lots of “On this day” blogs and Twitter feeds, such as the History Channel and the New York Times, and even specialized feeds like @ThisDayInMETAL. There are #OnThisDay and #otd hashtags, and in French #CeJourLà. The “On this day” feeds have two things in common: they tend to be hand-curated, and they jump around from year to year. For April 13, 2014, the @CeJourLa feed tweeted events from 1849, 1997, 1695 and 1941, in that order.

Two weeks ago I was at the Annual Convention of the Modern Language Association, describing my Digital Parisian Stage corpus, and I realized that in the Parisian Stage there were plays being produced exactly two hundred years ago. I thought of the #OnThisDay feeds and @everytreenyc, and realized that I could create a Twitter bot to pull information about plays from the database and tweet them out. A week later, @spectacles_xix sent out its first automated tweet, about the play la Réconciliation par ruse.

@spectacles_xix runs on Pythonanywhere in Python 3.6, and accesses a MySQL database. It uses Mike Verdone’s Twitter API client. The source is open on GitHub.

Unlike other feeds, including this one from the French Ministry of Culture that just tweeted about the anniversary of the première of Rostand’s Cyrano de Bergerac, this one will not be curated, and it will not jump around from year to year. It will tweet every play that premièred in 1818, in order, until the end of the year, and then go on to 1819. If there is a day when no plays premièred, like January 16, @spectacles_xix will not tweet.
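The core of the bot's daily lookup can be sketched in a few lines of Python. This is a minimal sketch, not the bot's actual code: the table and column names (`spectacle`, `titre`, `date_premiere`) are hypothetical stand-ins for the real schema in the GitHub repository.

```python
from datetime import date

def plays_for_today(cursor, offset_years=200):
    """Return titles of plays that premièred exactly offset_years ago today.
    The table and column names here are hypothetical stand-ins for the
    real schema."""
    today = date.today()
    # Note: replace() raises ValueError for Feb 29 when the target year
    # is not a leap year; a production bot would need to handle that case.
    target = today.replace(year=today.year - offset_years)
    cursor.execute(
        "SELECT titre FROM spectacle WHERE date_premiere = %s",
        (target,),
    )
    return [row[0] for row in cursor.fetchall()]
```

If the query returns no rows – a day with no premières, like January 16 – the bot simply has nothing to tweet and stays silent.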
I have a couple of ideas about more features to add, so stay tuned!

Remembering Alan Hudson

On Saturday I found out that Alan Hudson died. Alan was my doctoral advisor at the University of New Mexico until his retirement in 2005, and a source of support after that.

I first met Alan when I visited the UNM Linguistics Department in 1997. Alan welcomed me into his office with a broad smile, and asked, “So Angus, have you made up your mind about whether you want to come here?”

“Well…” I said. I had been accepted into the PhD program, but had just come from a very discouraging encounter with another professor, and was ready to give up and go home. Before I could continue, Alan said, “Is there anything I can say to convince you?” I replied, “Well, I guess you just did.”

Alan was not a big name in linguistics; he never published a book. I regularly had to tell people that my advisor was not Dick Hudson. But Alan had a profound insight about the sociology of language that changed my career trajectory and my thinking about language and social justice.

In a seminar on Societal Bilingualism the next year, Alan led us through the case studies laid out by Joshua Fishman, his own advisor, in his book Reversing Language Shift. Fishman’s book is of interest to anyone concerned with language “death” (a problematic metaphor unless the language users themselves are being killed). As a Dubliner who had become fluent in Irish through compulsory government schooling, Alan cared deeply about his national language, but he did not have high hopes for it recovering its status as the primary language of Ireland.

Fishman argues that we can prevent large numbers of people from abandoning a language by establishing “diglossia” – arrangements where language H is used for some functions and language L is used for others. Charles Ferguson had shown in 1959 that diglossic arrangements tend to be stable over time. Fishman believed that if language users can establish similar functional separations, they can stop language shift.

Drawing in part on his own research in Ireland and Switzerland, Alan observed that the cases Fishman categorized as diglossia did not fit with Ferguson’s examples. The key factor in Ferguson’s cases was that there were no children in the speech community who were native speakers of H: no child speakers of High German in Switzerland, no child speakers of Metropolitan French in Haiti, etc. In Ireland, by contrast, there are millions of English-speaking children, and in the Netherlands Frisian-speaking children go to school with Dutch-speaking peers.

The result of this contact is that most of these children eventually shift to the higher-prestige, better-paying language, and will not pass their native languages on to their children. There are only two ways to stop it: reverse the power dynamic (as happened in Finland when Russia conquered it from Sweden, I discovered in a term paper that semester) or isolate the children (as Kamal Sridhar observed in her Thanjavur Marathi community).

This was an important insight, with major implications for linguistics. None of us in the course were interested in segregating language groups from each other, and as linguists we were not positioned to shift the socioeconomic power differentials between groups. If the prescription for reversing language shift can be captured in a single sentence, that leaves no ongoing role for linguists.

Since then I have not been terribly surprised that Alan’s insight has not been enthusiastically embraced by other linguists. As Upton Sinclair said, “It is difficult to get a man to understand something, when his salary depends upon his not understanding it!” Alan published two articles describing his definition of diglossia, but framed it in theoretical terms, downplaying the implications for efforts at language maintenance and revitalization.

Alan Hudson supervised my studies and my comprehensive exams, but retired before I was ready to begin my dissertation. He continued to provide valuable advice, and attended my dissertation defense. He will be remembered as an insightful linguist and a supportive teacher.

Data science and data technology

The big buzz over the past few years has been Data Science. Corporations are opening Data Science departments and staffing them with PhDs, and universities have started Data Science programs to sell credentials for these jobs. As a linguist I’m particularly interested in this new field, because it includes research practices that I’ve been using for years, like corpus linguistics and natural language processing.

As a scientist I’m a bit skeptical of this field, because frankly I don’t see much science. Sure, the practitioners have labs and cool gadgets. But I rarely see anyone asking hard questions, doing careful observations, creating theories, formulating hypotheses, testing the hypotheses and examining the results.

The lack of careful observation and skeptical questioning is what really bothers me, because that’s what’s at the core of science. Don’t get me wrong: there are plenty of people in Data Science doing both. But these practices should permeate a field with this name, and they don’t.

If there’s so little science, why do we call it “science”? A glance through some of the uses of the term in the Google Books archive suggests that when it was first used in the late twentieth century, it did include hypothesis testing. In the early 2000s people began to use it as a synonym for “big data,” and I can understand why: “big data” was a buzzword closely associated with Silicon Valley tech hype.

I totally get why people replaced “big data” with “data science.” I’ve spent years doing science (with observations, theories, hypothesis testing, etc.). Occasionally I’ve been paid for doing science or teaching it, but only part time. Even after getting a PhD I had to conclude that science jobs that pay a living wage are scarce and in high demand, and I was probably not going to get one.

It was kind of exciting when I got a job with Scientist in the title. It helped to impress people at parties. At first it felt like a validation of all the time I spent learning how to do science. So I completely understand why people prefer to say they’re doing “data science” instead of “big data.”

The problem is that I wasn’t working on experiments. I was just helping people optimize their tools. Those tools could possibly be used for science, but that was not why we were being paid to develop them. We have a word for a practice involving labs and gadgets, without requiring any observation or skepticism. That word is not science, it’s technology.

Technology is perfectly respectable; it’s what I do all day. For many years I’ve been well paid to maintain and expand the technology that sustains banks, lawyers, real estate agents, bakeries and universities. I’m currently building tools that help instructors at Columbia University do things like memorize the names of their students and send them emails. It’s okay to do technology. People love it.

If you really want to do science and you’re not one of the lucky ones, you can do what I do: I found a technology job that doesn’t demand all my time. Once in a while they want me to stay late or work on a weekend, but the vast majority of my time outside of 9-5 is mine. I spend a lot of that time taking care of my family and myself, and relaxing with friends. But I have time to do science.

Teaching with accent tags in the face-to-face classroom

In September I wrote about how I used accent tag videos to teach phonetic transcription in my online linguistics classes. Since I could not be there in person, the videos provided a stable reference that we could all refer to from our computers around the country. Having two pronunciations to compare drew the students’ attention to the differences between them – one of the major reasons phonetic transcription was invented – and to the most natural level of detail to include in their answers.

In the Fall of 2015 I was back in the classroom teaching Introduction to Phonology, and I realized that those features – a stable reference and multiple pronunciations of the same word with small differences – were also valuable when we were all in the same room. I used accent tag clips in exercises on transcription and other skills, such as identifying phonetic traits like tongue height and frication.

One of my students, Alice Nkanga, pointed out a feature of YouTube that I wasn’t aware of before: you can adjust the speed of playback down to one-quarter speed, and it auto-corrects the pitch, which can help with transcription.

After reading my previous post another linguist, Jessi Grieser, said that she liked the idea, so I shared some of my clips with her. She used them in her class, including a clip I made contrasting two African American women – one from Chicago and one from New York – saying the word “oil.”

Grieser reported, “this went excellently! It really helped hammer home the idea that there isn’t a ‘right’ way to transcribe a word based on its orthography–that what we’re really looking for is a transcription which captures what the speaker did. They really had fun with ‘oil’ since many of them are /AHL/ or /UHL/ speakers themselves. It was a really great discussion starter for our second day of transcription. This is a genius idea.”

It makes me really happy to know that other people find this technique useful in their classrooms, because I was so excited when I came up with it. I would make the clips available to the public, even at no charge, but I’m not sure about the rights because I did not make the original accent tag videos. I hope you’ll all make your own, though – it’s not that hard!

How Google’s Pixel Buds will change the world!

Scene: a quietly bustling bistro in Paris’s 14th Arrondissement.

SERVER: Oui, vous désirez?
PIXELBUDS: Yes, you desire?
TOURIST: Um, yeah, I’ll have the steak frites.
PIXELBUDS: UM, OUAIS, JE VAIS AVOIR LES FRITES DE STEAK
SERVER: Que les frites?
PIXELBUDS: Than fries?
TOURIST: No, at the same time.
PIXELBUDS: NON, EN MEME TEMPS
SERVER: Alors, vous voulez le steak aussi?
PIXELBUDS: DESOLE, JE N’AI PAS COMPRIS.
SERVER: VOUS VOULEZ LE STEAK AUSSI?
PIXELBUDS: You want the steak too?
TOURIST: Yeah, I just ordered the steak.
PIXELBUDS: OUAIS, JE VIENS DE COMMANDER LE STEAK
SERVER: Okay, du steak, et des frites, en même temps.
PIXELBUDS: Okay, steak, and fries at the same time.
TOURIST: You got it.
PIXELBUDS: TU L’AS EU.

(All translations by Google Translate. Photo: Alain Bachelier / Flickr.)

Teaching phonetic transcription online

When I was teaching introductory linguistics, I had a problem with the phonetic transcription exercises in the textbooks I was using: they asked students to transcribe “the pronunciation” of individual words – implying that there is a single correct pronunciation with a single correct transcription. I worked around it in face-to-face classes by hearing the students’ accents and asking them to pronounce any words whose transcriptions differed from what I expected. I was also able to illustrate the pronunciation of various IPA symbols by pronouncing the sounds in class.

In the summer of 2013 I taught linguistics online for the first time, and it was much more difficult to give students a sense of the sounds I expected them to produce, and to get a sense of the sounds they associated with particular symbols. On top of that I discovered I had another challenge: I couldn’t trust these students to do the work if the answers were available anywhere online. Some of them would google the questions, find the answers, copy and paste. Homework done!

Summer courses move so fast that I wasn’t able to change the exercises until it was too late. In the fall of 2014 I taught the course again, and created several new exercises. I realized that there was now a huge wealth of speech data available online, in the form of streaming and downloadable audio, created for entertainment, education and archives. I chose a podcast episode that seemed relatively interesting and asked my students to transcribe specific words and phrases.

It immediately became clear to me that instead of listening to the sounds and using Richard Ishida’s IPA Picker or another tool to transcribe what they heard, the students were listening to the words, looking them up one by one in the dictionary, and copying and pasting word transcriptions. In some cases Roman Mars’s pronunciations were different from the dictionary transcriptions, but they were close enough that my low grades felt like quibbling to them.

I tried a different strategy: I noticed that another reporter on the podcast, Joel Werner, spoke with an Australian accent, so I asked the students to transcribe his speech. They began to understand: “Professor, do we still have to transcribe the entire word even though a letter from the word may not be pronounced due to an accent?” asked one student. Others noticed that the long vowels were shifted relative to American pronunciations.

For tests and quizzes, I found that I could make excerpts of sound and video files using editing software like Audacity and Windows Movie Maker. That allowed me to isolate particular words or groups of words so that the students didn’t waste time locating content in a three-minute video, or a twenty-minute podcast.

This still left a problem: how much detail were the students expected to include, and how could I specify that for them in the instructions? Back in 2013, in a unit on language variation, I had used accent tag videos to replace the hierarchy implied in most discussions of accents with a more explicit, less judgmental contrast between “sounds like me” and “sounds different.” I realized that the accent tags were also good for transcription practice, because they contained multiple pronunciations of words that differed in socially meaningful ways – in fact, the very purpose that phonetic transcription was invented for. Phonetic transcription is a tool for talking about differences in pronunciation.

The following semester, Spring 2015, I created a “Comparing Accents” assignment, where I gave the students links to excerpts of two accent tag videos, containing the word list segment of the accent tag task. I then asked them to find pairs of words that the two speakers pronounced differently and transcribe them in ways that highlighted the differences. To give them practice reading IPA notation, I gave them transcriptions and asked them to upload recordings of themselves pronouncing the transcriptions.

I was pleased to find that I actually could teach phonetic transcription online, and even write tests that assessed the students’ abilities to transcribe, thanks to accent tag videos and the principle that transcription is about communicating differences.

I found these techniques to be useful for teaching other aspects of linguistics. I’ll talk about that in future posts.

Othering, dehumanization and abuse

In a comment, Candy asked:

Can we also please talk about how “cis” is used as a term of abuse against feminists? As in, “shut up privileged cis bitches”? It’s the bit where trans activism begins to overlap with Men Rights Activism.

The use of “shut up” and “bitches” in Candy’s (unattested) example is definitely abuse, but in this dismissive context, “cis” is not functioning as abuse but as othering. It positions the referent as an outsider who has no standing in the group, and possibly as a threat. Othering can hurt, and it can often be done with malicious intent, but it is not the same as abuse, and responding to it as though it were abuse is generally not effective.

We can distinguish othering from abuse by removing the abusive terms and imagining a different context. Imagine that you have a group of army officers discussing how to attack a fort. Someone with no expertise is walking by and says, “Hey guys, you should just hit the tower with a bazooka!” The officers would be justified in saying, “Are you an army officer? What do you know?” or just “Get this civilian out of here!” “Civilian” isn’t a term of abuse here. It’s othering, but without malicious intent.

Othering is close to dehumanizing, which is a process where categories of people are reframed as enemies unworthy of common decency. This is a well-documented response to trauma, but it can also be done without trauma, when one group is framed as an existential threat to another. This framing can be done quite cynically, as Bosnian Serb leader Radovan Karadžić did with Muslims and later with NATO troops. As I’ve discussed on my trans blog, much of the hatred against gay men, lesbians, trans people, and women who don’t obey men is often in response to a framing that portrays them as unwilling to cooperate in increasing the birth rate of the group. I’m sure any of you can think of several more examples.

Othering is connected with abuse because dehumanizing is an invitation to abuse. If someone is really The Enemy, and unworthy of common decency, then any attacks on them are allowed. Restrictions demarcating acceptable conduct like forum rules and rules against torture are seen as an inconvenience at best, and at worst a dangerous vulnerability at times when “we” can least afford it.

Othering and dehumanizing are forms of category profiling: substituting a category of people for the feature that is required. The army officers have training and experience attacking forts, and in theory they’ve been promoted because they’ve demonstrated some skill. There’s no evidence that this civilian has training, experience or skill. Similarly, German soldiers on the Western Front in World War I were under genuine mortal threat from French and British soldiers who had been ordered to kill them, but there was no evidence that, say, Mexican soldiers were a threat to them at that time.

Of course, category profiling can go wrong in decision-making. There are many examples of experts failing spectacularly, and of outsiders succeeding where the experts don’t. There’s a whole genre of stories about these, like the film Working Girl, where our heroine’s financial expertise is dismissed because she’s categorized as a secretary. When she changes her clothes and hairstyle, people classify her as a financial executive, take her recommendations seriously, and make money.

Profiling has a notorious record in connection with dehumanization. The way that the US treated Russians and Communists when I was a kid, and Arabs and Muslims now, is far out of proportion to any threat. Profiling almost invariably misses some threats, and innocent people are always caught up in the mess.

Now let’s bring in the principle of “nothing for us without us,” which is a foundation of both representative government and identity politics. The idea is that people are experts in the issues that affect them, and the best people to discuss issues affecting a category of people are members of that category. It is very common in activist movements to insist on centering members of a particular category and excluding non-members – either from participating at all, or from taking part in the discussion. This is why othering statements like Candy’s constructed example are common in activist contexts.

When the “nothing for us without us” principle interacts with dehumanization, it leads to fear that They are pretending to be Us, derailing our discussions within the group and misrepresenting our goals to the wider public. This is not some paranoid fantasy: there is a long history of people infiltrating enemy movements with the goal of spying and disrupting. Recent examples from the FBI include the COINTELPRO infiltration of anti-racist groups in the 1960s and 1970s, and ongoing infiltration of Islamic groups. Our interactions increasingly take place online, where it can be even more difficult to judge people’s affiliations and motives.

Of course, the idea of representation immediately bumps square up against the profiling problem. The fact that someone is affected by an issue is no guarantee that they understand that issue, or have any ability to communicate it or resolve it. It is also no guarantee that they are sane or ethical. It can also be difficult to pin down the specific category affected and center just the members of that category. For example, low-income female-presenting nonwhite transgender people working in sex industries are targets of violence and discrimination, but even if trans people are given voice to talk about it they are often white, relatively affluent, not sex workers and even male-presenting. And because categories can be messy, it is sometimes possible for people to be simultaneously (or intermittently) part of one category that needs representation and another category that some in the first see as a threat.

Several people, including myself and Third Way Trans, have observed that it is common for trans people to have a history of trauma. This leads many trans people to take an us-and-them view of the world, where all trans people are good and innocent, and all “cis” people are evil abusers – despite the fact that trans people are just as likely to be abusive as anyone else. Their trauma leads to othering and dehumanization, and that invites abuse.

So that’s the answer to Candy’s question: “cis” in this context is not used as a term of abuse. It is used for othering, and in combination with the dehumanization that many trans people practice, that justifies the abuse. I don’t see any direct connection to “Men’s Rights Activism.” Thanks for your question, Candy; I hope this helps!

Teaching language variation with accent tag videos

Last January I wrote that the purpose of phonetic transcription is to talk about differences in pronunciation. Last December I introduced accent tags, a fascinating genre of self-produced YouTube videos of crowdsourced dialectology and a great source of data about language variation. I put these together when I was teaching a unit on language variation for the second-semester Survey of Linguistics course at Saint John’s University. When I learned about language variation as an undergraduate, it was exciting to see accents as a legitimate object of study, and it was gratifying to see my family’s accents taken seriously.

At the same time, the focus on a single dialect at a time contrasts with the absence of variation from the discussion of English pronunciation, grammar and lexis in other units, and in the rest of the way English is typically taught. This implies that there is a single standard that does not vary, despite evidence from perceptual dialectology (such as Dennis Preston’s work) that language norms are fragmentary, incomplete and contested. I saw the cumulative effects of this devaluation in class discussions, when students openly denigrated features of the New York accents spoken by their neighbors, their families and often the students themselves.

At first I just wanted to illustrate variation in African American accents, but then I realized that the accent tags allowed me to set up the exercises as an explicit contrast between two varieties. I asked my students to search YouTube for an accent tag that “sounds like you” and one that sounded different, and to find differences between the two in pronunciation, vocabulary and grammar. I followed up with further exercises asking students to compare two accent tags from the same place but with different ethnic, economic or gender backgrounds.

My students did a great job at finding videos that sounded like them. Most of them were from the New York area, and were able to find accent tags made by people from New York City, Long Island or northern New Jersey. Some students were African American or Latin American, and were able to find videos that demonstrated the accents, vocabulary and grammar common among those groups. The rest of the New York students did not have any features that we noticed as ethnic markers, and whether the students were Indian, Irish or Circassian, they were satisfied that the Italian or Jewish speakers in the videos sounded pretty much like them.

Some of the students were from other parts of the country, and found accent tags from California or Boston that illustrated features that the students shared. A student from Zimbabwe who is bilingual in English and Shona was not able to find any accent tags from her country, but she found a video made by a white South African and was able to identify features of English pronunciation, vocabulary and grammar that they shared.

As I wrote last year, the phonetic transcription exercises I had done in introductory linguistics and phonology courses were difficult because they implicitly referred to unspecified standard pronunciations, leading to confusion among the students about the “right” transcriptions. In the variation unit, when I framed the exercise as an explicit comparison between something that “sounds like you” and something different, I removed the implied value judgment and replaced it with a neutral investigation of difference.

I found that this exercise was easier for the students than the standard transcription problems, because it gave them two recordings to compare instead of asking them to compare one recording against their imagination of the “correct” or “neutral” pronunciation. I realized that this could be used for the regular phonetics units as well. I’ll talk about my experiences with that in a future post.

And we mean really every tree!

When Timm, Laura, Elber and I first ran the @everytreenyc Twitter bot almost a year ago, we knew that it wasn’t actually sampling from a list that included every street tree in New York City. The Parks Department’s 2015 Tree Census was a huge undertaking, and was not complete by the time they organized the Trees Count! Data Jam last June. There were large chunks of the city missing, particularly in Southern and Eastern Queens.

The bot software itself was not a bad job for a day’s work, but it was still a hasty patch job on top of Neil Freeman’s original everylotbot code. I hadn’t updated the readme file to reflect the changes we had made. And it was running on a server in the NYU Computer Science Department, which is currently my most precarious affiliation.

On April 28 I received an email from the Parks Department saying that the census was complete, and the final version had been uploaded to the NYC Open Data Portal. It seemed like a good opportunity to upgrade.

Over the past two weeks I’ve downloaded the final tree database, installed everything on Pythonanywhere, streamlined the code, added a function to deal with Pythonanywhere’s limited scheduler, and updated the readme file. People who follow the bot might have noticed a few extra tweets over the past couple of days as I did final testing, but I’ve removed the cron job at NYU, and @everytreenyc is now up and running in its new home, with the full database, a week ahead of its first birthday. Enjoy the dérive!
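The workaround for the limited scheduler can be sketched like this: run a task every hour, and have it exit immediately unless the hour falls on one of the bot's slots. This is an illustrative sketch only – the window bounds and interval are my guesses for exposition, not necessarily what the deployed bot uses.

```python
from datetime import datetime

def should_tweet(now, start_hour=9, end_hour=22, interval=3):
    """True if `now` (assumed to already be New York local time) lands on
    one of the every-few-hours slots inside the waking-hours window.
    The 9-to-22 window and 3-hour interval are illustrative guesses."""
    in_window = start_hour <= now.hour < end_hour
    on_slot = (now.hour - start_hour) % interval == 0
    return in_window and on_slot
```

An hourly scheduled task can then call this check and quietly exit when it returns False, which is one way to get an every-three-hours, daytime-only bot out of a scheduler that only offers fixed hourly slots.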