The supposed successes of AI

The author and cat, with pirate hats and eye patches superimposed by a Snapchat filter
I’m a regular watcher of Last Week Tonight with John Oliver, so in February I was looking forward to his take on “AI” and the large language models and image generators that many people have been getting excited about lately. I was not disappointed: Oliver heaped a lot of much-deserved criticism on these technologies, particularly for the ways they replicate prejudice and are overhyped by their developers.

What struck me was the way that Oliver contrasted large language models with more established applications of machine learning, portraying those as uncontroversial and even unproblematic. He’s not unusual in this: I know a lot of people who accept these technologies as a fact of life, and many who use them and like them.

But I was struck by how many of these technologies I myself find problematic and avoid, or even refuse to use. And I’m not some know-nothing: I’ve worked on projects in information retrieval and information extraction. I developed one of the first sign language synthesis systems, and one of the first prototype English-to-American Sign Language machine translation systems.

When I buy a new smartphone or desktop computer, one of the first things I do is to turn off all the spellcheck, autocorrect and autocomplete functions. I don’t enable the face or fingerprint locks. When I open an entertainment app like YouTube, Spotify or Netflix I immediately navigate away from the recommended content, going to my own playlist or the channels I follow. I do the same for shopping sites like Amazon or Zappos, and for social media like Twitter. I avoid sites like TikTok where the barrage of recommended content begins before you can stop it.

It’s not that I don’t appreciate automated pattern recognition. Quite the contrary. I’ve been using it for years – one of my first jobs in college was cleaning up a copy of the Massachusetts Criminal Code that had been scanned in and run through optical character recognition. For my dissertation I compiled a corpus from scanned documents, and over the past ten years I’ve developed another corpus using similar methods.

I feel similarly about synonym expansion – modifying a search engine to return results including “bicycle” when someone searches for “bike,” for example. I worked for a year for a company whose main product was synonym expansion, and I was really glad a few years later when Google rolled it out to the public.
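
To make that concrete, here’s a toy sketch of dictionary-based synonym expansion; the synonym table is a made-up placeholder of my own, not anything that company or Google actually used.

```python
# Toy sketch of query-time synonym expansion; the synonym table is a
# made-up placeholder, not any real search engine's data.
SYNONYMS = {
    "bike": {"bicycle"},
    "bicycle": {"bike"},
}

def expand_query(terms):
    """Return the original search terms plus any known synonyms."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        expanded |= SYNONYMS.get(term.lower(), set())
    return expanded

print(expand_query(["bike", "lock"]))   # {'bike', 'bicycle', 'lock'}
```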

There are a couple of other things that I find useful, like suggested search terms, image matching for attribution and Shazam for saving songs I hear in cafés. Snapchat filters can be fun. Machine translation is often cheaper than a dictionary lookup.

Using these technologies as fun toys or creative inspiration is fine. Using them as unreliable tools that need to be thoroughly checked and corrected is perfectly appropriate. The problem begins when people don’t check the output of their tools, and release it as completed work. This is where we get the problems documented by sites like Damn You Auto Correct: often humorous, but occasionally harmful.

My appreciation for automated pattern recognition is one of the reasons I’m so disturbed when I see people taking it for granted. I think it’s the years of immersion in all the things that automated recognizers got wrong, garbled or even left out completely that makes me concerned when people ignore the possibility of any such errors. I feel like an experienced carpenter watching someone nailing together a skyscraper out of random pieces of wood, with no building inspectors in sight.

When managers make the use of pattern recognition or generation tools mandatory, it goes from being potentially harmful to actively destructive. Search boxes that won’t let users turn off synonym expansion, returning wildly inaccurate results to avoid saying “nothing found,” make a mockery of the feature. I am writing this post on Google Docs, which is fine on a desktop computer, but the Android app does not let me turn off spell check. To correct a word without choosing one of the suggested corrections requires an extra tap every time.

Now let’s take the example of speech recognition. I have never found an application of speech recognition technology that personally satisfied me. I suppose if something happened to my hands that made it impossible for me to type I would appreciate it, but even then it would require constant attention to correct its output.

A few years ago I was trying to report a defective street condition to the New York City 311 hotline. The system would not let me talk to a live person until I’d exhausted its speech recognition system, but I was in a noisy subway car. Not only could the recognizer not understand anything I said, but the system was forcing me to disturb my fellow commuters by shouting selections into my phone.

I’ve attended conferences on Zoom with “live captioning” enabled, and at every talk someone commented on major inaccuracies in the captions. For people who can hear the speech it can be kind of amusing, but if I had to depend on those captions to understand the talks I’d be missing so much of the content.

I know some deaf people who regularly insist on automated captions as an equity issue. They are aware that the captions are inaccurate, and see them as better than nothing. I support that position, but in cases where the availability of accurate information is itself an equity issue, like political debates for example, I do not feel that fully automated captions are adequate. Human-written captions or human sign language interpreters are the only acceptable forms.

Humans are, of course, far from perfect, but for anything other than play, where accuracy is required, we cannot depend on fully automated pattern recognition. There should always be a human checking the final output, and there should always be the option to do without it. It should never be mandatory. The pattern recognition apps that are already all around us show us that clearly.

Screenshot of the "Compose new Tweet" modal on Twitter, with the "+" button and a tooltip reading "Add another Tweet". The tweet text reads "blah blah blah bl"

Dialogue and monologue in social media

I wrote most of this post in June 2022, before a lot of us decided to try out Mastodon. I didn’t publish it because I despaired of it making a difference. It felt like so many people were set in particular practices, including not reading blog posts! My experience on Mastodon has been so much better than the past several years on Twitter. I think this is connected with how Twitter and Mastodon handle threads.

A few years ago I wrote a critique of Twitter threads, tweetstorms, essays, and similar forms. I realize now that I didn’t actually talk much about what’s wrong with them. I focused on how difficult they are to read, but at the time I didn’t appreciate how much the native Twitter website and app do to make them easier to read. So let me tell you some of the deeper problems with threads.

In 2001 I visited some of the computational linguistics labs at Carnegie Mellon University. Unfortunately I don’t remember the researchers’ names, but they described a set of experiments that has informed my thinking about language ever since. They were looking at the size of the input box in a communication app.

These researchers did experiments where they asked people to communicate with each other using a custom application. They presented different users with input boxes of different sizes: some got only a single line, others got three or four, and maybe some got six or eight lines.

What they found was that when someone was presented with a large blank space, as in an email application or the Google Docs application I’m writing this in, they tended to take their time and write long blocks of text, and edit them until they were satisfied. Only then did they hit send. Then the other user would do the same.

When the Carnegie Mellon researchers presented users with only one line, as in a text message app, their behavior was much different. They wrote short messages and sent them off with minimal editing. The short turnaround time resulted in a dialogue that was much closer to the rhythm of spoken conversation.

This echoed my own findings from a few years before. I had been searching for features of French that I heard all over the streets of Paris but had never been taught in school, in particular what linguists call right dislocation (“Ils sont fous, ces Romains”: “They’re crazy, those Romans”) and left dislocation (“L’état, c’est moi”: “The state, that’s me”).

In 1998 the easiest place to look was USENET newsgroups, and I found that even casual newsgroups like fr.rec.animaux were heavy on the formal, carefully crafted types of messages I remembered from high school French class. I had already read some prior research on this kind of language variation, so I decided to try something with faster dialogue.

In Internet Relay Chat (IRC) I hit the jackpot. On the IRC channel I studied, left and right dislocations made up between 21% and 38% of all finite clauses. I noticed that other features of conversational French, like ne-dropping, were common as well. I could even see IRC newbies adapting in real time: they would start off trying to write formal sentences the way they were taught in lycée, and soon give up and start writing the way they talked.

At this point I have to say: I love dialogue. Don’t get me wrong: I can get into a nice well-crafted monologue or monograph. And anyone who knows me knows I enjoy telling a good story or tearing off on a rant about something. But dialogue keeps me honest, and it keeps other people honest too.

Dialogue is not inherently or automatically good. On Twitter as in many other places, it is used to harass and intimidate. But when properly structured and regulated it can be a democratizing force. It’s important to remember how long our media has been dominated by monologues: newspapers, films, television. Even when these formats contain dialogues, they are often fictional dialogues written by a single author or team of authors to send a single message.

One of my favorite things about the internet is that it has always favored dialogue. Before large numbers of people were on the internet there was a large gap between privileged media sources and independent ones. Those of us who disagreed with the monologues being thrust upon us by television and newspapers were often reduced to impotently talking back at those powerful media sources, in an empty room.

USENET, email newsletters, personal websites and blogs were democratizing forces because they allowed anyone who could afford the hosting fees (sometimes with the help of advertisers) to command these monologic platforms. They were the equivalent of Speakers’ Corner in London. They were like pamphlets or letters to the editor or cable access television, but they eliminated most of the barriers to entry. But they were focused on monologues.

In the 1990s and early 2000s we had formats that encouraged dialogue, like mailing lists and bulletin boards, but they had large input boxes. As I saw on fr.rec.animaux in 1998, that encouraged long, edited messages.
We did have forums with smaller input boxes, like IRC or the group chats on AOL Instant Messenger. As I found, those encouraged people to write short messages in dialogue with each other. When I first heard about Twitter with its 140-character limit I immediately recognized it as a dialogic forum.

But what sets Twitter apart from IRC or AOL Instant Messenger? Twitter is a broadcast platform. The fact that every tweet is public by default, searchable and assigned a unique URL, makes it a “microblog” site like some popular sites in China.

If someone said something on IRC or AIM in 1999 it was very hard to share it outside that channel. I was able to compile my corpus by creating a “bot” that logged on to the channel every night and saved a copy of all the messages. What Twitter, and the sites like Weibo that followed it, brought was the combination of permanent broadcast, low barrier to entry, and dialogue.
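
For the curious, a logger like that takes only a few lines of code today. Here’s a minimal sketch in modern Python (my 1998 bot was nothing this tidy); the server, channel and nickname are placeholders:

```python
# Minimal sketch of an IRC channel logger; the server, channel and nick are
# placeholders, and this is not the code of the original 1998 bot.
import socket
from datetime import datetime

SERVER, PORT = "irc.example.net", 6667
CHANNEL, NICK = "#causette", "corpusbot"

sock = socket.create_connection((SERVER, PORT))
sock.sendall(f"NICK {NICK}\r\nUSER {NICK} 0 * :corpus logger\r\n".encode())
sock.sendall(f"JOIN {CHANNEL}\r\n".encode())

with open("irc_log.txt", "a", encoding="utf-8") as log:
    buffer = ""
    while True:
        buffer += sock.recv(4096).decode("utf-8", errors="replace")
        while "\r\n" in buffer:
            line, buffer = buffer.split("\r\n", 1)
            if line.startswith("PING"):            # keep the connection alive
                sock.sendall(("PONG" + line[4:] + "\r\n").encode())
            elif "PRIVMSG" in line:                # log every channel message
                log.write(f"{datetime.now().isoformat()} {line}\n")
                log.flush()
```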

This is why I’m bothered by Twitter threads, by screenshots of text, by the unending demands for an edit button. These are all attempts to overpower the dialogue on Twitter, to remove one of the key elements that make it special.

Without the character limits, Twitter is just a blogging platform. Of course, there’s nothing wrong with blogs! I’ve done a lot of blogging, I’ve done a lot of commenting on blogs and I’ve tweeted a lot of links to blogs. But I want to choose when to follow those links and go read those blog posts or news articles or press releases.

I want a feed full of dialogue or short statements. Threads and screenshots interrupt the dialogue. They aggressively claim the floor, crowding out other tweets. Screenshots interrupt the other tweets with large blocks of text, demanding to be read in their entirety. Threads take up even more of the timeline. The Twitter web app will show as many as three tweets of a thread, interrupting the flow of dialogue.

The experience of threads is much worse on Twitter clients that don’t manipulate the timeline, like TweetDeck (which was bought by Twitter in 2011) and HootSuite. If it’s a long thread, your timeline is screwed, and you have to scroll endlessly to get past it.

One of the things I love the most about Mastodon is the standard practice of making the first toot in a thread public, but publishing all the other toots as unlisted. That broadcasts the toot announcing the thread, and then gives readers the agency to decide whether they want to read the follow-up toots. It’s more or less the equivalent of including a link to a web page or blog post in a toot.

There’s a lot more to say about dialogue and social media, but for now I’m hugely encouraged by the feeling of being on Mastodon, and I’m hoping it leads us in a better direction for dialogue, away from threads and screenshots.

Scanned entry for “type” from the Dictionary of American Sign Language, written in Stokoe notation

Fonts for Stokoe notation

You may be familiar with the International Phonetic Alphabet, the global standard for representing speech sounds, ideally independent of the way those speech sounds may be represented in a writing system. Did you know that sign languages have similar standards for representing hand and body gestures?

Unfortunately, we haven’t settled on a single notation system for sign languages the way linguists have mostly chosen the IPA for speech. There are compelling arguments that none of the existing systems are complete enough for all sign languages, and different systems have different strengths.

Another difference is that signers, by and large, do not read and write their languages. Several writing systems have been developed and promoted, but to my knowledge, there is no community that sends written messages to each other in any sign language, or that writes works of fiction or nonfiction for other signers to read.

One of the oldest and best-known notation systems is the one developed by Gallaudet University professor William Stokoe (u5"tx) for his pioneering analysis of American Sign Language in the 1960s, which succeeded in convincing many people that ASL is, in ways that matter, a language like English or Japanese or Navajo. Among other things, Stokoe and his co-authors Dorothy Casterline and Carl Croneberg used this system for the entries in their 1965 Dictionary of American Sign Language (available from SignMedia). In the dictionary entry above, the sign CbCbr~ is given the English translation “type.”

Stokoe notation is incomplete in a number of ways. Chiefly, it is optimized for the lexical signs of American Sign Language. It does not account for the wide range of handshapes used in American fingerspelling, or the wide range of locations, orientations and movements used in ASL depicting gestures. It only describes what a signer’s hands are doing, with none of the face and body gestures that have come to be recognized as essential to the grammar of sign languages. Some researchers have produced modifications for other languages, but those are not always well-documented.

Stokoe created a number of symbols, some of which bore a general resemblance to Roman letters and some of which didn’t. This made the notation impossible to type with existing technology; I believe all the transcriptions in the Dictionary of ASL were written by hand. In 1993 another linguist, Mark Mandel, developed a system for encoding Stokoe notation into the American Standard Code for Information Interchange (ASCII) character set, which by then could be used on almost all American computers.

In September 1995 I was in the middle of a year-long course in ASL at the ASL Institute in Manhattan. I used some Stokoe notation for my notes, but I wanted to be able to type it on the computer, not just using Mandel’s ASCII encoding. I also happened to be working as a trainer at Userfriendly, a small chain of computer labs with a variety of software available, including Altsys Fontographer, and as an employee I could use the workstations whenever customers weren’t paying for them.

One day I sat down in a Userfriendly lab and started modifying an existing public domain TrueType font (Tempo by David Rakowski) to make the Stokoe symbols. The symbols were not in Unicode, and still are not, despite a proposal to that effect on file. I arranged it so that the symbols used the ASCII-Stokoe mappings: if you typed something in ASCII-Stokoe and applied my font, the appropriate Stokoe symbols would appear. StokoeTempo was born. It wasn’t elegant, but it worked.

I made the font available for download from my website, where it’s been for the past 26-plus years. I wound up not using it for much, other than to create materials for the linguistics courses I taught at Saint John’s University, but others have downloaded it and put it to use. It is linked from the Wikipedia article on Stokoe notation.

A few years later I developed SignSynth, a web-based prototype sign language synthesis application. At the time web browsers did not offer much flexibility in terms of fonts, so I could not use Stokoe symbols and had to rely on ASCII-Stokoe, and later Don Newkirk’s (1986) Literal Orthography, along with custom extensions for fingerspelling and nonmanual gestures.

Recently, as part of a project to bring SignSynth into the 21st century, I decided to explore using fonts on the Web. I discovered a free service, FontSquirrel, that creates Web Open Font Format (WOFF and WOFF2) wrappers for TrueType fonts. I created WOFF and WOFF2 files for StokoeTempo and posted them on my site.
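
If you’d rather not rely on a third-party service, the same conversion can be done locally with the fontTools Python library. Here’s a minimal sketch under that assumption; the filenames are examples, and the WOFF2 flavor additionally needs the brotli package:

```python
# Minimal sketch: wrap a TrueType font as WOFF and WOFF2 using fontTools.
# pip install fonttools brotli   (brotli is required for the woff2 flavor)
from fontTools.ttLib import TTFont

for flavor in ("woff", "woff2"):
    font = TTFont("StokoeTempo.ttf")    # example path to the TrueType file
    font.flavor = flavor                # the flavor controls the output wrapper
    font.save(f"StokoeTempo.{flavor}")  # writes StokoeTempo.woff / .woff2
```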

I also discovered a different standard, Typeface.js, which actually uses a JSON format. This is of particular relevance to SignSynth, because it can be used with the 3D web library Three.js. There’s another free service, Facetype.js, that converts TrueType fonts to Typeface.js fonts.

Scan of the entry for “type” (CbCbr~) from page 51 of the Dictionary of American Sign Language

To demonstrate the use of StokoeTempo web fonts, above is a scan of the definition of CbCbr~ from page 51 of the Dictionary of American Sign Language. Below I have reproduced it using HTML and StokoeTempo:

CbCbr~ (imit.: dez may be slightly bent spread 5) v type, r typewriter, typist with or without suffix _____ ?[BBv.

StokoeTempo is free to download and use by individuals and educational institutions.

Screenshot of LanguageLab displaying the exercise "J'étais certain que j'allais écrire à quinze ans" ("I was certain I was going to write at fifteen")

Imagining an alternate language service

It’s well known that some languages have multiple national standards, to the point where you can take courses in either Brazilian or European Portuguese, for example. Most language instruction services seem to choose one variety per language: when I studied Portuguese at the University of Paris X-Nanterre it was the European variety, but the online service Duolingo only offers the Brazilian one.

I looked into some of Duolingo’s offerings for this post, because they’re the most talked about language instruction service these days. I was surprised to discover that they use no recordings of human speakers; all their speech samples are synthesized using an Amazon speech synthesis service named Polly. Interestingly, even though Duolingo only offers one variety of each language, Amazon Polly offers multiple varieties of English, Spanish, Portuguese and French.
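
To illustrate that range, here’s a hedged sketch of requesting the same sentence in two varieties of French from Polly with the boto3 library. “Celine” (European French) and “Chantal” (Canadian French) are standard Polly voice IDs as of this writing, but check the current voice list, and note that the call assumes you have AWS credentials configured:

```python
# Hedged sketch: synthesize one sentence in two varieties of French with
# Amazon Polly via boto3. Assumes AWS credentials are configured; the voice
# IDs "Celine" (fr-FR) and "Chantal" (fr-CA) should be checked against the
# current Polly voice list.
import boto3

polly = boto3.client("polly")
sentence = "Ils sont fous, ces Romains."

for voice in ("Celine", "Chantal"):
    response = polly.synthesize_speech(
        Text=sentence,
        VoiceId=voice,
        OutputFormat="mp3",
    )
    with open(f"sample_{voice}.mp3", "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```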

As an aside, when I first tried Duolingo years ago I had the thought, “Wait, is this synthesized?” but it just seemed too outrageous to think that someone would make a business out of teaching humans to talk like statistical models of corpus speech. It turns out it wasn’t too outrageous, and I’m still thinking through the implications of that.

Synthesized or not, it makes sense for a company with finite resources to focus on one variety. But if that one company controls a commanding market share, or if there’s a significant amount of collusion or groupthink among language instruction services, they can wind up shutting out whole swathes of the world, even while claiming to be inclusive.

This is one of the reasons I created an open LanguageLab platform: to make it easier for people to build their own exercises and lessons, focusing on any variety they choose. You can set up your own LanguageLab server with exercises exclusively based on recordings of the English spoken on Smith Island, Maryland (population 149), if you like.

So what about excluded varieties with a few more speakers? I made a table of all the Duolingo language offerings according to their number of English learners, along with the Amazon Polly dialect that is used on Duolingo. If the variety is only vaguely specified, I made a guess.

For each of these languages I picked another variety, one with a large number of speakers. I tried to find the variety with the largest number of speakers, but these counts are always very imprecise. The result is an imagined alternate language service, one that does not automatically privilege the speakers of the most influential variety. Here are the top ten:

| Language | Duolingo dialect | Alternate dialect |
| --- | --- | --- |
| English | Midwestern US | India |
| Spanish | Mexico | Argentina |
| French | Paris | Quebec |
| Japanese | Tokyo | Kagoshima |
| German | Berlin | Bavarian |
| Korean | Seoul | Pyongyang |
| Italian | Florence | Rome |
| Mandarin Chinese | Beijing | Taipei |
| Hindi | Delhi | Chhattisgarhi |
| Russian | Moscow | Almaty |

To show what could be done with a little volunteer work, I created a sample lesson for a language that I know, the third-most popular language on Duolingo, French. After France, the country with the next largest number of French speakers is Canada. Canadian French is distinct in pronunciation, vocabulary and to some degree grammar.

Canadian French is stigmatized outside Canada, to the point where I’m not aware of any program in the US that teaches it, but it is omnipresent in all forms of media in Canada, and there is quite a bit of local pride. These days at least, it would be as odd for a Canadian to speak French like a Parisian as for an American to speak English like a Londoner. There are upper and lower class accents, but they all share certain features, notably the ranges of the nasal vowels.

I chose a bestselling author and television anchor, Michel Jean, who has one grandmother from the indigenous Innu people and three grandparents presumably descended from white French settlers. I took a small excerpt from an interview with Jean about his latest novel, in which he responds spontaneously to the questions of a librarian, Josianne Binette.

The sample lesson in Canadian French based on Michel Jean’s speech is available on the LanguageLab demo site. You are welcome to try it! Just log in with the username demo and the password LanguageLab.

How to set up your own LanguageLab

I’ve got great news! I have now released LanguageLab, my free, open-source software for learning languages and music, to the public on GitHub.

I wish I could tell you I’ve got a public site up that you can all use for free. Unfortunately, the features that would make LanguageLab easy for multiple users to share one server are later in the roadmap. There are a few other issues that also stand in the way of a massive public service. But you can set up your own server!

I’ve documented the steps in the README file, but here’s an overview. You don’t need to know how to program, but you will need to know how to set up web services, retrieve files from GitHub, edit configuration files, and run a few commands at a Linux/MacOS/DOS prompt.

LanguageLab uses Django, one of the most popular web frameworks for Python, and React, one of the most popular frameworks for Javascript. All you need is a server that can run Django and host some Javascript files! I’ve been doing my development and testing on Pythonanywhere, but I’ve also set it up on Amazon Web Services, and you should be able to run it on Google Cloud, Microsoft Azure, a University web server or even your personal computer.

There are guides online for setting up Django in all those environments. Once you’ve got a basic Django setup installed, you’ll need to clone the LanguageLab repo from GitHub to a place where it can be read by your web server. Then you’ll configure it to access the database, and configure the web server to load it. You’ll use Pip and NPM to download the Python and Javascript libraries you need, like the Django REST Framework, React and the Open Iconic font. Finally, you’ll copy all the files into the right places for the web server to read them and restart the server.
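
As an example of the “configure it to access the database” step, a typical Django settings entry for a MySQL database looks something like the following; the names and credentials are placeholders, and the LanguageLab README has the settings the project actually expects:

```python
# Placeholder example of Django's standard DATABASES setting for MySQL;
# substitute your own database name, user, password and host.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "languagelab",
        "USER": "languagelab_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "3306",
    }
}
```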

Once you’ve got everything in place, you should be able to log in! You can make multiple accounts, but keep in mind that at this point we do not have account-level access, so all accounts have full access to all the data. You can then start building your library of languages, media, exercises and lessons. LanguageLab comes with the most widely used languages, but it’s easy to set up new ones if yours are not on the list.

Media can be a bit tricky, because LanguageLab is not a media server. You can upload your media to another place on your server, or any other server – as long as it’s got an HTTPS URL you should be able to use it. If the media you’re using is copyrighted you may want to set up some basic password protection to avoid any accusations of piracy. I use a simple .htaccess password. I have to log in every time, but it works.

With the URL of your media file, you can create a media entry. Just paste that URL into the form and add metadata to keep track of the file and what it can be used for. You can then set up one or more exercises based on particular segments of that media file. It may take a little trial and error to get the exercises right.

You can then create one or more lessons to organize your exercises. You can choose to have a lesson for all the exercises in a particular media file, or you can combine exercises from multiple media files in a lesson. It’s up to you how to organize the lessons. You can edit the queues for each lesson to reorder or remove exercises.

Once you’ve got exercises, you can start practicing! The principle is simple: listen to the model, repeat into the microphone, then listen to the model again, followed by your recording. Set yourself a goal of a certain number of repetitions per session.
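
The loop itself is simple enough to sketch outside the app. Here’s a hedged command-line illustration of the same model-record-playback cycle using the sounddevice and soundfile libraries; this is not LanguageLab’s own code, which records in the browser, and the filename is just an example:

```python
# Hedged sketch of the language-lab cycle: play the model, record the learner,
# then play the model and the recording back to back. Not LanguageLab's code.
# pip install sounddevice soundfile
import sounddevice as sd
import soundfile as sf

def practice(model_path, seconds=5, samplerate=44100):
    model, model_rate = sf.read(model_path)

    sd.play(model, model_rate)                  # 1. listen to the model
    sd.wait()

    take = sd.rec(int(seconds * samplerate),    # 2. repeat into the microphone
                  samplerate=samplerate, channels=1)
    sd.wait()

    sd.play(model, model_rate)                  # 3. hear the model again...
    sd.wait()
    sd.play(take, samplerate)                   # 4. ...followed by your attempt
    sd.wait()

practice("model_sentence.wav")                  # example filename
```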

After you’ve created your language and media entries, exercises and lessons, you can export the data. Importing is not yet implemented, but the export is a human-readable JSON file, so you can recreate the data by hand if you ever need to.
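
Until import lands, the export is mainly a backup you can inspect or re-enter by hand. Here’s a quick sketch of looking one over with Python’s json module; the filename and the exact structure of the export are assumptions on my part:

```python
# Quick look at an exported LanguageLab file; the filename and the structure
# are assumptions -- inspect your own export to see the actual fields.
import json

with open("languagelab_export.json", encoding="utf-8") as f:
    data = json.load(f)

# If the export is organized as collections of languages, media, exercises
# and lessons, this prints each collection and how many records it holds.
for key, value in data.items():
    count = len(value) if isinstance(value, (list, dict)) else 1
    print(f"{key}: {count} item(s)")
```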

In the near future I will go on Twitch to demonstrate how to set up exercises and lessons, and how to practice with them. I will also try to find time to demonstrate the installation process. I will record each demonstration and put it on YouTube for your future reference. You can follow me on Twitter to find out when I’m doing the demos and posting the videos.

If you try setting up a LanguageLab, please let me know how it goes! You can report bugs by opening issues on GitHub, or you can send me an email. I’m happy to hear about problems, but I’d also like to hear success stories! And if you know some Python or Javascript, please consider writing a little code to help me add one of the features in the roadmap!

A free, open source language lab app

Viewers of The Crown may have noticed a brief scene where Prince Charles practices Welsh by sitting in a glass cubicle wearing a headset. Some viewers may recognize that as a language lab. Some may have even used language labs themselves.

The core of the language lab technique is language drills, which are based on the bedrock of all skills training: mimicry, feedback and repetition.  An instructor can identify areas for the learner to focus on.

Because it’s hard for us to hear our own speech, the instructor can also observe things in the learner’s voice that the learner may not perceive. Recording technology enabled the learner to take on some of the role of observer more directly.

When I used a language lab to learn Portuguese in college, it ran on cassette tapes.  The lab station played the model (I can still remember “Elena, estudante francesa, vai passar as ferias em Portugal…“), then it recorded my attempted mimicry onto a blank cassette.  Once I was done recording it played back the model, followed by my own recording.

Hearing my voice repeated back to me after the model helped me judge for myself how well I had mimicked the model.  It wasn’t enough by itself, so the lab instructor had a master station where he could listen in on any of us and provide additional feedback.  We also had classroom lessons with an instructor, and weekly lectures on culture and grammar.

There are several companies that have brought language lab technology into the digital age, on CD-ROM and then over the internet.  Many online language learning providers rely on proprietary software and closed platforms to generate revenue, which is fine for them but doesn’t allow teachers the flexibility to add new language varieties.

People have petitioned these language learning companies to offer new languages, but developing offerings for a new language is expensive.  If a language has a small user base it may never generate enough revenue to offset the cost of developing the lessons.  It would effectively be a donation to people who want to promote these languages, and these companies are for profit entities.

Duolingo has offered a work-around to this closed system: they will accept materials developed by volunteers according to their specifications and freely donated.  Anyone who remembers the Internet Movie Database before it was sold to Amazon can identify the problems with this arrangement: what happens to those submissions if Duolingo goes bankrupt, or simply decides not to support them anymore?

Closed systems raise another issue: who decides what it means to learn French, or Hindi?  This has been discussed in the context of Duolingo, which chose to teach the artificial Modern Standard Arabic rather than a colloquial dialect or the classical language of the Qur’an.  Similarly, activists for the Hawai’ian language wanted the company to focus on lessons to encourage Hawai’ians to speak the language, rather than tourists who might visit for a few weeks at most.

Years ago I realized that we could make a free, open-source language lab application.  It wouldn’t have to replicate all the features of the commercial apps, especially not initially.  An app would be valuable if it offers the basic language lab functionality: play a model, record the learner’s mimicry, play the model again and finally play the recording of the learner.

An open system would be able to use any recording that the device can play.  This would allow learners to choose the models they practice with, or allow an instructor to choose models for their students.  The lessons don’t have to be professionally produced.  They can be created for a single student, or even for a single occasion.  I am not a lawyer, but I believe they can even use copyrighted materials.

I have created a language lab app using the Django Rest Framework and ReactJS that provides basic language lab functionality.  It runs in a web browser using responsive layout, and I have successfully tested it in Chrome and Firefox, on Windows and Android.

This openness and flexibility drastically reduces the cost of producing a lesson.  The initial code can be installed in an hour, on any server that can host Django.  The monthly cost of hosting code and media can be under $25.  Once this is set up, a media item and several exercises based on it can be added in five minutes.

This reduced cost means that a language does not have to bring in enough learners to recoup a heavy investment.  That in turn means that teachers can create lessons for every dialect of Arabic, or in fact for every dialect of English.  They can create Hawai’ian lessons for both tourists and heritage speakers.  They could even create lessons for actors to learn dialects, or master impressions of celebrities.

As a transgender person I’ve long been interested in developing a feminine voice to match my feminine visual image.  Gender differences in language include voice quality, pitch contour, rhythm and word choice – areas that can only be changed through experience.  I have used the alpha and beta versions of my app to create exercises for practicing these differences.

Another area where it helps a learner to hear a recording of their own voice is singing.  This could be used by professional singers or amateurs.  It could even be used for instrument practice.  I use it to improve my karaoke!

This week I was proud to present my work at the QueensJS meetup.  My slides from that talk contain more technical details about how to record audio through the web browser.  I’ll be pushing my source to GitHub soon. You can read more details about how to set up and use LanguageLab.  In the meantime, if you’d like to contribute, or to help with beta testing, please get in touch!

What is “text” for a sign language?

I started writing this post back in August, and I hurried it a little because of a Limping Chicken article guest written by researchers at the Deafness, Cognition and Language Research Centre at University College London. I’ve known the DCAL folks for years, and they graciously acknowledged some of my previous writings on this issue. I know they don’t think the textual form of British Sign Language is written English, so I was surprised that they used the term “sign-to-text” in the title of their article and in a tweet announcing the article. After I brought it up, Dr. Kearsy Cormier acknowledged that there was potential for confusion in that term.

So, what does “sign-to-text” mean, and why do I find it problematic in this context? “Sign-to-text” is an analogy with “speech-to-text,” also known as speech recognition, the technology that enables dictation software like Dragon NaturallySpeaking. Speech recognition is also used by agents like Siri to interpret words we say so that they can act on them.

There are other computer technologies that rely on the concept of text. Speech synthesis is also known as text-to-speech. It’s the technology that enables a computer to read a text aloud. It can also be used by agents like Siri and Alexa to produce sounds we understand as words. Machine translation is another one: it typically proceeds from text in one language to text in another language. When the DCAL researchers wrote “sign-to-text” they meant a sign recognition system hooked up to a BSL-to-English machine translation system.

Years ago I became interested in the possibility of applying these technologies to sign languages, and created a prototype sign synthesis system, SignSynth, and an experimental English-to-American Sign Language system.

I realized that all these technologies make heavy use of text. If we want automated audiobooks or virtual assistants or machine translation with sign languages, we need some kind of text, or we need to figure out a new way of accomplishing these things without text. So what does text mean for a sign language?

One big thing I discovered when working on SignSynth is that (unlike the DCAL researchers) many people really think that the written form of ASL (or BSL) is written English. On one level that makes a certain sense, because when we train ASL signers for literacy we typically teach them to read and write English. On another level, it’s completely nuts if you know anything about sign languages. The syntax of ASL is completely different from that of English, and in some ways resembles Mandarin Chinese or Swahili more than English.

It’s bad enough that we have speakers of languages like Moroccan Arabic and Fujianese who have to write in a related language (written Arabic and written Chinese, respectively) that is different in non-trivial ways that take years of schooling to master. ASL and English are so totally different that it’s like writing Korean or Japanese with Chinese characters. People actually did this for centuries until someone smart invented hangul and katakana, which enabled huge jumps in literacy.

There are real costs to this, serious costs. I spent some time volunteering with Deaf and hard-of-hearing fifth graders in an elementary school, and after years of drills they were able to put English words on paper and pronounce them when they saw them. But it became clear to me that despite their obvious intelligence and curiosity, they had no idea that they could use words on paper to send a message, or that some of the words they saw might have a message for them.

There are a number of Deaf people who are able to master English early on. But from extensive reading and discussions with Deaf people, it is clear to me that the experience of these kids is typical of the vast majority of Deaf people.

It is a tremendous injustice to a child, and a tremendous waste of that child’s time and attention, for them to get to the age of twelve, at normal intelligence, without being able to use writing. This is the result of portraying English as the written form of ASL or BSL.

So what is the written form of ASL? Simply put, it doesn’t have one, despite several writing systems that have been invented, and it won’t have one until Deaf people adopt one. There will be no sign-to-text until signers have text, in their language.

I can say more about that, but I’ll leave it for another post.

On this day in Parisian theater

Since I first encountered The Parisian Stage, I’ve been impressed by the completeness of Beaumont Wicks’s life’s work: from 1950 through 1979 he compiled a list of every play performed in the theaters of Paris between 1800 and 1899. I’ve used it as the basis for my Digital Parisian Stage corpus, currently a one percent sample of the first volume (Wicks 1950), available in full text on GitHub.

Last week I had an idea for another project. Science requires both qualitative and quantitative research, and I’ve admired Neil Freeman’s @everylotnyc Twitter bot as a project that conveys the diversity of the underlying data and invites deep, qualitative exploration.

In 2016, with Timm Dapper, Elber Carneiro and Laura Silver I forked Freeman’s everylotbot code to create @everytreenyc, a random walk through the New York City Parks Department’s 2015 street tree census. Every three hours during normal New York active time, the bot tweets information about a tree from the database, in a template written by Laura that may also include topical, whimsical sayings.
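
The heart of the bot is small: sample one row from the census and pour it into a template. Here’s a hedged sketch of that step; the column names are how I remember the 2015 street tree census, so check them against the file you download, and the real bot does quite a bit more, including Laura’s sayings:

```python
# Hedged sketch of @everytreenyc's core step: pick a random tree from the
# 2015 street tree census CSV and fill a simple template. The column names
# (spc_common, address, boroname) are from memory; verify them against the
# file from the NYC Open Data Portal.
import csv
import random

with open("2015_street_tree_census.csv", newline="", encoding="utf-8") as f:
    trees = list(csv.DictReader(f))

tree = random.choice(trees)
tweet = (f"There is a {tree['spc_common'] or 'mystery'} tree "
         f"near {tree['address']} in {tree['boroname']}.")
print(tweet)
```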

Recently I’ve encountered a lot of anniversaries. A lot of it is connected to the centenary of the First World War, but some is more random: I just listened to an episode of la Fabrique de l’histoire about François Mitterrand’s letters to his mistress that was promoted with the fact that he was born in 1916, one hundred years before that episode aired, even though he did not start writing those letters until 1962.

There are lots of “On this day” blogs and Twitter feeds, such as the History Channel and the New York Times, and even specialized feeds like @ThisDayInMETAL. There are #OnThisDay and #otd hashtags, and in French #CeJourLà. The “On this day” feeds have two things in common: they tend to be hand-curated, and they jump around from year to year. For April 13, 2014, the @CeJourLa feed tweeted events from 1849, 1997, 1695 and 1941, in that order.

Two weeks ago I was at the Annual Convention of the Modern Language Association, describing my Digital Parisian Stage corpus, and I realized that in the Parisian Stage there were plays being produced exactly two hundred years ago. I thought of the #OnThisDay feeds and @everytreenyc, and realized that I could create a Twitter bot to pull information about plays from the database and tweet them out. A week later, @spectacles_xix sent out its first automated tweet, about the play la Réconciliation par ruse.

@spectacles_xix runs on Pythonanywhere in Python 3.6, and accesses a MySQL database. It uses Mike Verdone’s Twitter API client. The source is open on GitHub.
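
In outline, each run is one query and one post. Here’s a hedged sketch of that cycle; the table and column names are stand-ins rather than my real schema, the credentials are placeholders, and pymysql stands in for whichever MySQL driver you prefer:

```python
# Hedged sketch of @spectacles_xix's daily run: find the plays that premiered
# exactly 200 years ago today and tweet them. Table and column names are
# stand-ins, not the real schema; credentials are placeholders.
from datetime import date

import pymysql
from twitter import Twitter, OAuth   # Mike Verdone's Python Twitter Tools

today = date.today()
premiere_date = today.replace(year=today.year - 200)

db = pymysql.connect(host="localhost", user="user",
                     password="secret", database="spectacles")
with db.cursor() as cursor:
    cursor.execute("SELECT title, theater FROM plays WHERE premiere = %s",
                   (premiere_date,))
    plays = cursor.fetchall()         # empty on days with no premières

t = Twitter(auth=OAuth("token", "token_secret",
                       "consumer_key", "consumer_secret"))
for title, theater in plays:
    t.statuses.update(
        status=f"Première, {premiere_date:%d/%m/%Y} : {title}, {theater}.")
```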

Unlike other feeds, including this one from the French Ministry of Culture that just tweeted about the anniversary of the première of Rostand’s Cyrano de Bergerac, this one will not be curated, and it will not jump around from year to year. It will tweet every play that premièred in 1818, in order, until the end of the year, and then go on to 1819. If there is a day when no plays premièred, like January 16, @spectacles_xix will not tweet.
I have a couple of ideas about more features to add, so stay tuned!

How Google’s Pixel Buds will change the world!

Scene: a quietly bustling bistro in Paris’s 14th Arrondissement.

SERVER: Oui, vous désirez?
PIXELBUDS: Yes, you desire?
TOURIST: Um, yeah, I’ll have the steak frites.
PIXELBUDS: UM, OUAIS, JE VAIS AVOIR LES FRITES DE STEAK
SERVER: Que les frites?
PIXELBUDS: Than fries?
TOURIST: No, at the same time.
PIXELBUDS: NON, EN MEME TEMPS
SERVER: Alors, vous voulez le steak aussi?
PIXELBUDS: DESOLE, JE N’AI PAS COMPRIS.
SERVER: VOUS VOULEZ LE STEAK AUSSI?
PIXELBUDS: You want the steak too?
TOURIST: Yeah, I just ordered the steak.
PIXELBUDS: OUAIS, JE VIENS DE COMMANDER LE STEAK
SERVER: Okay, du steak, et des frites, en même temps.
PIXELBUDS: Okay, steak, and fries at the same time.
TOURIST: You got it.
PIXELBUDS: TU L’AS EU.

(All translations by Google Translate. Photo: Alain Bachelier / Flickr.)

And we mean really every tree!

When Timm, Laura, Elber and I first ran the @everytreenyc Twitter bot almost a year ago, we knew that it wasn’t actually sampling from a list that included every street tree in New York City. The Parks Department’s 2015 Tree Census was a huge undertaking, and was not complete by the time they organized the Trees Count! Data Jam last June. There were large chunks of the city missing, particularly in Southern and Eastern Queens.

The bot software itself was not a bad job for a day’s work, but it was still a hasty patch job on top of Neil Freeman’s original Everylotbot code. I hadn’t updated the readme file to reflect the changes we had made. It was running on a server in the NYU Computer Science Department, which is currently my most precarious affiliation.

On April 28 I received an email from the Parks Department saying that the census was complete, and the final version had been uploaded to the NYC Open Data Portal. It seemed like a good opportunity to upgrade.

Over the past two weeks I’ve downloaded the final tree database, installed everything on Pythonanywhere, streamlined the code, added a function to deal with Pythonanywhere’s limited scheduler, and updated the readme file. People who follow the bot might have noticed a few extra tweets over the past couple of days as I did final testing, but I’ve removed the cron job at NYU, and @everytreenyc is now up and running in its new home, with the full database, a week ahead of its first birthday. Enjoy the dérive!