
The beginning
Every morning in the fall of 1997 I would wake up at 6AM and immediately jump out of bed. I went straight to my Fujitsu laptop in the home office, leaving my then-girlfriend sleeping in the bedroom. I didn’t have anywhere to be; nobody was paying me, and this wasn’t for a class. I was working on a new project: a sign-language synthesis prototype.
This project was the culmination of several threads in my life. I had been inspired to go back to grad school by spending time talking about language with my girlfriend, a linguist who had just earned her PhD. For the past two years I had been making my living in information technology, working my way up from Microsoft Office trainer to LAN operations tech. I had also been taking night classes at the American Sign Language Institute and participating in discussions on SLLING-L, a sign linguistics email list, and the SignWriting List.
I had chosen the linguistics doctoral program at the University of New Mexico because I knew it was strong in sign linguistics, and I had just arrived in August. I was taking a course in psycholinguistics with the sign linguist Jill Morford, and a course in ASL.
Several people had asked me, “So you’re interested in language, and you’re interested in computers. Is there a way you could combine these interests?” I replied, “Well, there’s speech synthesis,” but that technology was already pretty well established. I realized that there wasn’t much in the area of technology for sign languages. What would sign synthesis look like?
When I arrived in Albuquerque I mentioned this idea to Sean Burke, a linguist and programmer I had made friends with on an earlier visit. Sean suggested using Virtual Reality Modeling Language (VRML), a standard for describing three-dimensional objects and their movements. People could install a plugin in their web browsers, point it at a VRML site (the standard has since been superseded by X3D), and the plugin would display the 3D animation.
I followed Sean’s tip and discovered that there was a standard for specifying and animating humanoids in VRML, known as H-Anim, and a way to control the plugin through JavaScript. I created a basic animation of a sign using JavaScript, but after numerous frustrations I concluded that it was best to assemble the VRML through a custom Perl templating system controlled by CGI forms.
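To give a flavor of what that templating amounted to, here is a minimal sketch of the idea in JavaScript (the original used Perl and CGI forms): fill a joint name, a rotation axis and a list of key angles into a VRML fragment that animates one H-Anim joint. The node and field names are illustrative conventions from memory, not SignSynth’s actual output.

```javascript
// Sketch only: emit a VRML97 fragment that rotates one H-Anim joint.
// Assumes a TimeSensor named Clock is defined elsewhere in the scene;
// joint and field names are illustrative, not SignSynth's real output.
function jointAnimation(joint, axis, angles) {
  const keys = angles.map((_, i) => (i / (angles.length - 1)).toFixed(2)).join(', ');
  const keyValues = angles.map(a => `${axis} ${a}`).join(',  ');
  return `
DEF ${joint}Rot OrientationInterpolator {
  key [ ${keys} ]
  keyValue [ ${keyValues} ]
}
ROUTE Clock.fraction_changed TO ${joint}Rot.set_fraction
ROUTE ${joint}Rot.value_changed TO hanim_${joint}.set_rotation`;
}

// Bend and straighten the right elbow around its x axis over one cycle.
console.log(jointAnimation('r_elbow', '1 0 0', [0, 1.2, 0]));
```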
Getting the word out
Once my system was working to the point that it could create intelligible signs, I showed it to the faculty and students who were studying sign languages. The strongest interest came from Jill Morford, my psycholinguistics professor, who recognized that I had created a sign-language analog of the early speech synthesizers that were used at Haskins Labs to demonstrate the categorical nature of speech perception and its acquisition in language learners.
Jill realized that my system, which I had dubbed SignSynth, could be used to test whether perception of sign languages is similarly categorical, and how it is acquired. The following year she obtained a grant from the National Institutes of Health to study the question, and included some money to support me part-time and get an office that I shared with a couple of other students.
Around that time I had a meeting with my Committee on Studies. They liked my work, but told me that simply creating a sign synthesis system wasn’t theoretical enough for a Linguistics dissertation. If I wanted to build it into my dissertation, I would have to use it to demonstrate an answer to some theoretical question.
Doubts
That year I also took a course in literacy education called Teaching Reading to the ESL Student. The professor, Leila Flores-Dueñas, was happy to support me in applying these lessons to Deaf students, but throughout the course she stressed the importance of grounding our teaching in the priorities of our students, and orienting our research to the goals of the populations we were studying. She further pointed out that we should be working to lift up people who are oppressed or disadvantaged, so when we work with people in those situations we have a particular obligation to fit our work to their priorities.
I realized that the same principle applied to developing applications: an app that touches disadvantaged people should help them to fulfill their goals. This brought back to mind the fact that I had never seen a Deaf person ask for a sign synthesis or natural language processing application.
I actually hadn’t talked much with Deaf people about language technology. This was in part because there is a general suspicion of hearing people in Deaf communities, particularly of hearing people who come bearing language technology. The suspicion is well earned; check out what Deaf people have to say about Alexander Graham Bell.
I had difficulty overcoming this suspicion because I had only been studying American Sign Language for a few years. I could communicate with Deaf people, but only if they were patient, and many of them had no reason to be patient with some random hearing person. I realized that I didn’t necessarily have any technology that could help them accomplish their goals, and even if I did, they weren’t necessarily going to try it.
Challenges
That year I ran into two major challenges in sign synthesis, both relating to the placement of hands in space. The higher-level challenge relates to signs that are called depicting or iconic verbs, or classifier predicates. In these signs, the relative location, orientation and movement of the signer’s hands are used to refer to a similar spatial relationship or movement. In a classic example, an ASL signer can depict the movement of a car by making the “3” vehicle classifier shape with one hand and moving that hand in a scaled-down version of the car’s movements.
Iconic verbs are very difficult to specify in a user interface, because the level of detail of the location and movement is constrained only by the signer’s control of their hands and the audience’s visual perception. Specifying, representing and transmitting such signs is, at a minimum, a vastly different task than it is for lexical signs, where there is a relatively small set of locations and movements.
The lower-level challenge involved figuring out how to bend the shoulder, elbow and wrist on a humanoid figure so that the figure’s hand winds up in a particular place, facing in a particular direction. This is a well-known problem, called inverse kinematics, that we humans solve intuitively every time we make a gesture or pick up an object.
There are software libraries for inverse kinematics, and I tried to connect my animation code to them, but they were written in C++, not Perl, and after several months I still hadn’t gotten it to work.
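To give a sense of what those libraries compute, here is a sketch of the simplest planar case in JavaScript: a two-segment arm reaching a target point, solved analytically with the law of cosines. A real humanoid adds 3D joints, joint limits and wrist orientation, which is what I was hoping the C++ libraries would handle.

```javascript
// Two-joint planar inverse kinematics: given upper-arm and forearm lengths
// and a target (x, y) relative to the shoulder, return shoulder and elbow
// angles in radians. A sketch of the simplest case only.
function twoJointIK(upper, fore, x, y) {
  const clamp = (v) => Math.min(1, Math.max(-1, v));
  const dist = Math.min(Math.hypot(x, y), upper + fore);   // clamp unreachable targets
  // Law of cosines at the elbow: how far the arm has to bend from straight.
  const cosElbow = (upper * upper + fore * fore - dist * dist) / (2 * upper * fore);
  const elbow = Math.PI - Math.acos(clamp(cosElbow));
  // Shoulder: direction to the target, minus the offset caused by the bent elbow.
  const cosOffset = (upper * upper + dist * dist - fore * fore) / (2 * upper * dist);
  const shoulder = Math.atan2(y, x) - Math.acos(clamp(cosOffset));
  return { shoulder, elbow };
}

// Example: a 30 cm upper arm and 25 cm forearm reaching a point 40 cm out, 10 cm up.
console.log(twoJointIK(0.30, 0.25, 0.40, 0.10));
```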
I described these challenges of inverse kinematics and representing classifier predicates to my committee, but they did not consider them to be theoretical enough for a dissertation. I was able to create the videos for Jill Morford’s categorical perception experiments, and she suggested that I could study categorical perception for my dissertation. In retrospect I should maybe have taken her up on it, but instead I decided to study a cosmopolitan, largely privileged community where the speakers were all long dead: the history of Parisian French through theater.
Legacy
My papers on SignSynth are routinely cited as pioneering works in text-to-sign, mostly by developers who didn’t have a Professor Flores-Dueñas to teach them the importance of serving the language community, and who don’t follow my lead in separating synthesis from translation.
The paper that Jill Morford wrote with me and my fellow students about categorical perception has also had lasting influence; we failed to find a categorical perception effect as strong as those found in speech, and the implications of that are still being discussed.
I never wanted to commercialize SignSynth, knowing that average incomes for Deaf people tend to be significantly lower than the general population, so I made it publicly available as a web application. The original application relied on third-party browser plugins to display the VRML, but over time, web browser makers dropped support for those plugins.
In 2022 I rewrote SignSynth from scratch in JavaScript, using a relatively new 3D graphics library, Three.js, and made the code available on GitHub. I updated the interface to follow “character pickers” like Richard Ishida’s IPA Character Picker, replacing the old interface that was heavy on drop-down menus. I also created a standalone character picker for the Stokoe notation that the original version used for its text input.
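The Three.js version works on the same principle as the VRML one, just without a plugin: the figure is a scene graph of joints, and a sign is a sequence of joint rotations. Here is a minimal, self-contained sketch of that idea; the two-box “arm” and the angles are placeholders, not SignSynth’s actual figure or code.

```javascript
import * as THREE from 'three';

// Minimal sketch: pose a two-segment "arm" built from scene-graph nodes.
// Not SignSynth's real humanoid; just the joint-rotation principle.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 10);
camera.position.set(0, 0, 2);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const material = new THREE.MeshNormalMaterial();

// "Shoulder" joint with an upper-arm segment extending along +x.
const shoulder = new THREE.Object3D();
const upperArm = new THREE.Mesh(new THREE.BoxGeometry(0.5, 0.1, 0.1), material);
upperArm.position.x = 0.25;
shoulder.add(upperArm);

// "Elbow" joint parented to the shoulder, with a forearm segment.
const elbow = new THREE.Object3D();
elbow.position.x = 0.5;                 // elbow sits at the end of the upper arm
const foreArm = new THREE.Mesh(new THREE.BoxGeometry(0.4, 0.08, 0.08), material);
foreArm.position.x = 0.2;
elbow.add(foreArm);
shoulder.add(elbow);
scene.add(shoulder);

// Pose it: raise the shoulder a little and bend the elbow.
shoulder.rotation.z = Math.PI / 8;
elbow.rotation.z = Math.PI / 3;

renderer.render(scene, camera);
```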
What SignSynth meant to me
Thinking back to 1997, what got me so excited, what inspired me to move across the country and start a doctoral program with no promise of funding, to live off part-time jobs and student loans, and to spend so many unpaid hours every day, was the idea that I was creating something new, something that could be useful to people.
When I was young, I admired inventors, both real-life inventors like Thomas Edison and fictional ones like Professor Bullfinch from the Danny Dunn books. My father, my stepfather and my mother’s boyfriends in between were all tinkerers, and they made useful things. I didn’t have money to buy a lot of tools and equipment, or space to keep them in, but I did have access to computers, and the skills to program them.
Looking back, there was some ego in this. Why was it important for me to make these things, and not someone else? Why not be satisfied fixing the Novell servers for Chase Manhattan, or even sending faxes for John Hancock?
To be fair to myself, I have never been a competitive inventor. While I was working on SignSynth I met several people who were working on similar systems, and I always tried to be positive, supportive and cooperative. I let the quality of my work speak for itself, and trusted people to judge its value for themselves.
I’ve never felt like I was the best inventor, programmer or researcher in the world, but even in 1997 I had a fair amount of experience and education in research and technology. I felt like those skills were being wasted when I was taking telephone messages at John Hancock, and even when I was reinstalling network drivers. Of course, there are a lot of highly skilled people out there, maybe more than there’s a need for.
After putting SignSynth on the shelf and focusing on French negation for my dissertation, I didn’t feel the same level of excitement; the theoretical advancements that my committee demanded felt small by comparison, although I’m still proud of them.
I have felt a bit more excitement for my recent work for the New School. It’s a simpler project, just retrieving class listings or final grades from the student information system and presenting them in a table, but it helps save time and effort for students and faculty. Less exciting, but still satisfying.