How Google’s Pixel Buds will change the world!

Scene: a quietly bustling bistro in Paris’s 14th Arrondissement.

SERVER: Oui, vous désirez?
PIXELBUDS: Yes, you desire?
TOURIST: Um, yeah, I’ll have the steak frites.
PIXELBUDS: UM, OUAIS, JE VAIS AVOIR LES FRITES DE STEAK
SERVER: Que les frites?
PIXELBUDS: Than fries?
TOURIST: No, at the same time.
PIXELBUDS: NON, EN MEME TEMPS
SERVER: Alors, vous voulez le steak aussi?
PIXELBUDS: DESOLE, JE N’AI PAS COMPRIS.
SERVER: VOUS VOULEZ LE STEAK AUSSI?
PIXELBUDS: You want the steak too?
TOURIST: Yeah, I just ordered the steak.
PIXELBUDS: OUAIS, JE VIENS DE COMMANDER LE STEAK
SERVER: Okay, du steak, et des frites, en même temps.
PIXELBUDS: Okay, steak, and fries at the same time.
TOURIST: You got it.
PIXELBUDS: TU L’AS EU.

(All translations by Google Translate. Photo: Alain Bachelier / Flickr.)

Ten reasons why sign-to-speech is not going to be practical any time soon.

It’s that time again! A bunch of really eager computer scientists have a prototype that will translate sign language to speech! They’ve got a really cool video that you just gotta see! They win an award (from a panel that includes no signers or linguists)! Technology news sites go wild (without interviewing any linguists, and sometimes without even interviewing any Deaf people)!

…and we computational sign linguists, who have been through this over and over, every year or two, just *facepalm*.

The latest strain of viral computational sign linguistics hype comes from the University of Washington, where two hearing undergrads have put together a system that … supposedly recognizes isolated hand gestures in citation form. But you can see the potential! *facepalm*.

Twelve years ago, after already having a few of these *facepalm* moments, I wrote up a summary of the challenges facing any computational sign linguistics project and published it as part of a paper on my sign language synthesis prototype. But since most people don’t have a subscription to the journal it appeared in, I’ve put together a quick summary of Ten Reasons why sign-to-speech is not going to be practical any time soon.

  1. Sign languages are languages. They’re different from spoken languages. Yes, that means that wherever a sign language and a spoken language are used in the same place, they’re going to be different: more different than English and Chinese.
  2. We can’t do this for spoken languages. You know that app where you can speak English into it and out comes fluent Pashto? No? That’s because it doesn’t exist. The Army has wanted an app like that for decades, and they’ve been funding it up the wazoo, and it’s still not here. Sign languages are at least ten times harder.
  3. It’s complicated. Computers aren’t great with natural language at all, but they’re better with written language than spoken language. For that reason, people have broken the speech-to-speech translation task down into three steps: speech-to-text, machine translation, and text-to-speech.
  4. Speech-to-text is hard. When you call a company and get a message saying “press or say the number after the tone,” do you press or say? I bet you don’t even call if you can get to their website, because speech-to-text suuucks:

    -Say “yes” or “no” after the tone.
    -No.
    -I think you said, “Go!” Is that correct?
    -No.
    -My mistake. Please try again.
    -No.
    -I think you said, “I love cheese.” Is that correct?
    -Operator!

  5. There is no text. A lot of people think that the written form of a sign language is just the written form of the local spoken language, but if you think about point 1 you’ll realize that can’t possibly be true. Well, why don’t people write sign languages? I believe it can be done, and lots of people have tried, but for some reason it never seems to catch on. It might just be the classifier predicates.
  6. Sign recognition is hard. There’s a lot that linguists don’t yet know about sign languages. Computers can’t even get reliable signs from people wearing gloves, never mind from video feeds. This may be better than gloves, but it doesn’t do anything with facial or body gestures.
  7. Machine translation is hard, even going from one written language (i.e. the written version of a spoken language) to another. Different words, different meanings, different word order. You can’t just look up words in a dictionary and string them together. Google Translate manages to be moderately decent only because it throws massive statistical computing power at the input – and that only works for languages with a huge corpus of text available.
  8. Sign-to-spoken translation is really hard. Remember how in #5 I mentioned that there is no text for sign languages? No text, no huge corpus, no machine translation. I tried making a rule-based translation system, and as soon as I realized how humongous the task of translating classifier predicates was, I backed off. Matt Huenerfauth has been trying, but he knows how big a job it is.
  9. Sign synthesis is hard. Okay, that’s probably the easiest problem of them all. I built a prototype sign synthesis system in 1997, I’ve improved it, and other people have built even better ones since.
  10. What is this for, anyway? Oh yeah, why are we doing this? So that Deaf people can carry a device with a camera around, and every time they want to talk to a hearing person they have to mount it on something, stand in a well-lit area and sign into it? Or so that someday they can wear special clothing that recognizes their hand gestures, but does nothing with their facial gestures? I’m sure that’s so much better than decent funding for interpreters, or teaching more people to sign, or hiring more fluent signers in key positions where Deaf people need the best customer service.

So I’m asking all you computer scientists out there who don’t know anything about sign languages, especially anyone who might be in a position to fund something like this or give out one of these gee-whiz awards: Just stop. Take a minute. Step back from the tech-bling. Unplug your messiah complex. Realize that you might not be the best person to decide whether or not this is a good idea. Ask a linguist. And please, ask a Deaf person!

Note: I originally wrote this post in November 2013, in response to an article about a prototype using Microsoft Kinect. I never posted it. Now I’ve seen at least three more such prototypes, and I feel like I have to post this. I didn’t have to change much.

Appreciating interpreters

My friend Dan Parvaz, a registered American Sign Language interpreter, posted yesterday on Facebook that it was Interpreter Appreciation Day. Further investigation reveals that this day is intended specifically for sign language interpreters, but spoken-language interpreters work hard and deserve appreciation too. Dan mentioned that September 30 is the feast of Saint Jerome, the patron saint of translators, but as far as I know nobody mentions interpreters then. Today is as good a day as any to appreciate the hard work that interpreters do in all languages.

I wanted to share a couple of radio stories that have made me appreciate professional interpreters even more – by their absence. In one case things seem to have turned out well despite the lack of interpreters, and in another they went very badly.

The first is a segment, “I Am Curious Yellow,” in the “Tribes” episode of This American Life, which first aired on March 29. Debbie Lum, a Chinese-American filmmaker, produced a documentary, Seeking Asian Female, which premiered at South by Southwest last year. It focuses on Steven, a white American man with a fetish for Asian women, and Sandy, his Chinese mail-order bride. Lum recounts how she intended to play the part of the neutral documentarian, but found herself drawn into the story, serving as an unpaid, amateur interpreter for Sandy and Steven, and even as an informal relationship counselor, helping them both to understand each other and what they wanted from the relationship.

The second segment, “Yellow Rain,” was part of the Radiolab episode “The Fact of the Matter,” which first aired on September 24 of last year. During the bombing of Hmong villages in Laos by Vietnamese and Pathet Lao forces after 1975, a suspicious “yellow rain” fell; some people believe it was poison, and others that it was “bee poop.” Host Robert Krulwich interviewed one of the survivors, Eng Yang, “translated by his niece,” writer Kao Kalia Yang. Krulwich later admitted that he “pressed too hard” in his quest to get evidence to condemn Ronald Reagan. At a certain point in the interview, Kao Kalia Yang, overcome with emotion, cut off both Krulwich and her uncle, yelled at Krulwich and then terminated the interview.

Kao Kalia Yang has told her side of the story in two pieces, and Eng Yang has released a brief statement. Krulwich and his co-host Jad Abumrad have each published statements as well.

The only discussion of Kao Kalia Yang’s role as interpreter that I have seen was a comment from “Diane from MN” on the “Yellow Rain” show page: “I speak Hmong and can hear Eng telling the interviewers repeatedly in the final cut he knows what bee pollen looks like. … Even Kao Kalia’s husband, who witnessed the interview, has said Eng was talking about his knowledge of bees and told the interviewers he is an experienced beekeeper. Did you hear any of this in the final cut? Not unless you understand Hmong!”

What Diane from MN is telling us is that Kao Kalia Yang, in her frustration, did her uncle and their cause a disservice. By stepping out of her role as interpreter, she left no one there to honor his voice. It is not even clear to me that he wanted to end the interview. But even before that, she did herself a disservice by agreeing to interpret in a situation where she would not be able to remain impartial. In her reaction to the show, she expresses frustration at being “reduced” to the role of niece in the show credits, when she is a published author. But taking on the role of interpreter requires the humility to set aside your own agenda and qualifications, something she was not prepared to do.

Over the years I have watched Dan and my other interpreter friends work hard to convey meaning between Deaf and hearing people as clearly and neutrally as possible. Many of them are accomplished scientists of language, and that may help them to interpret more clearly, but they set aside their research and their egos when they are interpreting. Some of them will interpret for a friend or spouse in a casual setting, but in a formal situation where the stakes are high, Deaf people need to know that they are getting an unbiased translation of what’s being spoken, and that their own words are being translated fairly. It is the same for Hmong speakers in the United States, and for anyone who needs to communicate with someone without a common language.

Ted Xiong, a Hmong interpreter at the Fairview Clinic of the University of Minnesota, told Minnesota Public Radio in 2008, “It’s being in the middle, between the patient and the provider. You cannot advocate for them, you can’t give them advice. It’s like… you are just a voice.” Xiong finds that frustrating, but the alternative is worse. Elizabeth Heibl, a doctor, added, “What you want is a two-way conversation between the clinician and the patient, with the interpreter there to help with communication.”

Unfortunately, children of immigrants like Kao Kalia Yang – the “1.5 generation” – and hearing children of Deaf adults are often thrust into the role of interpreter without any training or qualification. They find themselves interpreting for their parents with bureaucrats, teachers, shopkeepers and doctors, because nobody has hired Ted Xiong or Dan Parvaz or one of their colleagues: interpreting services are expensive, and many in government and business don’t know or care that people need them. These children are forced to choose between conveying accurate information and being full participants in the conversation. This is not fair to anyone.

Some languages have very small communities in the United States, and for those it can be very hard to find a trained, neutral interpreter. The Hmong languages are not like that: in a quick Google search I found three services offering professional Hmong translation. Eng Yang should have insisted on one; if she wanted to, Kao Kalia Yang could then have participated as an advocate, free from the responsibilities of interpreting. But really, they shouldn’t have had to ask. WNYC can afford a Hmong interpreter for a two-hour interview session, and Pat Walters, the producer, should have provided one as a matter of course as soon as it was clear that Eng Yang’s English wasn’t fluent enough.

Walters and Krulwich set out to find the Truth. The Truth is a slippery thing. But you’re never going to get anywhere close to it without a reliable, neutral interpreter. And without one, you’re probably going to mess things up.