Prejudice and intelligibility

Last month I wrote about the fact that intelligibility – the ability of native speakers of one language or dialect to understand a closely related one – is not constant or automatic. A major factor in intelligibility is familiarity: when I was a kid, for example, I had a hard time understanding the Beatles until I got used to them. Having lived in North Carolina, I find it much easier to understand people from Ocracoke Island than my students do.

Prejudice can play a big role in intelligibility, as Donald Rubin showed in 1992. (I first heard about this study from Rosina Lippi-Green’s book English With an Accent.) At the time, American universities had recently increased the number of instructors they employed from East Asia, and some students complained that they had difficulty understanding their instructors’ accents.

In an ingenious experiment, Rubin demonstrated that much of this difficulty was due to prejudice. He recorded four-minute samples of “a native speaker of English raised in Central Ohio” reading a script for introductory-level lectures on two different subjects and played those samples to three groups of students.

For one group, a still photo of a “Caucasian” woman representing the instructor was projected on a screen while the audio sample was played. For the second group, a photo of “an Asian (Chinese)” woman was projected while the same audio of the woman from central Ohio (presumably not of Asian ancestry) was played. The third group heard only the audio and was not shown a photo.

In a survey they took after hearing the clip, most of the students who saw the picture of an Asian woman reported that the speaker had “Oriental/Asian ethnicity.” That’s not surprising, because it’s essentially what they were told by being shown the photograph. But many of these students went further and reported that the person in the recording “speaks with a foreign accent.” In contrast, the vast majority of the students who were shown the “Caucasian” picture said that they heard “an American accent.”

The kicker is that immediately after they heard the recording (and before answering the survey), Rubin tested the students on their comprehension of the content of the excerpt by giving them a transcript with every seventh word replaced by a blank. The students who saw a picture of an Asian woman not only thought they heard a “foreign accent”; they also did worse on the comprehension task! Rubin concluded that “listening comprehension seemed to be undermined simply by identifying (visually) the instructor as Asian.”
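
To make the task concrete, here is a minimal sketch in Python of how such a cloze transcript can be generated. The seven-word gap follows Rubin’s description, but the blank marker and the sample sentence are just my illustrations, not his materials.

```python
def make_cloze(transcript: str, gap: int = 7, blank: str = "_______") -> str:
    """Replace every `gap`-th word of a transcript with a blank, cloze-test style."""
    words = transcript.split()
    for i in range(gap - 1, len(words), gap):  # 1-indexed positions 7, 14, 21, ...
        words[i] = blank
    return " ".join(words)

# Example with a made-up snippet of lecture text:
print(make_cloze("Today we will cover the basic principles of supply and "
                 "demand and how prices respond to changes in each market."))
```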

Rubin’s subjects may not have felt any particular hostility towards people from East Asia, but they had a preconceived notion that the instructor would have an accent, and they assumed that they would have difficulty understanding her, so they didn’t bother trying.

This study (and a previous one that Rubin conducted with Kim Smith) connects back to what I was saying about familiarity. I will discuss that, along with power imbalances, in a future post, but this finding is striking enough to merit a post of its own.

Ten reasons why sign-to-speech is not going to be practical any time soon

It’s that time again! A bunch of really eager computer scientists have a prototype that will translate sign language to speech! They’ve got a really cool video that you just gotta see! They win an award! (from a panel that includes no signers or linguists). Technology news sites go wild! (without interviewing any linguists, and sometimes without even interviewing any deaf people).

…and we computational sign linguists, who have been through this over and over, every year or two, just *facepalm*.

The latest strain of viral computational sign linguistics hype comes from the University of Washington, where two hearing undergrads have put together a system that … supposedly recognizes isolated hand gestures in citation form. But you can see the potential! *facepalm*.

Twelve years ago, after already having a few of these *facepalm* moments, I wrote up a summary of the challenges facing any computational sign linguistics project and published it as part of a paper on my sign language synthesis prototype. But since most people don’t have a subscription to the journal it appeared in, I’ve put together a quick summary of Ten Reasons why sign-to-speech is not going to be practical any time soon.

  1. Sign languages are languages. They’re different from spoken languages. Yes, that means that if you think of a place where there’s a sign language and a spoken language, they’re going to be different. More different than English and Chinese.
  2. We can’t do this for spoken languages. You know that app where you can speak English into it and out comes fluent Pashto? No? That’s because it doesn’t exist. The Army has wanted an app like that for decades, and they’ve been funding it up the wazoo, and it’s still not here. Sign languages are at least ten times harder.
  3. It’s complicated. Computers aren’t great with natural language at all, but they’re better with written language than spoken language. For that reason, people have broken the speech-to-speech translation task down into three steps: speech-to-text, machine translation, and text-to-speech (see the sketch after this list).
  4. Speech to text is hard. When you call a company and get a message saying “press or say the number after the tone,” do you press or say? I bet you don’t even call if you can get to their website, because speech to text suuucks:

    -Say “yes” or “no” after the tone.
    -No.
    -I think you said, “Go!” Is that correct?
    -No.
    -My mistake. Please try again.
    -No.
    -I think you said, “I love cheese.” Is that correct?
    -Operator!

  5. There is no text. A lot of people think that text for a sign language is the same as text for the corresponding spoken language, but if you think about point 1 you’ll realize that can’t possibly be true. Well, why don’t people write sign languages? I believe it can be done, and lots of people have tried, but for some reason it never seems to catch on. It might just be the classifier predicates.
  6. Sign recognition is hard. There’s a lot that linguists don’t know about sign languages as it is. Computers can’t even get reliable signs from people wearing gloves, never mind video feeds. A camera-based system may be better than gloves, but it doesn’t do anything with facial or body gestures.
  7. Machine translation is hard even between two written languages (i.e. written versions of spoken languages). Different words, different meanings, different word order. You can’t just look up words in a dictionary and string them together. Google Translate is only moderately decent because it throws massive statistical computing power at the input – and that only works for languages with a huge corpus of text available.
  8. Sign to spoken translation is really hard. Remember how in #5 I mentioned that there is no text for sign languages? No text, no huge corpus, no machine translation. I tried making a rule-based translation system, and as soon as I realized how humongous the task of translating classifier predicates was, I backed off. Matt Huenerfauth has been trying (PDF), but he knows how big a job it is.
  9. Sign synthesis is hard. Okay, that’s probably the easiest problem of them all. I built a prototype sign synthesis system in 1997, I’ve improved it, and other people have built even better ones since.
  10. What is this for, anyway? Oh yeah, why are we doing this? So that Deaf people can carry a device with a camera around, and every time they want to talk to a hearing person they have to mount it on something, stand in a well-lighted area and sign into it? Or maybe someday have special clothing that can recognize their hand gestures, but nothing for their facial gestures? I’m sure that’s so much better than decent funding for interpreters, or teaching more people to sign, or hiring more fluent signers in key positions where Deaf people need the best customer service.
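
Coming back to point 3: here is a minimal sketch, in Python, of what that three-stage pipeline looks like. The function names are hypothetical placeholders rather than real libraries; the point is only to show that speech-to-speech is a chain of three separate hard problems.

```python
# A sketch of the conventional speech-to-speech pipeline from point 3.
# asr(), translate() and tts() are hypothetical stand-ins, not real library calls.

def asr(source_audio: bytes) -> str:
    """Speech-to-text: transcribe the source-language audio (hard; see point 4)."""
    raise NotImplementedError("automatic speech recognition goes here")

def translate(source_text: str) -> str:
    """Machine translation: map source-language text to target-language text (hard; see point 7)."""
    raise NotImplementedError("machine translation goes here")

def tts(target_text: str) -> bytes:
    """Text-to-speech: synthesize audio in the target language."""
    raise NotImplementedError("speech synthesis goes here")

def speech_to_speech(source_audio: bytes) -> bytes:
    # Three separate systems chained together; errors in each stage compound in the next.
    return tts(translate(asr(source_audio)))
```

A sign-to-speech system would need an analogous chain, except that the first stage would be sign recognition from video (point 6) and, as points 5 and 8 explain, there is no written middle stage to translate from and no corpus to train on.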

So I’m asking all you computer scientists out there who don’t know anything about sign languages, especially anyone who might be in a position to fund something like this or give out one of these gee-whiz awards: Just stop. Take a minute. Step back from the tech-bling. Unplug your messiah complex. Realize that you might not be the best person to decide whether or not this is a good idea. Ask a linguist. And please, ask a Deaf person!

Note: I originally wrote this post in November 2013, in response to an article about a prototype using Microsoft Kinect. I never posted it. Now I’ve seen at least three more, and I feel like I have to post this. I didn’t have to change much.

Including linguistics at literary conferences

I just got back from attending my second meeting of the Northeast Modern Language Association. My experience at both conferences has been very positive: friendly people, interesting talks, good connections. But I would like to see a little more linguistics at NeMLA, and better opportunities for linguists to attend. I’ve talked with some of the officers of the organization about this, and they have told me they welcome more papers from linguists.

One major challenge is that the session calls tend to be very specific and/or literary. Here are some examples from this year’s conference:

  • The Language of American Warfare after World War II
  • Representing Motherhood in Contemporary Italy
  • ‘Deviance’ in 19th-century French Women’s Writing

There is nothing wrong with any of these topics, but when they are all that specific, linguistic work can easily fall through the cracks. For several years I scanned the calls and simply failed to find anything where my work would fit. The two papers that I have presented are both pedagogical (in 2014 on using music to teach French, and this year on using accent tag videos to teach language variation and language attitudes). I believe that papers about the structure of language can find an audience at NeMLA, when there are sessions where they can fit.

In contrast, the continental MLA tends to have several calls with broader scope: an open call for 18th-Century French, for example, as well as ones specifically related to linguistics. When I presented at the MLA in 2012 it was at a session titled “Change and Perception of Change in the Romance Languages,” organized by Chris Palmer (a linguist and all-around nice guy).

With all that in mind, if you are considering attending next year’s NeMLA in Baltimore, I would like to ask the following:

  • Would you consider submitting a session proposal by the April 29th deadline?
  • Would you like to co-chair a session with me? (please respond by private email)
  • What topics would you find most inviting for linguistics papers at a (mostly) literature conference?

I recognize that I have readers outside of the region. For those of you who do not live in northeastern North America, have you had similar experiences with literary conferences? Do you have suggestions for session topics – or session topics to avoid?