The supposed successes of AI

The author and cat, with pirate hats and eye patches superimposed by a Snapchat filter
I’m a regular watcher of Last Week Tonight with John Oliver, so in February I was looking forward to his take on “AI” and the large language models and image generators that many people have been getting excited about lately. I was not disappointed: Oliver heaped a lot of much-deserved criticism on these technologies, particularly for the ways they replicate prejudice and are overhyped by their developers.

What struck me was the way that Oliver contrasted large language models with more established applications of machine learning, portraying those as uncontroversial and even unproblematic. He’s not unusual in this: I know a lot of people who accept these technologies as a fact of life, and many who use them and like them.

But I was struck by how many of these technologies I myself find problematic and avoid, or even refuse to use. And I’m not some know-nothing: I’ve worked on projects in information retrieval and information extraction. I developed one of the first sign language synthesis systems, and one of the first prototype English-to-American Sign Language machine translation systems.

When I buy a new smartphone or desktop computer, one of the first things I do is to turn off all the spellcheck, autocorrect and autocomplete functions. I don’t enable the face or handprint locks. When I open an entertainment app like YouTube, Spotify or Netflix I immediately navigate away from the recommended content, going to my own playlist or the channels I follow. I do the same for shopping sites like Amazon or Zappos, and for social media like Twitter. I avoid sites like TikTok where the barrage of recommended content begins before you can stop it.

It’s not that I don’t appreciate automated pattern recognition. Quite the contrary. I’ve been using it for years – one of my first jobs in college was cleaning up a copy of the Massachusetts Criminal Code that had been scanned in and run through optical character recognition. For my dissertation I compiled a corpus from scanned documents, and over the past ten years I’ve developed another corpus using similar methods.

I feel similarly about synonym expansion – modifying a search engine to return results including “bicycle” when someone searches for “bike,” for example. I worked for a year for a company whose main product was synonym expansion, and I was really glad a few years later when Google rolled it out to the public.
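
To show how simple the core idea is, here is a minimal sketch of query-side synonym expansion in Python. The synonym table, function name and output format are mine, purely for illustration; this is not that company’s product or Google’s implementation.

```python
# Minimal sketch of query-side synonym expansion: each search term becomes an
# OR-group containing the term plus any listed synonyms. The table is illustrative.
SYNONYMS = {
    "bike": {"bicycle", "cycle"},
    "bicycle": {"bike", "cycle"},
}

def expand_query(terms):
    """Return one OR-group (a sorted list of alternatives) per search term."""
    return [sorted({term} | SYNONYMS.get(term.lower(), set())) for term in terms]

print(expand_query(["bike", "shop"]))
# [['bicycle', 'bike', 'cycle'], ['shop']]
```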

There are a couple of other things that I find useful, like suggested search terms, image matching for attribution and Shazam for saving songs I hear in cafés. Snapchat filters can be fun. Machine translation is often cheaper than a dictionary lookup.

Using these technologies as fun toys or creative inspiration is fine. Using them as unreliable tools whose output needs to be thoroughly checked and corrected is perfectly appropriate. The problem begins when people don’t check the output of their tools and release it as completed work. This is where we get the problems documented by sites like Damn You Auto Correct: often humorous, but occasionally harmful.

My appreciation for automated pattern recognition is one of the reasons I’m so disturbed when I see people taking it for granted. I think it’s the years of immersion in all the things that automated recognizers got wrong, garbled or even left out completely that makes me concerned when people ignore the possibility of any such errors. I feel like an experienced carpenter watching someone nailing together a skyscraper out of random pieces of wood, with no building inspectors in sight.

When managers make the use of pattern recognition or generation tools mandatory, it goes from potentially harmful to actively destructive. Search boxes that won’t let users turn off synonym expansion, returning wildly inaccurate results rather than admitting “nothing found,” make a mockery of the feature. I am writing this post in Google Docs, which is fine on a desktop computer, but the Android app does not let me turn off spell check, and correcting a word without accepting one of the suggested corrections requires an extra tap every time.

Now let’s take the example of speech recognition. I have never found an application of speech recognition technology that I personally found satisfactory. I suppose if something happened to my hands that made it impossible for me to type I would appreciate it, but even then it would require constant attention to correct its output.

A few years ago I was trying to report a defective street condition to the New York City 311 hotline. The system would not let me talk to a live person until I’d exhausted its speech recognition system, but I was in a noisy subway car. Not only could the recognizer not understand anything I said, but the system was forcing me to disturb my fellow commuters by shouting selections into my phone.

I’ve attended conferences on Zoom with “live captioning” enabled, and at every talk someone commented on major inaccuracies in the captions. For people who can hear the speech it can be kind of amusing, but if I had had to depend on those captions to understand the talks, I would have missed much of the content.

I know some deaf people who regularly insist on automated captions as an equity issue. They are aware that the captions are inaccurate, and see them as better than nothing. I support that position, but in cases where the availability of accurate information is itself an equity issue, like political debates for example, I do not feel that fully automated captions are adequate. Human-written captions or human sign language interpreters are the only acceptable forms.

Humans are, of course, far from perfect, but for anything beyond play, where accuracy is required, we cannot depend on fully automated pattern recognition. There should always be a human checking the final output, and there should always be the option to do without it entirely. It should never be mandatory. The pattern recognition apps that are already all around us show this clearly.

Screenshot of LanguageLab displaying the exercise "J'étais certain que j'allais écrire à quinze ans"

Imagining an alternate language service

It’s well known that some languages have multiple national standards, to the point where you can take courses in either Brazilian or European Portuguese, for example. Most language instruction services seem to choose one variety per language: when I studied Portuguese at the University of Paris X-Nanterre it was the European variety, but the online service Duolingo only offers the Brazilian one.

I looked into some of Duolingo’s offerings for this post, because they’re the most talked about language instruction service these days. I was surprised to discover that they use no recordings of human speakers; all their speech samples are synthesized using an Amazon speech synthesis service named Polly. Interestingly, even though Duolingo only offers one variety of each language, Amazon Polly offers multiple varieties of English, Spanish, Portuguese and French.
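
For what it’s worth, requesting a different variety from Polly is a one-line change. Here is a sketch using the standard boto3 client; Celine and Chantal are, as I understand it, Polly’s European French and Canadian French voices, and the credentials and file names are assumed for illustration.

```python
# Sketch: synthesizing the same sentence in two varieties of French with Amazon Polly.
# Assumes AWS credentials are already configured for boto3.
import boto3

polly = boto3.client("polly")

for voice in ("Celine", "Chantal"):  # European French, Canadian French
    response = polly.synthesize_speech(
        Text="J'étais certain que j'allais écrire à quinze ans.",
        VoiceId=voice,
        OutputFormat="mp3",
    )
    with open("sample_{}.mp3".format(voice), "wb") as out:
        out.write(response["AudioStream"].read())
```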

As an aside, when I first tried Duolingo years ago I had the thought, “Wait, is this synthesized?” but it just seemed too outrageous to think that someone would make a business out of teaching humans to talk like statistical models of corpus speech. It turns out it wasn’t too outrageous, and I’m still thinking through the implications of that.

Synthesized or not, it makes sense for a company with finite resources to focus on one variety. But if that one company controls a commanding market share, or if there’s a significant amount of collusion or groupthink among language instruction services, they can wind up shutting out whole swathes of the world, even while claiming to be inclusive.

This is one of the reasons I created an open LanguageLab platform: to make it easier for people to build their own exercises and lessons, focusing on any variety they choose. You can set up your own LanguageLab server with exercises exclusively based on recordings of the English spoken on Smith Island, Maryland (population 149), if you like.

So what about excluded varieties with a few more speakers? I made a table of all the Duolingo language offerings, ranked by their number of learners, along with the Amazon Polly dialect used on Duolingo for each. Where a variety is only vaguely specified, I made a guess.

For each of these languages I picked another variety, one with a large number of speakers. I tried to find the variety with the largest number of speakers, but these counts are always very imprecise. The result is an imagined alternate language service, one that does not automatically privilege the speakers of the most influential variety. Here are the top ten:

| Language | Duolingo dialect | Alternate dialect |
|---|---|---|
| English | Midwestern US | India |
| Spanish | Mexico | Argentina |
| French | Paris | Quebec |
| Japanese | Tokyo | Kagoshima |
| German | Berlin | Bavarian |
| Korean | Seoul | Pyongyang |
| Italian | Florence | Rome |
| Mandarin Chinese | Beijing | Taipei |
| Hindi | Delhi | Chhattisgarhi |
| Russian | Moscow | Almaty |

To show what could be done with a little volunteer work, I created a sample lesson for a language that I know, the third-most popular language on Duolingo, French. After France, the country with the next largest number of French speakers is Canada. Canadian French is distinct in pronunciation, vocabulary and to some degree grammar.

Canadian French is stigmatized outside Canada, to the point where I’m not aware of any program in the US that teaches it, but it is omnipresent in all forms of media in Canada, and there is quite a bit of local pride. These days at least, it would be as odd for a Canadian to speak French like a Parisian as for an American to speak English like a Londoner. There are upper and lower class accents, but they all share certain features, notably the ranges of the nasal vowels.

I chose a bestselling author and television anchor, Michel Jean, who has one grandmother from the indigenous Innu people and three grandparents presumably descended from white French settlers. I took a small excerpt from an interview with Jean about his latest novel, in which he responds spontaneously to the questions of a librarian, Josianne Binette.

The sample lesson in Canadian French based on Michel Jean’s speech is available on the LanguageLab demo site. You are welcome to try it! Just log in with the username demo and the password LanguageLab.

What is “text” for a sign language?

I started writing this post back in August, and I hurried it a little because of a Limping Chicken article guest-written by researchers at the Deafness, Cognition and Language Research Centre (DCAL) at University College London. I’ve known the DCAL folks for years, and they graciously acknowledged some of my previous writings on this issue. I know they don’t think the textual form of British Sign Language is written English, so I was surprised that they used the term “sign-to-text” in the title of their article and in a tweet announcing it. After I brought it up, Dr. Kearsy Cormier acknowledged that there was potential for confusion in that term.

So, what does “sign-to-text” mean, and why do I find it problematic in this context? “Sign-to-text” is an analogy with “speech-to-text,” also known as speech recognition, the technology behind dictation software like Dragon NaturallySpeaking. Speech recognition is also used by agents like Siri to interpret the words we say so that they can act on them.

There are other computer technologies that rely on the concept of text. Speech synthesis is also known as text-to-speech. It’s the technology that enables a computer to read a text aloud. It can also be used by agents like Siri and Alexa to produce sounds we understand as words. Machine translation is another one: it typically proceeds from text in one language to text in another language. When the DCAL researchers wrote “sign-to-text” they meant a sign recognition system hooked up to a BSL-to-English machine translation system.
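
Notice that text sits at the hinge of all of these pipelines. Schematically, a speech-to-speech translator is just three text-centered components chained together; the function names below are placeholders for entire research fields, not working code.

```python
# Schematic only: every stage of a speech-to-speech pipeline reads or writes text.
def speech_to_text(audio, language):
    """Speech recognition: audio in, text out."""
    raise NotImplementedError

def translate(text, source_language, target_language):
    """Machine translation: text in one language to text in another."""
    raise NotImplementedError

def text_to_speech(text, language):
    """Speech synthesis: text in, audio out."""
    raise NotImplementedError

def speech_to_speech(audio, source_language, target_language):
    text = speech_to_text(audio, source_language)
    translated = translate(text, source_language, target_language)
    return text_to_speech(translated, target_language)
```

Replace the first stage with sign recognition, or the last with sign synthesis, and the question of what the intermediate text should be becomes unavoidable.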

Years ago I became interested in the possibility of applying these technologies to sign languages, and created a prototype sign synthesis system, SignSynth, and an experimental English-to-American Sign Language system.

I realized that all these technologies make heavy use of text. If we want automated audiobooks or virtual assistants or machine translation with sign languages, we need some kind of text, or we need to figure out a new way of accomplishing these things without text. So what does text mean for a sign language?

One big thing I discovered when working on SignSynth is that (unlike the DCAL researchers) many people really think that the written form of ASL (or BSL) is written English. On one level that makes a certain sense, because when we train ASL signers for literacy we typically teach them to read and write English. On another level, it’s completely nuts if you know anything about sign languages. The syntax of ASL is completely different from that of English, and in some ways resembles Mandarin Chinese or Swahili more than English.

It’s bad enough that we have speakers of languages like Moroccan Arabic and Fujianese who have to write in a related language (written Arabic and written Chinese, respectively) that differs in non-trivial ways that take years of schooling to master. ASL and English are so totally different that it’s like writing Korean or Japanese with Chinese characters. People actually did this for centuries, until someone smart invented hangul and katakana, which enabled huge jumps in literacy.

There are real costs to this, serious costs. I spent some time volunteering with Deaf and hard-of-hearing fifth graders in an elementary school, and after years of drills they were able to put English words on paper and pronounce them when they saw them. But it became clear to me that despite their obvious intelligence and curiosity, they had no idea that they could use words on paper to send a message, or that some of the words they saw might have a message for them.

There are a number of Deaf people who are able to master English early on. But from extensive reading and discussions with Deaf people, it is clear to me that the experience of these kids is typical for the vast majority of Deaf people.

It is a tremendous injustice to a child, and a tremendous waste of that child’s time and attention, for them to get to the age of twelve, at normal intelligence, without being able to use writing. This is the result of portraying English as the written form of ASL or BSL.

So what is the written form of ASL? Simply put, it doesn’t have one, despite several writing systems that have been invented, and it won’t have one until Deaf people adopt one. There will be no sign-to-text until signers have text, in their language.

I can say more about that, but I’ll leave it for another post.

On this day in Parisian theater

Since I first encountered The Parisian Stage, I’ve been impressed by the completeness of Beaumont Wicks’s life’s work: from 1950 through 1979 he compiled a list of every play performed in the theaters of Paris between 1800 and 1899. I’ve used it as the basis for my Digital Parisian Stage corpus, currently a one percent sample of the first volume (Wicks 1950), available in full text on GitHub.

Last week I had an idea for another project. Science requires both qualitative and quantitative research, and I’ve admired Neil Freeman’s @everylotnyc Twitter bot as a project that conveys the diversity of the underlying data and invites deep, qualitative exploration.

In 2016, with Timm Dapper, Elber Carneiro and Laura Silver, I forked Freeman’s everylotbot code to create @everytreenyc, a random walk through the New York City Parks Department’s 2015 street tree census. Every three hours during normal New York active time, the bot tweets information about a tree from the database, in a template written by Laura that may also include topical, whimsical sayings.
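
Mechanically there is not much to it: each run picks a random tree and fills a template, roughly as in the sketch below. The template wording and field names here are made up for illustration; they are not the census schema or Laura’s actual text.

```python
# Rough sketch of what a bot like @everytreenyc does on each run: pick a random
# tree record and fill a template. Fields and template text are made up here.
import random

TEMPLATE = "A {species} on {street} in {borough}. Say hello if you walk by."

def compose_tweet(trees):
    tree = random.choice(trees)
    return TEMPLATE.format(**tree)

trees = [
    {"species": "pin oak", "street": "Ditmars Boulevard", "borough": "Queens"},
    {"species": "honeylocust", "street": "Vanderbilt Avenue", "borough": "Brooklyn"},
]
print(compose_tweet(trees))
```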

Recently I’ve encountered a lot of anniversaries. Many of them are connected to the centenary of the First World War, but some are more random: I just listened to an episode of la Fabrique de l’histoire about François Mitterrand’s letters to his mistress that was promoted with the fact that he was born in 1916, one hundred years before the episode aired, even though he did not start writing those letters until 1962.

There are lots of “On this day” blogs and Twitter feeds, such as the History Channel and the New York Times, and even specialized feeds like @ThisDayInMETAL. There are #OnThisDay and #otd hashtags, and in French #CeJourLà. The “On this day” feeds have two things in common: they tend to be hand-curated, and they jump around from year to year. For April 13, 2014, the @CeJourLa feed tweeted events from 1849, 1997, 1695 and 1941, in that order.

Two weeks ago I was at the Annual Convention of the Modern Language Association, describing my Digital Parisian Stage corpus, and I realized that the plays catalogued in the Parisian Stage were being produced exactly two hundred years ago. I thought of the #OnThisDay feeds and @everytreenyc, and realized that I could create a Twitter bot to pull information about plays from the database and tweet them out. A week later, @spectacles_xix sent out its first automated tweet, about the play la Réconciliation par ruse.

@spectacles_xix runs on Pythonanywhere in Python 3.6, and accesses a MySQL database. It uses Mike Verdone’s Twitter API client. The source is open on GitHub.
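
The core of the bot is one database query and one API call. Here is a simplified sketch; the table and column names are placeholders rather than the real schema, and the credentials are stand-ins.

```python
# Simplified sketch of the @spectacles_xix loop: find plays that premiered on
# today's date 200 years ago and tweet a line about each one.
import datetime

import pymysql
from twitter import Twitter, OAuth  # Mike Verdone's Python Twitter Tools

today = datetime.date.today()

db = pymysql.connect(host="localhost", user="bot", password="secret", db="spectacles")
api = Twitter(auth=OAuth("token", "token_secret", "consumer_key", "consumer_secret"))

with db.cursor() as cursor:
    cursor.execute(
        "SELECT title, theater FROM plays "
        "WHERE premiere_month = %s AND premiere_day = %s AND premiere_year = %s",
        (today.month, today.day, today.year - 200),
    )
    for title, theater in cursor.fetchall():
        api.statuses.update(
            status="Première il y a deux cents ans : {} ({}).".format(title, theater)
        )
```

The actual code does more, of course (for one thing, it simply stays silent on days with no premières), but the skeleton is the same.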

Unlike other feeds, including this one from the French Ministry of Culture that just tweeted about the anniversary of the première of Rostand’s Cyrano de Bergerac, this one will not be curated, and it will not jump around from year to year. It will tweet every play that premièred in 1818, in order, until the end of the year, and then go on to 1819. If there is a day when no plays premièred, like January 16, @spectacles_xix will not tweet.
I have a couple of ideas about more features to add, so stay tuned!

How Google’s Pixel Buds will change the world!

Scene: a quietly bustling bistro in Paris’s 14th Arrondissement.

SERVER: Oui, vous désirez?
PIXELBUDS: Yes, you desire?
TOURIST: Um, yeah, I’ll have the steak frites.
PIXELBUDS: UM, OUAIS, JE VAIS AVOIR LES FRITES DE STEAK
SERVER: Que les frites?
PIXELBUDS: Than fries?
TOURIST: No, at the same time.
PIXELBUDS: NON, EN MEME TEMPS
SERVER: Alors, vous voulez le steak aussi?
PIXELBUDS: DESOLE, JE N’AI PAS COMPRIS.
SERVER: VOUS VOULEZ LE STEAK AUSSI?
PIXELBUDS: You want the steak too?
TOURIST: Yeah, I just ordered the steak.
PIXELBUDS: OUAIS, JE VIENS DE COMMANDER LE STEAK
SERVER: Okay, du steak, et des frites, en même temps.
PIXELBUDS: Okay, steak, and fries at the same time.
TOURIST: You got it.
PIXELBUDS: TU L’AS EU.

(All translations by Google Translate. Photo: Alain Bachelier / Flickr.)

And we mean really every tree!

When Timm, Laura, Elber and I first ran the @everytreenyc Twitter bot almost a year ago, we knew that it wasn’t actually sampling from a list that included every street tree in New York City. The Parks Department’s 2015 Tree Census was a huge undertaking, and was not complete by the time they organized the Trees Count! Data Jam last June. There were large chunks of the city missing, particularly in Southern and Eastern Queens.

The bot software itself was not a bad job for a day’s work, but it was still a hasty patch job on top of Neil Freeman’s original everylotbot code. I hadn’t updated the readme file to reflect the changes we had made. It was running on a server in the NYU Computer Science Department, which is currently my most precarious affiliation.

On April 28 I received an email from the Parks Department saying that the census was complete, and the final version had been uploaded to the NYC Open Data Portal. It seemed like a good opportunity to upgrade.

Over the past two weeks I’ve downloaded the final tree database, installed everything on Pythonanywhere, streamlined the code, added a function to deal with Pythonanywhere’s limited scheduler, and updated the readme file. People who follow the bot might have noticed a few extra tweets over the past couple of days as I did final testing, but I’ve removed the cron job at NYU, and @everytreenyc is now up and running in its new home, with the full database, a week ahead of its first birthday. Enjoy the dérive!
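
For the curious, the scheduler workaround amounts to a guard: the scheduled task runs more often than the bot should tweet, and the script decides whether the current hour is one of its slots. The sketch below shows the general idea, with illustrative hours and libraries; it is not the actual function.

```python
# Hypothetical guard for a coarse-grained scheduler: the task runs every hour, and
# this check decides whether the current hour is one of the "every three hours
# during New York waking hours" slots. Illustrative only.
from datetime import datetime

import pytz

def should_tweet(now=None):
    now = now or datetime.now(pytz.timezone("America/New_York"))
    return 8 <= now.hour <= 23 and now.hour % 3 == 0

if __name__ == "__main__":
    if should_tweet():
        print("time to tweet a tree")  # the real bot would compose and post here
```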

@everytreenyc

At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc twitter feed.

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
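
The Street View image is the least mysterious part: the bot builds a URL for Google’s Street View Static API from the lot’s address or coordinates, roughly as in the sketch below. The endpoint and parameters come from Google’s public documentation; the key and address are placeholders.

```python
# Building a Google Street View Static API URL for a property. Illustrative only;
# it requires your own API key.
from urllib.parse import urlencode

def streetview_url(location, key, size="600x400"):
    params = {"location": location, "size": size, "key": key}
    return "https://maps.googleapis.com/maps/api/streetview?" + urlencode(params)

print(streetview_url("100 Gold Street, Manhattan, NY", key="YOUR_API_KEY"))
```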

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.

Social scientists, and even humanists recently, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced at least as far as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places — these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a tinyletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase, “the map is not the territory”:

Streetview and the property database are both widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did, and I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. It is the property we know, and yet we realize how much we don’t know it.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on Github, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.

Ten reasons why sign-to-speech is not going to be practical any time soon.

It’s that time again! A bunch of really eager computer scientists have a prototype that will translate sign language to speech! They’ve got a really cool video that you just gotta see! They win an award! (from a panel that includes no signers or linguists). Technology news sites go wild! (without interviewing any linguists, and sometimes without even interviewing any deaf people).

…and we computational sign linguists, who have been through this over and over, every year or two, just *facepalm*.

The latest strain of viral computational sign linguistics hype comes from the University of Washington, where two hearing undergrads have put together a system that … supposedly recognizes isolated hand gestures in citation form. But you can see the potential! *facepalm*.

Twelve years ago, after already having a few of these *facepalm* moments, I wrote up a summary of the challenges facing any computational sign linguistics project and published it as part of a paper on my sign language synthesis prototype. But since most people don’t have a subscription to the journal it appeared in, I’ve put together a quick summary of Ten Reasons why sign-to-speech is not going to be practical any time soon.

  1. Sign languages are languages. They’re different from spoken languages. Yes, that means that if you think of a place where there’s a sign language and a spoken language, they’re going to be different. More different than English and Chinese.
  2. We can’t do this for spoken languages. You know that app where you can speak English into it and out comes fluent Pashto? No? That’s because it doesn’t exist. The Army has wanted an app like that for decades, and they’ve been funding it up the wazoo, and it’s still not here. Sign languages are at least ten times harder.
  3. It’s complicated. Computers aren’t great with natural language at all, but they’re better with written language than spoken language. For that reason, people have broken the speech-to-speech translation task down into three steps: speech-to-text, machine translation, and text-to-speech.
  4. Speech to text is hard. When you call a company and get a message saying “press or say the number after the tone,” do you press or say? I bet you don’t even call if you can get to their website, because speech to text suuucks:

    -Say “yes” or “no” after the tone.
    -No.
    -I think you said, “Go!” Is that correct?
    -No.
    -My mistake. Please try again.
    -No.
    -I think you said, “I love cheese.” Is that correct?
    -Operator!

  5. There is no text. A lot of people think that text for a sign language is the same as the spoken language, but if you think about point 1 you’ll realize that that can’t possibly be true. Well, why don’t people write sign languages? I believe it can be done, and lots of people have tried, but for some reason it never seems to catch on. It might just be the classifier predicates.
  6. Sign recognition is hard. There’s a lot that linguists don’t know about sign languages already. Computers can’t even get reliable signs from people wearing gloves, never mind video feeds. This may be better than gloves, but it doesn’t do anything with facial or body gestures.
  7. Machine translation is hard even between two written languages (i.e. written versions of spoken languages). Different words, different meanings, different word order. You can’t just look up words in a dictionary and string them together. Google Translate is only moderately decent because it’s throwing massive statistical computing power at the input – and that only works for languages with a huge corpus of text available.
  8. Sign to spoken translation is really hard. Remember how in #5 I mentioned that there is no text for sign languages? No text, no huge corpus, no machine translation. I tried making a rule-based translation system, and as soon as I realized how humongous the task of translating classifier predicates was, I backed off. Matt Huenerfauth has been trying (PDF), but he knows how big a job it is.
  9. Sign synthesis is hard. Okay, that’s probably the easiest problem of them all. I built a prototype sign synthesis system in 1997, I’ve improved it, and other people have built even better ones since.
  10. What is this for, anyway? Oh yeah, why are we doing this? So that Deaf people can carry a device with a camera around, and every time they want to talk to a hearing person they have to mount it on something, stand in a well-lighted area and sign into it? Or maybe someday have special clothing that can recognize their hand gestures, but nothing for their facial gestures? I’m sure that’s so much better than decent funding for interpreters, or teaching more people to sign, or hiring more fluent signers in key positions where Deaf people need the best customer service.

So I’m asking all you computer scientists out there who don’t know anything about sign languages, especially anyone who might be in a position to fund something like this or give out one of these gee-whiz awards: Just stop. Take a minute. Step back from the tech-bling. Unplug your messiah complex. Realize that you might not be the best person to decide whether or not this is a good idea. Ask a linguist. And please, ask a Deaf person!

Note: I originally wrote this post in November 2013, in response to an article about a prototype using Microsoft Kinect. I never posted it. Now I’ve seen at least three more, and I feel like I have to post this. I didn’t have to change much.

One way of generating spam

This showed up today in the comments that Akismet flagged for spam:

{Photo|Picture|Photograph|Image|Photography|Snapshot|Shot|Pic|Photographic|Graphic|Pics} {credit|credit score|credit rating|credit history|credit ratings|consumer credit|credit ranking|credit standing|consumer credit rating|credit scores|credit worthiness}: {AP|Elp} | {FILE|Document|Record|Report|Data file|Submit|Computer file|Data|Register|Archive|Database} #file_links\keywords1.txt,1,S] {-|–|:|*|( space )|( blank )|,|To|. . .|And|As} {In this|Within this|On this|With this|In this particular|During this|In such a|Through this|From this|In that|This particular} {O|To|A|E|I|U|} #file #file_links\keywords2.txt,1,S] _links\keywords3.txt,1,S] ct. {7|Seven|Several|6|8|Six|5|Five|9|Eight|10}, {2012|Next year}, {file|document|record|report|data file|submit|computer file|data|register|archive|database} {photo|picture|photograph|image|photography|snapshot|shot|pic|photographic|graphic|pics}, {Chicago|Chi town|Chicago, il|Detroit|Dallas|Chicago, illinois|Philadelphia|Los angeles|Denver|Chicagoland|Miami} {Bears|Has|Contains|Holds|Carries|Provides|Offers|Includes|Teddy bears|Requires|Features} {middle|center|midsection|midst|heart|centre|core|mid|central|middle section|middle of the} linebacker {Brian|John|Mark} Urlacher {watches|wrist watches|timepieces|designer watches|wristwatches|different watches|pieces|running watches|looks after|monitors|devices} {from the|in the|from your|through the|on the|with the|within the|belonging to the|out of the|out of your|of your} {sideline|part time} {during the|throughout the|through the|in the|over the|while in the|within the|all through the|through|usually in the|within} {second half|other half|better half|lover|wife or husband|partner|loved one} {of an|of the|of your|associated with an|connected with an|of|of any|of each|associated with the|of some|associated with} {NFL|National football league|American footbal|Football|Nhl|Nba} {football|soccer|sports|basketball|baseball|hockey|footballing|rugby|nfl|golf|nfl football} {game|sport|video game|online game|recreation|activity|match|adventure|gameplay|performance|gaming} {against the|from the|up against the|contrary to the|resistant to the|about the|with the|on the|versus the|with|around the} {Jacksonville Jaguars|Gambling} {in|within|inside|throughout|with|around|during|on|when it comes to|for|found in} {Jacksonville|The city of jacksonville|The town of jacksonville}, Fla. 
{The|The actual|The particular|Your|This|A|Any|Typically the|All the|That|All of the} {Bears|Has|Contains|Holds|Carries|Provides|Offers|Includes|Teddy bears|Requires|Features} {announced|introduced|declared|released|reported|proclaimed|publicised|publicized|launched|revealed|stated} {on|upon|about|in|with|for|regarding|concerning|at|relating to|on the subject of} {Wednesday|Thursday|Friday|Wed|Saturday|Sunday|Mondy|Monday|The following friday|The following thursday|Tuesday}, {March|03|Goal|Drive|Walk|April|Mar|Strut|Next month|May|Celebration} {20|Twenty|Something like 20|30|Thirty|10|21|19|More than 20|20 or so|22}, {20|Twenty|Something like 20|30|Thirty|10|21|19|More than 20|20 or so|22} #file_links\keywords4.txt,1,S] {13|Thirteen|Tough luck|12|14|15|10}, {that they were|that they are|them to be|they were} {unable to|not able to|struggling to|can not|struggle to|cannot|incapable of|helpless to|struggles to|could not|canrrrt} {reach|achieve|attain|get to|accomplish|arrive at|access|obtain|get through to|contact|grasp} {a contract|an agreement|a legal contract|a binding agreement|binding agreement|legal contract|a partnership|an understanding|a|a deal} {agreement|contract|arrangement|deal|understanding|settlement|commitment|binding agreement|legal contract|transaction|decision} {with|along with|together with|using|having|by using|utilizing|through|with the help of|by means of|by way of} Urlacher, {who is|who’s|that is|that’s|who’s going to be|who will be|who may be|who might be|who seems to be|who is responsible for|the person} {an|a good|a great|the|a|a strong|some sort of|a powerful|a particular|any|an excellent} unre #file_links\keywords5.txt,1,S] stricted {free|totally free|free of charge|no cost|cost-free|absolutely free|zero cost|100 % free|complimentary|free of cost|no charge} {agent|broker|realtor|adviser|representative|real estate agent|professional|advisor|solution|dealer|factor} {for the first time|the very first time|the first time|initially|in my ballet shoes|somebody in charge of|at last|now|responsible for|as a beginner|there’s finally someone} {in his|in the|as part of his|in their|within his|in her|within the|in|on his|during his|with his} {career|profession|job|occupation|vocation|employment|work|professional|livelihood|position|line of work}. ({AP|Elp} Photo/Phelan {M|Michael|Meters|Mirielle|L|T|D|N|E|S|R}. Ebenhack, {File|Document|Record|Report|Data file|Submit|Computer file|Data|Register|Archive|Database})
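
The format is easy to reverse-engineer: each {option|option|…} group marks a slot, and the spammer’s software is supposed to pick one option per slot (and pull substitutions from those keyword files) before posting, so that every comment comes out superficially unique. Here the spinner clearly never ran. Below is a minimal sketch of how such a spinner works, for illustration only.

```python
# Minimal "spintax" spinner: replace every {a|b|c} group with one randomly chosen
# option, repeating until no groups remain (which also handles nested groups).
import random
import re

GROUP = re.compile(r"\{([^{}]*)\}")

def spin(template):
    while True:
        template, count = GROUP.subn(
            lambda match: random.choice(match.group(1).split("|")), template
        )
        if count == 0:
            return template

print(spin("{Photo|Picture|Image} {credit|credit score|credit rating}: {AP|Elp}"))
```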