{"id":267,"date":"2016-04-12T12:12:33","date_gmt":"2016-04-12T16:12:33","guid":{"rendered":"https:\/\/www.grieve-smith.com\/blog1\/?p=267"},"modified":"2018-06-13T18:13:39","modified_gmt":"2018-06-13T22:13:39","slug":"ten-reasons-why-sign-to-speech-is-not-going-to-be-practical-any-time-soon","status":"publish","type":"post","link":"https:\/\/grieve-smith.com\/blog\/2016\/04\/ten-reasons-why-sign-to-speech-is-not-going-to-be-practical-any-time-soon\/","title":{"rendered":"Ten reasons why sign-to-speech is not going to be practical any time soon."},"content":{"rendered":"<p>It&#8217;s that time again!  A bunch of <em>really eager<\/em> computer scientists have a prototype that will translate sign language to speech!  They&#8217;ve got a <em>really cool video<\/em> that you <em>just gotta see<\/em>!  They win an award! (from a panel that includes no signers or linguists).  Technology news sites go wild! (without interviewing any linguists, and sometimes without even interviewing any deaf people).<\/p>\n<p>&#8230;and we computational sign linguists, who have been through this over and over, every year or two, just *facepalm*.<\/p>\n<p>The latest strain of viral computational sign linguistics hype <a href=\"http:\/\/www.washington.edu\/news\/2016\/04\/12\/uw-undergraduate-team-wins-10000-lemelson-mit-student-prize-for-gloves-that-translate-sign-language\/\" target=\"_blank\">comes from the University of Washington<\/a>, where two hearing undergrads have put together a system that ? supposedly recognizes isolated hand gestures in citation form.  But you can see the potential!  *facepalm*.<\/p>\n<p>Twelve years ago, after already having a few of these *facepalm* moments, I wrote up a summary of the challenges facing any computational sign linguistics project and published it as part of <a href=\"http:\/\/link.springer.com\/chapter\/10.1007%2F3-540-47873-6_14\" target=\"_blank\">a paper<\/a> on my sign language synthesis prototype.  But since most people don&#8217;t have a subscription to the journal it appeared in, I&#8217;ve put together a quick summary of Ten Reasons why sign-to-speech is not going to be practical any time soon.<\/p>\n<ol>\n<li><strong>Sign languages are languages<\/strong>.  They&#8217;re different from spoken languages.  Yes, that means that if you think of a place where there&#8217;s a sign language and a spoken language, they&#8217;re going to be different.  More different than English and Chinese.<\/li>\n<li><strong>We can&#8217;t do this for spoken languages<\/strong>.  You know that app where you can speak English into it and out comes fluent Pashto?  No?  That&#8217;s because <em>it doesn&#8217;t exist<\/em>.  The Army has wanted an app like that for decades, and they&#8217;ve been funding it up the wazoo, and it&#8217;s still not here.  Sign languages are at least ten times harder.<\/li>\n<li><strong>It&#8217;s complicated<\/strong>.  Computers aren&#8217;t great with natural language at all, but they&#8217;re better with written language than spoken language.  For that reason, people have broken the speech-to-speech translation task down into three steps: speech-to-text, machine translation, and text-to-speech.<\/li>\n<li><strong>Speech to text is hard<\/strong>.  When you call a company and get a message saying &#8220;press or say the number after the tone,&#8221; do you press or say?  
I bet you don't even call if you can get to their website, because speech to text suuucks:

    > -Say "yes" or "no" after the tone.
    > -No.
    > -I think you said, "Go!" Is that correct?
    > -No.
    > -My mistake. Please try again.
    > -No.
    > -I think you said, "I love cheese." Is that correct?
    > -Operator!

5. **There is no text.** A lot of people think that text for a sign language is the same as the spoken language, but if you think about point 1 you'll realize that that can't possibly be true. Well, why don't people write sign languages? I believe it can be done, and [lots of people have tried](http://aslfont.github.io/Symbol-Font-For-ASL/ways-to-write.html), but for some reason it never seems to catch on. It might just be the classifier predicates.
6. **Sign recognition is hard.** There's a lot that linguists don't know about sign languages already. Computers can't even get reliable signs from people wearing gloves, never mind from video feeds. This may be better than gloves, but it doesn't do anything with facial or body gestures.
7. **Machine translation is hard**, even going from one written language (i.e. the written form of a spoken language) to another. Different words, different meanings, different word order. You can't just look up words in a dictionary and string them together. Google Translate is only moderately decent because it's throwing massive statistical computing power at the input – and that only works for languages with a huge corpus of text available.
8. **Sign to spoken translation is really hard.** Remember how in point 5 I mentioned that there is no text for sign languages? No text, no huge corpus, no machine translation. I tried building a rule-based translation system, and as soon as I realized how humongous the task of translating classifier predicates was, I backed off. Matt Huenerfauth has been trying ([PDF](http://eniac.cs.qc.cuny.edu/matt/pubs/huenerfauth-2006-dissertation.pdf)), but he knows how big a job it is.
9. **Sign synthesis is hard.** Okay, that's probably the easiest problem of them all. I built [a prototype sign synthesis system](https://www.panix.com/~grvsmth/signsynth/) in 1997, I've improved it since, and other people have built even better ones.
10. **What is this for, anyway?** Oh yeah, why are we doing this? So that Deaf people can carry a device with a camera around, and every time they want to talk to a hearing person they have to mount it on something, stand in a well-lit area and sign into it? Or maybe someday wear special clothing that can recognize their hand gestures, but nothing for their facial gestures? I'm sure that's so much better than decent funding for interpreters, or teaching more people to sign, or hiring more fluent signers in key positions where Deaf people need the best customer service.

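To make point 3 concrete, here's a minimal sketch of that three-stage cascade. It's purely illustrative: the function names and signatures are made up for this post, not any real library's API, and each stage stands in for one of the research problems above.

```python
# A hypothetical speech-to-speech cascade, as described in point 3.
# None of these functions exist in any real library; each is a
# placeholder for a hard (or unsolved) research problem.

def speech_to_text(audio: bytes, language: str) -> str:
    """ASR stage: source-language audio -> source-language text (point 4)."""
    raise NotImplementedError("speech recognition is hard")

def machine_translate(text: str, source: str, target: str) -> str:
    """MT stage: source-language text -> target-language text (point 7)."""
    raise NotImplementedError("machine translation is hard")

def text_to_speech(text: str, language: str) -> bytes:
    """TTS stage: target-language text -> audio (the most tractable step)."""
    raise NotImplementedError("comparatively easy, but still nontrivial")

def speech_to_speech(audio: bytes, source: str, target: str) -> bytes:
    # Errors compound: each stage feeds its mistakes to the next.
    # For a signed source language, the first stage becomes sign
    # recognition (point 6), and there is no written form to pass
    # between the stages at all (point 5).
    text = speech_to_text(audio, source)
    translated = machine_translate(text, source, target)
    return text_to_speech(translated, target)
```

Notice that the middle of the pipeline runs on *text*, which is exactly the piece sign languages don't have; that's why the cascade doesn't simply carry over.
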
So I'm asking all you computer scientists out there who don't know anything about sign languages, especially anyone who might be in a position to *fund* something like this or give out one of these gee-whiz awards: Just stop. Take a minute. Step back from the tech-bling. Unplug your messiah complex. Realize that you might not be the best person to decide whether or not this is a good idea. Ask a linguist. And please, **ask a Deaf person**!

*Note: I originally wrote this post in November 2013, in response to [an article about a prototype using Microsoft Kinect](http://blogs.msdn.com/b/msr_er/archive/2013/10/29/kinect-sign-language-translator.aspx). I never posted it. Since then I've seen at least three more of these, and I feel like I have to post it now. I didn't have to change much.*