The supposed successes of AI

(Photo: the author and cat, with pirate hats and eye patches superimposed by a Snapchat filter.)
I’m a regular watcher of Last Week Tonight with John Oliver, so in February I was looking forward to his take on “AI” and the large language models and image generators that many people have been getting excited about lately. I was not disappointed: Oliver heaped a lot of much-deserved criticism on these technologies, particularly for the ways they replicate prejudice and are overhyped by their developers.

What struck me was the way that Oliver contrasted large language models with more established applications of machine learning, portraying those as uncontroversial and even unproblematic. He’s not unusual in this: I know a lot of people who accept these technologies as a fact of life, and many who use them and like them.

But I was struck by how many of these technologies I myself find problematic and avoid, or even refuse to use. And I’m not some know-nothing: I’ve worked on projects in information retrieval and information extraction. I developed one of the first sign language synthesis systems, and one of the first prototype English-to-American Sign Language machine translation systems.

When I buy a new smartphone or desktop computer, one of the first things I do is to turn off all the spellcheck, autocorrect and autocomplete functions. I don’t enable the face or handprint locks. When I open an entertainment app like YouTube, Spotify or Netflix I immediately navigate away from the recommended content, going to my own playlist or the channels I follow. I do the same for shopping sites like Amazon or Zappos, and for social media like Twitter. I avoid sites like TikTok where the barrage of recommended content begins before you can stop it.

It’s not that I don’t appreciate automated pattern recognition. Quite the contrary. I’ve been using it for years – one of my first jobs in college was cleaning up a copy of the Massachusetts Criminal Code that had been scanned in and run through optical character recognition. For my dissertation I compiled a corpus from scanned documents, and over the past ten years I’ve developed another corpus using similar methods.

I feel similarly about synonym expansion – modifying a search engine to return results including “bicycle” when someone searches for “bike,” for example. I worked for a year for a company whose main product was synonym expansion, and I was really glad a few years later when Google rolled it out to the public.
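To make the idea concrete, here is a minimal sketch of query-side synonym expansion in Python. The synonym table and function name are hypothetical; a production search engine would do this with a much larger thesaurus, at index or query time, inside the engine itself.

# A minimal, hypothetical sketch of query-side synonym expansion.
# Real search engines use much larger thesauri and expand terms inside
# the engine, but the core idea is the same.
SYNONYMS = {
    "bike": {"bicycle", "cycle"},
    "bicycle": {"bike", "cycle"},
}

def expand_query(terms):
    """Return the original search terms plus any known synonyms."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term.lower(), set())
    return expanded

# expand_query(["bike", "lights"]) -> {"bike", "bicycle", "cycle", "lights"}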

There are a couple of other things that I find useful, like suggested search terms, image matching for attribution and Shazam for saving songs I hear in cafés. Snapchat filters can be fun. Machine translation is often cheaper than a dictionary lookup.

Using these technologies as fun toys or creative inspiration is fine. Using them as unreliable tools whose output needs to be thoroughly checked and corrected is perfectly appropriate. The problem begins when people don’t check the output of their tools and release it as completed work. This is where we get the problems documented by sites like Damn You Auto Correct: often humorous, but occasionally harmful.

My appreciation for automated pattern recognition is one of the reasons I’m so disturbed when I see people taking it for granted. I think it’s the years of immersion in all the things that automated recognizers got wrong, garbled or even left out completely that makes me concerned when people ignore the possibility of any such errors. I feel like an experienced carpenter watching someone nailing together a skyscraper out of random pieces of wood, with no building inspectors in sight.

When managers make the use of pattern recognition or generation tools mandatory, it goes from being potentially harmful to actively destructive. Search boxes that won’t let users turn off synonym expansion, returning wildly inaccurate results to avoid saying “nothing found,” make a mockery of the feature. I am writing this post on Google Docs, which is fine on a desktop computer, but the Android app does not let me turn off spell check. To correct a word without choosing one of the suggested corrections requires an extra tap every time.

Now let’s take the example of speech recognition. I have never found an application of speech recognition technology that personally satisfied me. I suppose if something happened to my hands that made it impossible for me to type I would appreciate it, but even then it would require constant attention to correct its output.

A few years ago I was trying to report a defective street condition to the New York City 311 hotline. The system would not let me talk to a live person until I’d exhausted its speech recognition system, but I was in a noisy subway car. Not only could the recognizer not understand anything I said, but the system was forcing me to disturb my fellow commuters by shouting selections into my phone.

I’ve attended conferences on Zoom with “live captioning” enabled, and at every talk someone commented on major inaccuracies in the captions. For people who can hear the speech it can be kind of amusing, but if I had to depend on those captions to understand the talks I’d be missing so much of the content.

I know some deaf people who regularly insist on automated captions as an equity issue. They are aware that the captions are inaccurate, and see them as better than nothing. I support that position, but in cases where the availability of accurate information is itself an equity issue, like political debates for example, I do not feel that fully automated captions are adequate. Human-written captions or human sign language interpreters are the only acceptable forms.

Humans are, of course, far from perfect, but for anything other than play, where accuracy is required, we cannot depend on fully automated pattern recognition. There should always be a human checking the final output, and there should always be the option to do without it. It should never be mandatory. The pattern recognition apps that are already all around us show us that clearly.

A free, open source language lab app

Viewers of The Crown may have noticed a brief scene where Prince Charles practices Welsh, sitting in a glass cubicle and wearing a headset. Some viewers may recognize that setup as a language lab. Some may even have used language labs themselves.

The core of the language lab technique is language drills, which are based on the bedrock of all skills training: mimicry, feedback and repetition. An instructor can identify areas for the learner to focus on.

Because it’s hard for us to hear our own speech, the instructor can also observe things in the learner’s voice that the learner may not perceive. Recording technology enabled the learner to take on some of the observer’s role more directly.

When I used a language lab to learn Portuguese in college, it ran on cassette tapes. The lab station played the model (I can still remember “Elena, estudante francesa, vai passar as férias em Portugal” – “Elena, a French student, is going to spend the holidays in Portugal”), then it recorded my attempted mimicry onto a blank cassette. Once I was done recording, it played back the model, followed by my own recording.

Hearing my voice repeated back to me after the model helped me judge for myself how well I had mimicked it. That wasn’t enough by itself, so the lab instructor had a master station where he could listen in on any of us and provide additional feedback. We also had classroom lessons with an instructor, and weekly lectures on culture and grammar.

Several companies have brought language lab technology into the digital age, first on CD-ROM and then over the internet. Many online language learning providers rely on proprietary software and closed platforms to generate revenue, which is fine for them but doesn’t give teachers the flexibility to add new language varieties.

People have petitioned these language learning companies to offer new languages, but developing offerings for a new language is expensive. If a language has a small user base, it may never generate enough revenue to offset the cost of developing the lessons. It would effectively be a donation to people who want to promote these languages, and these companies are for-profit entities.

Duolingo has offered a work-around to this closed system: they will accept materials developed by volunteers according to their specifications and freely donated. Anyone who remembers the Internet Movie Database before it was sold to Amazon can identify the problems with this arrangement: what happens to those submissions if Duolingo goes bankrupt, or simply decides not to support them anymore?

Closed systems raise another issue: who decides what it means to learn French, or Hindi? This has been discussed in the context of Duolingo, which chose to teach the artificial Modern Standard Arabic rather than a colloquial dialect or the classical language of the Qur’an. Similarly, activists for the Hawai’ian language wanted the company to focus on lessons that encourage Hawai’ians to speak the language, rather than on tourists who might visit for a few weeks at most.

Years ago I realized that we could make a free, open-source language lab application. It wouldn’t have to replicate all the features of the commercial apps, especially not initially. An app would be valuable if it offered just the basic language lab functionality: play a model, record the learner’s mimicry, play the model again and finally play the learner’s recording.

An open system would be able to use any recording that the device can play. This would allow learners to choose the models they practice with, or allow an instructor to choose models for their students. The lessons don’t have to be professionally produced. They can be created for a single student, or even for a single occasion. I am not a lawyer, but I believe they can even use copyrighted materials.

I have created a language lab app using the Django REST Framework and ReactJS that provides this basic language lab functionality. It runs in a web browser with a responsive layout, and I have successfully tested it in Chrome and Firefox, on Windows and Android.
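To give a rough idea of what that functionality implies on the back end, here is a minimal Django sketch. The model and field names are simplified stand-ins for illustration, not the actual LanguageLab schema.

# A simplified, illustrative sketch of the data model a language lab implies;
# these names are stand-ins, not the actual LanguageLab schema.
from django.db import models

class MediaItem(models.Model):
    # Any recording the device can play: an uploaded file or a URL.
    title = models.CharField(max_length=200)
    url = models.URLField(blank=True)

class Exercise(models.Model):
    # A stretch of the model recording for the learner to mimic.
    media = models.ForeignKey(MediaItem, on_delete=models.CASCADE)
    start_seconds = models.FloatField()
    end_seconds = models.FloatField()

class Attempt(models.Model):
    # The learner's recorded mimicry, played back right after the model.
    exercise = models.ForeignKey(Exercise, on_delete=models.CASCADE)
    recording = models.FileField(upload_to="attempts/")
    created = models.DateTimeField(auto_now_add=True)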

This openness and flexibility drastically reduces the cost of producing a lesson. The initial code can be installed in an hour, on any server that can host Django. The monthly cost of hosting code and media can be under $25. Once this is set up, a media item and several exercises based on it can be added in five minutes.

This reduced cost means that a language does not have to bring in enough learners to recoup a heavy investment. That in turn means that teachers can create lessons for every dialect of Arabic, or in fact for every dialect of English. They can create Hawai’ian lessons for both tourists and heritage speakers. They could even create lessons for actors to learn dialects, or to master impressions of celebrities.

As a transgender person I’ve long been interested in developing a feminine voice to match my feminine visual image. Gender differences in language include voice quality, pitch contour, rhythm and word choice – areas that can only be changed through experience. I have used the alpha and beta versions of my app to create exercises for practicing these differences.

Another area where it helps a learner to hear a recording of their own voice is singing. This could be used by professional singers or amateurs. It could even be used for instrument practice. I use it to improve my karaoke!

This week I was proud to present my work at the QueensJS meetup. My slides from that talk contain more technical details about how to record audio through the web browser. I’ll be pushing my source to GitHub soon. You can read more details about how to set up and use LanguageLab. In the meantime, if you’d like to contribute, or to help with beta testing, please get in touch!

Data science and data technology

The big buzz over the past few years has been Data Science. Corporations are opening Data Science departments and staffing them with PhDs, and universities have started Data Science programs to sell credentials for these jobs. As a linguist I’m particularly interested in this new field, because it includes research practices that I’ve been using for years, like corpus linguistics and natural language processing.

As a scientist I’m a bit skeptical of this field, because frankly I don’t see much science. Sure, the practitioners have labs and cool gadgets. But I rarely see anyone asking hard questions, doing careful observations, creating theories, formulating hypotheses, testing the hypotheses and examining the results.

The lack of careful observation and skeptical questioning is what really bothers me, because that’s what’s at the core of science. Don’t get me wrong: there are plenty of people in Data Science doing both. But these practices should permeate a field with this name, and they don’t.

If there’s so little science, why do we call it “science”? A glance through some of the uses of the term in the Google Books archive suggests that when it was first used in the late twentieth century it did include hypothesis testing. In the early 2000s people began to use it as a synonym for “big data,” and I can understand why. “Big data” was a well-known buzzword associated with Silicon Valley tech hype.

I totally get why people replaced “big data” with “data science.” I’ve spent years doing science (with observations, theories, hypothesis testing, etc.). Occasionally I’ve been paid for doing science or teaching it, but only part time. Even after getting a PhD I had to conclude that science jobs that pay a living wage are scarce and in high demand, and I was probably not going to get one.

It was kind of exciting when I got a job with Scientist in the title. It helped to impress people at parties. At first it felt like a validation of all the time I spent learning how to do science. So I completely understand why people prefer to say they’re doing “data science” instead of “big data.”

The problem with being called a Scientist in that job was that I wasn’t working on experiments. I was just helping people optimize their tools. Those tools could possibly be used for science, but that was not why we were being paid to develop them. We have a word for a practice that involves labs and gadgets but requires no observation or skepticism. That word is not science; it’s technology.

Technology is perfectly respectable; it’s what I do all day. For many years I’ve been well paid to maintain and expand the technology that sustains banks, lawyers, real estate agents, bakeries and universities. I’m currently building tools that help instructors at Columbia University with things like memorizing the names of their students and sending them emails. It’s okay to do technology. People love it.

If you really want to do science and you’re not one of the lucky ones, you can do what I do: I found a technology job that doesn’t demand all my time. Once in a while they need me to stay late or work on a weekend, but the vast majority of my time outside of 9-5 is mine. I spend a lot of that time taking care of my family and myself, and relaxing with friends. But I have time to do science.

How Google’s Pixel Buds will change the world!

Scene: a quietly bustling bistro in Paris’s 14th Arrondissement.

SERVER: Oui, vous désirez?
PIXELBUDS: Yes, you desire?
TOURIST: Um, yeah, I’ll have the steak frites.
PIXELBUDS: UM, OUAIS, JE VAIS AVOIR LES FRITES DE STEAK
SERVER: Que les frites?
PIXELBUDS: Than fries?
TOURIST: No, at the same time.
PIXELBUDS: NON, EN MEME TEMPS
SERVER: Alors, vous voulez le steak aussi?
PIXELBUDS: DESOLE, JE N’AI PAS COMPRIS.
SERVER: VOUS VOULEZ LE STEAK AUSSI?
PIXELBUDS: You want the steak too?
TOURIST: Yeah, I just ordered the steak.
PIXELBUDS: OUAIS, JE VIENS DE COMMANDER LE STEAK
SERVER: Okay, du steak, et des frites, en même temps.
PIXELBUDS: Okay, steak, and fries at the same time.
TOURIST: You got it.
PIXELBUDS: TU L'AS EU.

(All translations by Google Translate. Photo: Alain Bachelier / Flickr.)

@everytreenyc

At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc twitter feed.

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
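The mechanics are simple enough to sketch in a few lines of Python; the lot records and the post() step below are hypothetical placeholders, not Freeman’s actual code.

# A hypothetical sketch of what an @everylotnyc-style bot does.
# The lot records and post() are placeholders, not Freeman's code.
lots = [
    {"tax_id": "1000010001", "address": "1 Example Place", "borough": "Manhattan"},
    {"tax_id": "1000010002", "address": "2 Example Place", "borough": "Manhattan"},
]

def compose_tweet(lot):
    """Build the text for one property; the Street View photo is attached separately."""
    return f"{lot['address']}, {lot['borough']}"

# Walk the property database in tax ID order, one lot per cycle
# (the real bot posts one tweet every half hour).
for lot in sorted(lots, key=lambda record: record["tax_id"]):
    print(compose_tweet(lot))
    # post(compose_tweet(lot), photo=street_view_image(lot))  # hypothetical posting step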

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.
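As a rough illustration of that kind of distributional analysis, here is a small Python sketch; the temperature and tree-count figures are invented for the example and are not the Data Jam team’s data or code.

import numpy as np

# Hypothetical per-block measurements: afternoon temperature (degrees F)
# and number of street trees. These numbers are invented for the example.
temperature = np.array([92.1, 90.4, 88.7, 87.9, 86.5, 85.2])
trees_per_block = np.array([1, 2, 4, 5, 7, 9])

# A negative coefficient suggests that blocks with more trees tend to be
# cooler, the kind of pattern the winning project identified.
r = np.corrcoef(trees_per_block, temperature)[0, 1]
print(round(r, 2))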

Social scientists, and recently even humanists, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced at least as far back as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places – these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a tinyletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase “the map is not the territory”:

Streetview and the property database are both widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did. I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. It is the property we know, and yet we realize how much we don’t know it.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on GitHub, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.

Teaching phonetic transcription in the digital age

When I first taught phonetic transcription, almost seven years ago, I taught it almost the same way I had learned it twenty-five years ago. Today, the way I teach it is radically different. The story of the change is actually two stories intertwined. One is the story of how I’ve adapted my teaching to the radical changes in technology that occurred over those eighteen intervening years. The other is the story of the more subtle evolution of my understanding of phonetics, phonology, phonological variation and the phonetic transcription that allows us to talk about them.

When I took Introduction to Linguistics in 1990, all the materials we had were pencil, paper, two textbooks and the ability of the professor to produce unusual sounds. In 2007, and even today, the textbooks have the same exercises: read this phonetic transcription, figure out which English words were involved, and write the words in regular orthography; read these words in English orthography and transcribe the way you pronounce them; transcribe in broad and narrow transcription.

The first challenge was moving the homework online. I already assigned all the homework and posted all the grades online, and required my students to submit most of the assignments online; that had drastically reduced the amount of paper I had to collect and distribute in class and schlep back and forth. For this I had the advantage that tuition at Saint John’s pays for a laptop for every student. I knew that all of my students had the computing power to access the Blackboard site.

Thanks to the magic of Unicode and Richard Ishida’s IPA Picker, my students were able to submit their homework in the International Phonetic Alphabet without having to fuss with fonts and keyboard layouts. Now, with apps like the Multiling Keyboard, students can even write in the IPA on phones and tablets.
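To see why Unicode makes this painless: IPA symbols are ordinary code points, so a transcription is just a string that any modern system can store, compare and display. The short Python example below uses a sample transcription of my own choosing.

# IPA symbols are ordinary Unicode characters, so a transcription is just a string.
transcription = "fəˈnɛtɪks"  # an example broad transcription of "phonetics"
for symbol in transcription:
    print(symbol, hex(ord(symbol)))
# ə is U+0259, the stress mark ˈ is U+02C8, ɛ is U+025B and ɪ is U+026A.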

The next problem was that instead of transcribing, some students would look up the English spellings on dictionary sites, copy the standard pronunciation guides, and paste them into the submission box. Other students would give unusual transcriptions, but I couldn’t always tell whether these transcriptions reflected the students’ own pronunciations or just errors.

At first, as my professors had done, I made up for these homework shortcomings with lots of in-class exercises and drills, but they still all relied on the same principle: reading English words and transcribing them. Both in small groups and in full-class exercises, we were able to check the transcriptions and correct each other because everyone involved was listening to the same sounds. It wasn’t until I taught the course exclusively online that I realized there was another way to do it.

When I tell some people that I teach online courses, they imagine students from around the world tuning in to me lecturing at a video camera. This is not the way Saint John’s does online courses. I do create a few videos every semester, but the vast majority of the teaching I do is through social media, primarily the discussion forums on the Blackboard site connected with the course. I realized that I couldn’t teach phonetics without a way to verify that we were listening to the same sounds, and without that classroom contact I no longer had a way.

I also realized that with high-speed internet connections everywhere in the US, I had a new way to verify that we were listening to the same sounds: use a recording. When I took the graduate Introduction to Phonetics in 1993, we had to go to the lab and practice with the cassette tapes from William Smalley’s Manual of Articulatory Phonetics, but if I’m remembering right we didn’t actually do any transcription of the sounds; we just practiced listening to them and producing them. Some of us were better at that than others.

In 2015 we are floating in rivers of linguistic data. Human settlements have always been filled with the spontaneous creation of language, but we used to have to pore over people’s writings or rely on our untrustworthy memories. In the twentieth century we had records and tape, film and video, but so much of what was on them was scripted and rehearsed. Even when we could get recordings of unscripted language, they were hard to store, copy and distribute.

Now people create language in forms that we can grab and hold: online news articles, streaming video, tweets, blog posts, YouTube videos, Facebook comments, podcasts, text messages, voice mails. A good proportion of these are even in nonstandard varieties of the language. We can read them and watch them and listen to them – and then we can reread and rewatch and relisten, we can cut and splice in seconds what would have taken hours – and then analyze them, and compare our analyses.

Instead of telling my students to read English spelling and transcribe in IPA, now I give them a link to a video. This way we’re working from the exact same sequence of sounds, a sequence that we can replay over and over again. I specifically choose pronunciations that don’t match what they find on the dictionary websites. This is precisely what the IPA is for.

Going the other way, I give my students IPA transcriptions and ask them to record themselves pronouncing the transcriptions and post the recordings to Blackboard. Sure, my professor could have assigned us something like this in 1990, but then he would have had to take home a stack of cassettes and spend time rewinding them over and over. Now all my students have smartphones with built-in audio recording apps, and I could probably listen to all of their recordings on my own smartphone if I didn’t have my laptop handy.

So that’s the story about technology and phonetic transcription. Stay tuned for the other story, about the purpose of phonetic transcription.

How to Connect an Insignia NS-15AT10 to ADB on Windows

I bought a nice little tablet at Best Buy, and I wanted to use it to test an Android app I’m developing. In order to do that, I have to connect the tablet to my Windows laptop and run a tool called ADB, the Android Debug Bridge. Unfortunately, in order for ADB to connect to the tablet, Windows needs to recognize it as an ADB device, and Best Buy hasn’t done the work to support that.

I did find a post by someone named pcdebol that tells you how to get other Insignia tablets working with ADB, and was able to get mine working using the Google USB drivers with some modifications. I wanted to post this for the benefit of other people who want to test their apps on this model of tablet.

The first thing to do is to download the Google USB driver, unpack it and modify the android_winusb.inf file, adding the following lines to the [Google.NTamd64] section:

;NS-15AT10
%SingleAdbInterface% = USB_Install, USB\VID_0414&PID_506B&MI_01
%CompositeAdbInterface% = USB_Install, USB\VID_0414&PID_506B&REV_FFFF&MI_01

I found the “VID” and “PID” codes by looking at the hardware IDs in the Windows Device Manager. They should be the same for all NS-15AT10 tablets, but different for any other model. The next step is to edit the file adb_usb.ini in the .android folder in your user profile (for me, on Windows 7, that’s “c:\users\grvsmth\”). If there is no .android folder, you should make one, and if your .android folder has no adb_usb.ini file, you should make one of those too. Then put the following line in the file, by itself:

0x0414

It took me a little while to figure out that it’s the VID number from the Device Manager, with a 0x prefix to indicate that it’s a hexadecimal number. Once I did that and saved the file, I was able to re-add the device in Device Manager, Windows recognized it, and I was able to connect ADB to it flawlessly and test my app. I hope you have similar success!