Quantitative needs qualitative, and vice versa

Data Science is all the rage these days. But this current craze focuses on a particular kind of data analysis. I conducted an informal poll as an icebreaker at a recent data science party, and most of the people I talked to said that it wasn’t data science if it didn’t include machine learning. Companies in all industries have been hiring “quants” to do statistical modeling. Even in the humanities, “distant reading” is a growing trend.

There has been a reaction to this, of course. Other humanists have argued for the continued value of close reading. Some companies have been hiring anthropologists and ethnographers. Academics, journalists and literary critics regularly write about the importance of nuance and empathy.

For years, my response to both types of arguments has been “we need both!” But this is not some timid search for a false balance or inclusion. We need both close examination and distributional analysis because the way we investigate the world depends on both, and both depend on each other.

I learned this from my advisor Melissa Axelrod, and a book she assigned me for an independent study on research methods. Michael Agar’s The Professional Stranger is a guide to ethnographic field methods, but it also contains commentary on the nature of scientific inquiry, and it mixes well-deserved criticism of quantitative social science with a frank acknowledgment of the interdependence of qualitative and quantitative methods. On page 134, Agar discusses Labov’s famous study of /r/-dropping in New York City:

The catch, of course, is that he would never have known which variable to look at without the blood, sweat and tears of previous linguists who had worked with a few informants and identified problems in the linguistic structure of American English. All of which finally brings us to the point of this example: traditional ethnography struggles mightily with the existence of pattern among the few.

Labov acknowledges these contributions in Chapter 2 of his 1966 book: Babbitt (1896), Thomas (1932, 1942, 1951), Kurath (1949, based on interviews by Guy S. Lowman), Hubbell (1950) and Bronstein (1962). His work would not be possible without theirs, and their work was incomplete until he developed a theoretical framework to place their analysis in, and tested that framework with distributional surveys.

We’ve all seen what happens when people try to use one of these methods without the other. Statistical methods that are not grounded in close examination of specific examples produce surveys that are meaningless to the people who take them and uninformative to scientists. Qualitative investigations that are not checked with rigorous distributional surveys produce unfounded, misleading generalizations. The worst of both worlds are quantitative surveys that are neither broadly grounded in ethnography nor applied to representative samples.

It’s also clear in Agar’s book that qualitative and quantitative are not a binary distinction, but rather two ends of a continuum. Research starts with informal observations about specific things (people, places, events) that give rise to open-ended questions. The answers to these questions then provoke more focused questions that are asked of a wider range of things, and so on.

The concepts of broad and narrow, general and specific, can be confusing here, because at the qualitative, close or ethnographic end of the spectrum the questions are broad and general but asked about a narrow, specific set of subjects. At the quantitative, distant or distributional end of the spectrum the questions are narrow and specific, but asked of a broad, general range of subjects. Agar uses a “funnel” metaphor to model how the questions narrow during this progression, but he could just as easily have used a showerhead to model how the subjects broaden at the same time.

The progression is not one-way, either. The findings of a broad survey can raise new questions, which can only be answered by a new round of investigation, again beginning with qualitative examination on a small scale and possibly proceeding to another broad survey. This is one of the cycles that increase our knowledge.

Rather than the funnel metaphor, I prefer a metaphor based on seeing. Recently I’ve been re-reading The Omnivore’s Dilemma, and in Chapter 8 Michael Pollan talks about taking a close view of a field of grass:

In fact, the first time I met Salatin he’d insisted that even before I met any of his animals, I get down on my belly in this very pasture to make the acquaintance of the less charismatic species his farm was nurturing that, in turn, were nurturing his farm.

Pollan then gets up from the grass to take a broader view of the pasture, but later bends down again to focus on individual cows and plants. He does this metaphorically throughout the book, as many great authors do: focusing in on a specific case, then zooming out to discuss how that case fits in with the bigger picture. Whether he’s talking about factory-farmed Steer 534, or Budger the grass-fed cow, or even the thousands of organic chickens that are functionally nameless under the generic name of “Rosie,” he dives into specific details about the animals, then follows up by reporting statistics about these farming methods and the animals they raise.

The bottom line is that we need studies from all over the qualitative-quantitative spectrum. They build on each other, forming a cycle of knowledge. We need to fund them all, to hire people to do them all, and to promote and publish them all. If you do it right, the plural of anecdote is indeed data, and you can’t have data without anecdotes.

Viewing in free motion

Last month I went on a walk with my friend Ezra. It was his birthday, so we walked for almost two hours, drinking coffee, eating cinnamon rolls, and talking about semantics and coding. The funny thing is that Ezra lives on the West Coast and I live in New York, so we conducted our entire conversation by cell phone, with him walking through Ballard and Loyal Heights, and me walking through Jackson Heights and East Elmhurst.

Cell phones have been around for decades, and I’m sure we’re far from the first to walk together this way. You’ve probably done it yourself. But it reminded me of Isaac Asimov’s 1956 novel The Naked Sun, in which our hero Elijah Baley visits an Earth colony on the planet Solaria, where all the colonists live on separate estates, with at most one spouse and possibly an infant child, surrounded by robots who tend to their every need, almost never seeing one another in person. They interact socially by “viewing” each other through realistic virtual-reality projections.

Baley interviews a murder suspect, Gladia Delmarre, and is intrigued when she tells him she goes on walks together with her neighbor. “I didn’t know you could go on walks together with anyone,” says Baley.

“I said viewing,” responds Gladia. “Oh well, I keep forgetting you’re an Earthman. Viewing in free motion means we focus on ourselves and we can go anywhere we want to without losing contact. I walk on my estate and he walks on his and we’re together.”

I had no visual contact with Ezra during this walk. I’ve seen people “viewing in free motion” on FaceTime. We could probably have rigged something up with a GoPro camera and Google Glass, but it would most likely not have been much like on Solaria, where I could have looked over and seen a chunk of Seattle superimposed on Queens, with Ezra walking across it next to me.

The biggest reason not to attempt any visual presence is that it was dangerous enough for me to be crossing the street while talking; it would have been much worse if the virtual view of the cars on 24th Avenue NW were blocking my view of the cars coming at me down Northern Boulevard.

Of course, on Solaria all the cars were (or will be?) automatic, and there are armies of robots to protect the humans from danger.

Printing differences and material issues in Google Books

I am looking forward to presenting my Digital Parisian Stage corpus and the exciting results I’ve gotten from it so far at the American Association for Corpus Linguistics at Iowa State in September. In the meantime I’m continuing to process texts, working towards a one percent sample from the Napoleonic period (Volume 1 of the Wicks catalog).

One of the plays in my sample is les Mœurs du jour, ou l’École des femmes, a comedy by Collin-Harleville (also known as Jean-François Collin d’Harleville). I ran the initial OCR on a PDF scanned for the Google Books project. For reasons that will become clear, I will refer to it by its Google Books ID, VyBaAAAAcAAJ. When I went to clean up the OCR text, I discovered that it was missing pages 2-6. I emailed the Google Books team about this, and got the following response:

[Screenshot of the Google Books team’s emailed reply, citing “a material issue” with this copy.]

I’m guessing “a material issue” means that those pages were missing from the original paper copy, but I didn’t even bother emailing until the other day, since I found another copy in the Google Books database, with the ID kVwxUp_LPIoC.

Comparing the OCR text of VyBaAAAAcAAJ with the PDF of kVwxUp_LPIoC, I discovered some differences in spelling. For example, throughout the text, words that end in the old-fashioned spelling -ois or -oit in VyBaAAAAcAAJ are spelled with the more modern -ais in kVwxUp_LPIoC. There are also differences in the way “Madame” is abbreviated (“Mad.” vs. “M.me”), in which accented letters preserve their accents when set in small caps, and in pagination. Here is the entirety of Act III, Scene X in each copy:

Act III, Scene X in copy VyBaAAAAcAAJ

Act III, Scene X in copy kVwxUp_LPIoC

My first impulse was to look at the front matter and see if the two copies were identified as different editions or different printings. Unfortunately, they were almost identical, with the most notable differences being that VyBaAAAAcAAJ has an œ ligature in the title, while kVwxUp_LPIoC is signed by the playwright and marked as being a personal gift from him to an unspecified recipient. Both copies give the exact same dates: the play was first performed on the 7th of Thermidor in year VIII and published in the same year (1800).

The Google Books metadata indicate that kVwxUp_LPIoC was digitized from the Lyon Public Library, while VyBaAAAAcAAJ came from the Public Library of the Netherlands. The other copies I have found in the Google Books database, OyL1oo2CqNIC from the National Library of Naples and dPRIAAAAcAAJ from Ghent University, appear to be the same printing as kVwxUp_LPIoC, as does the copy from the National Library of France.

Since the -ais and M.me spellings are closer to the forms used in France today, we might expect that kVwxUp_LPIoC and its cousins are from a newer printing. But in Act II, Scene XI I came across a difference that concerns negation, the variable that I have been studying for many years. The decadent Parisians Monsieur Basset and Madame de Verdie question whether marriage should be eternal. Our hero Formont replies that he has no reason not to remain with his wife forever. In VyBaAAAAcAAJ he says, “je n’ai pas de raisons,” while in kVwxUp_LPIoC he says “je n’ai point de raisons.”

Act III, Scene XI (page 75) in VyBaAAAAcAAJ

Act III, Scene XI (page 78) in kVwxUp_LPIoC

In my dissertation study I found that the relative use of ne … point had already peaked by the nineteenth century, and was being overtaken by ne … pas. If this play fits the pattern, the use of the more conservative pattern in kVwxUp_LPIoC goes against the more innovative -ais and M.me spellings.

I am not an expert in French Revolutionary printing (if anyone knows a good reference or contact, please let me know!). My best guess is that kVwxUp_LPIoC is from a limited early run, some copies of which were given to the playwright to give away, while VyBaAAAAcAAJ and the other -ais/M.me/ne … point copies are from a larger, slightly later, printing.

In any case, it is clear that I should pick one copy and make my text consistent with it. Since VyBaAAAAcAAJ is incomplete, I will try dPRIAAAAcAAJ. I will try to double-check all the spellings and wordings, but at the very least I will check all of the examples of negation against dPRIAAAAcAAJ as I annotate them.
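Since I will be comparing the copies anyway, a rough automated pass can at least flag the places where they disagree on negation before I annotate. Here is a minimal sketch in JavaScript (Node.js), assuming each copy’s cleaned OCR text is saved as a plain-text file; the file names and the deliberately loose regular expression are illustrative, not my actual annotation workflow.

```javascript
// Sketch: count and collect "ne ... pas" vs. "ne ... point" tokens in two
// plain-text OCR files. File names below are placeholders.
const fs = require('fs');

// Match "ne" or "n'" followed by up to four words and then "pas" or "point",
// all on one line. This is intentionally crude; real tokens get checked by hand.
const NEGATION = /\bn(?:e\s+|’|')\s*(?:\S+\s+){0,4}?(pas|point)\b/gi;

function negationTokens(path) {
  const text = fs.readFileSync(path, 'utf8');
  const tokens = [];
  for (const line of text.split(/\r?\n/)) {
    for (const match of line.matchAll(NEGATION)) {
      tokens.push({ variant: match[1].toLowerCase(), context: line.trim() });
    }
  }
  return tokens;
}

function counts(tokens) {
  return {
    pas: tokens.filter(t => t.variant === 'pas').length,
    point: tokens.filter(t => t.variant === 'point').length
  };
}

const copyA = negationTokens('VyBaAAAAcAAJ.txt');   // placeholder file name
const copyB = negationTokens('dPRIAAAAcAAJ.txt');   // placeholder file name
console.log('VyBaAAAAcAAJ:', counts(copyA));
console.log('dPRIAAAAcAAJ:', counts(copyB));
```

Any passage where the two copies disagree (pas in one, point in the other) becomes a candidate for manual checking against the scanned pages.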

Introducing Selected Birthdays

If, like me, you have an Android phone, you probably use Google Calendar. I like the way it integrates with my contacts so that I can schedule events with people. I like the idea of it integrating with my Google+ contacts to automatically create a calendar of birthdays that I don’t want to miss. There’s a glitch in that, but I’ve created a new app to get around it, called Selected Birthdays.

The glitch is that the built-in Birthdays calendar has three options: show your Google Contacts, show your contacts and the people in your Google+ circles, or show nothing. I have a number of contacts who are attractive and successful people, but I’m sorry to say I have no interest in knowing when their birthdays are. Natasha Lomas has even stronger feelings.

Google doesn’t let you change the built-in Birthdays calendar, but it does let you create a new calendar and fill it with the birthdays that interest you. My new web app, Selected Birthdays, automates that process. It goes through your contacts, finds the ones who have shared their birthdays with you, and gives you a checklist. You decide whose birthdays to include, and Selected Birthdays will create a new calendar with those birthdays. It’ll also give you the option of hiding Google’s built-in birthday calendar.

I wrote the Selected Birthdays app in JavaScript with the Google+ and Google Calendar APIs. Ian Jones was a big help in recommending the moment.js library, which I used to manipulate dates. Bootflat helped me add a bit of visual style.
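For the curious, the heart of the calendar-writing step looks roughly like the sketch below. It assumes the Google API client (gapi) has already been loaded and authorized and that moment.js is available; the contact object, function name and calendar name are illustrative simplifications, so see the source on GitHub for what the app actually does.

```javascript
// Sketch: create a secondary calendar and add one all-day, yearly recurring
// event per selected contact. Assumes gapi.client.calendar (Calendar API v3)
// is loaded and the user has granted calendar access.
function createBirthdayCalendar(selectedContacts) {
  return gapi.client.calendar.calendars.insert({
    resource: { summary: 'Selected Birthdays' }
  }).then(function (response) {
    var calendarId = response.result.id;

    var inserts = selectedContacts.map(function (contact) {
      var start = moment(contact.birthday);      // e.g. "1980-02-29"
      var end = start.clone().add(1, 'day');     // all-day events end the next day
      return gapi.client.calendar.events.insert({
        calendarId: calendarId,
        resource: {
          summary: contact.name + "'s birthday",
          start: { date: start.format('YYYY-MM-DD') },
          end: { date: end.format('YYYY-MM-DD') },
          recurrence: ['RRULE:FREQ=YEARLY']
        }
      });
    });
    return Promise.all(inserts);
  });
}
```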

For the app to work you’ll have to authorize it to read your contacts and write your calendars. For your privacy, the app communicates directly between your browser and Google’s server; once you download it there is no further contact with my server. There is no way for me to see or edit your contacts or calendars. You can verify that in the source code.

Please let me know if you have any comments, questions or suggestions. I have also made the code available on GitHub for free under the Apache License, if you want to build on it. A number of people have said they wish they had an app like this for Facebook. If enough of you repeat that, I’ll look into it!

Prejudice and intelligibility

Last month I wrote about the fact that intelligibility – the ability of native speakers of one language or dialect to understand a closely related one – is not constant or automatic. A major factor in intelligibility is familiarity: when I was a kid, for example, I had a hard time understanding the Beatles until I got used to them. Having lived in North Carolina, I find it much easier to understand people from Ocracoke Island than my students do.

Prejudice can play a big role in intelligibility, as Donald Rubin showed in 1992. (I first heard about this study from Rosina Lippi-Green’s book English With an Accent.) At the time, American universities had recently increased the overall number of instructors from East Asia they employed, and some students complained that they had difficulty understanding the accents of their instructors.

In an ingenious experiment, Rubin demonstrated that much of this difficulty was due to prejudice. He recorded four-minute samples of “a native speaker of English raised in Central Ohio” reading a script for introductory-level lectures on two different subjects and played those samples to three groups of students.

For one group, a still photo of a “Caucasian” woman representing the instructor was projected on a screen while the audio sample was played. For the second group, a photo of “an Asian (Chinese)” woman was projected while the same audio of the woman from central Ohio (presumably not of Asian ancestry) was played. The third group heard only the audio and was not shown a photo.

In a survey they took after hearing the clip, most of the students who saw the picture of an Asian woman reported that the speaker had “Oriental/Asian ethnicity.” That’s not surprising, because it’s essentially what they were told by being shown the photograph. But many of these students went further and reported that the person in the recording “speaks with a foreign accent.” In contrast, the vast majority of the students who were shown the “Caucasian” picture said that they heard “an American accent.”

The kicker is that immediately after they heard the recording (and before answering the survey), Rubin tested the students on their comprehension of the content of the excerpt, by giving them a transcript with every seventh word replaced by a blank. The students who saw a picture of an Asian woman not only thought they heard a “foreign accent,” but they did worse on the comprehension task! Rubin concluded that “listening comprehension seemed to be undermined simply by identifying (visually) the instructor as Asian.”

Rubin’s subjects may not have felt any particular hostility towards people from East Asia, but they had a preconceived notion that the instructor would have an accent, and they assumed that they would have difficulty understanding her, so they didn’t bother trying.

This study (and a previous one by Rubin with Kim Smith) connects back to what I was saying about familiarity, and I will discuss that and power imbalances in a future post, but this finding is striking enough to merit its own post.

Ten reasons why sign-to-speech is not going to be practical any time soon

It’s that time again! A bunch of really eager computer scientists have a prototype that will translate sign language to speech! They’ve got a really cool video that you just gotta see! They win an award! (from a panel that includes no signers or linguists). Technology news sites go wild! (without interviewing any linguists, and sometimes without even interviewing any deaf people).

…and we computational sign linguists, who have been through this over and over, every year or two, just *facepalm*.

The latest strain of viral computational sign linguistics hype comes from the University of Washington, where two hearing undergrads have put together a system that supposedly recognizes isolated hand gestures in citation form. But you can see the potential! *facepalm*.

Twelve years ago, after already having a few of these *facepalm* moments, I wrote up a summary of the challenges facing any computational sign linguistics project and published it as part of a paper on my sign language synthesis prototype. But since most people don’t have a subscription to the journal it appeared in, I’ve put together a quick summary of Ten Reasons why sign-to-speech is not going to be practical any time soon.

  1. Sign languages are languages. They’re different from spoken languages. Yes, that means that if you think of a place where there’s a sign language and a spoken language, they’re going to be different. More different than English and Chinese.
  2. We can’t do this for spoken languages. You know that app where you can speak English into it and out comes fluent Pashto? No? That’s because it doesn’t exist. The Army has wanted an app like that for decades, and they’ve been funding it up the wazoo, and it’s still not here. Sign languages are at least ten times harder.
  3. It’s complicated. Computers aren’t great with natural language at all, but they’re better with written language than spoken language. For that reason, people have broken the speech-to-speech translation task down into three steps: speech-to-text, machine translation, and text-to-speech.
  4. Speech to text is hard. When you call a company and get a message saying “press or say the number after the tone,” do you press or say? I bet you don’t even call if you can get to their website, because speech to text suuucks:

    -Say “yes” or “no” after the tone.
    -No.
    -I think you said, “Go!” Is that correct?
    -No.
    -My mistake. Please try again.
    -No.
    -I think you said, “I love cheese.” Is that correct?
    -Operator!

  5. There is no text. A lot of people think that text for a sign language is the same as the spoken language, but if you think about point 1 you’ll realize that that can’t possibly be true. Well, why don’t people write sign languages? I believe it can be done, and lots of people have tried, but for some reason it never seems to catch on. It might just be the classifier predicates.
  6. Sign recognition is hard. There’s a lot that linguists don’t know about sign languages already. Computers can’t even get reliable signs from people wearing gloves, never mind video feeds. This may be better than gloves, but it doesn’t do anything with facial or body gestures.
  7. Machine translation is hard going from one written (i.e. written version of a spoken) language to another. Different words, different meanings, different word order. You can’t just look up words in a dictionary and string them together. Google Translate is only moderately decent because it’s throwing massive statistical computing power at the input – and that only works for languages with a huge corpus of text available.
  8. Sign to spoken translation is really hard. Remember how in #5 I mentioned that there is no text for sign languages? No text, no huge corpus, no machine translation. I tried making a rule-based translation system, and as soon as I realized how humongous the task of translating classifier predicates was, I backed off. Matt Huenerfauth has been trying (PDF), but he knows how big a job it is.
  9. Sign synthesis is hard. Okay, that’s probably the easiest problem of them all. I built a prototype sign synthesis system in 1997, I’ve improved it, and other people have built even better ones since.
  10. What is this for, anyway? Oh yeah, why are we doing this? So that Deaf people can carry a device with a camera around, and every time they want to talk to a hearing person they have to mount it on something, stand in a well-lighted area and sign into it? Or maybe someday have special clothing that can recognize their hand gestures, but nothing for their facial gestures? I’m sure that’s so much better than decent funding for interpreters, or teaching more people to sign, or hiring more fluent signers in key positions where Deaf people need the best customer service.

So I’m asking all you computer scientists out there who don’t know anything about sign languages, especially anyone who might be in a position to fund something like this or give out one of these gee-whiz awards: Just stop. Take a minute. Step back from the tech-bling. Unplug your messiah complex. Realize that you might not be the best person to decide whether or not this is a good idea. Ask a linguist. And please, ask a Deaf person!

Note: I originally wrote this post in November 2013, in response to an article about a prototype using Microsoft Kinect. I never posted it. Now I’ve seen at least three more, and I feel like I have to post this. I didn’t have to change much.

Including linguistics at literary conferences

I just got back from attending my second meeting of the Northeast Modern Language Association. My experience at both conferences has been very positive: friendly people, interesting talks, good connections. But I would like to see a little more linguistics at NeMLA, and better opportunities for linguists to attend. I’ve talked with some of the officers of the organization about this, and they have told me they welcome more papers from linguists.

One major challenge is that the session calls tend to be very specific and/or literary. Here are some examples from this year’s conference:

  • The Language of American Warfare after World War II
  • Representing Motherhood in Contemporary Italy
  • ‘Deviance’ in 19th-century French Women’s Writing

There is nothing wrong with any of these topics, but when they are all that specific, linguistic work can easily fall through the cracks. For several years I scanned the calls and simply failed to find anything where my work would fit. The two papers that I have presented are both pedagogical (in 2014 on using music to teach French, and this year on using accent tag videos to teach language variation and language attitudes). I believe that papers about the structure of language can find an audience at NeMLA, when there are sessions where they can fit.

In contrast, the continental MLA tends to have several calls with broader scope: an open call for 18th-Century French, for example, as well as ones specifically related to linguistics. When I presented at the MLA in 2012 it was at a session titled “Change and Perception of Change in the Romance Languages,” organized by Chris Palmer (a linguist and all-around nice guy).

With all that in mind, if you are considering attending next year’s NeMLA in Baltimore, I would like to ask the following:

  • Would you consider submitting a session proposal by the April 29th deadline?
  • Would you like to co-chair a session with me? (please respond by private email)
  • What topics would you find most inviting for linguistics papers at a (mostly) literature conference?

I recognize that I have readers outside of the region. For those of you who do not live in northeastern North America, have you had similar experiences with literary conferences? Do you have suggestions for session topics – or session topics to avoid?

On mutual intelligibility and familiarity

There’s an idea that dialects are mutually intelligible and languages are mutually unintelligible. John McWhorter had a nice piece in the Atlantic where he summarized the evidence against this idea. There are two factors in mutual intelligibility that McWhorter does not mention: familiarity and power.

Ultimately we can place any pair of language varieties on a continuum of closeness. On one end, for example, I can speak my native Hudson Valley dialect of English and be understood very easily by speakers of the Northern Cities dialect on the other side of the mountains. On the other end of the continuum, when my neighbors from Bangladesh speak their native Bengali I have no idea what they are saying.

As McWhorter shows with examples like the “languages” Swedish and Danish and the “dialects” of Moroccan and Jordanian colloquial Arabic, the edge cases are much less clear. He talks about dialect continua like the one between French and Italian, where the variety spoken in each village is subtly different from that in the next village, but still understandable. When people from towns that are hundreds of miles apart meet, however, they cannot understand each other’s village dialect.

McWhorter simplifies things a bit when he says that in English, “speakers of different dialects within the same language can all understand each other, more or less. Cockney, South African, New Yorkese, Black, Yorkshire – all of these are mutually intelligible variations on a theme.” It’s not true that any English speaker can understand any other English speaker. Consider this video of the English spoken on Ocracoke Island, off the coast of North Carolina, produced by Walt Wolfram and his students:

I have played this video for my students here in New York, and none of them can understand it the first time. They can sometimes catch a few words, but often they identify the words incorrectly. I had similar difficulty the first time I heard it, but since then I’ve listened to it at least a dozen times.

I also spent a year living in Greenville, North Carolina, a town about a hundred miles inland from Ocracoke, where the inhabitants speak a related dialect. During that year my wife and I took several day trips to Raleigh (the state capital) and rural areas near Ocracoke, and spent a weekend on the island itself (a gorgeous, welcoming place).

What I observed in North Carolina in some ways resembles the dialect continua that McWhorter describes. Residents of mainland Hyde County sound a lot like Ocracokers, and people who grew up in Greenville (white people at least) sound kind of like people from mainland Hyde County. Raleigh is in the Piedmont region, and people there speak a Piedmont dialect with influence both from the nearby Coastal Plain and from economic migrants from Northern states. Data collected by Wolfram and his students largely corroborates that.

When we first moved to Greenville, I could understand people from Raleigh with no problem. I understood people from Greenville and the surrounding towns 95% or more of the time, but there were a few words that tripped me up, like “house” with its fronted syllable nucleus and front rounded offglide.

After a couple of months I felt a lot more comfortable understanding the Greenville accent, and the accents of Ocracoke and mainland Hyde County were no longer unintelligible. And that brings me back to the connection between familiarity and intelligibility.

Thinking back on it, I remembered that I used to have a much harder time understanding the Beatles. I didn’t really know what Eleanor Rigby and lovely Rita were doing and why, until I had listened to the songs over and over, and watched Dudley Moore movies, and met actual English people.

I didn’t have to do anything on this level to understand Billy Joel or Paul Simon, who sang in my (literal) mother tongue – their Hicksville and Queens accents are very close to my mom’s Bronx accent. I understood the Beatles, and the Rolling Stones, much better when they affected American accents in “Rocky Raccoon” and “Wild Horses.” Yes, they were affecting some kind of Southern or Western accent, but it wasn’t a coastal North Carolina accent, it was a pastiche they had picked up from movies and music, and I knew it as well as they did. Plus, my father was from Texas.

My point is that our dialects aren’t as mutually intelligible as we like to say they are. We don’t typically have to learn them the way we learn a more distant variety like German or Fulani, but there is a role of learning and familiarity.

I’m sure McWhorter knows this; there’s only so much you can fit in an article, and it was tangential to his main point. Similarly, above I mentioned the role of power in intelligibility, and I’ll write about that in a future post.

Bloomberg at a press conference on Univision

The Mayor’s speech

The spectacle of two bilingual Presidential candidates arguing in Spanish last week reminded me of the Twitter feed, “Miguel Bloombito,” created by Rachel Figueroa-Levin to mock our former Mayor’s Spanish for the amusement of her friends. I may be coming late to the party here, but Bloombito is still tweeting, and was recently mentioned by one of my fellow linguists. If Bloomberg runs for President we can probably expect to hear more from El Bloombito, so it’s not too late to say how dismayed I was by this parody as a linguist, as a language teacher, as a non-native Spanish speaker and as a New Yorker.

If Bloombito were simply a fun, jokey phenomenon, punching up at a privileged white billionaire who needs no defending, I wouldn’t spend time on it. But the context is not as simple as that. Figueroa-Levin’s judgment is linguistically naive, and rests on a confusion of pronunciation with overall competence, and an implied critique of language learning that sets the bar so high that most of the world’s population can never meet it.

Figueroa-Levin says that “I think he’s just reading something on a card,” and maybe he does that with Spanish in the same contexts as with English, but that is not all there is to his Spanish. As reporter Juan Manuel Benítez told the New York Times, “the mayor’s Spanish is a lot better than a lot of people really think it is.”

Unsurprisingly, then, the tweets of El Bloombito do not actually resemble the Mayor’s Spanish very much at all. Instead they are a caricature of bad Spanish, with bad morphology and syntax, and lots of English mixed in. Linguists actually agree that mixing two languages is generally a sign of competence in both languages, and New York Spanish has several English borrowings that are absolutely standard. In contrast, the fictional Bloombito mixes them in ways that no real speaker does, like adding Spanish gender markers to every English noun.

For years now, as the population of native Spanish speakers has grown, politicians have made an effort to speak the language in public. With President George W. Bush and Governor George Pataki, Spanish seemed mostly symbolic. But Bloomberg seems to have taken more seriously the fact that twenty percent of the city’s population speaks Spanish at home.

The most noticeable feature of the actual Michael Bloomberg’s Spanish is a very strong American accent. He has no real success in pronouncing sounds that are specific to Spanish, like the flapped /r/ or the pure /o/, substituting sounds from his Boston/New York English. But in addition, when he says a Spanish word that has an English cognate his pronunciation tends to sound closer to the English word, giving the impression that he is using more English words than he really is.

There are ways of rendering these mispronunciations into Spanish, but Figueroa-Levin does not use them in her parody, probably because her audience doesn’t know Spanish well enough to get the joke. She also confuses accent for overall competence in the language. But if you listen beyond his accent, Bloomberg displays a reasonable degree of competence in Spanish. He often reads from prepared remarks, as with English, but he is able to speak extemporaneously in Spanish. In particular, he is able to understand fairly complex questions and give thoughtful responses to them on the spot, as in this discussion of the confirmation of Justice Sotomayor:

The bottom line is that, as Bloomberg said in the first clip, “Es difícil para aprender un nuevo idioma” – roughly, “It’s difficult to learn a new language.” My experience teaching ESL and French has confirmed that. No adult, especially not a man in his sixties, is going to achieve nativelike fluency. But we can achieve the kind of mastery that Bloomberg has. And this city runs on the work of millions of people who speak English less well than Bloomberg speaks Spanish, but still manage to get things done.

In fact, before Bloombito I used clips of Mayor Bloomberg to reassure my ESL students that they could still function in a foreign language and be respected, even with a thick accent. After Bloombito I can no longer give them that assurance.

In the Salon interview Figueroa-Levin makes the argument that this kind of language work is best left to professionals, as Bloomberg did with American Sign Language, for example, and that Bloomberg was doing his Spanish-speaking constituents a disservice by speaking it imperfectly. I have made similar arguments regarding interpreters and translators. But speaking to the media and constituents in a foreign language is nowhere near as difficult as interpreting, and does not need to be professionalized. I’m sure the Mayor always had fluent Spanish speaking staffers nearby to fall back on as well.

What I find particularly disturbing about Miguel Bloombito is the symbolism. For centuries in this country speakers of other languages, particularly Spanish, have been expected to speak English in addition to whatever else they are trying to do (work, advocacy, civic participation). English has been associated with power, Spanish with subjugation.

Figueroa-Levin told Salon, “You get this sense that he thinks we should be honored that he would even attempt to speak Spanish.” What she gets wrong is that this is not just an empty gesture, like memorizing a few words. Here we have a native English speaker, one of the most powerful people in the country, who puts in significant time and effort every day to learn Spanish, and people mock him for it. It’s like if someone saw the Pope washing the feet of homeless people, criticized him on his technique, and told him to let a licensed pedicurist do the job. I could say more, but I’ve run out of polite things to say, so I’ll leave the last word to Carlos Culerio, the man on the street interviewed in the first clip above:

“I feel especially proud, as a Dominican, that Mayor Bloomberg speaks Spanish. It’s a matter of pride for us as Hispanics.”