The spectacle of two bilingual Presidential candidates arguing in Spanish last week reminded me of the Twitter feed, “Miguel Bloombito,” created by Rachel Figueroa-Levin to mock our former Mayor’s Spanish for the amusement of her friends. I may be coming late to the party here, but Bloombito is still tweeting, and was recently mentioned by one of my fellow linguists. If Bloomberg runs for President we can probably expect to hear more from El Bloombito, so it’s not too late to say how dismayed I was by this parody as a linguist, as a language teacher, as a non-native Spanish speaker and as a New Yorker.

If Bloombito were simply a fun, jokey phenomenon, punching up at a privileged white billionaire who needs no defending, I wouldn’t spend time on it. But the context is not as simple as that. Figueroa-Levin’s judgment is linguistically naive, and rests on a confusion of pronunciation with overall competence, and an implied critique of language learning that sets the bar so high that most of the world’s population can never meet it.

Figueroa-Levin says, “I think he’s just reading something on a card,” and maybe he does read from a card in Spanish in the same contexts where he would in English, but that is not all there is to his Spanish. As reporter Juan Manuel Benítez told the New York Times, “the mayor’s Spanish is a lot better than a lot of people really think it is.”

Unsurprisingly, then, the tweets of El Bloombito do not actually resemble the Mayor’s Spanish very much at all. Instead they are a caricature of bad Spanish, with bad morphology and syntax, and lots of English mixed in. Linguists actually agree that mixing two languages is generally a sign of competence in both languages, and New York Spanish has several English borrowings that are absolutely standard. In contrast, the fictional Bloombito mixes them in ways that no real speaker does, like adding Spanish gender markers to every English noun.

For years now, as the population of native Spanish speakers has grown, politicians have made an effort to speak the language in public. With President George W. Bush and Governor George Pataki, Spanish seemed mostly symbolic. But Bloomberg seems to have taken more seriously the fact that twenty percent of the city’s population speaks Spanish at home.

The most noticeable feature of the actual Michael Bloomberg’s Spanish is a very strong American accent. He has no real success in pronouncing sounds that are specific to Spanish, like the flapped /r/ or the pure /o/, substituting sounds from his Boston/New York English. But in addition, when he says a Spanish word that has an English cognate his pronunciation tends to sound closer to the English word, giving the impression that he is using more English words than he really is.

There are ways of rendering these mispronunciations in written Spanish, but Figueroa-Levin does not use them in her parody, probably because her audience doesn’t know Spanish well enough to get the joke. She also confuses accent with overall competence in the language. But if you listen beyond his accent, Bloomberg displays a reasonable degree of competence in Spanish. He often reads from prepared remarks, as he does in English, but he is also able to speak extemporaneously in Spanish. In particular, he is able to understand fairly complex questions and give thoughtful responses to them on the spot, as in this discussion of the confirmation of Justice Sotomayor:

The bottom line is that, as Bloomberg said in the first clip, “Es difícil para aprender un nuevo idioma.” My experience teaching ESL and French has confirmed that. No adult, especially not a man in his sixties, is going to achieve nativelike fluency. But we can achieve the kind of mastery that Bloomberg has. And this city runs on the work of millions of people who speak English less well than Bloomberg speaks Spanish, but still manage to get things done.

In fact, before Bloombito I used clips of Mayor Bloomberg to reassure my ESL students that they could still function in a foreign language and be respected, even with a thick accent. After Bloombito I can no longer give them that assurance.

In the Salon interview Figueroa-Levin argues that this kind of language work is best left to professionals, as Bloomberg did with American Sign Language, for example, and that Bloomberg was doing his Spanish-speaking constituents a disservice by speaking their language imperfectly. I have made similar arguments regarding interpreters and translators. But speaking to the media and constituents in a foreign language is nowhere near as difficult as interpreting, and does not need to be professionalized. I’m sure the Mayor always had fluent Spanish-speaking staffers nearby to fall back on as well.

What I find particularly disturbing about Miguel Bloombito is the symbolism. For centuries in this country speakers of other languages, particularly Spanish, have been expected to speak English in addition to whatever else they are trying to do (work, advocacy, civic participation). English has been associated with power, Spanish with subjugation.

Figueroa-Levin told Salon, “You get this sense that he thinks we should be honored that he would even attempt to speak Spanish.” What she gets wrong is that this is not just an empty gesture, like memorizing a few words. Here we have a native English speaker, one of the most powerful people in the country, who puts in significant time and effort every day to learn Spanish, and people mock him for it. It’s like if someone saw the Pope washing the feet of homeless people, criticized him on his technique, and told him to let a licensed pedicurist do the job. I could say more, but I’ve run out of polite things to say, so I’ll leave the last word to Carlos Culerio, the man on the street interviewed in the first clip above:

“I feel especially proud, as a Dominican, that Mayor Bloomberg speaks Spanish. It’s a matter of pride for us as Hispanics.”

Sampling and the digital humanities

I was pleased to have the opportunity to announce some progress on my Digital Parisian Stage project in a lightning talk at the kickoff event for New York City Digital Humanities Week on Tuesday. One theme that was expressed by several other digital humanists that day was the sheer volume of interesting stuff being produced daily, and collected in our archives.

The author and child, sampling the mists at Yaddo in 2003

I was particularly struck by Micki McGee’s story of how working on the Yaddo archive challenged her commitment to “horizontality” – flattening hierarchies, moving beyond the “greats” and finding valuable work and stories beyond the canon. The archive was simply too big for her to give everyone the treatment they deserved. She talked about using digital tools to overcome that size, but still was frustrated in the end.

At the KeystoneDH conference this summer I found out about the work of Franco Moretti, who similarly uses digital tools to analyze large corpora. Moretti’s methods seem very useful, but on Tuesday we saw that a lot of people were simply not satisfied with “distant reading”:

I am of the school that sees quantitative and qualitative methods as two ends of a continuum of tools, all of which are necessary for understanding the world. This is not even a humanities thing: from geologists with hammers to psychologists in clinics, all the sciences rely on close observation of small data sets.

My colleague in the NYU Computer Science Department, Adam Myers, uses the same approach to do natural language processing; I have worked with him on projects like this (PDF). We begin with a close reading of texts from the chosen corpus, then decide on a set of interesting patterns to annotate. As we annotate more and more texts, the patterns come into sharper focus, and eventually we use these annotations to train machine learning routines.

One question that arises with these methods is what to look at first. There is an assumption of uniformity in physics and chemistry, so that scientists can assume that one milliliter of ethyl alcohol will behave more or less like any other milliliter of ethyl alcohol under similar conditions. People are much less interchangeable, leading to problems like WEIRD bias in psychology. Groups of people and their conventions are even more complex, making it even more unlikely that the easiest texts or images to study are going to give us an accurate picture of the whole archive.

Fortunately, this is a solved problem. Pierre-Simon Laplace figured out in 1814 that he could get a reasonable estimate of the population of the French Empire by looking at a representative sample of its départements, and subsequent generations have improved on his sampling techniques.

We may not be able to analyze all the things, but if we study enough of them we may be able to get a good idea of what the rest are like. William Sealy “Student” Gosset developed his famous t-test precisely to avoid having to analyze all the things. His employers at the Guinness Brewery wanted to compare different strains of barley without testing every plant in the batch. The p-value told them whether they had sampled enough plants.

I share McGee’s appreciation of “horizontality” and looking beyond the greats, and in my Digital Parisian Stage corpus I achieved that horizontality with the methods developed by Laplace and Student. The creators of the FRANTEXT corpus chose its texts using the “principle of authority,” in essence just using the greats. For my corpus I built on the work of Charles Beaumont Wicks, taking a random sample from his list of all the plays performed in Paris between 1800 and 1815.
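For readers who want to try this on their own catalog, here is a minimal sketch of drawing a simple random sample in Python. It is not the procedure actually used for the corpus. The catalog below is a hypothetical list of placeholder IDs rather than real Wicks entries, and the catalog size of 3,100, the function name `sample_catalog`, and the seed are all assumptions chosen for illustration.

```python
import random


def sample_catalog(catalog, fraction, seed):
    """Draw a simple random sample (without replacement) from a catalog.

    A fixed seed makes the draw reproducible, so that others can
    re-create exactly the same sample from the same catalog.
    """
    sample_size = max(1, round(len(catalog) * fraction))
    rng = random.Random(seed)
    return rng.sample(catalog, sample_size)


# HYPOTHETICAL catalog: placeholder IDs, not actual catalog entries.
catalog = [f"play-{n:04d}" for n in range(1, 3101)]

# Assumed one percent sample with an arbitrary seed.
sample = sample_catalog(catalog, fraction=0.01, seed=1800)
print(len(sample), "plays selected")
```

The point of the sketch is that every item in the catalog has the same chance of being picked, whether or not anyone has ever considered it one of the greats.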

What I found was that characters in the randomly selected plays used a lot less of the conservative ne alone construction to negate sentences than characters in the FRANTEXT plays. This seems to be because the FRANTEXT plays focused mostly on aristocrats making long declamatory speeches, while the randomly selected plays also included characters who were servants, peasants, artisans and bourgeois, often in faster-moving dialogue. The characters from the lower classes tended to use much more of the ne … pas construction, while the aristocrats tended to use ne alone.

Student’s t-test tells me that the difference I found in the relative frequency of ne alone in just four plays was big enough that I could be confident of finding the same pattern in other plays. Even so, I plan to produce the full one percent sample (31 plays) so that I can test for differences that might be smaller.
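To make the arithmetic concrete, here is a rough sketch of a two-sample Student’s t statistic with pooled variance, using only the Python standard library. The per-play proportions below are invented for illustration and are not the figures from my corpus. The grouping into four plays per corpus and the function name `student_t` are likewise assumptions.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Return (t, df) for a two-sample Student's t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # statistics.variance computes the sample variance (n - 1 denominator).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# INVENTED data: share of negations using "ne" alone, one value per play.
canon_plays = [0.40, 0.50, 0.60, 0.50]
random_plays = [0.10, 0.20, 0.30, 0.20]

t_stat, df = student_t(canon_plays, random_plays)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

With six degrees of freedom, the conventional two-tailed 5% critical value is about 2.447, so a t statistic larger than that in absolute value would count as significant at that level. Smaller differences between corpora yield smaller t values, which is why a larger sample is needed to detect them.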

It’s important for me to point out here that this kind of analysis still requires a fairly close reading of the text. Someone might say that I just haven’t come up with the right regular expression or parser, but at this point I don’t know of any automatic tools that can reliably distinguish the negation phenomena that interest me. I find that to really get an accurate picture of what’s going on I have to not only read several lines before and after each instance of negation, but in fact the entire play. Sampling reduces the number of times I have to do that reading, to bring the overall workload down to a reasonable level.

Okay, you may be saying, but I want to analyze all the things! Even a random sample isn’t good enough. Well, if you don’t have the time or the money to analyze all the things, a random sample can make the case for analyzing everything. For example, I found several instances of the pas alone construction, which is now common but was rare in the early nineteenth century. I also turned up the script for a pantomime about the death of Captain Cook that gave the original Hawaiian characters a surprising (given what little I knew about these attitudes) level of intelligence and agency.

If either of those findings intrigued you and made you want to work on the project, or fund it, or hire me, that illustrates another use of sampling. (You should also email me.) Sampling gives us a place to start outside of the “greats,” where we can find interesting information that may inspire others to get involved.

One final note: the first step to getting a representative sample is to have a catalog. You won’t be able to generalize to all the things until you have a list of all the things. This is why my Digital Parisian Stage project owes so much to Beaumont Wicks. This “paper and ink” humanist spent his life creating a list of every play performed in Paris in the nineteenth century – the catalog that I sampled for my corpus.