Teaching language variation with accent tag videos

Last January I wrote that the purpose of phonetic transcription is to talk about differences in pronunciation. Last December I introduced accent tags, a fascinating genre of self-produced YouTube videos that amounts to crowdsourced dialectology and makes a great source of data about language variation. I put the two together when I was teaching a unit on language variation for the second-semester Survey of Linguistics course at Saint John’s University. When I learned about language variation as an undergraduate, it was exciting to see accents treated as a legitimate object of study, and gratifying to see my family’s accents taken seriously.

At the same time, the unit’s focus on one dialect at a time contrasts with the absence of variation from the discussion of English pronunciation, grammar and lexis in other units, and from the way English is typically taught elsewhere. This implies that there is a single standard that does not vary, despite evidence from perceptual dialectology (such as Dennis Preston’s work) that language norms are fragmentary, incomplete and contested. I saw the cumulative effects of this devaluation in class discussions, when students openly denigrated features of the New York accents spoken by their neighbors, their families and often the students themselves.

At first I just wanted to illustrate variation in African American accents, but then I realized that accent tags allowed me to set up the exercises as an explicit contrast between two varieties. I asked my students to search YouTube for an accent tag that “sounds like you” and one that sounded different, and to find differences between the two in pronunciation, vocabulary and grammar. I followed up with other exercises asking students to compare two accent tags from the same place but with different ethnic, economic or gender backgrounds.

My students did a great job at finding videos that sounded like them. Most of them were from the New York area, and were able to find accent tags made by people from New York City, Long Island or northern New Jersey. Some students were African American or Latin American, and were able to find videos that demonstrated the accents, vocabulary and grammar common among those groups. The rest of the New York students did not have any features that we noticed as ethnic markers, and whether the students were Indian, Irish or Circassian, they were satisfied that the Italian or Jewish speakers in the videos sounded pretty much like them.

Some of the students were from other parts of the country, and found accent tags from California or Boston that illustrated features that the students shared. A student from Zimbabwe who is bilingual in English and Shona was not able to find any accent tags from her country, but she found a video made by a white South African and was able to identify features of English pronunciation, vocabulary and grammar that they shared.

As I wrote last year, the phonetic transcription exercises I had done in introductory linguistics and phonology courses were difficult because they implicitly referred to unspecified standard pronunciations, leading to confusion among the students about the “right” transcriptions. In the variation unit, when I framed the exercise as an explicit comparison between something that “sounds like you” and something different, I removed the implied value judgment and replaced it with a neutral investigation of difference.

I found that this exercise was easier for the students than the standard transcription problems, because it gave them two recordings to compare instead of asking them to compare one recording against their imagination of the “correct” or “neutral” pronunciation. I realized that this could be used for the regular phonetics units as well. I’ll talk about my experiences with that in a future post.

African American English has accents too

Diversity is notoriously subjective and difficult to pin down. In particular, we tend to be impressed if we know the names of a lot of categories for something. We might think there are more mammal species than insect species, but biologists tell us that there are hundreds of thousands of species of beetles alone. The same is true in language: we think of the closely related Romance and Germanic languages as separate, while missing the incredible diversity of “dialects” of Chinese or Arabic.

This is also true of English. As an undergraduate I was taught that there were four dialects in American English: New England, North Midland, South Midland and Coastal Southern. Oh yeah, and New York and Black English. The picture for all of those is more complicated than it sounds, and it wasn’t until I went to Chicago that I discovered that there are regional varieties of African American English.

In 2012 Annie Minoff, a blogger for the Chicago public radio station WBEZ, took this oversimplification for truth: “AAE is remarkable for being consistent across urban areas; that is, Boston AAE sounds like New York AAE sounds like L.A. AAE, etc.” Fortunately a commenter, Amanda Hope, challenged her on that assertion. Minoff confirmed the existence of regional variation in an interview with the variationist Walt Wolfram, and posted a correction in 2013.

In 2013 I was preparing to teach a unit on language variation and didn’t want to leave my students as misinformed as I – or Minoff – had been. Many of my students were African American, and I saw no reason to spend most of the unit on white varieties and leave African American English as a footnote. But the documentation is spotty: I know of no good undergraduate-level discussion of variation in African American English.

A few years before, I had found a video that some guy took of a party in a parking lot on the West Side of Chicago. It wasn’t ideal, but it sort of gave you an idea. By then the link was dead, so I typed “Chicago West Side” into Google. The results were not promising, so on a whim I added “accent,” and that’s how I found my first accent tag video.

Accent tag videos are an amazing thing, and I could write a whole series of posts about them. Here was a young black woman from Chicago’s West Side, not only talking about her accent but illustrating it, with words and phrases to highlight its differences from other dialects. She even talks (as many people do in these videos) about how other African Americans hear her accent in other places, like North Carolina. You can compare it (as I did in class) with a similar video made by a young black woman from Raleigh (or New York or California), and the differences are impossible to ignore.

In fact, when Amanda Hope challenged Minoff’s received wisdom on African American regional variation, she used accent tag videos to illustrate her point. These videos are wonderful for teaching about language and linguistics, and from then on I made extensive use of them in my courses. There’s also a video made by two adorable young English women, one from London and one from Bolton near Manchester, where you can hear their accents contrasted in conversation. I like that I can go not just around the country but around the world (Nigeria, Trinidad, Jamaica), illustrating the diversity of English just among women of African descent, who often go unheard in these discussions. I’ll talk more about accent tag videos in future posts.

You can also find evidence of regional variation in African American English on Twitter. Taylor Jones has a great post about it that also goes into the history of African American varieties of English.

“Said” for 2016 Word of the Year

I just got back from the American Association for Corpus Linguistics conference in Ames, Iowa, and I’m calling the Word of the Year: for 2016 it will be said.

You may think you know said. It’s the past participle of say. You’ve said it yourself many times. What’s so special about it?

What’s special was revealed by Jordan Smith, a graduate student at Iowa State, in his presentation on Saturday afternoon. Said is becoming a determiner. It is grammaticizing.

In addition to its participial use (“once the words were said”) you’ve probably seen said used as an attributive adjective (“the said property”). It indicates that the noun it modifies refers to a person, place or thing that has been mentioned recently, with the same noun, and that the speaker/writer expects it to be active in the hearer/reader’s memory.

Attributive said is strongly associated with legal documents, as in its first recorded use in the English Parliament in 1327. The Oxford English Dictionary reports that said was used outside of legal contexts as early as 1973, in the English sitcom Steptoe and Son. In this context it was clearly a joke: a word that evoked law courts used in a lower-class colloquial context.

Jordan Smith examined uses of said in the Corpus of Contemporary American English (COCA) and found that attributive said has increasingly been used without the for several years now, and outside the legal domain. He observes that syntactic changes and increased frequency have been named by linguists like Joan Bybee as harbingers of grammaticization.
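To make the pattern concrete, here is a rough sketch of the kind of search involved. COCA is queried through its own web interface, so this is only an illustration over an arbitrary plain-text sample, not a reproduction of Smith’s method; it assumes the Python nltk package with its tokenizer and tagger data downloaded.

```python
import nltk  # assumes the punkt tokenizer and perceptron tagger data

def attributive_said(text):
    """Find 'said' directly modifying a noun, with no 'the' in front of it."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    hits = []
    for i in range(1, len(tagged) - 1):
        word = tagged[i][0]
        next_tag = tagged[i + 1][1]
        prev_word = tagged[i - 1][0]
        if (word.lower() == "said"
                and next_tag.startswith("NN")      # followed by a noun
                and prev_word.lower() != "the"):   # no article in front
            hits.append(" ".join(t[0] for t in tagged[i - 1:i + 2]))
    return hits

print(attributive_said("He texted said girlfriend, but the said property was sold."))
# -> ['texted said girlfriend']; 'the said property' is excluded
```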

Grammaticization (also known as grammaticalization; search for both) is when an ordinary lexical item (like a noun, verb or adjective, or even a phrase) becomes a grammatical item (like a pronoun, preposition or auxiliary verb). For example, while is a noun meaning a period of time, but it was grammaticized to a conjunction indicating simultaneity. Used is an adjective meaning accustomed, as in “I was used to being lonely,” but has also become part of an auxiliary indicating habitual aspect as in “I used to be lonely.”

Jordan is suggesting that said is no longer just a verb or even an adjective; it’s our newest determiner in English. Determiners are an exclusive club of short words that modify nouns. They include articles like an and the, but also demonstratives like these and quantifiers like several.

Noun phrases without a determiner tend to refer to generic categories, as I have been doing with phrases like legal documents and grammaticization. That is clearly not what is going on with said girlfriend. Noun phrases with said refer to a specific item or group of items, in some sense even more so than noun phrases with the.

Thanks to the wireless Internet at the AACL, I began searching for of said on Twitter, and found a ton of examples. There are plenty of examples for in said as well.

It’s not just happening in English. The analogous French ledit is also used outside the legal domain. Its reanalysis is a bit different, since it incorporates the article rather than replacing it. Like most noun modifiers in French it is inflected for gender and number. I haven’t found anything similar for Spanish.

In 2013 the American Dialect Society chose because as its Word of the Year. Because is already a conjunction, having grammaticized from the noun cause, but it has been reanalyzed again into a preposition, as in because science. Some theorists consider this to be a further step in grammaticization. And here is a twenty-first century prepositional phrase for you, folks: because (P) said (Det) relationship (N).

After Jordan’s presentation it struck me that said is an excellent candidate for the 2016 Word of the Year. And if the ADS isn’t interested, maybe another organization, like the International Cognitive Linguistics Association, can sponsor a Grammaticization of the Year.

Printing differences and material issues in Google Books

I am looking forward to presenting my Digital Parisian Stage corpus and the exciting results I’ve gotten from it so far at the American Association for Corpus Linguistics at Iowa State in September. In the meantime I’m continuing to process texts, working towards a one percent sample from the Napoleonic period (Volume 1 of the Wicks catalog).

One of the plays in my sample is les Mœurs du jour, ou l’école des femmes, a comedy by Collin-Harleville (also known as Jean-François Collin d’Harleville). I ran the initial OCR on a PDF scanned for the Google Books project. For reasons that will become clear, I will refer to it by its Google Books ID, VyBaAAAAcAAJ. When I went to clean up the OCR text, I discovered that it was missing pages 2-6. I emailed the Google Books team about this, and got the following response:

[Screenshot: the Google Books team’s response, citing “a material issue”]

I’m guessing “a material issue” means that those pages were missing from the original paper copy, but I didn’t even bother emailing until the other day, since I found another copy in the Google Books database, with the ID kVwxUp_LPIoC.
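For anyone building a similar corpus, the OCR step can be reproduced with off-the-shelf tools. Here is a minimal sketch, assuming the Python packages pdf2image and pytesseract plus Tesseract’s French model; the filenames are hypothetical, and this is not necessarily the exact toolchain I used.

```python
# Minimal OCR sketch for a downloaded Google Books PDF.
from pdf2image import convert_from_path
import pytesseract

pages = convert_from_path("VyBaAAAAcAAJ.pdf", dpi=300)  # one image per page
text = "\n".join(pytesseract.image_to_string(p, lang="fra")  # 'fra' = French
                 for p in pages)

with open("VyBaAAAAcAAJ.txt", "w", encoding="utf-8") as f:
    f.write(text)
```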

Comparing the OCR text of VyBaAAAAcAAJ with the PDF of kVwxUp_LPIoC, I discovered some differences in spelling. For example, throughout the text, words that end in the old-fashioned spelling -ois or -oit in VyBaAAAAcAAJ are spelled with the more modern -ais in kVwxUp_LPIoC. There are also differences in the way “Madame” is abbreviated (“Mad.” vs. “M.me”), in which accented letters keep their accents when set in small caps, and in pagination. Here is the entirety of Act III, Scene X in each copy:

Act III, Scene X in copy VyBaAAAAcAAJ

Act III, Scene X in copy kVwxUp_LPIoC

My first impulse was to look at the front matter and see if the two copies were identified as different editions or different printings. Unfortunately, they were almost identical, with the most notable differences being that VyBaAAAAcAAJ has an œ ligature in the title, while kVwxUp_LPIoC is signed by the playwright and marked as being a personal gift from him to an unspecified recipient. Both copies give the exact same dates: the play was first performed on the 7th of Thermidor in year VIII and published in the same year (1800).

The Google Books metadata indicate that kVwxUp_LPIoC was digitized from the Lyon Public Library, while VyBaAAAAcAAJ came from the Public Library of the Netherlands. The other copies I have found in the Google Books database, OyL1oo2CqNIC from the National Library of Naples and dPRIAAAAcAAJ from Ghent University, appear to be the same printing as kVwxUp_LPIoC, as does the copy from the National Library of France.

Since the -ais and M.me spellings are closer to the forms used in France today, we might expect that kVwxUp_LPIoC and its cousins are from a newer printing. But in Act III, Scene XI I came across a difference that concerns negation, the variable that I have been studying for many years. The decadent Parisians Monsieur Basset and Madame de Verdie question whether marriage should be eternal. Our hero Formont replies that he has no reason not to remain with his wife forever. In VyBaAAAAcAAJ he says, “je n’ai pas de raisons,” while in kVwxUp_LPIoC he says “je n’ai point de raisons” (both “I have no reasons,” with different negative particles).

Act III, Scene XI (page 75) in VyBaAAAAcAAJ

Act III, Scene XI (page 78) in kVwxUp_LPIoC

In my dissertation study I found that the relative use of ne … point had already peaked by the nineteenth century, and was being overtaken by ne … pas. If this play fits the pattern, the use of the more conservative pattern in kVwxUp_LPIoC goes against the more innovative -ais and M.me spellings.

I am not an expert in French Revolutionary printing (if anyone knows a good reference or contact, please let me know!). My best guess is that kVwxUp_LPIoC and the other -ais/M.me/ne … point copies are from a limited early run, some copies of which were given to the playwright to give away, while VyBaAAAAcAAJ is from a larger, slightly later, printing.

In any case, it is clear that I should pick one copy and make my text consistent with it. Since VyBaAAAAcAAJ is incomplete, I will use dPRIAAAAcAAJ. I will try to double-check all the spellings and wordings, but at the very least I will check every example of negation against dPRIAAAAcAAJ as I annotate it.
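A word-level diff makes this kind of checking much faster than paging through the scans. Here is a minimal sketch using Python’s difflib, assuming both copies have already been OCR’d to plain text; the filenames are hypothetical.

```python
# List word-level differences between two OCR'd copies, to catch variants
# like -ois/-ais spellings or pas/point negation.
import difflib

def word_diffs(path_a, path_b):
    a = open(path_a, encoding="utf-8").read().split()
    b = open(path_b, encoding="utf-8").read().split()
    matcher = difflib.SequenceMatcher(None, a, b)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            yield " ".join(a[i1:i2]), " ".join(b[j1:j2])

for old, new in word_diffs("VyBaAAAAcAAJ.txt", "dPRIAAAAcAAJ.txt"):
    print(f"{old!r} -> {new!r}")  # e.g. 'étois' -> 'étais', 'pas' -> 'point'
```

Of course OCR errors will show up in the diff alongside genuine printing differences, so each flagged pair still has to be checked against the page images.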

Prejudice and intelligibility

Last month I wrote about the fact that intelligibility – the ability of native speakers of one language or dialect to understand a closely related one – is not constant or automatic. A major factor in intelligibility is familiarity: when I was a kid, for example, I had a hard time understanding the Beatles until I got used to them. Having lived in North Carolina, I find it much easier to understand people from Ocracoke Island than my students do.

Photo: Theonlysilentbob / Wikimedia

Prejudice can play a big role in intelligibility, as Donald Rubin showed in 1992. (I first heard about this study from Rosina Lippi-Green’s book English With an Accent.) At the time, American universities had recently increased the overall number of instructors from East Asia they employed, and some students complained that they had difficulty understanding the accents of their instructors.

In an ingenious experiment, Rubin demonstrated that much of this difficulty was due to prejudice. He recorded four-minute samples of “a native speaker of English raised in Central Ohio” reading a script for introductory-level lectures on two different subjects and played those samples to three groups of students.

For one group, a still photo of a “Caucasian” woman representing the instructor was projected on a screen while the audio sample was played. For the second group, a photo of “an Asian (Chinese)” woman was projected while the same audio of the woman from Central Ohio (presumably not of Asian ancestry) was played. The third group heard only the audio and was not shown a photo.

In a survey they took after hearing the clip, most of the students who saw the picture of an Asian woman reported that the speaker had “Oriental/Asian ethnicity.” That’s not surprising, because it’s essentially what they were told by being shown the photograph. But many of these students went further and reported that the person in the recording “speaks with a foreign accent.” In contrast, the vast majority of the students who were shown the “Caucasian” picture said that they heard “an American accent.”

The kicker is that immediately after they heard the recording (and before answering the survey), Rubin tested the students on their comprehension of the content of the excerpt, by giving them a transcript with every seventh word replaced by a blank. The students who saw a picture of an Asian woman not only thought they heard a “foreign accent,” but they did worse on the comprehension task! Rubin concluded that “listening comprehension seemed to be undermined simply by identifying (visually) the instructor as Asian.”
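Rubin’s cloze procedure is easy to reproduce if you want to try something similar with your own students. Here is a minimal sketch; the sample sentence is mine, not from Rubin’s materials.

```python
# Blank out every seventh word of a transcript, as in Rubin's cloze task.
def make_cloze(transcript, n=7):
    words = transcript.split()
    return " ".join("_____" if (i + 1) % n == 0 else w
                    for i, w in enumerate(words))

sample = ("The speed of light in a vacuum is about three hundred "
          "thousand kilometers per second in every frame of reference.")
print(make_cloze(sample))
# -> The speed of light in a _____ is about three hundred thousand
#    kilometers _____ second in every frame of reference.
```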

Rubin’s subjects may not have felt any particular hostility towards people from East Asia, but they had a preconceived notion that the instructor would have an accent, and they assumed that they would have difficulty understanding her, so they didn’t bother trying.

This study (and a previous one by Rubin with Kim Smith) connects back to what I was saying about familiarity, and I will discuss that and power imbalances in a future post, but this finding is striking enough to merit its own post.

On mutual intelligibility and familiarity

There’s an idea that dialects are mutually intelligible and languages are mutually unintelligible. John McWhorter had a nice piece in the Atlantic where he summarized the evidence against this idea. There are two factors in mutual intelligibility that McWhorter does not mention: familiarity and power.

Ocracoke Village. Photo: Sloan Poe / Flickr

Ultimately we can place any pair of language varieties on a continuum of closeness. On one end, for example, I can speak my native Hudson Valley dialect of English and be understood very easily by speakers of the Northern Cities dialect on the other side of the mountains. On the other end of the continuum, when my neighbors from Bangladesh speak their native Bengali I have no idea what they are saying.

As McWhorter shows with examples like the “languages” Swedish and Danish and the “dialects” of Moroccan and Jordanian colloquial Arabic, the edge cases are much less clear. He talks about dialect continua like the one between French and Italian, where the variety spoken in each village is subtly different from that in the next village, but still understandable. When people from towns that are hundreds of miles apart meet, however, they cannot understand each other’s village dialect.

McWhorter simplifies things a bit when he says that in English, “speakers of different dialects within the same language can all understand each other, more or less. Cockney, South African, New Yorkese, Black, Yorkshire—all of these are mutually intelligible variations on a theme.” It’s not true that any English speaker can understand any other English speaker. Consider this video of the English spoken on Ocracoke Island, off the coast of North Carolina, produced by Walt Wolfram and his students:

I have played this video for my students here in New York, and none of them can understand it the first time. They can sometimes catch a few words, but often they identify the words incorrectly. I had similar difficulty the first time I heard it, but since then I’ve listened to it at least a dozen times.

I also spent a year living in Greenville, North Carolina, a town about a hundred miles inland from Ocracoke, where the inhabitants speak a related dialect. During that year my wife and I took several day trips to Raleigh (the state capital) and rural areas near Ocracoke, and spent a weekend on the island itself (a gorgeous, welcoming place).

What I observed in North Carolina in some ways resembles the dialect continua that McWhorter describes. Residents of mainland Hyde County sound a lot like Ocracokers, and people who grew up in Greenville (white people at least) sound kind of like people from mainland Hyde County. Raleigh is in the Piedmont region, and people there speak a Piedmont dialect with influence both from the nearby Coastal Plain and from economic migrants from Northern states. Data collected by Wolfram and his students largely corroborates that.

When we first moved to Greenville, I could understand people from Raleigh with no problem. I understood people from Greenville and the surrounding towns 95% or more of the time, but there were a few words that tripped me up, like “house” with its fronted syllable nucleus and front rounded offglide.

After a couple of months I felt a lot more comfortable understanding the Greenville accent, and the accents of Ocracoke and mainland Hyde County were no longer unintelligible. And that brings me back to the connection between familiarity and intelligibility.

Thinking back on it, I remembered that I used to have a much harder time understanding the Beatles. I didn’t really know what Eleanor Rigby and lovely Rita were doing and why, until I had listened to the songs over and over, and watched Dudley Moore movies, and met actual English people.

I didn’t have to do anything on this level to understand Billy Joel or Paul Simon, who sang in my (literal) mother tongue – their Hicksville and Queens accents are very close to my mom’s Bronx accent. I understood the Beatles, and the Rolling Stones, much better when they affected American accents in “Rocky Raccoon” and “Wild Horses.” Yes, they were affecting some kind of Southern or Western accent, but it wasn’t a coastal North Carolina accent, it was a pastiche they had picked up from movies and music, and I knew it as well as they did. Plus, my father was from Texas.

My point is that our dialects aren’t as mutually intelligible as we like to say they are. We don’t typically have to learn them the way we learn a more distant variety like German or Fulani, but there is still a role for learning and familiarity.

I’m sure McWhorter knows this; there’s only so much you can fit in an article, and it was tangential to his main point. Similarly, above I mentioned the role of power in intelligibility, and I’ll write about that in a future post.

Describing differences in pronunciation

Last month I wrote that instead of only two levels of phonetic transcription, “broad” and “narrow,” what people do in practice is to adjust their level of detail according to the point they want to make. In this it is like any other form of communication: too much detail can be a distraction.

But how do we decide how much detail to put in a given transcription, and how can we teach this to our students? In my experience there is always some kind of comparison. Maybe we’re comparing two speakers from different times, regions, ethnicities, first languages, social classes or anatomies. Maybe we’re comparing two utterances by the same person in different phonetic, semantic, social or emotional contexts.

Sometimes there is no overt comparison, but even then there is almost always an implicit one. If we are presenting a particular pronunciation, it is because we assume our readers will find it interesting, perhaps because it is pathological or nonstandard. This implies that there is a normal or standard pronunciation in our heads to contrast it with.

The existence of this comparison tells us the right level of detail to include in our transcriptions: enough to show the contrasts that we are describing, maybe a little more, but not so much as to distract from the contrast. If the contrast lies in tone, place of articulation or laryngeal timing, we include those details and leave out details about nasality, vowel tongue height or segment length.

This has implications for the way we teach transcription. For our students to learn the proper level of detail to include, they need practice comparing two pronunciations, transcribing both, and checking whether their transcriptions highlight the differences that they feel are most relevant to the current discussion.

I can illustrate this with a cautionary tale from my teaching just this past semester. I had found this approach of identifying differences to be useful, but students found the initial assignments overwhelming. So even as I was jotting down an early draft of this blog post, I told my students to transcribe just a single speech sample, put off the comparison assignments for later, and then put them off again.

As a result, I found myself focusing too much on some details while dismissing others. I could sense that my students were a bit frustrated, but I didn’t make the connection right away. I did ask them to compare two pronunciations on the final exam, and it went well, but not as well as it could have if they had been practicing it all semester. Overall the semester was a success, but it could have been better.

I’ll talk about how you can find comparable pronunciations in a future post.

Eclipsing

I’ve written about default assumptions before: how, for example, people in different parts of the English-speaking world have different assumptions about what they’ll get when they order “tea” or a “burger.” In the southern United States, the subcategory of “iced tea” has become the default, while in the northern US it’s “hot tea,” and in England it’s “hot tea with milk.” But even though iced tea is the default “tea” in the South, everyone there will still agree that hot tea is “tea.” In other cases, though, one subcategory can be so salient, so familiar, as to crowd out all the other subcategories, essentially taking over the category.

British concentration camp, Second Boer War (ca. 1901). Photo: British National Army Museum / Wikipedia

An example of this eclipsing is the category of “concentration camp.” When you read those words, you probably imagined a Nazi death camp like Auschwitz, where my cousin Dora was imprisoned. (Unlike many of her fellow prisoners she survived the ordeal, and died peacefully earlier this year at the age of 101.) Almost every time we have heard those words, they have referred to camps where our enemies killed millions of innocent civilians as part of a genocidal project, so that is what we expect.

This expectation is why so many people wrote in when National Public Radio’s Neal Conan referred to the camps where Japanese-Americans were imprisoned in World War II as “concentration camps” in 2012. NPR ombudspeople Edward Schumacher-Matos and Lori Grisham observed that the term dates back to the Boer War. Dan Carlin goes into detail about how widely the term “campos de reconcentración” (“reconcentration camps”) was used in the Spanish-American War. Last year, Aya Katz compared the use of “concentration camp” to that of “cage,” and earlier this year she reviewed the history of the term.

In general, the “concentration camps” of the Boer War and the Spanish-American War, as well as the “camps de regroupement” (“regroupment camps”) used by the French in the wars of independence in Algeria and Indochina, were a counter-insurgency tactic, whereby the colonial power controlled the movements of the civilian population in an effort to prevent insurgents from hiding among noncombatants, and to prevent noncombatants from being used as human shields.

As Roger Daniels writes in his great article “Words Do Matter: A Note on Inappropriate Terminology and the Incarceration of the Japanese Americans” (PDF), the concept of “internment” refers to the process of separating “alien enemies” – nationals of an enemy power – from the general population, and was first practiced with British subjects during the War of 1812. While this was done for citizens of Japan (and other enemy powers) during World War II, Daniels objects to the use of “internment” to describe the incarceration of American citizens on the basis of Japanese ancestry. He notes that President Roosevelt used the term “concentration camp” to describe them, and asks people to use that word instead of “internment.”

In the case of the colonial wars, the camps were used to isolate colonized people from suspected insurgents. In the case of the Japanese-American incarceration, the camps were used to isolate suspected spies from the general population. In neither case were they used to exterminate people, or to commit genocide. They were inhumane, but they were very different from Nazi death camps.

It is not hard to understand why the Nazi death camps have come to eclipse all other kinds of concentration camps. They were so horrific, and have been so widely discussed and taught, that the inhumanity of relocating the populations of entire towns and rounding up people based on ethnicity pales by comparison. It makes complete sense to spend so much more time on them. As a result, if we have ever heard the term “concentration camp” used outside the context of extermination and genocide, it doesn’t stick in our memory.

For most English speakers, “concentration camp” means a Nazi death camp, or one equally horrific. This is why Daniels acknowledges, following Alice Yang Murray, that “it is clearly unrealistic to expect everyone to agree to use the contested term concentration camp.”

Challenges for radical categorization

I enjoyed Miriam Posner’s keynote address at the Keystone Digital Humanities Conference. It was far from the only talk last week that was animated by a desire for justice and compassion, and it was good to see that desire given such prominence by the organizers and applauded by the attendees.

As a linguist I also welcomed Posner’s focus on categorization and language diversity. I was trained as a syntactician, but over the past several years I have paid more and more attention to semantics, and categorization in particular. Building on the work of Ludwig Wittgenstein, Eleanor Rosch, George Lakoff and Deborah Cameron, I have come to see categorization as a touchpoint for social justice.

I should note that for me, categorization is not just where I can advocate for others with less power. As a transgender person, the power to categorize myself, my feelings, my beliefs and my actions is denied to me on a daily basis. The main reason that I study categorization is to regain that power for myself and others.

Much as I share Posner’s passion for justice, her talk raised some concerns in my mind. First, digital humanities cannot bear the entire burden of social justice, and even language as a whole cannot. Second, categories are slippery and flexible, which is a great strength of humanity but also a great weakness. Third, there are limits to how much we can trust anyone, no matter how high or low they are in the hierarchies of power. These concerns are not insurmountable barriers to a radical approach to categorization, but keeping them in mind will help us to be more effective fighters for social justice.

I plan to address the issues of the burden and trust in future posts, and in this post focus on the slipperiness of categories. As I understood it, Posner drew a distinction between the data models used by digital humanists (among many others) to categorize the world, and the lived experience of the people who created and consume the data.

There is often conflict between the categories used in the model and in the experience, and there is often a power imbalance between the digital humanist and the humans whose data is being modeled. Digital humanists may be perpetuating injustice by imposing their data models on the lived experiences of others. Posner gave examples of binary gender forms, database fields for racial classification, and maps of places. She contrasted these models imposed from above with examples where humanists had contested those models, aiming to replace them with models closer to the lived experience of people with less power who had a stake in the categorization.

The problem is that our lived experience is also a data model. As George Lakoff and other cognitive scientists have shown, the categories that humans use to describe and interpret our experience are themselves conventions that are collectively negotiated and then imposed on all members of the language community, with penalties for non-compliance. They are just as distinct from reality as the fields in a SQL table, and they shape our perceptions of reality in the same ways.

Whether or not they are encoded in HTML forms, categories are always contested, and the degree to which they are contested is a function of what stands to be gained or lost from them. In my experience, categorizing people, whether by race, gender, nationality, religion or other criteria, is the most fraught, because these categories are often used as proxies for other factors, to grant or deny us access to valuable resources. The second most fraught is the categorization of place, because places contain resources and are often proxies for categorizing people. After that, food seems to be the next most fraught; if you doubt me, ask your Facebook friends for the most (or least) authentic Mexican or Italian restaurant.

And yet, as Wittgenstein and Rosch have observed, it is normal for categories to have multiple, slightly different, meanings and memberships, not just in the same language community but in the same individual. In my observations it is possible for the same person to use two different senses of a word – a category with two different but overlapping memberships – on the same page, in the same paragraph, in the same sentence.

Responding to Posner’s talk, Matt Lincoln posted a recipe for using the Resource Description Framework (RDF) to describe overlapping, contrasting systems of categorization. I think that is an excellent start, particularly because he places the data models of lived experience on the same level as those imagined by the researchers. My word of caution would be to keep in mind that there is not one singular data model for the lived experience of a community, or even for an individual. As Whitman said, we contain multitudes. Each member of those contradicting multitudes has its own data model, and we should thus be prepared to give it its own entry in the RDF.
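To make that concrete, here is a rough sketch in Python with the rdflib package of what giving each contested sense its own entry might look like. The vocabulary here (the example namespace, hasCategorization and so on) is invented for illustration; it is not Lincoln’s actual recipe.

```python
# Two contested, overlapping categorizations of the same restaurant
# coexisting in one graph, each attributed to its own community.
from rdflib import BNode, Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/vocab/")
g = Graph()
g.bind("ex", EX)

g.add((EX.casaMaria, RDF.type, EX.Restaurant))

# Each categorization act gets its own node, so different communities'
# models sit side by side instead of overwriting each other.
for category, source in [(EX.MexicanFood, "neighborhood regulars"),
                         (EX.TexMexFood, "food historians")]:
    claim = BNode()
    g.add((EX.casaMaria, EX.hasCategorization, claim))
    g.add((claim, EX.category, category))
    g.add((claim, EX.accordingTo, Literal(source)))

print(g.serialize(format="turtle"))
```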

I and others brought up similar issues in the question and answer period, and in the reception after Posner’s keynote, and I very much appreciate her taking the time to discuss them. As I remember it, she acknowledged the challenges that I raised, and I look forward to us all working together to build a humane, compassionate humanities, whether digital or not. I will discuss the challenges of bearing the burden and of trusting the community in later posts.