“Said” for 2016 Word of the Year

I just got back from the American Association for Corpus Linguistics conference in Ames, Iowa, and I’m calling the Word of the Year: for 2016 it will be said.

You may think you know said. It’s the past participle of say. You’ve said it yourself many times. What’s so special about it?

What’s special was revealed by Jordan Smith, a graduate student at Iowa State, in his presentation on Saturday afternoon. said is becoming a determiner. It is grammaticizing.

In addition to its participial use (“once the words were said”) you’ve probably seen said used as an attributive adjective (“the said property”). It indicates that the noun it modifies refers to a person, place or thing that has been mentioned recently, with the same noun, and that the speaker/writer expects it to be active in the hearer/reader’s memory.

Attributive said is strongly associated with legal documents, as in its first recorded use in the English Parliament in 1327. The Oxford English Dictionary reports that said was used outside of legal contexts as early as 1973, in the English sitcom Steptoe and Son. In this context it was clearly a joke: a word that evoked law courts used in a lower-class colloquial context.

Jordan Smith examined uses of said in the Corpus of Contemporary American English (COCA) and found that attributive said has increasingly been used without the for several years now, and outside the legal domain. He observes that syntactic changes and increased frequency have been named by linguists like Joan Bybee as harbingers of grammaticization.

Grammaticization (also known as grammaticalization; search for both) is when an ordinary lexical item (like a noun, verb or adjective, or even a phrase) becomes a grammatical item (like a pronoun, preposition or auxiliary verb). For example, while is a noun meaning a period of time, but it was grammaticized to a conjunction indicating simultaneity. Used is an adjective meaning accustomed, as in “I was used to being lonely,” but has also become part of an auxiliary indicating habitual aspect as in “I used to be lonely.”

Jordan is suggesting that said is no longer just a verb or even an adjective: it’s our newest determiner in English. Determiners are an exclusive club of short words that modify nouns. They include articles like an and the, but also demonstratives like these and quantifiers like several.

Noun phrases without a determiner tend to refer to generic categories, as I have been doing with phrases like legal documents and grammaticization. That is clearly not what is going on with said girlfriend. Noun phrases with said refer to a specific item or group of items, in some sense even more so than noun phrases with the.

Thanks to the wireless Internet at the AACL, I began searching for “of said” on Twitter, and found a ton of examples. There are plenty of “in said” examples as well.
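For anyone who wants to try this kind of search on their own collection of text, here is a rough sketch in Python. It is not Jordan's method or anything from COCA; it just assumes that a preposition followed directly by said and another word is a likely determiner use, and that the said plus a word is the older attributive pattern. The sample sentences and the names SAMPLES, BARE_SAID, THE_SAID and classify are all made up for illustration.

```python
import re

# Hypothetical sample sentences, invented for illustration only.
SAMPLES = [
    "He moved out because of said girlfriend.",
    "The tenant shall vacate the said property.",
    "I got lost in said parking lot again.",
    "She said nothing about it.",
]

# A preposition directly followed by "said" and a word: likely determiner use.
# Only "of" and "in" are covered, matching the two searches mentioned above.
BARE_SAID = re.compile(r"\b(of|in)\s+said\s+(\w+)", re.IGNORECASE)

# "the said" plus a word: the older attributive-adjective pattern.
THE_SAID = re.compile(r"\bthe\s+said\s+(\w+)", re.IGNORECASE)


def classify(sentence):
    """Label a sentence by which pattern of 'said' it contains, if any."""
    if THE_SAID.search(sentence):
        return "the said"
    if BARE_SAID.search(sentence):
        return "bare said"
    return "other"


labels = [classify(s) for s in SAMPLES]
for sentence, label in zip(SAMPLES, labels):
    print(f"{label:10} {sentence}")
```

A pattern this crude will miss bare said after other prepositions and at the start of a noun phrase in subject position, so it can only give a lower bound.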

It’s not just happening in English. The analogous French ledit is also used outside the legal domain. Its reanalysis is a bit different, since it incorporates the article rather than replacing it. Like most noun modifiers in French it is inflected for gender and number. I haven’t found anything similar for Spanish.

In 2013 the American Dialect Society chose because as its Word of the Year. Because is already a conjunction, having grammaticized from the noun cause, but it has been reanalyzed again into a preposition, as in because science. Some theorists consider this to be a further step in grammaticization. And here is a twenty-first century prepositional phrase for you, folks: because (P) said (Det) relationship (N).

After Jordan’s presentation it struck me that said is an excellent candidate for the 2016 Word of the Year. And if the ADS isn’t interested, maybe another organization, like the International Cognitive Linguistics Association, can sponsor a Grammaticization of the Year.

Printing differences and material issues in Google Books

I am looking forward to presenting my Digital Parisian Stage corpus and the exciting results I’ve gotten from it so far at the American Association for Corpus Linguistics at Iowa State in September. In the meantime I’m continuing to process texts, working towards a one percent sample from the Napoleonic period (Volume 1 of the Wicks catalog).

One of the plays in my sample is les Mœurs du jour, ou l’école des femmes, a comedy by Collin-Harleville (also known as Jean-François Collin d’Harleville). I ran the initial OCR on a PDF scanned for the Google Books project. For reasons that will become clear, I will refer to it by its Google Books ID, VyBaAAAAcAAJ. When I went to clean up the OCR text, I discovered that it was missing pages 2-6. I emailed the Google Books team about this, and got the following response:

[Image: the Google Books team’s response, citing “a material issue”]

I’m guessing “a material issue” means that those pages were missing from the original paper copy, but I didn’t even bother emailing until the other day, since I found another copy in the Google Books database, with the ID kVwxUp_LPIoC.

Comparing the OCR text of VyBaAAAAcAAJ with the PDF of kVwxUp_LPIoC, I discovered some differences in spelling. For example, throughout the text, words that end in the old-fashioned spelling -ois or -oit in VyBaAAAAcAAJ are spelled with the more modern -ais in kVwxUp_LPIoC. There is also a difference in the way “Madame” is abbreviated (“Mad.” vs. “M.me”) and in which accented letters preserve their accents when set in small caps, and differences in pagination. Here is the entirety of Act III, Scene X in each copy:

[Image: Act III, Scene X in copy VyBaAAAAcAAJ]

[Image: Act III, Scene X in copy kVwxUp_LPIoC]

My first impulse was to look at the front matter and see if the two copies were identified as different editions or different printings. Unfortunately, they were almost identical, with the most notable differences being that VyBaAAAAcAAJ has an œ ligature in the title, while kVwxUp_LPIoC is signed by the playwright and marked as being a personal gift from him to an unspecified recipient. Both copies give the exact same dates: the play was first performed on the 7th of Thermidor in year VIII and published in the same year (1800).

The Google Books metadata indicate that kVwxUp_LPIoC was digitized from the Lyon Public Library, while VyBaAAAAcAAJ came from the Public Library of the Netherlands. The other copies I have found in the Google Books database, OyL1oo2CqNIC from the National Library of Naples and dPRIAAAAcAAJ from Ghent University, appear to be the same printing as kVwxUp_LPIoC, as does the copy from the National Library of France.

Since the -ais and M.me spellings are closer to the forms used in France today, we might expect that kVwxUp_LPIoC and its cousins are from a newer printing. But in Act II, Scene XI I came across a difference that concerns negation, the variable that I have been studying for many years. The decadent Parisians Monsieur Basset and Madame de Verdie question whether marriage should be eternal. Our hero Formont replies that he has no reason not to remain with his wife forever. In VyBaAAAAcAAJ he says, “je n’ai pas de raisons,” while in kVwxUp_LPIoC he says “je n’ai point de raisons.”

[Image: Act III, Scene XI (page 75) in VyBaAAAAcAAJ]

[Image: Act III, Scene XI (page 78) in kVwxUp_LPIoC]

In my dissertation study I found that the relative use of ne … point had already peaked by the nineteenth century, and was being overtaken by ne … pas. If this play fits the pattern, the use of the more conservative pattern in kVwxUp_LPIoC goes against the more innovative -ais and M.me spellings.

I am not an expert in French Revolutionary printing (if anyone knows a good reference or contact, please let me know!). My best guess is that kVwxUp_LPIoC is from a limited early run, some copies of which were given to the playwright to give away, while VyBaAAAAcAAJ and the other -ais/M.me/ne … point copies are from a larger, slightly later, printing.

In any case, it is clear that I should pick one copy and make my text consistent with it. Since VyBaAAAAcAAJ is incomplete, I will try dPRIAAAAcAAJ. I will try to double-check all the spellings and wordings, but at the very least I will check all of the examples of negation against dPRIAAAAcAAJ as I annotate them.
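Checking one transcription against another by eye is tedious, so here is a minimal sketch of how the comparison could be partly automated with Python's standard difflib module. It assumes I have plain-text transcriptions of the same passage from two copies; the function name word_differences is my own, and the only data are the two readings of Formont's line quoted above.

```python
import difflib


def word_differences(copy_a, copy_b):
    """Return (words_in_a, words_in_b) pairs wherever two transcriptions differ."""
    a, b = copy_a.split(), copy_b.split()
    # autojunk=False keeps difflib from ignoring very frequent words in long texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


line_a = "je n’ai pas de raisons"    # reading in VyBaAAAAcAAJ
line_b = "je n’ai point de raisons"  # reading in kVwxUp_LPIoC

diffs = word_differences(line_a, line_b)
print(diffs)  # [('pas', 'point')]
```

On a whole play this would also flag every -ois/-ais and Mad./M.me difference, along with any OCR errors, so the output would still need to be sorted by hand.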

Eclipsing

I’ve written about default assumptions before: how for example people in different parts of the English-speaking world have different assumptions about what they’ll get when they order “tea” or a “burger.” In the southern United States, the subcategory of “iced tea” has become the default, while in the northern US it’s “hot tea,” and in England it’s “hot tea with milk.” But even though iced tea is the default “tea” in the South, everyone there will still agree that hot tea is “tea.” In other cases, though, one subcategory can be so salient, so familiar as to crowd out all the other subcategories, essentially taking over the category.

An example of this eclipsing is the category of “concentration camp.” When you read those words, you probably imagined a Nazi death camp like Auschwitz, where my cousin Dora was imprisoned. (Unlike many of her fellow prisoners she survived the ordeal, and died peacefully earlier this year at the age of 101.) Almost every time we hear those words, they have referred to camps where our enemies killed millions of innocent civilians as part of a genocidal project, so that is what we expect.

This expectation is why so many people wrote in when National Public Radio’s Neal Conan referred to the camps where Japanese-Americans were imprisoned in World War II as “concentration camps” in 2012. NPR ombudspeople Edward Schumacher-Matos and Lori Grisham observed that the word dates back to the Boer War. Dan Carlin goes into detail about how widely the word “campos de reconcentración” was used in the Spanish-American War. Last year, Aya Katz compared the use of “concentration camp” to that of “cage,” and earlier this year, reviewed the history of the word.

In general, the “concentration camps” of the Boer War and the Spanish-American War, as well as the “camps de regroupement” used by the French in the wars of independence in Algeria and Indochina, were a counter-insurgency tactic, whereby the colonial power controlled the movements of the civilian population in an effort to prevent insurgents from hiding among noncombatants, and to prevent noncombatants from being used as human shields.

As Roger Daniels writes in his great article “Words Do Matter: A Note on Inappropriate Terminology and the Incarceration of the Japanese Americans” (PDF), the concept of “internment” refers to the process of separating “alien enemies” – nationals of an enemy power – from the general population, and was first practiced with British subjects during the War of 1812. While this was done for citizens of Japan (and other enemy powers) during World War II, Daniels objects to the use of “internment” to describe the incarceration of American citizens on the basis of Japanese ancestry. He notes that President Roosevelt used the term “concentration camp” to describe them, and asks people to use that word instead of “internment.”

In the case of the colonial wars, the camps were used to isolate colonized people from suspected insurgents. In the case of the Japanese-American incarceration, the camps were used to isolate suspected spies from the general population. In neither case were they used to exterminate people, or to commit genocide. They were inhumane, but they were very different from Nazi death camps.

It is not hard to understand why the Nazi death camps have come to eclipse all other kinds of concentration camps. They were so horrific, and have been so widely discussed and taught, that the inhumanity of relocating the populations of entire towns and rounding up people based on ethnicity pales by comparison. It makes complete sense to spend so much more time on them. As a result, if we have ever heard the term “concentration camp” used outside of the context of extermination and genocide, it doesn’t stick in our memory.

For most English speakers, “concentration camp” means a Nazi death camp, or one equally horrific. This is why Daniels acknowledges, following Alice Yang Murray, that “it is clearly unrealistic to expect everyone to agree to use the contested term concentration camp.”

She is calling you “dude”

I was struck by this tweet from Lynne Murphy today:

For those who don’t know, Lynne is an American linguist who lives in England and teaches at the University of Sussex, and blogs regularly about differences between British and American varieties of English. I’ve heard women saying “dude” to each other, but I wouldn’t call it calling each other “dude.” Lynne and I went back and forth (and got some input from Sylvia Sierra, a sociolinguistics graduate student who uses “dude” this way), but it comes down to two questions:

– Are Lynne and Sylvia observing the same things I remember, or something different?
– Are all three of us using the word “calling” in the same sense?

Fortunately, back in 1974 Arnold Zwicky developed a taxonomy of vocatives (PDF). Basically, a noun phrase, or something more or less nouny, can be used for four functions that are relevant to this question:

  1. Will the owner of a red Ford Taurus, license plate number XYZ123, please pick up any yellow house phone? (referential)
  2. Sheree Heil, come on down! You’re the next contestant on The Price is Right! (vocative call)
  3. No, Mom, I can’t pause. (vocative address)
  4. Oh boy, I can’t wait! (exclamation)

Scott Kiesling, in a 2004 American Speech article (PDF), further divides the use of dude as “(1) marking discourse structure, (2) exclamation, (3) confrontational stance mitigation, (4) marking affiliation and connection, and (5) signaling agreement,” but for the question at hand they are all non-referential and do not imply that the addressee is “a dude,” so in this post I will subsume all five under “exclamation.”

Boy is one of a long series of noun phrases that have made the journey from referential noun phrase to vocative call to vocative address to exclamation. Along the way, this sense of boy has been bleached of all of its old meaning: it can be used in contexts that have nothing remotely to do with boys. Other examples include man, baby, dear, babe, and of course God and lord.

A tricky thing about these, though, is that the functions can overlap. For example, in (2), “Sheree Heil” is actually being used for all four functions simultaneously. This is not unusual: Elizabeth Traugott has written extensively about how meaning change proceeds through ambiguity. The result is that we often are unable to tell exactly what stage of the journey a phrase has reached.

That said, there are some features that can exclude one or more readings. The pure referential sense of a word is often much narrower than vocative or exclamatory senses; for example, consider the following examples:

  5. The baby threw up all over herself.
  6. Baby, let me give you a kiss.
  7. Look, baby, we’ve been through a lot together.
  8. Baby, it’s going to be a scorcher today!

It is hard to read (5) as referring to anything but an actual infant, while (6) could apply to either an infant or any other animate object. We can tell that (7) does not support a pure referential reading, because it would be incongruous if anyone said it to an actual baby. Note also that in the referential sense in (5), the noun phrase is fully integrated into the argument structure of the sentence, while in the vocative senses in (6) and (7) there are coreferential noun phrases (“you” and “we” respectively) in the argument structure.

Many of these have come out the other side of the chute and are no longer used as vocatives at all. In the exclamatory sense in (8), there is no coreferential noun phrase, and baby does not require the existence of a baby at all, as we saw above with boy.

Also note that in (7) the noun phrase does not come at the beginning of the sentence. For both the vocative call and exclamatory readings, it almost always does, so this is a pretty strong indicator that this is a vocative address.

There is also an interesting category of vocatives that have not (and may never) become exclamations, but have nonetheless broadened their reference considerably beyond their purely referential sense. Examples include buddy (which is almost never used for brothers, let alone buddies), bro (also not used for brothers), guys (no longer gender specific), son (rarely used for sons), and my son (almost always used for metaphorical sons in a religious or spiritual context).

One of my favorite examples of this comes from a hiking trip in Iceland, where I was the only American. The guides, however, both women, were used to taking Americans on trips, and had a running joke on the phonetic and functional similarity of “Guides?” and “Guys?” in the English vocative.

So we all agree that dude can be used as an exclamation, and in that context is bleached of its masculine reference restriction. I would not think of this as people “calling each other dude,” and I don’t think Lynne or Sylvia would. As I understand it, they are claiming that dude is like guys, in that it is also bleached of its masculine reference restriction in the vocative sense.

I am not ruling out this possibility; I know both Lynne and Sylvia to be astute observers of language. But I have not seen any evidence of it, and here is the kind of thing that would convince me: an example of dude in an unambiguous vocative address context. The easiest is one where it is not at the beginning of a sentence, for example:

  9. So, dude, what are we doing tonight?
  10. Before you go, dude, show me that picture.
  11. I am not impressed, dude.
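If you have a pile of tweets or transcripts to look through, the position test can be roughed out as a filter. The sketch below is only a first pass under my own assumptions: it treats any dude that follows a comma within the same sentence, and is itself followed by punctuation, as a candidate for vocative address. The names NON_INITIAL_DUDE and is_vocative_address_candidate are made up for this example, and the counterexample starting with “Dude,” is adapted from the baby sentence above.

```python
import re

# "dude" preceded by a comma inside the same sentence and followed by
# punctuation, so it cannot be the first word of that sentence.
NON_INITIAL_DUDE = re.compile(r"\w[^.!?]*,\s*dude\s*[,.!?]", re.IGNORECASE)


def is_vocative_address_candidate(utterance):
    """True if the utterance contains a non-initial, comma-separated 'dude'."""
    return bool(NON_INITIAL_DUDE.search(utterance))


examples = [
    "So, dude, what are we doing tonight?",
    "Before you go, dude, show me that picture.",
    "I am not impressed, dude.",
    "Dude, it’s going to be a scorcher today!",  # sentence-initial: not a candidate
]

results = [is_vocative_address_candidate(e) for e in examples]
for example, result in zip(examples, results):
    print(result, example)
```

A hit only tells you where to look. Someone still has to check who is speaking to whom, and the filter will miss anyone who leaves out the commas.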

If we can find examples of women using dude to address each other in contexts like that, to me that would count as them calling each other dude. What do you think?

Both of them?

When I wrote about my son’s use of “they” pronouns to refer to a single, specific person, I mentioned how there are people who want to be referred to with “they” or another set of gender-neutral pronouns because they don’t want to be identified by a gender. This change is also happening, but it’s not as straightforward as it sounds.

A few months ago I got into a small argument on Facebook. A former student of mine had posted something about transgender issues, and two of his Facebook friends disagreed with a comment of mine.

A few days later I ran into my student on campus, and he mentioned that one of the friends was his partner. “They just came out, so they get a little excited about these issues,” he said. This often happens to people when they come out, so I was not surprised.

At the time I assumed my student was saying that both of his Facebook friends had just come out. Two people coming out at about the same time? Well, it’s college, and my student is one of the officers of the campus LGBT group.

It was only later that it occurred to me that my student might have been talking about a single person (his partner) who had come out as genderqueer, and thus used “they” pronouns.

A few weeks ago I organized a karaoke event for members of my transgender support group, which is open to all genders. I was presenting as a woman, so everyone called me Andrea and referred to me with “she” pronouns. Another member of the group was presenting as a man but had asked us to use a feminine name and “she” pronouns, so we did.

At the event there were a few people who hadn’t shown up yet. I asked about one person, and the answer was, “They said they weren’t feeling well, so I don’t know if they’re going to make it.” Now, I knew that this person identified as genderqueer, and had complained that their boyfriend was reluctant to use “they” pronouns, and still my first thought was, “Oh, was the boyfriend planning to come too?”

I tell these stories to show that, at least for me, if I hear “they” in a specific context, I expect it to be plural. But hold on! This is not going to be some reactionary rant.

I don’t think it’s impossible for me to understand “they” as referring to single, specific people. I don’t think it’s impossible for entire communities of English speakers, or even the whole population, to make that shift. I don’t think it’s unreasonable to ask me or anyone else to try.

I do want to point out that these are pronouns, part of our entrenched, high-frequency core grammar, so it’s not going to be as easy as shifting from “stewardess” to “flight attendant.” On the other hand, using “they” pronouns would be easier than adopting any of the pronoun sets that have been specifically invented for gender-neutral use.

It would actually be easier if we used “they” pronouns for everyone, like my son may be doing part of the time. We’d have to come up with some way to specify plurality then, like “those people.” Let me know if you hear anything like that…

That guy and their red face

Today I was walking with my son, and we passed two men going the other way. I said to him, “Did you see how one of those guys was really red in the face?”

“No, what’s so special about them being red in the face?”

“I think he was drunk. Sometimes when people get really drunk, their faces get red that way. Not every red face means the person is drunk; sometimes it could be windburn-”

“So they might just have windburn?”

“Well, no, it’s a different pattern of redness…”

The conversation went on like that, with me using he pronouns to refer to the man, and my son using they pronouns. And no, he wasn’t talking about both of the men, he was talking about the one with the red face. I know this because he’s used they pronouns to refer to classmates in his all-boys gym class, and to his teachers who take the “Ms.” honorific and wear makeup and high heels.

I’ve been meaning to write about this for a while, but I figured tonight is a good night to post it, since the lexicographers are talking to the copy editors about singular “they.”

I grew up using “singular they” for generic referents: “If anyone needs help with this reading, they should talk to me.” I was familiar with “the pronoun game,” as it was called in Chasing Amy, where the lesbian and bisexual characters obscured their sexuality by using “they” to refer to their (specific) partners. Being transgender and a linguist, I’m familiar with a relatively new use of “they” pronouns: for specific genderqueer or agender people who don’t want to be identified with any gender.

My son’s use of “they” doesn’t fit any of these established uses. He is using it for specific individuals whose gender is either male or female, and already known to us. I asked, and none of these people asked to be referred to with gender-neutral pronouns. I don’t have the impression that this is a conscious effort on my son’s part, either. It just seems to be the third person pronoun that he uses for everyone.

I don’t know if my son’s classmates use it this way, or if it’s just one of those quirks that comes from growing up as the child of two linguists. I haven’t yet heard him use “they” to refer to any immediate family members, or to people who are present. I’ll post an update if I hear anything like that. In the meantime, have you heard this use of “they”?

[Image: Times Square subway station]

The Reduction Effect

Last week I talked about how high-frequency words and phrases resist analogical change. This entrenchment happens because analogical change is driven by forgetting, and it’s harder to forget something that you’ve said a lot. In this post I want to talk about a different effect of frequency, the reduction effect, where high-frequency words and phrases get shortened and simplified.

We see reduction in all the words and phrases we say most often. “How are you?” becomes “Hiya” and then “Hi.” “I don’t know” becomes “I dunno” and then something I can’t even write, a single “uh” vowel with a low-high-low tone pattern. “I am going to let you” becomes “I’m gonna let you,” and then, in the speech of Kanye West and Eminem, “amaletchoo.”

A lot of people find these frequency effects confusing. How can high-frequency words and phrases be simultaneously the first to change and the last to change? What makes this possible is that they are two different kinds of change. Entrenchment is about forgetting, and the more we do things, the more we remember how to do them. Reduction is about ease, and the more we do things, the easier they become.

This is like any habit. Because I take the subway to Times Square so frequently, I not only never forget the way, but I do all kinds of things to make it faster and easier. I know where to stand on the platform, where to sit on the train, and when to stand up, so that I get off right by the most convenient staircase.

More importantly, I have a low-level “muscle memory” of the movements involved in the trip. Every time, I climb the stairs the same way, sit down the same way, stand up the same way. It’s the same with unlocking my apartment door or cooking a steak. My movements are all smaller and smoother. I can do a lot of it without thinking.

As with entrenchment, I learned about the Reduction Effect in class with Joan Bybee. In one of her early papers, published in 1976 under the name Joan B. Hooper, she credits Hugo Schuchardt with discovering the relationship. In 1885 (German PDF p. 28 | English translation p. 56), Schuchardt wrote, “What is more natural than making things easier whenever frequency provides the strongest impulse for this and wherever the danger of misunderstanding is least?”

I know I said I’d talk about why it’s not so surprising that we get “snuck.” I’m almost there; I wanted to get this relatively straightforward stuff out of the way first.

Forgetting the infrequent things

I’m pleased that so many people found my last post on forgetting and language change interesting. Ariel Cohen-Goldberg in particular noted this about forgetting:


Cohen-Goldberg is absolutely right, and this stems from forgetting. The more frequently we do something, the more likely we are to do it the same way, without forgetting how. I never forget which train to take to get to Times Square, which way to turn the key in my apartment door, or which spices to use when cooking a steak, because I do all these things on a regular basis.

It is the same with language: I say “I had a pen in my pocket,” and never “I haved.” I always say “there were three children,” and never “three childs.” I say “was he there yesterday?” and never “did he be there yesterday?” This is what Joan Bybee and Sandy Thompson (2000) called the “conserving effect” of frequency, and Ron Langacker (1987) called “entrenchment.”

I learned about entrenchment from Joan Bybee in a course on frequency effects. She discusses it in more detail in her 1995 paper on regular morphology. In her 1985 book, she credits Witold Mańczak (1980), but Mark Aronoff suggests that it may go back to Zipf (1949). I went to check Zipf’s book; someone has it out of the library, but I put in a request for it.

This course in frequency effects actually changed my life. My term paper for the course, on the shift from ne alone to ne … pas in French, provided a good starting point for my dissertation. In section 7.3.2 of my dissertation I look at the entrenchment of high-frequency phrases like je ne sais “I don’t know,” je ne peux “I can’t,” and je n’ose “I daren’t.”

The study of entrenchment has also brought us the Google Ngram Viewer, a tool that linguists feel decidedly ambivalent about. Earlier this month, Elizabeth Weingarten profiled the Ngram Viewer in Slate, particularly its founders, mathematician Erez Lieberman Aiden and biologist Jean-Baptiste Michel.

And that was the question that set Aiden and Jean-Baptiste Michel, another Viewer founding father and co-founder of the Culturomics field, on the path to create such a tool in the first place. Back in 2007, Aiden, Michel, and a crew of undergraduate students decided to test the word evolution hypothesis by tracking irregular verbs over the past 1,000 years. They found 177 that were traceable (for instance, go and went, run and ran), plotted them manually, and discovered that the verbs did undergo a kind of evolutionary process. “The less frequent the verb, the more rapidly it becomes regular,” Aiden explains. “Our work became this demo of how evolution by natural selection might work in a cultural study.”

In their paper, which came out while I was examining entrenchment in my corpus, Lieberman and his colleagues cited Bybee’s work on entrenchment, but somehow Bybee didn’t make it into Weingarten’s article, just as Mańczak didn’t make it into Lieberman et al.’s paper (or my dissertation), and Zipf (if he did write about it) didn’t make it into Bybee’s book. The main thing: it came from linguists.

Entrenchment is a very important effect, but many people forget to take it into account in their studies. At the 2008 conference of the American Association for Corpus Linguistics I was That Annoying Guy who asked everyone “If you take out this handful of high-frequency items, is there any evidence in your study that the change is still happening?” The other presenters were surprisingly tolerant of these questions.

You may be familiar with another effect of frequency, what Bybee and Thompson call the “reduction effect.” I’ll talk about that in a future post. And I’ll definitely get around to analogy as well. In the meantime, don’t forget to forget your low-frequency verbs!