“Said” for 2016 Word of the Year

I just got back from the American Association for Corpus Linguistics conference in Ames, Iowa, and I’m calling the Word of the Year: for 2016 it will be said.

You may think you know said. It’s the past participle of say. You’ve said it yourself many times. What’s so special about it?

What’s special was revealed by Jordan Smith, a graduate student at Iowa State, in his presentation on Saturday afternoon. Said is becoming a determiner. It is grammaticizing.

In addition to its participial use (“once the words were said”) you’ve probably seen said used as an attributive adjective (“the said property”). It indicates that the noun it modifies refers to a person, place or thing that has been mentioned recently, with the same noun, and that the speaker/writer expects it to be active in the hearer/reader’s memory.

Attributive said is strongly associated with legal documents, as in its first recorded use in the English Parliament in 1327. The Oxford English Dictionary reports that said was used outside of legal contexts as early as 1973, in the English sitcom Steptoe and Son. In this context it was clearly a joke: a word that evoked law courts used in a lower-class colloquial context.

Jordan Smith examined uses of said in the Corpus of Contemporary American English (COCA) and found that for several years now attributive said has increasingly been used without the definite article, and outside the legal domain. He observes that syntactic changes and increased frequency have been named by linguists like Joan Bybee as harbingers of grammaticization.

Grammaticization (also known as grammaticalization; search for both) is when an ordinary lexical item (like a noun, verb or adjective, or even a phrase) becomes a grammatical item (like a pronoun, preposition or auxiliary verb). For example, while is a noun meaning a period of time, but it was grammaticized to a conjunction indicating simultaneity. Used is an adjective meaning accustomed, as in “I was used to being lonely,” but has also become part of an auxiliary indicating habitual aspect as in “I used to be lonely.”

Jordan is suggesting that said is no longer just a verb or even an adjective: it’s our newest determiner in English. Determiners are an exclusive club of short words that modify nouns. They include articles like an and the, but also demonstratives like these and quantifiers like several.

Noun phrases without a determiner tend to refer to generic categories, as I have been doing with phrases like legal documents and grammaticization. That is clearly not what is going on with said girlfriend. Noun phrases with said refer to a specific item or group of items, in some sense even more so than noun phrases with the.

Thanks to the wireless Internet at the AACL, I began searching for “of said” on Twitter, and found a ton of examples. There are plenty of “in said” examples as well.
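A search like this can be roughed out in a few lines of Python. This is only a sketch under a loud assumption: that right after a preposition such as of or in, said is unlikely to be a verb, so preposition + said + word is probably the determiner-like use. It is not the method Jordan Smith used on COCA, and the example sentences are invented.

```python
import re

# Crude heuristic (an assumption, not Jordan Smith's method): right after a
# preposition such as "of" or "in", "said" is unlikely to be a verb.
BARE = re.compile(r"\b(?:of|in)\s+said\s+\w+", re.IGNORECASE)
WITH_THE = re.compile(r"\b(?:of|in)\s+the\s+said\s+\w+", re.IGNORECASE)


def count_said(texts):
    """Return (bare, with_the) counts of preposition + said + word."""
    bare = sum(len(BARE.findall(t)) for t in texts)
    with_the = sum(len(WITH_THE.findall(t)) for t in texts)
    return bare, with_the


# Invented example sentences, for illustration only.
samples = [
    "I got a text from the owner of said dog.",
    "The tenant shall vacate the premises of the said property.",
    "She was in said relationship for two years.",
    "That is what he said in court.",
]
print(count_said(samples))
```

Comparing the two counts across years of a dated corpus would give a rough picture of the trend, though a part-of-speech-tagged corpus would do the job far more reliably.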

It’s not just happening in English. The analogous French ledit is also used outside the legal domain. Its reanalysis is a bit different, since it incorporates the article rather than replacing it. Like most noun modifiers in French it is inflected for gender and number. I haven’t found anything similar for Spanish.

In 2013 the American Dialect Society chose because as its Word of the Year. Because is already a conjunction, having grammaticized from the noun cause, but it has been reanalyzed again into a preposition, as in because science. Some theorists consider this to be a further step in grammaticization. And here is a twenty-first century prepositional phrase for you, folks: because (P) said (Det) relationship (N).

After Jordan’s presentation it struck me that said is an excellent candidate for the 2016 Word of the Year. And if the ADS isn’t interested, maybe another organization, like the International Cognitive Linguistics Association, can sponsor a Grammaticization of the Year.

On being a public linguist

People say you should stand up for what you believe in. They say you should look out for those less fortunate, and speak up for those who don’t get heard. They say that those of us who come from marginalized backgrounds, like TBLG backgrounds for example, but have enough privilege to be out in relative safety should speak up for those who don’t have that privilege. They say that those of us who have undertaken in-depth study in the interest of society have a particular responsibility to share what we know with the world as “public intellectuals.” They say that we linguists need to do a better job of applying our knowledge to real-world problems and communicating solutions to the public at large.

They’re right of course, but there’s a reason more people don’t do these things. They’re hard to do, and even harder to do right. Lots of people are strongly invested in the status quo and in thinking of themselves as good people, and they don’t like to be told that what they’re doing is at best ineffective and at worst harmful. Lots of people think that because they’re trans they know everything there is to know about trans issues, or that because they use language they know everything there is to know about language.

Case in point: after watching with increasing frustration for years as the word “cisgender” was invented and abused, back in December I wrote a series of blog posts about it. I know this is a controversial topic, and I was a bit apprehensive since I was on the job market, but my posts were not idle rants: as a linguist, a trans person, and someone who has observed trans politics for years, I had been trained to do this kind of analysis, and pursued these topics beyond my training.

I anticipated a number of potential objections to my argument and addressed them in the first three posts. As I published each one I was worried it would get a huge backlash, but there was barely a peep (more on that in another post). So for the title of the last one I went big: “The word “cisgender” is anti-trans.” Not much reaction.

A few weeks ago I came across a Facebook post by a gender therapist asking for opinions about “cisgender,” so I left a link to my blog post, identifying it as “my professional opinion as a linguist.” The therapist then shared my post without identifying me as either trans or a linguist.

Then there was a backlash. Several people immediately called my post “garbage” and “horse shit.” There were a handful of substantive disagreements, all of which I had anticipated in my post and previous ones that I had linked to. There was some support, but the vast majority of comments were negative. There were several similar comments made on my blog post itself, most of which I left unpublished since they were repetitive and unhelpful.

I know that plenty of people face far worse reactions to things they post. I didn’t receive any comments on my looks, rape threats or death threats. But it was still very upsetting, particularly as it was posted the same day I began my first full-time job since receiving my Ph.D. – an event that was positive on a number of levels, but upsetting on other levels.

The gender therapist, who presumably helps people with their mental states, showed no interest whatsoever in mine. They made no effort to moderate, did not intervene in the comments, and sent me no personal messages. The idea that a trans person might be losing sleep over these attacks on their page may not have even occurred to them.

The response my post has gotten from other public linguists has been minimal. A columnist who’s written about the issue and encouraged me to write gave my post a few tweets. A radical feminist whose writings about language and politics inspired me for years completely ignored it. It has not been picked up by any of the popular linguistics blogs, or by anyone talking about language, gender and sexuality.

It’s quite possible that these linguists disagree with me. There are some very specific linguistic questions at stake. But linguists love to argue, and I would welcome respectful, constructive engagement with these questions. So far there has been none.

I have also gotten very little support from other linguists. When I was first formulating these arguments a few years ago on Twitter, there were at least two linguists who explicitly denied that I had any standing to contest the arguments for “cis” that they were retweeting. They were satisfied with the flimsiest of pseudolinguistic rationales in pursuit of their political and social goals, and for whatever reasons I did not qualify as an authentic voice of the trans community in their eyes. I stopped following them on Twitter, and as far as I could tell they had no reaction whatsoever to my posts.

I know that a lot of people don’t want to get involved in flamewars on Twitter or Facebook. It’s really hard to know who’s right and who’s wrong. At first glance I look like just another white guy, and I project an image of success and confidence on social media because that’s what everyone tells me I need to do. Some people may disagree with my stance on a political basis.

I mostly came out of the Facebook flareup okay, although it’s hard to tell how much of my insomnia and touchiness relates to that as opposed to other stresses. Re-reading some of those comments just now was pretty upsetting. I made a decision to focus on the new job, and avoided reading comments, posts or links for a week or two. Now it’s blown over – but there’s no telling when it’ll get shared by someone else.

My main point is that being a public linguist isn’t easy. Speaking out isn’t easy. Fighting on your own behalf instead of some Little People somewhere isn’t easy – even if you’ve got a certain amount of privilege. If you’re wondering why people don’t fight for themselves more often, why they don’t speak up, why linguists don’t write more public posts about issues that matter – there’s your answer. It’s much easier to bury your nose in a book and write about grammaticization vs. reanalysis in Old Church Slavonic.

If we really want people to take a stand on these things, we need to support them. We need to stick up for linguists who speak out in public. We need principles that go beyond identity and political and social affiliation. And we need people who are willing to support linguists who speak out based on those principles. We need people who will make themselves available to back up other linguists on the Internet. Without real support, it’s all empty rhetoric.

@everytreenyc

At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc Twitter feed.

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
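The logic of a bot like this is simple enough to sketch in Python. To be clear, this is not Neil Freeman's code: the Lot class, its field names and the sample records are all hypothetical, and fetching the Street View photo and posting the tweet are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    tax_id: int   # hypothetical numeric tax ID
    address: str
    borough: str


def compose_tweet(lot):
    """Build the tweet text: address plus borough."""
    return f"{lot.address}, {lot.borough}"


def next_lot(lots, last_tax_id):
    """Return the lot with the smallest tax ID above last_tax_id, or None."""
    remaining = [lot for lot in lots if lot.tax_id > last_tax_id]
    return min(remaining, key=lambda lot: lot.tax_id) if remaining else None


# Invented records, for illustration only.
lots = [
    Lot(1000020002, "10 Example Street", "Manhattan"),
    Lot(1000010001, "1 Example Place", "Manhattan"),
]
lot = next_lot(lots, last_tax_id=0)
print(compose_tweet(lot))
```

A scheduler would call next_lot every half hour with the tax ID of the last lot posted, which is all it takes to walk the whole list in order.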

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.

Social scientists, and even humanists recently, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?
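As a small illustration of that cycle, here is a Python sketch that computes a distributional figure and then turns straight back to the individual trees behind it. The records and field names are invented for the example and are not the actual layout of the Trees Count data.

```python
from collections import Counter

# Invented records for illustration; the field names are assumptions.
trees = [
    {"id": 1, "species": "silver maple", "health": "Good"},
    {"id": 2, "species": "pin oak", "health": "Poor"},
    {"id": 3, "species": "honeylocust", "health": "Fair"},
    {"id": 4, "species": "pin oak", "health": "Good"},
]


def share_by_health(records):
    """Return the percentage of records in each health category."""
    counts = Counter(r["health"] for r in records)
    total = len(records)
    return {health: 100 * n / total for health, n in counts.items()}


def trees_in(records, health):
    """Return the individual records behind one health category."""
    return [r for r in records if r["health"] == health]


print(share_by_health(trees)["Poor"])  # the distributional claim
print(trees_in(trees, "Poor"))         # the individual trees to go and look at
```

The percentage tells us how big the problem is; the list tells us which trees to visit to find out why.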

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced at least as far back as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places — these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a tinyletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase, “the map is not the territory”:

Streetview and the property database are both widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did. I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. It is the property we know, and yet we realize how much we don’t know it.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on GitHub, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.
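The point about coverage can be shown with a short Python sketch. It assumes a bot that draws trees uniformly at random from whatever records exist; the census below is invented, with a hypothetical "Area C" that has not been counted at all.

```python
import random
from collections import Counter


def draw_trees(records, k, seed=None):
    """Draw k records uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, k)


# Invented census: "Area A" and "Area B" are counted, "Area C" is not.
census = (
    [{"id": i, "area": "Area A"} for i in range(50)]
    + [{"id": 50 + i, "area": "Area B"} for i in range(50)]
)

sample = draw_trees(census, k=20, seed=0)
print(Counter(t["area"] for t in sample))
# Whatever the seed, "Area C" cannot appear: a uniform draw is only
# representative of the trees that were actually counted.
```

The draw is perfectly random over the records, and still says nothing about the uncounted area, which is why completeness of the census matters so much.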

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.

On pet parents

I’m a parent. It doesn’t make me better or worse than anyone else; it’s just a category that reflects some facts about me: I conceived a new human with my wife, we are raising and caring for that human, and we expect to have a relationship with him for the rest of our lives. Some people don’t take parenthood seriously, so it doesn’t impact their lives very much, but their kids suffer. We take it very seriously, and it’s a lot of work for us.

I also take care of pets. We own three cats, and sometimes I walk my mom’s dog or take him to be groomed. It can be a lot of work, and the relationships can be very intimate at times. “Ownership” is kind of a funny word for it. In some ways it can be like certain stages of parenting: we buy all the food and make sure the animals don’t get into danger. It makes sense when I hear people refer to their pets as their “baby” or put words in their pets’ mouths, calling themselves “daddy.” I even understand when I hear them refer to themselves as “pet moms.”

I understand this usage, but I do not agree with it. I have a kid, and I have pets. The relationships are similar, but different. When someone calls themself a “pet dad,” it trivializes my relationship with my kid and infantilizes my pets. It erases the work of the actual parents, and trivializes the hard work of humans who act as surrogate parents to infant pets. I am a dad: I am not a pet dad, and I am not my pets’ dad. Or their mom.

My kid will one day be an adult, and while I may always think of him as The Kid, he will be able to function as an autonomous member of society. (Note that the term “kid” itself is an animal metaphor – referring to a juvenile goat.) Only one of my cats can still be considered a juvenile by any standard; the others are five years old and twenty years old, respectively. They are adult males, and until the last century they would have been free to come and go as they wished.

If my cats are incapable of leaving our house unaccompanied it is more likely due to the fact that we have cars everywhere than anything else. When I was a kid we lost three dogs to car culture. When I was eleven I saw a neighbor’s cat crushed beneath the wheels of a car, and arrived just in time to see him take his last breath. We have indoor cats and dog leashes in part because we have made the outdoors inhospitable.

I suspect one reason we hear more about “pet parents” is that so few of our pets are parents themselves. I support universal neutering, and have only adopted neutered cats from shelters or feral rescuers. It’s the best response to the overpopulation of feral animals, but it does make the pets neuter – and childless.

When I was a kid we had a cat who had a litter of kittens. I watched one of our dogs give birth to eleven puppies, and then found homes for the ten that lived. Our male cats were aggressive, sexual toms. Again, not wise in retrospect, but it was hard to think of any of the humans in the house as “moms” or “dads” of our pets while they were themselves moms and dads.

There is one human I know who would qualify as a “cat mom” in my mind. She is the woman who leads the feral cat helpers in our neighborhood. Six years ago someone found a baby kitten near some railroad tracks in Manhattan. My neighbor fostered this kitten in her apartment for five months, feeding him with an eyedropper until he was old enough to eat. She posted his picture on her website and we adopted him. If he has a “pet mom” it’s her.

What Professor Bigshot said

I was feeling very nervous, sitting there in Professor Bigshot’s office. I had just been accepted into the PhD program, and was visiting the department to get to know everyone and see if it was the right fit. I hadn’t applied to any other PhD program. If I didn’t go here, I probably wouldn’t get a PhD.

You can figure out pretty easily who Professor Bigshot is, if you care. I guess you could say I’m giving her a pseudonym for SEO reasons.

The student who was showing me around the department had asked, “Oh, have you met Professor Bigshot yet?” I had not. I had heard of her, but I had absolutely no idea what her work was: what she studied, what she had written, what her theories were. I was nervous, sitting there in her office, because I was afraid she would find out that I hadn’t read anything she’d written. I was right to be nervous, but for a completely different reason.

“So Angus,” Professor Bigshot asked me, “You know that the job market in linguistics is very tight? You understand that we cannot guarantee you a job when you graduate?”

I relaxed a bit. I knew this one. I had thought long and hard about it. I said, brightly, “Oh yes. But that’s okay. I have computer skills, and I can always get another IT job if this doesn’t work out.”

“Well, at this university,” Professor Bigshot’s face abruptly twisted into a snarl. “We are not in the business of granting recreational PhDs.”

That was the last thing I was expecting to hear. I did the only thing I could think of: I thanked Professor Bigshot politely, got up and walked out of her office.

I still had a day and a half before I left town. I had planned to visit classes and see the rest of the university.

I didn’t quite know how to tell my student guide what Professor Bigshot had said, so in a few minutes I was sitting down in Professor Littleshot’s office. I didn’t know what he had done in linguistics either, but at this point it hardly seemed to matter.

“So Angus,” said Professor Littleshot. “Have you made up your mind whether you’re going to attend our program?”

I opened my mouth. “Well…”

“Is there anything I can say to convince you?”

I shut my mouth and thought for a minute. “Well, I guess you just did.”

That was slightly over nineteen years ago. Professor Littleshot retired before I could propose a dissertation topic. I wrote a dissertation in Professor Bigshot’s theoretical framework, received my PhD in 2009, taught linguistics as an adjunct for seven years, sent out applications for tenure-track jobs and was invited to exactly zero interviews. Last week I started working as a Python developer in the IT Department at Columbia University.

Recreational PhD? Well, there have been times that I’ve enjoyed quite a lot. And yes, I suppose you can get a back injury, chronic insomnia and thousands of dollars of debt from plenty of other recreational activities. Maybe I would have enjoyed it more if I hadn’t tried so hard to prove Professor Bigshot wrong.

Quantitative needs qualitative, and vice versa

Data Science is all the rage these days. But this current craze focuses on a particular kind of data analysis. I conducted an informal poll as an icebreaker at a recent data science party, and most of the people I talked to said that it wasn’t data science if it didn’t include machine learning. Companies in all industries have been hiring “quants” to do statistical modeling. Even in the humanities, “distant reading” is a growing trend.

There has been a reaction to this, of course. Other humanists have argued for the continued value of close reading. Some companies have been hiring anthropologists and ethnographers. Academics, journalists and literary critics regularly write about the importance of nuance and empathy.

For years, my response to both types of arguments has been “we need both!” But this is not some timid search for a false balance or inclusion. We need both close examination and distributional analysis because the way we investigate the world depends on both, and both depend on each other.

I learned this from my advisor Melissa Axelrod, and a book she assigned me for an independent study on research methods. The Professional Stranger, by Michael Agar, is a guide to ethnographic field methods, but also contains some commentary on the nature of scientific inquiry, and mixes its well-deserved criticism of quantitative social science with a frank acknowledgment of the interdependence of qualitative and quantitative methods. On page 134 Agar discusses Labov’s famous study of /r/-dropping in New York City:

The catch, of course, is that he would never have known which variable to look at without the blood, sweat and tears of previous linguists who had worked with a few informants and identified problems in the linguistic structure of American English. All of which finally brings us to the point of this example: traditional ethnography struggles mightily with the existence of pattern among the few.

Labov acknowledges these contributions in Chapter 2 of his 1966 book: Babbitt (1896), Thomas (1932, 1942, 1951), Kurath (1949, based on interviews by Guy S. Lowman), Hubbell (1950) and Bronstein (1962). His work would not be possible without theirs, and their work was incomplete until he developed a theoretical framework to place their analysis in, and tested that framework with distributional surveys.

We’ve all seen what happens when people try to use one of these methods without the other. Statistical methods that are not grounded in close examination of specific examples produce surveys that are meaningless to the people who take them and uninformative to scientists. Qualitative investigations that are not checked with rigorous distributional surveys produce unfounded, misleading generalizations. The worst of both worlds are quantitative surveys that are neither broadly grounded in ethnography nor applied to representative samples.

It’s also clear in Agar’s book that qualitative and quantitative are not a binary distinction, but rather two ends of a continuum. Research starts with informal observations about specific things (people, places, events) that give rise to open-ended questions. The answers to these questions then provoke more focused questions that are asked of a wider range of things, and so on.

The concepts of broad and narrow, general and specific, can be confusing here, because at the qualitative, close or ethnographic end of the spectrum the questions are broad and general but asked about a narrow, specific set of subjects. At the quantitative, distant or distributional end of the spectrum the questions are narrow and specific, but asked of a broad, general range of subjects. Agar uses a “funnel” metaphor to model how the questions narrow during this progression, but he could just as easily have used a showerhead to model how the subjects broaden at the same time.

The progression is not one-way, either. The findings of a broad survey can raise new questions, which can only be answered by a new round of investigation, again beginning with qualitative examination on a small scale and possibly proceeding to another broad survey. This is one of the cycles that increase our knowledge.

Rather than the funnel metaphor, I prefer a metaphor based on seeing. Recently I’ve been re-reading The Omnivore’s Dilemma, and in Chapter 8 Michael Pollan talks about taking a close view of a field of grass:

In fact, the first time I met Salatin he’d insisted that even before I met any of his animals, I get down on my belly in this very pasture to make the acquaintance of the less charismatic species his farm was nurturing that, in turn, were nurturing his farm.

Pollan then gets up from the grass to take a broader view of the pasture, but later bends down again to focus on individual cows and plants. He does this metaphorically throughout the book, as many great authors do: focusing in on a specific case, then zooming out to discuss how that case fits in with the bigger picture. Whether he’s talking about factory-farmed Steer 534, or Budger the grass-fed cow, or even the thousands of organic chickens that are functionally nameless under the generic name of “Rosie,” he dives into specific details about the animals, then follows up by reporting statistics about these farming methods and the animals they raise.

The bottom line is that we need studies from all over the qualitative-quantitative spectrum. They build on each other, forming a cycle of knowledge. We need to fund them all, to hire people to do them all, and to promote and publish them all. If you do it right, the plural of anecdote is indeed data, and you can’t have data without anecdotes.

Viewing in free motion

Last month I went on a walk with my friend Ezra. It was his birthday, so we walked for almost two hours, drinking coffee, eating cinnamon rolls, and talking about semantics and coding. The funny thing is that Ezra lives on the West Coast and I live in New York, so we conducted our entire conversation by cell phone, with him walking through Ballard and Loyal Heights, and me walking through Jackson Heights and East Elmhurst.

Cell phones have been around for decades, and I’m sure we’re far from the first to walk together this way. You’ve probably done it yourself. But it reminded me of Isaac Asimov’s 1956 novel The Naked Sun, in which our hero Elijah Baley visits an Earth colony on the planet Solaria, where all the colonists live on separate estates, with at most one spouse and possibly an infant child, surrounded by robots who tend to their every need, almost never seeing one another in person. They interact socially by “viewing” each other through realistic virtual-reality projections.

Baley interviews a murder suspect, Gladia Delmarre, and is intrigued when she tells him she goes on walks together with her neighbor. “I didn’t know you could go on walks together with anyone,” says Baley.

“I said viewing,” responds Gladia. “Oh well, I keep forgetting you’re an Earthman. Viewing in free motion means we focus on ourselves and we can go anywhere we want to without losing contact. I walk on my estate and he walks on his and we’re together.”

I had no visual contact with Ezra during this walk. I’ve seen people “viewing in free motion” on FaceTime. We could probably have rigged something up with a GoPro camera and Google Glass, but it would most likely not have been much like the experience on Solaria, where I could have looked over and seen a chunk of Seattle superimposed on Queens, with Ezra walking across it next to me.

The biggest reason not to attempt any visual presence is that it was dangerous enough for me to be crossing the street while talking; it would have been much worse if the virtual view of the cars on 24th Avenue NW were blocking my view of the cars coming at me down Northern Boulevard.

Of course, on Solaria all the cars were (or will be?) automatic, and there are armies of robots to protect the humans from danger.

Printing differences and material issues in Google Books

I am looking forward to presenting my Digital Parisian Stage corpus and the exciting results I’ve gotten from it so far at the American Association for Corpus Linguistics at Iowa State in September. In the meantime I’m continuing to process texts, working towards a one percent sample from the Napoleonic period (Volume 1 of the Wicks catalog).

One of the plays in my sample is les Mœurs du jour, ou l’école des femmes, a comedy by Collin-Harleville (also known as Jean-François Collin d’Harleville). I ran the initial OCR on a PDF scanned for the Google Books project. For reasons that will become clear, I will refer to it by its Google Books ID, VyBaAAAAcAAJ. When I went to clean up the OCR text, I discovered that it was missing pages 2-6. I emailed the Google Books team about this, and got the following response:

google-books-material-issue

I’m guessing “a material issue” means that those pages were missing from the original paper copy, but I didn’t even bother emailing until the other day, since I found another copy in the Google Books database, with the ID kVwxUp_LPIoC.

Comparing the OCR text of VyBaAAAAcAAJ with the PDF of kVwxUp_LPIoC, I discovered some differences in spelling. For example, throughout the text, words that end in the old-fashioned spelling -ois or -oit in VyBaAAAAcAAJ are spelled with the more modern -ais in kVwxUp_LPIoC. There are also differences in the way “Madame” is abbreviated (“Mad.” vs. “M.me”), in which accented letters preserve their accents when set in small caps, and in pagination. Here is the entirety of Act III, Scene X in each copy:

Act III, Scene X in copy VyBaAAAAcAAJ

Act III, Scene X in copy kVwxUp_LPIoC

My first impulse was to look at the front matter and see if the two copies were identified as different editions or different printings. Unfortunately, they were almost identical, with the most notable differences being that VyBaAAAAcAAJ has an œ ligature in the title, while kVwxUp_LPIoC is signed by the playwright and marked as being a personal gift from him to an unspecified recipient. Both copies give the exact same dates: the play was first performed on the 7th of Thermidor in year VIII and published in the same year (1800).

The Google Books metadata indicate that kVwxUp_LPIoC was digitized from the Lyon Public Library, while VyBaAAAAcAAJ came from the Public Library of the Netherlands. The other copies I have found in the Google Books database, OyL1oo2CqNIC from the National Library of Naples and dPRIAAAAcAAJ from Ghent University, appear to be the same printing as kVwxUp_LPIoC, as does the copy from the National Library of France.

Since the -ais and M.me spellings are closer to the forms used in France today, we might expect that kVwxUp_LPIoC and its cousins are from a newer printing. But in Act III, Scene XI I came across a difference that concerns negation, the variable that I have been studying for many years. The decadent Parisians Monsieur Basset and Madame de Verdie question whether marriage should be eternal. Our hero Formont replies that he has no reason not to remain with his wife forever. In VyBaAAAAcAAJ he says, “je n’ai pas de raisons,” while in kVwxUp_LPIoC he says, “je n’ai point de raisons.”

Act III, Scene XI (page 75) in VyBaAAAAcAAJ

Act III, Scene XI (page 78) in kVwxUp_LPIoC

In my dissertation study I found that the relative use of ne … point had already peaked by the nineteenth century, and was being overtaken by ne … pas. If this play fits the pattern, the use of the more conservative pattern in kVwxUp_LPIoC goes against the more innovative -ais and M.me spellings.

I am not an expert in French Revolutionary printing (if anyone knows a good reference or contact, please let me know!). My best guess is that kVwxUp_LPIoC and the other -ais/M.me/ne … point copies are from a limited early run, some copies of which were given to the playwright to give away, while VyBaAAAAcAAJ is from a larger, slightly later, printing.

In any case, it is clear that I should pick one copy and make my corpus text consistent with it. Since VyBaAAAAcAAJ is incomplete, I will try dPRIAAAAcAAJ. I will try to double-check all the spellings and wordings, but at the very least I will check all of the examples of negation against dPRIAAAAcAAJ as I annotate them.
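A check like this could be partly automated. The following is only a minimal sketch in Python, not the workflow I actually used: it assumes both copies are available as plain text, and the function name `word_differences` is made up for illustration. It aligns the two texts word by word with the standard library's `difflib` and lists the places where they disagree.

```python
import difflib
import re


def word_differences(text_a, text_b):
    """Return (words_in_a, words_in_b) pairs where the two texts disagree."""
    # Split on whitespace so that punctuation stays attached to its word.
    words_a = re.findall(r"\S+", text_a)
    words_b = re.findall(r"\S+", text_b)
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
            )
    return differences


# The negation example from the two copies discussed above.
copy_a = "je n’ai pas de raisons"
copy_b = "je n’ai point de raisons"
print(word_differences(copy_a, copy_b))
```

On real OCR output the list would also include recognition errors, so every reported difference would still need to be checked by eye against the page images.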

Introducing Selected Birthdays

If, like me, you have an Android phone, you probably use Google Calendar. I like the way it integrates with my contacts so that I can schedule events with people. I like the idea of it integrating with my Google+ contacts to automatically create a calendar of birthdays that I don’t want to miss. There’s a glitch in that, but I’ve created a new app to get around it, called Selected Birthdays.

The glitch is that the built-in Birthdays calendar has three options: show your Google Contacts, show your contacts and the people in your Google+ circles, or nothing. I have a number of contacts who are attractive and successful people, but I’m sorry to say I have no interest in knowing when their birthdays are. Natasha Lomas has even stronger feelings.

Google doesn’t let you change the built-in Birthdays calendar, but it does let you create a new calendar and fill it with the birthdays that interest you. My new web app, Selected Birthdays, automates that process. It goes through your contacts, finds the ones who have shared their birthdays with you, and gives you a checklist. You decide whose birthdays to include, and Selected Birthdays will create a new calendar with those birthdays. It’ll also give you the option of hiding Google’s built-in birthday calendar.
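The core of that process can be sketched in a few lines. This is an illustrative Python sketch, not the app's actual code: the contact records, the names in them, and the function `birthday_events` are all hypothetical. The event fields (`summary`, all-day `start` and `end` dates, and a yearly `recurrence` rule) are modeled on my understanding of the Google Calendar event format and should be checked against the API documentation before use.

```python
from datetime import date, timedelta


def birthday_events(contacts, selected_names):
    """Build one yearly all-day event per selected contact with a known birthday."""
    events = []
    for contact in contacts:
        birthday = contact.get("birthday")  # None if the contact has not shared it
        if birthday is None or contact["name"] not in selected_names:
            continue
        events.append({
            "summary": f"{contact['name']}'s birthday",
            # All-day events use dates; the end date is the following day.
            "start": {"date": birthday.isoformat()},
            "end": {"date": (birthday + timedelta(days=1)).isoformat()},
            "recurrence": ["RRULE:FREQ=YEARLY"],
        })
    return events


# Hypothetical contacts, invented for this example.
contacts = [
    {"name": "Ada", "birthday": date(2016, 4, 12)},
    {"name": "Grace", "birthday": None},
    {"name": "Alan", "birthday": date(2016, 9, 3)},
]
events = birthday_events(contacts, selected_names={"Ada"})
print(events)
```

Here only Ada ends up in the new calendar: Grace has not shared a birthday, and Alan was not ticked on the checklist.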

I wrote the Selected Birthdays app in JavaScript with the Google+ and Google Calendar APIs. Ian Jones was a big help in recommending the moment.js library, which I used to manipulate dates. Bootflat helped me add a bit of visual style.

For the app to work you’ll have to authorize it to read your contacts and write to your calendars. For your privacy, the app communicates directly between your browser and Google’s server; once you download it there is no further contact with my server. There is no way for me to see or edit your contacts or calendars. You can verify that in the source code.

Please let me know if you have any comments, questions or suggestions. I have also made the code available on GitHub for free under the Apache License, if you want to build on it. A number of people have said they wish they had an app like this for Facebook. If enough of you repeat that, I’ll look into it!

Prejudice and intelligibility

Last month I wrote about the fact that intelligibility – the ability of native speakers of one language or dialect to understand a closely related one – is not constant or automatic. A major factor in intelligibility is familiarity: when I was a kid, for example, I had a hard time understanding the Beatles until I got used to them. Having lived in North Carolina, I find it much easier to understand people from Ocracoke Island than my students do.

Photo: Theonlysilentbob / Wikimedia

Prejudice can play a big role in intelligibility, as Donald Rubin showed in 1992. (I first heard about this study from Rosina Lippi-Green’s book English With an Accent.) At the time, American universities had recently increased the overall number of instructors from East Asia they employed, and some students complained that they had difficulty understanding the accents of their instructors.

In an ingenious experiment, Rubin demonstrated that much of this difficulty was due to prejudice. He recorded four-minute samples of “a native speaker of English raised in Central Ohio” reading a script for introductory-level lectures on two different subjects and played those samples to three groups of students.

For one group, a still photo of a “Caucasian” woman representing the instructor was projected on a screen while the audio sample was played. For the second group, a photo of “an Asian (Chinese)” woman was projected while the same audio of the woman from central Ohio (presumably not of Asian ancestry) was played. The third group heard only the audio and was not shown a photo.

In a survey they took after hearing the clip, most of the students who saw the picture of an Asian woman reported that the speaker had “Oriental/Asian ethnicity.” That’s not surprising, because it’s essentially what they were told by being shown the photograph. But many of these students went further and reported that the person in the recording “speaks with a foreign accent.” In contrast, the vast majority of the students who were shown the “Caucasian” picture said that they heard “an American accent.”

The kicker is that immediately after they heard the recording (and before answering the survey), Rubin tested the students on their comprehension of the content of the excerpt, by giving them a transcript with every seventh word replaced by a blank. The students who saw a picture of an Asian woman not only thought they heard a “foreign accent,” but they did worse on the comprehension task! Rubin concluded that “listening comprehension seemed to be undermined simply by identifying (visually) the instructor as Asian.”
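A fill-in-the-blank test of this kind is easy to sketch. The Python below is only an illustration: the function names `make_cloze` and `score_cloze` are made up, the sample text is invented, and exact-match scoring is my assumption, since I have not described how Rubin scored the answers.

```python
def make_cloze(text, n=7, blank="_____"):
    """Replace every n-th word with a blank; return the test text and the answers."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


def score_cloze(answers, responses):
    """Fraction of blanks filled with exactly the original word (case-insensitive)."""
    if not answers:
        return 0.0
    correct = sum(a.lower() == r.lower() for a, r in zip(answers, responses))
    return correct / len(answers)


# Invented sample text, not Rubin's lecture script.
sample = ("one two three four five six seven "
          "eight nine ten eleven twelve thirteen fourteen")
cloze_text, answers = make_cloze(sample)
print(cloze_text)
print(score_cloze(answers, ["seven", "forty"]))
```

With the default of every seventh word, the sample loses “seven” and “fourteen”, and a student who recovers only the first gets half credit.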

Rubin’s subjects may not have felt any particular hostility towards people from East Asia, but they had a preconceived notion that the instructor would have an accent, and they assumed that they would have difficulty understanding her, so they didn’t bother trying.

This study (and a previous one by Rubin with Kim Smith) connects back to what I was saying about familiarity. I will discuss that and power imbalances in a future post, but this finding is striking enough to merit its own post.