Nobody’s Boy

I got a paper rejected from a generativist conference a few years ago. A generativist friend of mine said, “Why did you bother submitting your paper to that conference? You knew they were going to reject it.” I said, “Well, the conference was in town, so I figured I’d send something in anyway.”

My friend proceeded to tell me a story from her early grad school days about reviewing papers for her school’s signature conference. She sat down one evening with Professor Big Deal, who glanced through the stack of anonymous submissions and sorted them one by one into piles. “This is from one of Professor X’s students, and this is from one of Professor Y’s students. Here’s another from Professor X’s group. This must be Professor Z.” She continued like this until all the papers were sorted, and then as I recall she had some formula for allocating time to each professor and their students.

I think about this a lot, because I’m not a Student Of anyone in particular. On paper I may look like a student of Professor Bigshot, and that’s probably how my paper got accepted to a conference where Professor Bigshot was a keynote speaker. But I’m not really a Student Of Professor Bigshot. I didn’t ask her to be on my committee. And I know she doesn’t think of me as a Student Of hers, because she was sitting in front of me later in that conference, and walked out of the room right before it was my turn to present my paper.

My relationship with my actual advisor is Complicated, but suffice it to say that we don’t work in the same subfield of linguistics, and I’m tied to the New York area, where she doesn’t have the pull to get me a job anyway. My relationships with my other committee members are problematic in various ways. I’m on good terms with plenty of other linguists, but since I’m not their Student their loyalty to me is always secondary.

Even if my friend’s story about Professor Big Deal is an egregious outlier, it is still a regular occurrence to see professors co-authoring and co-presenting papers with their students, making introductions and writing letters. If you know me professionally, I can pretty much guarantee that we were not introduced by Professor Bigshot, or by any member of my committee. If you’ve seen me present my research, or read it anywhere, or hired me, it’s entirely through my own hard work. I have not had any of the advantages that come with being a Student Of anyone.

You could say that it’s my fault for not choosing the right advisors, or for the problems in my relationships with my advisors. In my defense I would argue that most of the problems in these relationships had to do with my supporting my wife’s progress on the tenure track and my kid’s not being in daycare ten hours a day over my own progress on the PhD. But even if you disagree, does that mean that I deserve to be a second-class citizen in the field?

I know I’m not the only academic orphan out there. Maybe we should get together and found a Home for Orphaned Linguists, where we can hope to someday be adopted by professors with generous allocations of reassigned time, who will co-author with us and introduce us and attend our talks. Some day…

Sampling is a labor-saving device

Last month I wrote those words on a slide I was preparing to show to the American Association for Corpus Linguistics, as a part of a presentation of my Digital Parisian Stage Corpus. I was proud of having a truly representative sample of theatrical texts performed in Paris between 1800 and 1815, and thus finding a difference in the use of negation constructions that was not just large but statistically significant. I wanted to convey the importance of this.

I was thinking about Laplace finding the populations of districts “distributed evenly throughout the Empire,” and Student inventing his t-test to help workers at the Guinness plants determine the statistical significance of their results. Laplace was not after accuracy, he was going for speed. Student was similarly looking for the minimum amount of effort required to produce an acceptable level of accuracy. The whole point was to free resources up for the next task.

I attended one paper at the conference that gave p-values for all its variables, and they were all 0.000. After that talk, I told the student who presented that those values indicated he had oversampled, and he should have stopped collecting data much sooner. “That’s what my advisor said too,” he said, “but this way we’re likely to get statistical significance for other variables we might want to study.”
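To see why a p-value of 0.000 points to oversampling, here is a minimal sketch. It assumes a pooled two-proportion z-test and made-up rates (55% versus 50%); the function name and all the numbers are hypothetical, and none of them come from the paper I saw. The difference between the groups stays the same in every row, and only the sample size grows.

```python
import math


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a difference between two proportions (pooled z-test)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical rates: 55% in group A, 50% in group B, at every sample size.
for n in (100, 1_000, 10_000, 100_000):
    p = two_proportion_p_value(round(0.55 * n), n, round(0.50 * n), n)
    print(f"n = {n:>7} per group: p = {p:.3f}")
```

Under these assumptions the same five-point difference is nowhere near significant at 100 observations per group, and rounds to 0.000 well before 100,000. Everything collected past the point of significance is effort that could have gone to the next question.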

The student had a point, but it doesn’t seem very – well, “agile” is a word I’ve been hearing a lot lately. In any case, as the conference was wrapping up, it occurred to me that I might have several hours free – on my flight home and before – to work on my research.

My initial impulse was to keep doing what I’ve been doing for the past couple of years: clean up OCRed text and tag it for negation. Then it occurred to me that I really ought to take my own advice. I had achieved statistical significance. That meant it was time to move on!

I have started working on the next chunk of the nineteenth century, from 1816 through 1830. I have also been looking into other variables to examine. I’ve got some ideas, but I’m open to suggestions. Send them if you have them!

Shelter from the tweetstorm

It’s happened to me too: I’m angry, or upset, or excited about something. I go on Twitter. I’ve got stuff to say. It’s more than will fit in the 140-character limit, but I don’t have the time or energy to write a blog post. So I just write a tweet. And then another, and another.

I’ve seen other people doing this, and I’m fine with it. But for a while now I’ve seen people doing something more planned, numbering their tweets. Many people try to predict how many tweets are going to be in a particular rant, and often fail spectacularly along the lines of Monty Python’s Spanish Inquisition sketch. Some people are clearly composing the whole thing ahead of time, as a unit. Sometimes they’re not even excited, just telling a story. It’s developing into a genre: the tweetstorm.

I get why people are reluctant to blog in these cases. If you’re already in Twitter and you want to write something longer, you have to switch to a different window, maybe log in, come up with a picture to grab people’s attention. Assuming you already have an account on a blogging platform. It doesn’t help that Twitter sees some of these as competitors and drags its feet on integrating them. And yes, mobile blogging apps still leave a lot to be desired, especially if you’ve got an intermittent connection like on the train.

People also tend to be drawn in more easily one tweet at a time, like Beorn meeting the dwarves in The Hobbit. Maybe they don’t feel in the mood for reading something longer, or opening a web browser.

There may also be an aspect of live performance for the tweetstormer and the people who happen to be on Twitter while the storm is passing over, and the thread functions as an inferior archive of the performance, like concert videos. I can understand that too, but it’s a pain for the rest of us.

The problem is that Twitter sucks as a platform for reading longform pieces, or even medium-form ones. Yes, I know they’ve introduced “threading” features to make it easier to follow conversations. That doesn’t mean it’s easy to follow a single person’s multi-tweet rant. Combine that with other people replying in the middle of the “storm” and the original tweeter taking time in the middle to respond to them, and people using the quote feature and replying to quotes and quoting replies, and it gets really chaotic. If I bother to take the time, usually at the end it turns out it’s not worth it.

In terms of Bad Things on Twitter this is nowhere near the level of harassment and death threats, or even people livetweeting Netflix videos. But please, just go write a blog post and post a link. I promise I’ll read it.

What’s worse is that people are encouraging each other to do it. It’s one thing to get outraged on Twitter, or even to see someone else get outraged on Twitter and tell your followers to go check it out. It’s another when you know the whole thing is planned and you tell everyone to Read This. Now.

I get that you think it’s interesting, but that’s not enough for me. Tell me why, and let me decide if it’s worth my time to go reading through all those tweets in reverse chronological order. Better yet, storify that shit and tweet me the URL.

You know what would be even better? Tell that other tweeter, “What an awesome thread! It would make an even better blog post. Do you have a blog?”

“Said” for 2016 Word of the Year

I just got back from the American Association for Corpus Linguistics conference in Ames, Iowa, and I’m calling the Word of the Year: for 2016 it will be said.

You may think you know said. It’s the past participle of say. You’ve said it yourself many times. What’s so special about it?

What’s special was revealed by Jordan Smith, a graduate student at Iowa State, in his presentation on Saturday afternoon. said is becoming a determiner. It is grammaticizing.

In addition to its participial use (“once the words were said”) you’ve probably seen said used as an attributive adjective (“the said property”). It indicates that the noun it modifies refers to a person, place or thing that has been mentioned recently, with the same noun, and that the speaker/writer expects it to be active in the hearer/reader’s memory.

Attributive said is strongly associated with legal documents, as in its first recorded use in the English Parliament in 1327. The Oxford English Dictionary reports that said was used outside of legal contexts as early as 1973, in the English sitcom Steptoe and Son. In this context it was clearly a joke: a word that evoked law courts used in a lower-class colloquial context.

Jordan Smith examined uses of said in the Corpus of Contemporary American English (COCA) and found that attributive said has increasingly been used without the for several years now, and outside the legal domain. He observes that syntactic changes and increased frequency have been named by linguists like Joan Bybee as harbingers of grammaticization.

Grammaticization (also known as grammaticalization; search for both) is when an ordinary lexical item (like a noun, verb or adjective, or even a phrase) becomes a grammatical item (like a pronoun, preposition or auxiliary verb). For example, while is a noun meaning a period of time, but it was grammaticized to a conjunction indicating simultaneity. Used is an adjective meaning accustomed, as in “I was used to being lonely,” but has also become part of an auxiliary indicating habitual aspect as in “I used to be lonely.”

Jordan is suggesting that said is no longer just a verb or even an adjective, it’s our newest determiner in English. Determiners are an exclusive club of short words that modify nouns. They include articles like an and the, but also demonstratives like these and quantifiers like several.

Noun phrases without a determiner tend to refer to generic categories, as I have been doing with phrases like legal documents and grammaticization. That is clearly not what is going on with said girlfriend. Noun phrases with said refer to a specific item or group of items, in some sense even more so than noun phrases with the.

Thanks to the wireless Internet at the AACL, I began searching for of said on Twitter, and found a ton of examples. There are plenty of examples for in said as well.

It’s not just happening in English. The analogous French ledit is also used outside the legal domain. Its reanalysis is a bit different, since it incorporates the article rather than replacing it. Like most noun modifiers in French it is inflected for gender and number. I haven’t found anything similar for Spanish.

In 2013 the American Dialect Society chose because as its Word of the Year. Because is already a conjunction, having grammaticized from the noun cause, but it has been reanalyzed again into a preposition, as in because science. Some theorists consider this to be a further step in grammaticization. And here is a twenty-first century prepositional phrase for you, folks: because (P) said (Det) relationship (N).

After Jordan’s presentation it struck me that said is an excellent candidate for the 2016 Word of the Year. And if the ADS isn’t interested, maybe another organization, like the International Cognitive Linguistics Association, can sponsor a Grammaticization of the Year.

On being a public linguist

People say you should stand up for what you believe in. They say you should look out for those less fortunate, and speak up for those who don’t get heard. They say that those of us who come from marginalized backgrounds, like TBLG backgrounds for example, but have enough privilege to be out in relative safety should speak up for those who don’t have that privilege. They say that those of us who have undertaken in-depth study in the interest of society have a particular responsibility to share what we know with the world as “public intellectuals.” They say that we linguists need to do a better job of applying our knowledge to real-world problems and communicating solutions to the public at large.

They’re right of course, but there’s a reason more people don’t do these things. They’re hard to do, and even harder to do right. Lots of people are strongly invested in the status quo and in thinking of themselves as good people, and they don’t like to be told that what they’re doing is at best ineffective and at worst harmful. Lots of people think that because they’re trans they know everything there is to know about trans issues, or that because they use language they know everything there is to know about language.

Case in point: after watching with increasing frustration for years as the word “cisgender” was invented and abused, back in December I wrote a series of blog posts about it. I know this is a controversial topic, and I was a bit apprehensive since I was on the job market, but my posts were not idle rants: as a linguist, a trans person, and someone who has observed trans politics for years, I had been trained to do this kind of analysis, and pursued these topics beyond my training.

I anticipated a number of potential objections to my argument and addressed them in the first three posts. As I published each one I was worried it would get a huge backlash, but there was barely a peep (more on that in another post). So for the title of the last one I went big: “The word “cisgender” is anti-trans.” Not much reaction.

A few weeks ago I came across a Facebook post by a gender therapist asking for opinions about “cisgender,” so I left a link to my blog post, identifying it as “my professional opinion as a linguist.” The therapist then shared my post without identifying me as either trans or a linguist.

Then there was a backlash. Several people immediately called my post “garbage” and “horse shit.” There were a handful of substantive disagreements, all of which I had anticipated in my post and previous ones that I had linked to. There was some support, but the vast majority of comments were negative. There were several similar comments made on my blog post itself, most of which I left unpublished since they were repetitive and unhelpful.

I know that plenty of people face far worse reactions to things they post. I didn’t receive any comments on my looks, rape threats or death threats. But it was still very upsetting, particularly as it was posted the same day I began my first full-time job since receiving my Ph.D. – an event that was positive on a number of levels, but upsetting on other levels.

The gender therapist, who presumably helps people with their mental states, showed no interest whatsoever in mine. They made no effort to moderate, did not intervene in the comments, and sent me no personal messages. The idea that a trans person might be losing sleep over these attacks on their page may not have even occurred to them.

The response my post has gotten from other public linguists has been minimal. A columnist who’s written about the issue and encouraged me to write gave my post a few tweets. A radical feminist whose writings about language and politics inspired me for years completely ignored it. It has not been picked up by any of the popular linguistics blogs, or by anyone talking about language, gender and sexuality.

It’s quite possible that these linguists disagree with me. There are some very specific linguistic questions at stake. But linguists love to argue, and I would welcome respectful, constructive engagement with these questions. So far there has been none.

I have also gotten very little support from other linguists. When I was first formulating these arguments a few years ago on Twitter, there were at least two linguists who explicitly denied that I had any standing to contest the arguments for “cis” that they were retweeting. They were satisfied with the flimsiest of pseudolinguistic rationales in pursuit of their political and social goals, and for whatever reasons I did not qualify as an authentic voice of the trans community in their eyes. I stopped following them on Twitter, and as far as I could tell they had no reaction whatsoever to my posts.

I know that a lot of people don’t want to get involved in flamewars on Twitter or Facebook. It’s really hard to know who’s right and who’s wrong. At first glance I look like just another white guy, and I project an image of success and confidence on social media because that’s what everyone tells me I need to do. Some people may disagree with my stance on a political basis.

I mostly came out of the Facebook flareup okay, although it’s hard to tell how much of my insomnia and touchiness relates to that as opposed to other stresses. Re-reading some of those comments just now was pretty upsetting. I made a decision to focus on the new job, and avoided reading comments, posts or links for a week or two. Now it’s blown over – but there’s no telling when it’ll get shared by someone else.

My main point is that being a public linguist isn’t easy. Speaking out isn’t easy. Fighting on your own behalf instead of some Little People somewhere isn’t easy – even if you’ve got a certain amount of privilege. If you’re wondering why people don’t fight for themselves more often, why they don’t speak up, why linguists don’t write more public posts about issues that matter – there’s your answer. It’s much easier to bury your nose in a book and write about grammaticization vs. reanalysis in Old Church Slavonic.

If we really want people to take a stand on these things, we need to support them. We need to stick up for linguists who speak out in public. We need principles that go beyond identity and political and social affiliation. And we need people who are willing to support linguists who speak out based on those principles. We need people who will make themselves available to back up other linguists on the Internet. Without real support, it’s all empty rhetoric.


At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc Twitter feed.

[Image: silver maple, 20160827]

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
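The loop described above can be sketched in a few lines. This is only an illustration of the idea, not the bot’s actual code: the Lot class, the compose_tweet and next_lot functions, and the sample addresses are all hypothetical, and fetching the Street View photo and posting to Twitter are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    tax_id: int
    address: str
    borough: str


def compose_tweet(lot):
    """Build the text of a tweet: the address and the borough."""
    return f"{lot.address}, {lot.borough}"


def next_lot(lots, last_posted_tax_id):
    """Return the lot with the smallest tax ID above the last one posted, or None."""
    remaining = [lot for lot in lots if lot.tax_id > last_posted_tax_id]
    return min(remaining, key=lambda lot: lot.tax_id) if remaining else None


# Made-up records; a real run would read them from the property database.
lots = [
    Lot(1000020002, "2 Example Street", "Manhattan"),
    Lot(1000010001, "1 Example Street", "Manhattan"),
    Lot(2000010001, "1 Sample Avenue", "Bronx"),
]

last_posted = 0
while (lot := next_lot(lots, last_posted)) is not None:
    print(compose_tweet(lot))  # a real bot would post this, then wait half an hour
    last_posted = lot.tax_id
```

Remembering only the last tax ID posted is enough to resume the walk after an interruption, which matters for a list that takes years to get through.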

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.

Social scientists, and even humanists recently, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced at least as far as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places — these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a tinyletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.
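As a back-of-the-envelope check, here is what that timeline implies if we take the half-hourly schedule and the thirty-year estimate at face value. Both are rough figures, so this is a sketch of the arithmetic and not a count of actual lots; the names in the code are mine.

```python
TWEETS_PER_DAY = 24 * 2  # one tweet every half hour
DAYS_PER_YEAR = 365      # ignoring leap days


def tweets_in(years):
    """Number of tweets posted in the given number of years at that pace."""
    return years * DAYS_PER_YEAR * TWEETS_PER_DAY


print(tweets_in(30))  # 525600
```

At that pace, thirty years comes to roughly half a million tweets, one lot at a time.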

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase, “the map is not the territory”:

Streetview and the property database are both widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did. I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. It is the property we know, and yet we realize how much we don’t know it.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on GitHub, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.

On pet parents

I’m a parent. It doesn’t make me better or worse than anyone else, it’s just a category that reflects some facts about me: I conceived a new human with my wife, we are raising and caring for that human, and we expect to have a relationship with him for the rest of our lives. Some people don’t take parenthood seriously, so it doesn’t impact their lives very much, but their kids suffer. We take it very seriously, and it’s a lot of work for us.

I also take care of pets. We own three cats, and sometimes I walk my mom’s dog or take him to be groomed. It can be a lot of work, and the relationships can be very intimate at times. “Ownership” is kind of a funny word for it. In some ways it can be like certain stages of parenting: we buy all the food and make sure the animals don’t get into danger. It makes sense when I hear people refer to their pets as their “baby” or put words in their pets’ mouths calling themselves “daddy.” I even understand when I hear them refer to themselves as “pet moms.”

I understand this usage, but I do not agree with it. I have a kid, and I have pets. The relationships are similar, but different. When someone calls themself a “pet dad,” it trivializes my relationship with my kid and infantilizes my pets. It erases the work of the actual parents, and trivializes the hard work of humans who act as surrogate parents to infant pets. I am a dad: I am not a pet dad, and I am not my pets’ dad. Or their mom.

My kid will one day be an adult, and while I may always think of him as The Kid, he will be able to function as an autonomous member of society. (Note that the term “kid” itself is an animal metaphor – referring to a juvenile goat.) Only one of my cats can still be considered a juvenile by any standard; the others are five years old and twenty years old, respectively. They are adult males, and until the last century they would have been free to come and go as they wished.

If my cats are incapable of leaving our house unaccompanied it is more likely due to the fact that we have cars everywhere than anything else. When I was a kid we lost three dogs to car culture. When I was eleven I saw a neighbor’s cat crushed beneath the wheels of a car, and arrived just in time to see him take his last breath. We have indoor cats and dog leashes in part because we have made the outdoors inhospitable.

I suspect one reason we hear more about “pet parents” is that so few of our pets are parents themselves. I support universal neutering, and have only adopted neutered cats from shelters or feral rescuers. It’s the best response to the overpopulation of feral animals, but it does make the pets neuter – and childless.

When I was a kid we had a cat who had a litter of kittens. I watched one of our dogs give birth to eleven puppies, and then found homes for the ten that lived. Our male cats were aggressive, sexual toms. Again, not wise in retrospect, but it was hard to think of any of the humans in the house as “moms” or “dads” of our pets while they were themselves moms and dads.

There is one human I know who would qualify as a “cat mom” in my mind. She is the woman who leads the feral cat helpers in our neighborhood. Six years ago someone found a baby kitten near some railroad tracks in Manhattan. My neighbor fostered this kitten in her apartment for five months, feeding him with an eyedropper until he was old enough to eat. She posted his picture on her website and we adopted him. If he has a “pet mom” it’s her.

What Professor Bigshot said

I was feeling very nervous, sitting there in Professor Bigshot’s office. I had just been accepted into the PhD program, and was visiting the department to get to know everyone and see if it was the right fit. I hadn’t applied to any other PhD program. If I didn’t go here, I probably wouldn’t get a PhD.

You can figure out pretty easily who Professor Bigshot is, if you care. I guess you could say I’m giving her a pseudonym for SEO reasons.

The student who was showing me around the department had asked, “Oh, have you met Professor Bigshot yet?” I had not. I had heard of her, but I had absolutely no idea what her work was: what she studied, what she had written, what her theories were. I was nervous, sitting there in her office, because I was afraid she would find out that I hadn’t read anything she’d written. I was right to be nervous, but for a completely different reason.

“So Angus,” Professor Bigshot asked me, “You know that the job market in linguistics is very tight? You understand that we cannot guarantee you a job when you graduate?”

I relaxed a bit. I knew this one. I had thought long and hard about it. I said, brightly, “Oh yes. But that’s okay. I have computer skills, and I can always get another IT job if this doesn’t work out.”

“Well, at this university,” said Professor Bigshot, her face abruptly twisting into a snarl, “we are not in the business of granting recreational PhDs.”

That was the last thing I was expecting to hear. I did the only thing I could think of: I thanked Professor Bigshot politely, got up and walked out of her office.

I still had a day and a half before I left town. I had planned to visit classes and see the rest of the university.

I didn’t quite know how to tell my student guide what Professor Bigshot had said, so in a few minutes I was sitting down in Professor Littleshot’s office. I didn’t know what he had done in linguistics either, but at this point it hardly seemed to matter.

“So Angus,” said Professor Littleshot. “Have you made up your mind whether you’re going to attend our program?”

I opened my mouth. “Well…”

“Is there anything I can say to convince you?”

I shut my mouth and thought for a minute. “Well, I guess you just did.”

That was slightly over nineteen years ago. Professor Littleshot retired before I could propose a dissertation topic. I wrote a dissertation in Professor Bigshot’s theoretical framework, received my PhD in 2009, taught linguistics as an adjunct for seven years, sent out applications for tenure-track jobs and was invited to exactly zero interviews. Last week I started working as a Python developer in the IT Department at Columbia University.

Recreational PhD? Well, there have been times that I’ve enjoyed quite a lot. And yes, I suppose you can get a back injury, chronic insomnia and thousands of dollars of debt from plenty of other recreational activities. Maybe I would have enjoyed it more if I hadn’t tried so hard to prove Professor Bigshot wrong.

Quantitative needs qualitative, and vice versa

Data Science is all the rage these days. But this current craze focuses on a particular kind of data analysis. I conducted an informal poll as an icebreaker at a recent data science party, and most of the people I talked to said that it wasn’t data science if it didn’t include machine learning. Companies in all industries have been hiring “quants” to do statistical modeling. Even in the humanities, “distant reading” is a growing trend.


There has been a reaction to this, of course. Other humanists have argued for the continued value of close reading. Some companies have been hiring anthropologists and ethnographers. Academics, journalists and literary critics regularly write about the importance of nuance and empathy.

For years, my response to both types of arguments has been “we need both!” But this is not some timid search for a false balance or inclusion. We need both close examination and distributional analysis because the way we investigate the world depends on both, and both depend on each other.

I learned this from my advisor Melissa Axelrod, and from a book she assigned me for an independent study on research methods. Michael Agar’s The Professional Stranger is a guide to ethnographic field methods, but it also contains some commentary on the nature of scientific inquiry, and mixes its well-deserved criticism of quantitative social science with a frank acknowledgment of the interdependence of qualitative and quantitative methods. On page 134 Agar discusses Labov’s famous study of /r/-dropping in New York City:

The catch, of course, is that he would never have known which variable to look at without the blood, sweat and tears of previous linguists who had worked with a few informants and identified problems in the linguistic structure of American English. All of which finally brings us to the point of this example: traditional ethnography struggles mightily with the existence of pattern among the few.

Labov acknowledges these contributions in Chapter 2 of his 1966 book: Babbitt (1896), Thomas (1932, 1942, 1951), Kurath (1949, based on interviews by Guy S. Lowman), Hubbell (1950) and Bronstein (1962). His work would not be possible without theirs, and their work was incomplete until he developed a theoretical framework to place their analysis in, and tested that framework with distributional surveys.

We’ve all seen what happens when people try to use one of these methods without the other. Statistical methods that are not grounded in close examination of specific examples produce surveys that are meaningless to the people who take them and uninformative to scientists. Qualitative investigations that are not checked with rigorous distributional surveys produce unfounded, misleading generalizations. The worst of both worlds are quantitative surveys that are neither broadly grounded in ethnography nor applied to representative samples.

It’s also clear in Agar’s book that qualitative and quantitative are not a binary distinction, but rather two ends of a continuum. Research starts with informal observations about specific things (people, places, events) that give rise to open-ended questions. The answers to these questions then provoke more focused questions that are asked of a wider range of things, and so on.

The concepts of broad and narrow, general and specific, can be confusing here, because at the qualitative, close or ethnographic end of the spectrum the questions are broad and general but asked about a narrow, specific set of subjects. At the quantitative, distant or distributional end of the spectrum the questions are narrow and specific, but asked of a broad, general range of subjects. Agar uses a “funnel” metaphor to model how the questions narrow during this progression, but he could just as easily have used a showerhead to model how the subjects broaden at the same time.

The progression is not one-way, either. The findings of a broad survey can raise new questions, which can only be answered by a new round of investigation, again beginning with qualitative examination on a small scale and possibly proceeding to another broad survey. This is one of the cycles that increase our knowledge.

Rather than the funnel metaphor, I prefer a metaphor based on seeing. Recently I’ve been re-reading The Omnivore’s Dilemma, and in Chapter 8 Michael Pollan talks about taking a close view of a field of grass:

In fact, the first time I met Salatin he’d insisted that even before I met any of his animals, I get down on my belly in this very pasture to make the acquaintance of the less charismatic species his farm was nurturing that, in turn, were nurturing his farm.

Pollan then gets up from the grass to take a broader view of the pasture, but later bends down again to focus on individual cows and plants. He does this metaphorically throughout the book, as many great authors do: focusing in on a specific case, then zooming out to discuss how that case fits in with the bigger picture. Whether he’s talking about factory-farmed Steer 534, or Budger the grass-fed cow, or even the thousands of organic chickens that are functionally nameless under the generic name of “Rosie,” he dives into specific details about the animals, then follows up by reporting statistics about these farming methods and the animals they raise.

The bottom line is that we need studies from all over the qualitative-quantitative spectrum. They build on each other, forming a cycle of knowledge. We need to fund them all, to hire people to do them all, and to promote and publish them all. If you do it right, the plural of anecdote is indeed data, and you can’t have data without anecdotes.

Viewing in free motion

Last month I went on a walk with my friend Ezra. It was his birthday, so we walked for almost two hours, drinking coffee, eating cinnamon rolls, and talking about semantics and coding. The funny thing is that Ezra lives on the West Coast and I live in New York, so we conducted our entire conversation by cell phone, with him walking through Ballard and Loyal Heights, and me walking through Jackson Heights and East Elmhurst.


Cell phones have been around for decades, and I’m sure we’re far from the first to walk together this way. You’ve probably done it yourself. But it reminded me of Isaac Asimov’s 1956 novel The Naked Sun, in which our hero Elijah Baley visits an Earth colony on the planet Solaria, where all the colonists live on separate estates, with at most one spouse and possibly an infant child, surrounded by robots who tend to their every need, almost never seeing one another in person. They interact socially by “viewing” each other through realistic virtual-reality projections.

Baley interviews a murder suspect, Gladia Delmarre, and is intrigued when she tells him she goes on walks together with her neighbor. “I didn’t know you could go on walks together with anyone,” says Baley.

“I said viewing,” responds Gladia. “Oh well, I keep forgetting you’re an Earthman. Viewing in free motion means we focus on ourselves and we can go anywhere we want to without losing contact. I walk on my estate and he walks on his and we’re together.”

I had no visual contact with Ezra during this walk. I’ve seen people “viewing in free motion” on FaceTime. We could probably have rigged something up with a GoPro camera and Google Glass, but it would most likely not have been much like on Solaria, where I could have looked over and seen a chunk of Seattle superimposed on Queens, with Ezra walking across it next to me.

The biggest reason not to attempt any visual presence is that it was dangerous enough for me to be crossing the street while talking; it would have been much worse if the virtual view of the cars on 24th Avenue NW were blocking my view of the cars coming at me down Northern Boulevard.

Of course, on Solaria all the cars were (or will be?) automatic, and there are armies of robots to protect the humans from danger.