African American English has accents too

Diversity is notoriously subjective and difficult to pin down. In particular, we tend to be impressed if we know the names of a lot of categories for something. We might think there are more mammal species than insect species, but biologists tell us that there are hundreds of thousands of species of beetles alone. This is true in language as well: we think of the closely-related Romance and Germanic languages as separate, while missing the incredible diversity of “dialects” of Chinese or Arabic.

This is also true of English. As an undergraduate I was taught that there were four dialects in American English: New England, North Midland, South Midland and Coastal Southern. Oh yeah, and New York and Black English. The picture for all of those is more complicated than it sounds, and when I went to Chicago I discovered that there are regional varieties of African American English.

In 2012 Annie Minoff, a blogger for Chicago public radio station WBEZ, took this oversimplification for truth: “AAE is remarkable for being consistent across urban areas; that is, Boston AAE sounds like New York AAE sounds like L.A. AAE, etc.” Fortunately a commenter, Amanda Hope, challenged her on that assertion. Minoff confirmed the pattern in an interview with variationist Walt Wolfram, and posted a correction in 2013.

In 2013 I was preparing to teach a unit on language variation and didn’t want to leave my students as misinformed as I – or Minoff – had been. Many of my students were African American, and I saw no reason to spend most of the unit on white varieties and leave African American English as a footnote. But the documentation is spotty: I know of no good undergraduate-level discussion of variation in African American English.

A few years before I had found a video that some guy took of a party in a parking lot on the West Side of Chicago. It wasn’t ideal, but it sort of gave you an idea. The link was dead, so I typed “Chicago West Side” into Google. The results were not promising, so on a whim I added “accent” and that’s how I found my first accent tag video.

Accent tag videos are an amazing thing, and I could write a whole series of posts about them. Here was a young black woman from Chicago’s West Side, not only talking about her accent but illustrating it, with words and phrases to highlight its differences from other dialects. She even talks (as many people do in these videos) about how other African Americans hear her accent in other places, like North Carolina. You can compare it (as I did in class) with a similar video made by a young black woman from Raleigh (or New York or California), and the differences are impossible to ignore.

In fact, when Amanda Hope challenged Minoff’s received wisdom on African American regional variation, she used accent tag videos to illustrate her point. These videos are amazing, particularly for teaching about language and linguistics, and from then on I made extensive use of them in my courses. There’s also a video made by two adorable young English women, one from London and one from Bolton near Manchester, where you can hear their accents contrasted in conversation. I like that I can go not just around the country but around the world (Nigeria, Trinidad, Jamaica) illustrating the diversity of English just among women of African descent, who often go unheard in these discussions. I’ll talk more about accent tag videos in future posts.

You can also find evidence of regional variation in African American English on Twitter. Taylor Jones has a great post about it that also goes into the history of African American varieties of English.

Is your face red?

In 1936, Literary Digest magazine made completely wrong predictions about the Presidential election. They did this because they polled based on a bad sample: driver’s licenses and subscriptions to their own magazine. Enough people who didn’t drive or subscribe to Literary Digest voted, and they voted for Roosevelt. The magazine’s editors’ faces were red, and they had the humility to put that on the cover.

This year, the 538 website made completely wrong predictions about the Presidential election, and its editor, Nate Silver, sorta kinda took responsibility. He had put too much trust in polls conducted at the state level. They were not representative of the full spectrum of voter opinion in those states, and this had skewed his predictions.

Silver’s face should be redder than that, because he said that his conclusions were tentative, but he did not act like it. When your results are so unreliable and your data is so problematic, you have no business being on television and in front-page news articles as much as Silver has.

In part this attitude of Silver’s comes from the worldview of sports betting, where the gamblers know they want to bet and the only question is which team they should put their money on. There is some hedging, but not much. Democracy is not a gamble, and people need to be prepared for all outcomes.

But the practice of blithely making grandiose claims based on unrepresentative data, while mouthing insincere disclaimers, goes far beyond election polling. It is widespread in the social sciences, and I see it all the time in linguistics and transgender studies. It is pervasive in the relatively new field of Data Science, and Big Data is frequently Non-representative Data.

At the 2005 meeting of the American Association for Corpus Linguistics there were two sets of interactions that stuck with me and have informed my thinking over the years. The first was a plenary talk by the computer scientist Ken Church. He described in vivid terms the coming era of cheap storage and bandwidth, and the resulting big data boom.

But Church went awry when he claimed that the size of the datasets available, and the computational power to analyze them, would obviate the need for representative samples. It is true that if you can analyze everything you do not need a sample. But that’s not the whole story.

A day before Church’s talk I had had a conversation over lunch with David Lee, who had just written his dissertation on the sampling problems in the British National Corpus. Lee had reiterated what I had learned in statistics class: if you simply have most of the data but your data is incomplete in non-random ways, you have a biased sample and you can’t make generalizations about the whole.
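Lee's point can be illustrated with a toy simulation. This is a minimal sketch with invented numbers: the population size, the split into "easy" and "hard" texts, and the feature rates are all assumptions made for the example, not figures from the British National Corpus or any real corpus.

```python
import random

# Hypothetical population: 10,000 texts, each marked True if it shows some
# linguistic feature. "Easy" texts are the ones that happened to be archived;
# "hard" texts are the ones that did not. All rates here are invented.
easy = [True] * 3200 + [False] * 4800   # 8,000 texts, 40% with the feature
hard = [True] * 1600 + [False] * 400    # 2,000 texts, 80% with the feature
population = easy + hard


def rate(texts):
    """Share of texts that show the feature."""
    return sum(texts) / len(texts)


true_rate = rate(population)

# "Most of the data": every easy text and none of the hard ones (80% coverage).
biased_rate = rate(easy)

# A much smaller simple random sample drawn from the whole population.
rng = random.Random(2016)  # fixed seed so the run is repeatable
random_rate = rate(rng.sample(population, 1000))

print(f"true rate:           {true_rate:.3f}")
print(f"80% of data, biased: {biased_rate:.3f}")
print(f"10% of data, random: {random_rate:.3f}")
```

In this setup the collection that covers 80 percent of the texts misses the true rate by eight percentage points no matter how it is analyzed, while the random sample of a tenth of the texts would be expected to land much closer.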

I’ve seen this a lot in the burgeoning field of Data Science. There are too many people performing analyses they don’t understand on data that’s not representative, making unfounded generalizations. As long as these generalizations fit within the accepted narratives, nobody looks twice.

We need to stop making it easier to run through the steps of data analysis, and instead make it easier to get those steps right. Especially sampling. Or our faces are going to be red all the time.

The Digital Parisian Stage is now on GitHub

For the past five years I’ve been working on a project, the Digital Parisian Stage, that aims to create a representative sample of Nineteenth-century Parisian theater. I’ve made really satisfying progress on the first stage, 1800 through 1815, which corresponds to the first volume of Charles Beaumont Wicks’s catalog, the Parisian Stage (1950). Of the initial one-percent sample (31 plays), I have obtained 24, annotated 15 and discarded three for length, for a current total of twelve plays.

At conferences like the Keystone Digital Humanities Conference and the American Association for Corpus Linguistics, I’ve presented results showing that these twelve plays cover a much wider and more innovative range of language than the four theatrical plays from this period in the FRANTEXT corpus, a sample drawn fifty years ago based on a “principle of authority.”

Just looking at declarative sentence negation, I found that in the FRANTEXT corpus the playwrights negate declarative sentences with the ne … pas construction 49 percent of the time. In the twelve randomly sampled plays, the playwrights used ne … pas 75 percent of the time to negate declarative sentences. Because this was a representative sample, I even have a p value below 0.01, based on a chi-square goodness of fit test!
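For readers who have not run this kind of test, here is a rough sketch of a chi-square goodness of fit calculation. The counts are hypothetical (200 negated sentences is an invented total chosen to match the 75 percent figure), and treating the FRANTEXT rate of 49 percent as the expected distribution is an assumption about how such a comparison could be set up, not a description of the actual analysis.

```python
import math


def chi_square_gof(observed, expected_props):
    """Pearson chi-square goodness-of-fit statistic for observed counts."""
    total = sum(observed)
    expected = [p * total for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 200 negated declarative sentences, 150 with ne ... pas.
observed = [150, 50]
# Assumed null hypothesis: the same split as reported for FRANTEXT (49%).
expected_props = [0.49, 0.51]

chi2 = chi_square_gof(observed, expected_props)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```

With two categories there is one degree of freedom, which is why the p-value can be computed from the complementary error function in the standard library rather than a statistics package.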

This seems like a good point to release the twelve texts that I have OCRed and cleaned to the public. I have uploaded them to GitHub as HTML files. In this I have been partly inspired by the work of Alex Gil, now my colleague at Columbia University.

You can read them for your own entertainment (Jocrisse-maître et Jocrisse-valet is my favorite), stage your own production of them (I’ll buy tickets!) or use them as data for your scientific investigations. I hope that you will also consider contributing to the repository, by checking for errors in the existing texts, adding new texts from the catalog, or converting them to a different format like TEI or Markdown.

If you do use them in your own studies, please don’t forget to cite me along the lines given below, or even to contact me to discuss co-authorship!

Grieve-Smith, Angus B. (2016). The Digital Parisian Stage Corpus. GitHub.

Nobody’s Boy

I got a paper rejected from a generativist conference a few years ago. A generativist friend of mine said, “Why did you bother submitting your paper to that conference? You knew they were going to reject it.” I said, “Well, the conference was in town, so I figured I’d send something in anyway.”

My friend proceeded to tell me a story from her early grad school days about reviewing papers for her school’s signature conference. She sat down one evening with Professor Big Deal, who glanced through the stack of anonymous submissions and sorted them one by one into piles. “This is from one of Professor X’s students, and this is from one of Professor Y’s students. Here’s another from Professor X’s group. This must be Professor Z.” She continued like this until all the papers were sorted, and then as I recall she had some formula for allocating time to each professor and their students.

I think about this a lot, because I’m not a Student Of anyone in particular. On paper I may look like a student of Professor Bigshot, and that’s probably how my paper got accepted to a conference where Professor Bigshot was a keynote speaker. But I’m not really a Student Of Professor Bigshot. I didn’t ask her to be on my committee. And I know she doesn’t think of me as a Student Of hers, because she was sitting in front of me later in that conference, and walked out of the room right before it was my turn to present my paper.

My relationship with my actual advisor is Complicated, but suffice it to say that we don’t work in the same subfield of linguistics, and I’m tied to the New York area, where she doesn’t have the pull to get me a job anyway. My relationships with my other committee members are problematic in various ways. I’m on good terms with plenty of other linguists, but since I’m not their Student their loyalty to me is always secondary.

Even if my friend’s story about Professor Big Deal is an egregious outlier, it is still a regular occurrence to see professors co-authoring and co-presenting papers with their students, making introductions and writing letters. If you know me professionally, I can pretty much guarantee that we were not introduced by Professor Bigshot, or by any member of my committee. If you’ve seen me present my research, or read it anywhere, or hired me, it’s entirely through my own hard work. I have not had any of the advantages that come with being a Student Of anyone.

You could say that it’s my fault for not choosing the right advisors, or for the problems in my relationships with my advisors. In my defense I would argue that most of the problems in these relationships had to do with my supporting my wife’s progress on the tenure track and my kid’s not being in daycare ten hours a day over my own progress on the PhD. But even if you disagree, does that mean that I deserve to be a second-class citizen in the field?

I know I’m not the only academic orphan out there. Maybe we should get together and found a Home for Orphaned Linguists, where we can hope to someday be adopted by professors with generous allocations of reassigned time, who will co-author with us and introduce us and attend our talks. Some day…

Sampling is a labor-saving device

Last month I wrote those words on a slide I was preparing to show to the American Association for Corpus Linguistics, as a part of a presentation of my Digital Parisian Stage Corpus. I was proud of having a truly representative sample of theatrical texts performed in Paris between 1800 and 1815, and thus finding a difference in the use of negation constructions that was not just large but statistically significant. I wanted to convey the importance of this.

I was thinking about Laplace finding the populations of districts “distributed evenly throughout the Empire,” and Student inventing his t-test to help workers at the Guinness plants determine the statistical significance of their results. Laplace was not after accuracy, he was going for speed. Student was similarly looking for the minimum amount of effort required to produce an acceptable level of accuracy. The whole point was to free resources up for the next task.

I attended one paper at the conference that gave p-values for all its variables, and they were all 0.000. After that talk, I told the student who presented that those values indicated he had oversampled, and he should have stopped collecting data much sooner. “That’s what my advisor said too,” he said, “but this way we’re likely to get statistical significance for other variables we might want to study.”
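A small calculation shows why a column of 0.000 values suggests more data than the question required. This is only a sketch under invented assumptions: a two-category variable, an observed share of 55 percent against a null of 50 percent, and a chi-square goodness of fit test with one degree of freedom.

```python
import math


def p_value_for_sample(n, observed_share, null_share):
    """Two-category chi-square goodness-of-fit p-value (1 degree of freedom)."""
    observed = [n * observed_share, n * (1 - observed_share)]
    expected = [n * null_share, n * (1 - null_share)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical effect: 55% observed against a null of 50%, at growing sample sizes.
for n in (100, 400, 1600, 6400):
    print(n, f"{p_value_for_sample(n, 0.55, 0.50):.3g}")
```

For a fixed observed difference the test statistic grows in proportion to the sample size, so in this example the result crosses the 0.05 threshold at 400 observations and everything collected beyond that only pushes the p-value closer to zero.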

The student had a point, but it doesn’t seem very – well, “agile” is a word I’ve been hearing a lot lately. In any case, as the conference was wrapping up, it occurred to me that I might have several hours free – on my flight home and before – to work on my research.

My initial impulse was to keep doing what I’ve been doing for the past couple of years: clean up OCRed text and tag it for negation. Then it occurred to me that I really ought to take my own advice. I had achieved statistical significance. That meant it was time to move on!

I have started working on the next chunk of the nineteenth century, from 1816 through 1830. I have also been looking into other variables to examine. I’ve got some ideas, but I’m open to suggestions. Send them if you have them!

Shelter from the tweetstorm

It’s happened to me too: I’m angry, or upset, or excited about something. I go on Twitter. I’ve got stuff to say. It’s more than will fit in the 140-character limit, but I don’t have the time or energy to write a blog post. So I just write a tweet. And then another, and another.

I’ve seen other people doing this, and I’m fine with it. But for a while now I’ve seen people doing something more planned, numbering their tweets. Many people try to predict how many tweets are going to be in a particular rant, and often fail spectacularly along the lines of Monty Python’s Spanish Inquisition sketch. Some people are clearly composing the whole thing ahead of time, as a unit. Sometimes they’re not even excited, just telling a story. It’s developing into a genre: the tweetstorm.

I get why people are reluctant to blog in these cases. If you’re already in Twitter and you want to write something longer, you have to switch to a different window, maybe log in, come up with a picture to grab people’s attention. Assuming you already have an account on a blogging platform. It doesn’t help that Twitter sees some of these as competitors and drags its feet on integrating them. And yes, mobile blogging apps still leave a lot to be desired, especially if you’ve got an intermittent connection like on the train.

People also tend to be drawn in more easily one tweet at a time, like Beorn meeting the dwarves in The Hobbit. Maybe they don’t feel in the mood for reading something longer, or opening a web browser.

There may also be an aspect of live performance for the tweetstormer and the people who happen to be on Twitter while the storm is passing over, and the thread functions as an inferior archive of the performance, like concert videos. I can understand that too, but it’s a pain for the rest of us.

The problem is that Twitter sucks as a platform for reading longform pieces, or even medium-form ones. Yes, I know they’ve introduced “threading” features to make it easier to follow conversations. That doesn’t mean it’s easy to follow a single person’s multi-tweet rant. Combine that with other people replying in the middle of the “storm” and the original tweeter taking time in the middle to respond to them, and people using the quote feature and replying to quotes and quoting replies, and it gets really chaotic. If I bother to take the time, usually at the end it turns out it’s not worth it.

In terms of Bad Things on Twitter this is nowhere near the level of harassment and death threats, or even people livetweeting Netflix videos. But please, just go write a blog post and post a link. I promise I’ll read it.

What’s worse is that people are encouraging each other to do it. It’s one thing to get outraged on Twitter, or even to see someone else get outraged on Twitter and tell your followers to go check it out. It’s another when you know the whole thing is planned and you tell everyone to Read This. Now.

I get that you think it’s interesting, but that’s not enough for me. Tell me why, and let me decide if it’s worth my time to go reading through all those tweets in reverse chronological order. Better yet, storify that shit and tweet me the URL.

You know what would be even better? Tell that other tweeter, “What an awesome thread! It would make an even better blog post. Do you have a blog?”

“Said” for 2016 Word of the Year

I just got back from the American Association for Corpus Linguistics conference in Ames, Iowa, and I’m calling the Word of the Year: for 2016 it will be said.

You may think you know said. It’s the past participle of say. You’ve said it yourself many times. What’s so special about it?

What’s special was revealed by Jordan Smith, a graduate student at Iowa State, in his presentation on Saturday afternoon. said is becoming a determiner. It is grammaticizing.

In addition to its participial use (“once the words were said”) you’ve probably seen said used as an attributive adjective (“the said property”). It indicates that the noun it modifies refers to a person, place or thing that has been mentioned recently, with the same noun, and that the speaker/writer expects it to be active in the hearer/reader’s memory.

Attributive said is strongly associated with legal documents, as in its first recorded use in the English Parliament in 1327. The Oxford English Dictionary reports that said was used outside of legal contexts as early as 1973, in the English sitcom Steptoe and Son. In this context it was clearly a joke: a word that evoked law courts used in a lower-class colloquial context.

Jordan Smith examined uses of said in the Corpus of Contemporary American English (COCA) and found that attributive said has increasingly been used without the for several years now, and outside the legal domain. He observes that syntactic changes and increased frequency have been named by linguists like Joan Bybee as harbingers of grammaticization.

Grammaticization (also known as grammaticalization; search for both) is when an ordinary lexical item (like a noun, verb or adjective, or even a phrase) becomes a grammatical item (like a pronoun, preposition or auxiliary verb). For example, while is a noun meaning a period of time, but it was grammaticized to a conjunction indicating simultaneity. Used is an adjective meaning accustomed, as in “I was used to being lonely,” but has also become part of an auxiliary indicating habitual aspect as in “I used to be lonely.”

Jordan is suggesting that said is no longer just a verb or even an adjective, it’s our newest determiner in English. Determiners are an exclusive club of short words that modify nouns. They include articles like an and the, but also demonstratives like these and quantifiers like several.

Noun phrases without a determiner tend to refer to generic categories, as I have been doing with phrases like legal documents and grammaticization. That is clearly not what is going on with said girlfriend. Noun phrases with said refer to a specific item or group of items, in some sense even more so than noun phrases with the.

Thanks to the wireless Internet at the AACL, I began searching for of said on Twitter, and found a ton of examples. There are plenty of examples for in said as well.

It’s not just happening in English. The analogous French ledit is also used outside the legal domain. Its reanalysis is a bit different, since it incorporates the article rather than replacing it. Like most noun modifiers in French it is inflected for gender and number. I haven’t found anything similar for Spanish.

In 2013 the American Dialect Society chose because as its Word of the Year. Because is already a conjunction, having grammaticized from the noun cause, but it has been reanalyzed again into a preposition, as in because science. Some theorists consider this to be a further step in grammaticization. And here is a twenty-first century prepositional phrase for you, folks: because (P) said (Det) relationship (N).

After Jordan’s presentation it struck me that said is an excellent candidate for the 2016 Word of the Year. And if the ADS isn’t interested, maybe another organization, like the International Cognitive Linguistics Association, can sponsor a Grammaticization of the Year.

On being a public linguist

People say you should stand up for what you believe in. They say you should look out for those less fortunate, and speak up for those who don’t get heard. They say that those of us who come from marginalized backgrounds, like TBLG backgrounds for example, but have enough privilege to be out in relative safety should speak up for those who don’t have that privilege. They say that those of us who have undertaken in-depth study in the interest of society have a particular responsibility to share what we know with the world as “public intellectuals.” They say that we linguists need to do a better job of applying our knowledge to real-world problems and communicating solutions to the public at large.

They’re right of course, but there’s a reason more people don’t do these things. They’re hard to do, and even harder to do right. Lots of people are strongly invested in the status quo and in thinking of themselves as good people, and they don’t like to be told that what they’re doing is at best ineffective and at worst harmful. Lots of people think that because they’re trans they know everything there is to know about trans issues, or that because they use language they know everything there is to know about language.

Case in point: after watching with increasing frustration for years as the word “cisgender” was invented and abused, back in December I wrote a series of blog posts about it. I know this is a controversial topic, and I was a bit apprehensive since I was on the job market, but my posts were not idle rants: as a linguist, a trans person, and someone who has observed trans politics for years, I had been trained to do this kind of analysis, and pursued these topics beyond my training.

I anticipated a number of potential objections to my argument and addressed them in the first three posts. As I published each one I was worried it would get a huge backlash, but there was barely a peep (more on that in another post). So for the title of the last one I went big: “The word “cisgender” is anti-trans.” Not much reaction.

A few weeks ago I came across a Facebook post by a gender therapist asking for opinions about “cisgender,” so I left a link to my blog post, identifying it as “my professional opinion as a linguist.” The therapist then shared my post without identifying me as either trans or a linguist.

Then there was a backlash. Several people immediately called my post “garbage” and “horse shit.” There were a handful of substantive disagreements, all of which I had anticipated in my post and previous ones that I had linked to. There was some support, but the vast majority of comments were negative. There were several similar comments made on my blog post itself, most of which I left unpublished since they were repetitive and unhelpful.

I know that plenty of people face far worse reactions to things they post. I didn’t receive any comments on my looks, rape threats or death threats. But it was still very upsetting, particularly as it was posted the same day I began my first full-time job since receiving my Ph.D. – an event that was positive on a number of levels, but upsetting on other levels.

The gender therapist, who presumably helps people with their mental states, showed no interest whatsoever in mine. They made no effort to moderate, did not intervene in the comments, and sent me no personal messages. The idea that a trans person might be losing sleep over these attacks on their page may not have even occurred to them.

The response my post has gotten from other public linguists has been minimal. A columnist who’s written about the issue and encouraged me to write gave my post a few tweets. A radical feminist whose writings about language and politics inspired me for years completely ignored it. It has not been picked up by any of the popular linguistics blogs, or by anyone talking about language, gender and sexuality.

It’s quite possible that these linguists disagree with me. There are some very specific linguistic questions at stake. But linguists love to argue, and I would welcome respectful, constructive engagement with these questions. So far there has been none.

I have also gotten very little support from other linguists. When I was first formulating these arguments a few years ago on Twitter, there were at least two linguists who explicitly denied that I had any standing to contest the arguments for “cis” that they were retweeting. They were satisfied with the flimsiest of pseudolinguistic rationales in pursuit of their political and social goals, and for whatever reasons I did not qualify as an authentic voice of the trans community in their eyes. I stopped following them on Twitter, and as far as I could tell they had no reaction whatsoever to my posts.

I know that a lot of people don’t want to get involved in flamewars on Twitter or Facebook. It’s really hard to know who’s right and who’s wrong. At first glance I look like just another white guy, and I project an image of success and confidence on social media because that’s what everyone tells me I need to do. Some people may disagree with my stance on a political basis.

I mostly came out of the Facebook flareup okay, although it’s hard to tell how much of my insomnia and touchiness relates to that as opposed to other stresses. Re-reading some of those comments just now was pretty upsetting. I made a decision to focus on the new job, and avoided reading comments, posts or links for a week or two. Now it’s blown over – but there’s no telling when it’ll get shared by someone else.

My main point is that being a public linguist isn’t easy. Speaking out isn’t easy. Fighting on your own behalf instead of some Little People somewhere isn’t easy – even if you’ve got a certain amount of privilege. If you’re wondering why people don’t fight for themselves more often, why they don’t speak up, why linguists don’t write more public posts about issues that matter – there’s your answer. It’s much easier to bury your nose in a book and write about grammaticization vs. reanalysis in Old Church Slavonic.

If we really want people to take a stand on these things, we need to support them. We need to stick up for linguists who speak out in public. We need principles that go beyond identity and political and social affiliation. And we need people who are willing to support linguists who speak out based on those principles. We need people who will make themselves available to back up other linguists on the Internet. Without real support, it’s all empty rhetoric.


At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc Twitter feed.

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
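The logic of such a bot can be sketched in a few lines. This is not the bot's actual code: the `Lot` record, its field names, the sample addresses and the helper functions are all hypothetical, and the sketch leaves out the Street View lookup and the posting step entirely.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    tax_id: int   # hypothetical numeric tax ID
    address: str
    borough: str


def compose_tweet(lot):
    """Text for one lot; a real bot would also attach a Street View photo."""
    return f"{lot.address}, {lot.borough}"


def tweets_in_order(lots):
    """Yield tweet texts in ascending tax ID order."""
    for lot in sorted(lots, key=lambda item: item.tax_id):
        yield compose_tweet(lot)


def years_for_full_pass(n_lots, minutes_between_tweets=30):
    """Years needed to tweet every lot once at a fixed interval."""
    tweets_per_year = (60 / minutes_between_tweets) * 24 * 365.25
    return n_lots / tweets_per_year


# Invented sample records, not real property data.
lots = [
    Lot(1000020002, "2 Example St", "Manhattan"),
    Lot(1000010001, "1 Example St", "Manhattan"),
]
for text in tweets_in_order(lots):
    print(text)
```

At one tweet every half hour a bot like this gets through 17,532 lots a year, so `years_for_full_pass` gives a quick sense of how long a complete pass through any given list would take.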

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.

Social scientists, and even humanists recently, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced at least as far as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places — these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a tinyletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase, “the map is not the territory”:

Streetview and the property database are both widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did. I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. These are properties we know, and yet we realize how much we don’t know them.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on GitHub, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.
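To make the point about representativeness concrete, here is a minimal Python sketch. It is not the code behind @everytreenyc: the tree records, the field names (`borough`, `counted`) and all of the counts are invented for illustration. It only shows that a uniform random draw from an incomplete census is representative of the census, not of the street forest itself.

```python
import random
from collections import Counter


def borough_shares(trees):
    """Return each borough's share of the given list of tree records."""
    counts = Counter(tree["borough"] for tree in trees)
    total = len(trees)
    return {borough: count / total for borough, count in counts.items()}


def sample_trees(census, k, seed=None):
    """Draw k trees uniformly at random, without replacement, from the census."""
    rng = random.Random(seed)
    return rng.sample(census, k)


# Hypothetical toy data: every number here is made up.
# Only the first 100 of 400 Queens trees have been counted so far.
street_trees = (
    [{"id": i, "borough": "Bronx", "counted": True} for i in range(200)]
    + [{"id": 200 + i, "borough": "Staten Island", "counted": True} for i in range(200)]
    + [{"id": 400 + i, "borough": "Queens", "counted": i < 100} for i in range(400)]
)

# The census contains only the trees that have actually been counted.
census = [tree for tree in street_trees if tree["counted"]]

true_shares = borough_shares(street_trees)  # Queens holds half of all trees
census_shares = borough_shares(census)      # Queens is a fifth of the census
sample = sample_trees(census, 100, seed=2015)
sample_shares = borough_shares(sample)      # tracks the census, not the streets

print("street forest:", true_shares)
print("census so far:", census_shares)
print("random sample:", sample_shares)
```

In this toy setup, a follower of the bot would see Queens trees about a fifth of the time even though they make up half of the invented street forest. The sampling is fair; the gap comes entirely from what the census has not yet reached.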

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.

On pet parents

I’m a parent. It doesn’t make me better or worse than anyone else, it’s just a category that reflects some facts about me: I conceived a new human with my wife, we are raising and caring for that human, and we expect to have a relationship with him for the rest of our lives. Some people don’t take parenthood seriously, so it doesn’t impact their lives very much, but their kids suffer. We take it very seriously, and it’s a lot of work for us.

I also take care of pets. We own three cats, and sometimes I walk my mom’s dog or take him to be groomed. It can be a lot of work, and the relationships can be very intimate at times. “Ownership” is kind of a funny word for it. In some ways it can be like certain stages of parenting: we buy all the food and make sure the animals don’t get into danger. It makes sense when I hear people refer to their pets as their “baby” or put words in their pets’ mouths calling them “daddy.” I even understand when I hear them refer to themselves as “pet moms.”

I understand this usage, but I do not agree with it. I have a kid, and I have pets. The relationships are similar, but different. When someone calls themself a “pet dad,” it trivializes my relationship with my kid and infantilizes my pets. It erases the work of the actual parents, and trivializes the hard work of humans who act as surrogate parents to infant pets. I am a dad: I am not a pet dad, and I am not my pets’ dad. Or their mom.

My kid will one day be an adult, and while I may always think of him as The Kid, he will be able to function as an autonomous member of society. (Note that the term “kid” itself is an animal metaphor – referring to a juvenile goat.) Only one of my cats can still be considered a juvenile by any standard; the others are five years old and twenty years old, respectively. They are adult males, and until the last century they would have been free to come and go as they wished.

If my cats are incapable of leaving our house unaccompanied it is more likely due to the fact that we have cars everywhere than to anything else. When I was a kid we lost three dogs to car culture. When I was eleven I saw a neighbor’s cat crushed beneath the wheels of a car, and arrived just in time to see him take his last breath. We have indoor cats and dog leashes in part because we have made the outdoors inhospitable.

I suspect one reason we hear more about “pet parents” is that so few of our pets are parents themselves. I support universal neutering, and have only adopted neutered cats from shelters or feral rescuers. It’s the best response to the overpopulation of feral animals, but it does make the pets neuter – and childless.

When I was a kid we had a cat who had a litter of kittens. I watched one of our dogs give birth to eleven puppies, and then found homes for the ten that lived. Our male cats were aggressive, sexual toms. Again, not wise in retrospect, but it was hard to think of any of the humans in the house as “moms” or “dads” of our pets while they were themselves moms and dads.

There is one human I know who would qualify as a “cat mom” in my mind. She is the woman who leads the feral cat helpers in our neighborhood. Six years ago someone found a baby kitten near some railroad tracks in Manhattan. My neighbor fostered this kitten in her apartment for five months, feeding him with an eyedropper until he was old enough to eat. She posted his picture on her website and we adopted him. If he has a “pet mom” it’s her.