Theories are tools for communication

I’ve written in the past about instrumentalism, the scientific practice of treating theories as tools that can be evaluated by their usefulness, rather than as claims that can be evaluated as true or false. If you haven’t tried this way of looking at science, I highly recommend it! But if theories are tools, what are they used for? What makes a theory more or less useful?

The process of science starts when someone makes an observation about the world. If we don’t understand the observation, we need to explore more, make more observations. We make hypotheses and test them, trying to get to a general principle that we can apply to a whole range of situations. We may then look for ways to apply this principle to our interactions with the world.

At every step of this process there is communication. The person who makes the initial observation, the people who make the further observations, who make the hypotheses, who test them, who generalize the findings, who apply them: these are usually multiple people. They need to communicate all these things (observations, hypotheses, applications) to each other. Even if it’s one single person who does it all end to end, that person needs to communicate with their past and future selves, in the form of notes or even just thinking aloud.

These observations, hypotheses and applications are always new, because that’s what science is for: processing new information. It’s hard to deal with new information, to integrate it with the systems we already have for dealing with the world. What helps us in this regard is finding similarities between the new information and things we already know about the world. Once we find those similarities, we need to record them for our own reference and to signal them to others: other researchers, technologists and the rest of the population.

In informal settings, we already have ways of finding and communicating similarities between different observations. We use similes and metaphors: a person’s eyes may be blue like the sky, not blue like police lights. These are not just idle observations, though: the similarities often have implications for how we respond to things. If someone is leaving a job and they say that they’re passing the baton to a new person, they are signaling a similarity between their job and a relay race, and the suggestion is that the new person will be expected to continue towards the same goal the way a relay runner continues along the racecourse.

Theories and models are just formalized versions of metaphors: saying that light is a wave is a way of noting that it can move through the air like a wave moves through water. That theory allowed scientists to predict that light would diffract around objects the way water waves bend around obstacles they encounter, a testable hypothesis that has been confirmed. This in turn allowed technologists to design lasers and other devices that took advantage of those wavelike properties, applications that have proven useful.

Here’s a metaphor that will hopefully help you understand how theories are communication tools: another communication tool is a photograph. Sometimes I see a photograph of myself and I notice that I’ve recently lost weight. Let’s say that I have been cutting back on snacks and I see a photo like that. I have other tools for discovering that I’ve lost weight, like scales and measuring tape and what I can observe of my body with my own eyes, but seeing a photo can communicate it to me in a different way and suggest that if I continue cutting back on snacks I will continue to lose weight. Similarly, if I post that photo on Facebook my friends can see that I’ve lost weight and understand that I’m going to continue to cut back on snacks.

A theory is like a photograph in that there is no single best photograph. To communicate my weight loss I would want a photo that shows my full body, but to communicate my feelings about it, a close-up on my face might be more appropriate. Friends of mine who get new tattoos on their legs will take close-ups of the tattoos. We may have six different photos of the exact same person (full body, face or leg, for example) and be satisfied with them all. Theories are similar: which one works best depends entirely on the purpose of the communication.

A theory is like a photograph in that the best level of detail depends on what is being communicated and who the target is. If a friend takes a close-up of four square inches of their calf, that may be enough to show off their new tattoo, but a close-up of four square inches of my calf will probably not tell me or anyone else how much weight I’ve lost. Similarly, if I get someone to take an aerial photograph of me, that may indicate where I am at the time, but it will not communicate much about my weight. This applies to theories: a model with too much detail will simply swamp the researchers, and one with too little will not convey anything coherent about the topic.

A theory is like a photograph in that its effectiveness depends on who is on the other end of the communication. If someone who doesn’t know me sees that picture, they will have no idea how much I weighed before, or that my weight has been affecting my health. They will just see a person, and interpret it in whatever way they can.

A photograph may not be the best way to communicate my weight loss to my doctor. Their methods depend on measurable benchmarks, and they would prefer to see actual measurements made with scales or tape. On the other hand, a photo is a better way to communicate my weight loss to my Facebook friends than posting scale and tape measurements on Facebook, because they (or some of them at least) are more concerned with the overall way I look.

A theory’s effectiveness similarly depends on its audience. Population researchers may be familiar with the theories of Alfred Lotka and Vito Volterra, so if I tell them that ne…pas in French follows a Lotka-Volterra model, they are likely to understand. Chemists have probably never heard of Lotka or Volterra, so if I tell them the same thing I’m likely to get a blank stare.
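For readers who, like those chemists, have never run into it: the classic Lotka-Volterra model is just a pair of coupled differential equations describing two interacting populations. Mapping those populations onto competing negation constructions like ne…pas is my analogy, not part of the standard presentation.

\[
\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y
\]

Here x and y are the two population sizes and the Greek letters are growth and interaction rates; population researchers recognize the shape of the resulting curves immediately, which is exactly why the reference works for them and not for the chemists.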

This means that there is no absolute standard for comparing theories. We are never going to find the best theory. We may be able to compare theories for a particular purpose, with a particular level of detail, aimed at a particular audience, but even then there may be several theories that work about as well.

When I tell people about this instrumental approach to scientific theories and models, some of them get anxious. If there’s no way for theories to be true or false, how can we ever have a complete picture of the universe? The answer is that we can’t. Kurt Gödel showed decades ago, with his incompleteness theorems, that even a formal system powerful enough to express arithmetic contains true statements it cannot prove; no theory or model, not even a mathematical or computer model, can ever completely capture reality. Jorge Luis Borges illustrated the same point with his story of the map that is the same size as the territory.

Science is not about finding out everything. It’s not about getting a complete picture. That’s because reality is too big and complex for our understanding, or for the formal systems that our computers are based on. It’s just about figuring out more than we knew before. It will never be finished. And that’s okay.

Indistinguishable from magic

You might be familiar with Arthur C. Clarke’s Third Law, “Any sufficiently advanced technology is indistinguishable from magic.” Clarke tucked this away in a footnote without explanation, but it fits in with the discussion of magic in Chapter III of James Frazer’s magnum opus The Golden Bough. These two works have shaped a lot of my thoughts about science, technology and the way we interact with our world.

Frazer lays out two broad categories of magic, homeopathic magic and contagious magic. Homeopathic magic follows the Law of Similarity, and involves things like creating effigies of people in order to hurt them, and keeping red birds to cure fever. Contagious magic follows the Law of Contact, and involves things like throwing a child’s umbilical cord into water to improve the child’s swimming abilities later in life, or a young woman planting a marigold into dirt taken from a man’s footprint to help his love for her grow.

Frazer is careful to observe that the Laws of Similarity and Contact are widespread cognitive patterns that people use to understand their environments. In semantics we know them as the foundation of the processes of metaphor and metonymy, respectively. He notes that sympathetic magic’s “fundamental conception is identical with that of modern science: underlying the whole system is a faith, implicit but real and firm, in the order and uniformity of nature.”

In this both science and magic stand in contrast to religion: “if religion involves, first, a belief in superhuman beings who rule the world, and second, an attempt to win their favour, it clearly assumes that the course of nature is to some extent elastic or variable, and that we can persuade or induce the mighty beings who control it to deflect, for our benefit, the current of events from the channel in which they would otherwise flow.” After this Frazer engages in some sloppy thinking, concluding that because religion seems to have arisen after magic it must be an improvement over what the “savages” do. He also fails to complete the fourth quadrant of his taxonomy: that as science is to magic, social sciences are to religion.

The key difference between magic and science (and between religion and social science) is the element of faith. The potion brewer doesn’t check to see that there is a logical explanation for the inclusion of certain ingredients. If the potion fails, she must have gotten impure ingredients, or misread the incantation. A scientist looks for explanations as to why a medicine works when it works, and why it fails when it fails.

Some of you may be thinking that Clarke’s quote was about technology, not science. I first learned of technology as “applied science,” which should mean that it’s no more faith-based than science itself. In practice, it is not possible to understand every tool we use. In fact, it’s not even possible for a human to completely understand a single tool, in all its complexity.

My stepfather was a carpenter. When I was first taught to hammer a nail, I started out by picking the hammer up and putting it down on the nail, vertically. I had to be shown how to swing the hammer to take advantage of the angular momentum of the hammer head. It took another layer of learning to know that I could swing from my wrist, elbow or shoulder to customize the force of the hammer blow to the task at hand, and then another to get a sense of the various types of hammers available, not to mention the various types of nails. In a home improvement project several years ago I discovered that, as electric screwdrivers have gotten smaller and lighter, practices have changed and people use screws in situations where nails used to be more common.

My stepfather might at some point have explained to me why his hammer heads were steel and not iron, and the handles were hardwood and not softwood, metal or fiberglass, but his explanations did not go to the molecular level, much less the atomic or quantum levels. To be honest, all I needed to know was “steel is solid, heavy and doesn’t rust” and “hardwood is solid but absorbs some of the impact.” The chance that the molecular or subatomic structure of the hammers would affect our work beyond that was so small that it wasn’t worth spending time on.

At the beginning I didn’t even need to know that much. All I needed to know was that my stepfather had handed me this hammer and these nails, and told me to nail those two boards together at that angle. I had to trust his expertise. As I began to get comfortable, I started asking him questions and trying things slightly different ways. Eventually people get to the point of saying, “Why not a fiberglass handle?” and even “Why not an electric screwdriver?” But at first it’s 99 percent faith-based.

That’s how the technology of hammers and nails and houses works, but the same principles apply to technologies that many people take for granted, like pencils (we know how to sharpen them, but how many of us know how to mine graphite?) and clothing (some of us can darn a sock, and some of us can knit a scarf, but how many of us have even seen any of the machines that produce shoelaces, or Spanx?). We take it on faith that the pencils will write like they’re supposed to, and that socks will keep our feet warm.

This, then, is what Clarke meant when he talked about technology being indistinguishable from magic. Yes, Sprague de Camp portrayed ancient Romans mistaking explosives for magic in his 1939 novel Lest Darkness Fall (which explicitly invokes the sympathetic and contagious forms of magic described by Frazer). And the magically moving photographs described by J.K. Rowling in Harry Potter and the Philosopher’s Stone have become real technology just twenty years later, omnipresent in the UK and the United States.

But beyond the simple resemblance between technology and magic, if someone is not inclined to be critical or scientific, their relationship to technology is functionally the same as it would be to magic. If the technology is sufficiently advanced, people can do the same things they’ve always done. They don’t need to “get under the hood” (now there’s an example of non-magical technology!) because it seems to work most of the time.

On the other hand, our faith is not blind. I had faith in my stepfather to teach me carpentry because my mother and I had lived with him and trusted him, and seen his work. I also learned to have faith in cars to get me places safely, but as I learned more about kinematics and human attention, and as I was confronted with more evidence of the dangers of this technology, I realized that my faith was misplaced and revised my habits.

Our faith in these technologies is based on a web of trust: I trusted my stepfather when he told me that if I hit the nails this way they would securely fasten the pieces of this house together, and that if the house was properly maintained it wouldn’t fall down on us. He in turn trusted his training from other carpenters and recommendations from other professionals in related fields, which were then corroborated, revised and extended by his experiences.

I want to stress here that these methods were also supported by scientific studies of materials and manufacturing. Over the millennia, carpenters, architects and other craftspeople have tried using different materials, different structures, different techniques. Some worked better, some didn’t work so well. They’ve taken the materials apart to figure out what makes them strong in some ways and flexible in other ways. This is an ongoing process: vinyl siding may have seemed like a good idea at the time, but it can pollute if burned or discarded.

That is how you tell the difference between technology and magic: every aspect of the technology is open to question and revision. With magic, you may be able to try new things or test the existing methods, but beyond a certain point there is no more trying or testing, there is only faith.

Data science and data technology

The big buzz over the past few years has been Data Science. Corporations are opening Data Science departments and staffing them with PhDs, and universities have started Data Science programs to sell credentials for these jobs. As a linguist I’m particularly interested in this new field, because it includes research practices that I’ve been using for years, like corpus linguistics and natural language processing.

As a scientist I’m a bit skeptical of this field, because frankly I don’t see much science. Sure, the practitioners have labs and cool gadgets. But I rarely see anyone asking hard questions, doing careful observations, creating theories, formulating hypotheses, testing the hypotheses and examining the results.

The lack of careful observation and skeptical questioning is what really bothers me, because that’s what’s at the core of science. Don’t get me wrong: there are plenty of people in Data Science doing both. But these practices should permeate a field with this name, and they don’t.

If there’s so little science, why do we call it “science”? A glance through some of the uses of the term in the Google Books archive suggests that when it was first used, in the late twentieth century, it did include hypothesis testing. In the early 2000s people began to use it as a synonym for “big data,” and I can understand why: “big data” was a well-known buzzword associated with Silicon Valley tech hype.

I totally get why people replaced “big data” with “data science.” I’ve spent years doing science (with observations, theories, hypothesis testing, etc.). Occasionally I’ve been paid for doing science or teaching it, but only part time. Even after getting a PhD I had to conclude that science jobs that pay a living wage are scarce and in high demand, and I was probably not going to get one.

It was kind of exciting when I got a job with Scientist in the title. It helped to impress people at parties. At first it felt like a validation of all the time I spent learning how to do science. So I completely understand why people prefer to say they’re doing “data science” instead of “big data.”

The problem with being called a Scientist in that job was that I wasn’t working on experiments. I was just helping people optimize their tools. Those tools could possibly be used for science, but that was not why we were being paid to develop them. We have a word for a practice that involves labs and gadgets without requiring any observation or skepticism. That word is not science; it’s technology.

Technology is perfectly respectable; it’s what I do all day. For many years I’ve been well paid to maintain and expand the technology that sustains banks, lawyers, real estate agents, bakeries and universities. I’m currently building tools that help instructors at Columbia University with things like memorizing the names of their students and sending them emails. It’s okay to do technology. People love it.

If you really want to do science and you’re not one of the lucky ones, you can do what I do: I found a technology job that doesn’t demand all my time. Once in a while they need me to stay late or work on a weekend, but the vast majority of my time outside of 9-5 is mine. I spend a lot of that time taking care of my family and myself, and relaxing with friends. But I have time to do science.

I just have to outrun your theory

The Problem

You’ve probably heard the joke about the two people camping in the woods who encounter a hungry predator. One person stops to put on running shoes. The other says, “Why are you wasting time? Even with running shoes you’re not going to outrun that animal!” The first replies, “I don’t have to outrun the animal, I just have to outrun you.”

For me this joke highlights a problem with the way some people argue about climate change. First of all, spreading uncertainty and doubt against competitors is a common marketing tactic, and as Naomi Oreskes and Erik Conway documented in their book Merchants of Doubt, that same tactic has been used by marketers against concerns about smoking, DDT, acid rain and most recently climate change.

In the case of climate change, as with fundamentalist criticisms of evolution, there is a lot of stress on the idea that the climatic models are “only a theory,” and that they leave room for the possibility of error. The whole idea is to deter a certain number of influential people from taking action.

That Bret Stephens Column

The latest example is Bret Stephens, newly hired as an opinion columnist by New York Times editors who should really have known better. Stephens’s first column is actually fine on the surface, as far as it goes, aside from some factual errors. It boils down to two points: never trust anyone who claims to be 100% certain about anything, and, since most people know this, claiming to be 100% certain may wind up alienating some potential allies. He doesn’t go beyond that; I re-read the column several times in case I missed anything.

Since all Stephens did was to say those two things, neither of which amounts to an actual critique of climate change or an argument that we should not act, the intensely negative reactions the column generated may be a little surprising. But it helps if you look back at Stephens’s history and see that he’s written more or less the same thing over and over again, at the Wall Street Journal and other places.

Many of the responses to Stephens’s column have pointed out that if there’s any serious chance of climate change having the effects that have been predicted, we should do something about it. The logical next step is talking about possible actions. Stephens hasn’t talked about any possible actions in over fifteen years, which is pretty solid evidence of concern trolling: he pretends to be offering constructive criticism while having no interest in actually doing anything constructive. And if you go all the way back to a 2002 column in the Jerusalem Post, you can see that he was much more overtly critical in the past.

Stephens is very careful not to recommend any particular course of action, but sometimes he hints at the potential costs of following recommendations based on the most widely accepted climate change models. Underlying all his columns is the implication that the status quo is just fine: Stephens doesn’t want to do anything to reduce carbon emissions. He wants us to keep mining coal, pumping oil and driving everywhere in single-occupant vehicles.

People are correctly discerning Stephens’s intent: to spread confusion and doubt, disrupting the consensus on climate change and providing cover for greedy polluters and ideologues of happy motoring. But they play into his trap, responding in ways that look repressive, inflexible and intolerant. In other words, Bret Stephens is the Milo Yiannopoulos of climate change.

The weak point of mainstream science

Stephens’s trolling is particularly effective because he exploits a weakness in the way mainstream scientists handle theories. In science, hypotheses are predictions that can be tested and found to be true or false: the hypothesis that you can sail around the world was confirmed when Juan Sebastián Elcano completed Magellan’s expedition.

Many people view scientific theories as similarly either true or false. Those that are true – complete and consistent models of reality – are valid and useful, but those that are false are worthless. For them, Galileo’s measurements of the movements of the planets demonstrated that the heliocentric model of the solar system is true and the model with the earth at the center is false.

In this all-or-nothing view of science, uncertainty is death. If there is any doubt about a theory, it has not been completely proven, and is therefore worthless for predicting the future and guiding us as we decide what to do.

Trolls like Bret Stephens and the Marshall Institute exploit this intolerance of uncertainty by playing up any shred of doubt about climate change. And there are many such doubts, because this is actually the way science is supposed to work: highlighting uncertainty and being cautious about results. Many people respond to them in the most unscientific ways, by downplaying doubts and pointing to the widespread belief in climate change among scientists.

The all-or-nothing approach to theories is actually a betrayal of the scientific method. The caution built into the gathering of scientific evidence was not intended as a recipe for paralysis or preparation for popularity contests. There is a way to use cautious reports and uncertain models as the basis for decisive action.

The instrumental approach

This approach to science is called instrumentalism, and its core principles are simple: theories are never true or false. Instead, they are tools for understanding and prediction. A tool may be more effective than another tool for a specific purpose, but it is not better in any absolute sense.

In an instrumentalist view, when we find fossils that are intermediate between species it does not demonstrate that evolution is true and creation is false. Instead, it demonstrates that evolution is a better predictor of what we will find underground, and produces more satisfying explanations of fossils.

Note that when we evaluate theories from an instrumental perspective, it is always relative to other theories that might also be useful for understanding and predicting the same data. Like the two people running from the wild animal, we are not comparing theories against some absolute standard of truth, but against each other.

In climate change, instrumentalism simply says that certain climate models have been better than others at predicting the rising temperature readings and melting glaciers we have seen recently. These models suggest that it is all the driving we’re doing and the dirty power plants we’re running that are causing these rising temperatures, and to reduce the dangers from rising temperatures we need to reconfigure our way of living around walking and reducing our power consumption.

Evaluating theories relative to each other in this way takes all the bite out of Bret Stephens’s favorite weapon. He never makes it explicit, but he does have a theory: that we’re not doing much to raise the temperature of the planet. If we make his theory explicit and evaluate it against the best climate change models, it sucks. It makes no sense of the melting glaciers and rising tides, and has done a horrible job of predicting climate readings.

We can fight against Bret Stephens and his fellow merchants of doubt. But in order to do that, we need to set aside our greatest weakness: the belief that theories can be true, and must be proven true to be the basis for action. We don’t have to outrun Stephens’s uncertainty; we just have to outrun his love of the status quo. And instrumentalism is the pair of running shoes we need to do that.

Is your face red?

In 1936, Literary Digest magazine made completely wrong predictions about the Presidential election. They did this because they polled based on a bad sample: automobile registration lists and subscriptions to their own magazine. Enough people who didn’t own cars or subscribe to Literary Digest voted, and they voted for Roosevelt. The magazine’s editors’ faces were red, and they had the humility to put that on the cover.

This year, the 538 website made completely wrong predictions about the Presidential election, and its editor, Nate Silver, sorta kinda took responsibility. He had put too much trust in polls conducted at the state level. They were not representative of the full spectrum of voter opinion in those states, and this had skewed his predictions.

Silver’s face should be redder than that, because he said that his conclusions were tentative, but he did not act like it. When your results are so unreliable and your data is so problematic, you have no business being on television and in front-page news articles as much as Silver has.

In part this attitude of Silver’s comes from the worldview of sports betting, where the gamblers know they want to bet and the only question is which team they should put their money on. There is some hedging, but not much. Democracy is not a gamble, and people need to be prepared for all outcomes.

But the practice of blithely making grandiose claims based on unrepresentative data, while mouthing insincere disclaimers, goes far beyond election polling. It is widespread in the social sciences, and I see it all the time in linguistics and transgender studies. It is pervasive in the relatively new field of Data Science, and Big Data is frequently Non-representative Data.

At the 2005 meeting of the American Association for Corpus Linguistics there were two sets of interactions that stuck with me and have informed my thinking over the years. The first was a plenary talk by the computer scientist Ken Church. He described in vivid terms the coming era of cheap storage and bandwidth, and the resulting big data boom.

But Church went awry when he claimed that the size of the datasets available, and the computational power to analyze them, would obviate the need for representative samples. It is true that if you can analyze everything you do not need a sample. But that’s not the whole story.

The second, the day before Church’s talk, was a conversation over lunch with David Lee, who had just written his dissertation on the sampling problems in the British National Corpus. Lee reiterated what I had learned in statistics class: if you simply have most of the data, but your data is incomplete in non-random ways, you have a biased sample and you can’t make generalizations about the whole.
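Here is a minimal simulation of Lee’s point, with made-up numbers: a sample covering most of a population, but incomplete in a non-random way, lands farther from the truth than a small random sample.

```python
# Toy illustration, not real data: "big but biased" versus "small but random".
import random

random.seed(0)

# A population of one million speakers; 30% use construction A.
population = [1] * 300_000 + [0] * 700_000
random.shuffle(population)

# "Big data": most of the population, but the collection method
# systematically misses half of the A-users (non-random incompleteness).
biased = [x for x in population if not (x == 1 and random.random() < 0.5)]

# A representative sample: 1,000 speakers drawn at random.
representative = random.sample(population, 1_000)

print(f"True rate of A:          {sum(population) / len(population):.3f}")
print(f"Biased 'big data' rate:  {sum(biased) / len(biased):.3f}")
print(f"Random sample of 1,000:  {sum(representative) / len(representative):.3f}")
```

In this toy example the biased collection still covers about 85 percent of the population and gets the rate badly wrong, while the random sample of a thousand gets it about right.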

I’ve seen this a lot in the burgeoning field of Data Science. There are too many people performing analyses they don’t understand on data that’s not representative, making unfounded generalizations. As long as these generalizations fit within the accepted narratives, nobody looks twice.

We need to stop making it easier to run through the steps of data analysis, and instead make it easier to get those steps right. Especially sampling. Or our faces are going to be red all the time.

Sampling is a labor-saving device

Last month I wrote those words on a slide I was preparing to show to the American Association for Corpus Linguistics, as a part of a presentation of my Digital Parisian Stage Corpus. I was proud of having a truly representative sample of theatrical texts performed in Paris between 1800 and 1815, and thus finding a difference in the use of negation constructions that was not just large but statistically significant. I wanted to convey the importance of this.

I was thinking about Laplace finding the populations of districts “distributed evenly throughout the Empire,” and Student inventing his t-test to help workers at the Guinness plants determine the statistical significance of their results. Laplace was not after accuracy; he was going for speed. Student was similarly looking for the minimum amount of effort required to produce an acceptable level of accuracy. The whole point was to free resources up for the next task.
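In that spirit, here is a sketch of what “the minimum amount of effort required” looks like in practice today, using a standard power calculation. The effect size is a made-up example, and I’m assuming the statsmodels library; it is not how Student did it, just the modern equivalent of his question.

```python
# Sampling as a labor-saving device: find the smallest sample that gives
# acceptable power for a hypothetical medium-sized effect, then stop.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # hypothetical difference, in standard deviations (Cohen's d)
    alpha=0.05,       # acceptable false-positive rate
    power=0.8,        # acceptable chance of detecting a real effect
)
print(f"About {n_per_group:.0f} observations per group are enough.")
```

Anything collected beyond that number is effort that Laplace and Student would have told you to spend on the next task.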

I attended one paper at the conference that gave p-values for all its variables, and they were all 0.000. After that talk, I told the student who presented that those values indicated he had oversampled, and he should have stopped collecting data much sooner. “That’s what my advisor said too,” he said, “but this way we’re likely to get statistical significance for other variables we might want to study.”
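A quick toy demonstration of why a table full of 0.000 values suggests oversampling: hold a small, real difference fixed and the p-value keeps shrinking as the sample grows, long after significance has been reached. This assumes numpy and scipy; the data are invented, not the student’s.

```python
# Fixed small true effect; the only thing changing is the sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_difference = 0.1  # small but real, in standard-deviation units

for n in (50, 500, 5_000, 50_000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_difference, 1.0, n)
    _, p = stats.ttest_ind(a, b)
    print(f"n = {n:>6} per group   p = {p:.6f}")
```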

The student had a point, but it doesn’t seem very – well, “agile” is a word I’ve been hearing a lot lately. In any case, as the conference was wrapping up, it occurred to me that I might have several hours free – on my flight home and before – to work on my research.

My initial impulse was to keep doing what I’ve been doing for the past couple of years: clean up OCRed text and tag it for negation. Then it occurred to me that I really ought to take my own advice. I had achieved statistical significance. That meant it was time to move on!

I have started working on the next chunk of the nineteenth century, from 1816 through 1830. I have also been looking into other variables to examine. I’ve got some ideas, but I’m open to suggestions. Send them if you have them!

@everytreenyc

At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc twitter feed.

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
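The mechanism is simple enough to sketch in a few lines of Python. Everything here is a stand-in: the sample records are invented, and post_photo() is a placeholder for where the real bot calls the Street View and Twitter APIs.

```python
import time

# Hypothetical records standing in for the city property database,
# already sorted by tax ID.
properties = [
    {"tax_id": 1000010001, "address": "1 Example Place", "borough": "Manhattan"},
    {"tax_id": 1000010002, "address": "3 Example Place", "borough": "Manhattan"},
]

def post_photo(text):
    """Stand-in for fetching a Street View image and posting a tweet."""
    print("POSTED:", text)

for lot in properties:           # snakes through the city in tax-ID order
    post_photo(f"{lot['address']}, {lot['borough']}")
    time.sleep(30 * 60)          # one lot every half hour
```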

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.

Social scientists, and more recently humanists, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?
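Here is a minimal sketch of that cycle with pandas, using a handful of invented records in place of the real census (the column names are mine, not the Parks Department’s): first the distributional claim, then the dive back down to the individual trees behind it.

```python
import pandas as pd

# Invented mini-census standing in for the 2015 data.
trees = pd.DataFrame({
    "species":   ["pin oak", "honeylocust", "ginkgo", "pin oak", "ginkgo"],
    "condition": ["Good", "Poor", "Good", "Poor", "Good"],
    "borough":   ["Queens", "Bronx", "Manhattan", "Bronx", "Brooklyn"],
})

# The distributional claim: what share of trees are in each condition?
print(trees["condition"].value_counts(normalize=True))

# Diving back in: the individual trees behind the "Poor" percentage,
# the ones we need to go look at and ask qualitative questions about.
print(trees[trees["condition"] == "Poor"])
```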

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced at least as far as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places — these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a tinyletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase, “the map is not the territory”:

Streetview and the property database are both widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did. I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. These are properties we know, and yet we realize how much we don’t know them.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on Github, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.

Quantitative needs qualitative, and vice versa

Data Science is all the rage these days. But this current craze focuses on a particular kind of data analysis. I conducted an informal poll as an icebreaker at a recent data science party, and most of the people I talked to said that it wasn’t data science if it didn’t include machine learning. Companies in all industries have been hiring “quants” to do statistical modeling. Even in the humanities, “distant reading” is a growing trend.

There has been a reaction to this, of course. Other humanists have argued for the continued value of close reading. Some companies have been hiring anthropologists and ethnographers. Academics, journalists and literary critics regularly write about the importance of nuance and empathy.

For years, my response to both types of arguments has been “we need both!” But this is not some timid search for a false balance or inclusion. We need both close examination and distributional analysis because the way we investigate the world depends on both, and both depend on each other.

I learned this from my advisor Melissa Axelrod, and a book she assigned me for an independent study on research methods. Michael Agar’s The Professional Stranger is a guide to ethnographic field methods, but it also contains some commentary on the nature of scientific inquiry, and mixes its well-deserved criticism of quantitative social science with a frank acknowledgment of the interdependence of qualitative and quantitative methods. On page 134, Agar discusses Labov’s famous study of /r/-dropping in New York City:

The catch, of course, is that he would never have known which variable to look at without the blood, sweat and tears of previous linguists who had worked with a few informants and identified problems in the linguistic structure of American English. All of which finally brings us to the point of this example: traditional ethnography struggles mightily with the existence of pattern among the few.

Labov acknowledges these contributions in Chapter 2 of his 1966 book: Babbitt (1896), Thomas (1932, 1942, 1951), Kurath (1949, based on interviews by Guy S. Lowman), Hubbell (1950) and Bronstein (1962). His work would not be possible without theirs, and their work was incomplete until he developed a theoretical framework to place their analysis in, and tested that framework with distributional surveys.

We’ve all seen what happens when people try to use one of these methods without the other. Statistical methods that are not grounded in close examination of specific examples produce surveys that are meaningless to the people who take them and uninformative to scientists. Qualitative investigations that are not checked with rigorous distributional surveys produce unfounded, misleading generalizations. The worst of both worlds are quantitative surveys that are neither broadly grounded in ethnography nor applied to representative samples.

It’s also clear in Agar’s book that qualitative and quantitative are not a binary distinction, but rather two ends of a continuum. Research starts with informal observations about specific things (people, places, events) that give rise to open-ended questions. The answers to these questions then provoke more focused questions that are asked of a wider range of things, and so on.

The concepts of broad and narrow, general and specific, can be confusing here, because at the qualitative, close or ethnographic end of the spectrum the questions are broad and general but asked about a narrow, specific set of subjects. At the quantitative, distant or distributional end of the spectrum the questions are narrow and specific, but asked of a broad, general range of subjects. Agar uses a “funnel” metaphor to model how the questions narrow during this progression, but he could just as easily have used a showerhead to model how the subjects broaden at the same time.

The progression is not one-way, either. The findings of a broad survey can raise new questions, which can only be answered by a new round of investigation, again beginning with qualitative examination on a small scale and possibly proceeding to another broad survey. This is one of the cycles that increase our knowledge.

Rather than the funnel metaphor, I prefer a metaphor based on seeing. Recently I’ve been re-reading The Omnivore’s Dilemma, and in Chapter 8 Michael Pollan talks about taking a close view of a field of grass:

In fact, the first time I met Salatin he’d insisted that even before I met any of his animals, I get down on my belly in this very pasture to make the acquaintance of the less charismatic species his farm was nurturing that, in turn, were nurturing his farm.

Pollan then gets up from the grass to take a broader view of the pasture, but later bends down again to focus on individual cows and plants. He does this metaphorically throughout the book, as many great authors do: focusing in on a specific case, then zooming out to discuss how that case fits in with the bigger picture. Whether he’s talking about factory-farmed Steer 534, or Budger the grass-fed cow, or even the thousands of organic chickens that are functionally nameless under the generic name of “Rosie,” he dives into specific details about the animals, then follows up by reporting statistics about these farming methods and the animals they raise.

The bottom line is that we need studies from all over the qualitative-quantitative spectrum. They build on each other, forming a cycle of knowledge. We need to fund them all, to hire people to do them all, and to promote and publish them all. If you do it right, the plural of anecdote is indeed data, and you can’t have data without anecdotes.

Why I probably won’t take your survey

I wrote recently that if you want to be confident in generalizing observations from a sample to the entire population, your sample needs to be representative. But maybe you’re skeptical. You might have noticed that a lot of people don’t pay much attention to representativeness, and somehow there are hardly any consequences for them. But that doesn’t mean that there are never consequences, for them or other people.

In the “hard sciences,” sampling can be easier. Unless there is some major impurity, a liter of water from New York usually has the same properties as one from Buenos Aires. If you’re worried about impurities you can distill the samples to increase the chance that they’re the same. Similarly, the commonalities in a basalt column or a wheel often outweigh any variation. A pigeon in New York is the same as one in London, right? A mother in New York is the same as a mother in Buenos Aires.

Well, maybe. As we’ve seen, a swan in New York can be very different from a swan in Sydney. And when we get into the realm of social sciences, things get more complex and the complexity gets hard to avoid. There are probably more differences between a mother in New York and one in Buenos Aires than for pigeons or stones or water, and the differences are more important to more people.

This is not just speculation based on rigid rules about sampling. As Bethany Brookshire wrote last year, psychologists are coming to realize the drawbacks of building so much of their science around WEIRD people. And when she says WEIRD, she means WEIRD like me: Western, Educated and from an Industrialized, Rich, Democratic country. And not just any WEIRD people, but college sophomores. Brookshire points out how much that skews the results in a particular study of virginity, but she also links to a review by Henrich, Heine and Norenzayan (2010) that examines several studies and concludes that “members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans.”

I think about this whenever I get an invitation to participate in a social science study. I get them pretty frequently, probably at least twice a week, on email lists and Twitter, and occasionally Tumblr and even Facebook. Often they’re directly from the researchers themselves: “Native English speakers, please fill out my questionnaire on demonstratives!” That means that they’re going primarily to a population of educated people, most of whom are white and from an industrialized, rich, democratic country.

(A quick reminder, in case you just tuned in: This applies to universal observations – percentages, averages and all or none statements. It does not apply to existential statements, where you simply say that you found ten people who say “less apples.” You take those wherever you find them, as long as they’re reliable sources.)

I don’t have a real problem with using non-representative samples for pilot studies. You have a hunch about something, you want to see if it’s not just you before you spend a lot of time sending a survey out to people you don’t know. I have a huge problem with it being used for anything that’s published in a peer-reviewed journal or disseminated in the mainstream media. And yeah, that means I have a huge problem with just about any online dialect survey.

I also don’t like the idea of students generalizing universal observations from non-representative online surveys for their term papers and master’s theses. People learn skills by doing. If they get practice taking representative samples, they’ll know how to do that. If they get practice making qualitative, existential observations, they’ll be able to do those. If they spend their time in school making unfounded generalizations from unrepresentative samples (with a bit of handwaving boilerplate, of course!), most of them will keep doing that after they graduate.

So that’s my piece. I’m actually going to keep relatively quiet about this because some of the people who do those studies (or their friends) might be on hiring committees, but I do want to at least register my objections here. And if you’re wondering why I haven’t filled out your survey, or even forwarded it to all my friends, this is your answer.

You can’t get significance without a representative sample

Recently I’ve talked about the different standards for existential and universal claims, how we can use representative samples to estimate universal claims, and how we know if our representative sample is big enough to be “statistically significant.” But I want to add a word of caution to these tests: you can’t get statistical significance without a representative sample.

If you work in social science you’ve probably seen p-values reported in studies that aren’t based on representative samples. They’re probably there because the authors took one required statistics class in grad school and learned that low p-values are good. It’s quite likely that these p-values were actually expected, if not explicitly requested, by the editors or reviewers of the article, who took a similar statistics class. And they’re completely useless.

P-values tell you whether your observation (often a mean, but not always) is based on a big enough sample that you can be 99% (or whatever) sure it’s not just the luck of the draw. If it is, you are clear to generalize from your representative sample to the entire population. But if your sample is not representative, the p-value doesn’t matter!

Suppose you need 100% pure Austrian pumpkin seed oil, and you tell your friend to make sure he gets only the 100% pure kind. Your friend brings you 100% pure Australian tea tree oil. They’re both oils, and they’re both 100% pure, so your friend doesn’t understand why you’re so frustrated with him. But purity is irrelevant when you’ve got the wrong oil. P-values are the same way.
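Here is the pumpkin seed oil problem in code form, a toy example with invented data (and assuming numpy and scipy): two groups with no real difference, sampled in a biased way, still produce a spectacularly small p-value. The p-value is perfectly “pure,” but it’s the wrong oil.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two groups with NO real difference in the trait we're measuring.
group_a = rng.normal(0.0, 1.0, 100_000)
group_b = rng.normal(0.0, 1.0, 100_000)

# Non-representative sampling: group A is recruited somewhere that
# systematically attracts high scorers; group B is sampled at random.
biased_a = np.sort(group_a)[-5_000:]
random_b = rng.choice(group_b, 5_000, replace=False)

_, p = stats.ttest_ind(biased_a, random_b)
print(f"p = {p:.2e}  (tiny, and meaningless, because the sample is biased)")
```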

So please, don’t report p-values if you don’t have a representative sample. If the editor or reviewer insists, go ahead and put it in, but please roll your eyes while you’re running your t-tests. But if you are the editor or reviewer, please stop asking people for p-values if they don’t have a representative sample! Oh, and you might want to think about asking them to collect a representative sample…