Data science and data technology

The big buzz over the past few years has been Data Science. Corporations are opening Data Science departments and staffing them with PhDs, and universities have started Data Science programs to sell credentials for these jobs. As a linguist I’m particularly interested in this new field, because it includes research practices that I’ve been using for years, like corpus linguistics and natural language processing.

As a scientist I’m a bit skeptical of this field, because frankly I don’t see much science. Sure, the practitioners have labs and cool gadgets. But I rarely see anyone asking hard questions, making careful observations, creating theories, formulating hypotheses, testing the hypotheses and examining the results.

The lack of careful observation and skeptical questioning is what really bothers me, because that’s what’s at the core of science. Don’t get me wrong: there are plenty of people in Data Science doing both. But these practices should permeate a field with this name, and they don’t.

If there’s so little science, why do we call it “science”? A glance through some of the uses of the term in the Google Books archive suggests that when it was first used in the late twentieth century, it did include hypothesis testing. In the early 2000s people began to use it as a synonym for “big data,” and I can understand why. “Big data” was a well-known buzzword associated with Silicon Valley tech hype.

I totally get why people replaced “big data” with “data science.” I’ve spent years doing science (with observations, theories, hypothesis testing, etc.). Occasionally I’ve been paid for doing science or teaching it, but only part time. Even after getting a PhD I had to conclude that science jobs that pay a living wage are scarce and in high demand, and I was probably not going to get one.

It was kind of exciting when I got a job with Scientist in the title. It helped to impress people at parties. At first it felt like a validation of all the time I spent learning how to do science. So I completely understand why people prefer to say they’re doing “data science” instead of “big data.”

The problem with being called a Scientist in that job was that I wasn’t working on experiments. I was just helping people optimize their tools. Those tools could possibly be used for science, but that was not why we were being paid to develop them. We have a word for a practice involving labs and gadgets, without requiring any observation or skepticism. That word is not science, it’s technology.

Technology is perfectly respectable; it’s what I do all day. For many years I’ve been well paid to maintain and expand the technology that sustains banks, lawyers, real estate agents, bakeries and universities. I’m currently building tools that help instructors at Columbia University with things like memorizing the names of their students and sending them emails. It’s okay to do technology. People love it.

If you really want to do science and you’re not one of the lucky ones, you can do what I did: I found a technology job that doesn’t demand all my time. Once in a while they need me to stay late or work on a weekend, but the vast majority of my time outside of 9-5 is mine. I spend a lot of that time taking care of my family and myself, and relaxing with friends. But I have time to do science.