And we mean really every tree!

When Timm, Laura, Elber and I first ran the @everytreenyc Twitter bot almost a year ago, we knew that it wasn’t actually sampling from a list that included every street tree in New York City. The Parks Department’s 2015 Tree Census was a huge undertaking, and was not complete by the time they organized the Trees Count! Data Jam last June. There were large chunks of the city missing, particularly in Southern and Eastern Queens.

The bot software itself was not a bad job for a day’s work, but it was still a hasty patch job on top of Neil Freeman’s original Everylotbot code. I hadn’t updated the readme file to reflect the changes we had made. It was running on a server in the NYU Computer Science Department, which is currently my most precarious affiliation.

On April 28 I received an email from the Parks Department saying that the census was complete, and the final version had been uploaded to the NYC Open Data Portal. It seemed like a good opportunity to upgrade.

Over the past two weeks I’ve downloaded the final tree database, installed everything on PythonAnywhere, streamlined the code, added a function to deal with PythonAnywhere’s limited scheduler, and updated the readme file. People who follow the bot might have noticed a few extra tweets over the past couple of days as I did final testing, but I’ve removed the cron job at NYU, and @everytreenyc is now up and running in its new home, with the full database, a week ahead of its first birthday. Enjoy the dérive!

@everytreenyc

At the beginning of June I participated in the Trees Count Data Jam, experimenting with the results of the census of New York City street trees begun by the Parks Department in 2015. I had seen a beta version of the map tool created by the Parks Department’s data team that included images of the trees pulled from the Google Street View database. Those images reminded me of others I had seen in the @everylotnyc twitter feed.

silver maple 20160827

@everylotnyc is a Twitter bot that explores the City’s property database. It goes down the list in order by tax ID number. Every half hour it composes a tweet for a property, consisting of the address, the borough and the Street View photo. It seems like it would be boring, but some people find it fascinating. Stephen Smith, in particular, has used it as the basis for some insightful commentary.
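To make that walk concrete, here is a minimal sketch in Python of stepping through a property list in tax ID order and composing the text of a tweet. It is not Freeman's code: the `Lot` class, its field names, and both functions are hypothetical, and fetching the Street View photo and posting to Twitter are left out.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    # Hypothetical record; the real property database has many more fields.
    tax_id: int
    address: str
    borough: str


def next_lot(lots, last_tax_id):
    """Return the lot with the smallest tax ID above last_tax_id, or None at the end."""
    remaining = [lot for lot in lots if lot.tax_id > last_tax_id]
    return min(remaining, key=lambda lot: lot.tax_id) if remaining else None


def compose_tweet(lot):
    """Build the tweet text: the address followed by the borough."""
    return f"{lot.address}, {lot.borough}"
```

A scheduler would call `next_lot` every half hour with the tax ID of the last lot posted, so the bot only needs to remember one number between runs.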

It occurred to me that @everylotnyc is actually a very powerful data visualization tool. When we think of “big data,” we usually think of maps and charts that try to encompass all the data – or an entire slice of it. The winning project from the Trees Count Data Jam was just such a project: identifying correlations between cooler streets and the presence of trees.

Social scientists, and even humanists recently, fight over quantitative and qualitative methods, but the fact is that we need them both. The ethnographer Michael Agar argues that distributional claims like “5.4 percent of trees in New York are in poor condition” are valuable, but primarily as a springboard for diving back into the data to ask more questions and answer them in an ongoing cycle. We also need to examine the world in detail before we even know which distributional questions to ask.

If our goal is to bring down the percentage of trees in Poor condition, we need to know why those trees are in Poor condition. What brought their condition down? Disease? Neglect? Pollution? Why these trees and not others?

Patterns of neglect are often due to the habits we develop of seeing and not seeing. We are used to seeing what is convenient, what is close, what is easy to observe, what is on our path. But even then, we develop filters to hide what we take to be irrelevant to our task at hand, and it can be hard to drop these filters. We can walk past a tree every day and not notice it. We fail to see the trees for the forest.

Privilege filters our experience in particular ways. A Parks Department scientist told me that the volunteer tree counts tended to be concentrated in wealthier areas of Manhattan and Brooklyn, and that many areas of the Bronx and Staten Island had to be counted by Parks staff. This reflects uneven amounts of leisure time and uneven levels of access to city resources across these neighborhoods, as well as uneven levels of walkability.

A time-honored strategy for seeing what is ordinarily filtered out is to deviate from our usual patterns, either with a new pattern or with randomness. This strategy can be traced at least as far back as the sampling techniques developed by Pierre-Simon Laplace for measuring the population of Napoleon’s empire, the forerunner of modern statistical methods. Also among Laplace’s cultural heirs are the flâneurs of late nineteenth-century Paris, who studied the city by taking random walks through its crowds, as noted by Charles Baudelaire and Walter Benjamin.

In the tradition of the flâneurs, the Situationists of the mid-twentieth century highlighted the value of random walks, which they called dérives. Here is Guy Debord (1955, translated by Ken Knabb):

The sudden change of ambiance in a street within the space of a few meters; the evident division of a city into zones of distinct psychic atmospheres; the path of least resistance which is automatically followed in aimless strolls (and which has no relation to the physical contour of the ground); the appealing or repelling character of certain places — these phenomena all seem to be neglected. In any case they are never envisaged as depending on causes that can be uncovered by careful analysis and turned to account. People are quite aware that some neighborhoods are gloomy and others pleasant. But they generally simply assume that elegant streets cause a feeling of satisfaction and that poor streets are depressing, and let it go at that. In fact, the variety of possible combinations of ambiances, analogous to the blending of pure chemicals in an infinite number of mixtures, gives rise to feelings as differentiated and complex as any other form of spectacle can evoke. The slightest demystified investigation reveals that the qualitatively or quantitatively different influences of diverse urban decors cannot be determined solely on the basis of the historical period or architectural style, much less on the basis of housing conditions.

In an interview with Neil Freeman, the creator of @everylotbot, Cassim Shepard of Urban Omnibus noted the connections between the flâneurs, the dérive and Freeman’s work. Freeman acknowledged this: “How we move through space plays a huge and under-appreciated role in shaping how we process, perceive and value different spaces and places.”

Freeman did not choose randomness, but as he describes it in a tinyletter, the path of @everylotbot sounds a lot like a dérive:

@everylotnyc posts pictures in numeric order by Tax ID, which means it’s posting pictures in a snaking line that started at the southern tip of Manhattan and is moving north. Eventually it will cross into the Bronx, and in 30 years or so, it will end at the southern tip of Staten Island.

Freeman also alluded to the influence of Alfred Korzybski, who coined the phrase, “the map is not the territory”:

Streetview and the property database are both a widely used because they’re big, (putatively) free, and offer a completionist, supposedly comprehensive view of the world. They’re also both products of people working within big organizations, taking shortcuts and making compromises.

I was not following @everylotnyc at the time, but I knew people who did. I had seen some of their retweets and commentaries. The bot shows us pictures of lots that some of us have walked past hundreds of times, but seeing them in our Twitter timelines makes us see them fresh again and notice new things. It is the property we know, and yet we realize how much we don’t know it.

When I thought about those Street View images in the beta site, I realized that we could do the same thing for trees for the Trees Count Data Jam. I looked, and discovered that Freeman had made his code available on GitHub, so I started implementing it on a server I use. I shared my idea with Timm Dapper, Laura Silver and Elber Carneiro, and we formed a team to make it work by the deadline.

It is important to make this much clear: @everytreenyc may help to remind us that no census is ever flawless or complete, but it is not meant as a critique of the enterprise of tree counts. Similarly, I do not believe that @everylotnyc was meant as an indictment of property databases. On the contrary, just as @everylotnyc depends on the imperfect completeness of the New York City property database, @everytreenyc would not be possible without the imperfect completeness of the Trees Count 2015 census.

Without even an attempt at completeness, we could have no confidence that our random dive into the street forest was anything even approaching random. We would not be able to say that following the bot would give us a representative sample of the city’s trees. In fact, because I know that the census is currently incomplete in southern and eastern Queens, when I see trees from the Bronx and Staten Island and Astoria come up in my timeline I am aware that I am missing the trees of southeastern Queens, and awaiting their addition to the census.

Despite that fact, the current status of the 2015 census is good enough for now. It is good enough to raise new questions: what about that parking lot? Is there a missing tree in the Street View image because the image is newer than the census, or older? It is good enough to continue the cycle of diving and coming up, of passing through the funnel and back up, of moving from quantitative to qualitative and back again.
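The point about representative samples can be illustrated with a few lines of Python. This is only a sketch under assumptions: the `sample_boroughs` function and the `"borough"` key are hypothetical, and it is not the bot's actual code. It draws trees uniformly at random from whatever list it is given and tallies where they came from, so any area missing from the list can never turn up in the sample.

```python
import random
from collections import Counter


def sample_boroughs(trees, k, seed=None):
    """Draw k trees uniformly at random (with replacement) and count them by borough."""
    rng = random.Random(seed)
    picks = [rng.choice(trees) for _ in range(k)]
    return Counter(tree["borough"] for tree in picks)
```

Over many draws, the tallies approach each borough's share of the list itself, which matches the city only as closely as the census does.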

Ten reasons why sign-to-speech is not going to be practical any time soon.

It’s that time again! A bunch of really eager computer scientists have a prototype that will translate sign language to speech! They’ve got a really cool video that you just gotta see! They win an award! (from a panel that includes no signers or linguists). Technology news sites go wild! (without interviewing any linguists, and sometimes without even interviewing any deaf people).

Gee-whiz Tech Photo: Texas A&M

…and we computational sign linguists, who have been through this over and over, every year or two, just *facepalm*.

The latest strain of viral computational sign linguistics hype comes from the University of Washington, where two hearing undergrads have put together a system that … supposedly recognizes isolated hand gestures in citation form. But you can see the potential! *facepalm*.

Twelve years ago, after already having a few of these *facepalm* moments, I wrote up a summary of the challenges facing any computational sign linguistics project and published it as part of a paper on my sign language synthesis prototype. But since most people don’t have a subscription to the journal it appeared in, I’ve put together a quick summary of Ten Reasons why sign-to-speech is not going to be practical any time soon.

  1. Sign languages are languages. They’re different from spoken languages. Yes, that means that if you think of a place where there’s a sign language and a spoken language, they’re going to be different. More different than English and Chinese.
  2. We can’t do this for spoken languages. You know that app where you can speak English into it and out comes fluent Pashto? No? That’s because it doesn’t exist. The Army has wanted an app like that for decades, and they’ve been funding it up the wazoo, and it’s still not here. Sign languages are at least ten times harder.
  3. It’s complicated. Computers aren’t great with natural language at all, but they’re better with written language than spoken language. For that reason, people have broken the speech-to-speech translation task down into three steps: speech-to-text, machine translation, and text-to-speech.
  4. Speech to text is hard. When you call a company and get a message saying “press or say the number after the tone,” do you press or say? I bet you don’t even call if you can get to their website, because speech to text suuucks:

    -Say “yes” or “no” after the tone.
    -No.
    -I think you said, “Go!” Is that correct?
    -No.
    -My mistake. Please try again.
    -No.
    -I think you said, “I love cheese.” Is that correct?
    -Operator!

  5. There is no text. A lot of people think that text for a sign language is the same as the spoken language, but if you think about point 1 you’ll realize that that can’t possibly be true. Well, why don’t people write sign languages? I believe it can be done, and lots of people have tried, but for some reason it never seems to catch on. It might just be the classifier predicates.
  6. Sign recognition is hard. There’s a lot that linguists don’t know about sign languages already. Computers can’t even get reliable signs from people wearing gloves, never mind video feeds. This may be better than gloves, but it doesn’t do anything with facial or body gestures.
  7. Machine translation is hard going from one written (i.e. written version of a spoken) language to another. Different words, different meanings, different word order. You can’t just look up words in a dictionary and string them together. Google Translate is only moderately decent because it’s throwing massive statistical computing power at the input – and that only works for languages with a huge corpus of text available.
  8. Sign to spoken translation is really hard. Remember how in #5 I mentioned that there is no text for sign languages? No text, no huge corpus, no machine translation. I tried making a rule-based translation system, and as soon as I realized how humongous the task of translating classifier predicates was, I backed off. Matt Huenerfauth has been trying (PDF), but he knows how big a job it is.
  9. Sign synthesis is hard. Okay, that’s probably the easiest problem of them all. I built a prototype sign synthesis system in 1997, I’ve improved it, and other people have built even better ones since.
  10. What is this for, anyway? Oh yeah, why are we doing this? So that Deaf people can carry a device with a camera around, and every time they want to talk to a hearing person they have to mount it on something, stand in a well-lighted area and sign into it? Or maybe someday have special clothing that can recognize their hand gestures, but nothing for their facial gestures? I’m sure that’s so much better than decent funding for interpreters, or teaching more people to sign, or hiring more fluent signers in key positions where Deaf people need the best customer service.

So I’m asking all you computer scientists out there who don’t know anything about sign languages, especially anyone who might be in a position to fund something like this or give out one of these gee-whiz awards: Just stop. Take a minute. Step back from the tech-bling. Unplug your messiah complex. Realize that you might not be the best person to decide whether or not this is a good idea. Ask a linguist. And please, ask a Deaf person!

Note: I originally wrote this post in November 2013, in response to an article about a prototype using Microsoft Kinect. I never posted it. Now I’ve seen at least three more, and I feel like I have to post this. I didn’t have to change much.

One way of generating spam

This showed up today in the comments that Akismet flagged for spam:

{Photo|Picture|Photograph|Image|Photography|Snapshot|Shot|Pic|Photographic|Graphic|Pics} {credit|credit score|credit rating|credit history|credit ratings|consumer credit|credit ranking|credit standing|consumer credit rating|credit scores|credit worthiness}: {AP|Elp} | {FILE|Document|Record|Report|Data file|Submit|Computer file|Data|Register|Archive|Database} #file_links\keywords1.txt,1,S] {-|–|:|*|( space )|( blank )|,|To|. . .|And|As} {In this|Within this|On this|With this|In this particular|During this|In such a|Through this|From this|In that|This particular} {O|To|A|E|I|U|} #file #file_links\keywords2.txt,1,S] _links\keywords3.txt,1,S] ct. {7|Seven|Several|6|8|Six|5|Five|9|Eight|10}, {2012|Next year}, {file|document|record|report|data file|submit|computer file|data|register|archive|database} {photo|picture|photograph|image|photography|snapshot|shot|pic|photographic|graphic|pics}, {Chicago|Chi town|Chicago, il|Detroit|Dallas|Chicago, illinois|Philadelphia|Los angeles|Denver|Chicagoland|Miami} {Bears|Has|Contains|Holds|Carries|Provides|Offers|Includes|Teddy bears|Requires|Features} {middle|center|midsection|midst|heart|centre|core|mid|central|middle section|middle of the} linebacker {Brian|John|Mark} Urlacher {watches|wrist watches|timepieces|designer watches|wristwatches|different watches|pieces|running watches|looks after|monitors|devices} {from the|in the|from your|through the|on the|with the|within the|belonging to the|out of the|out of your|of your} {sideline|part time} {during the|throughout the|through the|in the|over the|while in the|within the|all through the|through|usually in the|within} {second half|other half|better half|lover|wife or husband|partner|loved one} {of an|of the|of your|associated with an|connected with an|of|of any|of each|associated with the|of some|associated with} {NFL|National football league|American footbal|Football|Nhl|Nba} {football|soccer|sports|basketball|baseball|hockey|footballing|rugby|nfl|golf|nfl football} 
{game|sport|video game|online game|recreation|activity|match|adventure|gameplay|performance|gaming} {against the|from the|up against the|contrary to the|resistant to the|about the|with the|on the|versus the|with|around the} {Jacksonville Jaguars|Gambling} {in|within|inside|throughout|with|around|during|on|when it comes to|for|found in} {Jacksonville|The city of jacksonville|The town of jacksonville}, Fla. {The|The actual|The particular|Your|This|A|Any|Typically the|All the|That|All of the} {Bears|Has|Contains|Holds|Carries|Provides|Offers|Includes|Teddy bears|Requires|Features} {announced|introduced|declared|released|reported|proclaimed|publicised|publicized|launched|revealed|stated} {on|upon|about|in|with|for|regarding|concerning|at|relating to|on the subject of} {Wednesday|Thursday|Friday|Wed|Saturday|Sunday|Mondy|Monday|The following friday|The following thursday|Tuesday}, {March|03|Goal|Drive|Walk|April|Mar|Strut|Next month|May|Celebration} {20|Twenty|Something like 20|30|Thirty|10|21|19|More than 20|20 or so|22}, {20|Twenty|Something like 20|30|Thirty|10|21|19|More than 20|20 or so|22} #file_links\keywords4.txt,1,S] {13|Thirteen|Tough luck|12|14|15|10}, {that they were|that they are|them to be|they were} {unable to|not able to|struggling to|can not|struggle to|cannot|incapable of|helpless to|struggles to|could not|canrrrt} {reach|achieve|attain|get to|accomplish|arrive at|access|obtain|get through to|contact|grasp} {a contract|an agreement|a legal contract|a binding agreement|binding agreement|legal contract|a partnership|an understanding|a|a deal} {agreement|contract|arrangement|deal|understanding|settlement|commitment|binding agreement|legal contract|transaction|decision} {with|along with|together with|using|having|by using|utilizing|through|with the help of|by means of|by way of} Urlacher, {who is|who’s|that is|that’s|who’s going to be|who will be|who may be|who might be|who seems to be|who is responsible for|the person} {an|a good|a great|the|a|a 
strong|some sort of|a powerful|a particular|any|an excellent} unre #file_links\keywords5.txt,1,S] stricted {free|totally free|free of charge|no cost|cost-free|absolutely free|zero cost|100 % free|complimentary|free of cost|no charge} {agent|broker|realtor|adviser|representative|real estate agent|professional|advisor|solution|dealer|factor} {for the first time|the very first time|the first time|initially|in my ballet shoes|somebody in charge of|at last|now|responsible for|as a beginner|there’s finally someone} {in his|in the|as part of his|in their|within his|in her|within the|in|on his|during his|with his} {career|profession|job|occupation|vocation|employment|work|professional|livelihood|position|line of work}. ({AP|Elp} Photo/Phelan {M|Michael|Meters|Mirielle|L|T|D|N|E|S|R}. Ebenhack, {File|Document|Record|Report|Data file|Submit|Computer file|Data|Register|Archive|Database})
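The braces-and-pipes notation above is a template in which each `{a|b|c}` group stands for one choice among alternatives. As a rough illustration of how such a template could be expanded into a single comment, here is a minimal Python sketch. The `spin` function is hypothetical, not the spammer's actual tool, and it ignores the `#file_links` directives, which appear to pull keywords from external files.

```python
import random
import re

# Matches an innermost {a|b|c} group: braces with no braces nested inside.
_GROUP = re.compile(r"\{([^{}]*)\}")


def spin(template, rng=None):
    """Replace every {a|b|c} group in template with one randomly chosen option."""
    if rng is None:
        rng = random.Random()

    def pick(match):
        return rng.choice(match.group(1).split("|"))

    # Repeat until no groups remain, so nested groups resolve from the inside out.
    while True:
        template, replaced = _GROUP.subn(pick, template)
        if replaced == 0:
            return template
```

Calling `spin("{Photo|Picture} {credit|credit score}: AP")` returns one of four variants, which is presumably why each copy of the comment looks slightly different to a filter.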