How to set up your own LanguageLab

I’ve got great news! I have now released LanguageLab, my free, open-source software for learning languages and music, to the public on GitHub.

I wish I could tell you I’ve got a public site up that you can all use for free. Unfortunately, the features that would make LanguageLab easy for multiple users to share one server are later in the roadmap. There are a few other issues that also stand in the way of a massive public service. But you can set up your own server!

I’ve documented the steps in the README file, but here’s an overview. You don’t need to know how to program, but you will need to know how to set up web services, retrieve files from GitHub, edit configuration files, and run a few commands at a Linux, macOS or Windows command prompt.

LanguageLab uses Django, one of the most popular web frameworks for Python, and React, one of the most popular frameworks for JavaScript. All you need is a server that can run Django and host some JavaScript files! I’ve been doing my development and testing on PythonAnywhere, but I’ve also set it up on Amazon Web Services, and you should be able to run it on Google Cloud, Microsoft Azure, a university web server or even your personal computer.

There are guides online for setting up Django in all those environments. Once you’ve got a basic Django setup installed, you’ll need to clone the LanguageLab repo from GitHub to a place where it can be read by your web server. Then you’ll configure it to access the database, and configure the web server to load it. You’ll use Pip and NPM to download the Python and Javascript libraries you need, like the Django REST Framework, React and the Open Iconic font. Finally, you’ll copy all the files into the right places for the web server to read them and restart the server.
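For example, the database connection lives in Django’s settings.py. Here’s a minimal sketch of what that block might look like for a MySQL setup – the engine, database name, user, password and host are all placeholders to replace with the details from your own hosting environment.

```python
# settings.py – minimal sketch of a Django database configuration.
# Every value below is a placeholder; substitute the settings for
# your own database and hosting environment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "languagelab",
        "USER": "languagelab_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "3306",
    }
}
```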

Once you’ve got everything in place, you should be able to log in! You can make multiple accounts, but keep in mind that at this point there are no per-account access controls, so every account has full access to all the data. You can then start building your library of languages, media, exercises and lessons. LanguageLab comes with the most widely used languages, but it’s easy to set up new ones if yours are not on the list.

Media can be a bit tricky, because LanguageLab is not a media server. You can upload your media to another place on your server, or any other server – as long as it’s got an HTTPS URL you should be able to use it. If the media you’re using is copyrighted you may want to set up some basic password protection to avoid any accusations of piracy. I use a simple .htaccess password. I have to log in every time, but it works.

With the URL of your media file, you can create a media entry. Just paste that URL into the form and add metadata to keep track of the file and what it can be used for. You can then set up one or more exercises based on particular segments of that media file. It may take a little trial and error to get the exercises right.
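If you’d rather script this step than use the form, the same data is available over HTTP through the Django REST Framework backend. Here’s a hedged sketch using Python’s requests library – the endpoint path, field names and credentials shown here are guesses for illustration, so check the browsable API on your own server for the real schema.

```python
# Sketch: creating a media entry over the REST API with the requests library.
# The endpoint path, field names and credentials are hypothetical placeholders;
# consult your server's browsable API for the actual schema.
import requests

API_BASE = "https://languagelab.example.com/api"  # your own LanguageLab server

response = requests.post(
    f"{API_BASE}/media/",
    json={
        "name": "Elena, estudante francesa",                # label for the recording
        "mediaUrl": "https://media.example.com/elena.mp3",  # HTTPS URL of the file
        "language": "pt",                                   # language code
        "rights": "personal use only",                      # usage metadata
    },
    auth=("myuser", "mypassword"),                          # placeholder credentials
)
response.raise_for_status()
print("Created media entry:", response.json())
```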

You can then create one or more lessons to organize your exercises. You can choose to have a lesson for all the exercises in a particular media file, or you can combine exercises from multiple media files in a lesson. It’s up to you how to organize the lessons. You can edit the queues for each lesson to reorder or remove exercises.

Once you’ve got exercises, you can start practicing! The principle is simple: listen to the model, repeat into the microphone, then listen to the model again, followed by your recording. Set yourself a goal of a certain number of repetitions per session.

After you’ve created your language and media entries, exercises and lessons, you can export the data. Importing is not yet implemented, but the export is a human-readable JSON format, so you can recreate your entries from it by hand if you ever need to.
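That export is also easy to poke at with a short script. Here’s a sketch that just loads the file and counts what’s in it – the assumption that the top level maps item types to lists is mine, so adjust the keys to whatever your export actually contains.

```python
# Sketch: inspecting a LanguageLab JSON export.
# The assumption that the top level maps item types to lists is a guess;
# adjust to the structure of your actual export file.
import json

with open("languagelab_export.json", encoding="utf-8") as f:
    data = json.load(f)

for section, items in data.items():
    print(f"{section}: {len(items)} items")
```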

In the near future I will go on Twitch to demonstrate how to set up exercises and lessons, and how to practice with them. I will also try to find time to demonstrate the installation process. I will record each demonstration and put it on YouTube for your future reference. You can follow me on Twitter to find out when I’m doing the demos and posting the videos.

If you try setting up a LanguageLab, please let me know how it goes! You can report bugs by opening issues on GitHub, or you can send me an email. I’m happy to hear about problems, but I’d also like to hear success stories! And if you know some Python or JavaScript, please consider writing a little code to help me add one of the features in the roadmap!

A free, open source language lab app

Viewers of The Crown may have noticed a brief scene where Prince Charles practices Welsh by sitting in a glass cubicle wearing a headset.  Some viewers may recognize that as a language lab. Some may have even used language labs themselves.

The core of the language lab technique is language drills, which are based on the bedrock of all skills training: mimicry, feedback and repetition.  An instructor can identify areas for the learner to focus on.

Because it’s hard for us to hear our own speech, the instructor can also observe things in the learner’s voice that the learner may not perceive.  Recording technology enabled the learner to take on some of that observer role more directly.

When I used a language lab to learn Portuguese in college, it ran on cassette tapes.  The lab station played the model (I can still remember “Elena, estudante francesa, vai passar as férias em Portugal…”), then it recorded my attempted mimicry onto a blank cassette.  Once I was done recording it played back the model, followed by my own recording.

Hearing my voice repeated back to me after the model helped me judge for myself how well I had mimicked the model.  It wasn’t enough by itself, so the lab instructor had a master station where he could listen in on any of us and provide additional feedback.  We also had classroom lessons with an instructor, and weekly lectures on culture and grammar.

There are several companies that have brought language lab technology into the digital age, on CD-ROM and then over the internet.  Many online language learning providers rely on proprietary software and closed platforms to generate revenue, which is fine for them but doesn’t allow teachers the flexibility to add new language varieties.

People have petitioned these language learning companies to offer new languages, but developing offerings for a new language is expensive.  If a language has a small user base it may never generate enough revenue to offset the cost of developing the lessons.  It would effectively be a donation to people who want to promote these languages, and these companies are for-profit entities.

Duolingo has offered a work-around to this closed system: they will accept materials developed by volunteers according to their specifications and freely donated.  Anyone who remembers the Internet Movie Database before it was sold to Amazon can identify the problems with this arrangement: what happens to those submissions if Duolingo goes bankrupt, or simply decides not to support them anymore?

Closed systems raise another issue: who decides what it means to learn French, or Hindi?  This has been discussed in the context of Duolingo, which chose to teach the artificial Modern Standard Arabic rather than a colloquial dialect or the classical language of the Qur’an.  Similarly, activists for the Hawai’ian language wanted the company to focus on lessons to encourage Hawai’ians to speak the language, rather than tourists who might visit for a few weeks at most.

Years ago I realized that we could make a free, open-source language lab application.  It wouldn’t have to replicate all the features of the commercial apps, especially not initially.  An app would be valuable if it offers the basic language lab functionality: play a model, record the learner’s mimicry, play the model again and finally play the recording of the learner.

An open system would be able to use any recording that the device can play.  This would allow learners to choose the models they practice with, or allow an instructor to choose models for their students.  The lessons don’t have to be professionally produced.  They can be created for a single student, or even for a single occasion.  I am not a lawyer, but I believe they can even use copyrighted materials.

I have created a language lab app using the Django REST Framework and ReactJS that provides basic language lab functionality.  It runs in a web browser using responsive layout, and I have successfully tested it in Chrome and Firefox, on Windows and Android.

This openness and flexibility drastically reduces the cost of producing a lesson.  The initial code can be installed in an hour, on any server that can host Django.  The monthly cost of hosting code and media can be under $25.  Once this is set up, a media item and several exercises based on it can be added in five minutes.

This reduced cost means that a language does not have to bring in enough learners to recoup a heavy investment.  That in turn means that teachers can create lessons for every dialect of Arabic, or in fact for every dialect of English.  They can create Hawai’ian lessons for both tourists and heritage speakers.  They could even create lessons for actors to learn dialects, or master impressions of celebrities.

As a transgender person I’ve long been interested in developing a feminine voice to match my feminine visual image.  Gender differences in language include voice quality, pitch contour, rhythm and word choice – areas that can only be changed through experience.  I have used the alpha and beta versions of my app to create exercises for practicing these differences.

Another area where it helps a learner to hear a recording of their own voice is singing.  This could be used by professional singers or amateurs.  It could even be used for instrument practice.  I use it to improve my karaoke!

This week I was proud to present my work at the QueensJS meetup.  My slides from that talk contain more technical details about how to record audio through the web browser.  I’ll be pushing my source to GitHub soon. You can read more details about how to set up and use LanguageLab.  In the meantime, if you’d like to contribute, or to help with beta testing, please get in touch!

Angus Grieve-Smith wears a mask of his own design, featuring IPA vowel quadrilaterals on each cheek

Show your vowels and support Doctors Without Borders!

I’m very excited about a new face mask I designed.  You can order it online!

I was inspired by two tweets I saw within minutes of each other on July Fourth.  First, Médéric Gasquet-Cyrus, a professor at Aix-Marseille, posted a picture of  his colleague Pascal Roméas wearing a “triangle vocalique” T-shirt designed by the linguistics YouTuber Romain Filstroff, known as Linguisticae. Gasquet-Cyrus’s tweet translates to “When you eat out with a phonetician colleague, you get a chance to practice your vowel quadrilateral!”

The vowel quadrilateral is one of the great data visualizations of linguistics: a two-dimensional diagram of the tongue height and position assigned to the vowel symbols of the International Phonetic Alphabet, as viewed from the left side of the face.  It is also known as the vowel triangle, depending on how much wiggle room you think people have for their tongues when their mouths are fully open.  It can even be plotted based on the formant frequencies extracted from acoustic analysis.
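For the acoustically inclined, here’s a small sketch of how that plot works in Python with matplotlib. The formant values are rough textbook-style averages rather than real measurements, and reversing both axes is what gives the chart the familiar quadrilateral orientation.

```python
# Sketch: plotting a vowel chart from formant frequencies with matplotlib.
# The F1/F2 values are rough illustrative averages, not real measurements.
import matplotlib.pyplot as plt

vowels = {
    "i": (280, 2250),   # high front
    "u": (310, 870),    # high back
    "ɛ": (550, 1770),   # mid front
    "ɔ": (590, 880),    # mid back
    "a": (850, 1610),   # low
}

fig, ax = plt.subplots()
for symbol, (f1, f2) in vowels.items():
    ax.scatter(f2, f1)
    ax.annotate(symbol, (f2, f1), textcoords="offset points", xytext=(5, 5))

# High front vowels end up at the top left once both axes are reversed,
# matching the orientation of the IPA vowel quadrilateral.
ax.invert_xaxis()
ax.invert_yaxis()
ax.set_xlabel("F2 (Hz)")
ax.set_ylabel("F1 (Hz)")
ax.set_title("Vowel space from formant frequencies (approximate values)")
plt.show()
```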

The second was a tweet by Emily Bender, a professor at the University of Washington, about face masks with a random grid of IPA symbols on them.  These are designed by the Lingthusiasm podcast team of author Gretchen McCulloch and professor Lauren Gawne, using the same pattern as in their popular IPA scarves.

Seeing the two pictures one after the other, I realized that rather than a random grid, I could put a vowel quadrilateral on an IPA mask.  Then I realized that if I placed the quadrilateral on one side, I could get it to line up with the wearer’s mouth.  I also had to make a corresponding chart for the right side.

I decided that I wanted the money to go to a charity that was helping with COVID-19.  Doctors Without Borders has been doing good work around the world for years, and with COVID they’ve really stepped up.  Here in New York they provided support to several local organizations and operated two shower trailers in Manhattan at the height of the outbreak.

From July 16 through 29, and then from November 27 through December 28, I ran a fundraiser through Custom Ink where we raised $430 in profits for Doctors Without Borders, and masks were sent to 32 supporters.

There’s another way to get masks!  I have made a slightly different mask design available at RedBubble.com.  You can even get a mug or a phone case.  This is the same store where I’ve been selling Existential Black Swan T-shirts for years.  You can get a mask with the swan on it, if that’s your style.  None of these are part of a fundraiser, but you can still donate directly to Doctors Without Borders!

Update, February 1, 2021: There are more virulent strains of COVID spreading, so medical experts are recommending that people wear three-layer masks, or wear a single or double layer mask over a disposable surgical mask.  You should know that the white-on-black Custom Ink masks sold in the fundraisers in 2020 are single layer, and the RedBubble masks sold in 2020 are double layer.  They can both be worn over surgical masks.  Both services are now offering triple-layer masks, so I’ve updated the RedBubble links to the three-layer masks, and will use three-layer masks for any future fundraisers.  Stay safe, everyone!

Seeing the Star Wars movies does not make you a Star Wars fan. Actual Star Wars fans have done some of the following: * Read the novelizations * Read books in the EU * Read new canon books * Read some comics * Watched the animated shows * Participated in SW discussion groups.

Coercing with categories

Recently some guy tweeted “Seeing the Star Wars movies does not make you a Star Wars fan. Actual Star Wars fans have done some of the following…”  This is a great opportunity for me to talk about a particular kind of category fight: coercion.

Over the past several years I’ve written about some things people try to do with categories: watchdogging, gatekeeping, pedantry, eclipsing and splitting.  Coercion is similar to gatekeeping, which is where someone highlights category boundaries with the goal of preventing free riders from accessing benefits that they are not entitled to: the example I gave was of Dr. Nerdlove defending the category of “socially awkward men” from incursion by genuinely abusive men.  He argues that these abusive men do not deserve the accommodation that is sometimes extended to men who are simply socially awkward.

Coercion is different from gatekeeping in that the person making the accusation is shifting the category boundaries.  Ed Powell knows quite well that most people’s definition of “Star Wars fan” includes people who have not done any of the six things he lists.  So why is he insisting that “Actual Star Wars fans” have all done some of those things? Because he wants to control the behavior of people who care about whether they are considered Star Wars fans.

Why would someone care about being considered a Star Wars fan?  Because fandom is often a communal affair. Fans go to movies and conventions together, and bond over their shared appreciation for Star Wars.  As Powell says, they may participate in discussion groups. There’s a satisfaction people get in talking about Wookiees or midichlorians with people who share background knowledge and don’t have to ask what a protocol droid is.

I’ve also heard that some people get a sense of belonging from participating in these groups.  They may have been teased – and rejected from other groups – for being one of the few Star Wars fans in their high school, especially in the seventies and eighties.  There’s a satisfaction and relief in finally finding a group that you share so much with.

Of course, these groups are vulnerable to the dark side.  They contain people, and people aren’t necessarily nice just because they’ve been treated badly by other people.  Sometimes not even if they’re Star Wars fans. Sometimes people discover they can wield power within a group like that, and they’re not always interested in using that power for good.

One way to wield power is to be able to give people something they want – or to deny it to them.  And if people want the sense of belonging to a group, or the enjoyment of participating in group activities, it’s a source of power to be able to control who belongs to the group – and who doesn’t.  Some groups are arbitrary: in theory, the only person who gets to decide who belongs to “Brenda’s friends” is Brenda, and the only person who gets to decide who’s invited to Kevin’s party is Kevin.

Other groups are based on categories, like these Meetup groups that are hosting events tomorrow: the New York Haskell Users Group, Black Baby Boomers Just Want to have Fun, or First Time Upper West Side Moms.  Or like Star Wars fans. These groups are much less arbitrary: if a woman lives on the Upper West Side with her only child, it’s going to be hard to throw her out.

It’s hard to exclude people from a category-based group, but not impossible.  What if our First Time Upper West Side Mom is trans, or a stepmother? Or if she’s a stepmother and a first-time biological mother?  Or if she lives on 107th Street? Or if her kid is in college? Because categories are fuzzy, the power to draw category boundaries can be the power to exclude people from group membership.  If the group leader doesn’t like our hypothetical mom, all she has to do is draw the boundary of the Upper West Side at 106th Street. Sorry honey, there is no First Time Morningside Heights Moms group.  Oh gee, what a shame.

The power to exclude doesn’t even need to be exercised.  It doesn’t even need to have any direct force to have a chilling effect.  Even if the head of your local Star Wars fan club totally owns Ed Powell on Twitter, you still may be wondering if people at the next regional convention are going to look at you funny because you haven’t read Dark Force Rising.

But if you’re not actually going to use this power to exclude people, what do you use it for?  This is where the coercion comes in. You can use the threat of exclusion to bully people into doing things.  And the easiest way to do that is simply to make doing those things the criteria for inclusion.

So here’s what I think happened: Ed Powell got tired of going to conventions and not having anyone to talk about novelizations and animated series with.  All they wanted to talk about was the movies (I can’t imagine why!). So how does Powell get people to read these books? He changes the criteria for what counts as an Actual Star Wars fan.  Now they have to read them, or watch the series, if they want to be Actual Star Wars fans.

Now as far as I can tell, Ed Powell is just some guy on Twitter, and has no authority to exclude anyone from any fan club.  And he seems to be getting owned by everyone. I doubt that his shaming will have an effect on the general population of Star Wars fans.  It may serve as advertising to encourage people who have read these books and watched the animated series to talk with him about them. If it doesn’t turn them off too.

Flu. What is Thisby? A wandring knight? Quin. It is the Lady, that Pyramus must loue. Fl. Nay faith: let not me play a womã: I haue a beard cõ-(ming. Quin. Thats all one: you shall play it in a Maske: and you may speake as small as you will. Bott. And I may hide my face, let me play Thisby to: Ile speake in a monstrous little voice; Thisne, Thisne, ah Py-, ramus my louer deare, thy Thysby deare, & Lady deare. Qu. No, no: you must play Pyramus: & Flute, you Thysby.

The History of English through SparkNotes

Language change has been the focus of my research for over twenty years now, so when I taught second semester linguistics at Saint John’s University, I was very much looking forward to teaching a unit focused on change.  I had been working to replace constructed examples with real data, so I took a tip from my natural language processing colleague Dr. Wei Xu and turned to SparkNotes.

I first encountered SparkNotes when I was teaching French Language and Culture, and I assigned all of my students to write a book report on a work of French literature, or a book about French language or culture.  I don’t remember the details, but at times I had reason to suspect that one or another of my students was copying summary or commentary information about their chosen book from SparkNotes rather than writing their own.

When I was in high school, my classmates would make use of similar information for their book reports.  The rule was that you could consult the Cliffs Notes for help understanding the text, but you weren’t allowed to simply copy the Cliffs Notes.

Modern Text

FLUTE
Who’s Thisbe? A knight on a quest?

QUINCE
Thisbe is the lady Pyramus is in love with.

FLUTE
No, come on, don’t make me play a woman. I’m growing a beard.

QUINCE
That doesn’t matter. You’ll wear a mask, and you can make your voice as high as you want to.

BOTTOM
In that case, if I can wear a mask, let me play Thisbe too! I’ll be Pyramus first: “Thisne, Thisne!”—And then in falsetto: “Ah, Pyramus, my dear lover! I’m your dear Thisbe, your dear lady!”

QUINCE
No, no. Bottom, you’re Pyramus.—And Flute, you’re Thisbe.

When I discovered SparkNotes I noticed that for some older authors – Shakespeare, of course, but even Dickens – it offered not only summaries and commentary but also translations of the text into contemporary English.  It was this feature I drew on for the unit on language change.

While I was developing and teaching this second semester intro linguistics course at Saint John’s, I was also working as a linguistic annotator for an information extraction project in the NYU Computer Science Department.  I met a doctoral student, Wei Xu, who was studying a number of interesting corpora, including Twitter, hip-hop and SparkNotes. Wei graduated in 2014, and is now Assistant Professor of Computer Science and Engineering at Ohio State.

Wei had realized that the modern translations on SparkNotes and eNotes, combined with the original Shakespearean text, formed a parallel corpus, a collection of texts in one language variety that are paired with translations in another language variety.  Parallel corpora, like the Canadian Hansard Corpus of French and English parliamentary debates, are used in translation studies, including for training machine translation software. Wei used the SparkNotes/eNotes parallel Shakespeare corpus to generate Shakespearean-style paraphrases of contemporary movie lines, among other things.

When it came time to teach the unit on language change at Saint John’s, I found a few small exercises that asked students to compare older literary excerpts with modern translations.  Given the constraints of this being one unit in a survey course, it made sense to focus on the language of instruction, English. The Language Files had one such exercise featuring a short Chaucer passage.  In general, when working with corpora I prefer to look at larger segments, ideally an entire text but at minimum a full page.

I realized that I could cover all the major areas of language change – phonological, morphological, syntactic, semantic and pragmatic – with these texts.  Linguists have been able to identify phonological changes from changes in spelling, for example that Chaucer’s spelling of “when” as “whan” indicates that we typically put our tongues in a higher place in our mouths when pronouncing the vowel of that word than people did in the fourteenth century.

When teaching Shakespeare to college students it is common to use texts with standardized spelling, but we now have access to scans of Shakespeare’s work as it was first published in his lifetime or shortly after his death, with the spellings chosen by the original printers.  This spelling modernization is even practiced with some nineteenth century authors, and similarly we have access to the first editions of most works through digitization projects like Google Books.

With this in mind, I created exercises to explore language change.  For a second semester intro course the students learned a lot from a simple scavenger hunt: compare a passage from the SparkNotes translation of Shakespeare with the Quarto, find five differences, and specify whether they are phonological, morphological, syntactic, semantic or pragmatic.  In more advanced courses students could compare differences more systematically.
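If you want to prepare such an exercise programmatically, a few lines of Python will surface candidate differences for you. Here’s a minimal sketch using the standard library’s difflib; the two strings are shortened and lightly normalized from the Quarto and modern passages quoted above.

```python
# Sketch: lining up an early modern passage with a modern translation using
# difflib, as a starting point for a scavenger-hunt exercise on language change.
import difflib

quarto = "Nay faith: let not me play a woman: I haue a beard comming."
modern = "No, come on, don't make me play a woman. I'm growing a beard."

q_words, m_words = quarto.split(), modern.split()
matcher = difflib.SequenceMatcher(None, q_words, m_words)

# Print only the stretches where the two versions diverge.
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(f"{tag:8} {' '.join(q_words[i1:i2])!r} -> {' '.join(m_words[j1:j2])!r}")
```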

This comparison is the kind of thing that we always do when we read an old text: compare older spellings and wordings with the forms we would expect from a more modern text.  Wei Xu showed us that the translations and spelling changes in SparkNotes and eNotes can be used for a more explicit comparison, because they are written down based on the translators’ and editors’ understanding of what modern students will find difficult to read.

As I have detailed in my forthcoming book, Building a Representative Theater Corpus, we must be careful not to generalize universal statements, including statements about prevalence, from these texts to the language as a whole.  This is especially problematic when we are looking at authors who appealed to elite audiences, but it applies to Shakespeare and Dickens as well.  Existential observations, such as that Shakespeare used bare not (“let me not”) in one instance where SparkNotes used do-support (“don’t let me”), are much safer.

My students seemed to learn a lot from this technique.  I hope some of you find it useful in your classrooms!

What is “text” for a sign language?

I started writing this post back in August, and I hurried it a little because of a Limping Chicken article guest written by researchers at the Deafness, Cognition and Language Research Centre at University College London. I’ve known the DCAL folks for years, and they graciously acknowledged some of my previous writings on this issue. I know they don’t think the textual form of British Sign Language is written English, so I was surprised that they used the term “sign-to-text” in the title of their article and in a tweet announcing the article. After I brought it up, Dr. Kearsy Cormier acknowledged that there was potential for confusion in that term.

So, what does “sign-to-text” mean, and why do I find it problematic in this context? “Sign-to-text” is an analogy with “speech-to-text,” also known as speech recognition, the technology that enables dictation software like Dragon NaturallySpeaking. Speech recognition is also used by agents like Siri to interpret words we say so that they can act on them.

There are other computer technologies that rely on the concept of text. Speech synthesis is also known as text-to-speech. It’s the technology that enables a computer to read a text aloud. It can also be used by agents like Siri and Alexa to produce sounds we understand as words. Machine translation is another one: it typically proceeds from text in one language to text in another language. When the DCAL researchers wrote “sign-to-text” they meant a sign recognition system hooked up to a BSL-to-English machine translation system.

Years ago I became interested in the possibility of applying these technologies to sign languages, and created a prototype sign synthesis system, SignSynth, and an experimental English-to-American Sign Language system.

I realized that all these technologies make heavy use of text. If we want automated audiobooks or virtual assistants or machine translation with sign languages, we need some kind of text, or we need to figure out a new way of accomplishing these things without text. So what does text mean for a sign language?

One big thing I discovered when working on SignSynth is that (unlike the DCAL researchers) many people really think that the written form of ASL (or BSL) is written English. On one level that makes a certain sense, because when we train ASL signers for literacy we typically teach them to read and write English. On another level, it’s completely nuts if you know anything about sign languages. The syntax of ASL is completely different from that of English, and in some ways resembles Mandarin Chinese or Swahili more than English.

It’s bad enough that we have speakers of languages like Moroccan Arabic and Fujianese who have to write in a related language (written Arabic and written Chinese, respectively) that is different in non-trivial ways that take years of schooling to master. ASL and English are so totally different that it’s like writing Korean or Japanese with Chinese characters. People actually did this for centuries until someone smart invented hangul and katakana, which enabled huge jumps in literacy.

There are real costs to this, serious costs. I spent some time volunteering with Deaf and hard-of-hearing fifth graders in an elementary school, and after years of drills they were able to put English words on paper and pronounce them when they saw them. But it became clear to me that despite their obvious intelligence and curiosity, they had no idea that they could use words on paper to send a message, or that some of the words they saw might have a message for them.

There are a number of Deaf people who are able to master English early on. But from extensive reading and discussions with Deaf people, it is clear to me that these kids’ experience is typical of that of the vast majority of Deaf people.

It is a tremendous injustice to a child, and a tremendous waste of that child’s time and attention, for them to get to the age of twelve, at normal intelligence, without being able to use writing. This is the result of portraying English as the written form of ASL or BSL.

So what is the written form of ASL? Simply put, it doesn’t have one, despite several writing systems that have been invented, and it won’t have one until Deaf people adopt one. There will be no sign-to-text until signers have text, in their language.

I can say more about that, but I’ll leave it for another post.

Theories are tools for communication

I’ve written in the past about instrumentalism, the scientific practice of treating theories as tools that can be evaluated by their usefulness, rather than as claims that can be evaluated as true or false. If you haven’t tried this way of looking at science, I highly recommend it! But if theories are tools, what are they used for? What makes a theory more or less useful?

The process of science starts when someone makes an observation about the world. If we don’t understand the observation, we need to explore more, make more observations. We make hypotheses and test them, trying to get to a general principle that we can apply to a whole range of situations. We may then look for ways to apply this principle to our interactions with the world.

At every step of this process there is communication. The person who makes the initial observation, the people who make the further observations, who make the hypotheses, who test them, who generalize the findings, who apply them: these are usually multiple people. They need to communicate all these things (observations, hypotheses, applications) to each other. Even if it’s one single person who does it all end to end, that person needs to communicate with their past and future selves, in the form of notes or even just thinking aloud.

These observations, hypotheses and applications are always new, because that’s what science is for: processing new information. It’s hard to deal with new information, to integrate it with the systems we already have for dealing with the world. What helps us in this regard are finding similarities between the new information and things we already know about the world. Once we find those similarities, we need to record this for our own reference and to signal it to others: other researchers, technologists and the rest of the population.

In informal settings, we already have ways of finding and communicating similarities between different observations. We use similes and metaphors: a person’s eyes may be blue like the sky, not blue like police lights. These are not just idle observations, though: the similarities often have implications for how we respond to things. If someone is leaving a job and they say that they’re passing the baton to a new person, they are signaling a similarity between their job and a relay race, and the suggestion is that the new person will be expected to continue towards the same goal the way a relay runner continues along the racecourse.

Theories and models are just formalized versions of metaphors: saying that light is a wave is a way of noting that it can move through the air like a wave moves through water. That theory allowed scientists to predict that light would diffract around objects the way water waves bend around objects they encounter, a testable hypothesis that has been confirmed. This in turn allowed technologists to design lasers and other devices that took advantage of those wavelike properties, applications that have proven useful.

Here’s a metaphor that will hopefully help you understand how theories are communication tools: another communication tool is a photograph. Sometimes I see a photograph of myself and I notice that I’ve recently lost weight. Let’s say that I have been cutting back on snacks and I see a photo like that. I have other tools for discovering that I’ve lost weight, like scales and measuring tape and what I can observe of my body with my own eyes, but seeing a photo can communicate it to me in a different way and suggest that if I continue cutting back on snacks I will continue to lose weight. Similarly, if I post that photo on Facebook my friends can see that I’ve lost weight and understand that I’m going to continue to cut back on snacks.

A theory is like a photograph in that there is no single best photograph. To communicate my weight loss I would want a photo that shows my full body, but to communicate my feelings about it, a close-up on my face might be more appropriate. Friends of mine who get new tattoos on their legs will take close-ups of the tattoos. We may have six different photos of the exact same thing (full body, face or leg, for example), and be satisfied with them all. Theories are similar: they depend entirely on the purpose of communication.

A theory is like a photograph in that the best level of detail depends on what is being communicated and who the target is. If a friend takes a close-up of four square inches of their calf, that may be enough to show off their new tattoo, but a close-up of four square inches of my calf will probably not tell me or anyone else how much weight I’ve lost. Similarly, if I get someone to take an aerial photograph of me, that may indicate where I am at the time, but it will not communicate much about my weight. This applies to theories: a model with too much detail will simply swamp the researchers, and one with too little will not convey anything coherent about the topic.

A theory is like a photograph in that its effectiveness depends on who is on the other end of the communication. If someone who doesn’t know me sees that picture, they will have no idea how much I weighed before, or that my weight has been affecting my health. They will just see a person, and interpret it in whatever way they can.

A photograph may not be the best way to communicate my weight loss to my doctor. Their methods depend on measurable benchmarks, and they would prefer to see actual measurements made with scales or tape. On the other hand, a photo is a better way to communicate my weight loss to my Facebook friends than posting scale and tape measurements on Facebook, because they (or some of them at least) are more concerned with the overall way I look.

A theory’s effectiveness similarly depends on its audience. Population researchers may be familiar with the theories of Alfred Lotka and Vito Volterra, so if I tell them that ne…pas in French follows a Lotka-Volterra model, they are likely to understand. Chemists have probably never heard of Lotka or Volterra, so if I tell them the same thing I’m likely to get a blank stare.
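For readers who haven’t run into it, here’s a minimal sketch of the classic Lotka-Volterra predator-prey system, just to show the kind of model being invoked. The parameter values are arbitrary, and how the model maps onto competing negation constructions is beyond this sketch.

```python
# Sketch: the classic Lotka-Volterra predator-prey system, shown only to
# illustrate the kind of model named above. Parameters are arbitrary and
# not fitted to any linguistic data.
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, state, alpha, beta, delta, gamma):
    x, y = state                      # two interacting populations
    dx = alpha * x - beta * x * y     # x grows, reduced by interaction with y
    dy = delta * x * y - gamma * y    # y grows from interaction, decays otherwise
    return [dx, dy]

solution = solve_ivp(
    lotka_volterra, (0, 50), [0.9, 0.1],
    args=(0.6, 1.0, 0.8, 0.4), dense_output=True,
)
t = np.linspace(0, 50, 200)
x, y = solution.sol(t)
print("final values:", x[-1], y[-1])
```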

This means that there is no absolute standard for comparing theories. We are never going to find the best theory. We may be able to compare theories for a particular purpose, with a particular level of detail, aimed at a particular audience, but even then there may be several theories that work about as well.

When I tell people about this instrumental approach to scientific theories and models, some of them get anxious. If there’s no way for theories to be true or false, how can we ever have a complete picture of the universe? The answer is that we can’t. Kurt Gödel showed decades ago with his Incompleteness Theorem that no theory or model can ever completely capture reality, not even a mathematical or computer model. Jorge Luis Borges illustrated it with his story of the map that is the same size as the territory.

Science is not about finding out everything. It’s not about getting a complete picture. That’s because reality is too big and complex for our understanding, or for the formal systems that our computers are based on. It’s just about figuring out more than we knew before. It will never be finished. And that’s okay.

The Corpus de la scène parisienne

It is the year 1810, and you are strolling along the Grands Boulevards of Paris. You get the impression that the whole city, perhaps even all of France, has had the same idea and has come out to stroll, to see people and to be seen. What do you hear?

You arrive at a theater, show a ticket for a new play, and go in. The play begins. What do you hear from the stage? What voices, what language?

The Corpus de la scène parisienne project seeks to answer this second question, with the idea that doing so will also tell us something about the first. It draws on the work of the scholar Beaumont Wicks and on resources like Google Books and the Bibliothèque Nationale de France’s Gallica project to create a truly representative corpus of the language of the Parisian stage.

Some corpora are built on a “principle of authority,” which tends to put the voices of aristocrats and the upper bourgeoisie in the foreground. The Corpus de la scène parisienne corrects this bias by relying on a randomly drawn sample. By incorporating popular theater in this way, the corpus allows the language of the working classes, as it was represented on stage, to take its place in the linguistic picture of the period.

The first phase of construction, covering the years 1800 to 1815, has already contributed to some interesting findings. For example, in the CSP 75% of sentential negations use the construction ne … pas, but in the four plays from the same period included in the FRANTEXT corpus, ne … pas appears in only 49% of sentential negations.
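As a rough illustration of how a figure like that can be computed, here’s a sketch that counts ne … pas against other ne-based negations in a plain-text play. The file name is a placeholder and the regular expression is a crude heuristic, not the corpus’s actual methodology, so real counts need proper tokenization and manual checking.

```python
# Sketch: a very rough count of ne ... pas versus other ne-based negations
# in a plain-text play. The file name is a placeholder and the regex is a
# crude heuristic, not the methodology used for the CSP itself.
import re

with open("some_play.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Grab each stretch that begins a negation with "ne" or elided "n'".
ne_clauses = re.findall(r"\b(?:ne\s+|n[’'])\w[^.!?;]*", text)
with_pas = [clause for clause in ne_clauses if re.search(r"\bpas\b", clause)]

total = len(ne_clauses)
share = 100 * len(with_pas) / total if total else 0
print(f"ne ... pas: {len(with_pas)} of {total} negations ({share:.0f}%)")
```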

In 2016 I created a repository on GitHub and began putting the first-phase texts there in HTML format. You can read them for fun (Jocrisse-Maître et Jocrisse-Valet particularly amused me), put them on stage (I’ll buy tickets), or use them for your own research. You might also want to contribute to the repository by correcting errors in the texts, adding new texts from the catalog, or converting the texts to new formats like TEI or Markdown.

In January 2018 I created the spectacles_xix bot on Twitter. Every day it posts the descriptions of the plays that premiered on that date exactly two hundred years earlier.

Please feel free to use this corpus in your research, but don’t forget to cite me, or even contact me to discuss possible collaborations!

Deaf scholar Ben Bahan gives a lecture about Deaf architecture

Teaching sign linguistics in introductory classes

Language is not just spoken and written, and even though I’ve been working mostly on spoken languages for the past fifteen years, my understanding of language has been tremendously deepened by my study of sign languages. At the beginning of the semester I always asked my students what languages they had studied and what aspects of language they wanted to know more about, and they were always very interested in sign language. Since they had a professor with training and experience in sign linguistics it seemed natural to spend some time on it in class.

Our primary textbook, by George Yule, contains a decent brief overview of sign languages. The Language Files integrates sign language examples throughout and has a large section on sign phonetics. I added a lecture on the history of sign languages in Europe and North America, largely based on Lane, Hoffmeister and Bahan’s Journey Into the Deaf-World (1996), and other information I had learned over the years.

I also felt it was important for my students to actually observe a sign language being used to communicate and to express feeling, so I found an online video of an MIT lecture by psychologist and master storyteller (and co-author of Journey Into the Deaf-World) Ben Bahan. Bahan’s talk does not focus exclusively on language, but demonstrates the use of American Sign Language well, and the English interpretation is well done.

Studying a video lecture is a prime candidate for “flipped classroom” techniques, but I never got around to trying that. We watched the video in class, but before starting the video I assigned my students a simple observation task: could they find examples of the four phonological subsystems of American Sign Language – lexical signs, fingerspelling, depicting signs and nonmanual gestures?

Some of the students were completely overwhelmed by the task at first, but I made it clear that this was not a graded assignment, only introductory exploration. Other students had had a semester or more of ASL coursework, and the students with less experience were able to learn from them. Bahan, being Ben Bahan, produces many witty, thought-provoking examples of all four subsystems over the course of the lecture.

The phonological subsystems are among the easiest sign language phenomena for a novice to distinguish, but as we watched the video I pointed out other common features of ASL and other sign languages, such as topic-comment structures and stance-shifting.

Later, when I started teaching Introduction to Phonology, we had the opportunity to get deeper into sign language phonology. I’ll cover that in a future post.

Indistinguishable from magic

You might be familiar with Arthur C. Clarke’s Third Law, “Any sufficiently advanced technology is indistinguishable from magic.” Clarke tucked this away in a footnote without explanation, but it fits in with the discussion of magic in Chapter III of James Frazer’s magnum opus The Golden Bough. These two works have shaped a lot of my thoughts about science, technology and the way we interact with our world.

Frazer lays out two broad categories of magic, homeopathic magic and contagious magic. Homeopathic magic follows the Law of Similarity, and involves things like creating effigies of people in order to hurt them, and keeping red birds to cure fever. Contagious magic follows the Law of Contact, and involves things like throwing a child’s umbilical cord into water to improve the child’s swimming abilities later in life, or a young woman planting a marigold into dirt taken from a man’s footprint to help his love for her grow.

Frazer is careful to observe that the Laws of Similarity and Contact are widespread cognitive patterns that people use to understand their environments. In semantics we know them as the foundation of the processes of metaphor and metonymy, respectively. He notes that sympathetic magic’s “fundamental conception is identical with that of modern science: underlying the whole system is a faith, implicit but real and firm, in the order and uniformity of nature.”

In this both science and magic stand in contrast to religion: “if religion involves, first, a belief in superhuman beings who rule the world, and second, an attempt to win their favour, it clearly assumes that the course of nature is to some extent elastic or variable, and that we can persuade or induce the mighty beings who control it to deflect, for our benefit, the current of events from the channel in which they would otherwise flow.” After this Frazer engages in some sloppy thinking, concluding that because religion seems to have arisen after magic it must be an improvement over what the “savages” do. He also fails to complete the fourth quadrant of his taxonomy: that as science is to magic, social sciences are to religion.

The key difference between magic and science (and between religion and social science) is the element of faith. The potion brewer doesn’t check to see that there is a logical explanation for the inclusion of certain ingredients. If the potion fails, she must have gotten impure ingredients, or misread the incantation. A scientist looks for explanations as to why a medicine works when it works, and why it fails when it fails.

Some of you may be thinking that Clarke’s quote was about technology, not science. I first learned of technology as “applied science,” which should mean that it’s no more faith-based than science itself. In practice, it is not possible to understand every tool we use. In fact, it’s not even possible for a human to completely understand a single tool, in all its complexity.

My stepfather was a carpenter. When I was first taught to hammer a nail, I started out by picking the hammer up and putting it down on the nail, vertically. I had to be shown how to swing the hammer to take advantage of the angular momentum of the hammer head. It took another layer of learning to know that I could swing from my wrist, elbow or shoulder to customize the force of the hammer blow to the task at hand, and then another to get a sense of the various types of hammers available, not to mention the various types of nails. In a home improvement project several years ago I discovered that, as electric screwdrivers have gotten smaller and lighter, practices have changed and people use screws in situations where nails used to be more common.

My stepfather might at some point have explained to me why his hammer heads were steel and not iron, and the handles were hardwood and not softwood, metal or fiberglass, but his explanations did not go to the molecular level, much less the atomic or quantum levels. To be honest, all I needed to know was “steel is solid, heavy and doesn’t rust” and “hardwood is solid but absorbs some of the impact.” The chance that the molecular or subatomic structure of the hammers would affect our work beyond that was so small that it wasn’t worth spending time on.

At the beginning I didn’t even need to know that much. All I needed to know was that my stepfather had handed me this hammer and these nails, and told me to nail those two boards together at that angle. I had to trust his expertise. As I began to get comfortable, I started asking him questions and trying things slightly different ways. Eventually people get to the point of saying, “Why not a fiberglass handle?” and even “Why not an electric screwdriver?” But at first it’s 99 percent faith-based.

That’s how the technology of hammers and nails and houses works, but the same principles apply to technologies that many people take for granted, like pencils (we know how to sharpen them, but how many of us know how to mine graphite?) and clothing (some of us can darn a sock, and some of us can knit a scarf, but how many of us have even seen any of the machines that produce shoelaces, or Spanx?). We take it on faith that the pencils will write like they’re supposed to, and that socks will keep our feet warm.

This, then, is what Clarke meant when he talked about technology being indistinguishable from magic. Yes, Sprague de Camp portrayed ancient Romans mistaking explosives for magic in his 1939 novel Lest Darkness Fall (which explicitly invokes the sympathetic and contagious forms of magic described by Frazer). And the magically moving photographs described by J.K. Rowling in Harry Potter and the Philosopher’s Stone have become real technology just twenty years later, omnipresent in the UK and the United States.

But beyond the simple resemblance between technology and magic, if someone is not inclined to be critical or scientific, their relationship to technology is functionally the same as it would be to magic. If the technology is sufficiently advanced, people can do the same things they’ve always done. They don’t need to “get under the hood” (now there’s an example of non-magical technology!) because it seems to work most of the time.

On the other hand, our faith is not blind. I had faith in my stepfather to teach me carpentry because my mother and I had lived with him and trusted him, and seen his work. I also learned to have faith in cars to get me places safely, but as I learned more about kinematics and human attention, and as I was confronted with more evidence of the dangers of this technology, I realized that my faith was misplaced and revised my habits.

Our faith in these technologies is based on a web of trust: I trusted my stepfather when he told me that if I hit the nails this way they would securely fasten the pieces of this house together and if properly maintained, it wouldn’t fall down on us. He in turn trusted his training from other carpenters and recommendations from other professionals in related fields, which were then corroborated, revised and extended by his experiences.

I want to stress here that these methods were also supported by scientific studies of materials and manufacturing. Over the millennia, carpenters, architects and other craftspeople have tried using different materials, different structures, different techniques. Some worked better, some didn’t work so well. They’ve taken the materials apart to figure out what makes them strong in some ways and flexible in other ways. This is an ongoing process: vinyl siding may have seemed like a good idea at the time, but it can pollute if burned or discarded.

That is how you tell the difference between technology and magic: every aspect of the technology is open to question and revision. With magic, you may be able to try new things or test the existing methods, but beyond a certain point there is no more trying or testing, there is only faith.