It’s well known that some languages have multiple national standards, to the point where you can take courses in either Brazilian or European Portuguese, for example. Most language instruction services seem to choose one variety per language: when I studied Portuguese at the University of Paris X-Nanterre it was the European variety, but the online service Duolingo only offers the Brazilian one.
I looked into some of Duolingo’s offerings for this post, because they’re the most talked-about language instruction service these days. I was surprised to discover that they use no recordings of human speakers; all their speech samples are synthesized using an Amazon speech synthesis service named Polly. Interestingly, even though Duolingo only offers one variety of each language, Amazon Polly offers multiple varieties of English, Spanish, Portuguese and French.
As an aside, when I first tried Duolingo years ago I had the thought, “Wait, is this synthesized?” but it just seemed too outrageous to think that someone would make a business out of teaching humans to talk like statistical models of corpus speech. It turns out it wasn’t too outrageous, and I’m still thinking through the implications of that.
Synthesized or not, it makes sense for a company with finite resources to focus on one variety. But if that one company controls a commanding market share, or if there’s a significant amount of collusion or groupthink among language instruction services, they can wind up shutting out whole swathes of the world, even while claiming to be inclusive.
This is one of the reasons I created an open LanguageLab platform: to make it easier for people to build their own exercises and lessons, focusing on any variety they choose. You can set up your own LanguageLab server with exercises exclusively based on recordings of the English spoken on Smith Island, Maryland (population 149), if you like.
So what about excluded varieties with a few more speakers? I made a table of all the Duolingo language offerings, ranked by the number of English speakers learning them, along with the Amazon Polly dialect that is used on Duolingo. If the variety is only vaguely specified, I made a guess.
For each of these languages I picked another variety, one with a large number of speakers. I tried to find the variety with the largest number of speakers, but these counts are always very imprecise. The result is an imagined alternate language service, one that does not automatically privilege the speakers of the most influential variety. Here are the top ten:
| Language | Duolingo dialect | Alternate dialect |
To show what could be done with a little volunteer work, I created a sample lesson for a language that I know, the third-most popular language on Duolingo, French. After France, the country with the next largest number of French speakers is Canada. Canadian French is distinct in pronunciation, vocabulary and to some degree grammar.
Canadian French is stigmatized outside Canada, to the point where I’m not aware of any program in the US that teaches it, but it is omnipresent in all forms of media in Canada, and there is quite a bit of local pride. These days at least, it would be as odd for a Canadian to speak French like a Parisian as for an American to speak English like a Londoner. There are upper- and lower-class accents, but they all share certain features, notably the ranges of the nasal vowels.
I chose a bestselling author and television anchor, Michel Jean, who has one grandmother from the indigenous Innu people and three other grandparents presumably descended from white French settlers. I took a small excerpt from an interview with Jean about his latest novel, in which he responds spontaneously to the questions of a librarian, Josianne Binette.
The sample lesson in Canadian French based on Michel Jean’s speech is available on the LanguageLab demo site. You are welcome to try it! Just log in with the username demo and the password LanguageLab.