Screenshot of the "Compose new Tweet" modal on Twitter, with the "+" button and a tooltip reading "Add another Tweet". The tweet text reads "blah blah blah bl"

Dialogue and monologue in social media

I wrote most of this post in June 2022, before a lot of us decided to try out Mastodon. I didn’t publish it because I despaired of it making a difference. It felt like so many people were set in particular practices, including not reading blog posts! My experience on Mastodon has been so much better than the past several years on Twitter. I think this is connected with how Twitter and Mastodon handle threads.

A few years ago I wrote a critique of Twitter threads, tweetstorms, essays, and similar forms. I realize now that I didn't actually talk much about what's wrong with them. I focused on how difficult they are to read, but I didn't realize how much the native Twitter website and app actually make them easier to read. So let me tell you about some of the deeper problems with threads.

In 2001 I visited some of the computational linguistics labs at Carnegie Mellon University. Unfortunately I don’t remember the researchers’ names, but they described a set of experiments that has informed my thinking about language ever since. They were looking at the size of the input box in a communication app.

These researchers did experiments where they asked people to communicate with each other using a custom application. They presented different users with input boxes of different sizes: some got only a single line, others got three or four, and maybe some got six or eight lines.

What they found was that when someone was presented with a large blank space, as in an email application or the Google Docs application I’m writing this in, they tended to take their time and write long blocks of text, and edit them until they were satisfied. Only then did they hit send. Then the other user would do the same.

When the Carnegie Mellon researchers presented users with only one line, as in a text message app, their behavior was much different. They wrote short messages and sent them off with minimal editing. The short turnaround time resulted in a dialogue that was much closer to the rhythm of spoken conversation.

This echoed my own findings from a few years before. I was searching for features of French that I heard all over the streets of Paris but that I had not been taught in school, in particular what linguists call right dislocation ("Ils sont fous, ces Romains") and left dislocation ("L'état, c'est moi").

In 1998 the easiest place to look was USENET newsgroups, and I found that even casual newsgroups like fr.rec.animaux were heavy on the formal, carefully crafted types of messages I remembered from high school French class. I had already read some prior research on this kind of language variation, so I decided to try something with faster dialogue.

In Internet Relay Chat (IRC) I hit the jackpot. On the IRC channel, left and right dislocations made up between 21% and 38% of all finite clauses. I noticed other features of conversational French like ne-dropping were common as well. I could even see IRC newbies adapting in real time: they would start off trying to write formal sentences the way they were taught in lycée, and soon give up and start writing the way they talked.

At this point I have to say: I love dialogue. Don’t get me wrong: I can get into a nice well-crafted monologue or monograph. And anyone who knows me knows I enjoy telling a good story or tearing off on a rant about something. But dialogue keeps me honest, and it keeps other people honest too.

Dialogue is not inherently or automatically good. On Twitter as in many other places, it is used to harass and intimidate. But when properly structured and regulated it can be a democratizing force. It’s important to remember how long our media has been dominated by monologues: newspapers, films, television. Even when these formats contain dialogues, they are often fictional dialogues written by a single author or team of authors to send a single message.

One of my favorite things about the internet is that it has always favored dialogue. Before large numbers of people were on the internet there was a large gap between privileged media sources and independent ones. Those of us who disagreed with the monologues being thrust upon us by television and newspapers were often reduced to impotently talking back at those powerful media sources, in an empty room.

USENET, email newsletters, personal websites and blogs were democratizing forces because they allowed anyone who could afford the hosting fees (sometimes with the help of advertisers) to command these monologic platforms. They were the equivalent of Speakers’ Corner in London. They were like pamphlets or letters to the editor or cable access television, but they eliminated most of the barriers to entry. But they were focused on monologues.

In the 1990s and early 2000s we had formats that encouraged dialogue, like mailing lists and bulletin boards, but they had large input boxes. As I saw on fr.rec.animaux in 1998, that encouraged long, edited messages.

We did have forums with smaller input boxes, like IRC or the group chats on AOL Instant Messenger. As I found, those encouraged people to write short messages in dialogue with each other. When I first heard about Twitter with its 140-character limit I immediately recognized it as a dialogic forum.

But what sets Twitter apart from IRC or AOL Instant Messenger? Twitter is a broadcast platform. The fact that every tweet is public by default, searchable and assigned a unique URL, makes it a “microblog” site like some popular sites in China.

If someone said something on IRC or AIM in 1999 it was very hard to share it outside that channel. I was able to compile my corpus by creating a "bot" that logged on to the channel every night and saved a copy of all the messages. What Twitter and the sites that copied it, like Weibo, brought was the combination of permanent broadcast, low barrier to entry, and dialogue.

This is why I’m bothered by Twitter threads, by screenshots of text, by the unending demands for an edit button. These are all attempts to overpower the dialogue on Twitter, to remove one of the key elements that make it special.

Without the character limits, Twitter is just a blogging platform. Of course, there’s nothing wrong with blogs! I’ve done a lot of blogging, I’ve done a lot of commenting on blogs and I’ve tweeted a lot of links to blogs. But I want to choose when to follow those links and go read those blog posts or news articles or press releases.

I want a feed full of dialogue or short statements. Threads and screenshots interrupt the dialogue. They aggressively claim the floor, crowding out other tweets. Screenshots interrupt the other tweets with large blocks of text, demanding to be read in their entirety. Threads take up even more of the timeline. The Twitter web app will show as many as three tweets of a thread, interrupting the flow of dialogue.

The experience of threads is much worse on Twitter clients that don’t manipulate the timeline, like TweetDeck (which was bought by Twitter in 2011) and HootSuite. If it’s a long thread, your timeline is screwed, and you have to scroll endlessly to get past it.

One of the things I love the most about Mastodon is the standard practice of making the first toot in a thread public, but publishing all the other toots as unlisted. That broadcasts the toot announcing the thread, and then gives readers the agency to decide whether they want to read the follow-up toots. It’s more or less the equivalent of including a link to a web page or blog post in a toot.

There’s a lot more to say about dialogue and social media, but for now I’m hugely encouraged by the feeling of being on Mastodon, and I’m hoping it leads us in a better direction for dialogue, away from threads and screenshots.

WASHINGTON, DC - OCTOBER 20: Actress and model Paris Hilton speaks during a news conference outside the U.S. Capitol October 20, 2021 in Washington, DC. Congressional Democrats held a news conference with Hilton to discuss child abuse and legislation to establish a “bill of rights” to protect children placed in congregate care facilities. (Photo by Alex Wong/Getty Images)

Listen to the voices of the sexy babies

A few days ago, Byron Ahn drew our attention to an excerpt from a new, six-hour audiobook, Inside Voice by Lake Bell, credited as an “actress/writer/director/producer.” Bell is a friend of author and podcaster Malcolm Gladwell, and Gladwell agreed to serve as a kind of sounding board for Bell’s ideas about something she calls “sexy baby voice,” pointing to the voices of Paris Hilton and Kim Kardashian as paradigm examples of it. Gladwell, whose company is publishing Inside Voice, also published this excerpt as a free bonus episode of his podcast Revisionist History, which I listen to regularly, although I’m almost two years behind.

Bell argues for a few points: that what she calls "sexy baby voice" is a distinct speech style with specific audible features; that it is particularly inauthentic (she claims several times that it requires effort to speak that way, and describes a coaching technique for helping women to find their "true" voices); and that it makes the women who use it sound stupider than Bell knows them to be. She repeatedly assures us that she is not passing judgment, and then uses extremely judgmental language to describe "sexy baby voice," which I interpret as an application of "love the sinner, hate the sin."

Ahn posted a series of Twitter threads about the excerpt. He notes that it's problematic for Bell, a self-identified feminist, to criticize women this way, but he focuses on the terminology that she uses to describe the features of "sexy baby voice," particularly the word "pitch." He concludes, "we should encourage public figures talking about voices to consult linguists who have the training."

I’ve got a lot of thoughts and feelings about this excerpt and Bell’s idea of “sexy baby voice.” I could probably write several blog posts on the practical, cultural and social angles to this. For this post I’m going to keep with Ahn’s focus on what “sexy baby voice” is, phonetically. I sketched some of this out on Ahn’s Twitter thread, and I’ll synthesize and expand that here.

Bell says that the primary feature that defines “sexy baby voice” is “pitch,” and as linguists, we’re trained to interpret “pitch” as the fundamental frequency of the voice – essentially, the lowest pitch produced by the voice at any given time. I’ve been taking singing lessons, and all the singers and singing teachers I’ve talked to use “pitch” in the same way.

Ahn introduces his discussion of the “sexy baby voice” excerpt with a graph of the fundamental frequency of a segment of the recording – throughout the excerpt, Bell uses her own voice to demonstrate the “sexy baby voice” style, even though she says she does not use it in everyday conversation. In the graph he posts, the floor and ceiling of Bell’s fundamental frequency range are not particularly higher when she is using “sexy baby voice” than at other times.

Bell mentions two other factors: "vocal fry" (the linguistic term is "creaky voice") and "slurring" speech. Ahn speculates that she may be picking up on other factors as well, like "SoCal vowels" or laryngeal constriction. He also acknowledges that "pitch" may refer to other pitch-related features besides fundamental frequency range, such as "uptalk," a pattern of rising fundamental frequency at the ends of phrases. Gladwell uses the word "uptalk" when echoing Bell's explanations, but it's not clear that he's referring to phrase-final pitch rise.

So here's where I come in: my gender expression is fluid, so I've been studying differences in vocal quality. When I listen to the samples of "sexy baby voice" and … not-sexy-baby-voice (that's for another post!) that Bell gives in the chapter, both in recordings and in her own mimicry, I hear some creaky voice ("vocal fry"), but the main difference I hear is resonance.

This section is going to be a bit of a departure from my normal linguistics blogging, because I have not studied any of the literature on this. My understanding of it comes from practical training, so I don't know who to cite or credit for any of this besides my teachers, Kristy Bissell and Erin Carney. Of course, any inaccuracies are most likely due to my misunderstanding of what they've tried to teach me!

Resonance is about the pitch of speech, but it’s not about the fundamental frequency. It’s about everything else: the harmonics that result from the way the tones from our vocal folds echo around our bodies and are filtered through different parts of our vocal tracts and nasal passages. Just as plucking a string on an acoustic guitar produces overtones from the guitar body, whenever we arrange our vocal folds to talk or sing we produce overtones: higher pitched frequencies that can harmonize or clash with the fundamental frequency.

There are a ton of things you can do with resonance and it can get really complicated, so let’s focus on the primary resonance difference I’m hearing between Lake Bell’s “sexy baby voice” and the other examples. To me, the “sexy baby voice” examples sound brighter.

Bright and dark are useful terms to evoke the quality of resonance while distinguishing it from fundamental frequency. Bright sounds are ones where we hear more of the higher-pitched harmonics, while in dark sounds the lower harmonics dominate.
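To make the bright/dark distinction concrete, here is a minimal numerical sketch, assuming a deliberately simplified model rather than anything from my teachers: a voice at a fixed fundamental frequency produces harmonics at whole-number multiples of it, and a hypothetical "tilt" parameter controls how quickly those harmonics fade. The spectral centroid (the amplitude-weighted average frequency) is used here as one rough proxy for brightness; real vocal tracts shape the spectrum with resonances that are far more complicated than a single tilt.

```python
import math


def harmonic_amplitudes(f0, n_harmonics, tilt_db_per_octave):
    """Return (frequency, amplitude) pairs for harmonics k * f0.

    tilt_db_per_octave is a hypothetical simplification: how many decibels
    each harmonic loses per octave above the fundamental.
    """
    pairs = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        octaves_above_f0 = math.log2(k)
        amp_db = -tilt_db_per_octave * octaves_above_f0
        pairs.append((freq, 10 ** (amp_db / 20)))  # convert dB to linear amplitude
    return pairs


def spectral_centroid(pairs):
    """Amplitude-weighted mean frequency: higher means 'brighter'."""
    total = sum(amp for _, amp in pairs)
    return sum(freq * amp for freq, amp in pairs) / total


# Same fundamental frequency (same "pitch" in the linguist's sense) ...
bright = harmonic_amplitudes(220.0, 20, tilt_db_per_octave=6.0)
dark = harmonic_amplitudes(220.0, 20, tilt_db_per_octave=12.0)

# ... but the gentler tilt leaves more energy in the higher harmonics.
print(f"bright centroid: {spectral_centroid(bright):.0f} Hz")
print(f"dark centroid:   {spectral_centroid(dark):.0f} Hz")
```

Both voices start at the same 220 Hz fundamental, so a pitch tracker would put them on the same line in a graph like Ahn's, yet the one whose upper harmonics fade more slowly has a much higher centroid and would sound brighter.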

As I've learned from my teachers, and as Bell demonstrates, there's a lot we can do with our voices to shift the balance of harmonics towards bright or dark, but a substantial part of resonance comes from the structure of our bones, cartilage, muscles and fat. Higher-pitched harmonics tend to come from shorter vocal tracts, smaller nasal cavities, and in general, from smaller bodies. As a result, the voices of smaller people tend to sound brighter.

Testosterone during the teenage years also changes the configuration of our vocal tracts: thickening the vocal folds, making the larynx larger and shifting it lower in the throat. This is why men’s and trans women’s voices tend to sound darker than those of women, girls and prepubescent boys, even when singing the same pitch.

Bodies that see an increase in testosterone after puberty do not get larger or lower larynxes, but do tend to develop thicker vocal folds. This is why many trans men’s voices change, but often sound different from typical men’s voices. It is also, as Bell mentions, why women’s voices often change when they give birth or go through menopause.

As you might have guessed, this is where the “baby” in “sexy baby voice” comes from. Children are smaller than adults and tend to have brighter resonances. It’s also why Bell sees “sexy baby voice” as an exaggerated expression of femininity: women tend to be smaller than men and therefore have brighter voices. Women who haven’t given birth or gone through menopause tend to have brighter voices. Bright resonance suggests youth, femininity and immaturity.

As I mentioned above, there are several things that people can do, consciously or unconsciously, to shift their resonances, and I want to talk about them. I would also love to get into a discussion of the sociopolitical issues that Bell identifies around “sexy baby voice” and women’s voices in general. But this is already pretty long for a blog post, so I’ll save those for another time.