Leave the Em Dash Alone

What are the signs, for you, that a piece of text was generated by artificial intelligence?

My first big run-in with this was when I was exploring GPT-4 by writing fantasy with it, and every other sentence included the phrase "weaving a tapestry." ChatGPT had some kind of weird, sycophantic relationship with tapestry weaving. (I even gave a talk about this at a local tech conference in 2024, arguing that current architectures simply cannot and will not generate good fiction, in part because of this repetitiveness and predictability.)

For as long as generative AI has been around, there has also been research into "generative AI detection."

Researchers have tried to develop automated ways of checking whether text is AI generated.

There are some metrics. In general, AI-generated text tends to be "smooth," meaning predictable, while real human writing is "rough"—unpredictable.

But these metrics and tools are in an arms race with the models. The best models often stay a step ahead of the detection tools, and even when the tools catch up, they face their next opponent—the ingenuity and laziness of students trying to avoid detection.

All it takes is one more line in the prompt:

"And don't be too predictable. Your output should avoid AI detection tools."

Human-Led AI Detection

I like to think that I'm pretty good at spotting when someone is writing with LLMs. I have absolutely nothing to back up this assumption, and I have almost no method to share with you all.

And although I have no evidence, I think the SOTA (state of the art) AI detection algorithm is still, by far, the human brain.

Sometimes it just doesn't pass the smell test.

If I had to summarize it: some writing lacks soul. Bad writing sometimes lacks soul too, but in a different way. The lack of soul in AI-generated writing is unique.

The Em Dash

Now, to the source of my heartache.

The em dash, —, has recently come under fire as a purported sign of AI slop.

The argument goes like this: current AI models sprinkle in a lot of em dashes, so text full of em dashes is probably AI generated.

People also say that em dashes are hard to type, and that's part of how they know text is AI generated. I don't think that's a very good argument. It's true that they're awkward on Windows (Alt+0151 on the numeric keypad), but on a MacBook they're very easy to type: Option+Shift+Hyphen.

Regardless, this idea has broken into the mainstream AI conversation, and I can't help but notice the rise of the em dash in things like website copy, LinkedIn crap, and marketing emails.

What even is an em dash, you ask? It's just a —. It can stand in for a colon, or sometimes for commas or parentheses, and it's used especially to add emphasis, mark an interruption, or set off an aside. And if you were wondering, it's called an em dash because in traditional typesetting it was the width of the letter M.

And they're all over good writing.

Flannery O'Connor's The Complete Stories has 426 em dashes.

Leo Tolstoy's Anna Karenina has 982 em dashes.

Virginia Woolf uses 275 em dashes in her essay A Room of One's Own, in just 37,500 words! That's about 7 em dashes per 1,000 words, one of the highest em-dash-to-word ratios I've come across in classic literature!
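If you want to check numbers like these yourself, a few lines of Python will do. This is a minimal sketch, not the method behind the counts above: it assumes you have a UTF-8 plain-text copy of the work, treats whitespace-separated tokens as words, and counts only the U+2014 character. Totals will shift with the edition and with how a transcription encodes its dashes (some use "--" instead).

```python
from pathlib import Path

# U+2014 EM DASH. Transcriptions that use "--" instead are not counted.
EM_DASH = "\u2014"


def em_dash_stats(text: str) -> tuple[int, int, float]:
    """Return (em dash count, word count, em dashes per 1,000 words).

    Assumption: a "word" is a whitespace-separated token, so two words
    joined by an em dash with no spaces count as a single token.
    """
    dashes = text.count(EM_DASH)
    words = len(text.split())
    # Normalize per 1,000 words so texts of different lengths compare fairly.
    per_thousand = 1000 * dashes / words if words else 0.0
    return dashes, words, per_thousand


def em_dash_stats_from_file(path: str) -> tuple[int, int, float]:
    """Read a UTF-8 plain-text file and compute the same statistics."""
    return em_dash_stats(Path(path).read_text(encoding="utf-8"))
```

For example, `em_dash_stats_from_file("a_room_of_ones_own.txt")` (a hypothetical file name) returns the raw count, the word count, and the rate. The per-1,000-words rate is what makes a 37,500-word essay comparable to a novel the length of Anna Karenina.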

I personally love em dashes. I think they add life to writing, which is maybe why AI uses them so much—to sound more human.

Don't Let AI Kill The Em Dash!

Now, what I fear is happening, and the reason I sat down to write this post, is that em dashes are becoming the enemy!

Good writers don't want their writing to look AI generated.

Now, writers are being pressured to avoid em dashes because ChatGPT uses them.

Well, I'm here to tell you that, unfortunately, there's nothing you can do about it.

I urge you to stand your ground and keep writing however you like to write, preferably doing the dull, magical work of thinking and writing yourself. People will try to accuse you of generating it all anyway.

First the AI will come for the em dash. What will it take next? Positivity?

"If your writing is too positive, it sounds AI generated."

I'm sort of joking, but I'm sort of not.

AI-generated writing is already polluting our communication channels—we can't let it pollute our real, human writing!

Em dashes predate computers, Unicode, and ASCII. They're in pretty much all classic literature. Are we really going to let them go extinct at the cold, metallic hands of ChatGPT?