The Rise of the Em-Dash in Hacker News Comments

(boazsobrado.com)

40 points | by sobradob 16 hours ago

20 comments

  • Iuz 15 hours ago
    I don't comment much but I have read everything that Friedrich Nietzsche wrote, and because of him, have always used em-dashes in my writing. I think I even saw some memes in circles that discuss his work when people started realizing GPT used them a lot...
    • lamasery 40 minutes ago
      I got my heavy m-dash use from Salinger, many years ago. When I find some distinctive habit of an author I'm reading, and it's to my taste, I often rob them.
    • MikeTheGreat 14 hours ago
      genuine question: How could you tell they were em-dashes?

      Like, I could see some people noticing that the book they're reading has dashes that are a bit longer than normal, but what made you think "That must be its own thing, separate from a normal dash" as opposed to something like "In this font the dashes are very long"?

      • nagaiaida 8 hours ago
        well, hyphenation will most likely insert a lot of regular dashes for easy comparison to rule out "this font is blessed with uncommonly long dashes" and the differing uses of both en and em dashes will cluster along grammatical lines (with em dashes separating clauses and en dashes relating concepts or bounds) which ought to eventually make it clear even to someone who initially bins those together separate from hyphens.
      • Iuz 14 hours ago
        [dead]
    • bb88 15 hours ago
      I like the em-dash as well as it provides visual space more than just a "-" or a ";" or a ", and".
    • ranger_danger 13 hours ago
      It's always funny to see people arguing that em-dash use is indicative of LLM usage, yet they don't realize where that training came from in the first place.
      • palmotea 7 hours ago
        > It's always funny to see people arguing that em-dash use is indicative of LLM usage, yet they don't realize where that training came from in the first place.

        The em-dash is indicative of AI usage when it shows up in contexts where it doesn't belong, such as informal contexts like forum comments and emails (though "smart" substitutions do complicate the picture a bit).

        It'd only be funny if they argued it indicated AI usage in contexts where it does belong, like formal writing.

        • lamasery 43 minutes ago
          But the em-dash is a pretty informal mark... I'd tend to re-structure my sentences to avoid it more often in a formal context, than an informal one. It's what you reach for either for a specific effect, or because it's the least-disruptive way to keep writing without having to go back and edit mid-sentence, and end up with something that scans OK. It's super-informal.
          • palmotea 17 minutes ago
            > But the em-dash is a pretty informal mark...

            I think you have to make a distinction: there's using a dash as you describe and using the actual em-dash character. Without a smartquotes-type autocorrect feature (which admittedly is common in certain apps/platforms like Outlook and Word), an actual em-dash is awkward to type. I'd expect someone using it informally to just use a regular dash (-) or two (--).

            I think you're automatically in a pretty formal writing context if you care if you use an em-dash character or not.

            Which brings up an interesting idea: would Microsoft turn off its smartquotes-type autocorrect, because now it makes you look like a dumb AI-user? Probably, if they cared about their users. But I doubt they will because they're so into hyping AI that "Microslop" is a thing.

        • JKCalhoun 1 hour ago
          Informal contexts are where I get to practice my writing in general. In terms of punctuation, I don't make a distinction. (I just say "bullshit" a lot more in the informal contexts. Durn, I did it again.)
  • BeetleB 15 hours ago
    Classic case of hacking the axis to exaggerate a point.

    It went from 19.3 to 32.5. It did not even double. Which means that if you see a comment with an em-dash, it's more likely to be human than LLM.
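The arithmetic behind this claim can be sketched in a few lines. The figures 19.3 and 32.5 are the ones quoted above; the assumption that human usage stayed at the earlier level, with the whole increase coming from LLM text, is added here purely for illustration and is not something the thread establishes.

```python
# Figures quoted in the comment above (units as shown on the chart).
before = 19.3
after = 32.5

# Relative growth: a value of 2.0 would mean usage doubled.
ratio = after / before

# ASSUMPTION (illustrative only): the human rate stayed at `before`
# and the entire increase is LLM-written text.
human_share = before / after
llm_share = (after - before) / after

print(f"growth ratio: {ratio:.2f}x")
print(f"share at the old human rate: {human_share:.1%}")
print(f"share attributable to the increase: {llm_share:.1%}")
```

Under that assumption the human share stays above one half for as long as the ratio is below 2, which is the point being made.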

  • meisel 15 hours ago
    Gotta love starting the y-axis above 0
    • tmoertel 15 hours ago
      While it is generally considered a No-No to start a bar chart from a baseline that is not zero, there is no corresponding prohibition, especially among numerically sophisticated audiences, for scatter plots or line charts. In general, we want graphs to focus on the area of variation.

      For example, take a look at just about any stock chart (try https://www.google.com/finance/beta/quote/GOOG:NASDAQ?hl=en). There's actual money on the line, but no baseline. Why do you think that is?

      • wtallis 15 hours ago
        For stock prices, starting the y axis wherever is aesthetically pleasing makes some sense because everybody will have a different non-zero cost basis for their investment, and the graphs need to be able to clearly depict fluctuations that are minor on a percentage basis. For something like the em-dash prevalence on HN, the most meaningful question is whether it has doubled, tripled, or whatever relative to the pre-LLM corpus, and that's most clearly visually depicted by starting the y axis at precisely zero.
      • throwway120385 15 hours ago
        The real answer actually depends. In cases where you want to visually emphasize the ratio between any pair of values, you should start from zero. In cases where only the difference between any pair of values matters and the ratio is meaningless you can start at a different baseline. A surprising number of measures are interesting in their ratio though, so we generally prefer a zero-based chart.
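The ratio-versus-difference distinction can be made concrete with a small sketch. It uses the 19.3 and 32.5 figures quoted elsewhere in the thread; the truncated baseline of 15.0 is a hypothetical value chosen for illustration, not the baseline of the chart under discussion.

```python
def visual_ratio(low: float, high: float, baseline: float) -> float:
    """Ratio of the plotted heights of two values above an axis baseline."""
    if baseline >= low:
        raise ValueError("baseline must be below the smaller value")
    return (high - baseline) / (low - baseline)

# Zero-based axis: plotted heights keep the true ratio of the values.
true_ratio = visual_ratio(19.3, 32.5, baseline=0.0)

# Hypothetical truncated axis: the difference (13.2) is unchanged,
# but the ratio of plotted heights is much larger.
cut_ratio = visual_ratio(19.3, 32.5, baseline=15.0)

print(f"zero baseline: {true_ratio:.2f}x, baseline 15: {cut_ratio:.2f}x")
```

The difference between the two points is drawn identically in both cases; only the apparent ratio changes, which is why the choice depends on which of the two the reader is meant to compare.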
      • BeetleB 15 hours ago
        > In general, we want graphs to focus on the area of variation.

        Visually, this is vastly exaggerating the variation. Actual usage did not even double.

        • tmoertel 14 hours ago
          > Visually, this is vastly exaggerating the variation. Actual usage did not even double.

          No, it is literally showing the exact variation of interest. If you think it's exaggerating the variation, you are not reading the chart. You are glancing at the chart, ignoring what it actually says in multiple ways, and imagining it has a baseline of zero, when it clearly does not.

          Read the chart. What does it actually say?

          • BeetleB 12 hours ago
            > If you think it's exaggerating the variation, you are not reading the chart.

            That's true of every instance where a chart is criticized for playing around with the axes scale. Imagine the stock price of a company varied between 50.1 and 50.2 over a week. And I presented it as a chart with the min being 50.09 and max being 50.21, and drew all the variation over a large vertical space. And then tried to imply that the stock was volatile. What would be the problem?

            Let me ask you this. What is the point of this chart (or any similar chart)? Simply presenting a table with all the values would have conveyed all the information - wouldn't you agree?

            • tmoertel 11 hours ago
              > > If you think it's exaggerating the variation, you are not reading the chart.

              > That's true of every instance where a chart is criticized for playing around with the axes scale.

              Indeed. The criticism, however, is only apt when the chart's intended audience is likely to have a hard time understanding what that chart is trying to communicate. If you're publishing a bar chart in USA Today and its y-axis doesn't start at zero, yeah, that's a problem.

              But the OP's chart that started this whole thread? It's fine. First, the intended audience is HN readers, who can be assumed to be numerically literate. Second, it's a line chart whose y-axis labels make clear what the range of variation is. Third, the data points, themselves, are labeled with their values. Finally, the thrust of the chart, that em-dash usage in HN posts has markedly increased since the widespread adoption of LLMs, is itself also explicitly called out and labeled: "+79% from pre-AI baseline."

              If you try to tell me that the author of that chart is trying to mislead HN readers about the growth of em-dash use on HN, I'm going to have a hard time taking your claim seriously.

              > Imagine the stock price of a company varied between 50.1 and 50.2 over a week. And I presented it as a chart with the min being 50.09 and max being 50.21, and drew all the variation over a large vertical space.

              I have an easy time imagining your chart because that's how stock charts are plotted. That's what the financial community expects. That's how it's done: The y axis is bracketed by the low and high values over the period being charted, perhaps after rounding to the nearest nice value. For example, today's chart for the Russell 2000 Index shows a gain of just 0.30%, similar to the tiny relative volatility in your example. The chart's y axis ranges from 2,695 to 2,715 (https://share.google/oKPQxlmZFsgSVoNOS). It does not start at zero.

              If it did start at zero, it would be unsuited for its intended purpose. How would you observe the day's variation on what appeared to be a flat horizontal line at the top of a chart whose y axis ranged from 0 to 3000?

              Why do you think the financial world does stock charts the way it does stock charts? Do you think financial analysts don't know how to communicate the day’s movement of a stock to each other?

              > And then tried to imply that the stock was volatile. What would be the problem?

              The problem would be that your audience, if they were accustomed to reading stock charts, would think you didn't know what you're talking about. Your chart would refute your claims, and anybody accustomed to reading stock charts would know it.

              > Let me ask you this. What is the point of this chart (or any similar chart)? Simply presenting a table with all the values would have conveyed all the information - wouldn't you agree?

              The point of this chart, like any good chart, is to present the intended information to the intended audience faster and more conveniently than the alternatives. (Do you have any problem with that claim?) And, in this case, I'd say the OP's chart met that standard. Likewise, I'd argue that the typical stock chart, which is bracketed by the stock's low and high values, meets that standard as well.

              In both of those examples, you could also communicate the same information in a table, but a table wouldn't be as fast or convenient as a chart, given the expected audiences.
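A rough sketch of the bracketing described above, where the y axis runs from the period's low to its high after rounding to a nice value. The function name, the step of 5, and the sample low and high are all hypothetical; only the 2,695 to 2,715 axis range and the 0 to 3000 comparison come from the comment.

```python
import math

def bracket_axis(low: float, high: float, step: float) -> tuple[float, float]:
    """Round low down and high up to the nearest multiple of step."""
    lower = math.floor(low / step) * step
    upper = math.ceil(high / step) * step
    return lower, upper

# Hypothetical intraday low and high for an index near 2,700.
low, high = 2697.4, 2713.8

axis = bracket_axis(low, high, step=5)

# Fraction of the chart height the day's range would occupy on a
# zero-based axis running from 0 to 3000.
zero_based_fraction = (high - low) / 3000

print(axis)
print(f"{zero_based_fraction:.2%} of a 0-3000 axis")
```

With these made-up inputs the bracketed axis matches the range cited in the comment, while the same movement would take up well under one percent of a zero-based chart.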

              • BeetleB 10 hours ago
                > If you try to tell me that the author of that chart is trying to mislead HN readers about the growth of em-dash use on HN, I'm going to have a hard time taking your claim seriously.

                I am saying precisely that. A significant number of HN users have a strong (and IMO irrational) anti-LLM bias. And these people pollute the discussion forums accusing people of using LLMs to write the content/comments.

                It's not a stretch to believe that those folks will look at the chart uncritically. Everyone - even the smartest of folks - has blind spots (this was quite apparent when I worked with top professors in their fields while in academia). And blind spots often correlate with their biases.

                • tmoertel 10 hours ago
                  > I am saying precisely that [the author of that chart is trying to mislead HN readers about the growth of em-dash use on HN].

                  Well, then, do you believe that the following evidence supports or undermines your hypothesis that the author is trying to mislead HN readers about em-dash use?

                  1. The author explicitly labeled each data point with its numeric value so that even if readers ignored the y-axis labels they could not misread the points.

                  2. The author explicitly labeled the pre- to post-AI growth as +79% so that even if readers ignored the y-axis labels and the data-point labels they could not misread the growth.

                  (The fact that you posed an example about a stock chart earlier but then completely ignored my response that refuted your argument about it suggests that you are not likely to be swayed by evidence and reason, but I'm giving it this one last try.)

      • mcphage 12 hours ago
        > take a look at just about any stock chart

        Honestly, I hate that about stock charts. They adjust the axes and scales so that the graph itself provides no information. Did it go up 1 point? 200 points? 5%? 50%? You can’t tell, because the graph is just a scale free squiggle.

    • cosmotic 15 hours ago
      And even worse, no glyph to denote the deviation
  • flowerthoughts 9 hours ago
    And here I am, just wishing that someone with the knowledge would make font ligatures that render -- and --- as en and em dashes, so I could use them more.
    • iamnothere 2 hours ago
      Just enable the compose key, then you can hold this key (typically right alt is used) and type the number of dashes required.

      https://en.wikipedia.org/wiki/Compose_key

    • JKCalhoun 1 hour ago
      Some random comment turned me on to the fact that two dashes do indeed collapse to an em-dash using the iPhone keyboard.
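The -- and --- mapping wished for at the top of this subthread can be approximated as a plain text substitution. This is only a sketch of the TeX-style convention (two hyphens for an en dash, three for an em dash), not a font ligature, and the function name is made up for illustration.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

def tex_dashes(text: str) -> str:
    """Map '--' to an en dash and '---' to an em dash, TeX style."""
    # Lookarounds match runs of exactly three (or exactly two) hyphens,
    # so single hyphens and runs of four or more are left untouched.
    text = re.sub(r"(?<!-)---(?!-)", EM_DASH, text)
    text = re.sub(r"(?<!-)--(?!-)", EN_DASH, text)
    return text

print(tex_dashes("pages 3--5, and then---suddenly---a well-known result"))
```

Note that this differs from the keyboard behavior mentioned above, where two hyphens become an em dash rather than an en dash.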
  • xxxxxxxx 14 hours ago
    This is interesting. I just fixed a Github issue where the code did not handle Em-Dash correctly. Ran some queries to check the stats there. No surprises: https://deepspaceplace.com/emdash
  • ortusdux 15 hours ago
    Did AI raise awareness of Em-dashes, causing more people to use them organically?
    • Sarkie 15 hours ago
      My wife is a journalist and has always loved them.

      Now she's been accused of using AI for her pieces.

      Oh well.

      • throw0101a 13 hours ago
        > Now she's been accused of using AI for her pieces.

        Read the observation that AI was (presumably) trained on the 'best' (or at least 'quality') writing, and so if good writers tended to use em-dashes, it should not be surprising that AI generates text with it.

        But, if one's personal style included using them, you should continue to do so because why should you dial down your own voice just because someone else may be mimicking it?

      • bb88 15 hours ago
        em-dashes help flow ideas better than other means. For whatever reason, it's easier to process in my brain a comment with an em-dash rather than trying to split the idea into separate succinct sentences.

        You can do small succinct sentences, but style-wise it sucks for longer passages.

    • Terr_ 15 hours ago
      I do use fewer em-dashes now, but only because I spend more time on Linux, where my habitual Windows trick of alt + 0151 no longer works.
      • Gormo 14 hours ago
        Press Ctrl+Shift+U to enter Unicode entry mode in GTK controls, then enter the code point for the em dash, 2014. That will produce '—'.

        Although I still prefer the traditional ASCII double-dash -- easier to type, and less potential for character encoding issues. Also, LLMs don't seem to use it at all.
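The two input methods mentioned above refer to the same character through different numbering schemes, which a short check can show. This sketch assumes the Windows Alt code 0151 is interpreted in code page 1252, as on a Western-locale system.

```python
import unicodedata

# Code point 2014 (hexadecimal), as typed after Ctrl+Shift+U.
em_dash = chr(0x2014)

# ASSUMPTION: Alt+0151 selects byte 151 in Windows code page 1252.
from_alt_code = bytes([151]).decode("cp1252")

print(unicodedata.name(em_dash))   # EM DASH
print(em_dash == from_alt_code)    # True

# The character takes three bytes in UTF-8 and has no ASCII form,
# which is one source of the encoding issues mentioned above.
utf8_bytes = em_dash.encode("utf-8")
print(utf8_bytes)                  # b'\xe2\x80\x94'
```

The ASCII double-dash avoids all of this because both hyphens are single-byte characters in every common encoding.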

    • Gormo 14 hours ago
      AI raised awareness of em-dashes among people who didn't/don't read much, especially the kind of long-form writing that LLMs have been trained on. Treating em-dashes as a tell of LLM output is a form of unintentional "vice signalling".
    • operatingthetan 15 hours ago
      I think it's both. People started writing AI comments and also started using em-dashes. However when my former boss would write emails with AI he would add intentional typos and remove all dashes.
    • ButlerianJihad 15 hours ago
      For my part, editing Wikipedia raised my awareness of the different types of dashes, and when to use them appropriately. Unfortunately, my Chromebook is not so forthcoming in ease of input.
    • interstice 15 hours ago
      If anything I use them _less_ now thanks to this whole thing.
      • turtleyacht 14 hours ago
        Yes. Defanging smart quotes, double-dashing em's, spelling out numbers, and swearing off emojis. Next up, double-spaced sentences.
        • lamasery 33 minutes ago
          HTML still collapses multiple spaces, doesn't it? Does HN go out of its way to add an nbsp glyph? Typing this post the original way I learned to type, with two spaces after the end of each sentence, to see if it renders that way. Here we go!

          (Incidentally, I love that backlash against LLM writing has more people developing as much of an allergy to emojis and content-marketing- and personal-branding-style writing as I've long had)

        • JKCalhoun 1 hour ago
          I guess you have to choose now whether to be accused of being a Clanker or a Boomer.
      • Freedom2 14 hours ago
        A reminder that according to the HN guidelines, there's no need to use underscores or other annotations to emphasize words.
    • bitwize 15 hours ago
      I know I did. I don't want eloquent punctuation to fall exclusively to the clankers.
    • jordand 15 hours ago
      Unconsciously and consciously yes, and this new awareness means others are now consciously avoiding the use of them so their writing is less likely to be perceived as AI generated junk
    • umanwizard 15 hours ago
      In my case, yes. I have never used AI to write any prose (including HN comments), and I never will. But I certainly started using them more often since the ChatGPT era began, purely through osmosis. I'm not exactly proud of that, but there you have it.
      • razingeden 15 hours ago
        I use the double dash.

        This gets corrected to an emdash.

        I get annoyed and put the double dash back in.

        Sometimes swearing a little or grumbling “HEY. I typed what I typed” at it helps a little.

        I don’t even know how many times in 20-30+ years I’ve checked some box in system or program preferences begging it to knock that off.

        This is the real reason I already loathe and avoid the emdash (nitpicking over a personal stylistic preference I won’t relent on even if I’m wrong) but I can’t be the only one this happens to.

        Getting piled on and called “AI” really doesn’t ease my distaste for it, but .. do people.. not write enough to understand that it brute forces its way into human copy as well?

        and yes. phone posting on HN. will insert them. to my dismay.

        The other one that ticks me off endlessly but I’ve finally said to hell with it and just let it go?

        Turning " into “.

        (Writer. Not a very good one and I’m not here to steer anyone to that drivel. But at least I’m a human one.)

    • tayo42 15 hours ago
      I don't think my phone keyboard even has one to type
      • JKCalhoun 1 hour ago
        Two dashes seem to collapse into an em-dash using my iPhone keyboard.
      • BoredPositron 15 hours ago
        Long press on - on both iOS and android (Gboard)
        • throw0101a 13 hours ago
          > Long press on - on both iOS and android (Gboard)

          Depending on the text area you are typing into, if you type two hyphens/minuses right after each other (no spaces), Apple systems often translate them to an em-dash (kind of mimicking (La)TeX).

          (If you don't want the em-dash, hit <cmd-z> with macOS to undo that auto-conversion.)

        • sobradob 15 hours ago
          did not know that
        • bitwize 15 hours ago
          Y'all on Android should be using Unexpected Keyboard, where it's Compose - - -.
    • add-sub-mul-div 15 hours ago
      Surely yes, but also surely negligibly compared to the rise of slop being posted. Sometimes things are what they seem!
  • number6 7 hours ago
    shameless self plug: https://emdashmanifesto.org/
    • JKCalhoun 1 hour ago
      Thank you (and I support you).
  • northisup 15 hours ago
    me waiting for the "the rise of posts analyzing the rise of the em-dash on hacker news" posts
  • lz400 15 hours ago
    I just learnt that em dash in a mac is option+shift+hyphen. I hadn't realized it was so difficult and inconvenient, and in the end it looks so similar to the other one: — -. Thin value. It's no surprise humans barely use them. Then why did it get picked up so much by AIs? I'd have imagined it's not in a lot of training data. Print media practices I guess?
    • dragonwriter 15 hours ago
      > and in the end it looks so similar to the other one:

      Maybe if you are looking at it in a monospaced environment like the HN edit window; rendered in a proportional font, hyphens, en-dashes, and em-dashes are quite distinct from each other.

      > It's no surprise humans barely use them. Then why did it get picked up so much by AIs?

      It got picked up by AIs because their training corpus includes plenty of professionally published work, not just informal, off-the-cuff communication, and professionally published work uses typographic dashes (em-dashes, en-dashes, and even 2-em- and 3-em-dashes) extensively. (3-em less so in newer works, it having, e.g., dropped out of the recommendations of the Chicago Manual of Style as of 2024.)

    • dr_dshiv 15 hours ago
      I love em dashes. They are so much less pretentious than colons or semicolons — and they help with flow of speech. I learned that key command a couple years ago and it made me feel so smart. I’ve had my comeuppance but I’m not stopping — just a better way to write
    • marssaxman 15 hours ago
      Difficult and inconvenient compared to what, I wonder? I've always really liked the Mac OS option-key system, which I found convenient and easy to understand; I sometimes wish I could type that way in linux instead of using compose keys.
      • lamasery 25 minutes ago
        I grew up on Windows, Linux, and other non-Mac machines, only shifting to Macs around age 30.

        Within months I was convinced that every default English keyboard I'd ever seen except the Mac one is strictly worse. It bothers me now how hard it is to get a consistent Mac-style keymap on Linux. This is one thing others should for-sure just rip off entirely. It's so much better.

      • mananaysiempre 15 hours ago
        What is it that you like about it specifically? If you’re not picky about the choice of modifier key, you can configure the so-called “level 3 shift key” and have the em dash on the hyphen key at level four (both L3 shift and L2 aka normal shift pressed). For instance, on GNOME Wayland I have “Input Source” = “English (Western European AltGr dead keys)”, “Alternate Characters Key” (GNOME lingo for the L3 shift) = “Right Alt”, so the em dash is RAlt-Shift-hyphen.
        • marssaxman 13 hours ago
          The option-key layout system was easier to memorize than the compose-key patterns, which I struggle to recall. I couldn't tell you why, I just felt like I got the hang of it easily, while using the compose key system has always been slow and clunky.

          I've never heard of a "level 3 shift key"; I'll have to look that up.

    • BeetleB 15 hours ago
      It's used a lot in LaTeX and Word. It's not as rare as people make them out to be. It's just that we haven't had a convenient way to enter it in a browser form that some of us (younger folks!) find the em-dash weird.
    • hyperhello 15 hours ago
      Why is that inconvenient? It’s a hyphen with modifier keys.
    • wwalexander 15 hours ago
      Apple’s text inputs usually autocorrect double hyphens to em dashes.
    • UqWBcuFx6NV4r 15 hours ago
      It’s neither difficult nor inconvenient, it’s just new to you.
    • yojo 15 hours ago
      option + hyphen gives you an en-dash (–), which is easier to type and I am guilty of way overusing/misusing.
      • dragonwriter 14 hours ago
        The main use of an em-dash can also be done with an en-dash set open, and different style guides have different preferences for which should be used.
  • crazygringo 15 hours ago
    How is it picking the comments?

    If it's all comments, including flagged/dead/downvoted/etc., then it's not reflective of the actual filtering HN does.

    But if it's weighting comments by their likelihood of being read -- e.g. mostly top comments on popular stories -- then I'd be a lot more curious.

    I'm not surprised AI spam has increased substantially. But I'd be surprised if it's affected the comments most people actually read to anywhere close to the degree shown in this graph.

    • sobradob 15 hours ago
      It's a random-ish sample. Question, do you often use -- in your writing?
      • crazygringo 15 hours ago
        All the time. So funny, it's so automatic I genuinely didn't even realize I was using them in a comment about em dashes. My comment history has been full of them for over a decade by now... and I think you can tell which comments are from my phone vs my laptop by whether they're converted to — or not.
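The distinction drawn here, between a typed double hyphen and a converted em dash, matters for any count of dash usage. The sketch below shows one way such a tally could be made; the function name and the sample comments are hypothetical, and the thread does not say how the original post counted.

```python
EM_DASH = "\u2014"

def dash_shares(comments: list[str]) -> dict[str, float]:
    """Fraction of comments containing an em dash or an ASCII double hyphen."""
    total = len(comments)
    if total == 0:
        return {"em_dash": 0.0, "double_hyphen": 0.0}
    with_em = sum(EM_DASH in c for c in comments)
    with_double = sum("--" in c for c in comments)
    return {"em_dash": with_em / total, "double_hyphen": with_double / total}

# Hypothetical sample: the same habit shows up as either form,
# depending on whether the device auto-converts it.
sample = [
    "I use them -- a lot",
    "So do I \u2014 always",
    "no dashes here",
    "typed on two devices -- and \u2014 it shows",
]

shares = dash_shares(sample)
print(shares)
```

A count that looks only for the em dash character would miss the first comment entirely, even though its author uses the dash the same way.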
  • jcims 15 hours ago
  • ChrisArchitect 12 hours ago
    Related from last year:

    Show HN: Hacker News em dash user leaderboard pre-ChatGPT

    https://news.ycombinator.com/item?id=45071722

  • juped 15 hours ago
    You can pry my em dash—short for "Emily's dash", after the poet—from my cold dead hands.
    • spudlyo 15 hours ago
      Close, it's the width of the 'M' from the famous author EMdash Forster's name. ;)
      • kstrauser 15 hours ago
        Not even close. It's named after the em drive, after how fast it helps your thoughts flow to written word.
        • myhf 15 hours ago
          I thought it was a reference to "kill 'em all and let God sort 'em out"

          https://en.wikipedia.org/wiki/Caedite_eos._Novit_enim_Dominu....

          • lamasery 6 minutes ago
            How is there this much misinformation out there about this? It's a neologism based on the character M from the James Bond franchise. Do a little free-association from the phrase "Bond—James Bond" and you end up calling that the M-dash.

            Until that relatively recent shift, it was named the Morse Dash—you'd think because of the "long" glyph when rendering Morse Code, but no, it was named for the 17th century English Catholic martyr Henry Morse, for reasons lost to time.

  • negura 13 hours ago
    stylometric analysis can be used to profile you. so if you were using em-dashes, this is good news. it helps you blend in better than before
  • Rekindle8090 14 hours ago
    I'll stand firm on my belief that no one types an em or en dash. it's always an llm. it's a pain in the ass to type on most keyboards, impossible on some, and pointless on phones
    • mcphage 12 hours ago
      It’s pretty easy on a Mac (or iPad with a keyboard)—Option+Shift+Hyphen. I do it without even thinking about how, it’s so second-nature.
  • lapcat 15 hours ago
    Now someone do "the rise of Hacker News meta-analysis blog posts".
    • qup 15 hours ago
      Serious request, I'd love to see this OP plotted against the rise of the em-dash elsewhere.

      Is HN more botted, or less? And are banned accounts excluded?

  • derbOac 15 hours ago
    [dead]
  • andrewclunn 15 hours ago
    [dead]
  • adampunk 15 hours ago
    WooooooOOOOOOOOooooooooOOOOOOOOOoooooo—

    — A spooky ghost

    WooooooOOOOOOOOooooooooOOOOOOOOOoooooo-

    - A less spooky ghost