For years, I've been good-heartedly losing the blog SEO ranking fight to a great developer and writer who has the same name as me. A football player eclipses us both if you just google our shared name, but if you add a term like "developer" or "programming", he's clearly got me beat for the top marks. It makes sense: he writes about tech much more consistently than I do, and his articles are likely much more helpful than my sporadic and eclectic posts.
Naturally, being vain, when I saw this post, I immediately looked up my own blog and was chuffed to see it at #292.
Funnily enough, I have a very common name and take quite a bit of relief in knowing that were someone to even attempt to look me up, they would see a number of authors, artists, politicians, etc. long before anything of me ever appeared. I've gone searching once or twice out of curiosity, and didn't find anything relevant on at least the first two or three pages of Google, Bing, etc.
I suppose the difference is that when I make any sort of professional blog, etc. I do so under online handles in place of my legal name because I see the content as being the important bit, not the person it's coming from; the credibility flows from the information provided, not the name it's tied to.
...Well, provided it's not xXx_DongMaster6969_xXx or whatever. :)
In the future, our names will simply be a cryptographic hash of our genome and we’ll all enjoy unambiguous identity and individuality as we express ourselves solely with Unicode Emoji.
I had two posts[0, 1] I was super confident would be a match for HN and two others[2, 3] that I thought had a so-so chance. They've all flopped except I got lucky that someone else submitted this one after I gave up.
It's somewhat arbitrary, but the threshold where you're not allowed to resubmit a story on HN is if it reaches over 20 points. I'm not sure if that's still the case, but it was a few years ago when I asked why I couldn't resubmit.
Also, just from my subjective sense, it's a reasonable cutoff for when articles are officially on the front page for a meaningful amount of time rather than briefly appearing and then falling off.
That's weird that I'm excluded. Thanks for watching out for me :-)
Hypotheses: 1. it's an error. 2. I have powerful enemies. 3. Someone from the future is trying to stop me. 4. My blog triggered the FDIV bug and needed to be excluded.
There are a lot of odd exclusions on that list. Just spot-checking, I see blog.plover.com, the blog of Mark Jason Dominus, who by the way is looking for a job[1].
Also, dtrace.org is excluded, which hosts four individual blogs that surely should qualify.
Hi, my blog incoherency.co.uk appears at number 207 but it says the author is David Given.
I'm not sure if the error is that you think my blog is written by David Given instead of James Stanley, or if you think David Given's blog is incoherency.co.uk instead of cowlark.com !
I love that dynomight.net stands out with "existential angst" as a very unique category among the top blogs, as well as being written by an anonymous/pseudonymous author. I'm a big fan of their writing.
Also quite surprised to find my own site in the top 5000 for the past 5 years! It feels like Hacker News is simultaneously quite large but also a cozy community where you often recognize names from day to day.
Slate Star Codex and Astral Codex Ten should probably be combined. Also, it's odd that while ACX's author is listed as "Scott Alexander" (his long-time pseudonym), SSC's is listed as "anonymous." He went by Scott Alexander even in the days of SSC.
Yeah, that's a good point. I was trying to be respectful of the author's wishes around pseudonymity, as it seemed intentional that when he lost anonymity, he switched domains, so I didn't want to "out" the author, but it's kind of silly since it's very public now.
I think you're right that it makes sense to identify him by his pseudonym rather than just "Anonymous" so I've updated it:
Interesting website. I was reading the free chapter about active and passive voice, something I've never really understood or paid much attention to. Excellent explanation that cleared things up. My manager uses active voice when they're taking credit for our work, and passive voice when they do things wrong. It's a neat trick.
It may console you: in some sense, the top 4900 are more valuable than the top-100.
Why? Everybody here knows Paul Graham. I know Krebs and Schneier, most of you will, too. In a long tail distribution like this, the top entries (left) are the obvious ones, the lowest frequented ones (right) might be noise (artifact of the methods e.g. bugs in the data cleaning), but the middle part is really where the value is: blogs we don't know but would like to know.
In search engine ranking, people needed a lot of time until the late Karen Spärck Jones finally discovered IDF (inverse document [collection] frequency) in 1972, the "Yang" to raw term frequency (TF), which had been the "Yin" that was missing a counterforce to retrieve truly relevant documents when balanced in the TFIDF formula.
So, plea to the OP: please release the rest of your list (101-100000).
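For anyone who hasn't seen the TF/IDF balance written out, here is a minimal sketch. It assumes the common `tf * log(N / df)` weighting (one of several variants in use), and the three toy documents are invented for illustration, not taken from the blog list.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy corpus (assumption): "the" and "about" occur in every document.
docs = [
    "the blog about rust",
    "the blog about sourdough",
    "the essay about startups",
]
weights = tf_idf(docs)
```

Terms that occur in every document (the Paul Grahams of the corpus) get weight zero, while a term unique to one document gets the highest weight, which is the same intuition as looking past the obvious top entries.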
+1 to this. I'd also argue that some on the list are unapologetic self-promoters like Simon Willison. Nothing wrong with it but it shows and I think it's much more impressive to be below that cohort but still only a reasonable distance away.
FYI, in my May 2023 survey of HN's archived front pages (under the "past" link in the HN titlebar) ... horse.sheep doesn't appear at all.
That's looking at just the top <=30 stories per day. Your high-water mark seems to have been 2022-12-19, with 88 points / 32 comments, appearing on the 3rd page of the daily archive, ranked #73:
I had to fiddle with the dates to find a couple examples of blogs that violate the single-author rule in the methodology (marginalrevolution, ribbonfarm) but it's probably better to have them included.
Even though a glance suggests the majority of high-scorers are self-hosted, I wonder if this dataset is valuable for predicting the strength of different blog hosts. Some fiddling did lead to a couple results that are hosted on Blogger or Ghost or Medium, so they are there.
I’ve been thinking of starting an anonymous blog with my thoughts just to record them and any projects I do. I want them to be visible on the internet and searchable instead of behind some facebook or instagram wall. What is a good blog service to use that will be around for decades? I don’t really want to run my own domain. Do things like Blogger and Blogspot still exist and will they continue to in the future?
Where does the "bio" field come from? Mine says "Developer and writer" and I suppose those are both things that I am, but not very close to what I'd have put there.
I'm guessing it's annotated by an LLM. Would be a lot of thankless work otherwise, so don't really blame the author, but it means you get the occasional nonsense summary.
They're about 95% accurate, so not completely incorrect. I think the value of the correct ones is higher than the few incorrect ones. I manually review and fix them, but for just a fun tool, it's not practical to write 5000 eight-word bios.
I started by writing them by hand, but it was taking forever to read enough of each author's blog to write a summary and topic list, so I used an LLM and then spot-checked.
I just updated yours, but let me know if you'd like something different.
Based on my May 2023 survey of front-page results, those rank at 954 and 3,028 (of all sites) respectively:
954 18 63.008 lcamtuf.blogspot.com :::: blog
3028 6 77.327 lcamtuf.substack.com :::: blog
Among blogs I'd identified (similar methodology though all but certainly different URL set from TFA), #295 and #521 of 5,506 blogs identified.
My set includes 52,642 sites all told, with 16,185 classified as, e.g., "programming", "blog", "social media", "academic/science", "corporate comm.", "general news", "government", "software", "tech news", etc. I'd come up with 61 total classifications, covering all sites with at least 18 front-page appearances within the HN archive.
He is, but he's not in the top 100. He would be, but he splits his articles across different blogs.
For authors that just change domains (e.g., christine.website and xeiaso.net), I combine scores, but if they maintain different writing under different domains, I treat the domains as separate.
That said, I think Michał Zalewski is the author who most suffers due to this rule.
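A rough sketch of that combining rule, not the actual implementation: the `ALIASES` table, the `combine_scores` helper, and every point value below are made up for illustration. Only the domain names come from this thread.

```python
from collections import defaultdict

# Hypothetical alias table: a domain the author moved away from -> its successor.
ALIASES = {"christine.website": "xeiaso.net"}


def combine_scores(submissions, aliases=ALIASES):
    """Sum points per domain, folding renamed domains into their canonical one."""
    totals = defaultdict(int)
    for domain, points in submissions:
        totals[aliases.get(domain, domain)] += points
    return dict(totals)


# Invented point values, for illustration only.
sample = [
    ("christine.website", 120),
    ("xeiaso.net", 300),
    ("lcamtuf.blogspot.com", 80),
    ("lcamtuf.substack.com", 40),
]
totals = combine_scores(sample)
```

The renamed domain is folded into its successor, while the two lcamtuf domains stay separate because they carry different writing.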
Some simple statistics on the top 5,000 blog domain names show that 54% use .com, 14% use .org, 7% use .io (40% of which are github.io), and 6% use .net. These four together account for 81%.
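A breakdown like this can be reproduced with a few lines. This is only a sketch: `tld_shares` is a hypothetical helper, the eight domains are a tiny sample of names mentioned in this thread rather than the real top 5,000, and it treats the last dot-separated label as the TLD (so a `.co.uk` domain would count as `.uk`).

```python
from collections import Counter


def tld_shares(domains):
    """Return {".tld": fraction of domains}, most common first."""
    counts = Counter(d.rsplit(".", 1)[-1] for d in domains)
    total = len(domains)
    return {"." + tld: n / total for tld, n in counts.most_common()}


# Small sample of domains from this thread (assumption: not the full list).
domains = [
    "righto.com",
    "blog.plover.com",
    "cowlark.com",
    "lapcatsoftware.com",
    "dtrace.org",
    "xeiaso.net",
    "dynomight.net",
    "jvns.ca",
]
shares = tld_shares(domains)
```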
Note that this is gonna be skewed pretty heavily toward domains that have existed for most of HN's history, at the expense of any newer domains that had fewer chances to rack up points.
If you look at any 2-4 year period, the ranking tends to be quite different. Well, Paul Graham is there pretty consistently, but everything else changes.
You can change the date ranges (e.g. just the YTD, or last 12 months, or set a custom range), and it gives an interesting overview of the evolution over time.
Like jvns.ca drops off the list entirely for 2025, but was consistently in the top 5 until last year.
paulg's blog shouldn't count, as it's an extension of this and more of a long-game sales pitch tailored for different purposes. Not a bad thing, but I just wouldn't consider it a blog.
I think that is wholly unfair. If Paul Graham is anything, he's a writer/creator first. Those articles have also been extremely influential to many entrepreneurs and people in the business world. I'm personally appreciative of them, even if I don't always agree with him.
Interesting that if you were to combine AstralCodexTen and SlateStarCodex it would be around top #20. Even with the traffic split he's in the top 50 twice.
Naturally, being vain, when I saw this post, I immediately looked up my own blog and was chuffed to see it at #292.
But, guess who I see just above at #289.
I tried submitting this as a Show HN a couple times but it didn't take, so I'm happy to see some interest!
I caught this just before bed, but I'm happy to take any suggestions or questions, and I'll answer in the morning.
If you'd like to improve the metadata, I welcome PRs here: https://github.com/mtlynch/hn-popularity-contest-data
* https://news.ycombinator.com/item?id=43471177
Did that get filtered out because of the domain?
It doesn't change much, but it means less special-casing for universities.
> I can think of two remaining cards to play.
> The first is to get on the front page of Hacker News. That’s usually difficult to do, but I’m supposed to be the expert.
Well done
I had two posts[0, 1] I was super confident would be a match for HN and two others[2, 3] that I thought had a so-so chance. They've all flopped except I got lucky that someone else submitted this one after I gave up.
[0] https://news.ycombinator.com/item?id=43435961
[1] https://news.ycombinator.com/item?id=43345866
[2] https://news.ycombinator.com/item?id=43301897
[3] https://news.ycombinator.com/item?id=43412220
Looks like it's explicitly excluded[2].
[1] https://news.ycombinator.com/from?site=righto.com
[2] https://github.com/mtlynch/hn-popularity-contest-data/blob/d...
[1]: https://mastodon.online/@[email protected]/1142231895042721...
Whoops, that was a mistake. Fixed now: https://github.com/mtlynch/hn-popularity-contest-data/pull/2...
>Also, dtrace.org is excluded, which hosts four individual blogs that surely should qualify.
I didn't realize the authors were on distinguishable URLs, so I've now added them back and canonicalized them to their new subdomain URLs.
But thanks for fixing.
This morning I am jolly and cheerful. Thanks again!
That was an error.
I think what happened was that I was going through the list of domains and assumed that "righto.com" would be too valuable a domain name for a personal blog and excluded it without checking. Sorry about that!
Ken is #8 of all-time now.
https://github.com/mtlynch/hn-popularity-contest-data/pull/2...
Sorry again!
[1] https://blogs.hn
Please add your site or make corrections :)
[2] https://github.com/surprisetalk/blogs.hn
Honestly, Hacker News has been a great place to grow up. Started posting here back in college when I was 21 or so. Now here we still are at 37.
I credit Hacker News with getting me from Slovenia to San Francisco. It's been a great journey so far. Some of which has made it to the front page <3
https://github.com/mtlynch/hn-popularity-contest-data/pull/2...
Sincerely, your biographer. : )
I think you're right that it makes sense to identify him by his pseudonym rather than just "Anonymous" so I've updated it:
https://github.com/mtlynch/hn-popularity-contest-data/pull/2...
That's looking at just the top <=30 stories per day. Your high-water mark seems to have been 2022-12-19, with 88 points / 32 comments, appearing on the 3rd page of the daily archive, ranked #73:
<https://news.ycombinator.com/front?day=2022-12-19&p=3>
I've apparently put out some serious bangers to end up in such esteemed company.
https://refactoringenglish.com/tools/hn-popularity/?end=2025...
But my name is Lars Doucet, not Keith Burgun (that's KeithBurgen.net)
Happy to be included!
Fixed now: https://github.com/mtlynch/hn-popularity-contest-data/pull/1...
21 the last 12 months.
https://github.com/mtlynch/hn-popularity-contest-data/pull/1...
If there's a descriptor you'd prefer, I'm happy to update it. I'm just going by what I know of the blog and a review of the most popular posts.
[0] https://lapcatsoftware.com/articles/disclosure2.html
[1] https://lapcatsoftware.com/articles/disclosure.html
[2] https://lapcatsoftware.com/articles/2023/7/1.html
https://github.com/mtlynch/hn-popularity-contest-data/pull/4...
But privately, I think of you as a security researcher, and there's nothing you can do about it. : )
Thanks for your writing and your security findings!
Ofc GitHub will probably last longer, and so will running your own hosting/deployment.
The combined score puts me at #71 all time or #31 since 2019 when I started writing. Very cool.
Edit: yeah, the "methodology" page confirms this.
So just don't include them; better no info than completely incorrect LLM hallucinations.
They're about 95% accurate, so not completely incorrect. I think the value of the correct ones is higher than the few incorrect ones. I manually review and fix them, but for just a fun tool, it's not practical to write 5000 eight-word bios.
I just updated yours, but let me know if you'd like something different.
https://github.com/mtlynch/hn-popularity-contest-data/pull/2...
But I think one of these entries needs to be updated, see: https://en.wikipedia.org/wiki/Felix_Reda
HN has been a fantastic community for me.
It's only barely there at 480 but still.. that reminds me it's been too long since I've written a post.
- the blog domains which I tend to comment on (relative to other people, not in absolute terms), or
- the people whose comments I most often reply to?
Nobody should listen to me. I have no idea what I'm doing.
I explain the methodology here:
https://refactoringenglish.com/tools/hn-popularity/methodolo...
If you'd like to add your metadata, I'll happily accept a PR here:
https://github.com/mtlynch/hn-popularity-contest-data/blob/m...
This is something I wanted but I couldn't figure out a way to do it in a way that's meaningful. Authors like Simon Willison publish frequently, so even though he has a lot of high-scoring posts, he has a lot of low-to-no-scoring posts too. It feels unfair to penalize people who publish frequently just because not every post is a homerun.
I'm open to suggestions!
I'm almost positive Paul Graham would be #1.
Also curious, what blogging platforms do they use?
[1] https://go.bsky.app/HmV5x47
Edit: wait, that is actually how many points I got. Eh, I am still on the list.
interesting site.
https://github.com/mtlynch/hn-popularity-contest-data/pull/1...
$50 that xeiaso.net will overtake justine.lol this year. (Kidding of course, they're two of my favorite sites.)