Show HN: Viral Potential Predictor

(hn-ph.vercel.app)

31 points | by salebanolow 3 hours ago

9 comments

minimaxir 59 minutes ago
As someone who has spent an embarrassing amount of time researching Hacker News title trends over the years, I was excited to look at the methodology (https://hn-ph.vercel.app/analysis) but after looking at it, I am calling shenanigans.
That's not a methodology paper and it doesn't explain how the model being advertised works in the spirit of open machine learning research; given that the startup is an AI startup, I assume that the actual model is more sophisticated. As Section 8 notes: "This analysis is descriptive and intended to summarize empirical patterns."
It's an exploratory data analysis which not only fails to explain how the model is constructed, but also makes a number of assumptions that imply the people who made it lack proper context on how Hacker News works:
1. The extreme right-skewed nature should have raised a very large number of flags in the statistical methodology and calculations, but it mostly ignores them. The mean values are effectively useless, the p-values even more useless. It doesn't point out that the negative performing terms are likely spam.
2. It does not question why there are so few questions with a title >80 characters (answer: 80 characters is the max for a HN submission)
3. The analysis separates day of the week and hour: you can't do that. They're intrinsically linked and weekend behavior with respect to activity is far different than on weekdays.
4. "Title length has a weak relationship with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k)". No statistician would call that a weak correlation; those values are effectively no correlation.
There is also no person tied to this paper, just the "Memvid Research Team", which raises further questions.
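To make points 1 and 4 concrete, here is a minimal sketch on synthetic data (not the actual HN dataset). The Pareto-distributed scores, the uniform title lengths, and the helper names `ranks`, `pearson` and `spearman` are all assumptions for illustration. It shows how far the mean drifts from the median on a right-skewed distribution, and what correlation values look like when two variables are independent by construction.

```python
import random
import statistics


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(0)
n = 20_000

# Assumption: title lengths are uniform up to the 80-character cap.
title_len = [rng.randint(10, 80) for _ in range(n)]
# Assumption: scores are heavy-tailed and independent of title length.
score = [int(rng.paretovariate(1.2)) for _ in range(n)]

mean_score = statistics.fmean(score)
median_score = statistics.median(score)
r_pearson = pearson(title_len, score)
r_spearman = spearman(title_len, score)

print(f"mean={mean_score:.2f} median={median_score}")
print(f"pearson={r_pearson:.3f} spearman={r_spearman:.3f}")
```

On data like this, a handful of outliers pull the mean well above the median, and both coefficients sit close to zero even though no relationship exists at all. That is why values of a similar magnitude read as "no correlation" rather than "weak".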
- leohonexus 49 minutes ago
  I think it would have been much more appreciated as a dataset paper (and titled accordingly), rather than a "viral potential predictor".
delichon 2 hours ago
Here are the results for this username, this title and this description:
https://hn-ph.vercel.app/results/ZT06GF
It got a 62, a C+, predicting that this won't be very viral. So you either didn't test this submission on your own product, or you did, but didn't feel that the low score was a handicap? You don't seem to be dogfooding. If this post does well it would be evidence against its own accuracy. If it fizzles out, congratulations on being correct.
- baobun 1 hour ago
  Uncharitable and presumptuous about the goals. I prefer submissions to not be hyper-optimized for virality.
tverbeure 1 hour ago
Current nr 3 in the leaderboard: "Show HN: I built a Rust compiler in Rust with Rust"
Could use some more Rust to boost it to nr 1.
- Frotag 38 minutes ago
  Show HN: I built a Rust compiler in Python with JavaScript using Java on Android
- baobun 1 hour ago
  I'm calling it: Some AI controversy in Rust core will be in the top 5 of 2026.
andr3wV 1 hour ago
The analysis they ran in their research paper found most surface features don’t meaningfully separate viral from non‑viral outcomes. So the tool isn't actually predicting if your launch title will go viral, it's more like checking for heuristics and descriptive patterns.
Cool idea though! And they're on the front page lol
higginsniggins 54 minutes ago
According to your research paper you should have made this post a "Tell HN:" rather than a "Show HN:", lol
amitav1 1 hour ago
This tool: "Avoid keyword stuffing; make the title read naturally."
Also this tool: "Show HN (AI): I built GPT 6 in Rust Using Claude Gemini Grok OpenAI NVIDIA Google" - #1
(No hate to the creators obviously. Just really funny.)
codybontecou 1 hour ago
Well, he made it to the front page so there’s that.
simonw 1 hour ago
(Replaced my original comment here which was a little unkind.)
Question for OP, who created Memvid (the .mv2 file format that's used to distribute this data). Are you still taking text, chunking it and then storing those chunks as QR codes in a video file? That seems like an inherently inefficient storage mechanism to me compared with something like SQLite or Parquet - do you have concrete numbers or a demo that shows that your file format really is more effective for storing data for "AI agents" than those existing solutions?
- minimaxir 35 minutes ago
  As a side note: the dataset is referenced in the paper as being from Hugging Face (https://huggingface.co/datasets/julien040/hacker-news-posts), which does host it as a 426 MB Parquet, while the .mv2 being distributed is 847 MB, for some reason.
- simonw 29 minutes ago
  Found a relevant comment here: https://github.com/memvid/memvid/issues/86#issuecomment-3560...
  > We’ve rebuilt everything from the ground up for Memvid v2, new format, new core, new benchmarks, no QR hacks. It’s a real storage engine now, crash-safe, deterministic, and fully verified with a proper TOC, WAL, Merkle tree, and time index.
  So I guess the QR code hack isn't a thing any more.
- tossit444 49 minutes ago
  Look at memvid's closed issues. The entire thing is a farce.
  https://github.com/memvid/memvid/issues?q=is%3Aissue%20state...
mitexleo 2 hours ago
Let's see if this goes viral
- asciii 1 hour ago
  o7 see you in the 1% someday