> This means the driver doesn't "search" for empty space. It calculates where data goes using math.
From my understanding, we're still searching for empty space? We just have an easily computable sequence of spots to check. E.g., if our stride is 7 blocks, then instead of going linearly with a stateful search, we can easily compute where we check. It's hard to pull this apart from the README. The README looks a bit LLM generated (clued in by OP's comment as well), which contributes to the difficulty versus a more thoughtful writeup. Interesting idea, it's just hard to tell exactly what's going on.
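A minimal sketch of that reading, with names and constants that are mine rather than the project's: "calculated placement" would reduce to a deterministic probe sequence, which is still a search over candidate slots, just a stateless one.

```c
#include <stdint.h>

#define NUM_BLOCKS 1024u
#define STRIDE     7u  /* coprime with NUM_BLOCKS, so every block gets visited */

/* i-th candidate block for a file whose identity hashes to `home`;
 * no on-disk allocator cursor is needed to enumerate the candidates */
static uint32_t probe(uint32_t home, uint32_t i) {
    return (home + i * STRIDE) % NUM_BLOCKS;
}
```

Under that assumption the "math instead of search" claim would really mean "search with a computed, stateless probe order" (open addressing, essentially).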
I’ve been working on a storage system for a long time. Longer than I planned. Longer than was healthy.
It started as “just an allocator experiment.”
Then it grew a compression engine.
Then repair logic.
Then identity.
Then tags.
Then time slicing.
Then a namespace.
At some point I realized I wasn’t building components anymore — I had built the whole substrate.
Not a directory tree.
A flat, identity-first namespace with semantic tags, time and generation slicing, CRC defense, extension chains, and deterministic resolution.
No public API. No SDK.
It just speaks POSIX now.
I’m releasing the namespace engine today as a public reference implementation. It’s spec-locked, test-covered, and boring in the best way.
There’s no product.
No startup.
No VC story.
Just a filesystem that finally works the way I always wished they did.
I’m tired.
But I’m also weirdly calm about it.
If anyone wants to read, criticize, or tell me I reinvented something from 1987 — I’m ready.
I think the “find file” section could use some clarification. Unless I missed something, as implemented it’s impossible to list paths within the filesystem (unless the cortex stores the path? It’s not clear from the docs). At a minimum I’m curious about the costs of maintaining the cortex: there’s nothing about the cost of metadata updates, which is normally where the slowdown comes from as the disk fills up, since you have to do a sorted insertion and/or deletion, or otherwise add indirection markers after a binary search.
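To make the update cost concrete (this is a hypothetical layout, not the project's actual cortex): if the index is a plain sorted array, lookup is an O(log n) binary search, but every insert also pays an O(n) shift to keep the array sorted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* first index whose key is >= `key` (standard binary search) */
static size_t lower_bound(const uint64_t *a, size_t n, uint64_t key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Insert `key`, keeping `a` sorted; caller guarantees capacity > n.
 * Returns the new element count. */
static size_t sorted_insert(uint64_t *a, size_t n, uint64_t key) {
    size_t pos = lower_bound(a, n, key);
    memmove(&a[pos + 1], &a[pos], (n - pos) * sizeof a[0]); /* O(n) shift */
    a[pos] = key;
    return n + 1;
}
```

B-trees and LSM structures exist precisely to amortize that shift, and the docs don't say which trade-off the cortex actually makes.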
> The file's metadata in memory is updated to the new version.
Which means this doesn’t work well for lots of (presumably small) files because of the bookkeeping overhead of needing to have all the metadata materialized in RAM? Have you tested how your filesystem scales as the number of files increases and how the RAM usage scales?
Anyway, super interesting ideas. Congrats on achieving something difficult!
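A back-of-envelope for the RAM question above. The record layout here is a guess, not the project's actual struct:

```c
#include <stdint.h>

/* Hypothetical per-file record; field names are mine */
struct meta_entry {
    uint64_t id;         /* identity hash */
    uint64_t generation; /* version counter */
    uint32_t crc32;      /* integrity check */
    uint32_t block;      /* current placement */
};

/* resident bytes if every file's metadata must stay materialized */
static uint64_t resident_bytes(uint64_t file_count) {
    return file_count * sizeof(struct meta_entry);
}
```

Even at a lean 24 bytes per entry, 10 million small files is ~240 MB of always-resident metadata before tags or extension chains, which is why the scaling question matters.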
I hate to be the first one commenting to say this, but here goes: the flashy LLM writing style, the "Apple Event Dialect" in the README and in this comment, is very recognizable and also quite irritating. If this is supposed to be boring, then just state the facts and the benchmarks to prove them.
Sounds too good to be true. What are the downsides? You say that it reads a location that was calculated, but then it also checks the crc32, and if that doesn't match it moves to the next calculated position. Why is reading the crc32 needed? Why doesn't it immediately go to the next position?
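One plausible answer, sketched with a toy checksum standing in for the real crc32 (an assumption, not the project's code): the calculated slot can hold the wanted data, some other file's data, or rotten bits, and only comparing a stored checksum against one recomputed from the payload separates "valid here" from "keep probing". Skipping the read would return corrupted blocks as good data.

```c
#include <stddef.h>
#include <stdint.h>

/* toy checksum used only for this sketch; NOT the real crc32 */
static uint32_t toy_sum(const uint8_t *p, size_t n) {
    uint32_t s = 0;
    while (n--) s = s * 31u + *p++;
    return s;
}

/* a slot is usable only if its stored checksum matches a fresh one
 * computed over the payload actually read from disk; a mismatch means
 * rot or a stale write, so the reader falls through to the next slot */
static int slot_valid(const uint8_t *payload, size_t n, uint32_t stored) {
    return toy_sum(payload, n) == stored;
}
```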
case HN4_ERR_DATA_ROT: return 80;
case HN4_ERR_HEADER_ROT: return 80;
case HN4_ERR_PAYLOAD_ROT: return 80;
Yeah, good luck mounting that filesystem in production. You will need a lot of it...
/* LOGICAL CONSISTENCY (85-90) - TRANSACTION VIOLATIONS */
case HN4_ERR_GENERATION_SKEW: return 85;
case HN4_ERR_PHANTOM_BLOCK: return 82;