Seems to be nfs v3 [0] - curious to test it out - the only userspace nfsv4 implementation I’m aware of is in buildbarn (golang) [1]. The example of their nfs v3 implementation disables locking. Still pretty cool to see all the ways the rust ecosystem is empowering stuff like this.
I’m kinda surprised someone hasn’t integrated the buildbarn nfs v4 stuff into docker/podman - the virtiofs stuff is pretty bad on osx and the buildbarn nfs 4.0 stuff is a big improvement over nfs v3.
Anyhow I digress. Can’t wait to take it for a spin.
[0] https://github.com/Barre/zerofs_nfsserve
[1] https://github.com/buildbarn/bb-remote-execution/tree/master...
Seems like a really interesting project! I don't understand what's going on with latency vs durability here. The benchmarks [1] report ~1ms latency for sequential writes, but that's just not possible with S3. So presumably writes are not being confirmed to storage before confirming the write to the client.
What is the durability model? The docs don't talk about intermediate storage. SlateDB does confirm writes to S3 by default, but I assume that's not happening?
[1] https://www.zerofs.net/zerofs-vs-juicefs
Looking at those benchmarks, I think you must be using a local disk to sync writes before uploading to S3?
SlateDB offers different durability levels for writes. By default writes are buffered locally and flushed to S3 when the buffer is full or the client invokes flush().
https://slatedb.io/docs/design/writes/
The durability profile before sync should be pretty close to a local filesystem. There’s in-memory buffering on writes; data is synced when fsync is issued, when the in-memory threshold is exceeded, or when a timeout expires.
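For intuition, a minimal sketch of that write path (illustrative only; the struct, field names, and thresholds are made up, not ZeroFS's actual code): writes accumulate in memory, and data only reaches object storage on fsync, when the buffer passes a size threshold, or when a flush interval elapses.

```rust
use std::time::{Duration, Instant};

// Illustrative sketch: buffer writes in memory, persist to object storage on
// fsync, when the buffer grows past a threshold, or when a timeout elapses.
struct WriteBuffer {
    buf: Vec<u8>,
    max_bytes: usize,         // in-memory threshold
    flush_interval: Duration, // timeout-based flush
    last_flush: Instant,
}

impl WriteBuffer {
    fn write(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
        if self.buf.len() >= self.max_bytes || self.last_flush.elapsed() >= self.flush_interval {
            self.flush();
        }
    }

    // fsync is the client-requested durability point, as on a local filesystem.
    fn fsync(&mut self) {
        self.flush();
    }

    fn flush(&mut self) {
        // put_object(&self.buf); // placeholder: this is where data would reach S3
        self.buf.clear();
        self.last_flush = Instant::now();
    }
}
```

Under that model an un-synced write can be lost on a crash, which is roughly the contract a local filesystem's page cache gives you, so presumably the ~1ms figures measure the buffered path rather than an S3 round trip.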
I am in no way affiliated with JuiceFS, but I have done a lot of benchmarking and testing of it, and the numbers claimed here for JuiceFS are suspicious (basically 5 ops/second with everything mostly broken).
I have a JuiceFS setup and get operations in the 1000/s range; not sure how they got such low numbers.
JuiceFS also supports multiple concurrent clients, each making its own connection to the metadata and object storage, allowing near-instant synchronization and better performance, whereas this seems to rely on a single service holding the connection, with everyone connecting through it and no support for clustering.
I have no doubt that JuiceFS can perform “thousands of operations per second” across parallel clients. I don't think that's a useful benchmark because the use cases we are targeting are not embarrassingly parallel.
Using a bunch of clients on any system hides the latency profile. You could even get your "thousands of operations per second" on a system where any operation takes 10 seconds to complete.
The bench suite is published at the root of the repo in bench/. Calling these “a bunch of claims” seems a bit dismissive when it’s that easy to reproduce.
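To make the parallelism point concrete, a back-of-envelope sketch (numbers invented for the example, not taken from any benchmark): aggregate throughput scales with concurrency, so it says very little about per-operation latency.

```rust
fn main() {
    // Hypothetical numbers: every operation takes 10 seconds, but the
    // workload is embarrassingly parallel across many clients.
    let per_op_latency_s = 10.0;
    let concurrent_ops = 20_000.0;

    // Little's law: throughput = concurrency / latency.
    let throughput = concurrent_ops / per_op_latency_s;
    println!("{throughput} ops/s even though each op takes {per_op_latency_s} s");
    // Prints: 2000 ops/s even though each op takes 10 s
}
```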
JuiceFS maps operations 1:1 to S3 chunk-wise, so the performance profile is entirely unsurprising to me; even untarring an archive is not going to be a pleasant experience.
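If each file operation really does turn into one or more synchronous object-store round trips, the arithmetic for untarring many small files is rough; a sketch where the file count, requests per file, and latency are all assumed for illustration:

```rust
fn main() {
    // Assumed values: a tarball of many small files, a typical S3 round trip
    // in the tens of milliseconds, and a couple of requests per file.
    let files: u64 = 10_000;
    let requests_per_file: u64 = 2; // e.g. metadata + data, assumed
    let round_trip_ms: u64 = 80;

    let total_s = files * requests_per_file * round_trip_ms / 1000;
    println!("~{} s (~{} min) if nothing is batched or buffered", total_s, total_s / 60);
    // 10,000 files * 2 * 80 ms = ~1600 s, roughly 26 minutes
}
```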
I had to laugh out loud:
"In practice, you'll encounter other constraints well before these theoretical limits, such as S3 provider limits, performance considerations with billions of objects, or simply running out of money."
Incredibly cool! It shows running Ubuntu and Postgres, and also supports full posix operations.
Questions:
- I see there is a diagram running multiple Postgres nodes backed by this store, very similar to a horizontally distributed web server. Doesn't Postgres use WAL replication? Or is it disabled and are they running on the same "views" of the filesystem?
- What does this mean for services that handle geo-distribution at the app layer? e.g. CockroachDB?
Sorry if this sounds dumb.
Built atop the excellent SlateDB! Breaks files down into 256k chunks. Encrypted. Much, much better posix compatibility than most FUSE alternatives. SlateDB has snapshots & clones, so that could be another great superpower of ZeroFS.
Incredible performance figures, rocketing to probably the best way to use object storage in an fs-like way. There's a whole series of comparisons, & they probably need a logarithmic scale given the size of the lead SlateDB has! https://www.zerofs.net/zerofs-vs-juicefs
Speaks 9p, NFS, or NBD. Some great demos of ZFS with L2ARC caches giving near-local performance while having S3 persistence.
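On those 256k chunks: a hypothetical sketch of how a byte offset could map onto fixed-size chunk records (the key layout is invented for illustration, not ZeroFS's actual schema):

```rust
const CHUNK_SIZE: u64 = 256 * 1024; // 256 KiB, per the chunk size mentioned above

/// Map a byte offset within a file to (chunk index, offset inside that chunk).
fn chunk_for(offset: u64) -> (u64, u64) {
    (offset / CHUNK_SIZE, offset % CHUNK_SIZE)
}

fn main() {
    // A 1 MiB write starting at offset 300 KiB spans chunks 1 through 5, so it
    // touches five small records rather than rewriting one large object.
    let start = 300 * 1024u64;
    let end = start + 1024 * 1024 - 1;
    let (first, _) = chunk_for(start);
    let (last, _) = chunk_for(end);
    println!("chunks {first}..={last}");
}
```

Presumably that's also what keeps partial overwrites cheap, since only the touched chunk records need to be rewritten.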
Totally what I was thinking of when someone in the Immich thread mentioned wanting a way to run it on cheap object storage. https://news.ycombinator.com/item?id=45169036