Hellishly Slow Level 13 Deflate Compression

(kirill.korins.ky)

51 points | by zX41ZdbW 4 days ago

7 comments

  • pella 15 minutes ago
    OpenZL is the future: https://openzl.org/

      "OpenZL delivers high compression ratios while preserving high speed, a level of performance that is out of reach for generic compressors. OpenZL takes a description of your data and builds from it a specialized compressor optimized for your specific format."
  • jbosh 4 hours ago
    I love it. So much in computers is trade-offs and this was a fun read exploring it.

    It would be interesting to see the economics: how much storage or bandwidth would have to be saved for an 8,000% increase in encoding time to pay for itself? I also wonder how brotli/lzma would compare here. Do they have similarly obscene modes with similar results?
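
One rough way to frame the break-even question: the extra encoding cost is paid once, while the bandwidth saving recurs on every transfer. A minimal Python sketch of that arithmetic follows; the function name and all four parameters are hypothetical, and no real prices are implied.

```python
def break_even_transfers(extra_cpu_seconds, cpu_cost_per_second,
                         bytes_saved, cost_per_byte_transferred):
    """Return how many transfers it takes for the bandwidth saving
    to repay the one-time extra encoding cost (all inputs hypothetical)."""
    extra_cost = extra_cpu_seconds * cpu_cost_per_second
    saving_per_transfer = bytes_saved * cost_per_byte_transferred
    if saving_per_transfer <= 0:
        # No saving per transfer means the extra effort never pays off.
        return float("inf")
    return extra_cost / saving_per_transfer
```

Plugging in your own CPU and egress prices gives the number of downloads needed before the slower encoder starts to win. Storage cost could be folded in the same way as a per-month saving.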

    • Zenst 19 minutes ago
      Process-intensive, but higher compression has clear strategic value. Distant satellites such as Voyager, where bandwidth is severely limited, could transmit more data using such capabilities. Equally, for long-term archival storage, improved compression would allow far greater volumes of data to be preserved on durable, life-long media formats.
    • userbinator 2 hours ago
      "I also wonder how brotli/lzma would compare here."

      Far better, just like anything else based on arithmetic coding. The main distinction here is that the output can still be decompressed with a standard Inflate implementation.
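
For anyone who wants to try the comparison on their own data, here is a minimal sketch using only Python's standard library. It assumes `zlib` level 9 as the Deflate baseline (not the article's level 13) and `lzma` at its extreme preset. Brotli is left out because it is not in the standard library. The sample input is made up, and the sizes it produces say nothing about real-world data.

```python
import lzma
import zlib

def compare_sizes(data: bytes) -> dict:
    """Compress the same input with Deflate (zlib) and LZMA (xz container)."""
    deflated = zlib.compress(data, 9)
    xz = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
    # Sanity check: both streams must round-trip to the original bytes.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(xz) == data
    return {"raw": len(data), "deflate_9": len(deflated), "lzma_9e": len(xz)}

# Made-up, highly repetitive sample; substitute a real file's bytes.
sample = b"compression is a trade-off between time and size\n" * 5000
print(compare_sizes(sample))
```

On small inputs the xz container overhead can dominate, so the comparison only becomes meaningful on larger files.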

    • a_t48 3 hours ago
      zstd has higher level modes. Default is -3. I saw a good tradeoff between compression speed and ratio up to -9 or so. From -20 to -22 it will use much more memory and IIRC can have downstream effects on decompression speed. I'm using -9 for my container registry and plan to recompress at a higher level for commonly accessed base layers, as well as give customers a button that lets them pay a bit more to do it themselves.
      • loeg 2 hours ago
        To be a little pedantic, the usual zstd levels are positive integers (1-22 default 3). The negative integers denote "fast" modes with worse compression (there are only a few of these).
        • edflsafoiewq 38 minutes ago
          I think those are CLI options, not negative signs. I.e., you call zstd -3 for compression level 3.
        • a_t48 1 hour ago
          Whoops! You're right, and it's too late to edit.
  • jedbrooke 1 hour ago
    reminds me of the x264 “placebo” encoder setting

    https://trac.ffmpeg.org/wiki/Encode/H.264#FAQ

  • tobijdc 3 hours ago
    There is also zopfli and its descendant ECT, which allow for more extreme tradeoffs.
  • blobbers 1 hour ago
    As someone currently exploring grid searches of encoding + compressor combos, and currently looking at neural compressors that produce output almost half the size of a traditional compressor's yet go from the order of milliseconds to minutes to run in either direction, I appreciate a good compression post!
  • Someone 2 hours ago
    So, what’s the effect on memory usage?

    And for decompression, the effect on memory usage and timings?

    • lifthrasiir 1 hour ago
      For decompression, nothing changes because DEFLATE is asymmetric; the compressor can spend however much time it likes optimizing the compressed stream, independently of the decompressor.
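
The asymmetry is easy to see with Python's `zlib`: the compression level changes only how hard the encoder works, and the same `zlib.decompress` call reads every resulting stream. This is a sketch under the assumption that zlib's levels 1 to 9 stand in for the article's more extreme setting, which zlib itself does not offer. The helper name and sample data are made up.

```python
import zlib

def sizes_by_level(data: bytes) -> dict:
    """Compress at every zlib level and verify one decoder reads them all."""
    sizes = {}
    for level in range(1, 10):
        compressed = zlib.compress(data, level)
        # The decompressor takes no level argument; it just follows the stream.
        assert zlib.decompress(compressed) == data
        sizes[level] = len(compressed)
    return sizes

# Made-up repetitive sample; any bytes object works.
sample = b"the quick brown fox jumps over the lazy dog. " * 2000
print(sizes_by_level(sample))
```

The encoder's effort shows up only in the output size and encoding time; nothing about the level is needed on the decoding side.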
  • userbinator 2 hours ago
    It's interesting to see just how far Deflate can be taken, and to know that even after decades there is still some (admittedly tiny) room for improvement. Optimal LZ is well-known, and so is static Huffman, but their combination creates some additional inefficiencies (opportunities).

    ...and of course it's written by someone with a Russian name, and has that characteristic style common to many other articles about data compression.