Introducing S2

(s2.dev)

372 points | by brancz 206 days ago

42 comments

  • animex 205 days ago
    IANAL, but naming your product S2 and mentioning in the intro that AWS S3 is the tech you are enhancing is probably looking for a branding/copyright claim from Amazon. Same vertical & definitely will cause consumer confusion. I'm sure you've done the research about whether a trademark has been registered.

    https://tsdr.uspto.gov/#caseNumber=98324800&caseSearchType=U...

    • fcortes 205 days ago
      Fun fact: S2 and EC2 sound exactly the same in Spanish - both are "ese dos". Add that to EC2 and S3 already being confusing to tell apart by ear
      • the_gipsy 204 days ago
        Only for you lot of Thouth-american thpanish thpeakerth.
      • crowdyriver 205 days ago
        Not for non-Latin-American speakers.
    • volemo 205 days ago
      TBF, if I were building something with the goal of enhancing S3, I would call it S4.
    • fasteo 205 days ago
      At least cloudflare’s R2 has an argument for the naming (IBM vs HAL, A Space Odyssey)
    • kevingadd 205 days ago
      I'm not sure whether they consulted a bad trademark lawyer or didn't consult one at all, but it wouldn't have cost that much to do so. I say this having just recently started the process of filing a trademark - the cost is about the same as buying e.g. 's4.dev' according to the domain registry's website.

      Having to rebrand your product after launching is a lot more painful than doing it before launching.

    • ChicagoDave 205 days ago
      OR

      Amazon just builds the same thing, calls it S3 Streams, and doesn’t care about S2.

      Maybe they make a buyout offer.

      I highly doubt they would sue.

      • isubkhankulov 205 days ago
        Trademark law encourages companies to defend their marks. If they don't, they may lose the trademark. So Amazon has to write these guys a letter if it wants to defend the S3 trademark.
        • ChicagoDave 205 days ago
          Amazon might write a letter, but if the tech is solid, they’ll probably just work with them.
    • evertedsphere 205 days ago
      s3 (serverless stream store)
    • rsync 205 days ago
      What could possibly be better than being sued by Amazon for some nitpicky naming issue?

      That’s the kind of David vs. Goliath publicity one could only dream of …

      • blagie 205 days ago
        98% of the time, lawsuits are just a money pit. There is zero publicity. A tiny number go viral. I don't think this is likely to be one of those times.

        Most people would simply say "Amazon is right." Because Amazon is right. This is an intentional attempt to leverage their product branding to promote a new product. There is very little good here.

        If this were open-source, academic, non-profit, or something like that, perhaps. A small venture trying to commercialize some digital equivalent of Amazon's trade dress? I can't imagine anyone would care...

        Even those times when someone is 100% right, usually, there is zero publicity. Right or wrong, most times I've seen, the small guy would settle with the big guy with the deep legal pockets and move on because litigating is too expensive.

        In a situation like this one, your marketing spend / press coverage on the existing name is shot, links to your domain are shot, and perhaps you have an egg on your face, depending on how things play out.

    • pxtail 205 days ago
      Yep, letter S and a number is copyrighted, can't do that
      • Biganon 205 days ago
        1) we're talking about trademark law, not copyright law.

        2) the problem here is that they're in the same business segment, and explicitly reference S3.

      • paulddraper 205 days ago
        S3. But trademark law prevents subtle variations.

        E.g. creating a product called “Gooogle”

  • myflash13 206 days ago
    This is a really good idea, a beautiful API, and something that I would like to use for my projects. However, I have zero confidence that this startup would last very long in its current form. If it's successful, AWS will build a better and cheaper in-house version. It's just as likely to fail to get traction.

    If this had been released as a Papertrail-like end-user product with dashboards, etc., instead of a "cloud primitive" API so closely tied to AWS, it would make a lot more sense. Add the ability to bring my own S3-compatible backend (such as Digital Ocean Spaces), and boom, you have a fantastic, durable, cloud-agnostic product.

    • shikhar 206 days ago
      (Founder) We do intend to be multi-cloud, we are just starting with AWS. Our internal architecture is not tied to AWS; it's built on interfaces that we can implement for other cloud systems.
    • torginus 205 days ago
      It would be extra ironic if the whole thing already ran on top of AWS.

      There's no end to startups that can be described as existing-open-source-software as a service, marketed as a cheaper alternative to AWS offerings... who run on AWS.

    • qudat 206 days ago
      People keep making the same argument against Aptible (https://aptible.com) and it is still a very successful PaaS over a decade later.
      • joshstrange 205 days ago
        I had never heard of this company so I took a look, and the main pitch was compelling, and then I went to the pricing page and saw the pricing goes from $0 to $500 a month once you want to go to "production". I'm clearly not the target market, which explains why I've never heard of it.
        • paulddraper 205 days ago
          It’s popular for security sensitive (e.g. healthcare) stuff
    • gr__or 205 days ago
      If you do cloud infra stuff, AWS will try to undercut you on price but will never outdo you on D/UX. So I wouldn't let Beezus hold me back
    • Too 205 days ago
      They just did https://news.ycombinator.com/item?id=42211280 (Amazon S3 now supports the ability to append data to an object, 30 days ago). Azure has had the same with append blobs for a long time. It's still a bit more raw than S2, without the concept of a record. The step for a cloud provider to offer this natively is very small. And with the concept of a record, isn't this essentially a message queue, where the competitor space is equally big? Likewise if you look into log storage solutions.
      • shikhar 205 days ago
        (Founder) Both S3 Express _One Zone_ appends and Azure's append blobs charge the regular PUT price for appends. It may work for you, but probably not if you want to do smaller writes.

        Blob stores will also not let you do tailing reads, like you can with S2.

        In AWS, S2's Express storage class takes care of writing to a quorum of 3 zonal buckets for regional durability.

        I doubt object stores will go from operating at the level of blobs and byte ranges, to records and sequence numbers. But I could be wrong.
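
        To make the quorum idea concrete, a minimal sketch (hypothetical helper names, not our actual internals): PUT the chunk to all three zonal buckets concurrently, and acknowledge the append once a majority succeeds.

          use std::sync::mpsc;
          use std::thread;

          // Hypothetical zonal PUT; stands in for an object-store client call.
          fn put_zonal(_bucket: &str, _chunk: &[u8]) -> Result<(), String> {
              Ok(())
          }

          // PUT to all three zonal buckets concurrently; ack once 2 of 3 succeed.
          fn quorum_append(buckets: [&'static str; 3], chunk: Vec<u8>) -> bool {
              let (tx, rx) = mpsc::channel();
              for bucket in buckets {
                  let (tx, chunk) = (tx.clone(), chunk.clone());
                  thread::spawn(move || {
                      let _ = tx.send(put_zonal(bucket, &chunk).is_ok());
                  });
              }
              let mut oks = 0;
              for _ in 0..3 {
                  if let Ok(true) = rx.recv() {
                      oks += 1;
                      if oks >= 2 {
                          return true; // quorum reached: safe to ack the writer
                      }
                  }
              }
              false
          }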

    • throwaway519 205 days ago
      Amazon doesn't compete for price-sensitive product offerings.

      If anything, they normalise an expectation with a budget-aware base.

  • solatic 206 days ago
    Help me understand - you build on top of AWS, which charges $0.09/GB for egress to the Internet, yet you're charging $0.05/GB for egress to the Internet? Sounds like you're subsidizing egress from AWS? Or do you have access to non-public egress pricing?
    • shikhar 206 days ago
      (Founder) We are not charging in preview. At the scale where it matters, we will work it out. Definitely some assumptions in here.
      • bigstrat2003 205 days ago
        For what it's worth, there's zero chance I would do business with a company whose business plan is "we'll work it out". It gives one every reason to believe that in a couple years time you guys will either be out of business (because you didn't figure out the numbers to make a profit) or will pull the rug from under customers in the form of surprise price hikes. Obviously you have to do what you think is right, but I think that this approach is going to scare off a lot of customers for you.
        • shikhar 205 days ago
          (Founder) We are not charging during preview. If anything, I wanted to be transparent about our planned pricing. Our mission is to make streams a cloud storage primitive, and I worked backwards from there in terms of our costs and expected costs looking ahead once we can scale a bit - based on concrete data points about what kind of discounts can be unlocked. I realized it was premature based on the comments here, so the price for internet egress has been updated. Thank you for your feedback.
      • 8n4vidtmkvmk 206 days ago
        Just FYI, that doesn't give me confidence in the longevity of your service.
        • srik 206 days ago
          Cloud services sometimes offer giant discounts, and the receiving party isn't allowed to talk about it concretely, so that's probably what's happening here.
        • shikhar 206 days ago
          (Founder) I understand the concern. However, cloud discounts at scale can be very large, and we are going to share as much of it as we reasonably can.
          • everfrustrated 205 days ago
            Discounts require a multi-year commitment to a minimum (and increasing) spend. Generally you need to be either profitable or a well-funded startup to demonstrate why a vendor would trust your ability to pay (it's literally a debt on your books). How do they know you're good for it?

            Plus multi cloud means less scale and less marketing incentive (can't talk about you as a x cloud customer).

            I wish you the best, but would encourage you to not set your prices below your costs.

            • shikhar 205 days ago
              (Founder) Thank you for the advice. I hope we can offer better when the deals come into play, but for now setting our planned internet egress price to $0.08/GiB.
      • JoshTriplett 205 days ago
        Do you plan to charge differently for bandwidth depending on whether the customer is in AWS or not? Would be nice if you pass on the cost savings.
        • shikhar 205 days ago
          (Founder) Yes, we will charge less for private connectivity. Pricing is transparent https://s2.dev/pricing - free during preview.
          • CodesInChaos 205 days ago
            Doesn't AWS charge $0.01 intra region and $0.02 between regions, even without setting up private links? Can't you pass part of those savings (compared to the $0.05-$0.09 of egress) on? Or is it too difficult to detect if the remote IP qualifies?
            • shikhar 205 days ago
              (Founder) Unfortunately, if you access over a public IP, it is internet egress. Even if the client is in the same AWS cloud region. PrivateLink is the only option.
              • chicagobuss 203 days ago
                Peering is an option as well but it's a whole different ballgame of complexities to set up vs. private link
    • nfm 206 days ago
      List pricing is $0.05 per GB after 150TB and at high volume it’s cheaper than that
    • kondro 206 days ago
      They’re probably betting on most users being in AWS and only having to pay 1¢-2¢ transfer.
      • ckdarby 205 days ago
        They're also banking on scale to PPA with a specific amendment for egress.
    • amazingman 205 days ago
      Nobody with sufficient scale will be paying retail for data transfer.
    • CodesInChaos 205 days ago
      Looks like they changed it to $0.08/GB. Which loses them at most $300/month at 50TB, and makes money after that.
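
      Worked out, assuming AWS's published egress tiers ($0.09/GB for the first 10 TB, $0.085/GB for the next 40 TB, $0.07/GB for the next 100 TB):

        // Rough AWS list internet-egress cost, in dollars, for `gb` GB/month.
        fn aws_egress_cost(gb: f64) -> f64 {
            let t1 = gb.min(10_000.0);                     // first 10 TB @ $0.09
            let t2 = (gb - 10_000.0).clamp(0.0, 40_000.0); // next 40 TB @ $0.085
            let t3 = (gb - 50_000.0).max(0.0);             // next 100 TB @ $0.07
            t1 * 0.09 + t2 * 0.085 + t3 * 0.07
        }

        fn main() {
            let gb = 50_000.0;               // 50 TB/month
            let cost = aws_egress_cost(gb);  // $4,300
            let revenue = gb * 0.08;         // $4,000 at $0.08/GB
            println!("margin: ${}", revenue - cost); // -$300 at the worst point
        }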
    • MicolashKyoka 206 days ago
      Strat is likely just: get users, then offboard AWS if the product works.
      • shikhar 205 days ago
        (Founder) No, we want to be in the same cloud regions as customers.
  • masterj 206 days ago
    So is this basically WarpStream except providing a lower-level API instead of jumping straight to Kafka compatibility?

    An S3-level primitive API for streaming seems really valuable in the long-term if adopted

    • shikhar 206 days ago
      (Founder) That somewhat summarizes it, yes :) We take a different approach than WarpStream architecturally too, which allows us to offer much lower latencies. No disks in our system, either.
  • iambateman 206 days ago
    These folks knowingly chose to spend the rest of their careers explaining that they are not, in fact, S3.
    • shikhar 206 days ago
      (Founder) well 50% of our name is different
      • joenot443 206 days ago
        I like it. I see it as ostensibly a product for engineers, so when I see a name like S2 it's immediately clear that it's a product led and conceived by engineers.

        I also see that on your pricing page -

        "We are building the S3 experience for streaming data, and that includes pricing transparency"

        Love the simple and earnest copy. One can imagine what an LLM would cook up instead, I find the brevity way preferable.

        • shikhar 205 days ago
          (Founder) Thank you for the kind comment!

          Yes we are not trying to confuse S2 with S3, we just think S3 is the best damn serverless experience out there, and we aspire to that greatness. We borrowed the structure of that name to reflect that aspiration, as have other services inspired by S3 like Cloudflare's object store R2.

          • andreasmetsala 205 days ago
            I actually thought S2 is a Cloudflare service at first.
      • do_not_redeem 206 days ago
        You should have gone with S4 tbh. The suits love bigger numbers. Super Simple Stream Store.
        • phamilton 206 days ago
          http://www.supersimplestorageservice.com/ exists and calls itself S4. It's a decent gag and it immediately came to mind when I heard S2 vs S3.
        • shikhar 206 days ago
          (Founder) I have definitely received that advice before :) - to not seem like a regression from S3. But as an abbreviation for Stream Store, it made sense.
          • blipvert 206 days ago
            Why not just use SS? There can’t possibly be any negative connotations there.
            • HideousKojima 206 days ago
              You could even make the s look kind of like a lightning bolt to emphasize how fast it is
              • kdmtctl 206 days ago
                Quite dangerous. Will look almost like the Schutzstaffel runic insignia. I'd better avoid this resemblance.
                • MrDOS 205 days ago
                  thatsthejoke.jpg
            • debuggerpk 206 days ago
              SS .. as in nazi?
            • edoceo 206 days ago
              Reserved by GM for the Super Sport
              • ethbr1 206 days ago
                So that's why GM has been asking itself "Are we the baddies?" lately.
          • rswail 205 days ago
            S3++?
        • nsxwolf 206 days ago
          How do you store a stream? Don’t they just spray around the internet here and there, and if you don’t catch them in the moment, they’re just gone?
        • rswail 205 days ago
          Surely S3++? /s
        • smitty1e 206 days ago
          Disagree. You have a marketing opportunity for a hipster character named "Stu" to be the spokesman.
      • iambateman 206 days ago
        Props to you for having a sense of humor about it. :D

        If I could put in one request... a video which describes what it is and how to use it would make it easier for me to understand.

        • shikhar 205 days ago
          (Founder) Yes we should create a video, thanks for the feedback.

          In the meantime, check out this quickstart, which will have you streaming Star Wars with the S2 CLI and give you a pretty good sense of things: https://s2.dev/docs/quickstart#get-started-with-the-cli

          (You will have to apply to join the preview, but we are approving quickly)

      • CobrastanJorji 205 days ago
        You could say that. Or, in binary ASCII, you could say your name is 93.75% the same (only the last of 16 bits flips).
      • davej 206 days ago
        You're 66.66% (2/3) of the way there on the second character too. So I would say you're only 16.66% different across the two characters.
      • ozim 206 days ago
        I would look much more into Levenshtein distance ;) if I wanted to be smart-ass funny.
      • binary132 206 days ago
        You're 50% of the way closer to 1st!
    • jsheard 206 days ago
      How many of these letter-number storage services are there now? S3, B2, R2, S2...
      • thaumasiotes 206 days ago
        S3 isn't the name of the service - that's "Amazon Simple Storage Service". S3 is a nickname, short for "Simple Storage Service".
        • eterm 206 days ago
          Nickname implies it's unofficial, but S3 is very much the product name too:

          https://aws.amazon.com/s3/faqs/

          "Simple storage service" is used once. "S3" is used throughout.

        • hnlmorg 206 days ago
          While you’re technically correct, for all intents and purposes it is called S3 even by AWS themselves.
        • anal_reactor 206 days ago
          When I was a student we had a Facebook group to share information, and one angry guy ranted that the correct shortening of "Mathematical Analysis" is not, in fact, "anal", as we used to say
        • rswail 205 days ago
          and EC2 stands for "Elastic Compute Cloud", but no one remembers that.
    • rcpt 206 days ago
    • andrelaszlo 206 days ago
      Seems preferable to having to explain you're not a paramilitary organization responsible for unspeakable war crimes. Nothing funny about that.
    • OJFord 206 days ago
      Including potentially in court / to lawyers? IANAL, but isn't this just inviting Amazon to claim it's deliberately leveraging their 'S3' trademark and sowing confusion in order to lift their own brand? (Correctly, and even somewhat transparently in TFA, IMO.)
    • cchance 206 days ago
      My issue is that 2 < 3, and most people will just assume it's an older/shittier S3 lol
  • pram 206 days ago
    It looks neat but, no Java SDK? Every company I've personally worked at is deeply reliant on Spring or the vanilla clients to produce/consume to Kafka 90% of the time. This kind of precludes even a casual PoC.
    • infiniteregrets 206 days ago
      (S2 Team member) As we move forward, a Java/Kotlin and a Python SDK are on our list. There is a Rust SDK and a CLI available (https://s2.dev/docs/quickstart). Rust felt like a good starting point for us as our core service is also written in it.
      • mdaniel 205 days ago
        Merely as a "for your consideration," writing an SDK in a very, very different language can surface "rust-isms" in the way your API works that might not be obvious when using a homogeneous tech stack

        I think of that as the "Chinese wall" of shipping SDKs: can someone not familiar with your product use it effectively from a language you don't know?

  • karmakaze 206 days ago
    I do like this. The next part I'd like someone to build on top of this is applying the stream 'events' into a point-in-time queryable representation. Basically the other part to make it a Datomic. Probably better if it's a pattern or framework for making specific in-memory queryable data rather than a particular database. There's lots of ways this could work, like applying to a local SQLite, or basing on a MySQL binlog that can be applied to a local query instance and rewindable to specific points, or more application-specific apply/undo events to a local state.
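
    A minimal sketch of the pattern (types are hypothetical): fold the ordered stream records into queryable state, cutting off at a chosen sequence number for a point-in-time view. Rewinding is then just replaying with a smaller cutoff, and periodic snapshots would avoid replaying from zero.

      use std::collections::HashMap;

      // Hypothetical application event, deserialized from a stream record.
      enum Event {
          Set { key: String, value: String },
          Delete { key: String },
      }

      // Rebuild queryable state as of `upto_seq` by replaying the log, which
      // is assumed to be (sequence number, event) pairs in stream order.
      fn state_at(log: &[(u64, Event)], upto_seq: u64) -> HashMap<String, String> {
          let mut state = HashMap::new();
          for (seq, event) in log {
              if *seq > upto_seq {
                  break; // point-in-time cutoff
              }
              match event {
                  Event::Set { key, value } => {
                      state.insert(key.clone(), value.clone());
                  }
                  Event::Delete { key } => {
                      state.remove(key);
                  }
              }
          }
          state
      }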
  • jgraettinger1 205 days ago
    Roughly ten years ago, I started Gazette [0]. Gazette is in an architectural middle-ground between Kafka and WarpStream (and S2). It offers unbounded byte-oriented log streams which are backed by S3, but brokers use local scratch disks for initial replication / durability guarantees and to lower latency for appends and reads (p99 <5ms as opposed to >500ms), while guaranteeing all files make it to S3 with niceties like configurable target sizes / compression / latency bounds. Clients doing historical reads pull content directly from S3, and then switch to live tailing of very recent appends.

    Gazette started as an internal tool in my previous startup (AdTech related). When forming our current business, we very briefly considered offering it as a raw service [1] before moving on to a holistic data movement platform that uses Gazette as an internal detail [2].

    My feedback is: the market positioning for a service like this is extremely narrow. You basically have to make it API compatible with a thing that your target customer is already using so that trying it is zero friction (WarpStream nailed this), or you have to move further up the application stack and more directly address the problems your target customers are trying to solve (as we have). Good luck!

    [0]: https://gazette.readthedocs.io/en/latest/ [1]: https://news.ycombinator.com/item?id=21464300 [2]: https://estuary.dev

    • shikhar 205 days ago
      (S2 Founder) Congrats on the success with Estuary! You are not the first person to tell me there is no/tiny market for this. Clearly _you_ thought there was something to it, when you looked to HN for validation. We may do a lot more on top of S2, like offering Kafka compatibility, but the core primitive matters. I have wanted it. It gets reinvented in all kinds of contexts and reused sub-optimally in the form of systems that have lost their soul, and that was enough for me to have this conviction and become a founder.

      ED: I appreciate where you are coming from, and understand the challenges ahead. Thank you for the advice.

      • jgraettinger1 205 days ago
        The market is gobsmackingly huge, it's just the go-to-market entry points which are narrow.

        In my opinion, the key is to find a value prop and positioning which lets prospects try your service while spending a minimum of their own risk capital / reputation points within their own org.

        That makes it hard to go after core storage, because it's such a widely used, fundamental, and reliable part of most every company's infrastructure. You and I may agree that conventions of incremental files in S3 are a less-than-ideal primitive for representing streams, but plenty of companies are doing it this way just fine and don't feel that it's broken.

        WarpStream, on the other hand, leaned into the perceived complexity of running Kafka and the share of users who wanted a Kafka solution with the operational profile of using S3. Internal champions can sell trying their service because the prospect's existing thing is already understood to be a pain in the butt.

        For what it's worth, if I were entering the space anew today I'd be thinking carefully about the Iceberg standard and what I might be able to do with it.

        • shikhar 205 days ago
          Fair :) Yes, we are pretty hyped about the possibilities with Iceberg, especially now with S3 Table buckets.
  • Scaevolus 206 days ago
    This is a very useful service model, but I'm confused about the value proposition given how every write is persisted to S3 before being acknowledged.

    I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?

    AWS has shown their willingness to implement mostly-protocol compatible services (RDS -> Aurora), and I could see them doing the same with a Kafka reimplementation.

    • sensodine 206 days ago
      (S2 team member here)

      > I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?

      This is how it works essentially, yes. Architecting the system so that chunks that are written to object storage (before we acknowledge a write) are multi-tenant, and contain records from different streams, lets us write frequently while still targeting ideal (w/r/t price and performance) blob sizes for S3 standard and express puts respectively.
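
      As a rough sketch of the batching idea (names hypothetical, details simplified): appends from many streams accumulate in one buffer, which is flushed as a single object once it reaches a target blob size or a latency deadline passes.

        use std::time::{Duration, Instant};

        // One pending append from some stream, awaiting durability.
        struct PendingAppend {
            stream_id: u64,
            record: Vec<u8>,
        }

        // Multi-tenant chunk: records from many streams share one object PUT.
        struct ChunkBuilder {
            buf: Vec<PendingAppend>,
            bytes: usize,
            opened: Instant,
            target_bytes: usize, // ideal blob size for the storage class
            deadline: Duration,  // latency bound before flushing anyway
        }

        impl ChunkBuilder {
            // Returns true when the chunk should be PUT to object storage
            // (and the contained appends acknowledged).
            fn push(&mut self, a: PendingAppend) -> bool {
                self.bytes += a.record.len();
                self.buf.push(a);
                self.bytes >= self.target_bytes || self.opened.elapsed() >= self.deadline
            }
        }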

      • philjohn 206 days ago
        Wait, data from multiple tenants is stored in the same place? Do you have per-tenant encryption keys, or how else are you ensuring no bugs allow tenants to read others' data?
  • evantbyrne 205 days ago
    Seems like really cool tech. Such a bummer that it is not source-available. I might be a minority in this opinion, but I would absolutely consider commercial services where the core tech is all released under something like a FSL with fully supported self-hosting. Otherwise, the lock-in vs something like Kafka is hard to justify.
    • shikhar 205 days ago
      (Founder) We are happy for the S2 API to have alternate implementations; we are considering an in-memory emulator to open-source ourselves. It is not a very complicated API. If you would prefer to stick with the Kafka API but benefit from features like S2's storage classes, or having a very large number of topics/partitions, or high throughput per partition, we are planning an open source Kafka compatibility layer that can be self-hosted, with features like client-side encryption so you can have even more peace of mind.
      • rswail 205 days ago
        Having a Kafka-compatible API and S3 storage would be something I would jump to; the savings over MSK would be huge.

        If you had a (paid for) API that sat on top of an S3 API for on-prem, that would be fantastic as well.

        Kafka is great, but the whole Java ecosystem, the lack of control of what is in the topics, and the stuff about coordinating the cluster in ZooKeeper is a management PITA.

        • emgeee 205 days ago
          Check out WarpStream (recently acquired by Confluent).
      • CodesInChaos 205 days ago
        > we are considering an in-memory emulator to open source ourselves

        I'd suggest a persistent emulator, using something like SQLite (one row per record). Even for local development, many applications need persistence. It'd even be enough to run a single-node, low-throughput production server which doesn't need robust durability and availability. But it still has enough overhead and limitations not to compete with your cloud offering.

        What's important, however, is being as close as possible to your production system, behavior-wise. So I'd try to share as much of the frontend code (e.g. the gRPC and REST handlers) as possible between these.
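
        A minimal sketch of what I mean, using rusqlite and a hypothetical one-table schema (sequence numbers are global here for simplicity; a real emulator would track them per stream):

          use rusqlite::{params, Connection, Result};

          // Append one record; SQLite's rowid doubles as the sequence number.
          fn append(conn: &Connection, stream: &str, body: &[u8]) -> Result<i64> {
              conn.execute(
                  "INSERT INTO records (stream, body) VALUES (?1, ?2)",
                  params![stream, body],
              )?;
              Ok(conn.last_insert_rowid())
          }

          fn main() -> Result<()> {
              let conn = Connection::open("emulator.db")?; // persistent, unlike :memory:
              conn.execute_batch(
                  "CREATE TABLE IF NOT EXISTS records (
                       seq    INTEGER PRIMARY KEY AUTOINCREMENT,
                       stream TEXT NOT NULL,
                       body   BLOB NOT NULL
                   );",
              )?;
              let seq = append(&conn, "my-stream", b"hello")?;
              println!("appended at seq {seq}");
              Ok(())
          }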

      • evantbyrne 205 days ago
        First-class kafka compatibility could go a long way to making it a justifiable tech choice. When orgs go heavy on event streaming, that code gets _everywhere_, so a vendor off-ramp is needed.
        • shikhar 205 days ago
          (Founder) That makes sense. We would eventually host the Kafka layer too - and will be able to avoid a hop by inlining our edge service logic in there.
  • throwawayian 205 days ago
    I look at the egress costs to the internet and it doesn't check out. It's a premium product dependent on DX, marketed to funded startups.

    But if I care about ingress and egress costs, which many stream-heavy infrastructure providers do... this doesn't add up.

    I wish them luck, but I feel they would have had a much better chance from the start by getting some funding and having a loss leader start, then organising and passing on wholesale rates from cloud providers once they’d reached critical mass.

    Instead they’re going in at retail which is very spicy. I feel like someone will clone the tech and let you self host, before big players copy it natively.

    It’s a commodity space and they’re starting with a moat of a very busy 2 weeks from some Staff engineers at AWS.

    • shikhar 205 days ago
      (Founder) Thanks for sharing your thoughts. We are early and figuring things out. I agree egress cost is going to be a big concern. We want to do the best we can for users as we unlock some scale. During preview, we are focused on getting feedback so the service is free (we will need to talk if the usage is significant though).
  • h05sz487b 206 days ago
    Just you wait, I am launching S1 next year!
    • graypegg 206 days ago
      Ok good, my startup S½ (also known as Ç) is still unique, phew
  • Lucasoato 205 days ago
    Wow, imagine Debezium offering native compatibility with this, capturing the changes from a Postgres database, saving them as Delta or Iceberg in a pure serverless way!
  • bushido 206 days ago
    I wish more dev-tools startups would focus on clearly explaining the business use cases, targeting a slightly broader audience beyond highly technical users. I visited several pages on the site before eventually giving up.

    I can sort of grasp what the S2 team is aiming to achieve, but it feels like I’m forced to perform unnecessary mental gymnastics to connect their platform with the specific problems it can solve for a business or product team.

    I consider myself fairly technical and familiar with many of the underlying concepts, but I still couldn’t work out the practical utility without significant effort.

    It’s worth noting that much of technology adoption is driven by technical product managers and similar stakeholders. However, I feel this critical audience is often overlooked in the messaging and positioning of developer tools like this.

    • shikhar 206 days ago
      (Founder) Appreciate the feedback. We will try to do a better job on the messaging. It is geared at being a building block for data systems. The landing page has a section talking about some of the patterns it enables (Decouple / Buffer / Journal) in a serverless manner, with example use cases. It just may not be something that resonates with you though! We are interested in adoption by developers for now.
      • jcrites 206 days ago
        I think they're saying that you should provide some example use-cases for how someone would use your service. High-level use-cases that involve solving problems for a business.

        For what it's worth, I am already familiar with this design space well enough that I don't need this kind of example in order to understand it. I've worked with Kinesis and other streaming systems before. But for people who haven't, an example might help.

        What kind of business problem would someone have that causes them to turn to your service? What are the alternative solutions they might consider and how do those compare to yours? That's the kind of info they're asking for. You might benefit from pitching this such that people will understand it who have never considered streaming solutions before and don't understand the benefits. Pitch it to people who don't even realize they need this.

        • shikhar 205 days ago
          (Founder) Yes I understand, and this could definitely do with work. I struggle with it personally because it is so obvious to me. I don't even know where to start. How do you pitch use cases for object storage? Stream storage feels just as universal to me.
    • 8n4vidtmkvmk 206 days ago
      If you ever figure it out, LMK. I don't think I've ever looked at logs more than about 24 hours old. Persistence and durability is not something I care about.

      Errors, OTOH, I need a week or two of. But I consider these 2 different things. Logs are kind of a last resort when you really can't figure out what's going on in prod.

      • CodesInChaos 205 days ago
        Here "log" means "append-only stream of small records". This isn't just about traditional logs (including http request logs and error logs). You could use it to store events for an event-sourced application, and even as the write-ahead log (WAL) for a database.

        A distributed, but still consistent and durable log is a great building block for higher level abstractions.

        • 8n4vidtmkvmk 205 days ago
          That makes more sense. I suppose an audit log would also fit. I guess append-only backups wouldn't fit the "small" requirement though.
          • CodesInChaos 205 days ago
            "Small" means 1MiB per record here. But a higher level abstraction could split one logical operation into multiple records. Just like FoundationDB has severe limits on its transaction size, while higher level databases built on top of it work around that limit.

            The OP's blog post linked to this article, which explains some scenarios where this storage primitive would be helpful: https://engineering.linkedin.com/distributed-systems/log-wha...

            This product offers two advantages over S3: 1) Appending a small amount of data is cheap. 2) Writes are forced into a consistent order (so you don't need to implement Paxos or Raft yourself). Neither of these is useful for backups. Raw S3 already works well for that use case, especially now that Amazon added support for preconditions.

    • rswail 205 days ago
      "Replace our MSK clusters and EBS storage with S3 storage costs."
  • CodesInChaos 205 days ago
    1. Do you support compression for data stored in segments?

    2. Does the choice of storage class only affect chunks or also segments?

    To me the best solution seems like storing writes on EBS (or even NVMe) initially to minimize the time until writes can be acknowledged, combined with creating a chunk on S3 standard every second or so. But I assume that would require significant engineering effort for applications that require data to be replicated to several AZs before acknowledging them. Though some applications might be willing to sacrifice 1s of writes on node failure, in exchange for cheap and fast writes.

    3. You could be clearer about what "latency" means. I see at least three different latencies that could be important to different applications:

    a) time until a write is durably stored and acknowledged

    b) time until a tailing reader sees a write

    c) time to first byte after a read request for old data

    4. How do you handle streams which are rarely written to? Will newly appended records to those streams remain in chunks indefinitely? Or do you create tiny segments? Or replace an existing segment with the concatenated data?

    • shikhar 205 days ago
      (Founder) Thanks for the deep questions!

      1) Storage is priced on uncompressed data. We don't currently compress segments.

      2) It only affects chunk storage. We do have a 'Native' chunk store in mind, the sketch involves introducing NVMe disks (as a separate service the core depends on) - so we can offer under 5 millisecond end-to-end tail latencies.

      3) The append ack latency and end-to-end latency with a tailing reader is largely equivalent for us since latest writes are in memory for a brief period after acknowledgment. If you try the CLI ping command (see GIF on landing page) from the same cloud region as us (AWS us-east-1 only currently), you'll see end-to-end and append ack latency as basically the same. TTFB for older data is ~ TTFB to get a segment data range from object storage, so it can be a few hundred milliseconds.

      4) We have a deadline to free chunks, so we PUT a tiny segment if we have to.

    • jgraettinger1 205 days ago
      > To me the best solution seem like combining storing writes on EBS (or even NVMe) initially to minimize the time until writes can be acknowledged, and creating a chunk on S3 standard every second or so.

      Yep, this is approximately Gazette's architecture (https://github.com/gazette/core). It buys the latency profile of flash storage, with the unbounded storage and durability of S3.

      An addendum is there's no need to flush to S3 quite that frequently, if readers instead tail ACK'd content from local disk. Another neat thing you can do is hand bulk historical readers pre-signed URLs to files in cloud storage, so those bytes don't need to proxy through brokers.

  • johnrob 206 days ago
    This is a very interesting abstraction (and service). I can’t help but feature creep and ask for something like Athena, which runs PrestoDB (map reduce) over S3 files. It could be superior in theory because anyone using that pattern must shoehorn their data stream (almost everything is really a stream) into an S3 file system. Fragmentation and file packing become requirements that degrade transactional qualities.
    • shikhar 205 days ago
      (Founder) There are definitely some interesting possibilities. Pretty hyped about S3 Table (Iceberg) buckets. An S2 stream can buffer small writes so you can flush decent-sized Parquet into the table, and avoid compaction costs.
  • bdcravens 206 days ago
    My first thought: "introducing? The S2 has been out for a while!"

    https://www.sunlu.com/products/new-version-sunlu-filadryer-s...

  • nextworddev 205 days ago
    This is cool but I think it overlaps too much with something like Kinesis Data Streams from AWS which has been around for a long time. It’s good that AWS has some competition though
    • shikhar 205 days ago
      (Founder) We plan to be multi-cloud over time. Kinesis has a pretty low ordered-throughput limit (i.e. at the level of a stream shard) of 1 MBps, if you need higher than that. S2 will be cheaper and faster than Kinesis with the Express storage class. S2 also has a more serverless pricing model - closer to S3 - than paying for stream shard hours.
      • nextworddev 205 days ago
        Thanks. You are right about those points. One thing to probably consider is whether serverless provides enough cost savings for most streaming ingest use cases, which need static provisioning since ingest volumes are unpredictable. Better messaging would be that your serverless model can handle bursts well. (For context: I used to sell KDA and KDS at AWS as part of AI solutions.)
  • jcmfernandes 206 days ago
    In the long-term, how different do you want to be from Apache Pulsar? At the moment, many differences are obvious, e.g., Pulsar offers transactions, queues and durable timers.
    • shikhar 206 days ago
      (Founder) We want S2 to be focussed on the stream primitive (log if you prefer). There is a lot that can be built on top, which we mostly want to do as open source layers. For example, Kafka compatibility, or queue semantics.
  • behnamoh 206 days ago
    so the naming convention for 2024-25 products seems to be <letter><number>.

    o1, o3, s2, M4, r2, ...

  • bawolff 206 days ago
    In terms of a pitch, I'm not sure I understand how this differs from existing solutions. Is the core value proposition a simpler API?
    • shikhar 206 days ago
      (Founder) Besides simple API,

      - Unlimited streams. Current cloud systems limit you to a few thousand; with dedicated clusters, a few hundred K? If you want a stream per user, you are now dealing with multiple clusters.

      - Elastic throughput per stream (i.e. a partition in Kafka) to 125 MiBps append / 500 MiBps realtime read / unlimited in aggregate for catching up. Current systems will have you at tens. And we may grow that limit yet. We are able to live migrate streams in milliseconds while keeping pipelined writes flowing, which gives us a lot of flexibility.

      - Concurrency control mechanisms (https://s2.dev/docs/stream#concurrency-control)
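
      As a sketch of how that kind of control is typically used (field and error names here are illustrative, not our exact API): a writer supplies a fencing token and/or an expected next sequence number, and the append is rejected if either no longer matches.

        // Illustrative append request with the two common guards: a fencing
        // token (rejects zombie writers after a new leader takes over) and an
        // expected next sequence number (optimistic concurrency control).
        struct AppendInput {
            records: Vec<Vec<u8>>,
            fencing_token: Option<Vec<u8>>,
            match_seq_num: Option<u64>,
        }

        enum AppendError {
            FencingTokenMismatch,
            SeqNumMismatch,
        }

        fn try_append(
            stream_tail: &mut u64,
            current_token: &Option<Vec<u8>>,
            input: AppendInput,
        ) -> Result<u64, AppendError> {
            if let Some(tok) = &input.fencing_token {
                if current_token.as_ref() != Some(tok) {
                    return Err(AppendError::FencingTokenMismatch);
                }
            }
            if let Some(expected) = input.match_seq_num {
                if expected != *stream_tail {
                    return Err(AppendError::SeqNumMismatch); // someone else wrote first
                }
            }
            *stream_tail += input.records.len() as u64;
            Ok(*stream_tail) // new tail; the records now have a total order
        }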

      • shikhar 205 days ago
        Forgot to mention storage classes to tune your latency vs cost tradeoff. That you can even reconfigure - soon we will make that a live migration.
  • adverbly 206 days ago
    Seems really good for IoT no? Been a while since I worked in that space, but having something like this would have been nice at the time.
    • shikhar 206 days ago
      (Founder) so many possibilities! That's what I love about building building blocks. I think we will create an open source layer for an IoT protocol over time (unless the community gets to it first), e.g. MQTT. I have to admit I don't know too much about the space.
  • cultofmetatron 205 days ago
    I had an idea like this a few years ago: basically exposing a stream interface to a cloud-based fs to enable random-access seeking on byte streams. I envisioned it being useful for things like loading large files. Would be amazing for enabling things like cloud gaming, image processing, and CAD.

    Kudos for sitting down and making it happen!

  • siliconc0w 205 days ago
    Definitely a useful API but not super compelling until I could store the data in my own bucket
  • ComputerGuru 206 days ago
    So is this a "serverless" named-pipe-as-a-service cloud offering? Or am I misreading?
    • 38 206 days ago
      Yep. Just tack "serverless" onto something that already exists and charge for it
      • shikhar 206 days ago
        (Founder) A named pipe that operates at the level of records, is regionally durable, lets you read from any sequence number, and lets you do concurrency control for writes if you need to.
  • nyclounge 205 days ago
    How does this compare to https://github.com/deuxfleurs-org/garage ?

    Seems like there are a lot more lightweight self-hosted S3 implementations around nowadays. Why even use S3?

  • unsnap_biceps 205 days ago
    I really liked the landing page and the service, but it took me a while to realize it wasn't an AWS service with a snazzy landing page.
  • dragonwriter 205 days ago
    Apparently this is “S2, a new S3 competitor” not “S2, the spatial index system based on hierarchical quadrilaterals”.
  • zffr 205 days ago
    How does this compare to Kafka? Is the primary difference that this is a hosted solution?
  • tdba 206 days ago
    Is it possible to bring my own cloud account to provide the underlying S3 storage?
    • shikhar 206 days ago
      (Founder) Not currently! We want to explore this.
  • rswail 205 days ago
    Really interesting service and bookmarked.

    I'd really love to see this extend more into the event sourcing space, not just the log/event streaming space.

    Dealing with problems like replay and log compaction etc.

    Plus things like dealing with old events. Under GDPR, removing personal information / isolating it from the data/events themselves in an event-sourced system is a PITA.

    • shikhar 205 days ago
      (Founder) An S2 stream is a durable log and can be replayed! We do want to add compaction support. Event sourcing is a great use case for S2.
  • kdazzle 206 days ago
    Would this be like an alternative to Delta? Am I thinking about that right?
  • nikolay 204 days ago
    Pretty bad branding! It should have at least been S4!
  • BaculumMeumEst 206 days ago
    S2 is, in my opinion, the sweet spot of PRS's lineup.
    • veqq 194 days ago
      Related to an old comment of yours:

      > I also kind of strongly dislike HtDP.

      I'm researching programming pedagogy and I'm curious about your thoughts on this.

  • ThinkBeat 205 days ago
    This would sell much better if it was S5 or S6, a next-level thing.

    Wow man, are you still stuck on S3?

  • locusofself 205 days ago
    "Making the world a better place through streamable, appendable object streams"
  • somerando7 205 days ago
    Scribe aaS? ;)
  • aorloff 205 days ago
    Kafka as a service ?
    • shikhar 205 days ago
      (Founder) Nope! We have a FAQ for this ;)
  • ms7892 206 days ago
    Can someone tell me what this does? And why it's better?
    • shikhar 206 days ago
      (Founder) There is a table on the landing page https://s2.dev/ which hopefully gives a nice overview :) It's like S3, but for streams. Cheap appends, and instead of dealing with blocks of data and byte ranges, you work with records. S2 takes care of ordering records, and letting you read from anywhere in the stream.

      This is an alternative to systems like Kafka which don't do great at giving a serverless experience.
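
      In code, the shape of the model is roughly this (an in-memory mock for illustration, not our actual SDK surface):

        // A record-level log: appends return a sequence number, reads take one.
        struct MockStream {
            records: Vec<Vec<u8>>,
        }

        impl MockStream {
            fn append(&mut self, body: Vec<u8>) -> u64 {
                self.records.push(body);        // the service assigns the order
                (self.records.len() - 1) as u64 // sequence number of this record
            }

            fn read_from(&self, seq: u64) -> Vec<Vec<u8>> {
                self.records[seq as usize..].to_vec()
            }
        }

        fn main() {
            let mut s = MockStream { records: Vec::new() };
            let seq = s.append(b"user-42 clicked checkout".to_vec()); // cheap append
            let tail = s.read_from(seq); // read from any sequence number
            assert_eq!(tail[0], b"user-42 clicked checkout");
        }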

      • stogot 206 days ago
        Could you clarify the Kafka difference further?

        Or more generally, when is it better to choose S2 vs services like SQS or Kinesis?

        S2 sounds like an ordered queue to me, but those exist?

        • shikhar 206 days ago
          (Founder here) Managed cloud offerings for streaming limit ordered throughput pretty low, e.g. Kinesis at 1 MiBps, Redpanda serverless at 1 MiBps, Confluent's even higher-end clusters at 10-20 MiBps IIRC. If you really need ordering, this can indeed be a limit. S2 lets you push 125 MiBps currently, and we may grow that.

          Another factor is how many ordered streams you can have. Typically a few thousand at most with those systems. We take the serverless spirit of S3 here: when did you ever have to worry about the number of objects in a bucket?

          We are also able to offer latency comparable to disk-based streaming like Confluent's Kora and Kinesis, with our Express storage class (under 50 milliseconds end-to-end latency for client in the same cloud region) - while being backed by S3 with regional durability! Not a disk in the system.

          We want people to be able to build safe distributed data systems on top of S2, so we also allow concurrency control mechanisms on the stream like fencing. Kafka or Kinesis won't let you do that. This is the approach AWS takes internally (https://brooker.co.za/blog/2024/04/25/memorydb.html), but they don't have that as a service. We want to democratize the pattern.

          ED: on throughputs, to clarify, I am talking about ordered throughput, i.e. per Kafka partition or Kinesis shard. WarpStream also does well here because of their architectural approach to separate ordering, but at a latency cost.

          • alwa 206 days ago
            Between your site copy and your early comments on this thread, it was this rundown that made the product click in my mind.

            I’m sure that in this early preview you’re trying to reach mainly devs with existing domain expertise, but the way that, in this comment, you laid out existing constraints and what possibilities might lie beyond them—it really helped me situate your S2 product in the constellation of cloud primitives.

            Just wanted to offer that feedback in the hope that the spirit of your comment here doesn’t get buried down-thread!

            • shikhar 206 days ago
              thank you for the feedback!
          • yandie 205 days ago
            Hey congrats! Looks like a really cool idea.

            Looks like you're pushing the throughput angle - that could be important, but IMO it's not often you come across devs who need this level of throughput without dealing with a large-scale problem. My feedback is that the lack of per-tenant encryption is a big deal-breaker here, since you're mixing up data from multiple tenants within one object.

            Plus, your security section says very little about how you prevent cross-tenant data contamination - that's probably the first thing that popped into my mind when I read about your data model. It makes me extremely uneasy - I can't imagine adopting this for anything serious. I would encourage you to think about how you can communicate that angle to customers as well, besides supporting per-tenant encryption keys.

            • shikhar 205 days ago
              (Founder) It's a number of dimensions. I get excited about the ordered throughput angle because I have personally cared about this in the past, and yeah a lot of folks may not need that :)

              Simple API, reasonable pricing, latency flexibility, unlimited streams, _and_ elastic to high throughputs. All adding up to a great serverless experience.

              Re: the data colocation. This is how most multi-tenant systems - including S3 itself AFAIU - operate. I understand there is a difference in the level of trust vs a cloud provider, and the best we can do here while delivering a serverless experience is encrypting every single record at the edge of S2, where it transits in or out, with a tenant-specific key. We may even allow specifying it as part of the request, if clients want to manage the key themselves.

              The best data security when leveraging any multi-tenant service is going to be client-side encryption, and we also want to make this super easy. With our planned Kafka layer, we plan on client-side encryption as a value add.

              • shikhar 204 days ago
                I failed to mention that we do want to support single-tenant cells for customers that need isolation.
          • shikhar 206 days ago
            @agallego Yes in aggregate both Confluent and Redpanda can push GiBps throughputs, and I know Redpanda has amazing perf. I was referring to Redpanda Serverless :) And per-partition i.e. ordered throughput.

            ED: for some reason I wasn't seeing the reply link before on your comment, do see it now.

            • agallego 206 days ago
              cool cool, right on.
          • agallego 206 days ago
            Redpanda cloud doesn’t limit tput. Most ppl get a bigger discount at high volumes. We have customers in 10s of GB/s. Confluent has those volumes too.
    • alanfranz 206 days ago
      Sort of serverless Kafka, which natively uses object storage and promises better latencies than things like WarpStream.
      • ivankelly 206 days ago
        An interesting difference is the ability to have exclusive write access on the log (the fencing token). This allows you to use the logs as write-ahead logs.
    • moralestapia 206 days ago
      It's a message queue on the cloud.

      https://chatgpt.com/c/676703d4-7bc8-8003-9e5d-d6a402050439

      Edit: Keep downvoting, only 5.6k to go!

  • durkie 206 days ago
    [flagged]
    • shikhar 206 days ago
      Indeed... we sure wish we could have nabbed that crate name, but it was not to be. Our Rust SDK is here https://lib.rs/crates/streamstore
      • ISV_Damocles 206 days ago
        Replying to this one since you apparently can't reply to a comment that has been flagged. Why was the grandparent flagged? Google's S2 library has been around for more than a decade and is the first thing I think of when I see "S2" in a tech stack.

        And the flippant response from the parent here that they don't really care that they're muddying the waters and just want the crate name is irksome.

  • revskill 206 days ago
    Serverless pricing to me is exactly like ETH gas pricing!