IANAL, but naming your product S2 and mentioning in the intro that AWS S3 is the tech you are enhancing is probably inviting a branding/trademark claim from Amazon. Same vertical & definitely will cause consumer confusion. I'm sure you've done the research about whether a trademark has been registered.
It's like S3, except better because, by focusing on being a write-only data store, they can manage much more throughput and efficiency, plus your data is far more secure at rest than it is in S3.
I'm not sure whether they consulted a bad trademark lawyer or didn't consult one at all, but it wouldn't have cost that much to do so. I say this having just recently started the process of filing a trademark - the cost is about the same as buying, e.g., 's4.dev' according to the domain registry's website.
Having to rebrand your product after launching is a lot more painful than doing it before launching.
98% of the time, lawsuits are just a money pit. There is zero publicity. A tiny number go viral. I don't think this is likely to be one of those times.
Most people would simply say "Amazon is right." Because Amazon is right. This is an intentional attempt to leverage their product branding to promote a new product. There is very little good here.
If this were open-source, academic, non-profit, or something like that, perhaps. A small venture trying to commercialize on some digital equivalent of Amazon's trade dress? I can't imagine anyone would care....
Even those times when someone is 100% right, usually, there is zero publicity. Right or wrong, most times I've seen, the small guy would settle with the big guy with the deep legal pockets and move on because litigating is too expensive.
In a situation like this one, your marketing spend / press coverage on the existing name is shot, links to your domain are shot, and perhaps you have egg on your face, depending on how things play out.
This is a really good idea, beautiful API, and something that I would like to use for my projects. However I have zero confidence that this startup would last very long in its current form. If it's successful, AWS will build a better and cheaper in-house version. It's just as likely to fail to get traction.
If this had been released instead as a Papertrail-like end-user product with dashboards, etc. instead of a "cloud primitive" API so closely tied to AWS, it would make a lot more sense. Add the ability to bring my own S3-Compatible backend (such as Digital Ocean Spaces), and boom, you have a fantastic, durable, cloud-agnostic product.
(Founder) We do intend to be multi-cloud, we are just starting with AWS. Our internal architecture is not tied to AWS; it uses interfaces that we can implement for other cloud systems.
Help me understand - you build on top of AWS, which charges $0.09/GB for egress to the Internet, yet you're charging $0.05/GB for egress to the Internet? Sounds like you're subsidizing egress from AWS? Or do you have access to non-public egress pricing?
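To make the arithmetic behind this question concrete, here's a back-of-the-envelope sketch. The two prices are the ones quoted in the thread (AWS's public $0.09/GB internet egress rate and S2's originally planned $0.05/GB); everything else is just arithmetic, not any non-public pricing.

```python
# Unit economics of the egress question: reselling at $0.05/GB what
# costs $0.09/GB at the public rate is a subsidy unless a discount
# closes the gap.

AWS_EGRESS_COST = 0.09   # $/GB, AWS public internet-egress rate
S2_EGRESS_PRICE = 0.05   # $/GB, S2's originally planned price

margin_per_gb = S2_EGRESS_PRICE - AWS_EGRESS_COST
print(f"margin per GB: {margin_per_gb:+.2f}")   # negative: a subsidy at list price

# The egress discount AWS would have to grant just to break even:
required_discount = 1 - S2_EGRESS_PRICE / AWS_EGRESS_COST
print(f"break-even discount: {required_discount:.0%}")
```

So at list prices they would lose $0.04 on every GB, and would need roughly a 44% egress discount just to break even - which is why the thread immediately turns to private discount agreements.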
For what it's worth, there's zero chance I would do business with a company whose business plan is "we'll work it out". It gives one every reason to believe that in a couple years time you guys will either be out of business (because you didn't figure out the numbers to make a profit) or will pull the rug from under customers in the form of surprise price hikes. Obviously you have to do what you think is right, but I think that this approach is going to scare off a lot of customers for you.
(Founder) We are not charging during preview. If anything, I wanted to be transparent about our planned pricing. Our mission is to make streams a cloud storage primitive, and I worked backwards from there in terms of our costs and expected costs looking ahead once we can scale a bit - based on concrete data points about what kind of discounts can be unlocked. I realized it was premature based on the comments here, so the price for internet egress has been updated. Thank you for your feedback.
Cloud services offer giant discounts sometimes and the receiving party isn't allowed to talk about it concretely, so that's probably what's happening here.
Discounts require a multi-year commitment to a minimum (and increasing) spend.
Generally you need to be either profitable or a well funded startup to demonstrate why a vendor would trust your ability to pay (it's literally a debt on your books). How do they know you're good for it?
Plus multi-cloud means less scale and less marketing incentive (they can't talk about you as an X-cloud customer).
I wish you the best, but would encourage you to not set your prices below your costs.
(Founder) Thank you for the advice. I hope we can offer better when the deals come into play, but for now setting our planned internet egress price to $0.08/GiB.
(Founder) That about summarizes it, yes :) We take a different approach than WarpStream architecturally too, which allows us to offer much lower latencies. No disks in our system, either.
I like it. I see it as ostensibly a product for engineers and so when I see a name like S2 it's immediately clear that it's a product led and conceived by engineers.
I also see that on your pricing page -
"We are building the S3 experience for streaming data, and that includes pricing transparency"
Love the simple and earnest copy. One can imagine what an LLM would cook up instead, I find the brevity way preferable.
Yes we are not trying to confuse S2 with S3, we just think S3 is the best damn serverless experience out there, and we aspire to that greatness. We borrowed the structure of that name to reflect that aspiration, as have other services inspired by S3 like Cloudflare's object store R2.
We say stream because we would rather not be confused with "logs" as in application logs, but rather associate with the world of streaming data where this primitive is very relevant. We don't mean stream as in a TCP stream or live stream.
(Founder) I have definitely received that advice before :) - to not seem like a regression from S3. But as an abbreviation for Stream Store, it made sense.
When I was a student we had a Facebook group to share information, and one angry guy ranted that the correct shortening of "Mathematical Analysis" is not, in fact, "anal", as we used to say.
Including potentially in court / to lawyers? IANAL, but isn't this just inviting Amazon to claim it's deliberately leveraging their 'S3' trademark and sowing confusion in order to lift their own brand? (Correctly, and even somewhat transparently in TFA, IMO.)
I do like this. The next part I'd like someone to build on top of this is applying the stream 'events' into a point-in-time queryable representation. Basically the other part to make it a Datomic. Probably better if it's a pattern or framework for making specific in-memory queryable data rather than a particular database. There are lots of ways this could work, like applying to a local SQLite, or basing on a MySQL binlog that can be applied to a local query instance and rewindable to specific points, or more application-specific apply/undo events to a local state.
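The pattern being asked for here can be sketched in a few lines. This is a toy in-memory model (all names hypothetical, nothing from the S2 API): treat the stream as the source of truth and rebuild a queryable view at any point in time by replaying records up to a chosen sequence number.

```python
# Point-in-time queryable view over an append-only event stream:
# a pure reducer folds records into state, and "rewind" is just
# replaying fewer of them.

def apply(state: dict, event: dict) -> dict:
    """Fold one stream record into the in-memory view (pure function)."""
    kind, key = event["kind"], event["key"]
    new = dict(state)
    if kind == "put":
        new[key] = event["value"]
    elif kind == "delete":
        new.pop(key, None)
    return new

def view_at(events: list, seq: int) -> dict:
    """Rebuild the view as of sequence number `seq` (inclusive)."""
    state: dict = {}
    for i, ev in enumerate(events):
        if i > seq:
            break
        state = apply(state, ev)
    return state

log = [
    {"kind": "put", "key": "user:1", "value": "alice"},
    {"kind": "put", "key": "user:2", "value": "bob"},
    {"kind": "delete", "key": "user:1"},
]
print(view_at(log, 1))  # {'user:1': 'alice', 'user:2': 'bob'}
print(view_at(log, 2))  # {'user:2': 'bob'}
```

Swapping the dict for a local SQLite table, or deriving the events from a MySQL binlog, gives the other variants the comment describes.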
It looks neat but, no Java SDK? Every company I've personally worked at is deeply reliant on Spring or the vanilla clients to produce/consume to Kafka 90% of the time. This kind of precludes even a casual PoC.
(S2 Team member) As we move forward, a Java/Kotlin SDK and a Python SDK are on our list. There is a Rust SDK and a CLI available (https://s2.dev/docs/quickstart). Rust felt like a good starting point for us as our core service is also written in it.
Seems like really cool tech. Such a bummer that it is not source-available. I might be a minority in this opinion, but I would absolutely consider commercial services where the core tech is all released under something like a FSL with fully supported self-hosting. Otherwise, the lock-in vs something like Kafka is hard to justify.
(Founder) We are happy for S2 API to have alternate implementations, we are considering an in-memory emulator to open source ourselves. It is not a very complicated API. If you would prefer to stick with the Kafka API but benefit from features like S2's storage classes or having a very large number of topics/partitions or high throughput per partition, we are planning an open source Kafka compatibility layer that can be self-hosted, with features like client-side encryption so you can have even more peace of mind.
First-class kafka compatibility could go a long way to making it a justifiable tech choice. When orgs go heavy on event streaming, that code gets _everywhere_, so a vendor off-ramp is needed.
(Founder) That makes sense. We would eventually host the Kafka layer too - and will be able to avoid a hop by inlining our edge service logic in there.
This is a very useful service model, but I'm confused about the value proposition given how every write is persisted to S3 before being acknowledged.
I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?
AWS has shown their willingness to implement mostly-protocol compatible services (RDS -> Aurora), and I could see them doing the same with a Kafka reimplementation.
> I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?
This is how it works essentially, yes. Architecting the system so that chunks that are written to object storage (before we acknowledge a write) are multi-tenant, and contain records from different streams, lets us write frequently while still targeting ideal (w/r/t price and performance) blob sizes for S3 standard and express puts respectively.
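The batching idea in this exchange can be illustrated with a simplified sketch. This is not the real S2 internals - the class and framing below are invented - but it shows the core trade: records from many streams are coalesced into one blob so each PUT hits an economical object size, with a length-prefixed layout so every stream's records can be located again on read.

```python
# Multi-tenant chunking: buffer (stream_id, record) pairs and seal a
# blob once the target size is reached, instead of one PUT per record.

class ChunkWriter:
    def __init__(self, target_size: int):
        self.target_size = target_size
        self.buffer = []   # list of (stream_id, record) pairs
        self.size = 0      # bytes of record payload buffered so far

    def append(self, stream_id: str, record: bytes):
        """Buffer a record; return a sealed chunk once the target is hit."""
        self.buffer.append((stream_id, record))
        self.size += len(record)
        if self.size >= self.target_size:
            return self.seal()
        return None

    def seal(self) -> bytes:
        """Serialize buffered records as length-prefixed (stream, record) pairs."""
        out = bytearray()
        for stream_id, record in self.buffer:
            sid = stream_id.encode()
            out += len(sid).to_bytes(2, "big") + sid
            out += len(record).to_bytes(4, "big") + record
        self.buffer.clear()
        self.size = 0
        return bytes(out)

w = ChunkWriter(target_size=16)
assert w.append("stream-a", b"hello") is None   # below target: keep buffering
chunk = w.append("stream-b", b"world-world")    # 5 + 11 >= 16: chunk sealed
print(len(chunk))
```

In a real system the seal would also be triggered by a deadline (so latency is bounded even at low write rates), and the blob would carry an index block - but the size-vs-frequency trade above is the heart of it.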
Wait, data from multiple tenants is stored in the same place? Do you have per-tenant encryption keys, or how else are you ensuring no bugs allow tenants to read others' data?
(Founder) We will be using authenticated encryption with per-basin (our term for bucket) or per-stream keys, but we don't have this yet. This is noted on https://s2.dev/docs/security#encryption
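One common way to get per-basin or per-stream keys without managing thousands of master keys is key derivation. The sketch below is not S2's actual scheme (their docs only say authenticated encryption is planned) - it just illustrates the idea with an HKDF-Expand-style HMAC-SHA256 derivation, so records for different tenants are never protected by the same key.

```python
# Derive an independent 32-byte subkey per (basin, stream) from one
# master key. Different basin or stream -> cryptographically unrelated key.

import hashlib
import hmac

def derive_key(master_key: bytes, basin: str, stream: str) -> bytes:
    """Deterministically derive a subkey bound to (basin, stream).
    Single-block HKDF-Expand with HMAC-SHA256 (fine for <=32-byte keys)."""
    info = f"s2:basin={basin}:stream={stream}".encode()
    return hmac.new(master_key, info + b"\x01", hashlib.sha256).digest()

master = b"\x00" * 32  # placeholder; in practice a KMS-held master key
k1 = derive_key(master, "tenant-a", "orders")
k2 = derive_key(master, "tenant-b", "orders")
print(k1 != k2)  # same stream name, different basin -> different key
```

With a scheme like this, a bug that hands tenant B a chunk containing tenant A's records still yields only ciphertext that B's key cannot authenticate or decrypt.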
This is a very interesting abstraction (and service). I can’t help but feature creep and ask for something like Athena, which runs PrestoDB (map reduce) over S3 files. It could be superior in theory because anyone using that pattern must shoehorn their data stream (almost everything is really a stream) into an S3 file system. Fragmentation and file packing become requirements that degrade transactional qualities.
(Founder) There are definitely some interesting possibilities. Pretty hyped about S3 Table (Iceberg) buckets. An S2 stream to buffer small writes so you can flush decent-sized Parquet into the table, and avoid compaction costs.
I wish more dev-tools startups would focus on clearly explaining the business use cases, targeting a slightly broader audience beyond highly technical users. I visited several pages on the site before eventually giving up.
I can sort of grasp what the S2 team is aiming to achieve, but it feels like I’m forced to perform unnecessary mental gymnastics to connect their platform with the specific problems it can solve for a business or product team.
I consider myself fairly technical and familiar with many of the underlying concepts, but I still couldn’t work out the practical utility without significant effort.
It’s worth noting that much of technology adoption is driven by technical product managers and similar stakeholders. However, I feel this critical audience is often overlooked in the messaging and positioning of developer tools like this.
(Founder) Appreciate the feedback. We will try to do a better job on the messaging. It is geared at being a building block for data systems. The landing page has a section talking about some of the patterns it enables (Decouple / Buffer / Journal) in a serverless manner, with example use cases. It just may not be something that resonates with you though! We are interested in adoption by developers for now.
I think they're saying that you should provide some example use-cases for how someone would use your service. High-level use-cases that involve solving problems for a business.
For what it's worth, I am already familiar with this design space well enough that I don't need this kind of example in order to understand it. I've worked with Kinesis and other streaming systems before. But for people who haven't, an example might help.
What kind of business problem would someone have that causes them to turn to your service? What are the alternative solutions they might consider and how do those compare to yours? That's the kind of info they're asking for. You might benefit from pitching this such that people will understand it who have never considered streaming solutions before and don't understand the benefits. Pitch it to people who don't even realize they need this.
(Founder) Yes I understand, and this could definitely do with work. I struggle with it personally because it is so obvious to me. I don't even know where to start? How do you pitch use cases for object storage? Stream storage feels just as universal to me.
If you ever figure it out, LMK. I don't think I've ever looked at logs more than about 24 hours old. Persistence and durability is not something I care about.
Errors, OTOH, I need a week or two of. But I consider these 2 different things. Logs are kind of a last resort when you really can't figure out what's going on in prod.
This is cool but I think it overlaps too much with something like Kinesis Data Streams from AWS which has been around for a long time. It’s good that AWS has some competition though
(Founder) We plan to be multi-cloud over time. Kinesis has a pretty low ordered throughput limit (i.e. at the level of a stream shard) of 1 MBps, if you need higher than that. S2 will be cheaper and faster than Kinesis with the Express storage class. S2 also has a more serverless pricing model - closer to S3 - than paying for stream shard hours.
Thanks. You are right about those points. One thing to probably consider is whether serverless provides enough cost savings for most streaming ingest use cases, which need static provisioning since ingest volumes are unpredictable. Better messaging would be that your serverless model can handle bursts well. (For context: I used to sell KDA and KDS at AWS as part of AI solutions.)
(Founder) so many possibilities! That's what I love about building building blocks. I think we will create an open source layer for an IoT protocol over time (unless community gets to it first), e.g. MQTT. I have to admit I don't know too much about the space.
(Founder) Named pipe that operates at the level of records, is durable regionally, you can read from any sequence, and lets you do concurrency control for writes if you need to.
- Unlimited streams. Current cloud systems limit to a few thousand. With dedicated clusters, few hundred K? If you want a stream per user, you are now dealing with multiple clusters.
- Elastic throughput per stream (i.e. a partition in Kafka) to 125 MiBps append / 500 MiBps realtime read / unlimited in aggregate for catching up. Current systems will have you at tens. And we may grow that limit yet. We are able to live migrate streams in milliseconds while keeping pipelined writes flowing, which gives us a lot of flexibility.
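The primitive the founder describes - a durable, record-level named pipe you can read from any sequence number - can be modeled in a few lines. This is a toy in-memory model with an invented interface, not the real SDK; it only shows the API shape being claimed.

```python
# An append-only record log: appends get monotonically increasing
# sequence numbers, and a reader can start from any of them.

class Stream:
    def __init__(self):
        self.records = []   # in-memory stand-in for durable storage

    def append(self, record: bytes) -> int:
        """Append a record and return its assigned sequence number."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, start_seq: int, limit: int = 100):
        """Read up to `limit` records in order, starting at start_seq."""
        end = min(start_seq + limit, len(self.records))
        return [(i, self.records[i]) for i in range(start_seq, end)]

s = Stream()
for msg in (b"a", b"b", b"c"):
    s.append(msg)
print(s.read(1))  # [(1, b'b'), (2, b'c')]
```

The "stream per user" point above is then just: this object is cheap enough that you can have millions of them, rather than a few thousand partitions per cluster.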
In the long-term, how different do you want to be from Apache Pulsar? At the moment, many differences are obvious, e.g., Pulsar offers transactions, queues and durable timers.
(Founder) We want S2 to be focused on the stream primitive (log if you prefer). There is a lot that can be built on top, which we mostly want to do as open source layers. For example, Kafka compatibility, or queue semantics.
(Founder) There is a table on the landing page https://s2.dev/ which hopefully gives a nice overview :) It's like S3, but for streams. Cheap appends, and instead of dealing with blocks of data and byte ranges, you work with records. S2 takes care of ordering records, and letting you read from anywhere in the stream.
This is an alternative to systems like Kafka which don't do great at giving a serverless experience.
(Founder here) Managed cloud offerings for streaming limit ordered throughput pretty low, e.g. Kinesis at 1 MiBps, Redpanda serverless at 1 MiBps, Confluent's even higher-end clusters at 10-20 MiBps IIRC. If you really need ordering, this can indeed be a limit. S2 lets you push 125 MiBps currently, and we may grow that.
Another factor is how many ordered streams you can have. Typically a few thousand at most with those systems. We take the serverless spirit of S3 here, when did you have to worry about the number of objects in a bucket?
We are also able to offer latency comparable to disk-based streaming like Confluent's Kora and Kinesis, with our Express storage class (under 50 milliseconds end-to-end latency for client in the same cloud region) - while being backed by S3 with regional durability! Not a disk in the system.
We want people to be able to build safe distributed data systems on top of S2, so we also allow concurrency control mechanisms on the stream like fencing. Kafka or Kinesis won't let you do that. This is the approach AWS takes internally (https://brooker.co.za/blog/2024/04/25/memorydb.html), but they don't have that as a service. We want to democratize the pattern.
ED: on throughputs, to clarify, I am talking about ordered throughput, i.e. per Kafka partition or Kinesis shard. WarpStream also does well here because of their architectural approach to separate ordering, but at a latency cost.
Looks like you're pushing the throughput angle - that could be important, but IMO it's not often you come across devs who need this level of throughput without dealing with a large-scale problem. My feedback is that the lack of per-tenant encryption is a big deal-breaker here, since you're mixing up data of tenants within one object.
Plus your security section says very little about how you prevent cross-tenant data contamination - that's probably the first thing that popped into my mind when I read about your data model. It makes me extremely uneasy, and I can't imagine adopting this for anything serious. I would encourage you to think about how you can communicate that angle to customers as well, besides supporting per-tenant encryption keys.
Between your site copy and your early comments on this thread, it was this rundown that made the product click in my mind.
I’m sure that in this early preview you’re trying to reach mainly devs with existing domain expertise, but the way that, in this comment, you laid out existing constraints and what possibilities might lie beyond them—it really helped me situate your S2 product in the constellation of cloud primitives.
Just wanted to offer that feedback in the hope that the spirit of your comment here doesn’t get buried down-thread!
@agallego Yes in aggregate both Confluent and Redpanda can push GiBps throughputs, and I know Redpanda has amazing perf. I was referring to Redpanda Serverless :) And per-partition i.e. ordered throughput.
ED: for some reason I wasn't seeing the reply link before on your comment, do see it now.
An interesting difference is the ability to have exclusive write access to the log (the fencing token). This allows you to use the log as a write-ahead log.
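Fencing is worth a sketch, since it's the feature that makes the WAL use case safe. This toy model (hypothetical interface, not the S2 API) shows the invariant: the log remembers the highest fencing token it has accepted, and an append carrying a stale token is rejected, so a deposed leader cannot keep writing.

```python
# Fencing-token concurrency control on an append-only log: each new
# leader gets a strictly larger token, and stale writers are fenced out.

class FencedLog:
    def __init__(self):
        self.records = []
        self.fence = 0   # highest token accepted so far

    def append(self, token: int, record: bytes) -> bool:
        if token < self.fence:
            return False          # stale writer: reject the append
        self.fence = token        # equal-or-newer token wins
        self.records.append(record)
        return True

log = FencedLog()
assert log.append(1, b"from-leader-1")
assert log.append(2, b"from-leader-2")   # new leader fences out the old one
print(log.append(1, b"late-write"))      # False: old leader is rejected
```

This is the pattern the MemoryDB post linked up-thread describes: the log, not the writers, is the arbiter of who currently holds the lease.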
Replying to this one since you apparently can't reply to a comment that has been flagged. Why was the grandparent flagged? Google's S2 library has been around for more than a decade and is the first thing I think of when I see "S2" in a tech stack.
And the flippant response from the parent here that they don't really care that they're muddying the waters and just want the crate name is irksome.
https://tsdr.uspto.gov/#caseNumber=98324800&caseSearchType=U...
My company is a Fivetran client, and they named that company after a (bad) joke, but it's worth a fortune.
[1] https://news.ycombinator.com/item?id=42434450
That’s the kind of David vs. Goliath publicity one could only dream of …
If anything, they normalise an expectation with a budget-aware base.
An S3-level primitive API for streaming seems really valuable in the long term, if adopted.
When we say stream, we really mean The Log that Jay Kreps has a famous blog about https://engineering.linkedin.com/distributed-systems/log-wha...
You can, however stream Star Wars on S2 ;-) https://s2.dev/docs/quickstart#get-started-with-the-cli
If I could put in one request...a video which describes what it is and how to use it would make it easier for me to understand.
In the meantime, check out this quickstart, which will have you streaming Star Wars with the S2 CLI and give you a pretty good sense of things: https://s2.dev/docs/quickstart#get-started-with-the-cli
(You will have to apply to join the preview, but we are approving quickly)
https://github.com/google/s2geometry
https://aws.amazon.com/s3/faqs/
"Simple storage service" is used once. "S3" is used throughout.
Seems like there are a lot of more lightweight self-hosted S3 alternatives around nowadays. Why even use S3?
https://www.sunlu.com/products/new-version-sunlu-filadryer-s...
- Concurrency control mechanisms (https://s2.dev/docs/stream#concurrency-control)
Wow man, are you still stuck on S3?
o1, o3, s2, M4, r2, ...
Or more generally, when is it better to choose S2 vs services like SQS or Kinesis?
S2 sounds like an ordered queue to me, but those exist?
https://chatgpt.com/c/676703d4-7bc8-8003-9e5d-d6a402050439
Edit: Keep downvoting, only 5.6k to go!