13 comments

  • jedberg 126 days ago
    Hey all, I'm excited to be the new CEO of DBOS! I'm coming up on my one-month anniversary. I joined because I truly believe DBOS is solving a lot of the main issues with serverless deployments. I still believe that serverless is the way of the future for most applications, and I'm excited to make it a reality.

    Ask me anything!

  • bb01100100 126 days ago
    Would it be correct to say that these client libraries provide the functionality (e.g. ease of transactions, once-only execution, recovery), whereas your cloud offering solves the scaling/performance issues you’d hit trying to do this with a regular pg-compatible DB?

    I do a lot of consulting on Kafka-related architectures and really like the concept of DBOS.

    Customers tend to hit a wall of complexity when they want to actually use their streaming data (as distinct from simply piping it into a DWH). Being able to delegate a lot of that complexity to the lower layers is very appealing.

    Would DBOS align with / complement these types of Kafka streaming pipelines or are you addressing a different need?

    • KraftyOne 126 days ago
      Yeah exactly! The Kafka use case is a great one--specifically writing consumers that perform real-world processing on events from Kafka.

      In fact, one of our first customers used DBOS to build an event processing pipeline from Kafka. They hit the "wall of complexity" you described trying to persist events from Kafka to multiple backend data stores and services. DBOS made it much simpler because they could just write (and serverlessly deploy) durable workflows that ran exactly-once per Kafka message.
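
      Roughly, the pattern looks like this (a sketch with illustrative names, not their actual code -- it assumes the Python SDK's SetWorkflowID and start_workflow):

          from confluent_kafka import Consumer
          from dbos import DBOS, SetWorkflowID

          @DBOS.step()
          def persist_event(payload: str):
              ...  # write to a backend store; the output is checkpointed

          @DBOS.workflow()
          def process_event(payload: str):
              persist_event(payload)  # a crash resumes from the last completed step

          consumer = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "events"})
          consumer.subscribe(["orders"])
          while True:
              msg = consumer.poll(1.0)
              if msg is None or msg.error():
                  continue
              # Topic/partition/offset as the workflow ID: a redelivered message
              # maps to the same workflow, so it executes exactly once.
              with SetWorkflowID(f"{msg.topic()}-{msg.partition()}-{msg.offset()}"):
                  DBOS.start_workflow(process_event, msg.value().decode())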

  • rtcoms 125 days ago
    Recently I came to know about https://www.membrane.io/, which also follows a similar approach, but it looks like it is more for internal apps and small projects.

    How would you compare DBOS with that?

    • jedberg 125 days ago
      From a high level what we offer is similar -- durable and reliable compute.

      There isn't a lot of public information about how they are built, but from what I can tell you're right -- their architecture is oriented more toward small projects.

      It looks like they store the entire JS heap in a SQLite database. We store schematized state checkpoints in a Postgres-compatible database, which lets us scale up and enables interesting things like querying previous states and time-travel debugging, where you can actually step through previously run workflows.

  • ashwindharne 126 days ago
    I've been using Temporal recently for some long-running multi-step AI workflows -- helps me get around API flakiness, manage rate limits for hosted models, and manage load on local models. It's pretty cool to write workers in different languages and run them on different infra and have them all orchestrate together nicely. How does DBOS compare -- what are the core differences?

    From what I can tell, the programming model seems to be pretty similar but DBOS doesn't require a centralized workflow server, just serverless functions?

    • KraftyOne 126 days ago
      Co-founder here:

      Great question! Yeah, the biggest difference is that DBOS doesn't require a centralized workflow server; it does all orchestration directly in your functions (through the decorators -- see the sketch below), storing your program's execution state in Postgres. The implications:

      1. Performance. A state transition in DBOS requires only a database write (~1 ms), whereas in Temporal it requires a roundtrip and dispatch from the workflow servers (tens of ms -- https://community.temporal.io/t/low-latency-stateless-workfl...).

      2. Simplicity. All you need to run DBOS is Postgres. You can run locally or serverlessly deploy your app to our hosted cloud offering.
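
      For a concrete picture, here's a minimal sketch in the Python SDK (illustrative function names):

          from dbos import DBOS

          @DBOS.step()
          def reserve_inventory(order_id: str):
              ...  # the step's output is checkpointed in Postgres

          @DBOS.step()
          def charge_card(order_id: str):
              ...

          @DBOS.workflow()
          def checkout(order_id: str):
              reserve_inventory(order_id)
              # If the process crashes here, recovery re-runs the workflow
              # but skips reserve_inventory, replaying its recorded output.
              charge_card(order_id)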

  • sim7c00 126 days ago
    It might be interesting to look at a standard for workflows like CACAO to express what a workflow is. That way, workflows can ultimately become shareable between such workflow execution engines and have common workflow editors. It's a big problem (in cyber) that workflows cannot be shared between different systems, which adds great cost to implementing such a system (you need to redesign all workflows from the ground up). I think workflows and easy editors to assemble and connect steps are a good step ahead in any automation domain, but everywhere people want to reinvent the wheel of expressing what a workflow is.

    Definitely a fan of what these types of systems can do in replaying/recovering and retrying steps, as well as centralizing a lot of different workloads onto a common execution engine.

  • qianli_cs 126 days ago
    Hello! I’m here to answer any questions. I’d love to hear your feedback, comments, and anything else!
  • evantbyrne 126 days ago
    Looks like an interesting abstraction. I can see the usefulness because I had to create a poor man's version of this when I built a CD pipeline. Sorry, I don't have time to watch the 50-minute video, so: how are you guaranteeing durability? Are you basically opening Postgres transactions for each step, or is there something else going on to persist state?
    • qianli_cs 126 days ago
      Yeah! Under the hood, DBOS wraps each function (step) to log its output in the database. This ensures that workflows can be safely re-executed if they're interrupted, guaranteeing durability.

      More info here: https://docs.dbos.dev/explanations/how-workflows-work
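
      In simplified pseudocode, the wrapper does something like this (hypothetical table and function names; the real schema is in the doc above):

          import json
          import psycopg

          def run_step_once(conn: psycopg.Connection, workflow_id: str,
                            step_id: int, func, *args):
              # If this step already ran, replay its recorded output.
              row = conn.execute(
                  "SELECT output FROM step_outputs"
                  " WHERE workflow_id = %s AND step_id = %s",
                  (workflow_id, step_id)).fetchone()
              if row is not None:
                  return json.loads(row[0])
              # Otherwise run it, then checkpoint the result before moving on.
              result = func(*args)
              conn.execute(
                  "INSERT INTO step_outputs (workflow_id, step_id, output)"
                  " VALUES (%s, %s, %s)",
                  (workflow_id, step_id, json.dumps(result)))
              conn.commit()
              return result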

    • hmaxdml 124 days ago
      Can you give us some more details about the CD pipeline you built? :)
  • quickvi 126 days ago
    Is it possible not to use a PostgreSQL database? For example, would it run with SQLite? The goal is to improve developer experience.
    • sitkack 126 days ago
      You can now run Wasm builds of PostgreSQL that will get you everything you like about SQLite.

      https://github.com/electric-sql/pglite

    • qianli_cs 126 days ago
      Co-founder here! No current plans to support SQLite. We picked Postgres because of its huge ecosystem--you can use DBOS with any PostgreSQL-compatible database (Supabase, Neon, Aurora, Cockroach...) and with any Postgres extension (here's an example app using pgvector: https://github.com/dbos-inc/dbos-demo-apps/tree/main/python/...)
      • threecheese 126 days ago
        (After a quick look at the code) Is this due to concurrency (writes)? It looks like this architecture supports multiple executors, and I would imagine you require transactional guards to ensure consistency. I really like this interface btw, the complexity is hidden very well and from reading your docs it remains accessible if you need to dig deeper than a decorator.

        And how the heck are you maintaining Typescript and Python copies? lol

        • qianli_cs 126 days ago
          Thanks for your kind words! We're focusing on Postgres because an important scenario for durable execution is serverless computing, which won't work with an embedded database.
          • threecheese 122 days ago
            I am sure you are aware of this, but if not: there are some emerging technologies around embedded database scale-out using CRDTs and other replication protocols that would support various “serverless” (as in decentralized) topologies. I am informally looking at PGLite, sqlite-cr, libSQL, et al. for serverless executor agents that do not need to coalesce around a central database instance (“server-full”). I am sure you tested something like this; I would guess that classic/CDC replication lag would throw a big wrench into an attempt to orchestrate disconnected remote executors, but I am hoping that in a peer-to-peer topology this new tech will have low enough sync latencies to be useful. Best of luck with DBOS! You have an amazing team.
  • snicker7 126 days ago
    How does the DX compare against AWS step functions? My experience is that it is very difficult to “unit test” step workflows.
    • hmaxdml 126 days ago
      Step functions are an "external orchestrator". With DBOS your orchestration runs in the same process, so you can use normal testing frameworks like pytest for unit tests. It's super easy to test locally.
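
      E.g., a workflow is just a decorated function, so a test can call it directly (hypothetical sketch, assuming DBOS is initialized against a test Postgres in a fixture):

          # test_checkout.py -- myapp and checkout are illustrative names
          from myapp import checkout  # a @DBOS.workflow()-decorated function

          def test_checkout_completes():
              result = checkout("order-123")  # runs in-process, no workflow server
              assert result["status"] == "paid"
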
    • jedberg 126 days ago
      We have a time travel debugger that makes it super easy to test workflows. You could set them up in test and then time travel them, or even time travel completed workflows in production.

      https://docs.dbos.dev/cloud-tutorials/timetravel-debugging

  • darkteflon 125 days ago
    This looks quite cool. If anyone from DBOS is still around: does this handle more complex dependency relations between workflow steps (e.g. directed graphs), or is it only suitable for linear workflows?
    • chuck_dbos 125 days ago
      I'd recommend using child workflows for directed graphs. The building blocks are start_workflow to split off a child, and then you can wait for the result at a later time. Workflows can also send events / messages to communicate back and forth with each other.

      One neat thing about starting a child workflow is you can assign an idempotency ID, which might be intentionally calculated in a way such that multiple parents will only start one run of the child workflow.
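
      A sketch of that in the Python SDK (illustrative names, assuming SetWorkflowID and start_workflow):

          from dbos import DBOS, SetWorkflowID

          @DBOS.workflow()
          def render_node(node_id: str):
              ...  # one node of the graph

          @DBOS.workflow()
          def run_graph(graph_id: str):
              # Deterministic ID: if two parents both reach this node,
              # only one run of the child is started.
              with SetWorkflowID(f"{graph_id}-node-7"):
                  handle = DBOS.start_workflow(render_node, "node-7")
              # ...do other work, then join on the child...
              return handle.get_result()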

  • hobs 126 days ago
    Very funny to me how much everyone went away from the db to achieve idempotent behavior, and now we're back to just using a db as a complicated queue with state.
    • snicker7 126 days ago
      It turns out that DBs were invented to solve hard problems with state management. When people moved away from DBs (at least transactional relational DBs) they had to rediscover all the old problems. Tech is cyclical.
      • hmaxdml 126 days ago
        One of the motivations for DBOS is that OSes were designed with orders of magnitude less state to manage than today (e.g. Linux, >30 years ago). What's made to manage tons of state? A DBMS! :)
      • catzapd 123 days ago
        Recovering an application from failures (especially when updating multiple data sources), once-and-only-once execution, and similar concerns are in the application domain. They have never been handled by relational databases. That is the problem solved by the DBOS Python SDK (and the TypeScript SDK).
        • jedberg 122 days ago
          It's sort of a combination of both. The library solves those problems by storing specific data in the database and then taking advantage of the database's ACID properties and transactions to make the guarantees.

          Then the DBOS cloud platform optimizes those interactions between the database and code so that you get a superior experience to running locally.
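
          For example, a transaction step commits your SQL and the checkpoint in one Postgres transaction (a sketch, assuming the Python SDK's @DBOS.transaction and sql_session):

              from dbos import DBOS
              from sqlalchemy import text

              @DBOS.transaction()
              def record_payment(order_id: str):
                  # This INSERT and DBOS's step checkpoint commit atomically
                  # in the same transaction, so the write is exactly-once.
                  DBOS.sql_session.execute(
                      text("INSERT INTO payments (order_id) VALUES (:id)"),
                      {"id": order_id})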

  • rtcoms 125 days ago
    Would I be able to use all the Python and npm packages with it? Would something opening a headless browser to scrape data work with DBOS?
    • hmaxdml 125 days ago
      Yes, it's normal Python/Node.js, so you can use all their packages.

      We know of users running Puppeteer to scrape data.
