How/why to sweep async tasks under a Postgres table

(taylor.town)

33 points | by ostler 4 hours ago

7 comments

  • koolba 3 hours ago
    The article says:

    > Never Handroll Your Own Two-Phase Commit

    And then buried at the end:

    > A few notable features of this snippet:

    > Limiting number of retries makes the code more complicated, but definitely worth it for user-facing side-effects like emails.

    This isn't two-phase commit. This is "lock the DB indefinitely while the remote system is processing and pray we don't crash saving the transaction after it completes." That lock also eats up a database connection, so your concurrency is limited by the size of your DB pool.

    More importantly, if the email sends but the transaction to update the task status fails, it will try again. And again. Forever. If you're going to track retries, it has to happen before you start the attempt. Otherwise the "update the attempts count" logic itself could fail and lead to more retries.

    The real answer to all this is to use a provider that supports idempotency keys. Then you can retry the action repeatedly without it actually happening again. My favorite article on this subject: https://brandur.org/idempotency-keys
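
    As a sketch of counting the attempt up front, in the article's Postgres-queue style (table and column names assumed, not from the article):

      -- Claim one pending task and record the attempt *before*
      -- the side effect runs, so a crash mid-send still burns a retry.
      -- (Send the email after this commits, then mark the task done.)
      UPDATE tasks
      SET attempts = attempts + 1,
          status = 'processing',
          started_at = now()
      WHERE id = (
        SELECT id FROM tasks
        WHERE status = 'pending' AND attempts < 5
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED
      )
      RETURNING id, payload;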

    • tracker1 39 minutes ago
      For similar systems I've worked on, I'll use a process id, try/tries, and time started as part of the process of plucking an item off a db queue table... this way I can have something that resets anything started over N minutes prior that didn't finish, for whatever reason (handling abandoned, broken tasks that are in an unknown state).
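
      A sketch of that reset, with assumed names:

        -- Reclaim tasks that were claimed but never finished,
        -- e.g. because the worker died or hung in an unknown state.
        UPDATE tasks
        SET status = 'pending', process_id = NULL
        WHERE status = 'processing'
          AND started_at < now() - interval '15 minutes'
          AND tries < 5;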

      One reason to do this for emails, i.e. use a database queue, is to keep a log/history of all sent emails, as well as a render for "view in browser" links in the email itself. Not to mention those rare instances where an email platform goes down and everything starts blowing up.

    • maxmcd 2 hours ago
      Just that row should be locked, since it's "FOR UPDATE SKIP LOCKED".

      I agree the concurrency limitation is kind of rough, but it's also kind of elegant because you don't have to implement a timeout/retry mechanism. You're certainly still exposed to the possibility of double-sending, so yes, it's probably much nicer to update the row to "processing" and re-process those rows on a timeout.

    • surprisetalk 1 hour ago
      Author here! Posting from phone while traveling so sorry for bad formatting.

      It was outside the scope of this essay, but a lot of these problems can be resolved with a mid-transaction COMMIT and reasonable timeouts.

      You can implement a lean idempotency system within the task pattern, but it really depends on what you need and what failures you want to prevent.
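
      A rough sketch of one such approach (names assumed, not from the essay): a unique key makes re-enqueueing a no-op instead of a duplicate send.

        ALTER TABLE tasks ADD COLUMN idempotency_key text UNIQUE;

        -- Re-running this INSERT is harmless: the second attempt
        -- conflicts on the key and inserts nothing.
        INSERT INTO tasks (idempotency_key, payload)
        VALUES ('order-1234-confirmation', '{"to": "a@example.com"}')
        ON CONFLICT (idempotency_key) DO NOTHING;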

      Thanks for providing more context and safety tips! :)

    • morshu9001 2 hours ago
      Idempotency is key. Stripe is good about that.
  • rictic 3 hours ago
    Missing from the article: how to communicate progress and failure to the user?

    This is much more complicated with task queues. Doable still! But often skipped, because it's tempting to imagine that the backend will just handle the failure by retrying. But there are lots of kinds of failure that can happen.

    The recipient's server doesn't accept the email. The recipient's domain name expired. Actually, we don't have an email address for that recipient at all.

    The user has seen "got it, will do, don't worry about it," but if that email is time-sensitive, they might want to know that it hasn't been sent yet, and that maybe they should place a phone call instead.

  • nullzzz 1 hour ago
    I can recommend this architecture. So much easier to maintain and understand than using an extra service. I didn't go into much detail on the implementation here, but you can surely roll your own if this doesn't cut it for you, or use a library like pgboss.
  • efxhoy 1 hour ago
    I like it! We have a service with a similar postgres task queue but we use an insert trigger on the tasks table that does NOTIFY and the worker runs LISTEN, it feels a bit tidier than polling IMO.
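
    Roughly this shape (a sketch with assumed names, not our exact code):

      -- Fire a notification whenever a task row is inserted.
      CREATE FUNCTION notify_new_task() RETURNS trigger AS $$
      BEGIN
        PERFORM pg_notify('new_task', NEW.id::text);
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER tasks_notify
      AFTER INSERT ON tasks
      FOR EACH ROW EXECUTE FUNCTION notify_new_task();

      -- Worker side: LISTEN new_task; then claim the row on wake-up.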
    • surprisetalk 1 hour ago
      LISTEN/NOTIFY works great, but it doesn't have any mechanism for ACKs or retries, so there are some tradeoffs to consider. It's a good fit when you're willing to sacrifice some durability!
  • stack_framer 3 hours ago
    > I like slim and stupid servers, where each endpoint wraps a very dumb DB query.

    I thought I was alone in this, but I build all my personal projects this way! I wish I could use this approach at work, but too many colleagues crave "engineering."

    • stronglikedan 2 hours ago
      Doesn't that make for exponentially more requests to get the same data, or possibly more data than you really need (bloated responses)?
  • rgbrgb 2 hours ago
    If you're in TS/JS land, I like to use an open source version of this called graphile-worker [0].

    [0]: https://worker.graphile.org

    • damidekronik 1 hour ago
      I am using pgboss myself, very decent, very simple. Had some issues with graphile back in the day, can't remember what exactly; it has probably already overcome whatever I was struggling with!
  • morshu9001 2 hours ago
    Never do RPCs during an xact like this! It's the fastest way to lock up your DB, and I don't even mean at large scale. I've been forced many times to set up proper two-phase commit instead. That way you also get more flexibility and visibility into what it's doing.
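
    The alternative shape is short transactions around the state changes, with the RPC in between and no transaction (or lock) held (sketch, names assumed):

      BEGIN;
      UPDATE tasks SET status = 'sending' WHERE id = 42;
      COMMIT;

      -- do the remote call here, outside any transaction

      BEGIN;
      UPDATE tasks SET status = 'sent' WHERE id = 42;
      COMMIT;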