Very cool work!! This is the same pattern we used at $MY_STARTUP to develop $MY_HARNESS, which persists the entire graph to disk, unlike all the other agent harnesses, which only store the graph nodes and edges.
Event graphs aren’t just the agentic foundation for $MY_HARNESS: they’re the working substrate.
(Looking for lead investors BTW! DM me if interested)
Very cool. I settled on the same/similar design in my agent harness.
All relevant events that affect the context window are stored in an event log. Forking agents and sessions is simply setting a pointer to the sequence number of another event log.
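A minimal sketch of "forking is just a pointer plus a sequence number", assuming an in-memory, append-only log (the `EventLog` class and its method names are hypothetical, not taken from any harness). Because the parent's events are never mutated, a fork can safely reference the parent's prefix instead of copying it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EventLog:
    """Append-only log. A fork references its parent instead of copying it."""

    parent: Optional[EventLog] = None
    fork_seq: int = 0  # how many of the parent's events this log inherits
    own: list[dict[str, Any]] = field(default_factory=list)

    def __len__(self) -> int:
        return self.fork_seq + len(self.own)

    def append(self, event: dict[str, Any]) -> int:
        self.own.append(event)
        return len(self) - 1  # sequence number of the new event in this log

    def events(self) -> list[dict[str, Any]]:
        # Inherited prefix (up to the fork point), then this log's own events.
        prefix = self.parent.events()[: self.fork_seq] if self.parent else []
        return prefix + self.own

    def fork(self, at_seq: int) -> EventLog:
        # Forking copies nothing: the new log is (parent pointer, sequence number).
        if not 0 <= at_seq <= len(self):
            raise ValueError(f"fork point {at_seq} out of range 0..{len(self)}")
        return EventLog(parent=self, fork_seq=at_seq)


main = EventLog()
main.append({"type": "user", "text": "fix the failing test"})
main.append({"type": "assistant", "text": "reading parser.py"})
main.append({"type": "assistant", "text": "patched parser.py"})

alt = main.fork(at_seq=2)  # branch after the first two events
alt.append({"type": "assistant", "text": "trying a different fix"})
print([e["text"] for e in alt.events()])
# ['fix the failing test', 'reading parser.py', 'trying a different fix']
```

The parent can keep appending after the fork without affecting the branch, since the branch only ever reads the first `fork_seq` parent events.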
Most coding harnesses store the exact messages sent to the LLM and the message items that come back from it, and not much else. There is also a set of events/commands/etc. that live only in memory. Together they constitute the current state of the agent loop.
In Lightspeed, we store all of them as events, so we can always reconstruct the exact state of the loop (e.g. state of open tool calls, compaction decisions in-flight, etc). This makes it possible to run the agent in a durable workflow engine easily.
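To make "reconstruct the exact state of the loop" concrete, here is a toy fold over an event list. The event kinds (`tool_call_started`, `compaction_started`, ...) and the `LoopState` fields are assumptions for illustration, not Lightspeed's actual schema. The point is that state which usually lives only in memory, like which tool calls are still open, falls out of replaying the log.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    messages: list = field(default_factory=list)
    open_tool_calls: dict = field(default_factory=dict)  # call_id -> tool name
    compaction_in_flight: bool = False


def apply(state: LoopState, event: dict) -> None:
    """Fold one event into the loop state (event kinds are hypothetical)."""
    kind = event["kind"]
    if kind == "message":
        state.messages.append(event["content"])
    elif kind == "tool_call_started":
        state.open_tool_calls[event["call_id"]] = event["tool"]
    elif kind == "tool_call_finished":
        state.open_tool_calls.pop(event["call_id"], None)
    elif kind == "compaction_started":
        state.compaction_in_flight = True
    elif kind == "compaction_finished":
        state.compaction_in_flight = False
    else:
        raise ValueError(f"unknown event kind: {kind!r}")


def rebuild(events: list) -> LoopState:
    """Replay the whole log from scratch; the log is the only source of truth."""
    state = LoopState()
    for event in events:
        apply(state, event)
    return state


log = [
    {"kind": "message", "content": "run the tests and fix failures"},
    {"kind": "tool_call_started", "call_id": "c1", "tool": "read_file"},
    {"kind": "tool_call_finished", "call_id": "c1"},
    {"kind": "tool_call_started", "call_id": "c2", "tool": "run_tests"},
    {"kind": "compaction_started"},
    # the process crashes here; a restarted worker rebuilds from the log
]
state = rebuild(log)
print(state.open_tool_calls, state.compaction_in_flight)
```

A durable workflow engine needs exactly this property: any worker can pick up the log and arrive at the same in-flight state.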
It's more that the log is the only consensus both user and agent accept. It has to be the grounding base, although extending it into a full agentic system architecture is not necessarily effective in practice.
With my database hat on, in the context of agentic systems I would argue that write-ahead logs form a good (and potentially transactional) interface between speculative agent work and durable world mutations [0].
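A toy illustration of the WAL-as-interface idea, assuming a JSON-lines file and a plain dict standing in for "the world" (files, databases, external APIs). The record types (`intent`, `commit`, `applied`) and the `recover` function are hypothetical. Speculative agent work can write intents freely; only transactions with a durable commit record are ever applied, and recovery redoes them idempotently.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Toy WAL: one JSON record per line, fsync'd before anything touches the world."""

    def __init__(self, path: str):
        self.path = path

    def write(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def records(self) -> list:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def recover(wal: WriteAheadLog, world: dict) -> None:
    """Redo committed-but-unapplied transactions; uncommitted (speculative) ones are ignored."""
    recs = wal.records()
    committed = {r["txn"] for r in recs if r["type"] == "commit"}
    applied = {r["txn"] for r in recs if r["type"] == "applied"}
    pending = committed - applied
    for r in recs:
        if r["type"] == "intent" and r["txn"] in pending:
            world[r["key"]] = r["value"]  # a "set" is idempotent, so redo is safe
    for txn in sorted(pending):
        wal.write({"type": "applied", "txn": txn})


wal = WriteAheadLog(os.path.join(tempfile.mkdtemp(), "agent.wal"))
world: dict = {}  # stands in for files, databases, external APIs

# The agent speculates freely; only txn 1 reaches a commit record.
wal.write({"type": "intent", "txn": 1, "key": "config.yaml", "value": "v2"})
wal.write({"type": "commit", "txn": 1})
wal.write({"type": "intent", "txn": 2, "key": "prod.db", "value": "dropped"})
# Simulated crash before anything was applied.
recover(wal, world)
print(world)  # {'config.yaml': 'v2'}
```

Running `recover` again is a no-op, since txn 1 now has an `applied` record.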
We should probably only interact with the agent by writing to the log, which it executes from, and the agent should probably only interact with the external environment by writing and executing code. That fixes a lot of issues with non-determinism.
> In this arrangement the log is a byproduct: an audit artifact written alongside the real computation, never the substrate of it.
Why not save the progress and important results of a conversation (including tool calls and such) to a project markdown file (or several, as needed) and clear your context window completely, rather than compacting many times? You can then just specify a markdown file to include as context. This works especially well if you're following a plan document and executing one part of it.
As others have commented, this is an obvious application of event sourcing. It's irritating to see the claim of "deterministic replay" in the abstract along with the caveat "we can't actually do deterministic replay, so we store all of the model's responses and reproject off of that". Sure, ok, whatever. You're doing session recording and calling it replay.
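The distinction being drawn is roughly this: if the model is non-deterministic, "replay" means serving recorded responses back instead of calling the model again. A minimal sketch, assuming a model is any `str -> str` callable (the `RecordingModel` wrapper is hypothetical). Replay only reproduces the run while the projection keeps asking the same prompts, so the sketch fails loudly on divergence.

```python
import random
from typing import Callable, Optional


class RecordingModel:
    """Record mode: call the real model and log the response.
    Replay mode: serve logged responses in order without calling any model.
    This is session recording, not deterministic re-execution."""

    def __init__(self, call_model: Optional[Callable[[str], str]], log: list):
        self.call_model = call_model  # None means replay-only
        self.log = log
        self.cursor = 0

    def __call__(self, prompt: str) -> str:
        if self.cursor < len(self.log):
            entry = self.log[self.cursor]
            if entry["prompt"] != prompt:
                raise RuntimeError("divergence: prompt differs from recording")
            self.cursor += 1
            return entry["response"]
        if self.call_model is None:
            raise RuntimeError("replay ran past the end of the recording")
        response = self.call_model(prompt)
        self.log.append({"prompt": prompt, "response": response})
        self.cursor += 1
        return response


def flaky_model(prompt: str) -> str:
    # Stand-in for a sampled LLM: same prompt, different answer each call.
    return f"{prompt} -> {random.random():.3f}"


log: list = []
live = RecordingModel(flaky_model, log)
first = [live("plan"), live("act")]

replay = RecordingModel(None, log)
second = [replay("plan"), replay("act")]
print(first == second)  # True, but only because the responses were recorded
```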
So if you want to check an implementation of this pattern see: https://github.com/smartcomputer-ai/lightspeed
Chatbot is the command line
Agent is the bash script
___ is the GUI (macOS/Windows/GTA 6)
You need Xerox PARC all over again and we have one
The paper’s pip library can be tried here
That said, there are a _lot_ of "logs for agents" papers that I've read (and unfortunately gotten assigned to review) which are basically "we asked claude to hack on a graph DB and generate a paper".
[0] https://onewill.ai/blog/2026/stealing-50-years-of-database-i...
I’ve come to the same conclusion building my own agents. It simply feels ‘wrong’ that most frameworks will happily mutate your context. You have to explicitly go out of your way to store the original events. I’ve now started storing an event log for my own agents, this is used as the source of truth for deriving all subsequent context.
The great thing about this is that I have finer control over drift in long runs, as I can look back through the conversation/tool history and build context suitable for the current state of the agent. It also allows me to run compactions across the entire event history instead of ‘compactions on top of compactions’ which happens on long runs with checkpoints.
It definitely feels like this will be a bigger issue going forward as we have agents running longer and more complex workflows, I’ve started building a product aimed at addressing this issue in a framework agnostic way. [0]
[0]: https://statefabric.dev
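A rough sketch of "compaction across the entire event history" versus compactions on top of compactions, assuming events are plain strings and `summarize` is a placeholder for whatever actually produces the summary (e.g. an LLM call; the names here are hypothetical). The context is re-derived from the untouched log every time, so a summary is never itself summarized.

```python
from typing import Callable, List

# Hypothetical summarizer signature; in practice this would be an LLM call.
Summarize = Callable[[List[str]], str]


def build_context(events: List[str], summarize: Summarize, keep_recent: int = 4) -> List[str]:
    """Derive a fresh context window from the raw event log.

    Older events are summarized from the ORIGINAL events on every call,
    so a summary is never fed back into the next summary."""
    split = max(len(events) - keep_recent, 0)
    older, recent = events[:split], events[split:]
    if not older:
        return list(recent)
    return [f"[summary] {summarize(older)}"] + recent


def toy_summarize(evts: List[str]) -> str:
    return f"{len(evts)} earlier events"


events = [f"e{i}" for i in range(10)]
print(build_context(events, toy_summarize, keep_recent=3))
# ['[summary] 7 earlier events', 'e7', 'e8', 'e9']

events.append("e10")
print(build_context(events, toy_summarize, keep_recent=3))
# ['[summary] 8 earlier events', 'e8', 'e9', 'e10']
```

With checkpoint-style compaction, the second call would instead summarize "previous summary + e7", and drift accumulates with each round.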
But wouldn't feeding that log in on every request/response iteration get expensive really fast?
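Back-of-envelope for that concern, assuming each turn adds a fixed `t` tokens and every request resends the full history: turn k sends k·t tokens, so n turns cost t·n(n+1)/2 input tokens in total, i.e. quadratic growth. Capping each request (for example by deriving a compacted context from the log rather than sending the raw log) makes the tail linear. The numbers below are illustrative, and this toy ignores prompt caching, which many hosted LLM APIs offer to discount a repeated prefix.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens if every request resends the full history.
    Turn k sends k * tokens_per_turn tokens; summing gives t * n(n+1)/2."""
    return tokens_per_turn * turns * (turns + 1) // 2


def capped_input_tokens(turns: int, tokens_per_turn: int, window: int) -> int:
    """Same, but each request is capped at `window` tokens (e.g. by compaction)."""
    return sum(min(k * tokens_per_turn, window) for k in range(1, turns + 1))


print(cumulative_input_tokens(100, 2_000))       # 10100000
print(capped_input_tokens(100, 2_000, 100_000))  # 7550000
```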
Also, "We discuss--without claiming to demonstrate--" wtf? Someone had a shower thought and slopped this out in 10 minutes to see what others thought?
The window on back-of-napkin-idea acquihires is closing fast. ;-)