Very interesting. The state management is the really insightful find here.
I always wondered how these large AI companies managed access for millions of simultaneous users without having to allocate a dedicated LLM instance for each user. Pushing the complete state down to the user after every call makes perfect sense. The LLM itself stays memoryless and ready to respond to an arbitrary prompt. Very nice.
N.B. This is exactly how seaside, vba, and even arc[1] do server-side state generally: by encrypting the blob-representing-state and sending to the client to be sent back on future requests (where it will be decrypted and rehydrated).
It's an old trick that everyone designing protocols should know, since there are lots of applications beyond AI companies.
I always wondered how these large AI companies managed access for millions of simultaneous users without having to allocate a dedicated LLM instance for each user. Pushing the complete state down to the user after every call makes perfect sense. The LLM itself stays memoryless and ready to respond to an arbitrary prompt. Very nice.
It's an old trick that everyone designing protocols should know, since there are lots of applications beyond AI companies.
[1]: As in, pg's lisp: https://arclanguage.github.io/ref/srv.html#:~:text=The%20pre...