You could tell an agent, “I don’t like coffee,” and three steps later it would suggest espresso again. It wasn’t broken logic, it was missing memory.
Over the past few years, people have tried a bunch of ways to fix it:
1. Prompt stuffing / fine-tuning – Keep prepending history. Works for short chats, but tokens and cost explode fast.
2. Vector databases (RAG) – Store embeddings in Pinecone/Weaviate. Recall is semantic, but retrieval is noisy and loses structure.
3. Graph databases – Build entity-relationship graphs. Great for reasoning, but hard to scale and maintain.
4. Hybrid systems – Mix vectors, graphs, key-value, and relational DBs. Flexible but complex.
And then there’s the twist: Relational databases! Yes, the tech that’s been running banks and social media for decades is looking like one of the most practical ways to give AI persistent memory.
Instead of exotic stores, you can:
- Keep short-term vs long-term memory in SQL tables
- Store entities, rules, and preferences as structured records
- Promote important facts into permanent memory
- Use joins and indexes for retrieval (a rough sketch follows below)
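For illustration, here is a minimal sketch of what that shape could look like in SQLite (a hypothetical schema, not necessarily Memori's actual one):

```python
import sqlite3

# Hypothetical layout: one table per memory horizon, plus a promotion step.
conn = sqlite3.connect("memory.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS short_term_memory (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,             -- e.g. "user said: I don't like coffee"
    category   TEXT,                      -- fact / preference / rule / ...
    importance REAL DEFAULT 0.0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS long_term_memory (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    category   TEXT,
    importance REAL,
    created_at TEXT
);
CREATE INDEX IF NOT EXISTS idx_ltm_category ON long_term_memory(category);
""")

# "Promote important facts into permanent memory" is then an INSERT ... SELECT.
conn.execute("""
INSERT INTO long_term_memory (content, category, importance, created_at)
SELECT content, category, importance, created_at
FROM short_term_memory
WHERE importance >= 0.8
""")
conn.commit()
```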
This is the approach we’ve been working on at Gibson. We built an open-source project called Memori (https://memori.gibsonai.com/), a multi-agent memory engine that gives your AI agents human-like memory.
It’s kind of ironic: after all the hype around vectors and graphs, one of the best answers to AI memory might be the tech we’ve trusted for 50+ years.
I would love to know your thoughts about our approach!
Searching by embedding is just a way to construct queries, like ILIKE or tsvector. It works pretty nicely, but it's not distinct from SQL given pg_vector/etc.
The more distinctive feature here seems to be some kind of proxy (or monkeypatching?) – is it rewriting prompts on the way out to add memories to the prompt, and creating memories from the incoming responses? That's clever (but I'd never want to deploy that).
From another comment it seems like you are doing an LLM-driven query phase. That's a valid approach in RAG. Maybe these all work together well, but SQL seems like an aside. And it's already how lots of normal RAG or memory systems are built, it doesn't seem particularly unique...?
I was unaware what RAG referred to; perhaps others were too.
Let's say your beverage LLM is there to recommend drinks. At some point you said "I hate espresso" or even something like "I don't take caffeine" to the LLM.
Before recommending coffee, the beverage LLM might do a vector search for "coffee", and it would match these phrases. Then the LLM processes the message history to figure out whether this person likes or dislikes coffee.
But searching SQL for `LIKE '%coffee%'` won't match with any of these.
The basic idea is, you don't search for a single term but rather for many. Depending on the instructions provided in the "Query Construction" stage, you may end up with a very high-level search term like 'beverage', or you may end up with terms like 'hot-drinks', 'cold-drinks', etc.
Once you have the query, you can do a "Broad Search" which returns an overview of each message, and from there the LLM can determine which messages it should analyze further if required.
Edit.
I should add, this search strategy will only work well if you have a post-message process. For example, after every message save/update, you have the LLM generate an overview. These are my instructions for my tiny overview https://github.com/gitsense/chat/blob/main/data/analyze/tiny... that is focused on generating the purpose and keywords that can be used to help the LLM define search terms.
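Roughly, the two stages could fit together like this (a sketch only; the overview schema and the search terms are placeholders, not the ones from the linked repo):

```python
import sqlite3

conn = sqlite3.connect("chat.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS message_overviews (
    message_id INTEGER PRIMARY KEY,
    purpose    TEXT,   -- one-line summary generated after each message save/update
    keywords   TEXT    -- e.g. "beverage, hot-drinks, caffeine, espresso"
)""")

def broad_search(terms):
    # "Broad Search": match any of the LLM-constructed terms against the stored
    # overviews, then let the LLM decide which messages to open in full.
    clauses = " OR ".join("keywords LIKE ?" for _ in terms)
    params = [f"%{t}%" for t in terms]
    return conn.execute(
        f"SELECT message_id, purpose FROM message_overviews WHERE {clauses}",
        params,
    ).fetchall()

# The "Query Construction" stage might hand back several terms, not just "coffee":
candidates = broad_search(["beverage", "hot-drinks", "caffeine", "espresso"])
```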
And now you’ve reinvented vector embeddings.
Given how fast inference has become, and given the context window sizes most SOTA models now support, I think summarizing and having the LLM decide what is relevant is not that fragile at all for most use cases. This is what I do with my analyzers, which I talk about at https://github.com/gitsense/chat/blob/main/packages/chat/wid...
If you take the post-analysis process into consideration, which is what the inference is being used for, is it an order of magnitude slower?
The number is actually the order in the chat, so 1.md would be the first message, 2.md would be the second, and so forth.
If you go to https://chat.gitsense.com and click on the "Load Personal Help Guide", you can see how it is used. Since I want you to be able to chat with the document, I will create a new chat tree and use the directory structure and the 1, 2, 3... markdown files to determine message order.
In fact, this is what ChatGPT came up with:
(I gave it no direction as to the structure of the DB, but it shouldn't be terribly difficult to adapt to your exact schema.)

There are an unlimited number of items to add to your “like” clauses. Vector search allows you to efficiently query for all of them at once.
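A rough sketch of that point, using sentence-transformers purely as an example embedding model (the model choice and memory contents here are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model would do

memories = [
    "I hate espresso",
    "I don't take caffeine",
    "My favorite color is blue",
]

# One query embedding covers coffee/espresso/caffeine/latte/... without
# enumerating each variant as another LIKE '%...%' clause.
query = model.encode("should I recommend coffee to this user?")
scores = util.cos_sim(query, model.encode(memories))[0]

for memory, score in sorted(zip(memories, scores.tolist()),
                            key=lambda pair: pair[1], reverse=True):
    print(f"{score:.2f}  {memory}")
```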
[1] Despite also somehow supporting MongoDB...
The main advantages of a vector lookup are built-in fuzzy matching and the potential to keep a large amount of documentation in memory for low latency. I can’t see an RDBMS being ideal for either. LLMs are slow enough already; adding a slow document lookup isn’t going to help.
It would become unwieldy real fast, though. Easier to get an embedding for the sentence.
If you're matching ("%card%" OR "%kad%"), you'll also match with things like virtual card, debit card, kadar (rates), akad (contract). The more languages you support, the more false hits you get.
Not to say SQL is wrong, but 30-year-old technology works with 30-year-old interfaces. It's not that people didn't imagine this back then. It's just that you end up with interfaces similar to dropdown filters and vending machines. If you're giving the user the flexibility of an LLM, you have to support the full range of inputs.
Certainly you're at the mercy of what the LLM constructs. But if it understands that, say, "debit card" isn't applicable to "card", it can add a negation filter. As has already been said, you're basically just reinventing a vector database in a 'relational' (that somehow includes MongoDB...) approach anyway.
But what is significant is the claim that it works better. That is a bold claim that deserves a closer look, but I'm not sure how you've added to that closer look by arbitrarily sharing your experience? I guess I've missed what you're trying to say. Everyone and their brother knows how a vector database works by this point.
It’s addressing the “how many fucking times do I fucking need to tell you I don’t like fucking coffee” problem, not the word salad problem.
The ggp comment is strawmanning.
"I hate espresso" "I love coffee"
What if the SQL query only retrieves the first one?
My comment described the problem.
The solution is left as an exercise for the reader.
Keep in mind that people change their minds, misspeak, and use words in peculiar ways.
I think a Datalog-type dialect would be more appropriate, myself. Maybe something like what RelationalAI has implemented.
I assume because datalog is more about composing queries from assertions/constraints on the data?
Nicely, queries can be recursive without having to create views or CTEs (common table expressions).
Often the data for datalog is modeled as fact databases (i.e., different tables are decomposed into a common table of key+record+value).
So I could see training an LLM to recognize relevant entity features and constraints to feed back into the memory query. Less obviously, data analytics might feed into prevalence/relevance at inference time.
So agreed: It might be better as an experiment to start with a simple data model and teachable (but powerful) querying than the full generality of SQL and relational data.
Is that what RelationalAI has done? Their marketecture blurbs specifically mention graph data (no), rule-based inference (yes? backwards or forwards?)
As an aside, their rules description defies deconstruction:
So: rules built on ontologies?

The key thing with them is that it's designed for querying very large cloud-backed datasets, high volumes of connected data. So maybe it's not as relevant here as I originally suggested.
Re: marketing ... much of their marketing has shifted over the last two years to emphasizing the fact that it's a plugin thing for Snowflake, which wasn't their original MO.
(There's a CMU DB talk they did some years ago that I thought was pretty brilliant and made me want to work there.)
My proposal about a datalog (or similar more high level declarative relational-model system) being useful here has to do with how it shifts the focus to logical propositions/rules and handles transitive joins etc naturally. It's a place an LLM could shove "facts" and "rules" it finds along the way, and then the system could join to find relationships.
You can do this in SQL these days, but it isn't as natural or intuitive.
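For example, a transitive relationship that Datalog expresses with one recursive rule needs an explicit WITH RECURSIVE query in SQL. A sketch over a made-up facts table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO facts VALUES
    ('alice', 'reports_to', 'bob'),
    ('bob',   'reports_to', 'carol'),
    ('carol', 'reports_to', 'dana');
""")

# Datalog: chain(X, Y) :- reports_to(X, Y).
#          chain(X, Z) :- reports_to(X, Y), chain(Y, Z).
# The SQL equivalent needs a recursive CTE:
rows = conn.execute("""
WITH RECURSIVE chain(subject, object) AS (
    SELECT subject, object FROM facts WHERE predicate = 'reports_to'
    UNION
    SELECT chain.subject, facts.object
    FROM chain JOIN facts ON facts.subject = chain.object
    WHERE facts.predicate = 'reports_to'
)
SELECT * FROM chain WHERE subject = 'alice'
""").fetchall()

print(rows)  # alice transitively reports to bob, carol, and dana
```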
https://news.ycombinator.com/item?id=39273954
https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
https://pg-memories.netlify.app/
The video demo goes to postgresql.org, all of the purchase buttons go to Postgres, the "get access" button doesn't work, and you can't schedule a demo or even contact their sales team.
SYSTEM_PROMPT = """You are a Memory Search Agent responsible for understanding user queries and planning effective memory retrieval strategies.
Your primary functions:
1. *Analyze Query Intent*: Understand what the user is actually looking for
2. *Extract Search Parameters*: Identify key entities, topics, and concepts
3. *Plan Search Strategy*: Recommend the best approach to find relevant memories
4. *Filter Recommendations*: Suggest appropriate filters for category, importance, etc.

*MEMORY CATEGORIES AVAILABLE:*
- *fact*: Factual information, definitions, technical details, specific data points
- *preference*: User preferences, likes/dislikes, settings, personal choices, opinions
- *skill*: Skills, abilities, competencies, learning progress, expertise levels
- *context*: Project context, work environment, current situations, background info
- *rule*: Rules, policies, procedures, guidelines, constraints

*SEARCH STRATEGIES:*
- *keyword_search*: Direct keyword/phrase matching in content
- *entity_search*: Search by specific entities (people, technologies, topics)
- *category_filter*: Filter by memory categories
- *importance_filter*: Filter by importance levels
- *temporal_filter*: Search within specific time ranges
- *semantic_search*: Conceptual/meaning-based search

*QUERY INTERPRETATION GUIDELINES:*
- "What did I learn about X?" → Focus on facts and skills related to X
- "My preferences for Y" → Focus on preference category
- "Rules about Z" → Focus on rule category
- "Recent work on A" → Temporal filter + context/skill categories
- "Important information about B" → Importance filter + keyword search
Be strategic and comprehensive in your search planning."""
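Presumably the agent's plan then gets compiled into an actual query. A guess at what that could look like (the plan format, table, and column names here are assumptions for illustration, not Memori's actual code):

```python
import sqlite3

def plan_to_sql(plan):
    # Hypothetical plan shape the agent might return:
    # {"keywords": ["coffee"], "category": "preference", "min_importance": 0.5}
    where, params = [], []
    if plan.get("keywords"):
        ors = " OR ".join("content LIKE ?" for _ in plan["keywords"])
        where.append(f"({ors})")
        params += [f"%{k}%" for k in plan["keywords"]]
    if plan.get("category"):
        where.append("category = ?")
        params.append(plan["category"])
    if plan.get("min_importance") is not None:
        where.append("importance >= ?")
        params.append(plan["min_importance"])
    sql = "SELECT content FROM long_term_memory"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# Assumes a long_term_memory(content, category, importance) table exists.
conn = sqlite3.connect("memory.db")
sql, params = plan_to_sql({"keywords": ["coffee"], "category": "preference",
                           "min_importance": 0.5})
print(conn.execute(sql, params).fetchall())
```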
They have pgvector, which gives you practically all the benefits of Postgres (ACID, etc., which may not be in many other vector DBs). If I want a keyword search, it works well. If I want vector search, that's there too.
I'm not keen on having another layer on top, especially when it takes about 15 minutes to vibe-code a database query. There are all kinds of problems with abstraction layers, and it's not a particularly complex bit of code.
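For reference, that "15 minutes of vibe coding" plausibly amounts to something like this (table and column names are made up; psycopg plus pgvector assumed):

```python
import psycopg

# Hybrid retrieval in plain Postgres + pgvector: keyword filter plus vector
# ordering in one query. The memories table and its columns are hypothetical.
SQL = """
SELECT content
FROM memories
WHERE content ILIKE %(kw)s            -- cheap keyword pre-filter
ORDER BY embedding <=> %(q)s::vector  -- cosine distance via pgvector
LIMIT 5
"""

with psycopg.connect("dbname=app") as conn:
    # Toy 3-dimensional query vector; a real one matches the embedding column's size.
    rows = conn.execute(SQL, {"kw": "%coffee%", "q": "[0.12,-0.03,0.88]"}).fetchall()
```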
Good ways to store relations, iterate over weird combinations, and fill in the blanks.
Inference is cheap, training is expensive. It’s a really difficult problem, but one that will probably need to be solved to approach true intelligence.
What does this do exactly?
I realized LLMs are really good at using sqlite3 and SQL statements. So in my current product (2) I am planning to keep all project data in SQLite. I am creating a self-hosted AI coding platform and I debated where to keep project state for LLMs. I thought of JSON/NDJSON files (3) but I am gravitating toward SQLite and figuring out the models at the moment (4).
Still a work in progress, but I am heading toward SQLite for LLM state.

https://news.ycombinator.com/item?id=45274440
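Something along these lines, i.e. project state kept as rows the model can read back with plain SQL (purely illustrative; not the linked project's actual schema):

```python
import sqlite3

conn = sqlite3.connect("project.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS project_state (
    key        TEXT PRIMARY KEY,   -- e.g. 'current_task', 'build_status'
    value      TEXT NOT NULL,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
)""")

# The model keeps its own notes up to date with ordinary SQL statements...
conn.execute("INSERT OR REPLACE INTO project_state (key, value) VALUES (?, ?)",
             ("current_task", "wire up auth middleware"))
conn.commit()

# ...and reloads them at the start of the next session.
state = dict(conn.execute("SELECT key, value FROM project_state"))
```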
sigh