13 comments

  • Eiganval 1 hour ago
    Looking ahead a bit, how do you see the key ownership / trust model evolving as systems scale?

    Right now it seems very reasonable for the human-in-the-loop to be the signing authority, which makes the cryptographic certificates more about binding human authorization to agent actions than proving agent correctness.

    As agents become more autonomous or higher-throughput, do you imagine humans delegating scoped signing authority to sub-agents? time or capability-limited keys? multi-sig / quorum models where humans only intervene on boundary cases?

    Curious how you’re thinking about preserving accountability and auditability as the human loop inevitably gets thinner.

  • giancarlostoro 2 days ago
    I have been working on a Beads alternative for two reasons:

    1) I didn't like that Beads was married to git via git hooks, and it suffers from this exact problem.

    2) Claude would just close tasks without any validation steps.

    So I made my own that uses SQLite and introduced what I call gates. Every task must have a gate, and gates can be reused; task <-> gate relationships are unique, so a gate that passed for a previous task isn't already passed when you reuse it for a new task.

    I haven't seen it bypass the gates yet; it usually tells me it can't close a ticket.

    A gate in my design can be anything. It can be as simple as having the agent build the project, run the unit tests, or even ask a human to test.
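
    Roughly, the shape of it in SQLite is something like this (table and column names are illustrative, not my actual schema):

        import sqlite3

        con = sqlite3.connect("tasks.db")
        con.executescript("""
        CREATE TABLE IF NOT EXISTS tasks (
          id     INTEGER PRIMARY KEY,
          title  TEXT NOT NULL,
          status TEXT NOT NULL DEFAULT 'open'
        );
        CREATE TABLE IF NOT EXISTS gates (
          id          INTEGER PRIMARY KEY,
          description TEXT NOT NULL  -- e.g. 'project builds', 'unit tests pass', 'human sign-off'
        );
        -- A gate is reusable, but each (task, gate) pair tracks its own pass/fail,
        -- so a gate that passed for an old task starts out unpassed on a new one.
        CREATE TABLE IF NOT EXISTS task_gates (
          task_id INTEGER NOT NULL REFERENCES tasks(id),
          gate_id INTEGER NOT NULL REFERENCES gates(id),
          passed  INTEGER NOT NULL DEFAULT 0,
          UNIQUE (task_id, gate_id)
        );
        """)

        def close_task(task_id: int) -> bool:
            """Refuse to close the task unless every gate attached to it has passed."""
            unpassed = con.execute(
                "SELECT COUNT(*) FROM task_gates WHERE task_id = ? AND passed = 0",
                (task_id,),
            ).fetchone()[0]
            if unpassed:
                return False  # the agent is told it can't close the ticket
            con.execute("UPDATE tasks SET status = 'closed' WHERE id = ?", (task_id,))
            con.commit()
            return True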

    Seems to me like everyone's building tooling to make coding agents more effective and efficient.

    I do wonder if we need a complete, generic spec for coding agents, and maybe it should include this too. Anthropic, to my knowledge, seems to be the only one that publicly publishes specs for coding agents.

    • alexgarden 2 days ago
      Great minds... I built my own memory harness, called "Argonaut," to move beyond what I thought were Beads' limitations, too. (shoutout to Yegge, tho - rad work)

      Regarding your point on standards... that's exactly why I built AAP and AIP. They're extensions to Google's A2A protocol that are extremely easy to deploy (protocol, hosted, self-hosted).

      It seemed to me that building this for my own agents was only solving a small part of the big problem. I need observability, transparency, and trust for my own teams, but even more, I need runtime contract negotiation and pre-flight alignment understanding so my teams can work with other teams (1p and 3p).

      • giancarlostoro 2 days ago
        Awesome, yeah, I wanted to check out your link, but the corporate firewall blocks "new domains," unfortunately. I'll definitely be reading it when I get home later.
        • alexgarden 1 day ago
          Ha! That's a first-world problem. Check out github.com/mnemom/docs which you'll be able to access at work if you just can't wait. docs.mnemom.ai is way easier to use.
  • shawnreilly 1 day ago
    I would recommend that you keep working on this. I'm interested in this space, and also in contributing. Are you looking for collaborators? If you continue to iterate on this, I think there will be value, because these problems do need to be solved.

    I would also recommend creating standards for the new protocols you are developing. Protocols need standards so that others can do their own implementations. If you have a standard, someone else could build in a completely different language (like Rust or Go), not use any SDK you provide, and still be interoperable with your AAP and AIP implementation for smoltbot (because both support the standards of the AAP and AIP protocols).

    I also want to note that you cannot trust the LLM to do what your instructions say. The moment it falls victim to a prompt injection or confused deputy attack, all bets are off. Instructions are soft: more like advice or guidance than a control or gate. To provide true controls and gates, they must be external, authoritative, and enforced below the decision layer.

    • alexgarden 18 hours ago
      Hey! I launched AAP and AIP via Apache specifically because I want independent implementations built on top of them. I have a pretty killer roadmap of new features for both protocols coming out that will keep them on the bleeding edge. Love to see what you come up with.

      On standards, I totally agree. Some will disagree, but my view is that we are rocketing towards a post-internet, agent-to-agent world where strong, reliable (and efficient) trust contracts will be the backbone of all this great new functionality. Without that, it's the wild west. AAP and AIP are extensions of Google's A2A protocol. FWIW, I have submitted papers to NIST and on the EU AI Act's Section 50, written alignment cards for the WEF standards proposals, and have an AAIF proposal ready as well. I need to find the time to get on their calendar and present. That was the whole point of the hosted gateway approach: trying to reduce the friction of using this to one line of code.

      On the point of not trusting the LLM, you're preaching to the choir. My "helpful" agents routinely light my shit on fire. AIP is not a soft instruction set; it's external to the agent. checkIntegrity() is code, not a prompt. The way I implemented it with smoltbot is a thinking-block injection that nudges the agent back on track. All of that is live on our website, using our AI journalist as dogfood.

      On the last part, who watches the watchman, I'm going to append to my initial post. Check this out...

  • neom 2 days ago
    Seems like your timing is pretty good - I realize this isn't exactly what you're doing, but still think it's probably interesting given your work: https://www.nist.gov/news-events/news/2026/02/announcing-ai-...

    Cool stuff Alex - looking forward to seeing where you go with it!!! :)

    • alexgarden 2 days ago
      Thanks! We submitted a formal comment to NIST's 'Accelerating the Adoption of Software and AI Agent Identity and Authorization' concept paper on Feb 14. It maps AAP/AIP to all four NIST focus areas (agent identification, authorization via OAuth extensions, access delegation, and action logging/transparency). The comment period is open until April 2 — the concept paper is worth reading if you're in this space: https://www.nccoe.nist.gov/projects/software-and-ai-agent-id...
  • Normal_gaussian 2 days ago
    Have you tried using a more traditional, non-LLM loop to do the analysis? I'd assume it wouldn't catch the more complex deceptive behaviours, but I'm assuming most detections could be done with various sentiment analysis / embedding tools, which would drastically reduce cost and latency. If you have tried, do you have any benchmarks?

    Anecdotally, I often end up babysitting agents running against codebases with non-standard choices (e.g. yarn over npm, podman over docker) and generally feel that I need a better framework to manage these. This looks promising as a less complex solution - can you see any path to making it work with coding agents/subscription agents?

    I've saved this to look at in more detail later for a current project. When exposing an embedded agent to internal teams, I'm very wary of handling the client conversations around alignment, so I find the presentation of the cards and the violations very interesting. I think they'll understand the risks a lot better, and it may also give them a method of 'tuning'.

    • alexgarden 2 days ago
      Good question. So... AAP/AIP are agnostic about how checking is done, and anyone can use the protocols and enforce them however they want.

      Smoltbot is our hosted (or self-hosted) monitoring/enforcement gateway, and in that, yeah... I use a Haiku-class model for monitoring.

      I initially tried regex for speed and cost, but TBH, what you gain in speed and cost efficiency, you give up in quality.

      AAP is zero-latency sideband monitoring, so that's just a (very small) cost hit. AIP is inline monitoring, but my take is this: If you're running an application where you just need transparency, only implement AAP. If you're running one that requires trust, the small latency hit (~1 second) is totally worth it for the peace of mind and is essentially imperceptible in the flow.
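
      If it helps, here's a toy sketch of that trade-off from the gateway's point of view (the function names are made up for illustration, not the smoltbot API):

          import asyncio

          async def check_alignment(thinking_block: str) -> dict:
              # Stand-in for the Haiku-class monitoring call; assume it takes ~1 second.
              await asyncio.sleep(1.0)
              return {"verdict": "clear", "concerns": []}

          async def forward_sideband(thinking_block: str, forward) -> None:
              # AAP-style: kick off the check off the critical path and forward
              # immediately. No added latency, only the cost of the extra call.
              asyncio.create_task(check_alignment(thinking_block))
              await forward(thinking_block)

          async def forward_inline(thinking_block: str, forward) -> None:
              # AIP-style: wait for the verdict before forwarding (~1 second hit),
              # which is what lets a violation be handled before the action happens.
              verdict = await check_alignment(thinking_block)
              if verdict["verdict"] != "clear":
                  thinking_block += "\n[BOUNDARY VIOLATION] reconsider this step."
              await forward(thinking_block)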

      Your mileage may vary, which is why I open-sourced the protocols. Go for it!

  • geiser 2 days ago
    Definitely interesting. I hope all of this gets standardized some day, and if it's your protocol, great.

    I have been following AlignTrue (https://aligntrue.ai/docs/about), but I think I prefer your way of doing accountability and acting on the thinking process instead of being passive. That's apart from the fact that your way is a more down-to-earth, practical approach.

    Great live showcase demo; however, I would have liked a more in-depth look at AAP and AIP, even in this multi-agent-interaction setting, to understand the full picture better. Or perhaps simply prepare another showcase just for AAP and AIP. Just my two cents.

    PS. I'm the creator of LynxPrompt, which honestly falls very short for the cases we're discussing today, but I mention it to say that I stay engaged with the topic of trust/accountability and with how to organize agents and guide them properly without supervision.

    • alexgarden 2 days ago
      Fair... Happy to do a deep dive on the protocols. FWIW, I'm dogfooding with an OpenClaw agent running smoltbot, called Hunter S. Clawmpson. He blogs about AI from an AI's perspective: mnemom.ai/blog.

      You can see his trace data live here: https://www.mnemom.ai/agents/smolt-a4c12709

      The trace cards are all expandable and show you, in real time, what he's thinking/going to do, etc., and when violations are being caught. It turns out OpenClaw is extremely creative in finding ways to circumvent the rules. Voilà: AIP.

      Busy day today, but this gives you a pretty deep dive/interactive view into the protocols in action. Cool thing about smoltbot... It's literally "smoltbot init" with the API key to the provider of your choice, and you can go to the website and claim your agent and (privately) see your traces running the same way. Very low impact dogfooding.

  • drivebyhooting 2 days ago
    > What these protocols do not do: Guarantee that agents behave as declared

    That seems like a pretty critical flaw in this approach, does it not?

    • alexgarden 2 days ago
      Fair comment. Possibly, I'm being overly self-critical in that assertion.

      AAP/AIP are designed to work as a conscience sidecar to Anthropic/OpenAI/Gemini. They do the thinking; we're not hooked into their internal process.

      So... at each thinking turn, an agent can think "I need to break the rules now," and we can't stop that. What we can do, though, is see that in real time, check it against declared values and intended behavior, and inject a message into the runtime thinking stream:

      [BOUNDARY VIOLATION] - What you're about to do is in violation of <value>. Suggest <new action>.

      Our experience is that this is extremely effective in correcting agents back onto the right path, but it is NOT A GUARANTEE.
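
      To make the mechanics concrete, here's a toy sketch of how that nudge could be assembled from a checkpoint verdict (field names are illustrative, not the actual AIP wire format):

          def build_nudge(verdict: dict) -> str | None:
              """Turn a boundary-violation verdict into the message injected into
              the agent's next thinking block. Returns None when the check is clear."""
              if verdict.get("status") != "boundary_violation":
                  return None
              value = verdict.get("violated_value", "a declared value")
              suggestion = verdict.get("suggested_action", "pause and re-check the Alignment Card")
              return (f"[BOUNDARY VIOLATION] - What you're about to do is in "
                      f"violation of {value}. Suggest {suggestion}.")

          print(build_nudge({
              "status": "boundary_violation",
              "violated_value": "no-unapproved-network-calls",
              "suggested_action": "asking the operator for approval first",
          }))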

      Live trace feed from our journalist - will show you what I'm talking about:

      https://www.mnemom.ai/agents/smolt-a4c12709

  • Stefan-H 2 days ago
    My opinion is that all attempts to make an LLM behave securely that are based on training and prompting are doomed to fail. In security, we have the notion of the CIA triad (confidentiality, integrity, and availability). When we discuss this, we often explain that these properties can be protected through people, processes, and technology. Training and prompting an AI to behave appropriately is far more akin to a "people"-focused control (similar to training and awareness practices) than to a "technology" control.

    The only way we will actually secure agents is by only giving them the permissions they need for their tasks. A system that uses your contract proposal to create an AuthZ policy that is tied to a short-lived bearer token which the agent can use on its tool calls would ensure that the agent actually behaves how it ought to.
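
    To sketch what I mean (everything here is hypothetical and stdlib-only, not tied to the actual AAP/AIP formats): the contract's permitted actions become scopes in a short-lived token, and the tool runner, not the model, enforces it.

        import base64, hashlib, hmac, json, time

        SECRET = b"rotate-me"  # held by the token service, never by the agent

        def issue_token(permitted_actions: list[str], ttl_s: int = 300) -> str:
            """Mint a short-lived bearer token scoped to the contract's permitted actions."""
            claims = {"scopes": sorted(permitted_actions), "exp": int(time.time()) + ttl_s}
            body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
            sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
            return f"{body}.{sig}"

        def authorize(token: str, action: str) -> bool:
            """Enforced below the decision layer: the tool runner calls this, not the LLM."""
            body, sig = token.rsplit(".", 1)
            expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(sig, expected):
                return False
            claims = json.loads(base64.urlsafe_b64decode(body))
            return time.time() < claims["exp"] and action in claims["scopes"]

        token = issue_token(["read_file", "run_tests"])
        assert authorize(token, "run_tests")
        assert not authorize(token, "delete_repo")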

  • root_axis 2 days ago
    Presumably the models would, at the very least, need major fine-tuning on this standard to prevent it from being bypassed through prompt injection.
    • alexgarden 2 days ago
      Actually, not really... proofing against prompt injection (malicious and "well-intentioned") was part of my goal here.

      What makes AAP/AIP so powerful is that a prompt injection may well succeed in getting the agent to attempt to do wrong, but then AIP intervenes with a [BOUNDARY VIOLATION] reminder in real time, in the next thinking block.

      As I said earlier, not a guarantee, but so far, in my experience, pretty damn robust. The only thing that would make it more secure (than real-time thinking block monitoring) would be integration inside the LLM provider's process, but that would be a nightmare to integrate and proprietary unless they could all agree on a standard that didn't compromise one of them. Seems improbable.

  • alexgarden 18 hours ago
    Update: just shipped cryptographic verification for the entire integrity pipeline.

    Checkpoints produce signed certs: SHA-256 input commitments + Ed25519 sigs + tamper-evident hash chain and Merkle inclusion proof. Mess with it and the math breaks.
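
    Rough shape of it, as a toy sketch using Python's cryptography package (field names illustrative, Merkle proof omitted, not the actual cert format):

        import hashlib, json
        from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

        signing_key = Ed25519PrivateKey.generate()

        def issue_cert(checkpoint: dict, prev_hash: str) -> dict:
            """Commit to the checkpoint inputs, chain to the previous cert, and sign."""
            commitment = hashlib.sha256(json.dumps(checkpoint, sort_keys=True).encode()).hexdigest()
            body = json.dumps({"input_commitment": commitment, "prev": prev_hash},
                              sort_keys=True).encode()
            return {
                "input_commitment": commitment,
                "prev": prev_hash,
                "sig": signing_key.sign(body).hex(),
                "cert_hash": hashlib.sha256(body).hexdigest(),  # the next cert chains to this
            }

        c1 = issue_cert({"turn": 1, "verdict": "clear"}, prev_hash="genesis")
        c2 = issue_cert({"turn": 2, "verdict": "boundary_violation"}, prev_hash=c1["cert_hash"])
        # Alter or drop c1 after the fact and c2["prev"] no longer matches anything
        # you can recompute from c1, so the chain (and the signature check) breaks.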

    Massive update to the interactive showcase to demo all of this running against live services: https://www.mnemom.ai/showcase <-- all features interactive - no BS.

    This is the answer to "who watches the watchmen". More to come.

  • tiffanyh 1 day ago
    Super interesting work.

    Q: How is your AAP different from the industry work happening on intent/instructions?

    • alexgarden 1 day ago
      The short version: instructions tell the model what to do. An Alignment Card declares what the agent committed to do — and then a separate system verifies it actually did.

      Most intent/instruction work (system prompts, Model Spec, tool-use policies) is input-side. You're shaping behavior by telling the model "here are your rules." That's important and necessary. But it's unverifiable — you have no way to confirm the model followed the instructions, partially followed them, or quietly ignored them.

      AAP is an output-side verification infrastructure. The Alignment Card is a schema-validated behavioral contract: permitted actions, forbidden actions, escalation triggers, values. Machine-readable, not just LLM-readable. Then AIP reads the agent's reasoning between every action and compares it to that contract. Different system, different model, independent judgment.
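
      For a flavor, a toy card might look something like this (illustrative fields only, not the published schema):

          alignment_card = {
              "agent_id": "smolt-example",  # hypothetical agent
              "permitted_actions": ["read_repo", "run_tests", "post_summary"],
              "forbidden_actions": ["push_to_main", "exfiltrate_data"],
              "escalation_triggers": ["payment", "credential_access"],
              "values": ["be transparent about uncertainty", "never fabricate sources"],
          }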

      Bonus: if you run through our gateway (smoltbot), it can nudge the agent back on course in real time — not just detect the drift, but correct it.

      So they're complementary. Use whatever instruction framework you want to shape the agent's behavior. AAP/AIP sits alongside and answers the question instructions can't: "did it actually comply?"

      • tiffanyh 1 day ago
        > Then AIP reads the agent's reasoning between every action and compares it to that contract.

        How would this work? Is one LLM used to “read” (and verify) another LLM's reasoning?

        • alexgarden 1 day ago
          Yep... fair question.

          So AIP and AAP are protocols. You can implement them in a variety of ways.

          They're implemented on our infrastructure via smoltbot, which is a hosted (or self-hosted) gateway that proxies LLM calls.

          For AAP it's a sidecar observer running on a schedule. Zero drag on the model performance.

          For AIP, it's an inline conscience observer and a nudge-based enforcement step that monitors the agent's thinking blocks. ~1 second latency penalty - worth it when you must have trust.

          Both use Haiku-class models for intent summarization; the actual verification is via the protocols.

          • tiffanyh 1 day ago
            Dumb question: don’t you eventually need a way to monitor the monitoring agent?

            If a second LLM is supposed to verify the primary agent’s intent/instructions, how do we know that verifier is actually doing what it was told to do?

            • alexgarden 1 day ago
              Not a dumb question — it's the right one. "Who watches the watchmen" has been on my mind from the start of this.

              Today the answer is two layers:

              1) The integrity check isn't an LLM deciding if it "feels" like the agent behaved. An LLM does the analysis, but the verdict comes from checkIntegrity(): deterministic rule evaluation against the Alignment Card. The rules are code, not prompts. Auditable.

              2) Cryptographic attestation. Every integrity check produces a signed certificate: SHA-256 input commitments, Ed25519 signature, tamper-evident hash chain, Merkle inclusion proof. Modify or delete a verdict after the fact, and the math breaks.
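
              On that second layer, here's roughly what an independent verifier can do with such a chain (a toy sketch with illustrative field names, using Python's cryptography package, not our actual format):

                  import hashlib, json
                  from cryptography.exceptions import InvalidSignature
                  from cryptography.hazmat.primitives.asymmetric.ed25519 import (
                      Ed25519PublicKey,
                  )

                  def verify_chain(certs: list[dict], pubkey: bytes) -> bool:
                      """Check every Ed25519 signature and that the hash chain links up."""
                      pub = Ed25519PublicKey.from_public_bytes(pubkey)
                      prev = "genesis"
                      for cert in certs:
                          body = json.dumps(
                              {"input_commitment": cert["input_commitment"],
                               "prev": cert["prev"]},
                              sort_keys=True,
                          ).encode()
                          if cert["prev"] != prev:
                              return False  # chain broken: something altered or dropped
                          try:
                              pub.verify(bytes.fromhex(cert["sig"]), body)
                          except InvalidSignature:
                              return False  # contents don't match the signature
                          prev = hashlib.sha256(body).hexdigest()
                      return True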

              Tomorrow I'm shipping interactive visualizations for all of this — certificate explorer, hash chain with tamper simulation, Merkle tree with inclusion proof highlighting, and a live verification demo that runs Ed25519 verification in your browser. You'll be able to see and verify the cryptography yourself at mnemom.ai/showcase.

              And I'm close to shipping a third layer that removes the need to trust the verifier entirely. Think: mathematically proving the verdict was honestly derived, not just signed. Stay tuned.

              • tiffanyh 1 day ago
                Appreciate all you’re doing in this area. Wishing you the best.
                • alexgarden 1 day ago
                  You're welcome - and thanks for that. Makes up for the large time blocks away from the family. It does feel like potentially the most important work of my career. Would love your feedback once the new showcase is up. Will be tomorrow - preflighting it now.
  • CuriouslyC 2 days ago
    Did you dig into production-grade policy engines at all?
    • alexgarden 2 days ago
      Yes — and it's a deliberate architectural decision that AIP is not itself a policy engine.

      Policy engines (OPA/Rego, Cedar, Casbin) are excellent at "given this action, is it permitted?" That's a solved problem. What's not solved is "what is the agent considering doing, and does that align with what it committed to?"

      AIP operates one layer deeper — it reads the reasoning trace between turns and compares it to the Alignment Card. The output is a verdict (clear / review_needed / boundary_violation) with structured concerns. That verdict can feed into a policy engine as a signal. You could absolutely wire AIP → OPA and have OPA make the final enforcement decision based on the integrity checkpoint plus whatever other policy context you have.
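
      As a rough illustration of the shape of that verdict step (toy logic and illustrative field names, not the actual checkIntegrity() implementation):

          def check_integrity(intended_actions: list[str], card: dict) -> dict:
              """Toy deterministic evaluation of summarized intent against an Alignment Card."""
              verdict, concerns = "clear", []
              for action in intended_actions:
                  if action in card["forbidden_actions"]:
                      verdict = "boundary_violation"
                      concerns.append({"action": action, "reason": "explicitly forbidden"})
                  elif action in card.get("escalation_triggers", []):
                      verdict = verdict if verdict == "boundary_violation" else "review_needed"
                      concerns.append({"action": action, "reason": "requires escalation"})
                  elif action not in card["permitted_actions"]:
                      verdict = verdict if verdict != "clear" else "review_needed"
                      concerns.append({"action": action, "reason": "outside the permitted set"})
              return {"verdict": verdict, "concerns": concerns}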

      The Alignment Card itself is intentionally simpler than a full policy language — it's JSON with permitted actions, forbidden actions, escalation triggers, and values. That's a feature, not a limitation. It's what the agent commits to, not the full organizational policy tree. The organizational policy tree lives in your policy engine. The Alignment Card lives with the agent.

      So the short answer: we integrate with policy engines rather than replace them. AIP tells you what the agent is thinking. Your policy engine decides what to do about it.

      • CuriouslyC 1 day ago
        So, the core is basically an audit loop on thinking tokens to re-steer the model?

        Seems like an interesting direction to reduce token usage/agent error. This would be a lot easier to grok if you created a benchmark that would both demonstrate your ideal use case and the potential improvement your method brings.

        Also, I'd avoid directly using model output for HN/social media communications. I don't really care if the information is correct and concise, but some people get really triggered by it.

        • alexgarden 1 day ago
          Close but not quite — it's not re-steering the model. AIP doesn't modify the agent's behavior or inject corrections back into the context. It's a passive observer that reads the thinking trace and produces a verdict. What happens after the verdict is up to the operator — you might block the action, escalate to a human, log it, or let it through with a flag.

          Think of it less as a feedback loop and more as a conscience that watches but doesn't control. The agent doesn't know AIP is there (by design — if the agent knows it's being watched, you change the behavior you're trying to observe).

          Benchmarks are actually coming shortly, along with proofs. Noted on model outputs. FWIW, 90% of what I wrote here was me typing, but using Grammarly for cleanup.

          [EDIT] - Overly specific answer: AIP itself doesn't re-steer, but our gateway implementation, smoltbot, does. That was our choice of how to implement AIP.