I've worked in a number of regulated industries off and on for years, and recently hit this gap.
We already had strong observability, but if someone asked me to prove exactly what happened for a specific AI decision X months ago (and demonstrate that the log trail had not been altered), I could not.
The EU AI Act has already entered into force, and its Article 12 obligations kick in this August, requiring automatic event recording and six-month retention for high-risk systems. Many legal commentators have suggested this reads more like an append-only ledger requirement than standard application logging.
With this in mind, we built a small, free, open-source TypeScript library for Node apps using the Vercel AI SDK that captures inference calls in an append-only log.
It wraps the model in middleware, automatically logs every inference call to structured JSONL in your own S3 bucket, chains entries with SHA-256 hashes for tamper detection, enforces a 180-day retention floor, and provides a CLI to reconstruct a decision and verify integrity. There is also a coverage command that flags likely gaps (in practice omissions are a bigger risk than edits).
The library is deliberately simple: TypeScript, targeting Vercel AI SDK middleware, S3 or local filesystem storage, linear hash chaining. It also works with Mastra (an agentic framework), and I'm happy to expand its integrations via PRs.
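For anyone curious what "linear hash chaining" means concretely, here is a minimal sketch of the idea (illustrative only, not the library's actual API or on-disk format): each entry's hash covers its payload plus the previous entry's hash, so editing any earlier record invalidates every later one.

```typescript
import { createHash } from "node:crypto";

// One JSONL-style entry in a linear hash chain.
interface ChainEntry {
  payload: Record<string, unknown>;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // well-known anchor for the first entry

function appendEntry(
  chain: ChainEntry[],
  payload: Record<string, unknown>
): ChainEntry[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS;
  // The digest binds this payload to everything that came before it.
  const hash = createHash("sha256")
    .update(prevHash + JSON.stringify(payload))
    .digest("hex");
  return [...chain, { payload, prevHash, hash }];
}

let chain: ChainEntry[] = [];
chain = appendEntry(chain, { event: "inference", model: "gpt-4o", ts: 1 });
chain = appendEntry(chain, { event: "inference", model: "gpt-4o", ts: 2 });
```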
Blog post with link to repo: https://systima.ai/blog/open-source-article-12-audit-logging
I'd value feedback, thoughts, and any critique.
One thing worth flagging from a compliance perspective: Art. 12 requires logs to be retained for the lifetime of the high-risk AI system or at minimum 10 years from the last use. The 180-day floor you mention is a starting point but auditors will typically ask for much longer retention windows, especially for systems used in employment, credit, or law enforcement contexts.
Also worth noting for teams building on this: the logs themselves become part of the "technical documentation" under Art. 11, which means they need to be accessible in a structured way to notified bodies during a conformity assessment — not just stored. The CLI reconstruction feature you describe is a good step toward that.
I'm building similar documentation tooling for EU AI Act compliance (the broader evidence-vault problem, not just logging), and this kind of open infrastructure for Art. 12 specifically would integrate well with that approach.
Fair point on the reconstruction attack.
The library is deliberately scoped as tamper-evident, not tamper-proof; it detects modification but does not prevent wholesale chain reconstruction by someone with storage access. The design assumes defence-in-depth: S3 Object Lock (Compliance mode) at the infrastructure layer, hash chain verification at the application layer.
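To make the tamper-evident claim concrete, a verification pass over such a chain might look like this (an illustrative sketch of the technique, not the library's actual verify command):

```typescript
import { createHash } from "node:crypto";

interface Entry {
  payload: Record<string, unknown>;
  prevHash: string;
  hash: string;
}

// Recompute every hash and check the links; return the index of the
// first bad entry, or null if the chain is intact.
function verifyChain(entries: Entry[], genesis = "0".repeat(64)): number | null {
  let prev = genesis;
  for (let i = 0; i < entries.length; i++) {
    const e = entries[i];
    const expected = createHash("sha256")
      .update(prev + JSON.stringify(e.payload))
      .digest("hex");
    if (e.prevHash !== prev || e.hash !== expected) return i;
    prev = e.hash;
  }
  return null;
}

// Helper for the demo: build a well-formed entry.
function mkEntry(prev: string, payload: Record<string, unknown>): Entry {
  const hash = createHash("sha256")
    .update(prev + JSON.stringify(payload))
    .digest("hex");
  return { payload, prevHash: prev, hash };
}

const g = "0".repeat(64);
const e1 = mkEntry(g, { id: 1 });
const e2 = mkEntry(e1.hash, { id: 2 });
const ok = verifyChain([e1, e2]); // intact → null
const tampered = verifyChain([{ ...e1, payload: { id: 99 } }, e2]); // edited → 0
```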
External timestamping (OpenTimestamps, RFC 3161) would definitely add independent temporal anchoring and is worth considering as an optional feature. From what I can see, Article 12 does not currently prescribe specific cryptographic mechanisms (but of course the assurance level would increase with it).
On the regulatory question: Article 12 requires "automatic recording" that enables monitoring and reconstruction, and current regulatory guidance does not require tamper-proof storage, only trustworthy, auditable records. The hash chain plus immutable storage is designed to meet that bar, but what you raise here is a good and thoughtful point.
voxic11 is right that the AI Act creates a legal obligation that provides a lawful basis for processing under GDPR Article 6(1)(c).
To add to that, Article 17(3)(b) specifically carves out an exemption to the right to erasure where retention is necessary to comply with a legal obligation.
(So the defence works at both levels; you have a lawful basis to retain, and erasure requests don’t override it during the mandatory retention period).
That said, GDPR data minimisation (Article 5(1)(c)) still constrains what you log.
The library addresses this at write-time today, in that the pii config lets you SHA-256 hash inputs/outputs before they hit the log and apply regex redaction patterns, so personal data need never enter the chain in the first place.
This enables the pattern of “Hash by default, only log raw where necessary for Article 12”.
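As a rough illustration of that write-time pattern (the function and the single email regex below are hypothetical, not the library's actual pii config): redact known PII patterns, then keep only a SHA-256 digest of the raw text for correlation.

```typescript
import { createHash } from "node:crypto";

// Naive email matcher, purely for illustration; a real config would
// carry a set of user-supplied redaction patterns.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;

// Hash-by-default minimisation: the log gets a digest plus a
// redacted form, so raw personal data never enters the chain.
function minimise(text: string): { digest: string; redacted: string } {
  const digest = createHash("sha256").update(text).digest("hex");
  const redacted = text.replace(EMAIL, "[REDACTED_EMAIL]");
  return { digest, redacted };
}

const entry = minimise("Loan denied for jane.doe@example.com, score 512");
```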
For cases where raw content must be logged (e.g. full decision reconstruction for a regulator), we're planning a dual-layer storage approach: the hash chain would cover a structural envelope (timestamps, decision ID, model ID, parameters, latency, hash pointers), while the actual PII-bearing content (input prompts, output text) would live in a separate referenced object.
Erasure would then mean deleting the content object, and the chain would stay intact because it never hashed the raw content directly.
The regulator would also therefore see a complete, tamper-evident chain of system activity.
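A minimal sketch of that dual-layer idea, with a Map standing in for the separate S3 content store (this is the planned design as described above, not shipped code, and all names are illustrative):

```typescript
import { createHash, randomUUID } from "node:crypto";

const contentStore = new Map<string, string>(); // stands in for S3 objects

// The chained envelope: structural metadata plus a pointer to the
// detachable content object. Raw content is never hashed into it.
interface Envelope {
  decisionId: string;
  modelId: string;
  ts: number;
  contentRef: string;
  prevHash: string;
  hash: string;
}

function record(
  chain: Envelope[],
  meta: { decisionId: string; modelId: string; ts: number },
  content: string
): Envelope[] {
  const contentRef = randomUUID();
  contentStore.set(contentRef, content);
  const prevHash = chain.length ? chain[chain.length - 1].hash : "0".repeat(64);
  const body = { ...meta, contentRef, prevHash };
  const hash = createHash("sha256").update(JSON.stringify(body)).digest("hex");
  return [...chain, { ...body, hash }];
}

// Erasure deletes only the content object; every envelope hash still
// verifies because the chain never covered the raw content.
function erase(ref: string): void {
  contentStore.delete(ref);
}

let auditChain: Envelope[] = [];
auditChain = record(auditChain, { decisionId: "d1", modelId: "gpt-4o", ts: 1 }, "raw prompt with PII");
auditChain = record(auditChain, { decisionId: "d2", modelId: "gpt-4o", ts: 2 }, "another raw prompt");
erase(auditChain[0].contentRef); // erasure request honoured, chain untouched
```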
It would definitely work (and when dealing with petabyte levels of data the simplicity of only having to delete the key is convenient).
We're leaning toward the dual-layer separation I described (metadata separate from content), mainly because crypto-shredding makes every read, including regulatory reconstruction, depend on a key store.
In my view that’s a significant dependency for an audit log whose whole purpose is reliable reconstructability, whereas dual-layer lets the chain stand on its own.
Your point about developer mistakes is fair. It applies to the dual-layer approach, as your example shows, but crypto-shredding isn't immune to mistakes either: deleting the key only truly erases the data if the key and plaintext never accidentally leaked elsewhere (logs, backups, etc.).
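For contrast, crypto-shredding in its simplest form looks like this (an illustrative sketch, not anything we've built; note how decryption, and therefore any regulatory read, depends on the key store surviving):

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const keyStore = new Map<string, Buffer>(); // stands in for a KMS

// Each record gets its own AES-256-GCM key; deleting that key is the
// "erasure", leaving the ciphertext permanently unreadable.
function encryptRecord(
  id: string,
  plaintext: string
): { iv: Buffer; ct: Buffer; tag: Buffer } {
  const key = randomBytes(32);
  keyStore.set(id, key);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ct, tag: cipher.getAuthTag() };
}

function decryptRecord(
  id: string,
  rec: { iv: Buffer; ct: Buffer; tag: Buffer }
): string | null {
  const key = keyStore.get(id);
  if (!key) return null; // key shredded → content unrecoverable
  const d = createDecipheriv("aes-256-gcm", key, rec.iv);
  d.setAuthTag(rec.tag);
  return Buffer.concat([d.update(rec.ct), d.final()]).toString("utf8");
}

const rec = encryptRecord("d1", "raw prompt");
keyStore.delete("d1"); // the erasure step
```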
The AI Act qualifies as such a legal obligation.