It would be good to see a real example. There’s a sketch of one in the README.md but I’d be interested to see how it works in real life with something complicated.
> Add users with authentication
> No, not like that
> Closer, but I don’t want avatars
> I need email validation too
> Use something off the shelf?
Someone in this place was saying this the other day: a lot of what might seem like public commits to main are really more like private commits to your feature branch. Once everything works you squash it all down to a final version ready for review and to commit to main.
It’s unclear what the “squash” process is for “make me a foo” + “no not like that”.
Yeah the squash question is the whole thing. If your commit history is "do X" -> "no, not like that" -> "closer" then your final commit message is just "do X" with no trace of why certain approaches were rejected. Which is arguably the most useful part of the conversation.
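One hedged way to "squash" the conversation (a hypothetical helper, not part of any tool mentioned here): keep the original ask as the subject line and fold the rejected turns into the body, so the most useful part of the conversation survives the squash:

```python
# Hypothetical sketch: collapse a prompt conversation into a squashed
# commit message that preserves the rejected approaches instead of
# discarding them.
def squash_conversation(turns):
    """turns: list of (prompt, verdict) pairs; verdict is 'kept' or 'rejected'."""
    subject = turns[0][0]  # the original "do X" prompt becomes the subject
    rejected = [prompt for prompt, verdict in turns[1:] if verdict == "rejected"]
    body = "\n".join(f"Rejected: {prompt}" for prompt in rejected)
    return subject + ("\n\n" + body if body else "")

message = squash_conversation([
    ("Add users with authentication", "kept"),
    ("No, not like that", "rejected"),
    ("Closer, but I don't want avatars", "rejected"),
])
# The subject is the original ask; the body records both rejections.
```

Of course, "No, not like that" is only useful if the agent's summary of *what* was rejected gets folded in too, which is the hard part.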
If you give a compiler the same source code and the same options, it should generate the same output, provided you aren't using compiler pragmas or something similar that embeds timestamps or random numbers. If you give an LLM the same input, it can generate different outputs (controlled by the temperature setting).
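A toy illustration of the difference (pure Python, not any real model's API; note that even temperature 0 isn't fully deterministic in real serving stacks, due to batching and floating-point effects):

```python
import math
import random

def sample(logits, temperature, rng):
    """Greedy (repeatable) when temperature is 0; weighted-random otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    weights = [math.exp(l / temperature) for l in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.9, 0.1]
# 100 different seeds: greedy decoding always picks the same token,
# temperature-1.0 sampling picks several different ones.
greedy = {sample(logits, 0, random.Random(seed)) for seed in range(100)}
warm = {sample(logits, 1.0, random.Random(seed)) for seed in range(100)}
```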
I'll be charitable here, but you need to go out of your way to introduce non-determinism. Bit-reproducible builds and distros exist, so it is possible to have an entire distro that can be reliably reproduced bit-by-bit on different systems and at different times.
Folks who bring up these "gotchas" should be forbidden from using or taking advantage of the things they are disingenuously whataboutism-ing. Reminds me of sovereign citizen behavior.
It is not clear to me that keeping prompts/conversations at something like this level of granularity is a _bad_ idea, nor that it's a good one. My initial response is that, while it seems cute, I can't really imagine myself reading it in most cases. Perhaps though you'd end up using it exactly when you're struggling to understand some code, the blame is unclear, the commit message is garbage, and no one remembers which ticket spawned it.
In my CLAUDE.md, I have Claude include all new prompts verbatim in the commit message body.
While I haven't used Claude long enough to need my prompts, I would appreciate seeing my coworkers' prompts when I review their LLM-generated code or proposals. Sometimes it's hard to tell if something was intentional that the author can stand behind, or fluff hallucinated by the LLM. It's a bit annoying to ask why something suspicious was written the way it is, and then they go ahead and wordlessly change it as if it's their first time seeing the code too.
I tried maintaining chat history and a summary in a 'changes' dir in the repo. Claude creates an md file before committing (timestamp.md; a commit hash doesn't work as the filename because of rebase/squash).
I had to stop doing this because it greatly slowed down and confused the model when it did a repo search and found results in some old md files. Plus, token usage went through the roof.
So keeping changes in the open like that in the repo doesn't work.
Not sure how tfa works, but hopefully the model doesn't see that data.
Huh? Either I don't get it, or they don't get it, or both. I'm so puzzled it's probably both.
> Every ghost commit answers: what did I want to happen here? Not what bytes changed.
Aren't they just describing what commit messages are supposed to be? Their first `git log --oneline` output looks normal to me. You don't put the bytes changed in the commit message; git can calculate that from any two states of the tree. You summarize what you're trying to do and why. If you run `git log -p` or `git show`, then yeah, you see the bytes changed in addition to the commit message. Why would you put the commit messages in some separate git repo or storage system?
> Ghost snapshots the working tree before and after Claude runs, diffs the two, and stages only what changed. Unrelated files are never touched.
That's...just what git does? It's not even possible to stage a file that hasn't changed.
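To be fair to the README, the snapshot-and-diff step it describes really is just ordinary content comparison; a minimal sketch (toy in-memory trees, not the tool's actual code):

```python
import hashlib

def snapshot(files):
    """Map path -> content hash for a working tree given as {path: bytes}."""
    return {path: hashlib.sha256(content).hexdigest()
            for path, content in files.items()}

def changed_paths(before, after):
    """Paths added, removed, or modified between two snapshots."""
    return {p for p in set(before) | set(after) if before.get(p) != after.get(p)}

before = snapshot({"app.py": b"v1", "README.md": b"docs"})
after = snapshot({"app.py": b"v2", "README.md": b"docs", "users.py": b"new"})
# Only app.py (modified) and users.py (added) are flagged; README.md is
# untouched -- which is exactly what `git add -u`-style staging gives you.
```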
> Every commit is reproducible. The prompt is preserved exactly. You can re-run any commit against a fresh checkout to see what Claude generates from the same instruction.
This is not what I mean by reproducible. I can re-run any commit against a fresh checkout but Claude will do something different than what it did when they ran it before.
You don't need to be committing the prompts you're using. There's a whole bunch of back and forth in the prompts as you refine. That's not useful information.
What you should do, is use the context window that you've got from writing the code and refine that into a commit message using a skill.
I like this concept because everyone's thought of "commit the agent prompts and reproduce everything from scratch every time" as a "dumb idea", but I'm unsure if anyone has actually executed on it in a snappy git-like UI.
Now, because the author took the time to work on it, we can see if this is actually a better method of software development. If LLM development continues deflating the cost of quality software, maybe this will turn out to be the future.
I don't know. I get the idea that it's like committing C code that then gets compiled to machine code when someone needs the binary, but what if the prompt isn't complete?
For any formal language, there was a testing and iteration process that resulted in the programmer verifying that the code produces the correct functionality, and because a formal compiler is deterministic, they can know that the same code will have the same functionality when compiled and run by someone else (edge cases concerning different platforms and compilers notwithstanding).
But here, even if the prompt is iterated on and the prompter verifies the functionality, it's not guaranteed (or even highly likely) to create the same code with the same functionality when someone else runs the prompt. Even if the coding agent is the same. Even if it's the same version. Simply due to the stochastic nature of these things.
This sounds like a bad idea. You gotta freeze your program in a reproducible form. Natural language prompts aren't it; formal language instructions are.
I love this idea although not sure I’d be comfortable with the level of steering control I would get without trying it for real! What would be even better would be to unshittify my poorly written commit message into a beautiful detailed commit message. We can still keep the original in the footnote if we have to.
I start with a conversation and then ask the coding agent to write a design doc. It might go through several revisions. The implementation might also be a bit different if something unexpected is found, so afterwards I ask the agent to update it to explain what was implemented.
This naturally happens over several commits. I suppose I could squash them, but I haven't bothered.
I noticed in the README that each commit message includes the agent and model, which is a nice start toward reproducibility.
I’m wondering how deep you plan to go on environment pinning beyond that. Is the system prompt / agent configuration versioned? Do you record tool versions or surrounding runtime context?
My mental model is that reproducible intent requires capturing the full "execution envelope", not just the human prompt + model & agent names. Otherwise it becomes more of an audit trail (which is also a good feature) than something you can deterministically re-run.
That’s fair - strict determinism isn’t possible in the traditional sense. I was thinking more along the lines of bounded reproducibility.
If the model, parameters, system prompt, and toolchain are pinned, you might not get identical output, but you can constrain the space of possible diffs.
It reminds me a bit of how StrongDM talks about reproducibility in their “Digital Twin” concept - not bit-for-bit replay, but reproducing the same observable behavior.
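As a sketch of what "pinning the envelope" could look like (every name and version below is made up for illustration): serialize everything that shaped the run canonically, hash it, and record the hash with the commit:

```python
import hashlib
import json

# Hypothetical "execution envelope": everything that shaped the run, pinned.
# All names and versions here are illustrative, not real recorded values.
envelope = {
    "model": "example-model-v1",
    "temperature": 0,
    "system_prompt_sha256": hashlib.sha256(b"<system prompt text>").hexdigest(),
    "agent": {"name": "example-agent", "version": "1.2.3"},
    "tools": {"git": "2.43.0", "python": "3.12.1"},
}

# Canonical JSON (sorted keys, fixed separators) so the same envelope
# always serializes, and therefore hashes, identically.
canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
envelope_id = hashlib.sha256(canonical.encode()).hexdigest()[:12]
# envelope_id could then live in a commit trailer, e.g. "Envelope-Id: <id>"
```

Even with all of that pinned you only get the bounded reproducibility described above, but at least a mismatch between two runs becomes detectable.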
Poe’s Law applies here. “This is a pretty good parody of vibe coding” I thought, but then I didn’t see example commits like “update heart pacer firmware and push to users”.
I think I'd be more interested in a new-worktree job-type CLI. At the end of the day, I don't want to be reverting commits that were a clear mistake, no matter how good AI will be in the near future.
Git was designed for humans.
Commits, branches, and the entire model works really well for human-to-human collaboration, but it starts to be too much for agent-to-human interactions.
A better alternative is to share the entire session: the entire flow of ideas, changes, and decisions. For that, the export needs to be human-readable, offering a rich experience that lets other humans understand; that is far better than git annotations.
I don't think people read prompt sessions, not even their own. Instinctively I think the sessions are valuable because they capture how we arrived at the thing. In practice it's just "slop" to everyone that wasn't there when it was happening. Including future you.
i'm mostly just musing. I get the instinct, but i suspect all sessions have extremely rapid decay cycles.
> It’s unclear what the “squash” process is for “make me a foo” + “no not like that”.
Commit your specs, not your prompts. When a change is correct, any information of value contained in the prompt should be reflected in the spec.
Write your code so that the intent is crystal clear. If it is not, you have failed.
The primary purpose of code is NOT for a computer to execute it, but for humans to communicate intent.
> some extra attributes about which model and agent was used.
> You can re-run any commit against a fresh checkout to see what Claude generates from the same instruction.
I don't see how this is true. LLMs can generate different outputs even with the same model and inputs.
edit: and also, not bothering to represent manual edits to the code...
https://zknill.io/posts/commit-message-intent/
What are these people doing? There's a reason we don't attach random metadata to commits.
Curious how you’re thinking about that.
I thought that was what commit titles and descriptions are for?
What changed would be the diff.
Edit: perhaps the prompt describes the "how".
> A better alternative is to share the entire session, the entire flows of ideas, changes and decisions.
That's why we built https://github.com/wunderlabs-dev/claudebin.com. A free and open-source Claude Code session sharing tool, which allows other humans to better understand decisions.
Those sessions can be shared in PR https://github.com/vtemian/blog.vtemian.com/pull/21, embedded https://blog.vtemian.com/post/vibe-infer/ or just shared with other humans.
But every time, AI workflows are so unique... that’s why "we" built