Show HN: I built a tool to assist AI agents to know when a PR is good to go

(dsifry.github.io)

30 points | by dsifry 12 hours ago

5 comments

rootnod3 5 hours ago
Sorry, so the tool is now even circumventing human review? Is that the goal?
So the agent can now merge shit by itself?
Just let the damn thing push to prod by itself at this point.
- blutoot 26 minutes ago
  At scale, I don't see a net negative in AI merging "shit by itself" if the developer (or the agent) ensures sufficient e2e, integration and unit test coverage before every merge, and in return my team gets to crank out features at 10x speed.
  The reality is that probably 99.9999% of code bases on this earth (though that might drop soon, who knows) pre-date LLMs, and organizing them so that coding agents can produce consistent results from sprint to sprint will take a lot of plumbing work from every dev team. That will include refactoring, documentation improvements, building consensus on architectures and, of course, reshaping the testing landscape. So SWEs will have a lot of dirty work to do before we reach the aforementioned "scale".
  However, a lot of platforms are being built from the ground up today, in a post-CC (Claude Code) era, and they should be ready to hit that scale today.
  - dsifry 14 minutes ago
    Yup! Software engineers aren't going to be out of work anytime soon, but I'm acting more like a CTO or VPE with a team of agents now, rather than just a single dev with a smart intern.
- dsifry 9 minutes ago
  No, it just prepares the PR - it doesn't automatically merge. That would be very dangerous, imho!
- ljm 4 hours ago
  Someone’s gonna think about wiring all this up to Linear or Jira, and there’ll be a whole new set of vulnerabilities created from malicious bug reports.
  - dsifry 13 minutes ago
    That's why I intentionally don't have this hooked into an ingest flow - you still control which issues/stories you want the agent swarm to work on... It's just that now I can know that the code that was written has been reviewed and all comments have been fully addressed!
- literalAardvark 4 hours ago
  In some workflows it's helpful for the full loop to be automated so that the agent can test if what's done works.
  And you can do a more exhaustive test later, after the agents are done running amok to merge various things.
  - dsifry 8 minutes ago
    Exactly right!
- baxtr 4 hours ago
  I’m not saying this is, but if I were a malicious state actor, that’s exactly the kind of thing I’d like to see in widespread use.
- danenania 4 hours ago
  I don’t think “ready to merge” necessarily means the agent actually merges. Just that it’s gone as far as it can automatically. It’s up to you whether to review at that point or merge, depending on the project and the stakes.
  If there are CI failures or obvious issues that another AI can identify, why not have the agent keep going until those are resolved? This tool just makes that process more token-efficient. Seems pretty useful to me.
  - dsifry 10 minutes ago
    That's EXACTLY right. Ready to merge is an important gate, but it is very stupid to just merge everything without further checks/testing by a human!
- tayo42 1 hour ago
  No, the linked page explains how this fits into a development workflow, e.g.:
  > A reviewer wrote “consider using X”… is that blocking or just a thought?
  > AMBIGUOUS - Needs human judgment (suggestions, questions)
  - dsifry 11 minutes ago
    Right! It doesn't assume that all comments are actionable or need to be worked on. However, if you allow anyone to comment on your PRs, that could be a malicious vector. So don't let just anyone review PRs on projects that you care about!!!
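The quoted triage (is a comment blocking, or AMBIGUOUS and in need of human judgment?) and dsifry's warning about untrusted reviewers can be made concrete with a short sketch. This is not gtg's actual logic: the verdict names other than AMBIGUOUS, the keyword lists, and the trusted-reviewer allowlist are all hypothetical, chosen only to illustrate the idea.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACTIONABLE = "actionable"          # hypothetical: must be addressed before merge
    NON_ACTIONABLE = "non_actionable"  # hypothetical: resolved, praise, FYI
    AMBIGUOUS = "ambiguous"            # needs human judgment (suggestions, questions)


@dataclass
class ReviewComment:
    author: str
    body: str
    resolved: bool = False


# Hypothetical word lists; a real tool would use richer signals than keywords.
BLOCKING_WORDS = {"must", "bug", "broken", "fails", "crash"}
SOFT_WORDS = {"consider", "maybe", "perhaps", "nit"}


def classify(comment: ReviewComment, trusted: set[str]) -> Verdict:
    """Triage one review comment; unknown authors always go to a human."""
    if comment.resolved:
        return Verdict.NON_ACTIONABLE
    if comment.author not in trusted:
        # A comment from anyone outside the allowlist could be a
        # prompt-injection attempt, so the agent must not act on it alone.
        return Verdict.AMBIGUOUS
    text = comment.body.lower()
    # Match whole words so that e.g. "unit" does not trigger "nit".
    words = set(re.findall(r"[a-z]+", text))
    if words & BLOCKING_WORDS:
        return Verdict.ACTIONABLE
    if "?" in text or words & SOFT_WORDS:
        return Verdict.AMBIGUOUS
    return Verdict.NON_ACTIONABLE
```

For example, with `trusted = {"alice"}`, alice's "Consider using X" comes back AMBIGUOUS, while "You must delete the tests" from an unknown account also lands in AMBIGUOUS rather than ACTIONABLE, so the agent never acts on a stranger's comment without a human in the loop.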
philipp-gayret 1 hour ago
Very interesting! The documentation has a gem: using the tool itself as a CI check. I hadn't considered treating unresolved comments, say from a person or from CodeRabbit or a similar tool, as a CI status failure. That's an excellent idea for AI-driven PRs.
On a personal note: I hate LLM output being used to advertise a project. If you have something to share, have the decency to type it out yourself, or at least edit the nonsense out of it.
- dsifry 6 minutes ago
  Lol, I thought it did a reasonably good job, but to each their own - this was the difference between releasing the project so others could use it with decent documentation, or not releasing and just using it internally. :)
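A rough sketch of the "tool as a CI check" idea: collapse CI results and open review comments into one deterministic state, and fail the job unless that state is READY. The `PRSnapshot` fields and the state names are hypothetical, not gtg's real output, and a real check would fetch this data from the code host's API rather than take it as an argument.

```python
from dataclasses import dataclass


@dataclass
class PRSnapshot:
    ci_checks: dict[str, str]  # check name -> "success" | "failure" | "pending"
    actionable_open: int       # unresolved comments that must be addressed
    ambiguous_open: int        # unresolved comments needing human judgment


def pr_state(pr: PRSnapshot) -> str:
    """Collapse a PR snapshot into one deterministic state string."""
    statuses = pr.ci_checks.values()
    if any(s == "failure" for s in statuses):
        return "CI_FAILING"
    if any(s == "pending" for s in statuses):
        return "CI_PENDING"
    if pr.actionable_open > 0:
        return "ACTION_REQUIRED"
    if pr.ambiguous_open > 0:
        return "NEEDS_HUMAN"
    return "READY"


def main(pr: PRSnapshot) -> int:
    """Print the state and return a process exit code for CI."""
    state = pr_state(pr)
    print(f"PR state: {state}")
    # CI treats any non-zero exit code as a failed step, so only READY passes.
    return 0 if state == "READY" else 1
```

In a CI job you would call `sys.exit(main(snapshot))`. Checking CI before comments is a design choice: there is little point triaging review comments on a build that does not pass.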
joshribakoff 2 hours ago
I dislike the idea of coupling my workflow to SaaS platforms like GitHub or CodeRabbit. The fact that you still have to create local tools is a selling point for just doing it all “locally”.
joshuanapoli 4 hours ago
This looks nice! I like the idea of providing more deterministic feedback and more or less forcing the assistant to follow a particular development process. Do you have evidence that gtg improves the overall workflow? I think there is a trade-off between the risk of getting stuck (iterating without ever reaching gtg-green) and reaching perfect 100% completion.
- dsifry 4 minutes ago
  I found that it has improved overall code quality significantly, at the cost of somewhat slower velocity. But it has meant fewer interruptions where the AI is just waiting for me, or saying "Everything is ready!" only for me to find that CI/CD failed or there were clearly existing comments/issues.
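The trade-off raised above (getting stuck vs. reaching 100%) is often handled with a round budget: keep iterating while the check is red, but hand off to a human after N rounds. A minimal sketch; `check`, `fix_round` and the default of 5 rounds are hypothetical placeholders, not part of gtg.

```python
from typing import Callable


def iterate_until_green(
    check: Callable[[], bool],
    fix_round: Callable[[int], None],
    max_rounds: int = 5,
) -> str:
    """Run fix rounds until check() passes or the round budget is spent."""
    for attempt in range(max_rounds):
        if check():
            return "READY"
        fix_round(attempt)  # e.g. have the agent address the next failure
    # One final check after the last fix round.
    if check():
        return "READY"
    # Budget exhausted: stop burning tokens and ask a human instead.
    return "ESCALATE_TO_HUMAN"
```

A small budget favors velocity and hands more PRs to a human; a large one favors completeness at the cost of more iterations on PRs that may never go green.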
mcolley 11 hours ago
Super interesting, any particular reason you didn't try to solve these prior to pushing with hooks and subagents?
- dsifry 11 hours ago
  I did! The issue, however, is having a clear, deterministic way of defining when the code review is 'done'. The hooks can fire off subagents, but those are non-deterministic and often miss vital code review comments - especially ones that are made in an inline comment, or are marked as 'Out of PR Scope' or 'Out of range of the file' - which are often the MOST important comments to address!
  So gtg builds all of that in and deterministically checks whether there are any actionable comments, so you can block the agent from moving forward until every actionable comment has been thoroughly reviewed, acted upon or acknowledged, at which point the PR changes state and can be merged.
  - blutoot 5 hours ago
    I thought hooks always fire if you register them as a PreToolUse event. Wouldn't that work for the GitHub action tools from the GitHub MCP?
    - dsifry 1 minute ago
      Just to be clear - the hook is deterministic, but the subagent running with an MCP server loaded is not - and for medium/large PRs it can run out of context window, or just forget what it is trying to do, get lazy and say 'Everything is good, ready to merge!' when in fact tests are failing or there are still unaddressed PR comments.
    - dsifry 18 minutes ago
      Sure, but that MCP still misses actionable comments that are marked as Out of Scope or Outside the PR - and gtg doesn't cost you the context-window overhead of instantiating another MCP server, either. Anyway, try gtg side by side against the MCP - you should be able to see the difference.
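To illustrate the distinction drawn above (a deterministic hook vs. a non-deterministic subagent), here is a sketch of a PreToolUse hook that refuses `gh pr merge` unless a deterministic readiness check passes. It assumes Claude Code's hook contract as documented at the time of writing (event JSON on stdin with `tool_name` and `tool_input`; exit code 2 blocks the call and feeds stderr back to the agent). `pr_is_ready` is a hypothetical stub, not a gtg API.

```python
import json
import sys


def is_merge_command(command: str) -> bool:
    """True if a shell command starts with `gh pr merge`."""
    return command.split()[:3] == ["gh", "pr", "merge"]


def pr_is_ready() -> bool:
    # Hypothetical stub: replace with a deterministic check, such as
    # running gtg and reading its exit code. Blocks everything for now.
    return False


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    if is_merge_command(command) and not pr_is_ready():
        # Assumed contract: exit code 2 blocks the tool call and the
        # stderr message is shown to the agent.
        return 2, "Blocked: PR is not ready. Address actionable comments first."
    return 0, ""


def run_hook() -> None:
    """Entry point: read the event from stdin and exit with the decision."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

A script calling `run_hook()` would be registered as a PreToolUse hook matching the Bash tool. Because the decision lives in plain code rather than in a subagent's context window, it does not depend on the agent remembering to check on a large PR.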