11 comments

  • tjungblut 1 day ago
    If you are curious, like me, about how the actual reinforcement learning happens: it uses verl [1] underneath. The paper "HybridFlow: A Flexible and Efficient RLHF Framework" [2] explains it really well.

    [1] https://github.com/volcengine/verl

    [2] https://arxiv.org/abs/2409.19256v2

  • anorwell 1 day ago
    Some of the comments so far seem to be misunderstanding this submission. As I understand it:

    1. Custom scaffolding (system prompt and tools) using Qwen3-32B achieved 13.75% on Terminal-Bench. No training was involved.

    2. The author has built an RL system, but it has not been used for anything due to cost limitations.

    So there's actually no result related to training here. It is well known that the scaffolding used can have a large impact on benchmark outcomes (the Terminal-Bench leaderboard also demonstrates this [1]).

    [1] https://www.tbench.ai/leaderboard

    • esafak 1 day ago
      It looks like the submission has two aspects that are being conflated.

      1. Tooling for training a terminal agent.

      2. An agent that was _not_ trained with this tooling but prompt-engineered. I could not find the author's discussion of this point.

  • OtherShrezzing 1 day ago
    That you've spent in the low thousands (by the looks of it) and managed to beat GPT-4.1 is an amazing insight into the moat of the big AI labs.
  • rboyd 1 day ago
    Great work! There should be a way for entities to crowdfund model training. Can a model like this be partially evaluated during training time and save through early stopping?

    What are the best papers/resources on SOTA long-horizon RL?

    Thanks.

  • TarasBob 1 day ago
    I'm willing to help fund this if the creator is interested. I sent him an email.
  • enigma101 1 day ago
    Did you consider a Kickstarter to overcome the GPU poorness? 30 to 50 should be doable.
  • bravesoul2 1 day ago
    Wow, amazing! Amazing that a "one person band" can do this much. It crosses many skill sets.
  • thomasfromcdnjs 1 day ago
    How much did you spend?
  • lostmsu 11 hours ago
    Why do you need 50k? Can't you tune using LoRA?
    • Danau5tin 11 hours ago
      Exactly my first thought when I realised the cost! Currently LoRA is not supported by rLLM (the team told me they aim to support it in the next release), but it is certainly possible to port to verl directly or to another RL framework. I just did not have the time to port again (I have already done so 2x, as other RL frameworks had issues).
  • erdaltoprak 1 day ago
    This is incredible work