2 comments

  • sermakarevich 12 days ago
    I think they do scale:

    -- check this bug report from AMD where they say they run fleet of 50 agents 24x7 https://github.com/anthropics/claude-code/issues/42796

    -- here I am running 3 coding agents 24x5 (not yet 7 so far) https://news.ycombinator.com/item?id=48520757

    I was using multi-coding agents for poc projects first, now for production research and poc second and soon plan to get to production.

    • dovelome 9 days ago
      I like what you are doing! I suggest you create a proxy service that will route the traffic to the AI provider.

      You can cache questions and answers heavily and use a B25 search with a vector embedded to retrieve the best results for you.

         RAG pipeline
         ├── BM25 + TF-IDF + RRF retrieval
         ├── cross-encoder reranking
         ├── knowledge-graph entity linking
         └── multi-angle intent detection
              │
              ▼
         LLM synthesis  (Claude / local models)
  • dovelome 12 days ago
    [flagged]