The SWE-Bench Illusion

(microsoft.com)

4 points | by louiereederson 1 hour ago

2 comments

  • N_Lens 1 hour ago
    It seems inevitable that metrics become targets and then cease to be valuable.
  • Ethan312 1 hour ago
    This is a good reminder that benchmark results don’t always translate to real engineering work. Solving a scoped task inside a controlled setup is very different from working in a live codebase with missing context and messy history. Benchmarks are still useful, but they should be treated as one signal, not the full picture.