Project Glasswing: An Initial Update

(anthropic.com)

50 points | by louiereederson 45 minutes ago

4 comments

0xAstro 0 minutes ago
I had a fun day today: I had deepseek-v4-flash subagents work out a patch for dirty frag on systems with AF_ALG disabled and nscd turned on, to gain root access. The originally published exploit wasn't working, but the patched one worked like a charm.
I still believe that 100 subagents with good-enough intelligence can get the same results as Mythos. I am ready for this opinion to be shattered when I eventually try Mythos, and I believe others here must have tried it out too.
InsideOutSanta 7 minutes ago
I wonder if it will coincidentally become safe to release once the compute capacity bought from SpaceX provides enough headroom to let a lot more people run it.
OsrsNeedsf2P 38 minutes ago
The vulnerabilities found continue to impress, and make legacy media, Twitter and YouTube go nuts. But we still have no data to prove this wasn't doable with the same initiative backed by Opus 4.7, and there is no GA for Mythos access.
- krisbolton 10 minutes ago
  There is independent research out there on frontier model capability. AI Security Institute (UK) put out their paper comparing Mythos to other frontier models in early April. They've been tracking frontier model security capability since early 2023, so it's a decent dataset. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...
- energy123 24 minutes ago
  . Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6;
  - dawnerd 3 minutes ago
    And of those 271, how many were very minor, "not really an issue" kinds of fixes? And how many of the fixes will end up opening up new vulnerabilities?
  - properbrew 3 minutes ago
    > over ten times more than they found in Firefox 148 with Claude Opus 4.6
    And how many with Opus 4.7? 5x?
  - applfanboysbgon 11 minutes ago
    Did they allocate the same number of tokens to looking with Claude 4.6? Or did they find more because they looked more, owing to a special initiative by Anthropic?
- parker-3461 27 minutes ago
  Makes me wonder if Anthropic is really having issues with allocating compute (see recent deals with xAI and SpaceX). From available benchmarks, it seems like similar results should be possible with GPT 5.5 Pro or Opus 4.7 (with specific cybersecurity-trained models).
  - smoe 15 minutes ago
    At least according to this, GPT-5.5 Cyber is on par with Mythos: they are the only two models that were able to finish the 32-step corporate network attack simulation.
    https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
  - wiwiwq 18 minutes ago
    Who knows, but from a valuation standpoint it’s better to signal that demand is higher than existing capacity.
- pertymcpert 28 minutes ago
  > Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6
  4.6 but close.
  - OsrsNeedsf2P 22 minutes ago
    Right, but were they using the same methodology and harness? I suspect they're doing something different with the harness, e.g. with Mythos they pass each file in one at a time, whereas on 4.6 they let Claude Code run loose to find bugs. That would have a larger impact than the model itself.
- bobbycastorama 29 minutes ago
  I've seen a blog post by a security researcher saying that he was able to find the same vulnerabilities (for Firefox IIRC) with a ~30B-parameter LLM...
  So yeah, huge marketing as always.
  - krisbolton 4 minutes ago
    This is different though, right? He found one vulnerability (? we don't know who you're referring to; post sources for a higher-quality discussion), he already knew it was there, etc. Anthropic didn't claim no other model can find vulnerabilities, nor that it's impossible with smaller models. They're claiming Mythos is a step-change in ability for end-to-end vulnerability discovery and exploit creation. And that other frontier models are close behind.
  - Brystephor 15 minutes ago
    Did the security researcher point the LLM at the blob of information and say "Find vulnerabilities" or was the LLM told to "determine if vulnerability X is present in this blob"? Confirmation of suspected vulnerabilities is a different problem from finding vulnerabilities.
  - wiwiwq 21 minutes ago
    To me it’s clear what’s going on.
    The American firms are now focused on marketing to convince people not to even consider open-source / open-weight models because they are inferior (that’s what they want you to believe).
    - rhubarbtree 19 minutes ago
      IPO is coming is what is going on
      - wiwiwq 17 minutes ago
        That’s implicit in my post.
        If people actually believe the narrative then the bankers will overprice Anthropic and get away with it.
- boston_clone 28 minutes ago
  You would likely be quite interested in the more quantitative writeup from a real research team! It’s linked about midway into the article. Similar functionality can be reached, yes, but not always, and never with fewer tokens than Mythos requires.
  https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...
  - OsrsNeedsf2P 20 minutes ago
    OK, this is actually a pretty good article, and it justifies the step-function marketing in security they talked about.
- enlightenedfool 14 minutes ago
  Is this the God model that no one else can build? Unbelievable.
amusingimpala75 28 minutes ago
Is this suspected vulns or actual vulns? If I recall correctly, it produced 5 for curl but only 1 was legit.
- Smaug123 23 minutes ago
  > So far, Mythos Preview has found what it estimates are 6,202 high- or critical-severity vulnerabilities in these projects (out of 23,019 in total, including those it estimates as medium- or low-severity).
  > 1,752 of those high- or critical-rated vulnerabilities have now been carefully assessed by one of six independent security research firms, or in a small number of cases by ourselves. Of these, 90.6% (1,587) have proved to be valid true positives, and 62.4% (1,094) were confirmed as either high- or critical-severity. That means that even if Mythos Preview finds no further vulnerabilities, at our current post-triage true-positive rates, it’s on track to have surfaced nearly 3,900 high- or critical-severity vulnerabilities in open-source code
- extr 7 minutes ago
  Did you RTFA?
- rbranson 16 minutes ago
  I don't know why you're getting downvoted. This is exactly what was reported by curl's creator under the section "Five findings became one": https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-v...
  - Smaug123 11 minutes ago
    I think it's more that the requested information is prominently featured in the article, and indeed is the content of the only graphic in the article below the intro banner.
  - wiwiwq 15 minutes ago
    [flagged]
- RamRodification 25 minutes ago
  This is marketing. So probably suspected. Or somewhere in between.