Show HN: LLM plays Pokémon (open sourced)

(github.com)

186 points | by adenta 156 days ago

13 comments

dang 156 days ago
Related ongoing thread:
Claude Plays Pokémon - https://news.ycombinator.com/item?id=43173825
montebicyclelo 156 days ago
Super cool to see this idea working. I had a go at getting an LLM to play Pokémon in 2023, with openai vision. With only 100 expensive api calls a day, I shelved the project after putting together a quick POC and finding that the model struggled to see things or work out where the player was. I guess models are now better, but also looks like people are providing the model with information in addition to the game screen.
https://x.com/sidradcliffe/status/1722355983643525427?t=dYMk...
[-]
- adenta 156 days ago
  The vision models still struggle in my experience. I got around that by reading the RAM and describing all the objects positions on screen
- beoberha 156 days ago
  Same! Except I tried using LLaVa 1.5 locally and it didn’t work at all lol
kanzure 156 days ago
You can also directly pull in the emulation state and map back to game source code, and then make a script for tool use (not shown here): https://github.com/pret/pokemon-reverse-engineering-tools/bl... Well I see on your page that you already saw the pret advice about memory extraction, hopefully the link is useful anyway.
[-]
- adenta 156 days ago
  Yeah, took a similar approach at https://github.com/adenta/fire_red_agent/blob/main/app/servi...
minimaxir 156 days ago
See also: the AI Plays Pokemon project that went megaviral a year or so ago, using CNNs and RL instead of LLMs: https://github.com/PWhiddy/PokemonRedExperiments
> I believe that Claude Plays Pokemon isn't doing any of the memory parsing I spent a ton of time, they are just streaming the memory directly to Claude 3.7 and it is figuring it out
It is implied they are using structured Pokemon data from the LLM and saving it as a knowledge base. That is the only way they can get live Pokemon party data to display in the UI: https://www.twitch.tv/claudeplayspokemon
The AI Plays Pokemon project above does note some of the memory addresses where that data is contained, since it used that data to calculate the reward for the PPO.
[-]
- swyx 144 days ago
  https://www.latent.space/p/how-claude-plays-pokemon-was-made more info here too
- adenta 156 days ago
  On this page (https://excalidraw.com/#json=WrM9ViixPu2je5cVJZGCe,no_UoONhF...) linked from their twitch, it says: “ This info is all parsed directly from the RAM of the game, Claude Code is very good at this task”. I’m reading that as “we are pumping the RAM directly into the LLM”, but I could be mistaken.
  [-]
  - minimaxir 156 days ago
    I agree that's ambigiously worded. For example, I'm not sure if Claude could identify "MT MOON B1F" from the RAM data alone since internally world map areas are only known by IDs, while AI Plays Pokemon did annotate the corresponding area with a human-readable name. https://github.com/PWhiddy/PokemonRedExperiments/blob/master...
    Though this RAM data could be in Claude's training data.
  - None4U 156 days ago
    I suspect that means they wrote the memory parser using Claude (the Twitch description also mentions the LLM getting specific info)
LZ_Khan 156 days ago
I just find it insane that we're bootstrapping reinforcement learning and world planning on top of basic next token prediction.
I'm amazed that it works, but also amazed that this is the approach being prioritized.
[-]
- AustinDev 156 days ago
  Pure RL NN 'solved' simple games like Pokémon years ago. I think added challenge of seeing how well LLMs can generalize is a noble pursuit. I think games are a fun problem as well.
  Look how poorly Claude 3.7 is doing on Pokemon on Twitch right now.
  [-]
  - minimaxir 156 days ago
    > Pure RL NN 'solved' simple games like Pokémon years ago.
    Please link to said project. From my search of Google filtered to 2010-2020, it returns nothing outside of proofs-of-concept (e.g. https://github.com/poke-AI/poke.AI) that do not perform any better, or instead trying to solve Pokemon battles which are an order of magnitude easier.
    [-]
    - thrance 156 days ago
      There is this amazing video [1] of some guy training a pure RL neural network to play Pokémon Red. It's not that old and the problem was certainly never completely solved.
      [1] https://youtu.be/DcYLT37ImBY
    - rockwotj 156 days ago
      Maybe they are conflating the Starcraft success that Deepmind had with AlphaStar?
      [-]
      - lairv 156 days ago
        And AlphaStar or OpenAI Five were playing games knowing internal game states and variables
        Playing games from pixels only is still is a pretty hard problem
        [-]
        genewitch 155 days ago
        Codebullet on YouTube remakes games and then makes the computer beat the game.
        Because pixels are hard.
podoman 156 days ago
Have you considered calling this bot "intern bot"? - Jay
[-]
- adenta 156 days ago
  https://static.wikia.nocookie.net/loveinterest/images/5/5c/3...
deadbabe 156 days ago
I want to note that if you really wanted an AI to play Pokémon you can do it with a far simpler and cheaper AI than an LLM and it would play the game far better, making this mostly an exercise in overcomplicating something trivial. But sometimes when you have a hammer everything will look like a nail.
[-]
- futureshock 156 days ago
  I know what you are saying, but I very much disagree. There are also better chess engines. That’s not the point.
  It’s all about the “G” in AGI. This is a nice demonstration of how LLMs are a generalizable intelligence. It was not designed to play Pokémon, Pokémon was no special part of its training set, Pokémon was not part of its evaluation criteria. And yet, it plays Pokémon, and rather well!
  And to see each iteration of Claude be able to progress further and faster in Pokémon helps demonstrate that each generation of the LLM is getting smarter in general, not just better fitted to standard benchmarks.
  The point is to build the universal hammer that can hammer every nail, just as the human mind is the universal hammer.
  [-]
  - deadbabe 156 days ago
    It is not generalizable intelligence, its wisdom of the crowds. Claude does not form long term strategies or create predictions about future states. A simpler GOAP engine could create far more elaborate plans and still run entirely locally on your device (while adapting constantly to changing world states).
    And yea you could have Claude use a GOAP tool for planning, but all you’re really doing is layering an LLM on top of a conventional AI as a presentation layer to make the lower AI seem far more intelligent than it is. This is why trying to use LLMs for complex decision making about anything that isn’t text and words is a dead end.
    [-]
    - lyu07282 155 days ago
      > It is not generalizable intelligence, its wisdom of the crowds.
      Did you see twitch chat plays pokemon? There was not much wisdom in that crowd :P
      [-]
      - eru 151 days ago
        Well, we know that some ways to organise crowds work better than others.
  - wordpad25 156 days ago
    Pokémon guides were definitely part of every LLM training set. Game is so old, there are thousands of guides and videos on the topic.
    LLMs will readily offer high quality Pokémon gameplay advice without needing to searc online.
    [-]
    - hombre_fatal 155 days ago
      If you're implying that generalization isn't at play because game knowledge shows up in its training data, you can disabuse yourself of that by watching the stream and how it reasons itself out of situations. You can see its chain of thought.
      It spends most of its time stuck and reasoning about what it can do. It might throw back to knowledge like "I know Pokemon games can have a ledge system that you can walk off, so I will try to see if this is a ledge" (and it fails and has to think of something else), but it's not like it knows the moment to moment intricacies of the game. It's clearly generalized problem solving.
    - minimaxir 156 days ago
      The operative phrase of that comment being “no special part.”
      If you watch the Twitch stream it is obvious Claude has general knowledge of what to do to win in Pokémon but cannot recall specifics.
      [-]
      - northern-lights 156 days ago
        For eg., Bug type attack is super effective against Poison type in Gen 1 but not very effective in Gen 2 and onnwards. But Claude keeps bringing Nidoran into Weedle/Caterpie.
- minimaxir 156 days ago
  The AI Plays Pokemon project only made it to Mt. Moon (where coincidentially ClaudePlaysPokemon is stuck now) with many months of iteration and many many hours of compute.
  The reason Claude 3.7's performance is interesting is that the LLM approach defeated Lt. Surge, far past Mt. Moon. (I wonder how Claude solved the infamous puzzle in Surge's gym)
  https://www.anthropic.com/research/visible-extended-thinking
  [-]
  - gyomu 156 days ago
    The fact that these models can only play up to a certain point seems like an interesting indication as to the inherent limitation of their capabilities.
    After all, the game does not introduce any significant new mechanics beyond the first couple areas - any human player who has the reading/reasoning ability to make it to Mt Moon/Lt Surge would be able to complete the rest of the game.
    So why are these models getting stuck at arbitrary points in the game?
    [-]
    - minimaxir 156 days ago
      There's one major mechanic that opens up shortly after Lt. Surge: nonlinearity. Once you get to Lavender Town, there are several options to go to, and I suspect that will be difficult for an AI to handle over a limited context window.
      And if the AI decides to attempt Seafoam Islands, all bets are off.
  - deadbabe 156 days ago
    Not talking about Reinforcement learning type AI, I’m talking about classically programmed AI with standard pathfinders, GOAP, behavior trees, etc…
    [-]
    - Philpax 156 days ago
      But how much effort do you have to put in to build an agent that can play a specific game? Can you retarget that agent easily? How well will your agent deal with circumstances that it wasn't designed for?
      [-]
      - deadbabe 156 days ago
        A lot less effort than training a massive LLM.
        Also, there’s no point in designing for use cases it will never encounter. A Pokémon rpg AI is never going to have to go play GTA.
        [-]
        Philpax 156 days ago
        A LLM can be reused for other use cases. Your agent can't.
        [-]
        deadbabe 156 days ago
        The reusability is overrated.
        For every problem that isn’t natural language processing, there exists a far better solution that runs faster and more optimally than an LLM, at the expense of having to actually program the damn thing (for which you can use an LLM to help you anyway).
        Who can fight harder and better in a Pokémon battle, a programmed AI or an LLM? The programmed AI, because it has tactics and analysis built in. Even better, the AI’s difficulty can be scaled trivially where as an LLM you can tell it to “go easy” but it doesn’t actually know what that means? There’s no point in wasting time with an LLM for such an application.
    - adenta 156 days ago
      Got a link handy?
- drusepth 156 days ago
  I don't think this project is meant to "solve" a task (hammer, nail) insomuch as it's just an interesting "what if" experiment to observe and play around with new technology.
- adenta 156 days ago
  I disagree. Getting a computer to play a game like a human has an incredibly broad range of applications. Imagine a system like this that is on autopilot, but can get suggestions from a twitch chat, nudging its behavior in a specific direction. Two such systems could be run by two teams, and they could do a weekly battle.
  This isn’t an exercise in AI, it’s an exercise in TV production IMO.
- imtringued 156 days ago
  It's a publicity stunt by anthropic (Claude plays Pokémon).
  Obviously they are going to show off their LLM
rererereferred 155 days ago
What's the appeal of Pokemon for these kind of things? I never see AI or Twitch chat playing other turn based games like Final Fantasy or Fire Emblem.
[-]
- yifanl 155 days ago
  Pokemon is extremely hard to completely brick a run short of actively releasing your entire box which is very appealing for an MVP run, and is also literally the biggest media franchise in the world, which is a very appealing for people seeking hype.
- mminer237 155 days ago
  They're all inspired by TwitchPlaysPokemon, who chose Pokémon cuz he personally liked it and because "Even when played very poorly it is difficult not to make progress in Pokémon". It doesn't have game overs or permadeath. Even when you lose battles, you typically get stronger.
- hombre_fatal 155 days ago
  Cultural mass.
tgtweak 156 days ago
You can use claude computer functions to actually play it on an emulator with no programming at all - but that kind of feels like cheating :D
[-]
- adenta 156 days ago
  I tried! It didn’t work super well
evanextreme 156 days ago
Was working on a similar thing last year! Might as well open source at this point too.
[-]
- adenta 156 days ago
  Email me when it launches! (In profile)
mclau156 156 days ago
Honestly Claude 3.7 can make a pokemon game in pygame fairly easily, at that point it would have a lot more control over it
[-]
- genewitch 155 days ago
  This is what codebullet does, YouTube channel. Recreate games so that an agent can play them and try to win, high score, whatever.
linwangg 155 days ago
Really cool experiment! The idea of AI 'playing' games as a form of entertainment is fascinating—kind of like Twitch streams but fully autonomous. Curious, what were the biggest hurdles with input control? Lag, accuracy, or something else?
[-]
- adenta 155 days ago
  I couldn't get https://docs.libretro.com/library/remote_retropad/ to work! its designed for game pads to control the emulator. I was hoping I could repurpose it to just send arbitrary key commands over a network interface, but nothing I tried worked. When I asked in the retroarch discord, they couldn't figure it out either lol.
- phatskat 155 days ago
  I doubt this answers any of your questions but check out saltybet on twitch - people make fighters and the computer pits them against each other a la street fighter
ArlenBales 156 days ago
> To me, this is the future of TV.
The future of television is watching bots play video games? What a sad future.
[-]
- farts_mckensy 156 days ago
  Watchin AI robots fight each other gladiator style world be pretty cool.
  [-]
  - adenta 156 days ago
    Idk what the parent comment said but let’s make AI robots fight each other gladiator style.
    [-]
    - Izkata 156 days ago
      A step there, Orbitron v human operators: https://youtu.be/TiSpihZnq4E?t=681
- adenta 156 days ago
  Yeah I think _a_ future of television might've been more apt. They should've made a season 5 to Snowpiercer.