6 comments

  • lukev 26 minutes ago
    Super interesting data.

    I do question this finding:

    > the small model category as a whole is seeing its share of usage decline.

    It's important to remember that this data is from OpenRouter... an API service. Small models are exactly the ones that can be self-hosted.

    It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.

    • maikakz 9 minutes ago
      Thank you & totally agree! The findings are purely observational through OpenRouter’s lens, so they naturally reflect usage on the platform, not the entire ecosystem.
  • sosodev 2 minutes ago
    The open-weight model data is very interesting. I missed the release of Minimax M2. The benchmarks seem insanely impressive for its size. I would suspect benchmaxing, but why would people be using it if it weren't useful?
  • syspec 30 minutes ago
    According to the report, 52% of all open-source model usage is for *roleplaying*. They attribute it to fewer content filters and higher creativity.

    I'm pretty surprised by that, but I guess that also selects for the kind of people who would use OpenRouter.

    • djfergus 4 minutes ago
      OpenRouter has an apps tab. If you look at the free, non-coding models, some of the apps that feature are janitor.ai, sillytavern, and chub.ai. I'd never heard of them, but people seem to be burning millions of tokens enjoying them.
    • raincole 6 minutes ago
      If you rely on AI to write most of your code (instead of using it like Stack Overflow), a Claude Code or OpenAI Codex subscription is cheaper than buying tokens. So those users are not on OpenRouter.
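
      Back-of-envelope with made-up numbers (real prices vary by model and change often, so treat these purely as placeholders):

      ```python
      # Toy cost comparison: flat subscription vs pay-per-token.
      # All numbers are illustrative placeholders, not current pricing.
      flat_sub = 20.00                 # $/month for a coding subscription
      price_in, price_out = 3.0, 15.0  # $ per million input / output tokens
      m_in, m_out = 10, 2              # millions of tokens a heavy coder might burn monthly

      pay_as_you_go = m_in * price_in + m_out * price_out
      print(f"API: ${pay_as_you_go:.2f}/mo vs subscription: ${flat_sub:.2f}/mo")
      # API: $60.00/mo vs subscription: $20.00/mo -> the flat plan wins for heavy users
      ```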
  • asadm 10 minutes ago
    Who is using grok code and why?
    • djfergus 0 minutes ago
      It's a 1.7 trillion token free model. Why wouldn't you try it?

      I've been testing free models for coding hobby projects after I burnt through way too many expensive tokens on Replit and Claude. Grok wasn't great, kept getting into loops for me. I had better results using KAT coder on opencode (also free).

  • themanmaran 1 hour ago
    > The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.

    I'd be interested in a clarification on the reasoning vs non-reasoning metric.

    Does this mean the reasoning total is (input + reasoning + output) tokens? Or is it just (input + output)?

    Obviously the reasoning tokens would add a ton to the overall count. So it would be interesting to see it in an apples-to-apples comparison with non-reasoning models.
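
    A toy example of why that distinction matters (numbers invented):

    ```python
    # Made-up token counts for the same query answered two ways.
    input_toks, reasoning_toks, output_toks = 1_000, 4_000, 1_000

    with_reasoning = input_toks + reasoning_toks + output_toks  # 6,000 tokens counted
    without_reasoning = input_toks + output_toks                # 2,000 tokens counted
    print(with_reasoning / without_reasoning)  # 3.0x inflation from reasoning alone
    ```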

    • ribosometronome 3 minutes ago
      As would models that are overly verbose. My experience is that Claude tends to do more than is asked for (e.g. immediately moving on to creating tests and documentation), while other models like Gemini tend to be more concise in what they do.
    • reeeli 1 hour ago
      I'm out of time, but "reasoning input tokens" from Fortune 5000 engineers sounds like a lobotomized LSD dream. Would you care to elaborate on how you distinguish between reasoning and non-reasoning, vs. a "question on duty"?
      • themanmaran 46 minutes ago
        "reasoning" models like GPT 5 et al do a pre-generation step where they:

        - Take in the user query (input tokens)

        - Break that into a game plan. Ex: "Based on user query: {query} generate a plan of action." (reasoning tokens)

        - Answer (output tokens)

        Because the reasoning step runs in a loop until it has run through its action plan, it frequently uses way more tokens than the input/output steps.
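
        If you want to see the split yourself, here's a minimal sketch with the OpenAI Python SDK (field names from memory, so double-check the docs; other providers report usage differently):

        ```python
        # Minimal sketch: inspect where reasoning tokens land in the usage object.
        # Assumes the OpenAI Python SDK and a reasoning model.
        from openai import OpenAI

        client = OpenAI()
        resp = client.chat.completions.create(
            model="o3-mini",  # any reasoning model
            messages=[{"role": "user", "content": "Plan a three-step refactor."}],
        )

        u = resp.usage
        reasoning = u.completion_tokens_details.reasoning_tokens
        print("input:    ", u.prompt_tokens)
        print("reasoning:", reasoning)  # billed as output, hidden from the response
        print("visible:  ", u.completion_tokens - reasoning)  # completion_tokens includes reasoning
        ```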

      • typs 55 minutes ago
        I believe they're just classifying all models into "reasoning models" (e.g. o3) vs "non-reasoning models" (e.g. 4o) and comparing total tokens (input tokens + hidden reasoning output tokens + shown output tokens).
        • maikakz 45 minutes ago
          that's exactly right!
  • typs 1 hour ago
    This is really amazing data. Super interesting read.