Ask HN: Hardware for 1k RPS?

I ran an uncensored model on a CPU server. As expected, it's dead slow (a minute or two per query).

What kind of hardware (GPU) do I need to serve 1k RPS?

I couldn't find any APIs for uncensored models, which kind of forced me to run it locally.

5 points | by gsky 3 days ago

2 comments

  • eddythompson80 3 days ago
    Depends on your model size and how many copies of it fit in memory. Multiply the model size by 1k and divide by the memory capacity of one device for a rough ballpark of how many devices you'd need.
  • barnabee 3 days ago
    https://venice.ai claim to offer uncensored models (I’ve not tested that claim)
    • gsky 3 days ago
      Thanks, I'll give it a try.
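
For anyone who wants to plug numbers into eddythompson80's rule of thumb, here is a minimal sketch in Python. It takes the rule literally: one model copy per request per second, weights only, no batching and no KV-cache overhead. Real serving setups would likely land somewhere else. The function name `ballpark_devices` and the example figures (a 14 GB model, an 80 GB GPU) are made up for illustration and do not come from the thread.

```python
import math


def ballpark_devices(model_size_gb: float, target_rps: float, device_memory_gb: float) -> int:
    """Rough device count: (model size * target RPS) / memory per device, rounded up.

    Assumption (taken from the rule of thumb, not measured): each in-flight
    request needs its own full copy of the model in memory.
    """
    if model_size_gb <= 0 or target_rps <= 0 or device_memory_gb <= 0:
        raise ValueError("all inputs must be positive")
    total_memory_gb = model_size_gb * target_rps  # memory for one copy per request
    return math.ceil(total_memory_gb / device_memory_gb)


if __name__ == "__main__":
    # Hypothetical inputs: 14 GB of weights, 1k RPS, 80 GB of memory per GPU.
    print(ballpark_devices(14, 1000, 80))
```

Treat the result as an upper-end ballpark. How far batching or a faster per-query time would bring it down depends on the model and the serving stack, which the thread doesn't specify.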