Ask HN: Do you use LLMs for HTML translation?

I've come across a good but quite hard problem to solve: translating HTML pages into any human language.

I started tinkering with DeepL, which offers this feature, but the results are quite off the rails because you cannot provide any context.

So it looked like a clear case for an LLM to me. After crafting a good context prompt, I tried o4-mini from OpenAI and its results are nearly perfect.
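
Roughly what my setup looks like, for reference (a sketch; the prompt here is a simplified stand-in for my real context prompt):

    # Sketch of the call, with a simplified prompt.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def translate_html(html: str, target_lang: str) -> str:
        resp = client.chat.completions.create(
            model="o4-mini",
            messages=[
                {"role": "system", "content": (
                    "You are a professional translator. Translate the HTML below "
                    f"into {target_lang}. Keep every tag, attribute and entity "
                    "exactly as-is; translate only the human-readable text."
                )},
                {"role": "user", "content": html},
            ],
        )
        return resp.choices[0].message.content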

However, business came around and I need to do this WAY faster: a ~100 KB page takes about a minute to generate, which is way too long for the end user.

I've optimised the content a little (sending only the HTML parts that matter) but it's still quite slow in the end (more than 10s). I'm thinking about converting the HTML to JSON, translating that, and converting it back afterwards, which might be faster.
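
Something like this (a sketch with BeautifulSoup; I haven't settled on the details yet):

    # Sketch: pull out the visible text nodes as a JSON array, translate
    # that, then write the translations back into the same nodes.
    import json
    from bs4 import BeautifulSoup

    SKIP = {"script", "style"}

    def extract(html: str):
        soup = BeautifulSoup(html, "html.parser")
        nodes = [n for n in soup.find_all(string=True)
                 if n.parent.name not in SKIP and n.strip()]
        return soup, nodes, json.dumps([str(n) for n in nodes])

    def reinject(soup, nodes, translated_json: str) -> str:
        for node, text in zip(nodes, json.loads(translated_json)):
            node.replace_with(text)
        return str(soup)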

Does anyone do this as well? Any insight into a better-performing API/LLM for this problem?

2 points | by Mooty 6 hours ago

1 comment

  • web5lab 2 hours ago
    Yes, I’ve tackled this too. LLMs like o4-mini give great quality, but latency is a real issue for larger pages. One trick is converting HTML to structured JSON (just the visible text), translating that, then rebuilding the HTML. You can also parallelize translation and use faster models like Claude Haiku or Gemini Flash. For bulk tasks, traditional models like MarianMT via Hugging Face are much faster. Caching also helps a lot. Curious how your JSON approach works out!
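
    Roughly the parallel version I use, as a sketch (call_model is a hypothetical stand-in for whatever API call you make, and the caching is naive):

        # Sketch: translate chunks concurrently, caching repeated strings.
        import asyncio, hashlib

        cache: dict[str, str] = {}  # hash of (lang, text) -> translation

        async def translate_chunk(text: str, lang: str) -> str:
            key = hashlib.sha256(f"{lang}:{text}".encode()).hexdigest()
            if key not in cache:
                cache[key] = await call_model(text, lang)  # your API call here
            return cache[key]

        async def translate_all(chunks: list[str], lang: str) -> list[str]:
            return await asyncio.gather(
                *(translate_chunk(c, lang) for c in chunks))
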
    • Mooty 45 minutes ago
      Did you use any library to go from HTML to JSON? Did you have any inconsistencies with stuff like "<div>text<span> is <span> really</span><b> difficult to convert to json</b><span></div>"?

      I think using a third-party API might also help, since they won't have the same latency problem.