Ask HN: Best on device LLM tooling for PDFs?

I've got very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried out PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but gave me an example of what the data _could_ look like, which was the actual data.

The other major problem is that some PDFs are actually made up of images, and it got super confused by those as well.
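
To make the mixed case concrete, here is a rough sketch of the routing I have in mind: use the embedded text layer where a page has one, and only fall back to OCR for pages that look like scans. Everything named here is my own assumption rather than an existing tool: `needs_ocr`, `extract_pages`, the `ocr_page` callback and the 20-character threshold are all hypothetical. In practice the text layers might come from something like pypdf's `extract_text()` and the callback might wrap an OCR engine, but the sketch only shows the routing logic.

```python
from typing import Callable, List, Sequence


def needs_ocr(text_layer: str, min_chars: int = 20) -> bool:
    """Guess whether a page is a scanned image.

    Assumption: a page whose embedded text layer has fewer than `min_chars`
    non-whitespace characters is treated as image-only. The threshold is
    arbitrary and would need tuning on real documents.
    """
    visible = "".join(text_layer.split())  # drop all whitespace
    return len(visible) < min_chars


def extract_pages(
    text_layers: Sequence[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,
) -> List[str]:
    """Return one string per page, calling OCR only where it seems needed.

    `text_layers[i]` is whatever a PDF library extracted for page i (0-based).
    `ocr_page(i)` is a hypothetical callback that rasterises page i and OCRs it.
    """
    pages = []
    for index, layer in enumerate(text_layers):
        if needs_ocr(layer, min_chars):
            pages.append(ocr_page(index))
        else:
            pages.append(layer)
    return pages


if __name__ == "__main__":
    # Stand-in data: page 0 has a real text layer, page 1 is a scan.
    layers = ["Invoice 1234, total due 56.00 EUR", "   \n"]
    fake_ocr = lambda i: f"<OCR text for page {i}>"
    print(extract_pages(layers, fake_ocr))
```

Passing the OCR step in as a callback keeps the routing independent of any particular OCR engine, so the same check could sit in front of a local vision model instead.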

Given this is so new, I'm struggling to find any tools which make this easier.

4 points | by martinald 13 days ago

2 comments

  • raymond_goo 13 days ago
    Try something like this

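      # Notebook cell (Colab/Jupyter): lines starting with "!" run as shell commands.
      !pip install pytesseract pdf2image pillow
      # pdf2image needs Poppler to rasterise pages; -y avoids the confirmation prompt.
      !apt install -y poppler-utils
      # Uncomment if the tesseract binary is not already installed:
      #!apt install -y tesseract-ocr
      from pdf2image import convert_from_path
      import pytesseract

      # Render each PDF page to a PIL image.
      pages = convert_from_path('k.pdf', dpi=300)

      # OCR each page and tag the text with its page number.
      chunks = []
      for page_num, img in enumerate(pages, start=1):
          text = pytesseract.image_to_string(img)
          chunks.append(f"\n--- Page {page_num} ---\n{text}")

      all_text = "".join(chunks)
      print(all_text)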
  • constantinum 10 days ago