Ask HN: What LLM tools to locally search and analyse your documents?
I have yet to find an easy-to-use local model that I can feed my desktop archive of files and folders and then run queries against that material. Are there any usable alternatives?
Many years ago I tried DEVONthink, before LLMs were a thing, but it locked you into its own app and wasn't all that powerful.
I OCR'd all my PDFs separately, then used FAISS with all-MiniLM-L12-v2 to generate embeddings and build the vector index. After that, I wrote a separate Python script using Tkinter that acts as my search "GUI." It takes natural language input, creates an embedding, and queries the FAISS index. It can send the results and an extract to a local LLM (I host it using llama.cpp), which re-ranks them and provides a justification for the ranking.
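For anyone curious what the retrieval core looks like: a minimal sketch of the embed-and-search idea. The parent's setup uses sentence-transformers (all-MiniLM-L12-v2, 384-dimensional output) and a FAISS index; to keep the sketch dependency-free, random vectors stand in for real embeddings and a plain numpy inner-product search stands in for FAISS's `IndexFlatIP`, which does the same computation.

```python
import numpy as np

# Stand-in for a document embedding matrix: in the real setup each row
# comes from all-MiniLM-L12-v2 via sentence-transformers, and the search
# below is what FAISS's IndexFlatIP performs. Vectors are L2-normalised
# so the inner product equals cosine similarity.
docs = ["tax return scan", "holiday photos index", "insurance policy"]
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(len(docs), 384))   # MiniLM outputs 384-d vectors
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def search(query_vec, k=2):
    """Return the top-k (score, doc) pairs by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_vecs @ q                      # inner product = cosine here
    top = np.argsort(-scores)[:k]
    return [(float(scores[i]), docs[i]) for i in top]

# With the real model you would embed the query text exactly the same
# way as the documents before searching.
hits = search(rng.normal(size=384))
print(hits)
```

Swapping the numpy search for `faiss.IndexFlatIP(384)` plus `index.add(doc_vecs)` / `index.search(...)` is what makes this scale past a few thousand documents.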
I wouldn't call it plug and play: the actual scripting took me about a weekend, and creating the embeddings took several hours on my Mac, but the results have been great.
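The re-ranking step is mostly prompt plumbing. A hedged sketch, assuming llama.cpp is running as a server (its `llama-server` binary exposes an OpenAI-compatible `/v1/chat/completions` endpoint, by default on port 8080); the prompt wording and the `rerank` helper are illustrative, not the parent's exact code:

```python
import json
import urllib.request

def build_rerank_prompt(query, snippets):
    """Ask the model to re-order extracts by relevance and justify each rank."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, 1))
    return (
        f"Query: {query}\n\nCandidate extracts:\n{numbered}\n\n"
        "Rank the extracts from most to least relevant to the query, "
        "and give a one-line justification for each ranking."
    )

def rerank(query, snippets, url="http://localhost:8080/v1/chat/completions"):
    # Send the FAISS hits to the local llama.cpp server for re-ranking.
    body = json.dumps({
        "messages": [{"role": "user",
                      "content": build_rerank_prompt(query, snippets)}],
        "temperature": 0.0,
    }).encode()
    req = urllib.request.Request(url, body,
                                 {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

prompt = build_rerank_prompt("2021 tax return",
                             ["invoice scan", "W-2 tax form"])
print(prompt)
```

Keeping temperature at 0 makes the rankings reasonably repeatable across runs.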
Good question. Do you need OCR too, or just file recognition? Models like Granite are pretty good for simple OCR in a tiny footprint, but not punchy enough for serious work.
Only that the model can search existing PDFs, docs, etc., and that I can then query it. Seems like the obvious local use case. I can't upload gigabytes of data to ChatGPT, even if I wanted to.