Best architecture for using an LLM in a laptop app [closed]


I'm designing an app that takes a document and then gets an LLM to review it. Ideally it would:

  1. run on a low-end laptop (i.e. no GPU),
  2. work without internet access,
  3. run with minimal cost, and
  4. have a RAG system that informs the review (a minimal sketch of the retrieval step I mean is just below).
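
To make requirement 4 concrete, here is roughly the retrieval step I have in mind, as a minimal sketch. It assumes the `sentence-transformers` package; the model name is just a small CPU-friendly default and the corpus is a placeholder.

```python
# Minimal local-RAG retrieval sketch: embed reference chunks once, then pull
# the nearest ones for each review. Assumes the `sentence-transformers`
# package; model name is a small CPU-friendly default, corpus is a placeholder.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # ~80 MB, runs fine on CPU

reference_chunks = ["...style guide rules...", "...domain glossary..."]  # placeholder
chunk_vecs = model.encode(reference_chunks, normalize_embeddings=True)

def retrieve_context(query: str, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q             # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]  # indices of the k best-matching chunks
    return [reference_chunks[i] for i in top]
```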

I realise I might have to compromise on some of these.

I've thought of three architectures:

A. Keep RAG on the laptop and make remote calls to e.g. ChatGPT for generation. This presumably incurs per-call API costs.
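
For option A, I picture something like the sketch below: retrieval stays local and only generation is paid for. It assumes the official `openai` package (v1 SDK); the model name is just an example, and the context chunks come from the retrieval sketch above.

```python
# Option A sketch: retrieval stays on the laptop, only generation is remote.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def review_document(document: str, context_chunks: list[str]) -> str:
    context = "\n\n".join(context_chunks)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any hosted model; billed per token
        messages=[
            {"role": "system",
             "content": "You review documents using the supplied reference material."},
            {"role": "user",
             "content": f"Reference material:\n{context}\n\nDocument to review:\n{document}"},
        ],
    )
    return response.choices[0].message.content

# usage: review_document(doc_text, retrieve_context(doc_text))
```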

B. Run RAG plus a local model, e.g. Mistral, entirely on the laptop. This might need a more powerful machine than I want.
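
For option B, I'd expect something like `llama.cpp` with a quantized model, so everything runs offline on CPU. This sketch assumes the `llama-cpp-python` bindings; the GGUF path and quantization level are placeholders.

```python
# Option B sketch: fully offline inference on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct-q4_k_m.gguf",  # ~4 GB of quantized weights
    n_ctx=4096,     # context window; larger windows cost more RAM
    n_threads=4,    # CPU-only: tune to the laptop's core count
)

def review_document(document: str, context: str) -> str:
    # Mistral-Instruct prompt format
    prompt = (f"[INST] Reference material:\n{context}\n\n"
              f"Review the following document:\n{document} [/INST]")
    out = llm(prompt, max_tokens=512)
    return out["choices"][0]["text"]
```

A 4-bit 7B model needs very roughly 4-6 GB of RAM plus context, which is exactly my "low-end laptop" worry.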

C. Run the whole thing in the cloud and just make calls to it from the laptop, using ChatGPT, Llama, or similar. Obviously this incurs cloud hosting costs.
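
For option C, the laptop would just be a thin client, something like the sketch below. It assumes a cloud VM exposing an OpenAI-compatible endpoint (servers like vLLM and Ollama can serve Llama/Mistral this way); the URL and model name are placeholders.

```python
# Option C sketch: the laptop is a thin client; RAG and the model live on a
# cloud VM behind an OpenAI-compatible endpoint.
import requests

CLOUD_URL = "https://my-llm-vm.example.com/v1/chat/completions"  # hypothetical

def review_document(document: str) -> str:
    resp = requests.post(
        CLOUD_URL,
        json={
            "model": "llama-3.1-8b-instruct",  # whatever the server hosts
            "messages": [
                {"role": "user", "content": f"Review this document:\n{document}"},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```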

Is there a better/standard way?