Best architecture for using an LLM in a laptop app
I'm designing an app that takes a document and then gets an LLM to review it. Ideally it would:
- run on a low-end laptop (i.e. no GPU),
- work without internet access,
- run with minimal cost, and
- have a RAG (retrieval-augmented generation) system that informs the review.
I realise I might have to compromise on some of these.
I've thought of three architectures:
A. Make remote calls to e.g. ChatGPT, with RAG running locally on the laptop. This presumably incurs per-call API costs (roughly sketched after this list).
B. Run RAG plus e.g. Mistral entirely on the local machine. This might need a more powerful machine than I want (also sketched after this list).
C. Run the whole thing in the cloud and just make calls to it from the laptop. Maybe use ChatGPT, maybe Llama or similar. Obviously this incurs cloud hosting costs.
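For concreteness, here is roughly what I mean by option A: retrieval stays on the laptop and only generation goes over the network. This is just a sketch assuming the official `openai` Python client and a small CPU-friendly `sentence-transformers` embedder; the model name, chunking, and prompt wording are placeholders, not a finished design.

```python
# Sketch of option A: local retrieval, remote generation.
# Model name and prompt wording are illustrative assumptions.
from openai import OpenAI
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedder that runs fine on CPU
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_index(chunks: list[str]) -> np.ndarray:
    # Embed the reference material once; normalised vectors let us
    # rank by a plain dot product (cosine similarity).
    return embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, chunks: list[str], index: np.ndarray, k: int = 3) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]  # indices of the k most similar chunks
    return [chunks[i] for i in top]

def review(document: str, chunks: list[str], index: np.ndarray) -> str:
    context = "\n\n".join(retrieve(document, chunks, index))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheap hosted model; per-token cost still applies
        messages=[
            {"role": "system", "content": "You review documents using the supplied reference material."},
            {"role": "user", "content": f"Reference material:\n{context}\n\nReview this document:\n{document}"},
        ],
    )
    return resp.choices[0].message.content
```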
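Option B would keep the same retrieval step but swap the remote call for a local model, e.g. via `llama-cpp-python` with a quantized GGUF build of Mistral 7B. Again only a sketch; the file name and generation settings are illustrative:

```python
# Sketch of option B: everything runs locally. Assumes a 4-bit GGUF of
# Mistral 7B Instruct has already been downloaded; the path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # ~4 GB on disk
    n_ctx=4096,   # context window; larger values cost more RAM
    n_threads=4,  # match the laptop's physical cores
)

def review_locally(document: str, context: str) -> str:
    # Reuses the same retrieve() step as option A; only generation changes.
    resp = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You review documents using the supplied reference material."},
            {"role": "user", "content": f"Reference material:\n{context}\n\nReview this document:\n{document}"},
        ],
        max_tokens=512,
    )
    return resp["choices"][0]["message"]["content"]
```

My worry is that even a 4-bit 7B model typically wants around 4-6 GB of RAM and manages only a few tokens per second on a laptop CPU, which is what makes me doubt option B on genuinely low-end hardware.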
Is there a better/standard way?