How to Run DeepSeek Locally Using Ollama

In a time when data privacy, performance, and cost control are critical, running large language models (LLMs) locally is becoming increasingly practical. Among the open-source offerings, DeepSeek-R1 models stand out due to their strong performance in coding, logical reasoning, and problem-solving tasks.
This guide explains how to install and run DeepSeek-R1 models locally using Ollama, and optionally expose them securely online using Pinggy. It's aimed at developers and IT professionals who want a self-hosted, offline-capable, and customizable LLM stack.
Why Consider Running DeepSeek-R1 Models Locally?
Running models like DeepSeek-R1 on your local machine offers several practical advantages:
- Data stays local – no external server or API receives your prompts.
- Zero cloud usage limits – you control the compute resources.
- Offline-ready – ideal for air-gapped or restricted networks.
- Choose models by system specs – from lightweight to high-performance variants.
Step 1: Install Ollama
Ollama provides a simple command-line interface to run open-source LLMs locally.
Installation:
- Head to ollama.com (Linux users can also use the one-line install script shown after this list).
- Choose your operating system (Linux, macOS, or Windows).
- Follow the installation prompts.
- After setup, open your terminal and check the installation:
ollama --version
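On Linux, you can also install Ollama with the official one-line script; this is the standard command published on ollama.com (review the script before piping it to your shell):
curl -fsSL https://ollama.com/install.sh | sh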
Step 2: Pull a DeepSeek-R1 Model
DeepSeek models are available in various sizes to suit different hardware capacities.
Choose Based on Your System:
- Basic system (≤ 8GB RAM):
ollama pull deepseek-r1:1.5b
- Mid-tier system (≥ 16GB RAM):
ollama pull deepseek-r1:7b
- High-performance systems (≥ 32GB RAM):
ollama pull deepseek-r1:8b
ollama pull deepseek-r1:14b
Check which models are downloaded:
ollama list
Step 3: Run the Model Locally
Once the model is pulled, running it is straightforward:
ollama run deepseek-r1:1.5b
This opens an interactive terminal session. You can ask coding questions, pose logical reasoning problems, or try other NLP tasks.
Example prompt:
You: What’s the output of the following Python code?
print([i**2 for i in range(5)])
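The model should answer with [0, 1, 4, 9, 16]. To leave the interactive session, type /bye at the prompt.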
Step 4 (Optional): Use DeepSeek via API
Ollama exposes an HTTP API, so you can integrate DeepSeek-R1 into apps or scripts.
Start the API Server:
ollama serve
Send an API request:
curl http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:1.5b",
"messages": [{"role":"user", "content":"Hello"}]
}'
You can build apps on top of this using JavaScript, Python, or other frameworks.
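By default, /api/chat streams the reply as a series of JSON chunks. For simple scripts it is often easier to request a single JSON object by disabling streaming, for example:
curl http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:1.5b",
"messages": [{"role":"user", "content":"Explain recursion in one sentence."}],
"stream": false
}'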
Step 5 (Optional): Use a GUI via Open WebUI
For those who prefer a ChatGPT-style web interface:
Run Open WebUI via Docker:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Access the GUI:
Open your browser at http://localhost:3000, set up an admin account, and select a DeepSeek model to start chatting.
Step 6 (Optional): Share Your Ollama API Online with Pinggy
If you want to test or access your local model remotely, you can forward Ollama's API port online using Pinggy.
Start Ollama Server:
ollama serve
Create a Public Tunnel:
ssh -p 443 -R0:localhost:11434 -t qr@a.pinggy.io "u:Host:localhost:11434"
Explanation:
- -p 443: Uses an HTTPS-compatible port to avoid firewall blocks.
- -R0:localhost:11434: Forwards Ollama's local API port.
- qr@a.pinggy.io: Pinggy SSH endpoint.
- "u:Host:localhost:11434": Header forwarding to allow remote access.
Once executed, Pinggy will return a public HTTPS URL like https://yourid.pinggy.link.
Verify the API Online:
curl https://yourid.pinggy.link/api/tags
You can now test your model remotely or share this URL with collaborators.
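The chat requests shown earlier also work against the public URL, so collaborators can call the model directly (replace yourid with the subdomain Pinggy prints for you):
curl https://yourid.pinggy.link/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:1.5b",
"messages": [{"role":"user", "content":"Hello"}],
"stream": false
}'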
Performance Optimization Tips
- Use quantized versions to reduce memory usage:
ollama pull deepseek-r1:1.5b-q4_K_M
- Limit context size to reduce latency. Inside an interactive ollama run session:
/set parameter num_ctx 1024
- Control randomness and creativity with temperature and top_p, also from inside the session:
/set parameter temperature 0.7
/set parameter top_p 0.9
The same parameters can be set per request through the API, as shown in the example below.
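These parameters can also be passed per request through the API's options field rather than set in the interactive session, for example:
curl http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:1.5b",
"messages": [{"role":"user", "content":"Hello"}],
"options": {"num_ctx": 1024, "temperature": 0.7, "top_p": 0.9},
"stream": false
}'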
Troubleshooting
- Model not loading? Try a smaller size or close background applications.
- Slow output? Use a quantized model or reduce num_ctx.
- API not reachable? Confirm ollama serve is running and the Pinggy tunnel is active (a quick local check is shown below).
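For the last case, a quick local check before debugging the tunnel:
curl http://localhost:11434/api/tags
If this returns a JSON list of your installed models, the local server is fine and the issue is on the tunnel side.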
About the DeepSeek-R1 Family
- Released under the MIT license
- Models available:
  - Qwen-based: 1.5B, 7B, 14B, 32B
  - LLaMA-based: 8B, 70B
- Suitable for reasoning, software development, and general-purpose NLP
Conclusion
Running DeepSeek locally using Ollama is a powerful option for developers looking for secure, cost-efficient AI solutions. Whether you’re prototyping applications, working in restricted environments, or simply want better control over AI workflows, this local deployment method gives you freedom without sacrificing performance.