How to Run DeepSeek Locally Using Ollama

In a time when data privacy, performance, and cost control are critical, running large language models (LLMs) locally is becoming increasingly practical. Among the open-source offerings, DeepSeek-R1 models stand out due to their strong performance in coding, logical reasoning, and problem-solving tasks.

This guide explains how to install and run DeepSeek-R1 models locally using Ollama, and optionally expose them securely online using Pinggy. It's aimed at developers and IT professionals who want a self-hosted, offline-capable, and customizable LLM stack.

Why Consider Running DeepSeek-R1 Models Locally?

Running models like DeepSeek-R1 on your local machine offers several practical advantages:

  • Data stays local – no external server or API receives your prompts.
  • Zero cloud usage limits – you control the compute resources.
  • Offline-ready – ideal for air-gapped or restricted networks.
  • Choose models by system specs – from lightweight to high-performance variants.

Step 1: Install Ollama

Ollama provides a simple command-line interface to run open-source LLMs locally.

Installation:

  1. Head to the Ollama website at ollama.com and download the installer.
  2. Choose your operating system (Linux, macOS, or Windows).
  3. Follow the installation prompts.
  4. After setup, open your terminal and check the installation:
   ollama --version

Step 2: Pull a DeepSeek-R1 Model

DeepSeek models are available in various sizes to suit different hardware capacities.

Choose Based on Your System:

  • Basic system (≤ 8GB RAM):
  ollama pull deepseek-r1:1.5b
  • Mid-tier system (≥ 16GB RAM):
  ollama pull deepseek-r1:7b
  • High-performance systems (≥ 32GB RAM):
  ollama pull deepseek-r1:8b
  ollama pull deepseek-r1:14b
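
If you prefer to automate the choice, a small Python sketch along these lines maps installed RAM to one of the tags above (psutil is a third-party package you would need to install; the thresholds simply mirror the list):

# pick_model.py - suggest a DeepSeek-R1 tag based on installed RAM
import psutil  # third-party: pip install psutil

def suggest_tag() -> str:
    """Return a DeepSeek-R1 tag matching the RAM tiers described above."""
    ram_gb = psutil.virtual_memory().total / (1024 ** 3)
    if ram_gb >= 32:
        return "deepseek-r1:14b"
    if ram_gb >= 16:
        return "deepseek-r1:7b"
    return "deepseek-r1:1.5b"

print("Suggested model:", suggest_tag())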

Check which models are downloaded:

ollama list

Step 3: Run the Model Locally

Once the model is pulled, running it is straightforward:

ollama run deepseek-r1:1.5b

This opens an interactive terminal session. You can ask coding questions, pose logical reasoning problems, or try other NLP tasks.

Example prompt:

You: What’s the output of the following Python code?
print([i**2 for i in range(5)])
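
(For reference, the correct answer is [0, 1, 4, 9, 16]; R1-style models typically show their reasoning before giving it.)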

Step 4 (Optional): Use DeepSeek via API

Ollama exposes an API interface so you can integrate DeepSeek-R1 into apps or scripts.

Start the API Server:

ollama serve

Send an API request:

curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "messages": [{"role":"user", "content":"Hello"}]
  }'

You can build apps on top of this using JavaScript, Python, or other frameworks.
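
If Python is your tool of choice, here is a minimal sketch of the same request using the requests library (it assumes Ollama is serving on the default port 11434 and that deepseek-r1:1.5b has already been pulled):

# chat_local.py - minimal example of calling the local Ollama chat API
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

def ask(prompt: str, model: str = "deepseek-r1:1.5b") -> str:
    """Send a single chat message and return the model's reply."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a token stream
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["message"]["content"]

if __name__ == "__main__":
    print(ask("What's the output of print([i**2 for i in range(5)])?"))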

Step 5 (Optional): Use a GUI via Open WebUI

For those who prefer a ChatGPT-style web interface:

Run Open WebUI via Docker:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
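
The --add-host flag maps host.docker.internal to the Docker host, so the container can reach the Ollama server running on your machine at port 11434, and the named open-webui volume preserves accounts and chat history across container restarts.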

Access the GUI:

Open your browser at http://localhost:3000, set up an admin account, and select a DeepSeek model to start chatting.

Step 6 (Optional): Share Your Ollama API Online with Pinggy

If you want to test or access your local model remotely, you can forward Ollama's API port online using Pinggy.

Start Ollama Server:

ollama serve

Create a Public Tunnel:

ssh -p 443 -R0:localhost:11434 -t qr@a.pinggy.io "u:Host:localhost:11434"

Explanation:

  • -p 443: Connects over port 443, which firewalls rarely block.
  • -R0:localhost:11434: Forwards Ollama's local API port through the tunnel.
  • qr@a.pinggy.io: The Pinggy SSH endpoint.
  • "u:Host:localhost:11434": Rewrites the Host header so Ollama accepts requests arriving through the tunnel.

Once executed, Pinggy will return a public HTTPS URL like https://yourid.pinggy.link.

Verify the API Online:

curl https://yourid.pinggy.link/api/tags

You can now test your model remotely or share this URL with collaborators.
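
The same calls work from any machine once they point at the public address instead of localhost. A small Python sketch; https://yourid.pinggy.link is a placeholder for the URL Pinggy printed for your tunnel:

# chat_remote.py - query the tunnelled Ollama API from another machine
import requests

# Placeholder: replace with the HTTPS URL printed by Pinggy for your tunnel.
BASE_URL = "https://yourid.pinggy.link"

# List the models available behind the tunnel (same as the curl check above).
tags = requests.get(f"{BASE_URL}/api/tags", timeout=30)
tags.raise_for_status()
print([m["name"] for m in tags.json()["models"]])

# Send a chat request through the tunnel.
reply = requests.post(
    f"{BASE_URL}/api/chat",
    json={
        "model": "deepseek-r1:1.5b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=300,
)
reply.raise_for_status()
print(reply.json()["message"]["content"])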

Performance Optimization Tips

  • Use a quantized variant to reduce memory usage (check the model's tag list on the Ollama library for the exact quantization tags available):
  ollama pull deepseek-r1:1.5b-q4_K_M
  • Limit the context size to reduce latency. Inside an interactive ollama run session:
  /set parameter num_ctx 1024
  • Control randomness and creativity with sampling parameters (these can also be passed per request through the API, as sketched after this list):
  /set parameter temperature 0.7
  /set parameter top_p 0.9

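When you drive the model through the API instead of the terminal, the same knobs go in the request's options field. A minimal Python sketch (the values are illustrative, not tuned recommendations):

# tuned_request.py - pass generation options per request via the Ollama API
import requests

payload = {
    "model": "deepseek-r1:1.5b",
    "messages": [{"role": "user", "content": "Summarise what a binary heap is."}],
    "stream": False,
    "options": {
        "num_ctx": 1024,      # smaller context window -> lower latency and memory use
        "temperature": 0.7,   # lower values give more deterministic output
        "top_p": 0.9,         # nucleus sampling cutoff
    },
}
response = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
response.raise_for_status()
print(response.json()["message"]["content"])
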
Troubleshooting

  • Model not loading? Try a smaller size or close background applications.
  • Slow output? Use a quantized model or reduce num_ctx.
  • API not reachable? Confirm ollama serve is running and Pinggy tunnel is active.
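
A quick way to check the last point is to hit the local API directly. A minimal sketch, assuming the default port:

# health_check.py - confirm that the local Ollama API is reachable
import requests

try:
    r = requests.get("http://localhost:11434/api/tags", timeout=5)
    r.raise_for_status()
    models = [m["name"] for m in r.json()["models"]]
    print("Ollama is reachable. Installed models:", models or "none")
except requests.exceptions.ConnectionError:
    print("Could not connect - is 'ollama serve' running?")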

About the DeepSeek-R1 Family

  • Released under the MIT license
  • Models available:

    • Qwen-based: 1.5B, 7B, 14B, 32B
    • LLaMA-based: 8B, 70B
  • Suitable for reasoning, software development, and general-purpose NLP

Conclusion

Running DeepSeek locally using Ollama is a powerful option for developers looking for secure, cost-efficient AI solutions. Whether you’re prototyping applications, working in restricted environments, or simply want better control over AI workflows, this local deployment method gives you freedom without sacrificing performance.
