How to Run AI Models Locally with Ollama: Deploy LLMs and Debug APIs in Minutes

This article explains how to install Ollama and deploy large language models (such as DeepSeek-R1, Llama 3.2, and others) on your own machine. Ollama is an open-source tool for serving large language models, and with it you can run a wide range of open-source AI models locally on your computer. We'll provide step-by-step instructions for installation and setup to enable seamless interaction with AI models.

Table of Contents

  1. Step 1: Download and Install Ollama

  2. Step 2: Install AI Models

  3. Step 3: Interact with Llama 3.2

  4. Step 4: Optional GUI/Web Interface Support

  5. Step 5: Debug the Local AI API

Step 1: Download and Install Ollama

  1. Visit Ollama's official GitHub repository: https://github.com/ollama/ollama

  2. Download the version corresponding to your operating system (this tutorial uses macOS as an example; Windows follows similar steps).

  3. Complete the installation.
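A note for Linux users: instead of downloading an installer in step 2, you can use Ollama's documented install script, which is a one-liner:

curl -fsSL https://ollama.com/install.sh | sh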

After installation, open the Terminal (on macOS, open Spotlight with Command + Space, or press F4, and search for "Terminal"). Enter ollama; if the command prints its usage help, the installation was successful.
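You can also confirm the install by printing the version number:

ollama --version
# Example output (your version will differ): ollama version is 0.5.12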

Step 2: Install AI Models

After installing Ollama, download the desired AI model using these commands:

ollama run llama3.2

Available models (replace llama3.2 with your preferred model):

Model                Parameters  Size   Command
DeepSeek-R1          7B          4.7GB  ollama run deepseek-r1
DeepSeek-R1          671B        404GB  ollama run deepseek-r1:671b
Llama 3.3            70B         43GB   ollama run llama3.3
Llama 3.2            3B          2.0GB  ollama run llama3.2
Llama 3.2            1B          1.3GB  ollama run llama3.2:1b
Llama 3.2 Vision     11B         7.9GB  ollama run llama3.2-vision
Llama 3.2 Vision     90B         55GB   ollama run llama3.2-vision:90b
Llama 3.1            8B          4.7GB  ollama run llama3.1
Llama 3.1            405B        231GB  ollama run llama3.1:405b
Phi 4                14B         9.1GB  ollama run phi4
Phi 4 Mini           3.8B        2.5GB  ollama run phi4-mini
Gemma 2              2B          1.6GB  ollama run gemma2:2b
Gemma 2              9B          5.5GB  ollama run gemma2
Gemma 2              27B         16GB   ollama run gemma2:27b
Mistral              7B          4.1GB  ollama run mistral
Moondream 2          1.4B        829MB  ollama run moondream
Neural Chat          7B          4.1GB  ollama run neural-chat
Starling             7B          4.1GB  ollama run starling-lm
Code Llama           7B          3.8GB  ollama run codellama
Llama 2 Uncensored   7B          3.8GB  ollama run llama2-uncensored
LLaVA                7B          4.5GB  ollama run llava
Granite-3.2          8B          4.9GB  ollama run granite3.2

A progress indicator will appear during the download (the duration depends on your internet speed).

When prompted with "Send a message", you're ready to interact with the model.
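Before moving on, it's worth knowing a few companion commands for managing models on disk. A minimal sketch using standard Ollama CLI subcommands:

ollama pull llama3.2   # download (or update) a model without opening a chat
ollama list            # show the models currently stored locally
ollama rm llama3.2     # delete a model to reclaim disk space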

Step 3: Interact with Llama 3.2

Example interaction: ask the model "Who are you?" and watch it respond.

  • Use Control + D to end the current session.

  • To restart later, simply rerun ollama run llama3.2.
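The interactive prompt isn't the only option: ollama run also accepts a prompt as a command-line argument for one-shot answers, for example:

ollama run llama3.2 "Summarize why the sky is blue in one sentence."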

Step 4: Optional GUI/Web Interface Support

Using a terminal for daily interactions can be inconvenient. For a more user-friendly experience, Ollama’s GitHub repository lists multiple community-driven GUI and web-based tools (e.g., Ollama WebUI, LM Studio). You can explore these options independently, as each project provides its own setup instructions. Here’s a brief overview:

  • GUI Tools

    • Ollama Desktop: Native app for macOS/Windows (supports model management and chat).
    • LM Studio: Cross-platform interface with model library integration.
  • Web Interfaces

    • Ollama WebUI: Browser-based chat interface (run locally).
    • OpenWebUI: Customizable web dashboard for model interaction.

For details, visit the Ollama GitHub README.

Step 5: Debug the Local AI API

Ollama exposes a local API by default. Refer to the Ollama API Docs for details.
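Before wiring up a client, you can confirm the server is listening (it binds to port 11434 by default) by listing your local models through the standard /api/tags endpoint:

curl http://localhost:11434/api/tags
# Returns a JSON object with a "models" array describing each local model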

Below, we will use Apidog to debug the local API served by Ollama. If you haven't installed Apidog yet, you can download and install it; it's an excellent tool for API debugging, API documentation, API mocking, and automated API testing.

Create a New Request

Copy this cURL command:

curl --location --request POST 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": false
}'
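Running the same cURL in a terminal is a useful baseline before involving any GUI. With "stream": false, the server replies with a single JSON object along these lines (abridged; exact fields vary by Ollama version):

{
  "model": "llama3.2",
  "created_at": "2025-03-07T11:34:00Z",
  "response": "The sky appears blue because of Rayleigh scattering...",
  "done": true
}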

In Apidog:

  • Create a new HTTP project.

  • Paste the cURL into the request builder.

  • Save the configuration.

Send the Request

Navigate to the "Run" tab and click "Send". The AI response will appear.

For streaming output, set "stream": true.
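With streaming enabled, the endpoint returns a sequence of newline-delimited JSON objects, each carrying a fragment of the reply in its "response" field, followed by a final object with "done": true. The request is identical apart from the flag:

curl --location --request POST 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": true
}'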

Conclusion

This guide covered:

  1. Ollama installation

  2. Model deployment

  3. Command-line interaction

  4. API testing with Apidog

You now have a complete workflow for local AI model experimentation and application development.

References

  1. Ollama GitHub Repository: https://github.com/ollama/ollama

  2. Apidog Documentation