Mistral’s ‘Small’ 24B Parameter Model Blows Minds—No Data Sent to China, Just Pure AI Power!

I've inspected Mistral's latest release: Mistral-Small-24B-Instruct. It is bigger and slower than deepseek-ai/deepseek-r1-distill-qwen-7b, but it also shows how it is thinking, and it doesn't send your sensitive data to Chinese soil :)
So let's start.
This project provides an interactive chat interface for the mistralai/Mistral-Small-24B-Instruct-2501 model using PyTorch and the Hugging Face Transformers library.
Requirements
- Python 3.8+
- PyTorch
- Transformers
- An Apple Silicon device (optional, for MPS support; see the quick check below)
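If you plan to rely on the MPS backend, you can confirm that your PyTorch build actually sees the Apple Silicon GPU before going further (a quick sanity check of my own, not part of the project):
import torch

# Prints True on Apple Silicon with a recent PyTorch build, False otherwise
print(torch.backends.mps.is_available())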
Setup
Clone the repository:
git clone https://github.com/alexander-uspenskiy/mistral.git
cd mistral
Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install the required packages:
pip install torch transformers
Set your Hugging Face Hub token:
export HUGGINGFACE_HUB_TOKEN=your_token_here
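If you prefer not to export an environment variable every session, the huggingface_hub client (installed alongside transformers) can store the token for you. This is an optional alternative, not something the script requires:
from huggingface_hub import login

# Prompts for your Hugging Face token and caches it for future Hub downloads
login()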
Usage
Run the chat interface:
python mistral.py
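Keep in mind that a 24B-parameter model in float16 needs roughly 48 GB just for the weights, so in practice this only fits comfortably on Apple Silicon machines with 64 GB or more of unified memory (my own back-of-the-envelope estimate, not a figure from the project):
params = 24_000_000_000  # ~24B parameters
bytes_per_param = 2      # float16
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~48 GB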
Features
- Interactive chat interface with the Mistral-Small-24B-Instruct-2501 model.
- Progress indicator while generating responses.
- Supports Apple Silicon GPU (MPS) for faster inference.
Code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import time
import threading

# Check if MPS (Apple Silicon GPU) is available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Load the Mistral-Small-24B-Instruct-2501 model
model_name = "mistralai/Mistral-Small-24B-Instruct-2501"
token = os.getenv("HUGGINGFACE_HUB_TOKEN")
tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": device},
    torch_dtype=torch.float16,  # Half precision, optimized for Apple Silicon GPUs
    token=token
)

def show_progress():
    # Spin a simple ASCII indicator until the main loop signals completion
    while not stop_event.is_set():
        for char in "|/-\\":
            print(f"\rGenerating response {char}", end="", flush=True)
            time.sleep(0.1)

# Interactive terminal loop
print("mistralai/Mistral-Small-24B-Instruct-2501 Chat Interface (type 'exit' to quit)")

while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    inputs = tokenizer(user_input, return_tensors="pt").to(device)

    # Show the progress indicator in a background thread while generating
    stop_event = threading.Event()
    progress_thread = threading.Thread(target=show_progress)
    progress_thread.start()

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=200,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id  # Avoids the missing pad token warning
        )

    stop_event.set()
    progress_thread.join()
    print("\r", end="")  # Clear the progress line

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Mistral: {response}\n")
Example:
You: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs
Mistral: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs?
Let's denote the cost of the ball as \( B \) and the cost of the bat as \( B + 1.00 \).
According to the problem, the total cost of the bat and the ball is $1.10. Therefore, we can write the equation:
\[ B + (B + 1.00) = 1.10 \]
Simplifying the equation:
\[ 2B + 1.00 = 1.10 \]
Subtract 1.00 from both sides:
\[ 2B = 0.10 \]
Divide both sides by 2:
\[ B = 0.05 \]
So, the ball costs $0.05.
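For completeness, a quick check of the model's arithmetic (my own verification, not part of the transcript):
ball = 0.05
bat = ball + 1.00  # The bat costs $1.00 more than the ball
assert abs((ball + bat) - 1.10) < 1e-9  # Total is $1.10, as required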
Summary:
So, as you can see, modern models can run locally and solve logical tasks with excellent performance.
Happy coding!