Run a Local LLM: Ollama + Home Assistant

0h 16m video Published Jul 30, 2025 Transcribed Jul 28, 2026 StratoBuilds

StratoBuilds

Intermediate 8 min read For: Home automation enthusiasts and developers interested in running local LLMs for smart home integration.

AI Trust Score 95/100

✅ Highly Legit

"Title accurately reflects content: the video delivers a complete guide to running a local LLM with Ollama and Home Assistant."

AI Summary

The video demonstrates how to run a local LLM using Ollama on a Mac Mini M4 Pro, integrated with Home Assistant for voice control and a WLED animation to visualize LLM activity. The setup prioritizes speed and reliability through cached responses and multiple models.

Chapters

1 Introduction and Hardware Overview 0:00 2 Live Demo and Cached Responses Strategy 1:38 3 Step-by-Step Setup: Ollama on Mac 5:01 4 Model Testing and Performance 8:50 5 Home Assistant Integration 11:25 6 Open Web UI and Conclusion 13:47

[0:00]

Local LLM Setup Overview

The creator runs Ollama on a Mac Mini M4 Pro, integrated with Home Assistant and a WLED animation that visualizes LLM workload.

[0:55]

Why Apple Silicon for LLMs

Apple Silicon's unified memory architecture allows GPU and CPU to share memory, giving models more headroom compared to Windows machines with limited VRAM.

[2:30]

Cached Responses Strategy

To improve speed, the creator uses cached responses: a larger model generates summaries (e.g., weather) periodically, which are cached and served instantly by a lightweight model for real-time voice interactions.

[3:55]

WLED Animation for LLM Activity

A WLED animation (SOAP) speeds up or slows down based on the Mac Mini's power draw, visually indicating LLM activity. The Mac Mini idles at <5W and peaks at 65W.

[5:01]

Setting Up Ollama on Mac

Steps: set a static IP, download Ollama from ollama.com, drag to Applications, run it, and expose it to the network via settings.

[6:45]

Testing Models with Terminal

Use command 'ollama run <model> verbose' to download and test models. Example: 'ollama run qwen2.5:4b verbose'.

[8:50]

Performance Metrics

For Home Assistant voice, 50 tokens per second is the minimum; larger models like DeepSeek 32B run at ~10 tokens/s, usable for background tasks but not real-time.

[10:24]

Model Selection for Home Assistant

Models must support tool use. The creator recommends Qwen 3 (4B or 8B) for Home Assistant; Mistral 3 24B for heavier tasks. Avoid Llama models due to errors.

[11:25]

Integrating Ollama with Home Assistant

Add Ollama integration in Home Assistant, enter the Mac Mini's IP address and port (11434), then add conversation agents for different models.

[13:47]

Open Web UI for Local Chat

Open Web UI provides a ChatGPT-like interface to interact with Ollama models, allowing side-by-side comparison of different models.

Running a local LLM with Ollama on a Mac Mini is feasible and efficient, especially with Apple Silicon. Using cached responses and multiple models optimizes speed and accuracy for Home Assistant voice control.

Mentioned in this Video

Ollama

tool

Home Assistant

tool

Open Web UI

tool

WLED

tool

StrottleBuilds

person

Brilliant

link

Full write-up

link

Tutorial Checklist

1 5:07 Set a static IP address on your Mac.

2 5:24 Download Ollama from ollama.com and drag to Applications.

3 5:47 Run Ollama and in settings, expose it to the network.

4 6:01 Verify other devices can reach Ollama by navigating to http://<Mac-IP>:11434 in a browser.

5 6:39 Open Terminal and run 'ollama run <model> verbose' to download and test a model (e.g., 'ollama run qwen2.5:4b verbose').

6 11:25 In Home Assistant, add the Ollama integration by entering the Mac's IP address and port (11434).

7 11:51 Add conversation agents for different models in Home Assistant.

Study Flashcards (8)

What is the main advantage of Apple Silicon for running local LLMs?

easy Click to reveal answer

Unified memory architecture allows GPU and CPU to share a common memory pool, giving models more headroom compared to limited VRAM on Windows machines.