---
title: 'Home Assistant Voice & Ollama Setup Guide - The Ultimate Local LLM Solution!'
source: 'https://youtube.com/watch?v=6nsiQXCgnYA'
video_id: '6nsiQXCgnYA'
date: 2026-06-20
duration_sec: 0
---

# Home Assistant Voice & Ollama Setup Guide - The Ultimate Local LLM Solution!

> Source: [Home Assistant Voice & Ollama Setup Guide - The Ultimate Local LLM Solution!](https://youtube.com/watch?v=6nsiQXCgnYA)

## Summary

This video demonstrates how to connect a locally running Ollama large language model (LLM) to Home Assistant Voice, enabling fully local voice commands and question answering. The setup is straightforward and leverages Docker containers for easy installation.

### Key Points

- **Home Assistant Voice Preview Edition** [0:00] — An ESP32-based smart speaker that handles voice commands and home automation locally using Whisper and Piper.
- **Missing Conversation Agent** [0:24] — The device lacks a proper conversation agent for answering questions, which Ollama can provide.
- **Ollama Overview** [0:36] — Ollama is an open-source way to run large language models locally, and it's easy to set up with Home Assistant Voice.
- **Hardware Requirements** [0:53] — A reasonably powerful PC with a GPU (lots of VRAM) or at least enough system RAM is needed. The setup uses TrueNAS Scale with Docker containers.
- **Installing Ollama on TrueNAS** [1:39] — Installation is as simple as clicking install in the App Store, setting parameters like storage pools, CPU cores, RAM, and optionally passing a GPU.
- **Home Assistant Integration** [2:21] — Add the Ollama integration in Home Assistant by entering the server IP and port, then select a model. The command line allows choosing smaller parameter sizes.
- **Model Selection Example** [3:43] — Example: 'ollama run deepseek-r1:1.5b' or 'ollama pull deepseek-r1:1.5b' to download without running.
- **Voice Assistant Configuration** [4:00] — In Home Assistant, go to Voice Assistant, select local assistant, and change the conversation agent to the Ollama model. Adjust settings like instruction, context window, max message history, and keep alive time.
- **Keep Alive Setting** [4:31] — Recommend changing from -1 (permanent) to a few minutes to avoid permanent RAM usage.
- **LLM Control of Home Assistant** [4:46] — Leave LLM control set to 'No Control' to avoid compatibility issues; commands are handled locally by Home Assistant, while questions go to the LLM.
- **Demo: Question Answering** [5:40] — Asking 'How long is an inch in centimeters?' yields a response: '1 inch is approximately equal to 2.54 cm.' Responses are slower due to speech-to-text, LLM generation, and text-to-speech.
- **Model Recommendation** [6:19] — DeepSeek R1 produces verbose thinking text; the creator prefers Llama 3.2 for cleaner responses.
- **Future Addition: Web Search** [7:01] — Web search capability is desired but not yet implemented.
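
The model recommendation above notes that DeepSeek R1 emits verbose thinking text, and the video suggests that filtering out everything in the think tags might work. A minimal Python sketch of such a filter follows. It assumes the reasoning is wrapped in literal `<think>` and `</think>` tags and that the raw response text is available to post-process. `strip_think_tags` is a hypothetical helper, not part of Home Assistant or Ollama.

```python
import re

# Matches one <think>...</think> block, including any newlines inside it.
# Non-greedy so that several separate blocks are each removed on their own.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think_tags(response: str) -> str:
    """Return the response with every <think>...</think> block removed."""
    without_thinking = _THINK_BLOCK.sub("", response)
    # Drop the blank lines left behind where the block used to be.
    return without_thinking.strip()


if __name__ == "__main__":
    raw = (
        "<think>\nThe user wants inches converted to centimeters.\n</think>\n\n"
        "1 inch is approximately equal to 2.54 cm."
    )
    print(strip_think_tags(raw))
```

This sketch leaves an unclosed `<think>` tag untouched, so a truncated response would still contain its reasoning text.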

### Conclusion

Setting up Ollama with Home Assistant Voice is remarkably easy and provides a fully local, private LLM-powered voice assistant. While responses are slightly slower, the results are great for answering questions and controlling home automation.
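
The model tag format and the keep-alive setting from the key points can also be expressed against Ollama's HTTP API. The sketch below only builds the JSON body for the `/api/generate` endpoint and does not send it, so it runs without a server. The helper names, the prompt, and the `5m` value are illustrative assumptions, and the Home Assistant integration may set these fields differently.

```python
import json
from typing import Optional, Union


def model_tag(name: str, size: Optional[str] = None) -> str:
    """Build a tag like 'deepseek-r1:1.5b'; with no size, only the name is used."""
    return f"{name}:{size}" if size else name


def build_generate_request(
    model: str, prompt: str, keep_alive: Union[str, int] = "5m"
) -> dict:
    """Assemble a request body for Ollama's /api/generate endpoint.

    keep_alive controls how long the model stays loaded after the request:
    -1 keeps it loaded indefinitely, while "5m" unloads it after five minutes.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete response instead of chunks
        "keep_alive": keep_alive,
    }


if __name__ == "__main__":
    body = build_generate_request(
        model_tag("deepseek-r1", "1.5b"),
        "How long is an inch in centimeters?",
    )
    print(json.dumps(body, indent=2))
```

The body would be sent as a POST to `/api/generate` on the Ollama server. The video does not state the port, and it depends on how the app is deployed.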

## Transcript

a couple of weeks ago I showed off this
the Home Assistant Voice Preview Edition
an ESP32-based smart speaker that if set
up to use Whisper and Piper handles
voice commands and home automation fully
locally through home assistant it's
amazing but the one thing that it's
missing is a proper conversation agent
sure asking it to turn your lights on
and set timers is useful but being able
to ask questions and get answers would
be really nice that is where Ollama comes
in Ollama is an open-source way to run
large language models locally and it
turns out it's ridiculously easy to set
up and connect to home assistant voice
so let me walk you through it and then
demo just how useful it is first things
first you'll need Ollama set up and
running and you need a reasonably
powerful PC for this ideally with a
graphics card with lots of vram although
it can run just on the CPU relatively
quickly so long as you have enough
system Ram too in my case I already have
two home servers running already one for
work and one for personal use so set up
on my personal NAS which is running the
latest stable build of TrueNAS Scale and
that latest stable build part is
actually really important only in the
last few months has TrueNAS finally
migrated to using Docker containers as
their Apps Manager that means installing
Ollama is as easy as clicking install in
the App Store setting up any parameters
you might want like letting it use
existing storage pools if you'd prefer
and setting up how many CPU cores to
allow it to use as well as how much RAM
and if you can pass a GPU into it then
that too then it's just hitting save
that's pretty much it ready to go you
can open the container shell and try and
set up the various models and you know
try them out right in the command line
and this does actually give you a bit
better control too although we'll come
back to that in a second but you can
also just leave it alone and head to
home assistant you want to head to the
devices and Integrations page and add
the Ollama integration put in the IP
address of the Ollama server and the
correct port and then it will ask you
what model you want to use here you can
take your pick although this is where
the command line interface might
actually be beneficial if you don't have
much RAM like my server did when I first
set this up I've since doubled to 32 gig
which still isn't that much for a ZFS
server but either way you might find
that you need to run the smaller
versions of the models wherever possible
these models all generally come with
differing parameter counts the more
parameters generally the better the
responses but the harder they are to run
and certainly the more memory it takes
to run them too from home assistant
there doesn't seem to be a way to
install a particular parameter size
version of a listed model it will just
download the default which is often the
largest one if you need to pick a
smaller one you might need to use the
command line interface anyway which you
can then run ollama run model name colon
parameter size so as an example ollama run
deepseek-r1:1.5b or if you just want to download the
model but not run it swap run for pull
once you've pulled the models you want
or just pick the default in home
assistant saving that will create an
entity for the model in home assistant
you can create as many of these entities
as you want and try and swap them out
see how the responses work but once
you've got that to be able to use the
model with voice head to Voice Assistant
click on local assistant and then change
the conversation agent to the Ollama
model you want to use in the settings
for that you can change the instruction
given to the model along with the
context window size Max message history
and the keep alive time personally I
would recommend changing that from minus
one which means permanently keep the
model alive to something reasonable like
a few minutes so it doesn't end up
sprawling all over your system's RAM
permanently there is also a setting to
let the llm control home assistant
although you will find that it's best to
leave that set to No Control partially
because a bunch of models don't support
tools a required feature for the
control to work and partially because of
the setting in the main menu there
prefer handling commands locally this
means that commands like turn on lights
are handled by home assistant or the
home assistant agents while questions
that it doesn't know the answer to are
passed off to the llm that means those
commands are run faster and more
efficiently and it keeps the
compatibility problem at bay too so now
with that set up we've successfully
connected home assistant voice to a
locally run large language model it's
remarkably simple and works pretty well
okay
Nabu how long is an inch in
centimeters 1 inch is approximately equal to 2.54
cm
there you go now responses do take a bit
longer than the built-in conversation
agent because it's now doing speech to
text generating response from the llm
and then text to speech to voice the
answer but considering this is all
running locally I'm pretty happy with
that the one thing that you'll want to
try out and consider is which model to
use while DeepSeek R1 is the new hotness
the r in the name may as well stand for
reasoning because the responses it gives
contain an awful lot of text for a
relatively simple question asking it
what you know an inch in centimeters is
spits out two large paragraphs of
thinking followed by a single sentence
answer if you could maybe filter out
everything in the think tags that might
work but for me I just opted to use Llama
3.2 instead that seems to work pretty
well the only other thing that I'd like
to add to this is the ability to search
the web although that doesn't seem like
the most simple of additions so I'll
have to put that one on hold for the
time being in short then setting up
Ollama and Home Assistant Voice is
remarkably easy and the results while a
little on the slow side are great let me
know in the comments if you set this up
and how you get on with it and which
model you end up choosing as well so
yeah that's how to set it up if you want
to see more videos like this one you can
hit the Subscribe button check out
plenty of other videos in the end cards
including the home assistant voice
preview Edition review that'll be in the
cards on the end cards and in the cards
above and otherwise that's pretty much
it hope you enjoyed the video thank you
for watching if you want to check out my
own Open Source Hardware the open source
response time tool and open source
latency testing tool those are available
at os.com linked in the description
otherwise yeah thanks for watching hope you
enjoyed it we'll see you on the next
video