[0:00] a couple of weeks ago I showed off this
[0:02] the Home Assistant Voice Preview Edition
[0:05] an ESP32-based smart speaker that if set
[0:08] up to use Whisper and Piper handles
[0:11] voice commands and home automation fully
[0:15] locally through Home Assistant it's
[0:17] amazing but the one thing that it's
[0:19] missing is a proper conversation agent
[0:24] sure asking it to turn your lights on
[0:26] and set timers is useful but being able
[0:29] to ask questions and get answers would
[0:32] be really nice that is where Ollama comes
[0:36] in Ollama is an open-source way to run
[0:39] large language models locally and it
[0:42] turns out it's ridiculously easy to set
[0:45] up and connect to Home Assistant Voice
[0:48] so let me walk you through it and then
[0:50] demo just how useful it is first things
[0:53] first you'll need Ollama set up and
[0:55] running and you need a reasonably
[0:58] powerful PC for this ideally with a
[1:00] graphics card with lots of VRAM although
[1:03] it can run just on the CPU relatively
[1:07] quickly so long as you have enough
[1:09] system RAM too in my case I already have
[1:13] two home servers running one for
[1:16] work and one for personal use so I set it up
[1:19] on my personal NAS which is running the
[1:21] latest stable build of TrueNAS SCALE and
[1:25] that latest stable build part is
[1:27] actually really important only in the
[1:30] last few months has TrueNAS finally
[1:33] migrated to using Docker containers as
[1:36] their Apps Manager that means installing
[1:39] Ollama is as easy as clicking install in
[1:42] the App Store setting up any parameters
[1:45] you might want like letting it use
[1:47] existing storage pools if you'd prefer
[1:50] and setting up how many CPU cores to
[1:52] allow it to use as well as how much RAM
[1:55] and if you can pass a GPU into it then
[1:59] that too then it's just hitting save
[2:02] that's pretty much it ready to go you
[2:05] can open the container shell and try and
[2:08] set up the various models and you know
[2:11] try them out right in the command line
[2:13] and this does actually give you a bit
[2:15] better control too although we'll come
[2:17] back to that in a second but you can
[2:19] also just leave it alone and head to
[2:21] Home Assistant you want to head to the
[2:23] devices and integrations page and add
[2:26] the Ollama integration put in the IP
[2:29] address of the Ollama server and the
[2:31] correct port and then it will ask you
[2:34] what model you want to use here you can
[2:37] take your pick although this is where
[2:39] the command line interface might
[2:41] actually be beneficial if you don't have
[2:45] much RAM like my server did when I first
[2:47] set this up I've since doubled to 32 gig
[2:50] which still isn't that much for a ZFS
[2:52] server but either way you might find
[2:55] that you need to run the smaller
[2:57] versions of the models wherever possible
[3:01] these models all generally come with
[3:03] differing parameter counts the more
[3:05] parameters generally the better the
[3:08] responses but the harder they are to run
[3:11] and certainly the more memory it takes
[3:13] to run them too from Home Assistant
[3:17] there doesn't seem to be a way to
[3:19] install a particular parameter size
[3:22] version of a listed model it will just
[3:25] download the default which is often the
[3:28] largest one if you need to pick a
[3:30] smaller one you might need to use the
[3:33] command line interface anyway where you
[3:35] can then run ollama run model name colon
[3:39] parameter size so as an example ollama run
[3:43] deepseek-r1 colon
[3:45] 1.5b or if you just want to download the
[3:48] model but not run it swap run for pull
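A minimal Python sketch of the "model name colon parameter size" convention described above. The helper names `model_tag`, `ollama_command` and `pull_model` are hypothetical and not part of Ollama; the sketch only builds the command, and actually running it assumes the `ollama` binary is on the PATH, as it would be inside the container shell.

```python
import subprocess
from typing import List, Optional


def model_tag(name: str, size: Optional[str] = None) -> str:
    """Join a model name and parameter size into a tag, e.g. 'deepseek-r1:1.5b'."""
    return f"{name}:{size}" if size else name


def ollama_command(action: str, name: str, size: Optional[str] = None) -> List[str]:
    """Build the argument list for 'ollama run' or 'ollama pull'."""
    if action not in ("run", "pull"):
        raise ValueError("action must be 'run' or 'pull'")
    return ["ollama", action, model_tag(name, size)]


def pull_model(name: str, size: Optional[str] = None) -> None:
    """Download a model without starting a chat session (needs the ollama CLI)."""
    subprocess.run(ollama_command("pull", name, size), check=True)


if __name__ == "__main__":
    # Only prints the command; call pull_model(...) to actually download.
    print(" ".join(ollama_command("pull", "deepseek-r1", "1.5b")))
```

Leaving out the size, as in `model_tag("llama3.2")`, gives the bare model name, which is what picking the default amounts to.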
[3:52] once you've pulled the models you want
[3:53] or just pick the default in Home
[3:55] Assistant saving that will create an
[3:57] entity for the model in Home Assistant
[4:00] you can create as many of these entities
[4:02] as you want and try and swap them out
[4:05] see how the responses work but once
[4:08] you've got that to be able to use the
[4:10] model with voice head to Voice Assistant
[4:12] click on local assistant and then change
[4:15] the conversation agent to the Ollama
[4:17] model you want to use in the settings
[4:21] for that you can change the instruction
[4:23] given to the model along with the
[4:25] context window size max message history
[4:28] and the keep alive time personally I
[4:31] would recommend changing that from minus
[4:33] one which means permanently keep the
[4:35] model alive to something reasonable like
[4:38] a few minutes so it doesn't end up
[4:40] sprawling all over your system's RAM
[4:43] permanently there is also a setting to
[4:46] let the LLM control Home Assistant
[4:49] although you will find that it's best to
[4:51] leave that set to no control partially
[4:54] because a bunch of models don't support
[4:56] tools a required feature for the
[4:59] control to work and partially because of
[5:02] the setting in the main menu there
[5:05] prefer handling commands locally this
[5:08] means that commands like turn on lights
[5:11] are handled by Home Assistant or the
[5:13] Home Assistant agents while questions
[5:16] that it doesn't know the answer to are
[5:18] passed off to the LLM that means those
[5:21] commands are run faster and more
[5:23] efficiently and it keeps the
[5:25] compatibility problem at bay too so now
[5:28] with that set up we've successfully
[5:30] connected Home Assistant Voice to a
[5:32] locally run large language model it's
[5:35] remarkably simple and works pretty well
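For context on the server address, port and keep alive settings mentioned above, here is a rough Python sketch that talks to an Ollama server directly over its HTTP API, which can be handy for checking the server responds before blaming Home Assistant. The host address is a placeholder, port 11434 is Ollama's default rather than something stated in the video, and the helper names `base_url`, `build_generate_payload` and `ask` are hypothetical. In Ollama's API, `keep_alive` accepts a duration such as `"5m"`, and `-1` keeps the model loaded indefinitely.

```python
import json
import urllib.request
from typing import Any, Dict, Union


def base_url(host: str, port: int = 11434) -> str:
    """Build the server URL; 11434 is Ollama's default port."""
    return f"http://{host}:{port}"


def build_generate_payload(
    model: str, prompt: str, keep_alive: Union[str, int] = "5m"
) -> Dict[str, Any]:
    """Request body for POST /api/generate with a bounded keep-alive."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,           # return one JSON object instead of a stream
        "keep_alive": keep_alive,  # "5m" unloads after five idle minutes, -1 never
    }


def ask(host: str, model: str, prompt: str, port: int = 11434,
        keep_alive: Union[str, int] = "5m", timeout: float = 120.0) -> str:
    """Send one prompt to the server and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt, keep_alive)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url(host, port)}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Placeholder address; replace with the IP of your own Ollama server.
    print(ask("192.168.1.50", "llama3.2", "How long is an inch in centimeters?"))
```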
[5:39] okay
[5:40] Nabu how long is an inch in
[5:54] centimeters 1 inch is approximately equal to 2.54
[5:58] centimeters
[5:59] there you go now responses do take a bit
[6:02] longer than the built-in conversation
[6:04] agent because it's now doing speech to
[6:06] text generating a response from the LLM
[6:09] and then text to speech to voice the
[6:12] answer but considering this is all
[6:14] running locally I'm pretty happy with
[6:17] that the one thing that you'll want to
[6:19] try out and consider is which model to
[6:22] use while DeepSeek R1 is the new hotness
[6:26] the R in the name may as well stand for
[6:28] reasoning because the responses it gives
[6:31] contain an awful lot of text for a
[6:34] relatively simple question asking it
[6:37] what you know an inch in centimeters is
[6:40] spits out two large paragraphs of
[6:43] thinking followed by a single sentence
[6:46] answer if you could maybe filter out
[6:48] everything in the think tags that might
[6:51] work but for me I just opted to use Llama
[6:54] 3.2 instead that seems to work pretty
[6:57] well the only other thing that I'd like
[6:59] to add to this is the ability to search
[7:01] the web although that doesn't seem like
[7:04] the most simple of additions so I'll
[7:07] have to put that one on hold for the
[7:09] time being in short then setting up a
[7:12] Ollama and Home Assistant Voice is
[7:14] remarkably easy and the results while a
[7:17] little on the slow side are great let me
[7:20] know in the comments if you set this up
[7:22] and how you get on with it and which
[7:25] model you end up choosing as well so
[7:27] yeah that's how to set it up if you want
[7:29] to see more videos like this one you can
[7:30] hit the subscribe button check out
[7:31] plenty of other videos in the end cards
[7:33] including the Home Assistant Voice
[7:35] Preview Edition review that'll be in the
[7:37] cards on the end cards and in the cards
[7:39] above and otherwise that's pretty much
[7:41] it hope you enjoyed the video thank you
[7:43] for watching if you want to check out my
[7:44] own open source hardware the open source
[7:46] response time tool and open source
[7:48] latency testing tool those are available
[7:50] at os.com linked in the description
[7:53] otherwise yeah thanks for watching hope you
[7:54] enjoyed it we'll see you on the next
[7:56] video
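The idea raised earlier of filtering out everything in the think tags could be sketched roughly as follows. This assumes the reasoning is wrapped in literal `<think>` and `</think>` tags, and `strip_think` is a hypothetical helper; the video does not show where in the Home Assistant pipeline such a filter would be hooked in.

```python
import re

# Non-greedy match so several separate think blocks are each removed;
# DOTALL lets the reasoning span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think(text: str) -> str:
    """Remove <think>...</think> sections and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


if __name__ == "__main__":
    raw = (
        "<think>\nThe user wants inches converted to centimeters.\n</think>\n\n"
        "1 inch is approximately equal to 2.54 cm."
    )
    print(strip_think(raw))
```

A response with no think tags passes through unchanged, so the same filter could sit in front of models such as Llama 3.2 without affecting them.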