[0:00] A couple of weeks ago I showed off this: [0:02] the Home Assistant Voice Preview Edition, [0:05] an ESP32-based smart speaker that, if set [0:08] up to use Whisper and Piper, handles [0:11] voice commands and home automation fully [0:15] locally through Home Assistant. It's [0:17] amazing, but the one thing that it's [0:19] missing is a proper conversation agent. [0:24] Sure, asking it to turn your lights on [0:26] and set timers is useful, but being able [0:29] to ask questions and get answers would [0:32] be really nice. That is where Ollama comes [0:36] in. Ollama is an open-source way to run [0:39] large language models locally, and it [0:42] turns out it's ridiculously easy to set [0:45] up and connect to Home Assistant Voice, [0:48] so let me walk you through it and then [0:50] demo just how useful it is. First things [0:53] first: you'll need Ollama set up and [0:55] running, and you need a reasonably [0:58] powerful PC for this, ideally with a [1:00] graphics card with lots of VRAM, although [1:03] it can run just on the CPU relatively [1:07] quickly so long as you have enough [1:09] system RAM too. In my case I [1:13] have two home servers running already, one for [1:16] work and one for personal use, so I set it up [1:19] on my personal NAS, which is running the [1:21] latest stable build of TrueNAS SCALE, and [1:25] that latest stable build part is [1:27] actually really important. Only in the [1:30] last few months has TrueNAS finally [1:33] migrated to using Docker containers for [1:36] its apps manager. That means installing [1:39] Ollama is as easy as clicking install in [1:42] the app store, setting up any parameters [1:45] you might want, like letting it use [1:47] existing storage pools if you'd prefer, [1:50] and setting how many CPU cores to [1:52] allow it to use as well as how much RAM, [1:55] and if you can pass a GPU into it, then [1:59] that too. Then just hit save. [2:02] That's pretty much it, ready to go. You [2:05] can open the container shell and try and
[2:08] set up the various models and, you know, [2:11] try them out right in the command line, [2:13] and this does actually give you a bit [2:15] better control too, although we'll come [2:17] back to that in a second. But you can [2:19] also just leave it alone and head to [2:21] Home Assistant. You want to head to the [2:23] devices and integrations page and add [2:26] the Ollama integration. Put in the IP [2:29] address of the Ollama server and the [2:31] correct port, and then it will ask you [2:34] what model you want to use. Here you can [2:37] take your pick, although this is where [2:39] the command line interface might [2:41] actually be beneficial. If you don't have [2:45] much RAM, like my server when I first [2:47] set this up (I've since doubled it to 32 gig, [2:50] which still isn't that much for a ZFS [2:52] server), you might find [2:55] that you need to run the smaller [2:57] versions of the models wherever possible. [3:01] These models all generally come with [3:03] differing parameter counts: the more [3:05] parameters, generally the better the [3:08] responses, but the harder they are to run, [3:11] and certainly the more memory it takes [3:13] to run them too. From Home Assistant [3:17] there doesn't seem to be a way to [3:19] install a particular parameter size [3:22] version of a listed model; it will just [3:25] download the default, which isn't necessarily the [3:28] smallest one. If you need to pick a [3:30] smaller one, you might need to use the [3:33] command line interface anyway, where you [3:35] can run ollama run, then the model name, a colon, and the [3:39] parameter size, so as an example, ollama run [3:43] deepseek-r1:1.5b, [3:45] or if you just want to download the [3:48] model but not run it, swap run for pull. [3:52] Once you've pulled the models you want, [3:53] or just picked the default in Home [3:55] Assistant, saving that will create an [3:57] entity for the model in Home Assistant. [4:00] You can create as many of these entities [4:02] as you want and try and swap
them out [4:05] to see how the responses work. But once [4:08] you've got that, to be able to use the [4:10] model with voice, head to Voice Assistants, [4:12] click on your local assistant, and then change [4:15] the conversation agent to the Ollama [4:17] model you want to use. In the settings [4:21] for that you can change the instruction [4:23] given to the model along with the [4:25] context window size, max message history, [4:28] and the keep-alive time. Personally I [4:31] would recommend changing that from minus [4:33] one, which means permanently keep the [4:35] model alive, to something reasonable like [4:38] a few minutes so it doesn't end up [4:40] sprawling all over your system's RAM [4:43] permanently. There is also a setting to [4:46] let the LLM control Home Assistant, [4:49] although you will find that it's best to [4:51] leave that set to no control, partially [4:54] because a bunch of models don't support [4:56] tools, a required feature for the [4:59] control to work, and partially because of [5:02] the setting in the main menu there, [5:05] prefer handling commands locally. This [5:08] means that commands like turn on lights [5:11] are handled by Home Assistant, or the [5:13] Home Assistant agent, while questions [5:16] that it doesn't know the answer to are [5:18] passed off to the LLM. That means those [5:21] commands are run faster and more [5:23] efficiently, and it keeps the [5:25] compatibility problem at bay too. So now, [5:28] with that set up, we've successfully [5:30] connected Home Assistant Voice to a [5:32] locally run large language model. It's [5:35] remarkably simple and works pretty well. [5:39] Okay [5:40] Nabu, how long is an inch in [5:54] centimeters? 1 inch is approximately equal to 2.54 [5:58] cm. [5:59] There you go. Now, responses do take a bit [6:02] longer than the built-in conversation [6:04] agent because it's now doing speech to [6:06] text, generating a response from the LLM, [6:09] and then text to speech to voice the [6:12] answer, but considering this is all
[6:14] running locally, I'm pretty happy with [6:17] that. The one thing that you'll want to [6:19] try out and consider is which model to [6:22] use. While DeepSeek R1 is the new hotness, [6:26] the R in the name may as well stand for [6:28] reasoning, because the responses it gives [6:31] contain an awful lot of text for a [6:34] relatively simple question. Asking it [6:37] what, you know, an inch in centimeters is [6:40] spits out two large paragraphs of [6:43] thinking followed by a single-sentence [6:46] answer. If you could maybe filter out [6:48] everything in the think tags, that might [6:51] work, but for me, I just opted to use Llama [6:54] 3.2 instead. That seems to work pretty [6:57] well. The only other thing that I'd like [6:59] to add to this is the ability to search [7:01] the web, although that doesn't seem like [7:04] the simplest of additions, so I'll [7:07] have to put that one on hold for the [7:09] time being. In short, then, setting up [7:12] Ollama and Home Assistant Voice is [7:14] remarkably easy, and the results, while a [7:17] little on the slow side, are great. Let me [7:20] know in the comments if you set this up [7:22] and how you get on with it, and which [7:25] model you end up choosing as well. So [7:27] yeah, that's how to set it up. If you want [7:29] to see more videos like this one, you can [7:30] hit the subscribe button and check out [7:31] plenty of other videos in the end cards, [7:33] including the Home Assistant Voice [7:35] Preview Edition review. That'll be [7:37] on the end cards and in the cards [7:39] above. Otherwise, that's pretty much [7:41] it. Hope you enjoyed the video, thank you [7:43] for watching. If you want to check out my [7:44] own open source hardware, the open source [7:46] response time tool and open source [7:48] latency testing tool, those are available [7:50] at os.com, linked in the description. [7:53] Otherwise, yeah, thanks for watching, hope you [7:54] enjoyed it. We'll see you in the next [7:56] video.
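On the idea floated in the video of filtering out everything in the think tags: here is a minimal sketch of what that could look like, assuming the model's reply arrives as plain text with its reasoning wrapped in <think>...</think>. The function name strip_think is made up for illustration, and this does not show how to hook it into Home Assistant or Ollama, which the video doesn't cover.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think> ahead of the answer.
_CLOSED_THINK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Fallback for a reply that was cut off before the closing tag arrived.
_UNCLOSED_THINK = re.compile(r"<think>.*", re.DOTALL | re.IGNORECASE)


def strip_think(text: str) -> str:
    """Return the reply with any think blocks removed (hypothetical helper)."""
    without_closed = _CLOSED_THINK.sub("", text)
    without_any = _UNCLOSED_THINK.sub("", without_closed)
    # Collapse leftover newlines and runs of spaces so text-to-speech
    # reads one clean sentence.
    return " ".join(without_any.split())


if __name__ == "__main__":
    raw = (
        "<think>\nThe user wants inches converted to centimeters.\n"
        "One inch is 2.54 cm.\n</think>\n\n"
        "1 inch is approximately 2.54 cm."
    )
    print(strip_think(raw))
```

With a filter like this in front of text to speech, only the final sentence would be spoken. It would not make the response any faster, though, since the model still has to generate all of the thinking text first.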