Local Models Just Got a Massive Upgrade
Local AI models have recently seen significant improvements, making them viable for many tasks even on mid-tier hardware. This guide explains how to run local models on your own computer using Ollama and connect them to OpenClaw, potentially saving thousands of dollars per month by eliminating cloud provider costs. The key is selecting the right model based on your hardware and use case.
Over the past two months, very capable local models have been released that are relatively easy to run on mid-tier hardware.
The experience running local models depends on the chosen model and hardware. Making the wrong choice can force you back to expensive cloud models.
Local models are typically open source, free to use, and can be downloaded, modified, and viewed. You cannot run proprietary models like Opus or GPT-5 locally.
Models are demanding on RAM and GPU. On Mac, check unified memory (e.g., 32GB). On Windows/Linux, check GPU VRAM (e.g., 24GB on a 4090).
Install Ollama by copying the install command from ollama.com into your terminal. Ollama lets you pull and run models locally for free.
Recommended model: Gemma 4. Choose a size that fits in your RAM/VRAM. Larger models have better performance but require more memory.
Use 'ollama pull <model-name>' to download the model. For example, 'ollama pull gemma4:2b' for a smaller model.
Run 'ollama run <model-name>' to test the model in the terminal. Type '/exit' to quit.
Run 'openclaw configure', select Ollama as the provider, choose the local model, and restart the gateway with 'openclaw gateway restart'.
Use a local model as default and switch to a cloud model for more challenging tasks. Use '/model' in OpenClaw to switch.
Running local models with Ollama and OpenClaw is straightforward once you understand your hardware constraints. Choose the largest model that fits your RAM/VRAM for best performance, and consider combining local and cloud models for optimal results.
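The sizing rule in the summary above (pick the largest model whose size fits in your RAM or VRAM) can be sketched in Python. This is a minimal illustration, not something shown in the video: the model tags and sizes in the example catalog are hypothetical placeholders, and the 20% headroom left for the operating system and context is an assumed rule of thumb. Real sizes should be read from the model's page on ollama.com.

```python
from typing import Optional

# Hypothetical catalog: tag -> download size in GB. These values are
# placeholders for illustration; check the model page for real sizes.
EXAMPLE_CATALOG = {
    "example:small": 7.0,
    "example:medium": 9.0,
    "example:large": 20.0,
}


def pick_largest_fitting(
    catalog: dict[str, float],
    memory_gb: float,
    headroom: float = 0.2,
) -> Optional[str]:
    """Return the tag of the largest model that fits in memory.

    memory_gb is unified memory on a Mac, or GPU VRAM on Windows/Linux.
    headroom is the assumed fraction of memory kept free for everything else.
    """
    budget = memory_gb * (1.0 - headroom)
    fitting = {tag: size for tag, size in catalog.items() if size <= budget}
    if not fitting:
        return None  # nothing fits; a cloud model is the fallback
    return max(fitting, key=fitting.get)
```

With 32 GB and the placeholder catalog, `pick_largest_fitting(EXAMPLE_CATALOG, 32)` selects `example:large`; the chosen tag is what you would pass to `ollama pull`.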
"Title accurately promises a guide on upgraded local models with Ollama and OpenClaw; video delivers detailed setup instructions."
What tool is used to run local models on your computer?
Ollama
04:13
What hardware specification is most important for running local models on a Mac?
Unified memory (RAM)
02:12
What hardware specification is most important for running local models on Windows?
GPU VRAM
03:04
What command is used to download a model in Ollama?
ollama pull <model-name>
10:18
What command is used to list all downloaded models in Ollama?
ollama list
10:57
What command starts the Ollama service in the background?
ollama serve
13:44
What is the recommended local model in the video?
Gemma 4
07:49
How do you configure OpenClaw to use an Ollama model?
Run 'openclaw configure', select Ollama as provider, choose the model, and restart the gateway.
12:08
What command restarts the OpenClaw gateway after adding a model?
openclaw gateway restart
14:39
Why might you want to combine local and cloud models?
Local models are cheaper and private but less capable; cloud models handle more complex tasks.
16:04
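The local-first strategy from the last answer can be sketched as a small routing rule. This is a hypothetical illustration only: in OpenClaw the switch is done with '/model', and the model identifiers, the `difficulty` scale, the `needs_privacy` flag, and the threshold below are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Placeholder identifiers, not real configuration values.
LOCAL_MODEL = "ollama/local-model"
CLOUD_MODEL = "cloud/large-model"


@dataclass
class Task:
    prompt: str
    difficulty: int          # assumed scale: 1 (trivial) to 5 (very hard)
    needs_privacy: bool = False


def choose_model(task: Task, local_max_difficulty: int = 3) -> str:
    """Prefer the free local model; escalate only hard, non-private tasks."""
    if task.needs_privacy:
        return LOCAL_MODEL   # private data never leaves the machine
    if task.difficulty <= local_max_difficulty:
        return LOCAL_MODEL   # cheap default for everyday work
    return CLOUD_MODEL       # more capable, but billed per use
```

The ordering of the checks is the design choice here: privacy wins over capability, so a hard but private task still stays on the local model.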
Local models upgrade
Highlights the recent improvement in local models, making them a viable alternative to cloud services.
Hardware requirements
Explains the critical hardware constraints that determine which models you can run.
01:43
Install Ollama
Introduces the key tool for running local models, which is free and easy to install.
04:09
Select a model
Provides guidance on choosing the right model size based on available memory.
07:49
Combine local and cloud models
Offers a practical strategy for balancing cost, privacy, and performance.
15:51
[00:00] Local models just got a massive upgrade.
[00:03] Just over the past two months,
[00:06] drop that are very capable
[00:09] Even on just mid-tier hardware.
[00:11] So in this video, I'm going to show you
[00:15] and then connect them to tools like OpenClaw
[00:19] of dollars per month and no longer need to rely on
[00:23] Now, I am going to explain this extremely in-depth
[00:26] You need to pick the correct local model,
[00:30] It really depends on your use case,
[00:32] what you're looking for, security,
[00:35] Anyways, let's dive in.
[00:37] So the experience that you're going to have running
[00:40] on the model that you choose
[00:44] Now I want to go over both.
[00:45] And while it's going to seem a little bit
[00:49] because making the wrong decision here
[00:53] to go back to using these cloud based models,
[00:57] Now, the first thing to understand
[01:00] are open source,
[01:03] What that means is that these are models
[01:06] that are free to use, that
[01:10] Okay.
[01:11] So you're not going to be able to run,
[01:12] you know, Opus 4.7 locally on your own computer
[01:16] And they want you to pay them a massive amount
[01:21] Same with GPT 5.4 or whatever the newest model is.
[01:24] You can't run that locally.
[01:25] You have kind of a limited selection
[01:29] And when you are picking those models,
[01:32] that are going to be compatible with tool
[01:37] like OpenClaw,
[01:40] now in order to determine
[01:43] you do need to understand
[01:46] These models are very demanding on the hardware
[01:49] and specifically on the RAM and the graphics
[01:55] So before we go much further, I'm
[01:58] the specs of the computer
[02:02] If you're working with a higher end
[02:03] MacBook, what you're going to want to do
[02:07] go to About This Mac, and you're going to be looking
[02:12] Now assuming that you have a Mac,
[02:15] This is the unified memory that you have available
[02:21] your GPU, and effectively
[02:25] It's not going to be able
[02:27] In this case, I have 32GB, and probably the highest
[02:33] Again, we'll look at that in a second.
[02:34] But what you want to find is, okay if I'm on Mac,
[02:38] Again,
[02:41] if you have a machine that's six, seven,
[02:45] to run local models, and you're
[02:49] just because they're going to be extremely slow
[02:53] If you're on Windows,
[02:56] If you're on Windows or even a Linux device,
[03:00] or your graphics processing unit,
[03:04] So the RAM in your computer doesn't really matter.
[03:06] It's more about just the VRAM. Again,
[03:09] So if you're running a 4090, for example,
[03:13] If you're running maybe an older GPU,
[03:17] Now, typically speaking, these local models
[03:23] So whatever that number is,
[03:27] or performance of local model
[03:30] Now we're going to get into all of the details here,
[03:34] on my on for 99% of you it's going to be Mac
[03:38] If you're on Mac,
[03:41] If you're on a newer Mac, an M-series Mac,
[03:44] then you're going to be looking at the total amount
[03:48] processing unit, assuming that that's an Nvidia GPU.
[03:51] And again, you're ideally going to want
[03:55] But even if you're running on a
[03:59] available in the GPU or again on a newer series
[04:04] Okay, so now that we have that out of the way,
[04:08] setting this up on our machine.
[04:09] So what we're going to do in order to run
[04:13] called Ollama.
[04:14] Ollama allows you to pull down different models
[04:19] The only limitation for running these models
[04:23] it's completely free.
[04:23] You don't need to pay for anything
[04:26] like OpenClaw,
[04:29] However, before we go any further, I want to make you
[04:31] aware of a really cool opportunity,
[04:35] It's called ClawComp.
[04:36] Now Link Ventures is running it,
[04:39] a free Mac mini to build with OpenClaw,
[04:43] You can keep it no matter where you finish.
[04:46] No matter what happens,
[04:49] Now here's how it works.
[04:50] It's a multi-month build program, so it's designed
[04:54] You apply with a team of 1 to 3 people.
[04:57] You hang out in their Discord, you talk in their
[05:01] And then you get three months to actually build
[05:05] So not a wrapper, not a demo that falls apart
[05:08] in an actual automation with measurable outcomes.
[05:11] Now the whole thing ends
[05:13] This is from June 15th to June 18th in Cambridge,
[05:17] At Link Studios.
[05:18] They fly you out,
[05:19] they cover the housing, and you spend four days
[05:23] Now, founders and investors from the Link ecosystem
[05:26] Researchers from Harvard
[05:29] companies are walking around meeting teams
[05:32] Lock now the prize for $17,500
[05:38] So if you're a student building something
[05:42] that you've been meaning to take seriously,
[05:45] for three months with a real deadline
[05:49] It's an awesome opportunity. Again,
[05:52] Link is in the description.
[05:53] Go apply, get in the Discord, stand out
[05:58] Okay, so that said, let's get back to the video here.
[05:59] I want to go through the setup process.
[06:01] So the first thing we're going to have to do here,
[06:05] and if you're running on a virtual private server,
[06:08] You're going to go over to your terminal
[06:10] and you're just going to run this command
[06:13] I'm going to leave a link to in the description,
[06:16] You're going to copy the install command.
[06:17] And even if you already have Ollama installed,
[06:22] because you will need the newest version in order
[06:26] Okay, so you open a terminal or command prompt.
[06:27] You paste this command, you hit enter
[06:31] Now like I said, Ollama is just a tool that runs on
[06:36] So we're going to wait for this installation
[06:38] Once it's done, I'll be right back and then we're
[06:41] And then once we have the local model running, we're
[06:46] And good timing.
[06:47] It looks like it's already installed.
[06:48] While I was just doing that speech.
[06:50] Okay, so we have Ollama installed now
[06:54] We're just going to type the Ollama command.
[06:55] If for some reason this Ollama command
[06:59] and reopen it or close your command prompt
[07:03] and you should just see something popping up
[07:08] And then what you can do
[07:10] Okay, get out of this interactive window.
[07:13] Now if you're on a virtual private server,
[07:18] you're going to want to make sure
[07:22] that I talked about before, right?
[07:24] So in the case of a Linux machine, you're going
[07:29] If you have that, then you're going to be able
[07:33] So the steps that I'm showing you here
[07:35] work on any operating system,
[07:38] whether it's on a VPS,
[07:41] hardware in order to have a decent experience
[07:44] Okay, so now that we have Ollama installed,
[07:46] what we need to do is select
[07:49] So what I'm going to recommend is that we use
[07:54] For now. There's a bunch of other models
[07:57] At least when I'm filming this video.
[07:58] This is the current, smallest and best model
[08:03] But what we're looking for
[08:05] is that they have the ability to call tools
[08:09] Specifically, you need this.
[08:10] You don't just want a chat based model,
[08:14] All of these different modes like Gemma 4 has.
[08:17] And if you want to see all the different models
[08:22] or just go to the models tab and you can look
[08:26] For example Qwen 3.6.
[08:28] Also another great model,
[08:31] So I'm going to recommend
[08:34] So once we've selected that Gemma 4
[08:38] Is this Ollama run or Ollama pull?
[08:41] I'm going to show you the command in a second
[08:42] Gemma 4. This is a command
[08:46] But before you do that, you need to select
[08:51] Now you'll notice that if we go to the models down
[08:55] We have Gemma 4 latest, Gemma 4 2 billion, Gemma
[08:58] 4 4 billion, Gemma
[09:02] And then the cloud one
[09:05] Now the billion value here
[09:09] The larger the amount of parameters,
[09:10] the better performance of the model,
[09:14] So really the thing that you want to look at here
[09:18] So we see nine gigabytes, seven gigabytes,
[09:22] Now you want to make sure that this size is smaller
[09:26] that you have in your computer
[09:29] Again, based on what I talked about before,
[09:33] I have 32GB of RAM, so I'm fine to go all the way
[09:39] want to go much higher than that because the bigger
[09:43] And again,
[09:47] Now theoretically, you can run any model you want,
[09:51] as long as you had enough hardware space.
[09:52] But it's going to be so incredibly slow
[09:54] that you're not going to be able
[09:57] So that's why I'm emphasizing
[10:00] that it fits into the RAM
[10:04] So that's what we're looking for.
[10:05] Specifically, you want to pick the biggest model
[10:09] Okay.
[10:10] So I'm going to go with, let's just go with the 31B.
[10:13] Right. Because that's going to work for my machine.
[10:14] Even though it might be a little bit slow.
[10:16] And what I'm going to do
[10:18] and I'm going to type the command ollama pull.
[10:21] And then I'm going to paste this Gemma 4 31B.
[10:25] Now, for most of you, you're
[10:27] You might go with 4 billion right.
[10:29] Or E4B, E2B,
[10:32] So you might put e2b, right.
[10:34] Whatever the name is that you see here,
[10:37] what this is going to do
[10:40] It's going to pull all nine, 15, 20GB, whatever.
[10:43] And once that's done we're good to actually right now
[10:48] So I'm not going to run this command.
[10:49] But if you don't again you need to download it first.
[10:51] It might take a few minutes
[10:53] Now once it's downloaded,
[10:57] that you have available is to type
[11:00] When you do that, it's going to give you a list
[11:03] You can see that
[11:06] and this is the one that I'll end up
[11:09] But you can see all of these.
[11:10] You can also just go Ollama help.
[11:12] And if you do that, it's going to give you a list
[11:15] where you can create a new model, you can pull
[11:19] There's all kinds of advanced stuff.
[11:22] The point is, Ollama is super
[11:25] Okay, so now that the model is installed, we can just
[11:29] And then we're going to go with whatever the name is.
[11:31] So in my case I'm just going to go Gemma 4
[11:35] installed and it should take a second here
[11:39] So I'm just going to go hello world.
[11:40] And then you can see immediately
[11:45] You know I'm good.
[11:45] How are you? Right. Whatever I'm doing. Well.
[11:47] And then it gives me the response and you can see
[11:49] this is actually quite fast
[11:52] just the nine gigabyte model on my machine,
[11:56] If you want to get out of this window,
[11:59] This is just kind of a terminal based view
[12:02] if you want to do that.
[12:03] Okay, so now that we have the model installed
[12:08] So first we need to make sure we have OpenClaw
[12:10] And again
[12:13] that virtual private server
[12:17] And you would follow the same steps I just did
[12:22] So what I'm about to show you,
[12:26] Okay, I'm going to assume that you have it installed
[12:30] Now what you can do right is we go to open
[12:34] So I have OpenClaw installed on my machine.
[12:36] And literally all we need to do to get Ollama
[12:42] configure.
[12:43] Okay.
[12:44] So assuming this is installed on our machine right
[12:48] But for this tutorial I will show you doing it right
[12:50] We're going to type openclaw configure.
[12:52] Maybe you guys have a Mac mini or something.
[12:54] So you're going to do that directly on there.
[12:56] And what we're going to do is go through
[12:59] So where it says model we're going to use our arrow
[13:01] We're going to press enter.
[13:02] And we're going to select down here.
[13:04] Mine's a little bit laggy when it first pops up.
[13:06] But we're going to go down
[13:10] You should see it popping up as an option.
[13:12] Now once it pops up
[13:15] We don't want to use the cloud one.
[13:17] I'm not going to get into that in this video.
[13:18] Ollama has a cloud offering,
[13:21] We're going to go local only.
[13:22] We're just going to leave the base URL as it is.
[13:24] We don't need to change this and continue.
[13:27] And we're going to select the model
[13:30] Now I apologize for the cut here,
[13:33] for some reason Ollama is not appearing or showing,
[13:36] and there's some issue, then what
[13:40] You can hit Ctrl C
[13:44] Now when you run ollama serve, this is just going
[13:48] It's going to run directly in this terminal window.
[13:50] So just don't close it.
[13:52] Now you probably don't need to do that.
[13:54] You also can just go to your spotlight search.
[13:56] If you're on something like Mac, and you can just run
[14:01] and you should see you get like a little Ollama
[14:04] meaning that it's running in the background.
[14:06] Okay, so you can see we have a bunch of options here.
[14:08] I'm just going to check this “ollama/gemma4:latest”
[14:12] You can select all these different models.
[14:15] Because these are all the ones
[14:18] And I'm just going to go ahead and press on confirm
[14:21] So you just select the model that we just installed
[14:24] For most of you it's going to be that Gemma 4.
[14:26] Okay. So we're going to go ahead and press enter.
[14:28] And now both those models or whatever ones
[14:32] So I'm going to press on continue.
[14:34] And then what we're going to do
[14:35] is just restart the gateways
[14:39] We're gonna go openclaw
[14:44] Now when we run that command it's
[14:47] Both these models should then be available.
[14:49] And then what we can do is just start using them
[14:52] Now, the same thing goes for Windows.
[14:54] You can run the Ollama serve command,
[14:57] like application search bar or whatever
[14:59] And then you can just run Ollama and it should run
[15:03] Okay. So you want to make sure that's running
[15:06] Now once we do that
[15:08] Again I'm going to assume you know how to get here
[15:12] Claw before.
[15:13] And what we can do is just ask it a question, so
[15:17] Can you tell me the meaning of life?
[15:18] Right.
[15:19] And you'll notice that by default,
[15:22] using Gemma 4.
[15:23] And if we go enter here
[15:27] And you can see we get the response very quickly.
[15:29] Now if for some reason you have different models
[15:31] probably want to set the default model
[15:35] In order to set the default model,
[15:39] I don't know exactly where the configuration
[15:43] claw.json file, there is a way to set
[15:49] And then you're kind of good to go.
[15:51] Now, what I typically will do
[15:53] when I'm using this is I might have a local model
[15:57] And then I also configure a cloud model
[16:01] more challenging,
[16:04] because while these models are good,
[16:08] or Opus 4.7,
[16:12] unless you purely just care about privacy
[16:18] So my suggestion is you configure like an open
[16:22] and then the local model,
[16:25] and then you switch to the other model
[16:28] Now what you can do is you can actually type
[16:31] Right. And then you can change the model
[16:33] So I just do /models.
[16:34] It will give me a list.
[16:37] And then I can say /model.
[16:38] And then, you know, Gemma 4, right.
[16:41] And then switch over to actually use that model.
[16:43] In my case
[16:45] But you also can just directly tell it, hey,
[16:50] Can you switch the model? And then OpenClaw
[16:52] should be capable of just running the command
[16:55] You can see it called the tool
[16:58] Now in this case it's saying you can't do it again
[17:01] By the way,
[17:01] if you are wondering what I'm using to dictate here,
[17:06] I can quickly click into it
[17:09] if we go to the settings, but effectively
[17:14] It uses AI in the background extremely fast,
[17:18] They're free to try out and I just use them anyways,
[17:21] You could see my words per minute here,
[17:23] and I'll leave a link to in the description
[17:25] but especially if you're working with a lot of these
[17:30] rather than to type, right?
[17:31] So if I want to say something, hey,
[17:33] Here's a bullet-pointed list of A, B, C, D, E.
[17:36] Right?
[17:36] And then, you know, we go and it just gives me
[17:39] it, fix the spelling, punctuation,
[17:43] So anyways, that's pretty much all that I have for
[17:46] Setting it up with OpenClaw is very easy.
[17:48] It's a matter of having Ollama installed
[17:51] If you want to go further,
[17:55] as well as your AGENTS.md file
[17:59] and you can tell it when to use the local model
[18:03] You can install multiple local models. Right.
[18:06] So maybe you want Gemma 4 for something.
[18:08] Maybe you want Qwen for something else.
[18:09] Maybe there's a specific image or video
[18:13] You can go crazy with the configuration.
[18:15] The important thing is really understanding
[18:18] And again, the performance
[18:21] is going to be dictated by that hardware.
[18:23] In combination with that model selection.
[18:25] Probably you just want Gemma,
[18:28] and see what experience you get.
[18:30] In my case, Gemma is much, much, much faster
[18:35] that are currently out there.
[18:36] So that's it guys. I'm gonna wrap up the video here.
[18:38] If you enjoyed, make sure to like, subscribe
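The troubleshooting step in the transcript (models not appearing until Ollama is running, fixed with `ollama serve`) can also be checked from a script. This is a minimal sketch, assuming Ollama's default local address `http://localhost:11434` and its `/api/tags` endpoint, which lists downloaded models. The helper names are made up for this example and are not part of Ollama or OpenClaw.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> list[str]:
    """Return downloaded model names, or an empty list if the server is down."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return []  # server not reachable or bad reply: try `ollama serve`
```

An empty result from `list_local_models()` means either that nothing has been pulled yet or that the server is not running, which are the same two cases the video works through before the model shows up in `openclaw configure`.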