TubeSum ← Transcribe a video

Running LLMs Locally Just Got Way Better - Ollama + MCP

Transcribed Jun 14, 2026 Watch on YouTube ↗
Intermediate 12 min read For: Developers and tech enthusiasts interested in running local LLMs with tool integration.
125.8K
Views
3.0K
Likes
96
Comments
175
Dislikes
2.5%
📈 Moderate

AI Summary

This video demonstrates how to run a local LLM on your own machine and connect it to external tools like Google, Notion, and Facebook Ads using Ollama and the Zapier MCP server. The goal is to achieve tool-use capabilities similar to Claude or OpenAI, but completely free, private, and secure.

[00:00]
Introduction to local LLM with tool use

The video promises to show how to run a capable local model and connect it to external services for free, private, and secure tool use.

[00:35]
Using Ollama and Zapier MCP

Ollama is used to run local models, and the Zapier MCP server connects to over 8,000 integrations, allowing the LLM to access external tools.

[01:13]
LLM vs AI agent explained

An LLM is a chatbot that predicts text; an AI agent can take actions by calling tools. Connecting an LLM to tools turns it into an agent.

[02:26]
Installing Ollama

Download Ollama from ollama.com and install it. Update with 'ollama update' command if already installed.

[03:45]
Choosing a model based on hardware

Model selection depends on GPU and RAM. Macs with unified memory can use RAM for models; Windows relies on VRAM. Newer devices perform better.

[06:00]
Selecting a tool-calling model

Models must support tool calling. Example: Qwen 3.5. Parameter count affects performance and RAM usage; choose based on available RAM.

[08:21]
Pulling and testing the model

Use 'ollama pull <model>' to download. Test with 'ollama run <model>'. Larger models may be slow; adjust parameter size as needed.

[10:58]
Setting up MCP client for Ollama

Ollama doesn't natively support MCP; use 'ollama-mcp' bridge. Install via pip: 'pip install ollama-mcp'.

[11:37]
Configuring Zapier MCP server

Create a Zapier MCP server, connect tools (e.g., Notion, Google Calendar), generate a token, and copy the URL with token.

[15:55]
Running the model with MCP integration

Run 'ollama-mcp --mcp-server <URL> --model <model>' to connect. Test tool calls like reading Notion or creating calendar events.

[19:42]
Using local model in code

Ollama exposes a REST API. Use LangChain with MCP adapters to build agents in code. Example: query calendar events.

Running a local LLM with tool integration is powerful and relatively easy to set up. The trade-off is speed vs. accuracy, but it's a huge unlock for privacy and control.

Clickbait Check

85% Legit

"Title accurately reflects content: local LLM with tool integration is demonstrated, though performance is slower than cloud models."

Mentioned in this Video

Tutorial Checklist

1 02:26 Download and install Ollama from ollama.com.
2 03:45 Open terminal and run 'ollama' to verify installation.
3 06:00 Choose a model that supports tool calling (e.g., Qwen 3.5) based on your hardware (RAM/VRAM).
4 08:21 Pull the model: 'ollama pull qwen3.5:27b'.
5 10:58 Install ollama-mcp: 'pip install ollama-mcp'.
6 11:37 Go to Zapier MCP server, create a new server, connect tools (e.g., Notion, Google Calendar), and generate a token.
7 15:55 Run the model with MCP: 'ollama-mcp --mcp-server "<URL with token>" --model qwen3.5:27b'.
8 19:42 Optionally, use Ollama's REST API with LangChain to integrate in code.

Study Flashcards (12)

What is the difference between an LLM and an AI agent?

easy Click to reveal answer

An LLM is a chatbot that predicts text; an AI agent can take actions by calling tools.

01:13

What does MCP stand for?

easy Click to reveal answer

Model Context Protocol.

00:45

How many integrations does the Zapier MCP server support?

easy Click to reveal answer

Over 8,000.

00:47

What command is used to pull a model in Ollama?

easy Click to reveal answer

ollama pull <model>

08:30

What hardware specification is most important for running local models on a Mac?

medium Click to reveal answer

Unified memory (RAM) because it is shared with the GPU.

04:22

What hardware specification is most important for running local models on Windows?

medium Click to reveal answer

VRAM (video RAM) of the graphics card.

04:52

Why must a model support 'tool calling' for this tutorial?

medium Click to reveal answer

Because the model needs to call external tools via MCP to perform actions.

06:00

What is the trade-off when using a model with more parameters?

medium Click to reveal answer

Better performance but requires more RAM and is slower on less powerful hardware.

07:00

How do you install the ollama-mcp bridge?

easy Click to reveal answer

pip install ollama-mcp

10:58

What command runs the Ollama model with a specific MCP server?

hard Click to reveal answer

ollama-mcp --mcp-server "<URL>" --model <model>

15:55

What is the purpose of the token generated in Zapier MCP?

medium Click to reveal answer

It authenticates the connection between Ollama and the Zapier MCP server.

14:10

Can Ollama natively support MCP?

medium Click to reveal answer

No, it requires a bridge like ollama-mcp.

11:12

💡 Key Takeaways

💡

LLM vs AI agent distinction

Clarifies the key concept that an LLM becomes an agent only when connected to tools.

01:13
📊

Hardware requirements for local models

Explains the critical role of RAM/VRAM in model selection and performance.

03:45
🔧

Tool calling capability requirement

Highlights that not all models support tool calling, which is essential for this setup.

06:00
🔧

Zapier MCP server setup

Demonstrates how to connect over 8,000 integrations via a single MCP server.

11:37
🔧

Code integration with LangChain

Shows how to programmatically use local models with MCP for building products.

19:42

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

Run Local AI with Tool Access for Free

32s

Promises free, private AI with tool access like Claude or OpenAI, appealing to privacy-conscious users.

▶ Play Clip

LLM vs AI Agent: Key Difference Explained

44s

Clearly explains a confusing concept (LLM vs agent) in under a minute, highly educational for beginners.

▶ Play Clip

Pick the Right Model: RAM & VRAM Guide

55s

Practical hardware advice for running local models, directly useful for viewers wanting to try it themselves.

▶ Play Clip

Connect Local LLM to 8000+ Tools via MCP

48s

Shows how to integrate local LLM with Zapier's 8000+ apps, a powerful and surprising capability.

▶ Play Clip

Local LLM Creates Calendar Events Live

60s

Demonstrates a real-world action (creating calendar events) using a local model, proving its practical utility.

▶ Play Clip

[00:00] If you want to run a local model that is

[00:02] free, private, and capable of connecting

[00:04] to all of your external tools like

[00:06] Google, notion, Facebook ads, you name

[00:09] it, anything that you want, then keep

[00:11] watching this video. What I'm going to

[00:13] show you how to do here is how to run a

[00:14] capable local model on your own machine,

[00:17] but more importantly, how to connect it

[00:18] to external services so that you can get

[00:20] the same tool use and advantages that

[00:22] you would by using something like Claude

[00:24] or OpenAI, but completely for free,

[00:27] completely private and secure on your

[00:29] own machine so you control everything.

[00:32] With that said, let's get into the

[00:33] video. Okay, so the way that we're going

[00:35] to do this here is we're going to use

[00:36] Olama. I'll talk about how to install it

[00:38] and set it up. We're going to use a

[00:40] capable model. We'll discuss the

[00:42] criteria for using that in a second. And

[00:44] we're going to use something called the

[00:45] Zapier MCP server. This allows you to

[00:47] connect to over 8,000 different

[00:49] integrations. It is free to use. You can

[00:52] just create an account and you can set

[00:53] it up. And this means that all of your

[00:55] integrations can be handled in one place

[00:56] and you just connect a single MCP server

[00:59] to O Lama, which again I'll explain to

[01:01] you how to do and then you're good to go

[01:03] and you can run a local model and

[01:04] effectively do anything you would with

[01:06] the better ones. Now, there are a few

[01:08] things you should understand before

[01:09] doing this. So, let me quickly go

[01:11] through them. Now, first I'd like you to

[01:13] understand the difference between an

[01:14] LLM, a large language model, which is

[01:16] kind of what we've been talking about,

[01:18] and an AI agent. Now, an AI agent is

[01:21] something that's capable of actually

[01:22] taking action. And the way that it does

[01:24] that is by calling various different

[01:26] tools. An LLM or a large language model

[01:29] is just kind of a standard chatbot. What

[01:32] it's capable of doing is predicting

[01:33] text, in some cases, calling tools or

[01:36] generating videos, images, etc. But what

[01:38] allows it to do that is you connecting

[01:40] it to something external. So if you just

[01:42] have you know chatgbt or you have I

[01:45] don't know claude or enthropic or some

[01:47] base model whatever if you don't connect

[01:49] it to anything it can't really do

[01:50] anything. It can just give you some text

[01:52] back. Now the exact same thing applies

[01:54] here when we're running these models on

[01:56] our own machine. It's very easy to run a

[01:58] local model on your own computer. You

[02:00] can type with it. It can give you a

[02:01] response back. That's cool. But we want

[02:03] to add that tool layer so it can

[02:05] actually take some actions and be useful

[02:07] to us. So that's the key difference. The

[02:09] LLM is kind of like the brain. It can

[02:11] chat with you. It can do things. But

[02:12] what allows it to do something is

[02:14] connecting it to these tools. So when we

[02:16] make that connection now, the LLM goes

[02:18] from a chatbot to something that can

[02:20] actually go out in the real world and

[02:22] take actions on your data, which makes

[02:24] it extremely valuable. Okay, so in terms

[02:26] of setting this up, first thing we need

[02:28] to do is download and install OAL. I'll

[02:30] leave a link to in the description, but

[02:31] you can probably find it faster. It's

[02:33] olama.com. What you're going to do here

[02:35] is just press the download button.

[02:37] Download it for your operating system.

[02:39] And if you already have it, I'd

[02:40] recommend just updating it. You can do

[02:41] that by running this command right here

[02:43] in your terminal. Okay. Now, once you've

[02:45] got Olama installed, go ahead and open

[02:47] up a terminal or a command prompt. If

[02:50] you're on Mac like me, you can just

[02:51] search for terminal in the spotlight

[02:52] search. If you're on Windows, go to the

[02:54] Windows search and search for PowerShell

[02:57] or for CMD or even now I think they've

[02:59] called it terminal as well. Doesn't

[03:01] really matter. Just open it up. From

[03:03] here, you're going to simply type Lama

[03:05] like so. Let's zoom in and just make

[03:08] sure that this is working. Okay. Now, if

[03:10] it's your first time running this, you

[03:12] may need to just make sure that Lama is

[03:14] started on your machine. To do that, you

[03:16] can also just search for the Olama

[03:17] application and just double click to run

[03:19] it. And then it should run in the

[03:20] background. Okay. Okay, so once you've

[03:21] run O Lama here, you might see a screen

[03:23] that looks something like this where it

[03:25] says like, "Oh, launch these different

[03:26] things." This is like a brand new update

[03:28] they just released. For now, to get out

[03:29] of it, you can just hit escape because

[03:31] we're going to kind of pause on this for

[03:32] a second cuz what we need to do is find

[03:34] the model that we want to run. Okay, so

[03:36] from here, if you want to see a list of

[03:37] models that you can possibly run, just

[03:39] go to this models tab and you can see

[03:42] there are a ton of different ones and

[03:43] you can start searching through them.

[03:45] This is where we need to talk about kind

[03:46] of realistic expectations. The type of

[03:49] model that you can run on your computer

[03:51] really depends on the performance of

[03:53] your machine. And the main thing that

[03:55] we're focused on is the graphics

[03:57] processing unit or CPU that you have

[03:59] depending on your operating system, Mac

[04:01] versus Windows, and the amount of RAM

[04:03] that you have. Now, what I'm about to

[04:05] tell you is pretty much exclusively for

[04:07] newer devices. So, I'm talking about in

[04:08] the last like four or five years. If

[04:10] you're running a much older machine, you

[04:12] still can run these local models, but

[04:14] you're going to be really limited in

[04:16] what's possible just based on the

[04:18] current architecture of like the old

[04:19] devices versus the new devices. So, if

[04:22] you're running a newer Mac on any M

[04:24] series chip, then you should have

[04:26] unified memory. That means the RAM

[04:28] available to your computer is also

[04:30] available to your graphics card uh or

[04:32] graphics processing unit, whatever you

[04:34] want to call it or whatever you know

[04:35] Apple is calling it now. So, if you have

[04:37] 32 gigs of RAM on your machine, then you

[04:39] can run models that utilize maybe like

[04:41] 70 or 80% of that RAM. Now, if you're on

[04:43] a Windows machine or a Linux machine,

[04:46] it's unlikely that you have unified

[04:48] memory. Actually, I don't even know if

[04:49] that's possible on Windows. I don't know

[04:51] the current architecture, but what I do

[04:52] know is that usually you're going to

[04:54] have these models running on your

[04:55] graphics card. So, if you have a 4090,

[04:58] for example, then you should have 24 GB

[05:00] of VRAM. Now, this is the memory that's

[05:02] exclusively available to your graphics

[05:04] card and can be used for running these

[05:06] models. And the models can use pretty

[05:08] much all of that. So, if you're on Mac,

[05:09] what you want to look at because you

[05:10] need to know your amount of RAM is,

[05:12] okay, how much RAM or unified memory do

[05:14] I have? If you don't have unified

[05:16] memory, you're looking at the memory for

[05:17] your graphics card. And if you're on

[05:19] Windows, you're almost exclusively

[05:20] looking at the memory for your graphics

[05:22] card. If you don't have a graphics card,

[05:23] you can still run these models, but

[05:24] they're going to be extremely slow. So,

[05:26] you could always run these models,

[05:28] assuming you have enough hard drive

[05:29] space. It's just a matter of like are

[05:31] they actually usable and do they give

[05:33] you any kind of output that doesn't take

[05:35] like five hours to run because you can

[05:37] run them but they're just going to be

[05:38] really really slow if you don't have

[05:39] enough RAM. And the reason for this is

[05:41] that typically you load the entire model

[05:43] and all of its weights into the

[05:45] computer's memory. So this is what we're

[05:47] focused on. Again, if you're on Mac,

[05:48] it's a little bit easier for you because

[05:49] you just look at the RAM amount. If

[05:51] you're on Windows, you need to know your

[05:52] graphics card, the amount of VRAM, and

[05:54] again, newer devices are just going to

[05:56] perform better. So, with all of that

[05:58] said, when we choose a model here, what

[06:00] we need to be looking for is one that's

[06:02] capable of tool calling. If I just

[06:04] search for like the Llama model here,

[06:06] and I'm using like an outdated model,

[06:08] something like Llama 3, you'll see that

[06:10] Llama 3 doesn't indicate that it has the

[06:12] ability to call tools. So, if I download

[06:15] this and use it, okay, cool. But I can't

[06:18] actually call any tools with it. So,

[06:20] it's just not really usable, right, for

[06:21] doing the integrations. Now, you can use

[06:23] as a chatbot that, but that's about it.

[06:25] So, when you're going to pick your

[06:26] model, you want to pick a model that has

[06:28] this tools ability. So, the one that I'm

[06:31] going to use in this video is Gwen 3.5.

[06:33] Very good, new model updated 2 weeks

[06:35] ago. And you'll notice that when you

[06:37] start looking at these models, they have

[06:38] a ton of different options. So, if we

[06:41] scroll down, you'll see we have one that

[06:43] is 8 billion parameters, 2 billion

[06:45] parameters, 4 billion, uh 9 billion, 27

[06:48] billion. Sorry, this is 0.8 billion. And

[06:50] the number of parameters directly ties

[06:53] to the space that the model is going to

[06:55] take up on your computer in terms of

[06:56] storage space, but also in your RAM. So

[07:00] the more parameters, the better

[07:02] performing the model is going to be.

[07:04] However, the more difficult it's going

[07:05] to be to run. So when you're selecting

[07:07] the version of the model, and let's say

[07:09] you're using Gwen here, you need to be

[07:11] conscious of how much RAM you have

[07:13] available for the model to use. Now, in

[07:15] my case, because I have 32 GB of RAM on

[07:18] my machine. Let's just quickly show you

[07:19] about this Mac. And this is unified

[07:21] memory on an M2 Max uh like MacBook

[07:24] here, I can use most likely the 35

[07:26] billion or the 27 billion parameter

[07:28] model. The 122 billion one, I can still

[07:31] use it, but it's going to be incredibly

[07:33] slow and just not practical to run

[07:35] because it uses 81 GB, which means in

[07:38] order for it to be efficient, I would

[07:39] have to load all 81 GB into RAM. and my

[07:42] RAM is also still being used by some

[07:44] other processes on my machine. Okay, so

[07:46] that's what you're paying attention to.

[07:48] I'm just explaining that because the

[07:49] model selection is the most important

[07:51] part. For most of you, running like a 9

[07:54] billion parameter model will work.

[07:55] Again, it's not going to perform as good

[07:57] as Opus 4.6, but it will still give you

[07:59] decent performance and do some basic

[08:01] things that you need. So, the more

[08:03] hardware you have, the richer you are,

[08:05] right? The better models you can run.

[08:07] But that's kind of how it works. And

[08:08] again, if you wanted to connect to

[08:09] something else like a tool, you need

[08:11] this tool calling ability. If you go

[08:13] look at the models list, a lot of them

[08:15] now do have the ability to call tools or

[08:17] to do thinking or all of these different

[08:19] things. So check those out. Okay. And

[08:21] Gwen 3 coder next is also a good model

[08:23] that's a little bit lighter that you can

[08:25] run as well uh on your own computer.

[08:27] Okay. So now that we know the model that

[08:29] we want to use, what we can do is we can

[08:30] type pull and we can just go find the

[08:33] name of the model. So mine I'm just

[08:35] going to go with maybe 35 billion and

[08:37] I'm going to paste this here. So Gwen

[08:39] 3.5 col 35B pull is now going to pull

[08:42] the manifest down and download the

[08:44] entire model and all of its weights.

[08:46] This is going to take a long time. Uh

[08:48] well my case may not actually incredibly

[08:50] fast here which I'm surprised by. You

[08:52] can see it's downloading 23 GB. So I'm

[08:55] going to wait for this to finish. Once

[08:56] it's done I'll be right back and then

[08:58] we're going to start running the model

[08:59] and then I'll show you how to connect

[09:00] the tools. All right. All right. So,

[09:02] this just finished and now the first

[09:03] thing I want to do is just quickly test

[09:04] the model. So, to do that, I'm going to

[09:06] type run and then Gwen and this is going

[09:10] to be 3.5 colon 35 billion. And then

[09:14] this is just going to run it directly in

[09:15] a lama. I'm going to show you a better

[09:16] way to run it in 1 second, but at least

[09:18] this will allow us to test it. Should

[09:20] take a second to load. Again, it does

[09:21] need to load up RAM here with all of the

[09:24] weights. And then as soon as it does

[09:25] that, we can start chatting with it and

[09:27] get like textbased responses. Okay, so I

[09:29] just gave it a random prompt like, "Hey,

[09:30] tell me what you're good at doing." And

[09:31] let's see kind of the response time that

[09:33] we're getting here from the model.

[09:35] Again, the larger the model, typically

[09:37] the slower it's going to be on your

[09:38] machine, unless you have like a really

[09:39] fast good machine. And in my case, you

[09:41] can see this is taking a while. So, I

[09:43] probably g downloaded a model that was a

[09:46] little bit too large for the specs that

[09:48] I had. Again, this one, I believe, is

[09:50] using 24 GB of RAM. If I go down to the

[09:52] 27 billion parameter one, I'm more at

[09:54] like 17, uh, which is a little bit more

[09:57] manageable. So, if this takes too long,

[09:58] I'll switch over to the other one.

[10:00] Anyways, just want to show you the

[10:02] reality here. I don't really know

[10:03] exactly what's going to happen. You have

[10:04] to kind of test it out and see which

[10:06] models actually work on your machine.

[10:08] Okay, so trying to run the other one

[10:09] wasn't really working. I wasn't getting

[10:11] any response after like a minute. So,

[10:13] I've just switched to use the 27 billion

[10:15] parameter model, which now is fine. It

[10:16] also doesn't help that I'm recording a

[10:18] video right now. And you can see that

[10:19] it's giving me its live thinking process

[10:22] where we have quite a few tokens per

[10:23] second as the output. Now, again, this

[10:25] is going to take a second to run, right?

[10:26] it's going to be slower than you running

[10:28] it in Claude, but the point is you're

[10:29] running it on your own machine and, you

[10:31] know, that's a huge advantage. So, you

[10:33] play around with the different models to

[10:35] determine what it is that you actually

[10:36] need. So, it's doing this whole thinking

[10:38] process. Hopefully, it's going to give

[10:39] me, you know, like a valid answer here

[10:41] in a second. Uh, but generally speaking,

[10:43] it's working. And now we can move on to

[10:45] the next step where we start doing the

[10:46] integrations. Oh, and there you go.

[10:48] Okay, it's giving me the answer as we

[10:49] speak. And I mean, that didn't take too

[10:51] long. That's pretty decent. And it looks

[10:53] like it's giving me a pretty

[10:54] comprehensive answer. So it's not just

[10:56] like a super basic model. All right. So

[10:58] continuing here, let's now go to the

[10:59] tool integration component. Now, in

[11:01] order for us to actually run MCP servers

[11:04] inside of Olama, we need to use

[11:06] something called the MCP client for

[11:08] Olama. Now, there's a few other tools

[11:10] and abilities to do this. But Olama does

[11:12] not natively support MCP. So effectively

[11:15] what we need to do is use something

[11:16] called a bridge which is going to

[11:17] connect to the MCP server, discover the

[11:19] tools and then share those with the

[11:21] llama in real time and just act as a

[11:23] proxy for doing the tool calling and

[11:25] getting the results. Now the best one

[11:27] that I found is mcb client for a llama.

[11:29] I'll leave a link to in the description.

[11:30] Of course it's open source and available

[11:32] and I'll just show you the commands you

[11:34] need. You don't need to read this whole

[11:35] thing. Now in order for us to use that,

[11:37] like I mentioned, we do need to use the

[11:38] Zapier MCP server. Again, massive shout

[11:40] out to them for sponsoring this video.

[11:42] This is free to use. The way that it

[11:44] works in terms of pricing is that yes,

[11:45] if you do use it a massive amount, you

[11:47] will need to pay for it, but it lends

[11:49] the same actions from your Zapier plan.

[11:51] So, if you're familiar with Zapier, it's

[11:53] very popular for automations. I use it

[11:54] in all of my businesses. And it does

[11:56] something called zaps. So, it like zaps

[11:58] an automation or zaps something to a

[12:00] platform. And each one of those zaps is

[12:02] like one credit, right? Or yeah, one

[12:04] zap. So, when you use the MCP server,

[12:06] every time you use it, it just uh acts

[12:08] as like one zap towards your plan, which

[12:11] is totally fine. and you can get, you

[12:12] know, like thousands of them for free

[12:14] effectively without you having to pay

[12:16] anything. So, the way that this will

[12:18] work is just go to this link. It'll

[12:19] leave it in the description. And what

[12:20] you can do is just press get started.

[12:22] When you do that, it should bring you to

[12:24] a page that looks something like this

[12:25] where it shows Zapier MCP. And you can

[12:27] see like the number of tasks, right,

[12:29] that are being used and that I've

[12:30] currently used so far. Now, from here,

[12:32] what we're going to do is go to new MCP

[12:34] server. from the MCP server. We're just

[12:36] going to select other, but you also

[12:38] could connect this to any of the other

[12:40] uh AI agents that you're using. So, it

[12:42] doesn't just need to be what do you call

[12:43] it? Um, what am I using here? A llama.

[12:46] Okay. So, I'm going to go other because

[12:47] this is just show that on here. From

[12:50] here, I'm now going to connect some

[12:51] different tools that I have. So, for

[12:53] tools, let's go ahead and connect

[12:54] something like notion cuz that's like

[12:56] pretty visual and easy to see. And I'm

[12:58] just going to select all of the tools

[12:59] and just make them all available. But

[13:01] you can obviously safeguard which ones

[13:03] you want. So, I'm going to go ahead and

[13:04] now connect my notion account. It's

[13:06] literally as easy as just using the

[13:08] OOTH. And then all of the security is

[13:09] handled for you. So, what I'm going to

[13:10] do is just connect it to my personal

[13:12] account so that I don't really care if

[13:14] it does something wrong. So, let's just

[13:16] connect it to new page and this second

[13:18] channel page and maybe this December

[13:20] 2025 travel. Uh, you can see I have just

[13:22] all my travel stuff in here. So, let's

[13:24] just select all of it and at minimum it

[13:26] can maybe summarize some of the stuff

[13:27] that we have inside of here. Okay, so

[13:29] notion is now connected. Editors, please

[13:31] just make sure you're blurring my email

[13:32] there. Let's add all of the tools here

[13:34] now to this MCP configuration. And just

[13:37] because we're at it, we can just add

[13:38] another one. So, let's add something

[13:40] like I don't know, there's literally

[13:42] 8,000 of them. So, you kind of just have

[13:44] to figure out what do you want and then

[13:45] you can start searching them uh to find

[13:47] anything like I don't know. Do they have

[13:49] meta? Uh let's see here. Yeah, they have

[13:51] like 100 meta things. Okay, let's try

[13:54] maybe Google Calendar. Um and let's just

[13:57] same thing. Select all. Connect. And let

[13:59] me just connect it to one of my

[14:00] accounts. All right, so I've got that

[14:02] connected. Obviously, I can go crazy and

[14:03] add as many of these as I want. But for

[14:05] now, we'll just stick with these two.

[14:07] And now, if we want to actually connect

[14:08] this to Olama, we're going to go to this

[14:10] connect tab here, and we're going to

[14:11] generate a new token. Now, when we

[14:13] generate the token, we've got to make

[14:14] sure we save this. And obviously, you

[14:16] don't want to share it with anyone else.

[14:17] So, you can see there's a full URL with

[14:19] the token. This is the one that I want,

[14:20] so I'm going to copy that. But you can

[14:22] also just connect um what is it like

[14:24] through a standard MCP configuration

[14:26] using the authorization header. A few

[14:28] different ways. Depending on the tool

[14:29] you're using, you'll use a different

[14:30] one, but we want the URL with the token.

[14:33] And we're just going to save that. So,

[14:34] put that somewhere safe uh cuz we're

[14:35] going to use that in 1 second. All

[14:37] right. So, as I mentioned, the next step

[14:38] here is we're just going to install this

[14:40] Olama MCP client connector thing so that

[14:43] we can actually connect to the MCP

[14:44] server. So, in order to do that, you can

[14:46] do this directly with pip. So, pip

[14:48] install upgrade OLCP.

[14:51] Then you can run oolcp. You can also do

[14:54] uvxol mcp or you can run this

[14:56] installation step right here. So, for

[14:58] most of you, running it through pip is

[15:00] probably going to be the easiest. And

[15:01] the way you do that is just make sure

[15:03] you have Python installed on your

[15:04] machine first. Okay. Now, from here,

[15:06] there's a ton of configuration options

[15:07] once you've got this installed. I

[15:09] already installed it, but effectively

[15:10] all you would do, right, is just run

[15:11] this pip command or run this uvx

[15:13] command. I'll leave the link to this in

[15:14] the description in your terminal. So,

[15:16] you just go here, right, if you have uv

[15:18] installed, and you go uvx and then ol

[15:21] mcp, and it should just run, right? And

[15:24] install all the dependencies, and you

[15:25] should be good to go. Now, once that is

[15:28] installed, okay, so it's spun up and you

[15:29] can see by default it's using Gwen 2.5.

[15:32] I'll show you how we change the model in

[15:33] a second. Let's just quit so we can get

[15:35] out of that in the meantime. Uh there's

[15:37] a bunch of options you can use to run

[15:39] this. So you can do uh what is it? - MCP

[15:42] server where you're specifically

[15:43] specifying an MCP server, which is what

[15:45] we're going to do. You can do

[15:46] autodiscocovery where it's going to look

[15:48] in your clouds configuration actually.

[15:50] And then you can specify the model, the

[15:51] host, the version, all of that fun

[15:53] stuff. So what we want to do is we just

[15:55] want to specify the model and the MCP

[15:57] server. So I'm just going to show you

[15:58] the command to do this and then we can

[16:00] run it. Okay. So effectively we're going

[16:02] to run olcp-mcp

[16:05] server URL. We're going to take our

[16:07] server URL. I just stored it in my

[16:09] browser which I know is horrible but for

[16:10] the video that's fine. And I'm going to

[16:13] paste this in. So put the URL including

[16:15] the token. So we've just done that. And

[16:17] actually, I believe we need to put this

[16:19] inside of quotation marks, a set of

[16:21] double quotes, just so it identifies

[16:23] this as one argument. So, let's do that

[16:26] and go back here. And then we're just

[16:28] going to specify the model that we want

[16:29] to run. And it's going to run an olike

[16:31] environment for us. So, we're going to

[16:32] go d-model and then Gwen 3.5 colon and

[16:36] then 27B or whatever parameter version

[16:38] you were using. And go ahead and hit

[16:40] enter. So, again, MCP--

[16:43] MCP server URL. put the server URL you

[16:46] want to connect to d-model and we're

[16:48] good to go. You can connect to multiple

[16:49] servers if you want for MCP. However, if

[16:52] you have Zapier, you just need one

[16:53] because it connects to all of them by

[16:54] default. Okay, so we're now going to do

[16:56] that. It's going to try to connect to

[16:58] this URL. You can see it's connected. It

[17:00] now has access to these various tools.

[17:02] It's also in thinking mode and you can

[17:04] see that you can change a bunch of stuff

[17:05] here. So, if you want to change the

[17:07] thinking mode configuration, you can

[17:08] type a TM. If you want to show the

[17:10] thinking, ST. If you want to show the

[17:11] metrics, SM. If you want to view the

[17:14] various tools, type tools, right? So,

[17:16] let's type T and we should be able to

[17:18] see all of the different tools here. And

[17:19] you can see it's exposing all of the

[17:21] tools that we connected with the Zapier

[17:23] MCP server. So, I'm just going to press

[17:25] Q to get out of that. Okay. And it's

[17:28] going to bring us back here. Show us

[17:29] that we're in thinking mode. And let's

[17:31] try something basic. Can you tell me

[17:33] where I was traveling in the last year

[17:35] based on my notion documents? Okay. And

[17:38] then we're going to go ahead and press

[17:39] enter. And you can see that it begins to

[17:41] work. And now it should hopefully

[17:42] connect to our MCP integrations, start

[17:45] calling and using those tools and giving

[17:46] us the result. Okay, so you can see it's

[17:48] actually prompting me if I want to

[17:49] enable the tool call. You can see that

[17:52] it's going to go travel and start

[17:53] searching in notion to find this page by

[17:55] the title. So let's go ahead and type Y.

[17:58] Now again, I'm not going to lie to you.

[18:00] This is slow. It's not speedy execution.

[18:02] You're going to sit here for a little

[18:04] bit, but this is the kind of tradeoff

[18:06] when you're running on your own machine.

[18:08] If I had a really highend machine, which

[18:09] by the way I'm considering buying soon

[18:11] because I can just run a ton of local

[18:12] models. This would be a lot faster. I

[18:14] can use a better parameter model. But

[18:15] also I could just go down to like the

[18:17] nine billion parameter model and then

[18:18] this is going to be lightning fast. I

[18:20] might get a little bit less accurate

[18:21] replies, but that's the trade-off. So

[18:23] play with them. Let me know in the

[18:25] comments down below, but let's wait for

[18:26] this to finish and make sure it can

[18:28] actually kind of do what I'm saying.

[18:29] Okay, so it did take a second here.

[18:30] There was a lot of tool calls that it

[18:32] ran through, but it was able to get all

[18:33] the content. And we can see now that it

[18:35] has all of the details here and it's

[18:37] starting to show where I was traveling,

[18:39] what flights I was taking, all of this

[18:40] kind of stuff that I was doing back in

[18:42] December 2025. So was able to go through

[18:45] notion. It was able to do that. Let's

[18:46] quickly test one with the calendar. And

[18:48] then I want to show you how we can do

[18:49] this from code. Can you create a new

[18:51] booking today at 4 to 5:00 p.m. That is

[18:55] just a time block saying eat lunch in my

[18:57] calendar. Let's just test something

[18:59] simple and just see if it can like take

[19:01] the action rather than just reading some

[19:03] content. Let's test it. Okay, so it's

[19:05] telling me here that it did create the

[19:07] event. Uh, let me go check my calendar

[19:09] and see if that's true. Okay, so I did

[19:11] find it. It is at 4 to 5:00 a.m., not 4

[19:13] to 5 p.m. So, let's just tell it to fix

[19:15] that. Hey, the events at 4 to 5:00 a.m.

[19:17] Move it to 4 to 5:00 p.m. Okay, probably

[19:20] just based on how it called the tool.

[19:22] Again, we're not using the best model

[19:24] possible, but if we scroll through here,

[19:26] we can see all of the stuff. And I'm

[19:28] also guessing that maybe this is a time

[19:29] zone issue because it's probably doing

[19:31] this in Eastern time, which my calendar

[19:33] is set to and I'm currently 12 hours

[19:35] ahead of that. Okay, so that's it

[19:36] probably actually did do it at the

[19:37] correct time. It's just uh in the wrong

[19:39] time zone awareness area. So anyways,

[19:41] we'll skip this for now. I want to go

[19:42] over the code. I want to show you how we

[19:43] can do this inside of code as well, not

[19:46] just from this kind of terminal

[19:47] environment in case you want to run a

[19:49] local model through something like lang

[19:50] chain. All right, so I've just opened up

[19:52] cursor. I've got some code here. I just

[19:53] had claude code generate this. So it's

[19:55] probably, you know, a more optimal way

[19:56] to run it. But effectively, I just want

[19:58] to show you that from code, you can just

[20:00] directly connect to Lama because Lama

[20:02] exposes a REST API server. And then from

[20:05] there, if you want to use an

[20:06] orchestration framework like Langchain,

[20:08] you can really easily connect to an MCP

[20:10] server. So you can see like I've defined

[20:11] my Zapier MCP URL. I define it as a

[20:14] client. I then just create a React agent

[20:16] using Langchain here. I think there's a

[20:18] newer version, but this one works just

[20:20] fine. I get all the tools, pass it

[20:21] there, and then I can just start

[20:22] chatting with my agent directly inside

[20:24] of code. So if you're building a

[20:26] product, you want to run a local model,

[20:27] you have high enough hardware, you can

[20:29] do it really easily. I just have a few

[20:31] packages, right? Langchain MCP adapters,

[20:33] lang chain, a llama, langraph in Python.

[20:35] You can do it in any language you want.

[20:37] And you don't even have to use these

[20:38] packages. They're just an abstraction on

[20:40] top of the HTTP server or REST API that

[20:42] is exposed. So you can just directly

[20:44] call a llama, call your model, and then

[20:46] connect it up to the MCP server like I'm

[20:48] doing right here. So, I'm just running

[20:50] it and I'm just going to go like, you

[20:52] know, actually what calendar events do I

[20:55] have in the next, I don't know, 5 days.

[20:59] Okay. And let's see what it gives us.

[21:01] Okay. So, same thing. It's a little bit

[21:02] slow, but I got the response back here.

[21:05] You can see there's a bunch of data. I'm

[21:06] going to have to blur some of it because

[21:07] there's some custom stuff. So, um,

[21:09] editors, please just like blur any of

[21:11] the important information, but

[21:12] effectively it summarized all of the

[21:13] meetings that I have coming up here, and

[21:15] these are accurate based on what I know

[21:16] about the calendar. There we go, guys.

[21:18] With that said, that's going to wrap

[21:20] this up. This is super powerful,

[21:22] relatively easy to do. Yes, there's a

[21:24] few commands and setup, and once you get

[21:25] it, it's pretty much just plugandplay.

[21:28] And I mean, this is a huge unlock if you

[21:30] want to run something locally or you

[21:31] want to build anything that relies on

[21:33] local models. Again, play with them, try

[21:35] out different models, see what kind of

[21:37] speed, accuracy combo you can find. Let

[21:40] me know what you think of the video, and

[21:41] I look forward to seeing you in the next

[21:42] one.

⚡ Saved you time reading this? Transcribe any YouTube video for free — no signup needed.