TubeSum ← Transcribe a video

How To Use Claude Code With Ollama (Free Local AI Setup)

0h 08m video Transcribed Jun 30, 2026
1.9K
Views
57
Likes
5
Comments
0
Dislikes
3.3%
📈 Moderate

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

Run Claude Code for FREE Locally!

45s

Shows a free alternative to paid Claude Code, saving money and appealing to developers.

▶ Play Clip

Best Free Coding Model for Your Computer

47s

Educational guide on choosing the right model based on RAM, helping users avoid mistakes.

▶ Play Clip

The One Setting You MUST Change for Claude Code

56s

Critical fix for a common problem (context window) that ruins user experience if not changed.

▶ Play Clip

AI Built a Full App in 3 Minutes

46s

Impressive demo of AI building a functional app quickly, sparking interest and showcasing capability.

▶ Play Clip

Why Small AI Models Fail at Complex Tasks

60s

Reveals the limitations of small models, creating controversy and discussion about model size vs. performance.

▶ Play Clip

[00:00] In this video, I'm going to show you how to run Quad Code using Olima and local AI models. I'm doing this on M4 base MacBook Air, the most basic Apple Silicon Mac.

[00:13] If you have any modern Mac, Windows or Linux machine, you can follow the same guide. The recently Olima added the native support for the Anthropic compatible APR endpoint.

[00:32] That means Quad Code can now talk directly to the model running on your own machine. The first, however, to Olima.com and download the installer for your operating system,

[00:46] I simply copy the installation command, paste it into the terminal, and that's it. Olima is installed. Once it's running, you will see a small Olima icon

[01:03] in the menu bar. That means Olima is running in the background and is ready to serve local models. I start to install a model that works with Quad Code. I go to Olima's model section and enable

[01:18] the tools and thinking filters. This will show you the most popular compatible models. If your computer has 24GB of RAM or more, I recommend Qn3.627b. It's currently one of the best coding

[01:34] models you can run on Kinsium or Hardware. For this video, we're going to use two models from Google's Jamfer family. We will start with Jamfer E2B, a small and fast-edge model that runs on just a GB of

[01:51] RAM. Then later, we will upgrade to Jamfer 12B when we need more reasoning power, which requires around 13GB of RAM. Since I'm using Apple Silicon, I will be downloading the MLX versions because they're

[02:05] optimized for this hardware. I go ahead and copy the model tag you want to use. Then run this command and replace the example tag with your chosen model. We are downloading Jamfer E2B through

[02:20] Olima. It's around 7GB, so it should take just a few minutes. Once it's downloaded, let's launch it in Olima's interactive mode and ask a simple question.

[02:41] Look at the bottom of the terminal. Olima shows your generation speed. On this M4 MacBook Air, we are getting around 30 tokens per second, which feels extremely responsive.

[02:58] If I continue the conversation a little longer, you can see it's almost giving me 40 tokens per second. And when you are done, press Ctrl plus D or simply type by to exit. Before using Cloudcode,

[03:16] there's one setting you must change. If we need to increase the model's context window to 64K, Cloudcode needs more context window to hold your fast conversation history and task instructions

[03:28] at the same time. By default, Olima uses a much smaller context window of around 8000 tokens, with large coding tasks the model quickly loses track of what it's doing. To fix this,

[03:41] open the Olima settings and change the context length to 64K. And don't worry, it won't always use the full 64K. We're simply increasing the maximum available context.

[03:59] Next, head over to this page to learn how to connect Cloudcode with Olima. I simply copy the launch command and press Enter.

[04:12] If Cloudcode CLI isn't already installed, Olima will detect and install it automatically. Once it launches, choose the model you want to use.

[04:29] And that's it. We are now running Cloudcode with Jamal4. Now, let's ask it a few questions.

[04:43] And as you can see, everything works perfectly. Now, let's build something real. I'm going to ask to create a simple to do list application using HTML, CSS and JavaScript. I will also tell it to save the source files

[05:00] explicitly. Otherwise, it may only generate the code instead of writing the files. And as you can see, it's making two calls reading the project directory, planning the structure and creating the files. In just about three minutes, it finishes building

[05:17] the entire application, which is pretty impressive. Now, let's open it in the browser. And there it is, a fully working to do list application. You can add task, delete them and

[05:29] everything works. And just remember, this was built by a 2 billion parameter model running locally and completely free. Now, let's push it a bit further. I'm going to ask it to add a dark mode toggle.

[05:47] This time, it takes around four minutes. And let's see the result. The dark mode technically

[06:05] works, but the UI looks very bad. This is where the 2B model starts reaching its limits, updating the HTML, CSS and JavaScript together requires the model to keep the entire project in memory

[06:19] and make consistent changes across multiple files. A 2 billion parameter model simply doesn't have enough reasoning capacity for more complex tasks. It's great for small jobs, but once the project grows,

[06:32] it starts making mistakes. To fix this, I downloaded Jummo412b, then I launched Gladcode again using the larger model. This time, I asked it to first analyze the existing project, identify

[06:49] what went wrong and then fix it. Now, watch this. It correctly identifies the CSS rules that weren't updating and also checks how the JavaScript class toggles dark mode and applies targeted fixes.

[07:06] It took quite a bit longer this time, but this is the result. The UI now looks much better. However, it accidentally broke the dark mode functionality.

[07:20] So I asked it to fix one last time, and finally, everything works correctly. And that's what a 12B model gives you, not just better output, but significantly stronger reasoning.

[07:35] If you have a faster computer with more RAM, you can even run larger models that perform even better. And that's how you can run Gladcode on your own hardware using Olama and local AI models.

[07:49] Let me know what do you think about this in the comments section down below. If you liked this video, hit that like button and subscribe to see more content. Thank you so much for watching. This is being KSKRIO. I will see you in the next one.

⚡ Saved you 0h 08m reading this? Transcribe any YouTube video for free — no signup needed.