How to Run LLMs Locally - Full Guide

0h 16m video Published Dec 19, 2025 Transcribed Jun 16, 2026 Tech With Tim

Tech With Tim

Intermediate 8 min read For: Developers with basic programming experience who want to run LLMs locally for development or personal use.

AI Summary

This video demonstrates two methods for running large language models (LLMs) locally: Ollama and Docker Model Runner. Both are free, high-performance alternatives to hosted solutions, offering benefits in speed, privacy, and cost. The tutorial covers downloading models, running them interactively, and integrating them into code via REST APIs or Python modules.

Chapters

1 Introduction and Why Local LLMs 0:00 2 Method 1: Ollama Setup and Usage 0:39 3 Sponsor Break: boot.dev 4:28 4 Integrating Ollama with Code 5:26 5 Method 2: Docker Model Runner Setup 8:21 6 Using Docker Model Runner (UI, CLI, Code) 10:21 7 Conclusion and Next Steps 14:48

[0:00]

Why Run LLMs Locally

Running LLMs locally provides speed, privacy, and cost benefits over hosted solutions like ChatGPT.

[0:39]

Ollama Introduction

Ollama is a popular open-source tool for downloading and managing local LLMs, accessible via command line and code.

[1:04]

Installing Ollama

Download Ollama from ollama.com for Mac, Windows, or Linux, then ensure it's running (check system tray or taskbar).

[1:42]

Pulling a Model with Ollama

Use `ollama pull <model-name>` to download a model. For testing, start with a small model like 'small-m2' (271 MB).

[3:10]

Running a Model Interactively

Use `ollama run <model-name>` to enter an interactive chat. Type `/bye` to exit.

[5:49]

Using Ollama from Code via REST API

Ollama exposes an HTTP server on port 11434. Send POST requests to `/api/chat` with model and messages to get responses.

[7:09]

Using Ollama from Code via Python Module

Install the `ollama` Python module (`pip install ollama`) and use `ollama.chat()` for a simpler interface.

[8:27]

Docker Model Runner Introduction

Docker Model Runner is a newer, more efficient method with better GPU acceleration and container support.

[9:15]

Setting Up Docker Model Runner

Install Docker Desktop (latest version), enable the AI model runner in settings, and enable host-side TCP support.

[10:21]

Pulling and Running Models in Docker UI

In Docker Desktop, go to Models, pull models from Docker Hub, and run them interactively from the UI.

[11:22]

Using Docker Model Runner from CLI

Use `docker model pull`, `docker model list`, and `docker model run` commands similar to Ollama.

[12:46]

Using Docker Model Runner from Code

Docker Model Runner runs on port 12434. Change the URL from 11434 to 12434 and adjust the endpoint path.

[13:41]

Using OpenAI Module with Docker Model Runner

Override the base URL of the OpenAI module to point to Docker Model Runner (localhost:12434) to use local models with familiar API.

Both Ollama and Docker Model Runner are effective for running LLMs locally, with Docker Model Runner being more optimized for containerized deployments. The video provides practical steps to get started with either method, from installation to code integration.

Mentioned in this Video

Ollama

tool

Docker Desktop

tool

Ollama Model Library

link

Docker Hub Models

link

boot.dev

service

Tutorial Checklist

1 1:04 Download and install Ollama from ollama.com for your OS.

2 1:12 Ensure Ollama is running (check system tray or taskbar).

3 1:42 Open terminal and verify Ollama works with `ollama` command.

4 1:57 Pull a model: `ollama pull small-m2:col135m` (or another small model).

5 3:10 List models: `ollama list`.

6 3:24 Run model interactively: `ollama run small-m2:col135m`.

7 4:13 Exit interactive mode: type `/bye`.

8 5:49 Use REST API: send POST to `http://localhost:11434/api/chat` with JSON body containing model and messages.

9 7:09 Install Python module: `pip install ollama`.

10 7:23 Use Python module: `ollama.chat(model='small-m2:col135m', messages=[...])`.

11 9:15 Install Docker Desktop (latest version) and open it.

12 9:50 Enable Docker Model Runner: Settings > AI > Enable Docker Model Runner, enable host-side TCP support, set cores to all.

13 10:21 Pull model via Docker UI: go to Models > Docker Hub, search for model, click Download.

14 11:22 Use CLI: `docker model pull small-m2`, `docker model list`, `docker model run small-m2`.

15 12:46 Use Docker Model Runner REST API: send POST to `http://localhost:12434/api/chat`.

16 13:41 Use OpenAI module with local model: set `base_url='http://localhost:12434/v1'` and use `openai.ChatCompletion.create()`.

Study Flashcards (10)

What are the two methods shown for running LLMs locally?

easy Click to reveal answer

Ollama and Docker Model Runner.