EASIEST Way to Fine-Tune a LLM and Use It With Ollama

22 min video | Published Jun 27, 2025 | Transcribed Jul 28, 2026 | Tech With Tim

Intermediate | 12 min read | For: Developers and AI enthusiasts with basic Python knowledge who want to fine-tune LLMs and run them locally.

AI Trust Score 85/100

✅ Highly Legit

"The title accurately promises a straightforward fine-tuning tutorial with Ollama integration, and the video delivers exactly that."

AI Summary

This video provides a step-by-step guide on fine-tuning a large language model (LLM) in Python and deploying it locally with Ollama. It covers the concept of fine-tuning, when to use it, and walks through the entire process using Google Colab and the Unsloth library. The tutorial includes data preparation, model training, and integration with Ollama for local inference.

Chapters

1 Introduction to Fine-Tuning 00:00

2 Data Preparation 02:16

3 Setting Up the Environment 03:47

4 Training the Model 06:30

5 Testing and Exporting 14:59

6 Integrating with Ollama 17:26

[00:11]

What is Fine-Tuning?

Fine-tuning takes a pre-trained language model and teaches it to be better at a specific task, like training an experienced chef on your restaurant's recipes rather than teaching someone to cook from scratch.

[00:52]

Fine-Tuning vs Parameter Tuning

Fine-tuning is different from parameter tuning (adjusting settings like temperature). Parameter tuning is like adjusting your car's radio, while fine-tuning is like teaching your car to drive in a completely different neighborhood.

[01:08]

When to Fine-Tune

Three main scenarios: 1) Consistent formatting or style that prompting alone can't achieve, 2) Domain-specific data the model hasn't seen, 3) Reducing costs by using a smaller, specialized model.

[01:42]

Key Advantage of Fine-Tuning

You need far less data and compute than training from scratch. Instead of millions of examples and months of training, you might need hundreds or thousands of examples and minutes to hours of training.

[02:26]

Step 1: Gather Data

Gathering data is the most important step: bad data produces a poorly fine-tuned model. The video uses a dataset of 500 examples for HTML extraction, where each input is HTML and each output is formatted JSON.
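
A quick sanity check on the dataset can catch malformed records before any GPU time is spent. The sketch below assumes the file is a JSON list of objects with "input" and "output" fields, as the checklist describes; the function names and the file name are hypothetical.

```python
import json


def validate_records(records):
    """Check that records is a list of dicts with 'input' and 'output' keys."""
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict) or "input" not in rec or "output" not in rec:
            raise ValueError(f"record {i} is missing 'input' or 'output'")
    return records


def load_examples(path):
    """Read a JSON dataset file from disk and validate its shape."""
    with open(path, encoding="utf-8") as f:
        return validate_records(json.load(f))


# Usage (hypothetical file name):
# examples = load_examples("dataset.json")
```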

[03:47]

Using Unsloth for Fine-Tuning

Unsloth is an open-source library built for fast fine-tuning of language models. The tutorial uses a Google Colab notebook with Unsloth.

[09:20]

Choosing a Base Model

The video uses a small model (Phi-3 mini) for speed. You can fine-tune any open-source model like Llama 3.1, Mistral, etc. The model is loaded with 4-bit quantization to save memory.
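
As a rough sketch of what the loading cell might look like, the snippet below uses Unsloth's FastLanguageModel.from_pretrained with 4-bit loading. The model name is the one given in the checklist; the max_seq_length value and the helper names are assumptions, and the notebook's actual cell may differ. The Unsloth import sits inside the function because it needs a GPU runtime.

```python
def load_kwargs(model_name, max_seq_length=2048):
    """Build keyword arguments for loading a base model in 4-bit.

    max_seq_length=2048 is an illustrative assumption, not a value from the video.
    """
    return {
        "model_name": model_name,
        "max_seq_length": max_seq_length,
        "dtype": None,         # let the library choose a dtype for the GPU
        "load_in_4bit": True,  # 4-bit quantization to save memory
    }


def load_base_model(model_name):
    """Load the model and tokenizer (requires unsloth and a CUDA GPU)."""
    from unsloth import FastLanguageModel

    return FastLanguageModel.from_pretrained(**load_kwargs(model_name))


# Usage in a Colab GPU runtime:
# model, tokenizer = load_base_model("unsloth/phi-3-mini")
```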

[10:48]

Preprocessing Data

Data is formatted into a single string with input, output, and an end-of-text token. The format prompt function needs to be adapted to your specific data.
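
A formatting function along these lines could produce the single training string. This is a minimal sketch: the "### Input:" and "### Output:" labels and the literal end-of-text token are assumptions for illustration; in practice the token would usually come from the tokenizer (tokenizer.eos_token), and the layout should match your own data.

```python
import json

EOS_TOKEN = "<|endoftext|>"  # assumption; normally taken from tokenizer.eos_token


def format_prompt(example, eos_token=EOS_TOKEN):
    """Combine one example's input and output into a single training string."""
    output = example["output"]
    if not isinstance(output, str):
        # Serialize structured outputs (dicts, lists) as JSON text
        output = json.dumps(output)
    return f"### Input:\n{example['input']}\n### Output:\n{output}{eos_token}"


sample = {"input": "<h1>Hi</h1>", "output": {"title": "Hi"}}
text = format_prompt(sample)
```

The end-of-text token matters because it teaches the model where a completed answer stops; without it, generations tend to run on past the JSON.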

[12:14]

Applying LoRA Adapters

LoRA (Low-Rank Adaptation) adds small trainable low-rank matrices to the model's existing layers while the original weights stay frozen, enabling efficient fine-tuning without modifying all parameters.
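
To see why this is cheap, it helps to count parameters. For a weight matrix of shape d_out x d_in, LoRA trains two factors, B (d_out x r) and A (r x d_in), instead of the full matrix. The dimensions and rank below are illustrative assumptions, not values from the video.

```python
def full_params(d_out, d_in):
    """Number of parameters in a full weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d = 3072   # illustrative hidden size (assumption)
r = 16     # illustrative LoRA rank (assumption)

full = full_params(d, d)               # 9,437,184
lora = lora_trainable_params(d, d, r)  # 98,304
fraction = lora / full                 # 1/96, roughly 1% of the full matrix
```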

[13:21]

Setting Up the Trainer

The SFT (supervised fine-tuning) trainer used in the Unsloth notebook, SFTTrainer from the TRL library, handles the fine-tuning process. Key parameters include the model, tokenizer, dataset, and training arguments.
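
When choosing training arguments, a back-of-the-envelope estimate of optimizer steps can help set expectations for run time. The sketch below uses the 500-example dataset from the video; the batch size, gradient accumulation, and epoch count are assumptions, and real trainers may round the final partial batch differently, so treat the result as approximate.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps, epochs):
    """Estimate the effective batch size and total optimizer steps."""
    effective_batch = per_device_batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# 500 examples (from the video); the other values are illustrative assumptions
effective_batch, total_steps = optimizer_steps(500, 2, 4, 3)
```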

[14:59]

Testing the Model

After training, the model is tested in Google Colab by running inference on a sample prompt to verify it works correctly.
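
Because the task in the video produces JSON, one simple way to judge a test generation is to check that it parses. This is a hedged sketch rather than the notebook's own test cell; the helper name and the end-of-text token are assumptions.

```python
import json


def parse_model_json(raw_text, eos_token="<|endoftext|>"):
    """Strip a trailing end-of-text token and parse the rest as JSON.

    Returns the parsed value, or None if the text is not valid JSON.
    """
    text = raw_text.strip()
    if text.endswith(eos_token):
        text = text[: -len(eos_token)].strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


good = parse_model_json('{"title": "Hi"}<|endoftext|>')
bad = parse_model_json("Sure! Here is the JSON you asked for")
```

Running a check like this over a handful of held-out examples gives a quick read on whether the fine-tuned model has learned the output format.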

[16:15]

Downloading the Model for Ollama

The model is saved in GGUF format (compatible with Ollama) and downloaded to the local machine. This step can take 10-25 minutes.

[17:26]

Creating a Model File for Ollama

A Modelfile is created to define the custom configuration, specifying the GGUF file, parameters (temperature, stop tokens), and prompt template.
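
The Modelfile can also be generated from Python, which keeps the template in sync with the training format. The sketch below uses the standard FROM, PARAMETER, and TEMPLATE instructions; the GGUF path, temperature, stop token, and template text are all placeholder assumptions, and the template should mirror whatever prompt format the model was trained on.

```python
def build_modelfile(gguf_path, temperature, stop_tokens, template):
    """Return Modelfile text pointing Ollama at a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
    ]
    for tok in stop_tokens:
        lines.append(f'PARAMETER stop "{tok}"')
    lines.append(f'TEMPLATE """{template}"""')
    return "\n".join(lines) + "\n"


modelfile_text = build_modelfile(
    gguf_path="./model.gguf",          # placeholder path
    temperature=0.2,                   # illustrative value
    stop_tokens=["<|endoftext|>"],     # assumption; match your model's token
    template="### Input:\n{{ .Prompt }}\n### Output:\n",
)

# To write it to disk:
# with open("Modelfile", "w", encoding="utf-8") as f:
#     f.write(modelfile_text)
```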

[20:07]

Adding the Model to Ollama

Use 'ollama create' with the Modelfile to add the model to Ollama, then run it with 'ollama run'.
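
If you prefer to script this step, the two commands can be assembled and run from Python. This is a small sketch assuming the ollama CLI is installed and on PATH; the helper names and the model name are hypothetical.

```python
import shutil
import subprocess


def ollama_commands(model_name, modelfile_path="Modelfile"):
    """Build the create and run commands as argument lists."""
    return [
        ["ollama", "create", model_name, "-f", modelfile_path],
        ["ollama", "run", model_name],
    ]


def register_model(model_name, modelfile_path="Modelfile"):
    """Run 'ollama create' for the given Modelfile; raises if it fails."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama CLI not found on PATH")
    create_cmd, _run_cmd = ollama_commands(model_name, modelfile_path)
    subprocess.run(create_cmd, check=True)


# Usage (hypothetical model name):
# register_model("html-extractor")
```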

Fine-tuning an LLM for local use with Ollama is achievable with the right tools and data. By following the steps in this tutorial, you can create a specialized model that runs on your own machine, though experimentation with parameters and data is key to good performance.

Mentioned in this Video

Unsloth (tool)

Google Colab (tool)

Ollama (tool)

Tenweb AI Website Builder API (tool)

Dataset download link (link)

Fine-tuning notebook link (link)

Tutorial Checklist

1 02:26 Gather and prepare your dataset in JSON format with input and output fields.

2 06:30 Open the provided Google Colab notebook and connect to a T4 GPU runtime.

3 07:12 Upload your dataset file to the Colab environment.

4 08:04 Install dependencies by running the pip install cell.

5 08:34 Restart the runtime after installation.

6 08:57 Run the GPU check cell to verify CUDA and GPU availability.

7 09:20 Set the model name (e.g., 'unsloth/phi-3-mini') and load the model and tokenizer.

8 10:48 Preprocess your data: create a format prompt function that combines input and output into a single string with an end-of-text token.

9 12:14 Apply LoRA adapters to the model using the provided configuration.

10 13:21 Set up the SFT trainer with your model, tokenizer, dataset, and training arguments.

11 14:22 Run the training cell and wait for training to complete (approx. 10 minutes for a small model).

12 15:04 Test the model by modifying the messages in the inference cell and running it.

13 16:15 Save the model in GGUF format and download it to your local machine.

14 17:26 Create a Modelfile: specify the GGUF file path, parameters (temperature, stop tokens), and prompt template.

15 20:07 Run 'ollama create <model-name> -f Modelfile' to add the model to Ollama, then run it with 'ollama run <model-name>'.

Study Flashcards (10)

What is fine-tuning in the context of LLMs?

Difficulty: easy

Fine-tuning is taking a pre-trained language model and teaching it to be better at a specific task by feeding it examples of that task.