---
title: 'How to Finetune Llama3-8b in Google Colab for Free'
source: 'https://youtube.com/watch?v=0Td_zS2KKJs'
video_id: '0Td_zS2KKJs'
date: 2026-06-15
duration_sec: 0
---

# How to Finetune Llama3-8b in Google Colab for Free

> Source: [How to Finetune Llama3-8b in Google Colab for Free](https://youtube.com/watch?v=0Td_zS2KKJs)

## Summary

This video demystifies fine-tuning large language models (LLMs) for AI agent tasks, using Meta's Llama 3 8B model on free Google Colab GPUs. The creator argues that raw chat-trained LLMs underperform as action-oriented agents and demonstrates how to fine-tune a model to use first-principles reasoning for generating structured outputs.

### Key Points

- **Problem with raw LLMs for AI agents** [00:00] — Raw LLMs trained as chatbots fail as action models; they need fine-tuning to output structured, reliable responses for AI agents.
- **First-principles reasoning approach** [01:24] — Training a decision-making model based on first-principles reasoning (boiling down to fundamental truths) rather than reasoning by analogy improves agent performance.
- **Demo of raw Llama 3 8B limitations** [02:31] — Raw Llama 3 8B cannot reliably output a Python list without extra notes; even Llama 3 70B fails to maintain correct format.
- **Fine-tuned model results** [03:14] — Fine-tuning on just 40 high-quality examples enables Llama 3 8B to break out of chatbot behavior and generate proper task lists for AI agents.
- **Dataset creation: quality over quantity** [04:00] — The dataset should be small but high-quality; each example shows the model the perfect response. The creator used Mixtral 8x22B for drafts, then manually edited for quality.
- **Fine-tuning steps in Google Colab** [05:45] — Steps include: upload dataset JSON, select T4 GPU runtime, install libraries, log into Hugging Face, load model, configure LoRA, run trainer, quantize model, and test locally with LM Studio.
- **Training epochs and memory considerations** [08:07] — One epoch is default; increasing epochs improves training loss but uses more memory. Ideal for this dataset was 15-20 epochs before diminishing returns.

### Conclusion

Fine-tuning a small, high-quality dataset on a free T4 GPU in Google Colab can transform a general-purpose LLM into a specialized AI agent that reliably outputs structured, first-principles reasoning. The key is quality over quantity in dataset creation.

## Transcript

everybody training new large language
models are training them out the box for
chat a chat trained llm is like an
intelligent student who finished general
education in 50 languages now think of
fine-tuning these llms like kicking the
broke General ed student out of the
house and choosing exactly what
specialized degree they will get in this
video I'm going to demystify the concept
of fine-tuning a language model no
programming experience will be needed to
follow along this tutorial I'll be
showing how to fine tune meta's latest
llama 38 billion parameter model on free
gpus in Google collab let's discuss a
huge problem with implementation of
every team of AI agent project we have
seen to date it started with auto GPT
and now the current leader in hype crew
AI the reason most teams of AI agents
that people are creating to attempt to
accomplish complex tasks sucks is
because they are using raw large
language models trained to respond as
intelligent chat Bots and not as action
models out of the box these models are
trained to have decent general
intelligence and serve a general
audience like chat gbt as an AI software
engineer the first step I would take to
develop a system to outperform any of
these current existing AI agent swarms
is to train a decision-making model
based on first principal reasoning I
think it's also important to reason from
first principles rather than by analogy
the normal way that we conduct Our Lives
is we Reason by analogy we're doing this
because it's like something else that
was done hold up wait a minute and what
that really means is you kind of boil
things down to the most fundamental
truths say okay what are we sure as
possible is true and then reason up from
there that takes a lot more mental
energy think of each response from an
llm as a thought we want this
intelligent AI to generate in 10 seconds
or less instead of expecting our AI to
use first principle reasoning break the
prompt into multiple needles in a
haystack of facts the model needs to
understand or tasks the AI needs to
complete all in a single response in
this video I will fine-tune a llama
model to power the first AI agent of
contact in a team of AI agents I want
the AI agent to be able to use first
principal reasoning to Output highly
logical order of steps that need to be
completed to actually provide a factual
response and better yet automate a
complex task before we get into
fine-tuning a model let's first demo raw
llama 3 8bs ability to accomplish just
the task of generating first principal
reasoning outputs if I go try to get
llama 38b to act as a decision model the
reasoning is as though it came from a
mind who wasn't trained to make actions
in the real world tell llama to respond
with just a python list and it
constantly adds notes before or after
the list making the responses unreliable
as commands for a Python program or even
other AI agents to process we can even
try these same tasks on llama 370b the
chat tune model still cannot manage to
Output the correct response format
reliably now let's look at the responses
from llama 3 8B fine-tuned on a tiny but
highquality data set that I created to
show the model exactly how I want it to
respond to agent prompts fine tuning on
just 40 parameters in this case allowed
the model to break break out of thinking
it is just a chatbot limited to
generating text now llama 3 thinks
freely about what tasks would need to be
accomplished by an AI to actually
accomplish my instructions despite half
of them claiming to in the title all of
the video fine-tuning tutorials I have
found do not show how to fine-tune on
your own data set in this video I will
show you how to create your own data set
to fine tune on instead of using some
pre-existing data set the data set you
use for fine-tuning is about quality and
not quantity since we are training a
specialized model we want to take full
control of maximizing our data set's
response examples quality on exactly the
task we need it to work at as my data
set I have a Json file called Data set.
Json inside this file I have one long
list of dictionaries each dictionary
consist of the same system prompt I'm
trying to get the model to properly
respond to as well as the prompt for the
input value and the response as the
output value to create my data set for
each response I used mixl 8X 22b to
generate rough draft responses before
adding any of these responses to my data
set I'll go through and manually edit
each to improve upon the quality and
ensure perfect formatting as a python
list your data set for fine tuning could
be 20 examples or thousands of examples
while larger fine tuning data sets can
improve upon your model's performance I
can't stress it enough the importance of
adding only highquality examples to your
data set each example in your data set
is an example showing llama 3 what you
expect a perfect response from that
input should be so if you are
fine-tuning on Mid data expect the
quality from your fine-tune model to be
mid if you want a copy of my data set to
skip making your own for this tutorial
or just have a copy of my data to ask on
to for your own fine tuning it's
available in the Pro learning docs
channel of my Discord for anyone with an
AI Austin Pro membership with my data
set of 40 examples complete I now am
ready to start loading them into my
collab notebook and start fine-tuning
llama 3 check the comment section for my
pinned comment with the link to the
Google collab notebook that I will be
going through in this video Once the
notebook loads I can drag my data set.
Json file into the main content folder
then I will select my runtime type to
use a free T4 GPU and save it to start
the runtime once it is up and running
click the play button on the first code
Block in step one to install the needed
python libraries for a T4 GPU I'll run
step two to import the libraries into my
runtime once the installations and
imports complete we'll run this next
oneline block to log into our hugging
face account with a right access token
if you don't have a hugging face access
token yet you can get one for free by
logging into your account going to
settings clicking access tokens and
create a token with right access granted
copy that and paste that into the field
to log into your hugging face in the
next block we have some python code that
loads our data set. Json file and
converts our examples into llama 3's
correct template format you'll see the
hugging phore userv value is set as my
username make sure you change this to
your actual hugging face username our
next code block will set up our
configuration settings for the
fine-tuning the fine-tuned model
variable sets the name you want to save
the model as in your hugging face
repository so feel free to change this
too we can run the configuration
settings block now and the next block to
load the Llama 38b Cur and trainer model
now we'll run the trainer to start the
fine-tuning process on our data set
you'll see the trainer going through
multiple training steps before
completing each training step is a batch
of our training data being ran each
batch in the training steps you will see
this training loss number start to drop
when our model is training it is going
through each of our prompts from the
data set and blind generating what it
expects our example response in the data
set is training loss is a value to
represent the difference between the
model in training's predicted response
to the example response in our data set
a lower training loss value means that
the predicted outputs during fine-tuning
are getting closer to the responses in
our data set this code with its current
configuration settings runs one Epoch
one Epoch equals one pass through of our
entire training data set running more
Epoch up to a certain threshold will
absolutely allow your model to achieve a
lower training loss during fine-tuning
going back up to your Laura
configurations you can change the num
train epox variable to the number of
passes through your data you want it to
run now there is a few things to note
before changing this increasing this
number will increase memory usage
meaning you can only raise it so much on
the free collab gpus before the runtime
will fail from exceeding memory another
consideration is that the benefits of
raising the epoch is diminishing meaning
at some point running more Epoch will
not decrease the training loss value the
ideal number for my training data set
was about 15 to 20 EP talks before the
training loss was practically staying
the same step eight will save the
trainer stats. Json file to your collabs
content folder step nine will quantize
your fine-tune model and save it to your
hugging face quantizing the model will
allow it to perform much faster on your
local machine this code block will take
about 20 minutes to complete in the last
step of the notebook you can test some
of your prompts to your custom model a
better option I can recommend for anyone
with a computer with at least 8 GB of
RAM and ideally 16 or more you can test
the model locally with LM Studio LM
studio is completely free to use and
easy to install inside LM Studio I can
go to the search Tab and type my hugging
face username inside there I can click
my Project's repository locate the file
with Q4 korm at the end of the file and
download that model file once downloaded
I can go to the chat tab click new chat
and load my custom fine-tuned model in
the system prompt tab I will paste in
the same exact system prompt that I used
to fine-tune my model on while this is
not going to be a tutorial on how to use
LM studio just note that there's also a
lot of settings for optimizing the speed
of your model on your machine don't
forget to hit the like button on this
video If you learned anything new about
fine-tuning this has been AI Austin I
will see you in the next one