What is Fine-Tuning? Chef Analogy
Uses a relatable chef analogy to explain a complex AI concept, making it accessible and shareable.
This video provides a step-by-step guide on fine-tuning a large language model (LLM) in Python and deploying it locally with Ollama. It covers the concept of fine-tuning, when to use it, and walks through the entire process using Google Colab and the Unsloth library. The tutorial includes data preparation, model training, and integration with Ollama for local inference.
Fine-tuning takes a pre-trained language model and teaches it to be better at a specific task, like training an experienced chef on your restaurant's recipes rather than teaching someone to cook from scratch.
Fine-tuning is different from parameter tuning (adjusting settings like temperature). Parameter tuning is like adjusting your car's radio, while fine-tuning is like teaching your car to drive in a completely different neighborhood.
Three main scenarios: 1) Consistent formatting or style that prompting alone can't achieve, 2) Domain-specific data the model hasn't seen, 3) Reducing costs by using a smaller, specialized model.
You need way less data and compute power compared to training from scratch. Instead of millions of examples and months of training, you might need thousands or hundreds of examples and minutes to hours of training.
Gathering good data is the most important step. If you have bad data, you'll have a poorly fine-tuned model. The video uses a dataset of 500 examples for HTML extraction, where each input is an HTML snippet and each output is a formatted JSON object.
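As a rough illustration of the data shape described above, the sketch below checks that every example pairs a non-empty HTML input with a JSON object as output. The field names `input` and `output`, the keys inside the output, and the two sample records are assumptions made for illustration; they are not copied from the video's dataset file.

```python
import json

# Hypothetical two-example dataset; the file used in the video has 500 examples.
RAW = '''[
  {"input": "<div class='product'><h2>Desk Lamp</h2><span class='price'>$19.99</span></div>",
   "output": {"name": "Desk Lamp", "price": "$19.99", "category": "lighting"}},
  {"input": "<div class='product'><h2>Notebook</h2><span class='price'>$3.50</span></div>",
   "output": {"name": "Notebook", "price": "$3.50", "category": "stationery"}}
]'''


def find_bad_examples(examples):
    """Return the indices of examples lacking a non-empty string input or a non-empty dict output."""
    bad = []
    for i, ex in enumerate(examples):
        has_input = isinstance(ex.get("input"), str) and ex["input"].strip() != ""
        has_output = isinstance(ex.get("output"), dict) and len(ex["output"]) > 0
        if not (has_input and has_output):
            bad.append(i)
    return bad


examples = json.loads(RAW)
print(len(examples), "examples,", len(find_bad_examples(examples)), "malformed")
```

A check like this is cheap to run before training and catches empty or mis-typed records, which is one simple way to act on the "bad data in, bad model out" warning.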
Unsloth is an open-source library that is extremely good and fast for fine-tuning models. The tutorial uses a Google Colab notebook with Unsloth.
The video uses a small model (Phi-3 Mini) for speed. You can fine-tune any open-source model, such as Llama 3.1 or Mistral. The model is loaded in 4-bit precision to save memory.
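To see why 4-bit loading matters, here is a back-of-the-envelope sketch of the memory needed for the weights alone. It assumes a model of roughly 3.8 billion parameters (about the size of Phi-3 Mini) and ignores quantization overhead, activations, and optimizer state, so treat the numbers as rough lower bounds rather than measurements.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 3.8e9  # assumed parameter count, for illustration only

fp16 = weight_memory_gb(PARAMS, 16)
int4 = weight_memory_gb(PARAMS, 4)
print(f"16-bit: {fp16:.1f} GB, 4-bit: {int4:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the 16-bit footprint, which is what lets a small model fit comfortably on a free Colab GPU.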
Data is formatted into a single string with input, output, and an end-of-text token. The format prompt function needs to be adapted to your specific data.
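A minimal sketch of such a formatting function follows. The exact layout used in the video is not fully visible in the transcript, so the newline separator, the field names `input` and `output`, and the `<|endoftext|>` marker are assumptions; substitute your tokenizer's real end-of-sequence token and your own field names.

```python
import json

EOS = "<|endoftext|>"  # assumed end-of-text marker; use your tokenizer's actual EOS token


def format_prompt(example):
    """Join one example into a single training string: input, newline, JSON output, EOS."""
    return f"{example['input']}\n{json.dumps(example['output'])}{EOS}"


example = {
    "input": "<div><h2>Desk Lamp</h2><span>$19.99</span></div>",
    "output": {"name": "Desk Lamp", "price": "$19.99"},
}
text = format_prompt(example)
print(text)
```

Serializing the output with `json.dumps` keeps the target text consistent across examples, and the end-of-text marker teaches the model where a completed answer stops.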
LoRA (Low-Rank Adaptation) adds small trainable low-rank matrices alongside the model's frozen weights, enabling efficient fine-tuning without updating all parameters.
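The efficiency claim can be made concrete with a little arithmetic. In LoRA, the update to a weight matrix of shape d_out x d_in is expressed as the product of two thin matrices of rank r, so only r * (d_out + d_in) values are trained. The hidden size and rank below are illustrative assumptions, not values taken from the video's notebook.

```python
def full_update_params(d_out, d_in):
    """Trainable values if the whole d_out x d_in weight matrix were updated."""
    return d_out * d_in


def lora_update_params(d_out, d_in, rank):
    """Trainable values for a rank-r update B @ A, with B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


D = 3072   # assumed hidden size, for illustration only
RANK = 16  # assumed LoRA rank, for illustration only

full = full_update_params(D, D)
lora = lora_update_params(D, D, RANK)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed sizes the adapter trains roughly one percent of the values a full update of that matrix would need, which is where the savings in memory and training time come from.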
The SFTTrainer class (from Hugging Face's TRL library, used alongside Unsloth) handles the fine-tuning process. Key parameters include the model, tokenizer, dataset, and training arguments.
After training, the model is tested in Google Colab by running inference on a sample prompt to verify it works correctly.
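One way to make this verification repeatable is to check the generated text programmatically instead of eyeballing it. The sketch below assumes the model is expected to return a JSON object with `name`, `price`, and `category` keys; that key set is an assumption based on the transcript's description of the output.

```python
import json

REQUIRED_KEYS = {"name", "price", "category"}  # assumed output schema


def check_output(raw_text, required=REQUIRED_KEYS):
    """Return (ok, reason) describing whether the model output is a usable JSON object."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as err:
        return False, f"not valid JSON: {err.msg}"
    if not isinstance(data, dict):
        return False, "JSON is not an object"
    missing = required - set(data)
    if missing:
        return False, "missing keys: " + ", ".join(sorted(missing))
    return True, "ok"


print(check_output('{"name": "Desk Lamp", "price": "$19.99", "category": "lighting"}'))
print(check_output('{"name": "Desk Lamp"}'))
```

Running a check like this over a handful of held-out examples gives a quick pass rate, which is more informative than a single sample prompt.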
The model is saved in GGUF format (compatible with Ollama) and downloaded to the local machine. This step can take 10-25 minutes.
A Modelfile is created to define the custom configuration, specifying the GGUF file, parameters (temperature, stop tokens), and prompt template.
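The sketch below assembles a minimal Modelfile as text, using the `FROM`, `PARAMETER`, `TEMPLATE`, and `SYSTEM` instructions. The GGUF file name, stop token, temperature, system message, and template layout are all placeholders chosen for illustration; the video's exact Modelfile contents are not fully recoverable from the transcript.

```python
def build_modelfile(gguf_path, stop_token, system_message, temperature=0.7):
    """Return the text of a minimal Ollama Modelfile for a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
        f'PARAMETER stop "{stop_token}"',
        # Assumed template: optional system message, then the user's prompt.
        'TEMPLATE """{{ if .System }}{{ .System }}\n{{ end }}{{ .Prompt }}"""',
        f'SYSTEM """{system_message}"""',
    ]
    return "\n".join(lines) + "\n"


modelfile_text = build_modelfile(
    gguf_path="./model.gguf",            # hypothetical file name
    stop_token="<|endoftext|>",          # assumed stop token
    system_message="Extract product details from HTML as JSON.",
)
print(modelfile_text)
```

The stop token should match whatever end-of-text marker was appended to the training examples, otherwise the model may keep generating past the end of its answer.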
Use 'ollama create' with the Modelfile to add the model to Ollama, then run it with 'ollama run'.
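For scripting these steps, the sketch below builds the two command lines and the JSON body for a request to Ollama's local `/api/generate` endpoint. It only constructs the commands and payload, without running or sending anything, and the model name `html-extractor` is a hypothetical placeholder.

```python
import json
import shlex


def ollama_commands(model_name, modelfile="Modelfile"):
    """Return the argument lists for registering and then running a custom model."""
    create = ["ollama", "create", model_name, "-f", modelfile]
    run = ["ollama", "run", model_name]
    return create, run


def generate_request_body(model_name, prompt):
    """JSON body for a non-streaming request to Ollama's /api/generate endpoint."""
    return json.dumps({"model": model_name, "prompt": prompt, "stream": False})


create, run = ollama_commands("html-extractor")  # hypothetical model name
print(shlex.join(create))
print(shlex.join(run))
print(generate_request_body("html-extractor", "<div><h2>Desk Lamp</h2></div>"))
```

The argument lists can be passed to `subprocess.run`, and the request body can be posted to `http://localhost:11434/api/generate` once the Ollama server is running, which is one way to call the fine-tuned model from Python.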
Fine-tuning an LLM for local use with Ollama is achievable with the right tools and data. By following the steps in this tutorial, you can create a specialized model that runs on your own machine, though experimentation with parameters and data is key to good performance.
"The title accurately promises a straightforward fine-tuning tutorial with Ollama integration, and the video delivers exactly that."
What is fine-tuning in the context of LLMs?
Fine-tuning is taking a pre-trained language model and teaching it to be better at a specific task by feeding it examples of that task.
00:11
What is the difference between fine-tuning and parameter tuning?
Parameter tuning adjusts settings like temperature, while fine-tuning changes the model's weights to specialize it for a specific task.
00:52
Name three scenarios where fine-tuning is recommended.
1) Consistent formatting or style that prompting alone can't achieve, 2) Domain-specific data the model hasn't seen, 3) Reducing costs by using a smaller specialized model.
01:08
What is the key advantage of fine-tuning over training from scratch?
You need way less data and compute power; thousands or hundreds of examples and minutes to hours of training instead of millions and weeks.
01:42
What is the most important step in fine-tuning?
Gathering good data; if you have bad data, you'll have a poorly fine-tuned model.
02:33
What library does the video use for fine-tuning?
Unsloth, an open-source library that is fast and efficient for fine-tuning.
03:47
What is LoRA and why is it used?
LoRA (Low-Rank Adaptation) adds small trainable low-rank matrices alongside the model's frozen weights, enabling efficient fine-tuning without updating all parameters.
12:14
What file format does Ollama require for models?
GGUF format.
16:18
What command is used to add a custom model to Ollama?
ollama create <model-name> -f Modelfile
20:07
What is the purpose of a Modelfile in Ollama?
It defines a custom configuration for a model, including the GGUF file path, parameters, and prompt template.
17:26
Fine-Tuning Analogy
Provides an intuitive analogy (hiring an experienced chef) to explain fine-tuning.
00:11
Fine-Tuning vs Parameter Tuning
Clearly distinguishes two commonly confused concepts with a memorable analogy.
00:52
Data Efficiency of Fine-Tuning
Highlights the practical advantage of fine-tuning: requires far less data and compute than training from scratch.
01:42
Data Quality is Critical
Emphasizes that bad data leads to a poorly fine-tuned model, a key principle for success.
02:33
LoRA Adapters Explanation
Introduces LoRA as an efficient fine-tuning technique, though the video suggests using AI to explain details.
12:14
[00:00] Today, you'll learn how to fine tune
[00:04] I'll walk you through it
[00:05] step by step, give you all the code
[00:09] So let's go ahead and get started.
[00:11] Now first
[00:14] and when you should actually do it.
[00:15] So fine tuning is taking a pre-trained
[00:19] to be better at your specific task.
[00:21] So think of this like hiring
[00:22] an experienced chef and training them
[00:26] Rather than teaching someone
[00:30] So here's how it works.
[00:31] Instead of training a model from zero,
[00:35] or Claude that already understands
[00:38] Then you feed these examples
[00:41] So maybe customer service conversations,
[00:46] The model then adjusts its existing
[00:48] knowledge to excel at your specific domain
[00:52] Now this is completely different from
[00:55] settings like temperature
[00:59] Parameter tuning
[01:02] Whereas fine tuning
[01:03] is like teaching your car to drive
[01:06] Now, when should you fine tune?
[01:08] Well, there's three main scenarios.
[01:10] First is when you need consistent
[01:13] that prompting alone can't achieve.
[01:15] So something like you want to output
[01:18] or you want the model to write in
[01:21] Second,
[01:24] that the model just hasn't seen before.
[01:26] So things like advanced medical records
[01:30] or information that it just wouldn't
[01:34] to what it is that you're doing.
[01:35] And then third,
[01:37] by using a smaller, specialized model
[01:42] Now, the key advantage to fine tuning,
[01:46] is that you need way
[01:49] Instead of millions of examples
[01:52] you might need thousands
[01:55] and maybe minutes to train or hours
[01:59] But here's the catch when you do fine
[02:03] At general tasks,
[02:06] So if you do fine tune
[02:07] one of these models, keep in mind
[02:11] at least typically, but be much better
[02:15] Anyways, with that information
[02:16] I want to show you how to do this
[02:19] So let's go over to the computer
[02:21] up in Python with some example data
[02:26] And the first step
[02:28] if we want to fine tune a model, is
[02:31] that we're going to fine tune
[02:33] This is the most important step,
[02:36] you're going to have a poorly
[02:38] So make sure that you take your time here
[02:42] Now for this video I've just used
[02:45] which demonstrates HTML extraction.
[02:49] So you can see that
[02:52] a price tag, whatever, etc.
[02:54] Okay, there's a bunch of information
[02:56] and what I'm expecting the model
[03:01] that tells me the name, price, category
[03:05] Now, this is very specific
[03:08] from these types of tags.
[03:10] However, you could do anything as advanced
[03:13] And you can see that
[03:17] where I have some sample input
[03:20] Now this can be literally anything
[03:22] that you want, because
[03:25] you're giving it example prompts
[03:28] the result or the answer from
[03:32] So you can use customer data again
[03:36] In this case just simple HTML extraction.
[03:38] So you can see kind of how it works.
[03:40] And I can compare it easily in this video.
[03:42] If you want this specific data set
[03:44] I'll leave a link
[03:46] Okay.
[03:47] Now for this video
[03:48] we're gonna use something called Unsloth
[03:51] This is open source.
[03:52] It's free to use,
[03:55] and very fast at fine
[03:58] All right.
[03:59] is I'm going to go over to this fine
[04:03] for the purpose of this video.
[04:04] I'm not going to write all of the code
[04:06] I'm just going to leave a link
[04:08] in the description that you can download,
[04:10] or that you can connect to
[04:12] and then you can use your own data
[04:16] But before we get into that, here's
[04:20] If you're already fine tuning LLMs
[04:23] Chances are you're solving real problems
[04:27] Maybe it's a niche SaaS,
[04:28] an AI agent, or a quick side
[04:32] Now imagine giving those users
[04:36] booking system, portfolio, whatever
[04:40] No templates, no drag and drop
[04:45] Well, that's what 10Web's AI Website
[04:49] It's not a template engine
[04:52] It's a real API that spins up fully
[04:56] structure content design,
[05:00] all in under 60 seconds.
[05:02] With one API call,
[05:06] Your users never leave your dashboard.
[05:08] You get full white label control,
[05:12] your customer relationship,
[05:16] management, and technical infrastructure
[05:20] This is how 10Web powers websites
[05:23] And now you can do the same thing
[05:27] Whether your user is launching
[05:30] or online store, the site comes complete
[05:34] mobile optimization, and full e-commerce
[05:39] The platform that runs 62% of the web.
[05:42] If you're ready to turn your next big idea
[05:44] then visit my link in the description
[05:47] Call with the 10Web team to see
[05:51] Okay, so how does the fine tuning work?
[05:54] Well, you can fine
[05:57] by running all of the code
[05:59] that you can download
[06:02] But unless you have a really powerful GPU,
[06:06] even more powerful than that, then
[06:10] So for most of you, I suggest that you use
[06:13] This is a free online
[06:15] code editor environment provided by Google
[06:18] high end GPUs so you can do all of your
[06:22] We can
[06:25] and we can download it to our own computer
[06:29] what I'm going to show you how to do.
[06:30] So first step here is open up
[06:32] I'll leave a link in the description.
[06:33] If you're doing this on Google Colab
[06:37] you should see a button here
[06:39] We're going to press that
[06:40] and we should connect to a
[06:45] And if you're not sure,
[06:47] and it's going to show you
[06:49] system Ram, GPU, Ram, and disk,
[06:51] and it will show you
[06:53] If that didn't work, you can go here
[06:57] And then you can make sure
[07:00] and that you're running in Python three.
[07:03] So now that we're connected
[07:05] we can start running the various cells
[07:08] Now, because we're going to train this
[07:10] on our own custom data,
[07:12] So what I've done is I have this JSON file
[07:16] But in order to open it in
[07:17] Google Colaboratory
[07:20] So I'm going to press on this file
[07:23] And I'm going to go to this
[07:26] And I'm going to select this file again.
[07:28] You can download this
[07:31] So once this file is uploaded
[07:35] And what I'm doing is I'm
[07:38] as a JSON file by providing the name JSON
[07:42] Now you can use any type of data
[07:44] Again,
[07:47] where you have some input
[07:49] Okay.
[07:49] And the output
[07:51] I'm going to convert this
[07:52] to a string in one second,
[07:56] All right.
[07:56] So now that we have
[07:58] And you can see that
[08:00] And I'm successfully loading in this file
[08:03] So I'll go ahead and close that window.
[08:04] Next we need to install the various
[08:06] In my case I'm in Google Colab.
[08:08] So I'm going to run exclamation
[08:11] I'll remove the uninstall command.
[08:12] I just had that in case
[08:15] Okay, this is going to take probably
[08:19] Once it's done, I'll be right back.
[08:21] So that install has finished here.
[08:23] And now we're going to
[08:23] move on to the next step here
[08:27] However, as it says here, the following
[08:30] In this runtime,
[08:30] you must restart the runtime in order
[08:34] So I'm just going to press
[08:37] just to get this reloaded.
[08:38] And once this restarting thing is done,
[08:42] We can move on to the next cell.
[08:44] And if you are running this locally,
[08:45] the install might
[08:47] and you are going to need to have CUDA
[08:50] and be using an NVIDIA GPU
[08:51] in order for this to work,
[08:54] That's why I like using Google Colab,
[08:56] because everything is already
[08:57] So I'm going to go ahead and press run
[08:59] I'm just going to run this cell and check
[09:01] if I have a GPU available
[09:04] I should get two trues here
[09:08] So CUDA is available
[09:12] in this Google Colab instance or runtime,
[09:16] Okay, so now we can move on
[09:17] and we can start actually
[09:20] Now here we need to pick the model
[09:24] Now I'm going to pick
[09:25] because I want to do this
[09:27] and I don't want this to take days or weeks
[09:30] And the model that I'm using here
[09:33] Okay.
[09:33] You can look up this model
[09:34] if you want to see
[09:36] but you can fine tune
[09:38] That's open source.
[09:39] So for the model name here,
[09:42] You can go to the Unsloth documentation
[09:44] to find all the models that are available.
[09:46] There are pretty much
[09:47] you could do here like Llama 3.1
[09:52] you can put them all here.
[09:53] Then we're going to set the sequence
[09:54] Now in this case
[09:57] This is the maximum number of tokens
[09:59] that the model can handle
[10:02] Don't worry about that.
[10:04] And this just means that we're going
[10:07] Now what we need to do here is just load
[10:10] So that's why I put the model name here.
[10:12] And I'm getting the model
[10:13] and the tokenizer from the fast
[10:16] And then I'm
[10:16] loading this pre-trained model
[10:20] length, load_in_4bit is equal to True
[10:25] So I'm going to go ahead and press on run.
[10:27] That's going to load the model for us.
[10:28] It will need to download it
[10:31] So that can take a second.
[10:32] And you might see some stuff like this
[10:36] Don't worry, that is totally normal.
[10:38] I'll be right back
[10:41] All right,
[10:43] Again, I picked a pretty small one
[10:45] If you pick a larger one, this can take
[10:48] And next what we're going to do is we're
[10:51] So the data that I have, remember,
[10:53] where we have some input
[10:56] Now what we need to do is
[10:59] string that we can send to the model
[11:03] So what we're going to do is we're just
[11:06] that I wrote here,
[11:08] That input is going to be equal to
[11:12] So you'll likely
[11:14] whatever type of data it is that you have.
[11:16] Then I have a new line character,
[11:18] And what I've done here for the output
[11:22] JSON object, I've converted it into text.
[11:25] So that's what json.dumps is doing.
[11:27] I'm taking a JSON object
[11:30] So I'm doing that for my output okay.
[11:32] And then I put this end of text tag here
[11:34] so that the model knows
[11:37] That's it.
[11:38] There's many different ways
[11:40] but this is what I want the prompt
[11:43] And then this is the expected output okay.
[11:46] I then have my format of data
[11:48] through all of the items
[11:49] that I have in my file and call this
[11:53] And then I convert this into a data set.
[11:55] Now if you try to run this,
[11:57] That's
[12:00] we may need to rerun where
[12:02] because we reset the runtime.
[12:04] So if we scroll down here now
[12:06] that we can re-execute this cell.
[12:08] And we should be good to go.
[12:09] And we now generate this data set that
[12:12] when we're running this trainer okay.
[12:14] So now what we're going to do
[12:16] something called the LoRA adapters.
[12:18] Now I'm not going to go into too much
[12:21] But this line right here is essentially
[12:25] that we need to our LLM in order
[12:29] So again this can get very complicated.
[12:30] You don't need to understand
[12:33] And if you do want to mess with them
[12:35] Or maybe use an LLM to explain them to you,
[12:37] because there's a lot going on
[12:39] Explain this in this video
[12:41] So in fact,
[12:43] or in Colab, I'll just highlight this.
[12:45] Right click on it, press explain code.
[12:48] This is going to use Gemini
[12:50] And then we can actually just see
[12:52] And that's going to be more accurate
[12:55] So says the selected code applies
[12:58] method to the language model.
[12:59] Using this, here's
[13:02] and then it tells
[13:04] Okay, so use Gemini within here
[13:05] if you want to know
[13:07] Either way, we're going to go ahead and
[13:10] Now by kind of adding
[13:12] which then will actually allow us
[13:15] which we'll do now.
[13:15] So you can see 1.32 pounds layers
[13:19] Don't worry too much about that
[13:21] Okay, so now that we've learned in the
[13:24] Now the trainer is actually the thing
[13:27] tuning for us. People much smarter than
[13:31] So all we have to do is simply use it.
[13:33] So again,
[13:34] if you want, but what I'm using here
[13:38] Importantly, I pass my model
[13:41] This is actually going
[13:42] to convert our string into the token
[13:46] I pass my data set
[13:48] The field for my data set is text.
[13:51] That's important because if we go look
[13:55] Here we created the data
[13:59] And then it has all of the values okay.
[14:01] And then we have a few other things
[14:03] For example the maximum sequence length
[14:05] that needs to match
[14:07] And then all of these training arguments.
[14:10] Again, I'm just going to leave
[14:11] these kind of all default,
[14:14] All right.
[14:15] So now we're going to run this.
[14:16] We're going to initialize the trainer.
[14:18] And now that that is created
[14:20] is actually train the model.
[14:22] Now this step is going
[14:23] to take a different amount of time
[14:26] and the different settings
[14:28] The more examples you have, the longer
[14:32] you will get.
[14:33] The larger
[14:35] Then again, the longer this will take.
[14:37] In my case, I'm using a very small model
[14:39] with a very small amount of examples,
[14:43] but this should train in
[14:47] All right, so we are training now
[14:49] and you can watch this window
[14:51] So once this is finished
[14:53] And then I'll show you how
[14:54] we can actually test this model,
[14:57] and then download it
[14:59] All right.
[15:01] Just as an FYI
[15:04] So the next step is we're going to set up
[15:07] just so we can test it and
[15:10] So in order to do that, pretty much
[15:12] here is modify these messages
[15:15] So you can see I have some raw
[15:18] And then I have content
[15:20] You can place multiple messages here.
[15:22] And you can see that this message is
[15:25] I'm doing
[15:28] You don't need to worry
[15:31] We're just kind of putting the input
[15:33] where we're able
[15:36] at least in the inference mode,
[15:40] That will just take a second.
[15:42] And then you can see that
[15:44] So we have user right
[15:47] And then if we go over here
[15:50] And then we get the output in the format
[15:53] Okay.
[15:55] And we can test this
[15:57] to make sure it is working to our liking
[16:01] Okay.
[16:01] So now that we've assumed
[16:04] It's working.
[16:04] We've tested a few times in Google Colab.
[16:07] What we want to do is download this model
[16:11] and then start using it more permanently
[16:15] And this step does take a while.
[16:17] So bear with me here.
[16:18] What we're going to do is run this where
[16:22] GGUF is the model format
[16:26] So that's
[16:28] So I'm going to go ahead and press run.
[16:29] It is now going to essentially
[16:33] Download it to Google Colab.
[16:35] And then after that
[16:38] so we can save it to our computer.
[16:40] Now this can take a long time.
[16:42] It can take ten minutes
[16:44] So just bear with it.
[16:45] Be patient.
[16:46] And once it's done you can run this cell.
[16:49] This cell will then download it
[16:52] Then we can move on to the next step
[16:54] where I show you how to actually load this
[16:58] So once this is done executing
[17:01] you should have a file
[17:04] folder that's named something like this
[17:10] If you're just looking for some file
[17:13] this is going to be a pretty large file
[17:17] In my case, both
[17:20] so just keep that in mind.
[17:22] This will take a while,
[17:24] internet connection
[17:26] Okay, now that we have the file,
[17:31] So we need to create
[17:34] So what I'm going to do
[17:36] I'm just going to do this
[17:37] to make our life a little bit easier.
[17:39] And I'm going to open this up okay.
[17:41] I'm going to zoom in a little bit
[17:43] and make sure that Ollama
[17:46] So run the ollama command,
[17:48] make sure you've got that downloaded
[17:52] Okay, from here
[17:55] So CD into downloads
[17:58] I have this Unsloth kind of
[18:04] So what I'm going to do
[18:06] So I'm going to say mkdir
[18:09] Okay.
[18:10] If we go to downloads now we can see this
[18:13] I'm just going to drag this file
[18:15] So now it's inside of Ollama test okay
[18:18] I'm now
[18:20] And I'm just doing this
[18:22] But you don't need to follow
[18:24] And what we're going to do now
[18:25] is we're going to make something
[18:27] Now a Modelfile
[18:31] that you want to run in Ollama.
[18:33] So we need to make one of those.
[18:34] Now to do that we can make a new file
[18:37] And then we're going to call this model
[18:39] Just exactly like this with the capital
[18:44] If we ls we can see it's here.
[18:46] Now to edit this
[18:50] And then I'm going to paste in
[18:54] So it's going to say from dot slash.
[18:56] And then this needs to be the name
[19:00] So the local file
[19:02] So I'm just going to copy the name
[19:03] by going to rename copy
[19:06] So we're going to say from dot slash
[19:10] underscore M.gguf again
[19:13] So we're saying from this file right here
[19:17] where this model file exists, then
[19:21] So we can do the top
[19:24] So user end of text.
[19:26] Because that's the way
[19:28] And then we have a template.
[19:29] So the template is going to be user prompt
[19:33] And then we just have a system message.
[19:35] So if we want to tell the model
[19:37] we can in this case
[19:41] So this is a very simple Modelfile.
[19:43] This is how we're going to be able
[19:44] to load in this particular model
[19:47] So now we are going to save.
[19:49] So I'm going to hit Ctrl x
[19:53] Go ahead and press enter.
[19:54] And now you can see
[19:56] So nano Modelfile.
[19:59] It looks like this. Again
[20:01] So now we have a Modelfile
[20:05] What we need to do is add this to Ollama.
[20:07] So in order to add this to Ollama we're
[20:11] And then we're going
[20:12] So I'm going to say
[20:17] And we're going to do dash f
[20:20] What this is going to do
[20:21] is add a new model configuration
[20:24] So let's go ahead and press enter.
[20:27] And now if we want to see if this is here
[20:30] And we should see that we get the HTML
[20:34] So now if we want to run the model
[20:38] And then the name of this
[20:41] It's going to take a second to load up.
[20:42] And then we can paste
[20:44] And we should see it working.
[20:46] So let's go into our data set.
[20:47] Let's just copy one of the ones
[20:50] So let's copy this right here.
[20:53] Go back and paste it okay
[20:55] I just cleared the screen here
[20:58] Then you can see after
[21:00] This is the result that we got.
[21:02] Now this isn't always going to work.
[21:04] And you'll see now
[21:07] That's because obviously
[21:10] you know a pretty poor model.
[21:11] Not one of the massive models,
[21:13] but generally it is working
[21:16] And more examples pass in.
[21:17] This will give us a better output
[21:20] This is now a custom fine tuned
[21:24] locally on our own computer.
[21:25] We could connect to it from Python or
[21:29] And that is going to wrap up this video.
[21:31] Now I know
[21:33] You do need to experiment
[21:36] But I wanted to show you how to do fine
[21:40] and give you a step by step guide.
[21:42] So hopefully this can get you up
[21:45] All of the assets for this
[21:46] video will be linked in the description
[21:49] And with that said, I look forward