TubeSum ← Transcribe a video

"okay, but I want Llama 3 for my specific use case" - Here's how

Transcribed Jun 14, 2026 Watch on YouTube ↗
Intermediate 10 min read For: Developers and AI enthusiasts with basic understanding of LLMs who want to fine-tune models for specific tasks.
345.6K
Views
8.5K
Likes
197
Comments
242
Dislikes
2.5%
📈 Moderate

AI Summary

David Andre explains how to fine-tune Llama 3 for free using Google Colab and the Unsloth framework. He covers the basics of fine-tuning, data preparation, and step-by-step implementation to adapt the model for specific tasks.

[00:15]
What is Fine-Tuning?

Fine-tuning adapts a pre-trained LLM like Llama 3 to a specific task by adjusting a small portion of its parameters on a focused dataset.

[00:52]
Benefits of Fine-Tuning

Cost-effective (uses a GPU for hours instead of millions), improved performance on specific tasks, and data-efficient (works with as few as 300-500 entries).

[01:49]
How Fine-Tuning Works

Steps: prepare a tailored dataset, update pre-trained weights using optimization algorithms (only possible with open-weight models), then monitor and refine to prevent overfitting.

[02:48]
Real-World Use Cases

Customer service chatbots using proprietary transcripts, content generation in a specific writing style, and domain-specific analysis (e.g., legal or medical texts).

[04:11]
Implementation with Llama 3

Uses a Google Colab notebook (created with Unsloth) to fine-tune Llama 3 8B for free on a T4 GPU. Steps include checking GPU, installing dependencies, loading the model, and configuring LoRA.

[08:43]
Data Preparation

Uses the Alpaca dataset (50,000 rows) with instruction-input-output format. Custom datasets must follow the same structure. Suggests using LLMs to generate larger datasets from a few hand-crafted examples.

[11:39]
Training Configuration

Trains for 60 steps (not a full epoch) for demonstration. For production, use multiple epochs and set max_steps to None. Training loss dropped from ~1.9 to ~0.8 in 8 minutes.

[16:22]
Testing the Fine-Tuned Model

The model correctly answered prompts like listing prime numbers (1-50) and converting binary to decimal. Uses text streamer for token-by-token generation.

[19:28]
Saving the Model

Save LoRA adapters locally or push to Hugging Face Hub. For inference, load adapters by setting a flag to true. Recommends using Unsloth for faster inference.

[23:03]
Quantization and Deployment

Quantize the model (e.g., Q4) for easier deployment on weaker hardware. Can be used with UIs like GPT4All or Oobabooga for easy chatting.

Fine-tuning Llama 3 is accessible and cost-effective, enabling anyone to adapt a powerful LLM to their specific needs using free tools like Google Colab and Unsloth.

Clickbait Check

90% Legit

"Title accurately reflects content: a practical guide to fine-tuning Llama 3 for custom use cases."

Mentioned in this Video

Tutorial Checklist

1 05:01 Check GPU version and install compatible dependencies.
2 05:51 Load the quantized Llama 3 model (e.g., 8B) with 4-bit quantization.
3 08:00 Integrate LoRA to update a fraction of parameters efficiently.
4 08:43 Prepare dataset in instruction-input-output format (e.g., Alpaca).
5 10:44 Define a system prompt and apply it to the dataset with EOS token.
6 11:39 Configure training: set max steps (e.g., 60) or epochs, batch size, learning rate.
7 13:48 Run training with trainer.train() and monitor loss.
8 16:22 Test the fine-tuned model with prompts (leave output blank).
9 19:28 Save LoRA adapters locally or push to Hugging Face Hub.
10 22:40 Quantize the model (e.g., Q4) for deployment on weaker hardware.

Study Flashcards (11)

What is fine-tuning in the context of LLMs?

easy Click to reveal answer

Adapting a pre-trained LLM to a specific task by adjusting a small portion of its parameters on a focused dataset.

00:15

What are three benefits of fine-tuning mentioned?

easy Click to reveal answer

Cost-effectiveness, improved performance on specific tasks, and data efficiency.

00:52

What is the minimum dataset size suggested for fine-tuning?

medium Click to reveal answer

300-500 entries.

01:29

What format does the Alpaca dataset use?

medium Click to reveal answer

Instruction, input (optional), and output.

08:43

What is LoRA and why is it used?

hard Click to reveal answer

LoRA (Low-Rank Adaptation) is a technique to efficiently update a fraction of parameters, enhancing training speed and reducing computation.

08:00

What does the EOS token do in fine-tuning?

medium Click to reveal answer

It signals completion of a response, preventing the model from generating text indefinitely.

11:00

How many steps were used for training in the demonstration?

easy Click to reveal answer

60 steps.

11:39

What was the training loss at the start and end of the 60-step run?

medium Click to reveal answer

Started around 1.9, ended around 0.8.

14:09

How long did the 60-step training take on a T4 GPU?

medium Click to reveal answer

Approximately 8 minutes (476 seconds).

14:44

What is the recommended way to save a fine-tuned model for sharing?

medium Click to reveal answer

Push LoRA adapters to Hugging Face Hub using push_to_hub.

19:28

What is quantization and why is it used?

medium Click to reveal answer

Quantization compresses the model to reduce memory usage, making it easier to run on weaker hardware.

22:40

💡 Key Takeaways

💡

Definition of Fine-Tuning

Provides a clear, plain-English explanation of a core concept.

00:15
📊

Cost-Effectiveness of Fine-Tuning

Highlights the dramatic cost savings compared to training from scratch.

00:52
💡

Customer Service Chatbot Use Case

Illustrates a practical application using proprietary data.

02:48
🔧

Free Fine-Tuning with Google Colab

Demonstrates that fine-tuning is accessible without expensive hardware.

04:11
🔧

Dataset Format Requirement

Specifies the exact JSON structure needed for custom datasets.

08:43
📊

Training Time and Memory Usage

Provides concrete metrics for a real fine-tuning run.

14:44
🔧

Saving LoRA Adapters

Explains how to persist fine-tuning results efficiently.

19:28

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

What is fine-tuning? Plain English

45s

Explains a complex AI concept in simple terms, making it accessible to a broad audience.

▶ Play Clip

Fine-tuning costs cents, not millions

60s

Highlights the cost-effectiveness of fine-tuning vs. training from scratch, appealing to budget-conscious creators.

▶ Play Clip

Real-world use cases for fine-tuning

55s

Shows practical applications like customer service chatbots and writing style mimicry, sparking ideas for viewers.

▶ Play Clip

Fine-tune Llama 3 for free on Colab

60s

Demonstrates a free, accessible method to fine-tune a powerful model, lowering the barrier to entry.

▶ Play Clip

Model generates prime numbers correctly

60s

Visually shows the fine-tuned model performing a specific task correctly, proving the technique works.

▶ Play Clip

[00:00] my name is David Andre and in this video

[00:02] I'll teach you how to fine tune llama

[00:03] free so that it performs 10 times better

[00:06] for your specific use case let's start

[00:08] with what even is fine tuning and I made

[00:11] this explanation in plain English so

[00:13] that anybody can understand fine-tuning

[00:15] is adapting a pre-trained llm like gbd4

[00:18] or in this case Lama 3 to a specific

[00:21] task or domain it involves adjusting a

[00:24] small portion of the parameters on a

[00:27] more focused data set so you know when a

[00:29] new model releases what everybody needs

[00:31] to know is how many parameters it has we

[00:33] have llama 3 8B and always that number

[00:37] like 8B or 70b that's the number of

[00:40] parameters so we're adjusting just a

[00:41] small number of them to make it more

[00:43] focused on a specific thing fine tuning

[00:47] customizes the outputs to be more

[00:49] relevant and accurate for your use case

[00:52] here's the power of fine tuning cost

[00:54] Effectiveness it leverages the power of

[00:57] pre-trained llms which cost tens of

[00:59] millions of dollar if not hundreds of

[01:01] millions to train and we can just you

[01:04] know run a GPU for a few hours and fine

[01:07] tune something for I don't

[01:09] know like cents a few cents or few

[01:12] dollars at most which is just amazing it

[01:15] gives you improved performance because

[01:16] you can enhance the llm on your data set

[01:20] and improve accuracy for specific tasks

[01:23] and it it also is more data efficient

[01:25] you can achieve excellent results even

[01:27] with smaller data sets so you know maybe

[01:29] maybe even like 300 500 entries while

[01:34] you know llama 3 was trained on 15

[01:36] trillion tokens I don't know about you

[01:38] but I'm not have I don't have nearly as

[01:40] much data as Zak so that's why fine

[01:43] tuning is great for people like you and

[01:45] me so how does llm fine tuning actually

[01:49] work first you need to prepare your data

[01:51] set and this you know depending on how

[01:53] hardcore you want to go this can take

[01:55] anywhere from 20 minutes to a few hours

[01:58] to week

[02:00] potentially depends how far you want to

[02:02] take it so you create a smaller high

[02:04] quality data set tailored to your

[02:05] specific use case and label it

[02:08] appropriately which I'll teach you in a

[02:10] bit the pre-rain llms weights are

[02:12] updated incrementally using the

[02:14] optimization algorithms like grade in

[02:16] descent based on the new dat set so we

[02:19] can only fine-tune uh llms that we have

[02:22] access to the weights meaning open

[02:24] source open weights llms you cannot find

[02:26] you gbt 4 if you are not open AI open

[02:29] can do it obviously but me and you we

[02:32] probably don't have gp4 just laying on

[02:34] our

[02:35] computer then you Monitor and refine you

[02:38] evaluate the model's performance on a

[02:40] validation set preventing overfitting

[02:42] and guide adjustments now here are some

[02:45] real world use cases for fine tuning

[02:48] fine tuning and llm or customer service

[02:50] transcripts can create a chat bot like

[02:53] this one that can address issue in a way

[02:55] specific to the company so let's say you

[02:58] know you have a specific product very

[03:00] Niche that is not there is not much data

[03:03] about it on the internet and if somebody

[03:06] messages your customer support email you

[03:08] want your you know chatbot to respond in

[03:12] a specific way based on the information

[03:14] of your product and that data is

[03:15] proprietary it's private only you have

[03:18] it and you can find you an llm to

[03:21] respond based on that data so like

[03:24] technically if you have enough script

[03:26] you can find you an llm to respond like

[03:28] you and you know if you try sh GPT if

[03:32] you even give like sh GPT some writing

[03:35] and tell it continue in this writing

[03:36] style it's terrible so this is where

[03:38] fine tuning could be better tailored

[03:41] content generation so you can fine tune

[03:43] in llm on your posts and descriptions to

[03:45] create engaging summar or marketing copy

[03:47] again in your writing style tailored to

[03:50] your

[03:51] audience domain specific analysis so

[03:54] fine tuning llm on legal or medical text

[03:57] can make it much better for those

[03:59] specific Benchmark so and you might have

[04:01] a model that let's say it reaches 50 on

[04:04] some arbitrary Benchmark with fine

[04:06] tuning it can reach 70 or 80 now let's

[04:09] dive into how to actually implement this

[04:11] on Lama free so I created this Google

[04:14] collab well actually most of it was

[04:16] created by ansoff team A huge shout out

[04:18] to ansoff because they did all the heavy

[04:20] lifting so I'm going to also link their

[04:22] GitHub below now first off I added a

[04:25] component that's only available in April

[04:27] to the community so if you join during

[04:30] April you will get a personalized AI

[04:32] strategy to Future proof yourself and

[04:34] your business so if you want to be among

[04:36] people who are building the future if

[04:38] you want access to all the different

[04:40] courses modules and everything else in

[04:42] the community and to two we Rec calls

[04:45] then consider joining and especially if

[04:47] you want me to give you a personalized

[04:50] AI strategy to Future PR yourself so if

[04:53] that's interesting to you make sure to

[04:54] join the community it's the first link

[04:56] in the description now let's find youe

[04:58] Lama free shall we so first thing we

[05:01] check the GPU version available in the

[05:03] environment and install specific

[05:05] dependencies that are compatible with

[05:07] the detected GPU to prevent conflict so

[05:09] this is uh this cell by the way if you

[05:11] don't know how uh Google collab Works

[05:14] which is you know the software I'm using

[05:16] right now it's super simple it's

[05:18] basically um splitting the code into

[05:21] cells it's called The jupyter Notebook

[05:23] but it's like much more easier to see

[05:25] you can add text you can add graphics

[05:27] and it's great for like tutorials and

[05:29] explaining right so if you never use

[05:31] this it's great because it's free and

[05:32] Google actually gives you a GPU so you

[05:34] can use this T4 GPU to train this model

[05:38] for free and if you want faster you can

[05:41] obviously upgrade it right so I'm going

[05:43] to link this collab below the video as

[05:45] well so we run this cell which does what

[05:48] I just explained the next cell we need

[05:51] to prepare to load a range of quantied

[05:54] language models including the new 15

[05:56] trillion lvfree model so trained on 15

[05:59] trillion tokens and it's optimized for

[06:02] efficiency with forbit quantization I

[06:03] mean I'm not going to even pretend I

[06:06] know everything about fire tuning

[06:07] because I don't so if you know if um it

[06:11] seems like I have gaps in my knowledge

[06:13] it because it is I do have those gaps in

[06:15] my knowledge so I try to make it as

[06:17] simple as possible but if this proves

[06:19] something it proves that you don't have

[06:20] to be a machine learning expert to find

[06:22] your models so you know just follow

[06:25] along so here this is the max sequence

[06:27] length uh obviously 3 is up to 8,000 so

[06:32] I mean 2,000 is plenty for this

[06:33] demonstration but you can do anything

[06:35] you can do 4,000 or

[06:38] 8,000 here use 4bit quantization to

[06:41] reduce memory usage but it can be false

[06:44] as well so here are the models we can

[06:45] see like we have mro 7B llama 2 which is

[06:49] the old one Gemma from Google but

[06:50] obviously we're interested in llama 3 8B

[06:54] and by the way we can also use llama

[06:55] 370p if you want which obviously will

[06:58] take longer because uh it's a much

[07:00] bigger model so in that case you might

[07:02] uh want to buy the premium version of of

[07:04] collab or just wait for a while but yeah

[07:07] I mean uh everything is the same just

[07:09] here you would change the model to Lama

[07:11] fre 70p and if you want to use like a

[07:15] gated models from hugging face which

[07:18] gated means that you have to usually

[07:20] agree to some you know license or

[07:21] whatever then here just remove the

[07:25] comand and then put your hugging face

[07:27] token here super simple now by the way

[07:31] you always have to run this so what you

[07:32] do when you go to Google collab you

[07:34] click on run time and click run all that

[07:36] way all of the cells run but you can

[07:38] also do it one by one by clicking this

[07:40] button right here next to each cell and

[07:43] it needs to have this little tick green

[07:44] tick that way it was uh executed here

[07:47] it's not because I you know removed the

[07:51] I changed this so anytime you make any

[07:53] change it disappears but that doesn't

[07:54] matter it was still executed so it's

[07:56] stored in the run time next next up we

[08:00] integrate Laura again you don't have to

[08:02] understand what this is but it's

[08:03] basically um way of fine-tuning into our

[08:06] model which allows us to efficiently

[08:08] update just a fraction of the parameters

[08:10] enhancing training speed and reducing

[08:12] computation load so again we are not

[08:15] training the model from scratch we're

[08:16] just fine-tuning a few parameters for

[08:19] our specific use case and here you can

[08:21] change the r to Any number greater than

[08:23] zero 8 16 32 64 up to

[08:27] you and your goals would want to do with

[08:29] it by the way on SLO the reason I'm

[08:31] using it is because it's uh makes fine

[08:34] tuning much faster and consuming less

[08:36] memory so it's actually a great uh great

[08:40] framework for this data prep we now use

[08:43] the alpaka data set from yma which is

[08:46] this one which has 50,000 rows and I

[08:49] have it loaded in vs code here just that

[08:51] way you see how it looks like in Json

[08:53] formatting so you know it's a lot of

[08:56] lines because for everyone it's

[08:57] basically times five yeah so like 200

[08:59] 50,000 uh lines and it's like every one

[09:03] every one of them has an

[09:05] instruction should probably Zoom it up

[09:07] Zoom it

[09:09] in so yeah every one entry has a

[09:13] instruction give fre tips for staying

[09:15] healie input this is not mandatory

[09:18] because instruction is already enough

[09:19] context and then output this is what the

[09:22] llm should say and you do this enough

[09:25] times and the llm you know learns it

[09:27] basically learns right so we you can see

[09:29] it probably better here uh and if you

[09:31] want to use your own data set you have

[09:33] to format it the same way so you know

[09:35] just having output input and

[09:37] instructions these three um parameters

[09:41] but yeah just look at this not all of

[09:43] them have the input which is fine I mean

[09:45] probably like 20% or 15% have the input

[09:49] and that's just extra context so yeah uh

[09:51] I'm also going to link this data set

[09:52] below but if you want your own data set

[09:55] which you know if you want your own use

[09:56] case just make sure to format it the

[09:58] same way so you know instruction some

[10:01] text input some extra context or empty

[10:04] and output how the model should respond

[10:07] and you know if you if you're getting

[10:09] creative you can definitely use llms to

[10:12] generate these large data sets much

[10:14] faster I mean maybe you create really

[10:17] like 20 high quality examples by hand

[10:20] and then you run a team of Agents um for

[10:23] creating that data set that can just you

[10:24] know use those 20 examples to create

[10:27] 50,000 like in this data set but yeah

[10:30] that's a topic for a whole another video

[10:32] so if you want me to make a video on how

[10:33] to make data sets for fine tuning then

[10:36] let me know but let's go back to our

[10:41] collab so then we Define a system prompt

[10:44] which is you know custom instruction

[10:45] system prompt which you already know

[10:47] hopefully that formats tasks into

[10:49] instruction inputs and responses so this

[10:51] has to fit with our data set and we

[10:54] apply it to our data set for the model

[10:57] and we add the EOS token to Signal

[11:00] completion so this token right here here

[11:02] we Define it and here here we add it

[11:04] because without this the token

[11:05] generation continues forever so we don't

[11:08] want that obviously so let's look at the

[11:11] system prompt it's very simple it says

[11:13] below is a instruction that describes a

[11:16] task paired with an input that provides

[11:18] further context WR the response

[11:22] that appropriately completes the request

[11:25] and that's our system prompt and then we

[11:27] feed it the instruction the input and

[11:30] response and obviously you can change

[11:32] the system prompt if you

[11:35] want now train the model we do a 60 step

[11:40] uh we do only 60 steps here to speed

[11:42] things up um you can like this is

[11:45] obviously very small because it's not

[11:47] even one Epoch training Epoch so uh if

[11:51] you want to like actually use something

[11:53] for production or your business you

[11:56] probably want to train it for longer

[11:57] than 60 steps and I'm going to show you

[11:59] how how in this

[12:01] bit so if you if you do multiple EO you

[12:04] have to turn Max steps none so here okay

[12:08] number number of trained eox is not

[12:11] included in here so what you would do is

[12:13] you would copy this and you would go in

[12:16] here and look at the steps right so we

[12:18] have the steps here you would add this

[12:21] maybe you would do four or

[12:24] whatever however many you want the more

[12:26] the better but at a certain point it

[12:28] starts to not yield better result so max

[12:33] steps you have to change it to none

[12:34] right so this is 60 60 right now so you

[12:37] do none and this is where you would do

[12:39] like proper fine tuning but um you know

[12:42] I just add it that 604 demonstration

[12:45] that way it's faster and it still took

[12:47] like 8 minutes so I'm not going to

[12:48] replicate it I'm just going to show it

[12:50] everything but yeah basically um you

[12:53] know this is what you do you decide how

[12:55] many EO you want and then at this stage

[12:58] we confir configuring our models

[13:00] training setup where we Define things

[13:02] like badge

[13:03] size and learning rate to teach our

[13:06] model effectively with the data we've

[13:07] prep prepared so obviously you can like

[13:10] mess with stuff here um again I'm not

[13:13] going to PR pretend I understand

[13:14] everything but the main things are you

[13:17] know backing like this can make it five

[13:19] times faster for short sequences

[13:21] obviously the steps and the epox but um

[13:26] yeah I mean if you're confused something

[13:28] just take a screenshot boom like this

[13:31] and ask sh

[13:33] GPD now this is the current memory stats

[13:36] right so we're using the Tesla T4 GPU

[13:39] provided from Google for free and the

[13:42] max memory is 14

[13:45] GB and this is where the training begins

[13:48] this is the magical part right so here

[13:51] we do this line of code trainer stats uh

[13:55] trainer. train and this will give us the

[13:57] statistics as the model trains so again

[14:00] this is only 60

[14:01] steps which is um like zero EPO but yeah

[14:07] um you can see the training loss going

[14:09] down so like basically smaller number is

[14:11] better here so you can see like at the

[14:13] start we have 1.8 2 like 1.9 and then it

[14:17] quickly starts dropping to like 0.9 you

[14:19] know around 1 0.8 so it fluctuates a bit

[14:23] but it consistently go down 0.7 but you

[14:26] can see it's reaching like a as symt

[14:28] right obviously it's only 60 steps so

[14:31] really doesn't mean anything um but yeah

[14:34] like we ended up like 0.8 from like two

[14:38] so it shows you like if the model is

[14:40] actually

[14:42] improving and this took like 8 minutes

[14:44] you can see the stats here right so 476

[14:47] seconds almost exactly 8 minutes Peak

[14:50] Reserve memory was 8.9 GB and for

[14:53] training was 3.3 GB so not like this is

[14:56] the power of unso it's like really

[14:58] optimized for for this to use uh to run

[15:01] faster and to use less memory so that

[15:04] way we can find tune gpus for cheaper I

[15:06] mean you know I'm using a free T4 GPU

[15:09] from Google so it's free but it's faster

[15:12] like if you didn't use unso it would be

[15:13] a lot

[15:15] slower so okay so 60% of we used 60% of

[15:19] max memory so that's good because we

[15:21] didn't like hit the limit so we still

[15:23] have like 40% reserved and for training

[15:27] uh it was only 22% which is even better

[15:30] inference which is which means here we

[15:33] actually run our new model that we

[15:34] fine-tuned and okay so this data set is

[15:38] for like instructions and this is

[15:39] basically when you see a model that is

[15:42] like instruct at the end of it this is

[15:44] what they mean it's just trained on a

[15:46] large data set of instructions because

[15:48] usually the models are more for like

[15:50] chatting for text generation you know

[15:52] you give it some input and it's like

[15:54] gives you some output it's you know for

[15:56] more conversational here for

[15:58] instructions for instruct models is to

[16:00] follow instructions you give it a task

[16:02] and it completes it so like we can see

[16:04] it probably here in vs code like rewrite

[16:07] the sentence to change its meaning and

[16:08] then output the Fe

[16:10] escaped compar to dat sub so this is

[16:13] like all tasks it's all in instructions

[16:15] and then it shows how the model should

[16:17] do it

[16:19] so let's look at it right so now we've

[16:22] trained the model this took like 8

[16:24] minutes to do so all of you can do this

[16:26] the beauty of using a Google cloud is

[16:28] that obviously ly it doesn't matter what

[16:30] machine you have even if you have a

[16:31] terrible computer this will take the

[16:33] exact same time because you're using the

[16:35] GPU and

[16:37] Cloud so obviously here you can change

[16:40] your prompt I mean this is you know I

[16:42] changed the prompts here so this is my

[16:44] prompt uh but always make sure to leave

[16:46] the output blank so here the first one

[16:48] is the instruction then this is the

[16:50] input like the extra added context and

[16:52] the output leave it blank because the

[16:54] model will generate it right so list the

[16:57] prime numbers contained within this

[16:58] range and then the range is here in the

[17:01] input 1 to 50 and then the model our new

[17:05] findun the Lama 3 generates the output

[17:07] so let's look at this 2 3 5 7 11 13 17

[17:12] 19 23 29 and just by looking at it uh

[17:15] you can see it's correct I mean none of

[17:17] these numbers are divisible so yeah this

[17:19] is correct all of them are prime

[17:21] numbers and also this is this is even

[17:24] better like I think this is much more

[17:27] visible using text streamer for

[17:29] continuous inference and I'm I'm just

[17:31] going to show it again by the way this

[17:33] is how it looks right so you have the

[17:35] instructions it's separated but that's

[17:37] not the main thing not only is it

[17:38] formatted better it's uh continuous

[17:40] inference so you can see the token

[17:41] generation token by token instead of

[17:43] waiting for the whole time so if I run

[17:46] this as you can see it waits and it

[17:48] generates it all at once right so boom

[17:51] it like appeared all at once so if you

[17:53] want to see it token by token this is

[17:55] much better right look at how fast it is

[17:58] this is the power of llama 3 8 billion a

[18:00] small model but a very capable model so

[18:04] um yeah Tech streamer is great for this

[18:06] and you can see it how it's generating

[18:08] the

[18:09] answer so yeah this is um the next

[18:12] prompt I Ed myself convert these binary

[18:15] numbers to decimal and then here and by

[18:17] the way again you can use these proms

[18:19] like example create like 20 30 by hand

[18:21] maybe and then you know feed this into

[18:24] CH GPD or your team of Agents something

[18:27] automated ideally or there is I think

[18:29] there's a service for like like

[18:31] reflection AI or something like that but

[18:33] yeah either way you can creating large

[18:35] data sets need to needs to be automated

[18:37] right so you cannot do that by hand but

[18:39] either way like you create something

[18:41] like this so uh these examples and then

[18:44] you would feed that into your own data

[18:45] set obviously relevant to your use case

[18:48] to your business and you just go crazy

[18:52] and create as many as as you possibly

[18:53] can

[18:55] so like really at least 1,000 like this

[18:58] is 50,000 and it's still probably could

[19:01] be larger so yeah I mean you have to

[19:04] again that's probably another video to

[19:06] build a team of agents to um generate

[19:09] data sets but yeah okay so here we give

[19:12] it um three different binary numbers and

[19:15] it tasks its tasks is to convert to

[19:18] decimal and as you can see it does it

[19:20] flawlessly I mean this is 10 13 15 that

[19:23] is correct so we have the model we

[19:25] tested it a bit with two prompts now

[19:28] it's time to to save it and because you

[19:30] know we spend all this time all this GPU

[19:32] power training it we don't want to go

[19:34] with to waste because if you restart the

[19:36] run time in Google collab um your model

[19:39] will disappear obviously you can run it

[19:41] again but then you have to wait again

[19:43] and you know maybe run out of the three

[19:45] GPU hours so to save the final model as

[19:48] Lura adapters we can either use hugging

[19:51] face push to hub for an online save if

[19:54] you wanted your model listed on hugging

[19:56] phase so hugging phase lists data assets

[19:59] and models it's probably the main two

[20:00] things it's used for so if you want your

[20:03] model shared then you would do that but

[20:05] if you want it just Sav on your computer

[20:07] do safe pre-train for a local

[20:10] save by the way this only saves the

[20:13] loraa adapters meaning um the like

[20:16] basically the things that were changed

[20:18] it doesn't save the entire model with

[20:19] the change parameters just the changes

[20:21] right so uh it's less memory and yeah

[20:25] just faster to save so but if you want

[20:28] to save the L adapters with the save

[20:30] model uh you can change this if you want

[20:33] to load the L adapters we saved for

[20:36] inference you would change this false to

[20:38] true so simply changing

[20:42] this and yeah this is the model name so

[20:45] obviously you can change this this is

[20:47] your model used for training uh here is

[20:49] just laa model but you can name it lava

[20:51] free I don't know copyrighting or Lama

[20:53] free uh medical diagnosis whatever your

[20:57] um use case is obviously and then

[21:01] um here the alpaka prompt so yeah this

[21:04] is the variable we declared earlier so

[21:07] this is the importance you can just go

[21:08] into the collab and try to running this

[21:10] cell you have to run the cells from

[21:12] above otherwise this will not work so

[21:14] whenever you're using a jupyter notebook

[21:16] such as Google collab always run all

[21:19] cells in order otherwise it will not

[21:21] work so

[21:23] yeah so this is the same uh format right

[21:27] from earlier inst ction input output at

[21:30] this point you should be familiar with

[21:31] this and that's just for this particular

[21:33] data set and for this style of prompting

[21:36] so if you have a different one then

[21:38] follow the different one so obviously

[21:40] here U what is the famous St Tower in

[21:43] Paris obviously it's Eiffel Tower blah

[21:45] blah blah it gives some extra info about

[21:48] it so you can also use hugging face Auto

[21:51] model for perf casual LM but ANS slof

[21:54] does not recommend this because it's a

[21:56] lot slower than ANS slof

[21:59] so yeah uh if possible use unslow for

[22:02] Speed and as the name suggests of you

[22:04] know unlove it's UNS slowing everything

[22:07] it's making everything two to five times

[22:09] faster so why not do that with 80% less

[22:12] memory so

[22:13] yeah okay and then we're preparing to

[22:16] save our trained model in a more compact

[22:18] format and then upload it into a cloud

[22:20] platform which allows for Less storage

[22:23] and comparation power so again like I'm

[22:25] not going to even pretend I understand

[22:26] everything because this is honestly

[22:28] stepping outside of my comfort zone but

[22:31] like just building this and doing this

[22:33] fine tuning taught me a lot so if you

[22:35] want more technical videos like this let

[22:37] me know next we're ready to compress our

[22:40] model using various quantization methods

[22:42] which means just you know making it

[22:44] easier to run or a machine so maybe you

[22:47] cannot like if you have a bad computer

[22:49] maybe you cannot run the full model but

[22:51] you definitely can run a quantized

[22:52] version of it it makes it leaner and

[22:55] then uh we upload it to the cloud for

[22:58] easy sh in this is what this piece of

[23:00] code does

[23:03] and so we use the model un. GF file or

[23:08] the quantise version so the Q4 means

[23:10] quanti in Lama CCP or if you want a UI

[23:14] based system which probably you do which

[23:16] is easier to use you can use like GPT

[23:18] for or or um is the other one is

[23:21] escaping me but yeah these are basically

[23:24] these USB system that you can use

[23:26] to or llm anything yeah I don't know if

[23:29] this supports it but yeah basically

[23:31] these uh these U Frameworks have a UI

[23:34] that's easy to chat with and you can use

[23:37] open source model there so if you do if

[23:40] you find this you can upload this to GPD

[23:42] for all and chat with your own model

[23:44] very easily and yeah that's it you know

[23:47] how to fine tune Lama 3 for your your

[23:49] own specific use case again I'm going to

[23:52] leave these resources below the video

[23:55] and if you have any questions regarding

[23:57] to anso join the their Discord so yeah

[24:00] that's it if you find this useful then

[24:03] please subscribe and again if you want

[24:05] during April which is what like eight

[24:07] days nine days left if you join the

[24:09] community you will get a personalized AI

[24:11] strategy to Future prooof yourself and

[24:13] your business so if that sounds valuable

[24:15] to you then make sure to join it's the

[24:17] first link in the description thank you

[24:19] for watching

⚡ Saved you time reading this? Transcribe any YouTube video for free — no signup needed.