Steps By Step Tutorial To Fine Tune LLAMA 2 With Custom Dataset Using LoRA And QLoRA Techniques

Transcribed Jun 15, 2026 Watch on YouTube ↗
Intermediate 13 min read For: Machine learning practitioners and developers interested in fine-tuning LLMs with limited resources.
229.8K views · 4.5K likes · 177 comments · 60 dislikes · 2.0% (📊 Average)

AI Summary

Krish Naik introduces a series on fine-tuning LLMs, starting with a practical demonstration of fine-tuning Llama 2 using LoRA and QLoRA techniques. The video covers parameter-efficient transfer learning, quantization, and step-by-step code implementation in Google Colab.

[00:00]
Introduction to Fine-Tuning Series

Krish Naik announces a series on fine-tuning various LLMs, starting with Llama 2 using custom datasets and techniques like PEFT and LoRA.

[00:42]
Plan for This Video

This video focuses on practical implementation with a code template, dataset preprocessing, and quantization. Theoretical intuition will be covered in a follow-up video.

[01:22]
Importance of Fine-Tuning Open-Source Models

With many open-source models like Llama 2, Mistral, and Falcon, knowing how to fine-tune them with custom data is valuable for companies.

[02:30]
Techniques Covered: PEFT and LoRA

Parameter-Efficient Fine-Tuning (PEFT), which the video introduces as "parameter efficient transfer learning", and Low-Rank Adaptation (LoRA) are used to fine-tune large models efficiently.

[03:33]
Installing Required Libraries

Libraries include accelerate, peft, bitsandbytes for quantization, transformers, and trl.
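A quick way to confirm the install step worked is to ask Python whether it can locate each package. This is only a sketch: the package list comes from the video, while the helper name installed_status is made up for illustration.

```python
from importlib.util import find_spec

# Packages named in the video's install step.
REQUIRED = ["accelerate", "peft", "bitsandbytes", "transformers", "trl"]


def installed_status(packages=REQUIRED):
    """Map each package name to True if Python can locate it, else False.

    find_spec only looks the package up; it does not import it, so this
    check is cheap and does not need a GPU.
    """
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in installed_status().items():
        print(f"{name:15s} {'found' if ok else 'MISSING'}")
```

Any package reported as MISSING needs to be installed before the later steps will run.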

[04:42]
Understanding PEFT

PEFT freezes most weights of the LLM and retrains only a subset, enabling fine-tuning with limited resources.
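The idea of freezing the base weights and training only a small add-on can be shown with a toy LoRA layer in plain Python. This is a sketch, not the peft library's implementation: the 2x2 matrices are invented, and the zero initialisation of B follows the LoRA paper rather than anything shown in the video.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight; only A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 2.0], [3.0, 4.0]]   # frozen weight, never updated
A = [[0.5, -0.5]]              # trainable, shape r x d_in (r = 1)
B_zero = [[0.0], [0.0]]        # trainable, shape d_out x r, starts at zero
B_trained = [[1.0], [0.0]]     # hypothetical value after some training
x = [2.0, 1.0]

if __name__ == "__main__":
    print(lora_forward(W, A, B_zero, x, alpha=2, r=1))
    print(lora_forward(W, A, B_trained, x, alpha=2, r=1))
```

With B at zero the layer reproduces the frozen model exactly, so fine-tuning starts from the pretrained behaviour and only A and B move away from it.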

[06:08]
Llama 2 Prompt Template

Llama 2 uses a specific prompt template with system, user, and assistant sections. Datasets must be reformatted accordingly.
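A small helper makes the template concrete. This is a sketch under the assumption that the chat format is <s>[INST] <<SYS>> system <</SYS>> user [/INST] answer </s>; the function name build_llama2_prompt is hypothetical.

```python
def build_llama2_prompt(user_prompt, system_prompt=None, answer=None):
    """Format one turn in the Llama 2 chat template.

    The system prompt is optional; when an answer is given (training data),
    it is appended after [/INST] and the sequence is closed with </s>.
    """
    if system_prompt:
        body = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt}"
    else:
        body = user_prompt
    text = f"<s>[INST] {body} [/INST]"
    if answer is not None:
        text += f" {answer} </s>"
    return text


if __name__ == "__main__":
    print(build_llama2_prompt("What is a large language model?"))
```

For inference the answer is left out, so the string ends at [/INST] and the model continues from there.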

[07:00]
Dataset: Open Assistant Guanaco

The dataset used is Open Assistant Guanaco, containing human-assistant conversations. 1,000 samples are used for fine-tuning.
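The reformatting step could look roughly like the sketch below. It assumes each raw row is a single string with "### Human:" and "### Assistant:" markers, which is how the human/assistant layout is commonly stored but is not spelled out in the video; the function names are hypothetical, and only the first turn of each conversation is kept.

```python
import re

# Captures the first human question and the assistant reply that follows it.
TURN = re.compile(
    r"### Human:(.*?)### Assistant:(.*?)(?=### Human:|\Z)", flags=re.DOTALL
)


def guanaco_to_llama2(text):
    """Convert the first Human/Assistant turn to Llama 2 format, or return None."""
    match = TURN.search(text)
    if match is None:
        return None
    question = match.group(1).strip()
    answer = match.group(2).strip()
    return f"<s>[INST] {question} [/INST] {answer} </s>"


def convert_first_n(rows, n=1000):
    """Convert at most the first n rows, skipping rows that do not match."""
    converted = (guanaco_to_llama2(row) for row in rows[:n])
    return [c for c in converted if c is not None]


if __name__ == "__main__":
    sample = "### Human: What is monopsony?### Assistant: A market with a single buyer."
    print(guanaco_to_llama2(sample))
```

The default of 1,000 rows mirrors the sample size used in the video.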

[09:10]
Resource Constraints and Quantization

Google Colab's free GPU (15 GB) is insufficient for full fine-tuning of the 7B model, because optimizer states, gradients, and forward activations need memory on top of the weights. 4-bit quantization reduces memory usage.
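The memory argument is simple arithmetic. The sketch below counts only the weights, uses a nominal 7 billion parameters, and ignores optimizer states, gradients, and activations, so the real footprint is larger.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7_000_000_000  # nominal size of the 7B model

if __name__ == "__main__":
    for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
        print(f"{label:8s} {weight_memory_gb(N_PARAMS, bits):5.1f} GB")
```

At 16 bits the weights alone already come close to the 15 GB card, while at 4 bits they leave room for the LoRA adapters and training overhead.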

[10:53]
LoRA and QLoRA Configuration

LoRA rank is set to 64, scaling parameter (alpha) to 16. Model is loaded in 4-bit precision using bitsandbytes.
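To see what rank 64 and alpha 16 mean in practice, the sketch below counts the extra parameters LoRA adds to one weight matrix and computes the scaling factor alpha / r. The 4096 x 4096 matrix size is an assumption about Llama 2 7B's attention projections, not a number from the video; the dropout value comes from the checklist, and build_lora_config is a hypothetical helper that imports peft only when called.

```python
# Values quoted in the video and its checklist.
LORA_HPARAMS = {"r": 64, "lora_alpha": 16, "lora_dropout": 0.1}


def lora_extra_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


def lora_scale(lora_alpha, r):
    """Factor applied to the low-rank update before it is added to W x."""
    return lora_alpha / r


def build_lora_config(hparams=LORA_HPARAMS):
    """Build a peft LoraConfig; imported lazily so the arithmetic runs without peft."""
    from peft import LoraConfig

    return LoraConfig(bias="none", task_type="CAUSAL_LM", **hparams)


if __name__ == "__main__":
    full = 4096 * 4096
    extra = lora_extra_params(4096, 4096, LORA_HPARAMS["r"])
    print(f"full matrix: {full:,} params, LoRA adds: {extra:,} ({extra / full:.1%})")
    print("scale:", lora_scale(LORA_HPARAMS["lora_alpha"], LORA_HPARAMS["r"]))
```

Under these assumptions each adapted matrix gains about 3% extra trainable parameters while the original matrix stays frozen.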

[12:02]
Training Arguments

Training arguments include output directory, 1 epoch, fp16/bf16, batch size, learning rate, and cosine scheduler.
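Two of these settings are easy to check by hand. The sketch below estimates the number of optimizer steps in one epoch and evaluates a cosine learning-rate schedule. Here "cosine" refers to the shape of the learning-rate decay curve. The formulas are simplified: no warmup, a single GPU, and gradient accumulation of 1 are assumptions, not settings confirmed by the video.

```python
import math


def steps_per_epoch(n_samples, per_device_batch_size, grad_accum_steps=1, n_devices=1):
    """Approximate optimizer steps needed to see every sample once."""
    effective_batch = per_device_batch_size * grad_accum_steps * n_devices
    return math.ceil(n_samples / effective_batch)


def cosine_lr(step, total_steps, base_lr):
    """Learning rate at a given step under cosine decay from base_lr to 0."""
    progress = step / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = steps_per_epoch(1000, 4)
    print("steps in one epoch:", total)
    for step in (0, total // 2, total):
        print(step, f"{cosine_lr(step, total, 2e-4):.2e}")
```

With 1,000 samples and a batch size of 4 this gives 250 steps, which matches the step count reported at the end of training.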

[15:00]
Loading Model and Tokenizer

AutoModelForCausalLM loads Llama 2 in 4-bit with the quantization config. The tokenizer is loaded with its pad token set to the EOS token and right-side padding.
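The loading step might be sketched as follows. The settings dictionary reflects the values read out in the video (4-bit, nf4, float16 compute, nested quantization off); the function name and its lazy imports are illustrative choices, and model_name is left as a required argument because the exact Hub repository is not given here. Calling the function needs a CUDA GPU and downloads the model.

```python
# Quantization settings as described in the video.
BNB_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_use_double_quant": False,
}


def load_model_and_tokenizer(model_name, settings=BNB_SETTINGS):
    """Load a causal LM in 4-bit precision together with its tokenizer."""
    # Heavy libraries are imported here so the settings above can be
    # inspected on a machine without torch or transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=settings["load_in_4bit"],
        bnb_4bit_quant_type=settings["bnb_4bit_quant_type"],
        bnb_4bit_compute_dtype=getattr(torch, settings["bnb_4bit_compute_dtype"]),
        bnb_4bit_use_double_quant=settings["bnb_4bit_use_double_quant"],
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map={"": 0},  # place the whole model on GPU 0
    )
    model.config.use_cache = False
    model.config.pretraining_tp = 1

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the pad token
    tokenizer.padding_side = "right"
    return model, tokenizer
```

Keeping the settings in a plain dictionary makes it easy to switch to bfloat16 compute on hardware that supports it.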

[18:46]
Supervised Fine-Tuning with SFTTrainer

SFTTrainer from trl is used with model, dataset, LoRA config, tokenizer, and training arguments to perform fine-tuning.
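Wiring the pieces together could look like the sketch below. It assumes an older TRL release in which SFTTrainer accepts dataset_text_field, max_seq_length, tokenizer, and packing directly; newer releases move several of these into a separate config object, so check the installed version. The exact optimizer variant ("paged_adamw_32bit"), the "text" column name, and the helper name build_trainer are assumptions.

```python
# Training settings mentioned in the video; the optimizer variant is assumed.
TRAINING_HPARAMS = {
    "output_dir": "./results",
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "optim": "paged_adamw_32bit",
    "logging_steps": 25,
    "report_to": "tensorboard",
}


def build_trainer(model, tokenizer, dataset, peft_config, hparams=TRAINING_HPARAMS):
    """Create an SFTTrainer for supervised fine-tuning with a LoRA adapter."""
    # Imported lazily so the hyperparameters can be read without the libraries.
    from transformers import TrainingArguments
    from trl import SFTTrainer

    args = TrainingArguments(**hparams)
    return SFTTrainer(
        model=model,
        train_dataset=dataset,
        peft_config=peft_config,
        dataset_text_field="text",  # column holding the formatted prompts
        max_seq_length=None,
        tokenizer=tokenizer,
        args=args,
        packing=False,
    )
```

A typical use would be trainer = build_trainer(model, tokenizer, dataset, peft_config), followed by trainer.train() and then saving the adapter.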

[21:59]
Training Completion and Results

Training completed 250 steps in 25 minutes on Colab. Training loss reached 1.36. Model saved as adapter.

[23:20]
Inference with Fine-Tuned Model

Using pipeline for text generation, the model answers prompts like 'What is large language model?' and 'How to own a plane in United States?'
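Inference can be sketched with the transformers pipeline helper. The prompt wrapper follows the Llama 2 instruction format; max_length=200 and the function names are assumptions for illustration, and generate needs the loaded model and tokenizer from the earlier steps.

```python
def wrap_instruction(prompt):
    """Wrap a user question in the Llama 2 instruction markers."""
    return f"<s>[INST] {prompt} [/INST]"


def generate(model, tokenizer, prompt, max_length=200):
    """Run text generation on a single prompt and return the generated text."""
    # Imported lazily so wrap_instruction can be used without transformers.
    from transformers import pipeline

    pipe = pipeline(
        task="text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )
    result = pipe(wrap_instruction(prompt))
    return result[0]["generated_text"]


if __name__ == "__main__":
    print(wrap_instruction("What is large language model?"))
```

The returned text contains the prompt followed by the model's continuation.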

This practical tutorial demonstrates fine-tuning Llama 2 with LoRA/QLoRA on a custom dataset. The next video will explain the theoretical intuition behind these techniques.

Clickbait Check

90% Legit

"Title accurately describes the tutorial; video delivers step-by-step fine-tuning of Llama 2 with LoRA and QLoRA."

Tutorial Checklist

1 03:33 Install required libraries: accelerate, peft, bitsandbytes, transformers, trl.
2 04:42 Import libraries: os, torch, datasets, transformers (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig), peft, trl.
3 06:08 Define Llama 2 prompt template with system, user, and assistant tokens.
4 07:00 Load and preprocess dataset (Open Assistant Guanaco) into Llama 2 format. Use 1,000 samples.
5 10:53 Configure LoRA: set rank=64, lora_alpha=16, target_modules, lora_dropout=0.1.
6 12:02 Configure bitsandbytes for 4-bit quantization: bnb_4bit_compute_dtype=float16, bnb_4bit_quant_type='nf4'.
7 15:00 Load Llama 2 model in 4-bit with quantization config and tokenizer with padding token.
8 18:46 Set training arguments: output_dir, num_train_epochs=1, per_device_train_batch_size=4, learning_rate=2e-4, fp16=True.
9 18:46 Initialize SFTTrainer with model, dataset, LoRA config, tokenizer, and training arguments.
10 21:59 Train the model. After training, save the adapter model.
11 23:20 Use pipeline for inference: load fine-tuned model and tokenizer, generate responses to prompts.

Study Flashcards (12)

What does PEFT stand for?

Difficulty: easy

Parameter-Efficient Fine-Tuning. The video introduces it as "parameter efficient transfer learning", after the paper title it mentions.

00:20

What is LoRA?

Difficulty: easy

Low-Rank Adaptation of large language models.

00:23

Why is quantization used in fine-tuning LLMs?

Difficulty: medium

To reduce memory usage by converting weights from 32-bit to lower precision (e.g., 4-bit).

03:50

What is the purpose of the bitsandbytes library?

Difficulty: medium

It is used for quantization of model weights.

03:49

What is the Llama 2 prompt template structure?

Difficulty: hard

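The full sequence is <s>[INST] <<SYS>> system prompt <</SYS>> user prompt [/INST] model answer </s>. [INST] and [/INST] wrap the instruction, <<SYS>> and <</SYS>> enclose the system prompt inside it, and the model answer follows [/INST].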

06:08

What dataset is used in this tutorial?

Difficulty: easy

Open Assistant Guanaco dataset.

07:00

How many samples are used for fine-tuning?

Difficulty: easy

1,000 samples.

08:10

What is the rank parameter in LoRA?

Difficulty: medium

It is a hyperparameter that determines the low-rank dimension; set to 64 in this tutorial.

11:21

What is the scaling parameter (alpha) in LoRA?

Difficulty: medium

It controls the scaling of the low-rank adaptation; set to 16.

11:23

What is the role of SFTTrainer?

Difficulty: medium

It performs supervised fine-tuning of the model with the given dataset and configuration.

18:46

How long did the training take on Google Colab free GPU?

Difficulty: easy

25 minutes for 250 steps.

22:02

What was the final training loss?

Difficulty: easy

1.36.

22:19

💡 Key Takeaways

🔧

Introduction to PEFT and LoRA

Defines the core techniques used for efficient fine-tuning.

00:20
💡

Quantization Explanation

Explains why quantization is necessary for limited GPU memory.

03:50
🔧

Llama 2 Prompt Template

Critical for correctly formatting data for Llama 2.

06:08
📊

Resource Constraints

Highlights the practical challenge of fine-tuning large models on free hardware.

09:10
🔧

SFTTrainer Usage

Shows the key step of using SFTTrainer for supervised fine-tuning.

18:46

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

Fine-tune LLAMA 2 with LoRA in 25 min

45s

Shows a complete fine-tuning pipeline in a short time, appealing to ML practitioners wanting quick results.


Why full fine-tuning is impossible on free Colab

50s

Explains the memory bottleneck with 7B models, making the case for PEFT techniques relatable to many developers.


LoRA config: rank 64, alpha 16 explained

50s

Demystifies key hyperparameters in LoRA, a common pain point for beginners.


Testing fine-tuned model: 'What is LLM?'

50s

Shows real inference output, proving the fine-tuning worked, which is highly satisfying and educational.


[00:00] hello all my name is Krish Naik and welcome

[00:02] to my YouTube channel so guys I'm happy

[00:04] to announce that I will be soon creating

[00:06] a series of videos of showing you that

[00:09] how you can fine-tune various llm models

[00:12] using custom data set in this video we

[00:14] are going to see how we can fine-tune

[00:16] Llama 2 model uh with the custom data set

[00:18] by using techniques like parameter

[00:20] efficient transfer learning and low rank

[00:23] adaptation of large language models

[00:25] which is also called as LoRA so all

[00:27] these techniques we'll specifically use

[00:29] in this particular video I will show you

[00:30] the Practical

[00:32] implementation uh and in the upcoming

[00:34] video because I was planning that how I

[00:36] can efficiently teach you this entire

[00:38] fine-tuning techniques because it is a

[00:39] complex topic altogether so first of all in

[00:42] this video we'll see the entire

[00:43] implementation quickly there will be a

[00:45] template of code which will try to learn

[00:47] we'll take a data set if there is data

[00:49] pre-processing that is required we will

[00:51] do it if there is quantisation that is

[00:52] required we will specifically do it okay

[00:55] and then in the upcoming video I will

[00:56] try to demonstrate the entire

[00:59] theoretical intuition

[01:00] about this parameter efficient transfer

[01:02] learning and low rank adaptation what

[01:04] exactly it is and there is also another

[01:06] variant which is called as QLoRA okay

[01:08] and then we will try to relate this

[01:10] entire theoretical intuition with the

[01:12] Practical implementation it will be

[01:14] amazing to understand because that is

[01:16] how I have also learned and it was very

[01:18] much helpful for me in order to

[01:20] understand each and everything as you

[01:22] all know guys there there are lot of

[01:24] Open Source models that are going to

[01:25] come up in the future also and good good

[01:27] models like Llama 2 Mistral Falcon there

[01:30] are so many models as such it is better

[01:32] that we should know how to fine-tune all

[01:33] these models with our own custom data

[01:35] set and that is what companies will be

[01:37] requiring so let's go ahead and let's

[01:39] see that how you can uh fine-tune your

[01:41] llama 2 model uh with this techniques

[01:44] again here we'll be using Transformers

[01:46] uh from hugging face and there will be a

[01:48] lot many different libraries that we'll

[01:49] be using with respect to this at least

[01:52] get the, ft overview about these topics

[01:55] and in the next topic when I discussed

[01:57] about the theoretical intuition your

[01:59] knowledge will get more intact and

[02:01] you'll be able to understand it so let's

[02:03] go ahead and let's proceed towards the

[02:04] Practical implementation hello all my

[02:07] name is Krish Naik and welcome to my YouTube

[02:09] channel so guys in this particular video

[02:11] we are going to see the step-by-step way

[02:14] of probably fine tuning your llm models

[02:19] in this case I'm going to specifically

[02:20] take open-source Llama 2 model and with

[02:23] the help of a custom data set we are

[02:26] going to fine-tune this specific model

[02:28] right over here we are going to learn

[02:30] about various techniques practically not

[02:33] theoretically because if you really want

[02:35] theoretically you can let me know in the

[02:37] comment section so we will be discussing

[02:39] about something called as parameter

[02:42] efficient transfer learning for NLP

[02:44] which is an amazing technique to

[02:47] basically fine-tune all these llm models

[02:50] which will definitely be of huge size

[02:52] like 70 billion parameters and all so

[02:55] how this parameter efficient transfer

[02:57] learning actually happens we'll try to

[02:58] see in the code and and we are also

[03:00] going to see a technique which is called

[03:02] as LoRA right so LoRA paper if I go

[03:05] ahead and search right it is basically

[03:07] called as low rank adapt adaptation of

[03:11] large language models right so these are

[03:13] some of the mathematical concept don't

[03:15] worry in the upcoming videos I will talk

[03:18] about all every theoretical intuition

[03:20] about PEFT about LoRA right now a simple

[03:25] way of fine-tuning I'm just going to

[03:26] show you because many people were

[03:28] requesting for this right so initially

[03:31] what we will do is that we will go ahead

[03:33] and install some of the important

[03:34] libraries like accelerate PEFT as I said

[03:38] PEFT is nothing but parameter efficient

[03:40] transfer learning inside this only

[03:42] you'll find this LoRA technique which

[03:44] is called as low rank adaptation of

[03:46] large language models uh then we have

[03:49] bits and bytes bits and bytes are

[03:50] specifically used for doing quantization

[03:53] now what does quantisation basically

[03:54] mean all these llm models you know when

[03:57] they are trained with 70 billion

[03:58] parameters or 13 billion parameters by

[04:01] default the weights data types are in

[04:03] the form of floating values right when

[04:05] we say floating values that they are

[04:07] basically 32 bit values what we can

[04:09] actually do and obviously since I'm

[04:11] actually going to do this in Google

[04:12] Colab we get a very less RAM so it is a

[04:16] better way that you quantize those

[04:18] weights you know from float 32 probably

[04:20] convert that into int 8 and then

[04:23] probably based on the RAM size you'll be

[04:26] able to quickly fine tune it along with

[04:28] that I will be also so we'll also be

[04:30] using Transformers and then you have TRL

[04:33] so all this libraries will go ahead and

[04:35] execute it and once we specifically

[04:38] execute it you'll be able to see that

[04:39] all these libraries will get installed

[04:42] now in the Second Step the major thing

[04:46] is that we will specifically be using

[04:48] the library called as Transformers which

[04:50] is specifically used for this particular

[04:52] purpose and internally we'll also be

[04:54] using PEFT which is having some LoRA

[04:57] configuration and we'll use this PEFT

[04:59] model I know you'll not be able to

[05:02] understand what exactly PEFT is but I'll

[05:04] just tell you in some time just let me

[05:06] go ahead with but at the end of the day

[05:08] PEFT actually uh you know uses techniques

[05:12] which will try to freeze you know when

[05:14] it applies transfer learning on these

[05:15] llm models it is freezing most of the

[05:18] weights of that llm model and only some

[05:21] of the weights will be retrained and

[05:23] based on that they will be able to

[05:25] provide you accurate results based on

[05:27] your custom data set okay uh how it is

[05:30] done don't worry I'll create a amazing

[05:32] dedicated video to make you understand

[05:35] this mathematical intuitions okay now

[05:37] over here you'll be able to see that I'm

[05:39] going to import OS import torch I'm

[05:41] going to use a data set I will talk

[05:43] about what data set we are going to

[05:45] specifically do the fine tuning but here

[05:47] we are specifically using open source

[05:49] llm models and then from Transformer I'm

[05:51] going to use AutoModelForCausalLM

[05:53] AutoTokenizer bits and bytes I will

[05:56] talk about all these libraries as we go

[05:57] ahead so let me quickly go ahead and

[06:00] execute it okay now till this is getting

[06:03] executed this import statement is

[06:04] getting executed let's talk about some

[06:06] of the important properties over here

[06:08] with respect to llama 2 in the case of

[06:10] llama 2 the following prompt template is

[06:12] used for chat model so this is the

[06:14] specific prompt template uh here we be

[06:19] give an instruction in this s symbol and

[06:22] then we have our system prompt which

[06:23] will be closed with the SYS brackets and

[06:26] then you will also be able to give your

[06:29] user prompt over here and the model

[06:30] answer will be coming after this after

[06:32] this entire instruction okay so this is

[06:35] how the entire Llama 2 models llm models

[06:39] specifically require the system prompt

[06:42] and the user prompt and the model answer

[06:43] format right now any data set that you

[06:47] specifically get right we really need to

[06:49] convert that data set into this format

[06:52] okay and that is how I will show you how

[06:54] to probably do this there's a technique

[06:57] uh you can also write your own custom

[06:58] code and all there are many ways okay

[07:00] now what we'll do we will reformat our

[07:03] instruction data set to follow Llama 2

[07:05] template so right now we are going to

[07:06] use this data set which is basically

[07:09] called as open open

[07:12] Assistant Guan Guanaco I hope I'm

[07:15] pronouncing it right now here you will

[07:17] be able to see this is my data set right

[07:19] human can you write a short introduction

[07:21] about the relevance of term uh monopsony

[07:24] in economics please use example related

[07:26] to this and then Mon Mon monopsony refers

[07:29] to the market so here you can see

[07:31] assistant answer so here the data set is

[07:34] basically in the form of human and

[07:35] assistant like human has a question over

[07:38] there and assistant is probably

[07:39] providing uh you a specific answer so in

[07:42] this format you'll be able to find out

[07:44] each and every rows each and every rows

[07:47] in different different languages so we

[07:49] are going to take this entire data set

[07:52] and then considering this entire data

[07:55] set what we are going to do we are going

[07:56] to reform the data set following the

[07:58] Lama 2 template and out of all these

[08:01] samples all this data set there are

[08:02] around how many data sets are there I

[08:05] guess there are around 10 10K records we

[08:08] just going to take thousand uh 1,000

[08:10] Records or 1K records the reason is that

[08:12] I really need to show you how the

[08:13] fine-tuning is basically done so if I go

[08:16] ahead and click on this and if you see

[08:18] this format right this format you'll be

[08:22] able to see that this entire data set is

[08:24] converted in this format only right

[08:26] instruction is basically there the

[08:28] answer is over here and this entire s is

[08:30] getting closed right so all the data set

[08:32] is basically converted into that

[08:34] specific format now how do you convert

[08:37] it right so for that already what we

[08:40] have basically done is that over here to

[08:42] know how this data set was created you

[08:43] can check this notebook so this notebook

[08:45] is there already you can see that we are

[08:47] loading the data set we are applying

[08:49] this we are taking the Thousand records

[08:51] and then we are transforming right so in

[08:53] transforming basically a simple python

[08:55] code like I have to probably keep in

[08:57] that specific format right so that is

[08:59] the reason I'm showing you this specific

[09:00] code over here just by one click you

[09:02] will be able to do that okay so all the

[09:04] links are actually given now you need to

[09:07] follow Now understand guys see

[09:10] understanding how the specific

[09:12] techniques are definitely I'll create a

[09:14] dedicated theoretical video

[09:16] understanding all the maths equations

[09:17] that is required right over here we are

[09:19] trying to see that how you can also run

[09:21] your own fine-tuned model right so note

[09:24] you don't need to follow a specific prompt

[09:25] template if you're using the base Llama 2

[09:27] model but right now we'll not use we'll

[09:29] use will not use this base Llama 2 model

[09:31] okay how to fine-tune Llama 2 so these are

[09:33] some of the steps not only with Llama 2

[09:35] with other models also this will work

[09:37] but again there the format may change

[09:40] you know the the format of the

[09:41] instruction the format of your prompts

[09:43] may change so free Google Colab offers

[09:46] a 15 GB graphics card right so limited

[09:49] resources barely enough to store Llama 2

[09:51] 7 billion weights now here we are going

[09:52] to use 7 billion weights but it is also

[09:54] very difficult to store 15 GB right

[09:56] whatever free model that we specifically

[09:58] have we also need to consider the

[10:00] overhead due to Optimizer State gradient

[10:03] and forward activation okay so usually

[10:05] in in any llm models you'll be having

[10:08] gradients you'll be having forward

[10:10] activations you'll be having optimizers

[10:12] so there also you require some amount of

[10:13] memory fine tuning is not possible here

[10:16] right obviously this will not be

[10:18] possible because 7 billion weights you

[10:20] cannot store it in 15 GB that is the

[10:23] reason we require this parameter

[10:25] efficient fine-tuning technique now what

[10:28] does PEFT basically do it is going to

[10:31] freeze most of the weights that is

[10:33] present in that llm model like Llama 2

[10:35] and only with some of the weights after

[10:38] applying quantization it is going to

[10:40] probably perform the fine tuning

[10:42] now parameter efficient fine tuning I

[10:44] will in the my next video I will talk

[10:46] about this research paper if you quickly

[10:47] want this video please make sure that

[10:49] you make the video likes 2,000 okay now

[10:51] what we are going to do over here we are

[10:53] going to use techniques like LoRA and

[10:54] QLoRA as I said LoRA or QLoRA LoRA is

[10:57] nothing but low rank adaptation of large

[11:00] language model again I apologize guys

[11:01] if you don't know the mathematical

[11:02] Concepts I will explain in the upcoming

[11:04] video okay so first of all we will load

[11:07] a Llama 2 7B chat GPT model this chat HF

[11:11] model then train it on this 1K sample

[11:14] which will produce a fine tune model

[11:16] with which in the name of chat fine tune

[11:18] we'll try to create in this QLoRA will

[11:21] use a rank of 64 with a scaling

[11:23] parameter of 16 we will load the Llama 2

[11:25] model directly in 4-bit precision we are

[11:27] trying to convert that 32 bit into 4 bit

[11:30] so that is how we are going to do the

[11:32] training and with respect to QLoRA in

[11:34] order to find the low rank index we are

[11:37] going to use the rank of 64 right this

[11:40] is an hyper tuning parameter you can

[11:42] just consider right now this is a kind

[11:44] of hyper tuning parameter with a scaling

[11:47] parameter Alpha this is also called as

[11:49] Alpha it will be having a scaling

[11:50] parameter of 16 as I said everything

[11:53] will be explained detailly when I

[11:55] probably go with the mathematical

[11:57] equation but right now our main aim is

[11:59] is to probably learn how to fine-tune it

[12:02] now what model we are going to use we

[12:04] are going to use Llama 2 7B uh 7B chat

[12:07] HF then the instruction data set to use

[12:10] is this particular data set we will be

[12:12] downloading it from the hugging face the

[12:13] model name also will be downloading it

[12:15] and after finetuning it this will be my

[12:17] new model name okay now these are some

[12:21] of the QLoRA parameters that is required

[12:23] okay so one is lora_r 64 what is this

[12:26] R this R is a rank of 64 kind of

[12:30] hyperparameter lora_alpha as I said

[12:32] Alpha right I told you Alpha why because

[12:35] I know the entire mathematics stuffs in

[12:37] this okay just to increase the Curiosity

[12:40] I'm coming up with this first video and

[12:42] later on I will come up with that then

[12:44] here also Dropout is basically required

[12:47] now in order to do the quantization we

[12:49] will be using bits and bytes parameter

[12:51] so here you can see activate 4bit

[12:53] precision based model so there is a

[12:55] parameter which is called as _ 4bit

[12:58] which is equal to true

[12:59] then compute data type for 4bit base

[13:01] model so here it is basically float 16

[13:04] then quantization we using fp4 or nf4 so

[13:08] bnb_4bit_quant_type you have to keep

[13:10] this particular value to nf4 since it is

[13:12] 4-bit activate nested quantization for 4-bit based

[13:15] model so here we are keeping it as false

[13:17] Now understand Guys these are some of

[13:19] the basic parameters that we

[13:21] specifically use in LoRA technique

[13:23] specifically in PEFT then training

[13:25] argument parameters our output directory

[13:27] will be present in this results I'm

[13:29] going to run one Epoch then we are going

[13:31] to enable this fp16 and bf16 training

[13:35] okay uh it is set to True with an a100

[13:40] right so a100 uh you can set it if

[13:42] you're using a100 you can set it to True

[13:44] right now I'm using T4 if you have the

[13:46] paid version of Google collab then you

[13:48] can set it to

[13:49] True batch size for uh per GPU for

[13:52] training I hope you know what is batch

[13:54] size then you have GPU for evaluation

[13:56] batch size then gradient accumulation

[13:58] step check points max grad uh max grad norm

[14:02] learning rate weight decay right optimizer

[14:05] paged AdamW we will be using which is of

[14:07] a variety of Adam itself then learning

[14:09] sch learn uh LR scheduler type cosine because

[14:12] it works on similarity right whatever

[14:14] question and answers we specifically

[14:16] write then maximum steps is minus one

[14:18] number of training steps override number

[14:20] of training epochs and after this you

[14:23] are also putting logging steps is equal

[14:25] to 25 now with respect to any fine

[14:27] tuning technique you use something

[14:29] called as supervised tuning right in

[14:31] supervised tuning that is you require

[14:33] some parameters right max sequence length

[14:35] then packing then device map so this is

[14:37] load the entire model on the GPU zero

[14:39] right so this is what are the some of

[14:41] the parameters don't worry uh these are

[14:44] some of the parameters that you don't

[14:45] need to learn each and every parameter

[14:47] because already all these things are

[14:49] provided by the official page itself

[14:51] I've just copied and pasted it over here

[14:53] right so we will go ahead and execute it

[14:56] so let's go ahead and execute it so all

[14:57] these parameters are set now the step

[15:00] four right there are multiple four steps

[15:02] right uh one more step is there later on

[15:05] load everything and start the fine-tuning

[15:07] process right first of all we want to

[15:09] load the data set we defined here our

[15:11] data set is already pre-processed but

[15:13] usually this is where you should

[15:14] reformat The Prompt right filter out bad

[15:17] text combine multiple data some amount

[15:18] of pre-processing is required but

[15:20] already we have done that so we are not

[15:21] going to do it then we are Recon we are

[15:24] configuring bits and bytes for 4-bit

[15:26] quantization as I said right from 16

[15:28] from 32 or 16 bit we are converting that

[15:30] into 4 bit so that it required less

[15:32] space with respect to GPU for the fine

[15:34] tuning purpose next we are loading the

[15:36] Llama 2 model in 4bit Precision GPU with

[15:39] the current corresponding tokenizer

[15:41] right with that tokenizer we'll try to

[15:43] load that and obviously we'll also be

[15:45] loading it with the 4bit Precision

[15:47] finally we are loading the configuration

[15:49] of QLoRA so uh and passing everything to

[15:51] the SFTTrainer so here is what self

[15:54] fine tuning uh s uh this sft will

[15:57] basically happen right now let's go

[15:59] ahead and let's do this so first of all

[16:01] we are loading the data set we are

[16:02] loading the tokenizer model with QLoRA

[16:05] configuration so here I have written this

[16:07] bnb compute dtype and we are using

[16:10] torch so along with that you also

[16:12] require bits and bytes config again load

[16:14] we are enabling this 4 bit then all the

[16:16] necessary parameters like compute dtype

[16:19] you'll be using nest nested quant okay

[16:22] again I'm telling you guys there is

[16:24] nothing new to learn in this because all

[16:25] these formats will be available in the

[16:27] official documentation then we are going

[16:29] to check the GPU compatibility with

[16:31] float 16 if compute dtype is equal to

[16:33] torch.float16 use 4-bit otherwise this

[16:36] all things are there right then we are

[16:39] going to load the base model see

[16:41] whenever we want to load the base model

[16:43] from hugging face we can use this

[16:44] AutoModelForCausalLM right that is the

[16:46] reason we have imported on top dot

[16:49] from_pretrained model name what is my model

[16:51] name I've given that quantization config so

[16:54] here you'll be able to see in quantization

[16:55] config we are also given something

[16:57] called as uh BNB config right so here

[17:01] you'll be able to see this is the

[17:02] compute

[17:03] type let me just search for it somewhere

[17:06] here only it will be available

[17:12] so so BNB

[17:14] config so here you can see this entire

[17:17] bytes config is basically there so uh

[17:20] based on that you'll be okay yeah

[17:22] computer app okay yeah perfect so B&B

[17:24] config is basically given over here then

[17:26] device map is nothing but with respect

[17:28] to the GPU mapping then model.config.

[17:31] use_cache false you can also make it

[17:32] true if you want model.config.

[17:35] pretraining_tp is equal to one then

[17:37] we are loading the Llama tokenizer see

[17:39] for any LLM model we also need a

[17:42] tokenizer so that it will be able to

[17:43] convert any llm model the input data

[17:46] that we are specifically using into word

[17:48] embeddings and all so that is the reason

[17:50] AutoTokenizer from_pretrained again

[17:52] model name we are going to use this

[17:53] trust_remote_code is one additional

[17:55] parameter that is used then we going to

[17:58] put a pad token with respect to the end

[18:00] of statement token right so this eos

[18:03] token specifically applies the token for

[18:06] the Llama itself right and here we are

[18:08] giving the padding side as right fixed

[18:10] weird overflow issue with fp16 training

[18:13] all these parameters will be almost

[18:15] fixed guys only thing that you will

[18:16] probably be changing is with respect to

[18:18] the configuration then load LoRA

[18:21] configuration here you'll be able to see

[18:22] PEFT config LoRA config all the values

[18:25] that you're putting with respect to this

[18:27] LoRA configs and here here you have your

[18:29] PEFT

[18:31] configuration now this is the most

[18:33] important thing because in this training

[18:34] arguments we set all the parameters

[18:37] output directory number of epochs this this

[18:39] this learning rate uh fp16 bf16 you

[18:44] can probably see over here and then

[18:46] finally we are reporting it to the

[18:47] TensorFlow right TensorBoard then you

[18:50] can also see that supervised fine-tuning

[18:52] parameters right I'm giving my model

[18:54] name I'm giving my data set my PEFT

[18:56] config my data set text field this PEFT

[18:59] config has a LoRA config right then you

[19:02] have a tokenizer you have the arguments

[19:04] you you have packing then you have

[19:06] finally trainer.train okay now this is what

[19:09] is the main thing and that is where your

[19:11] supervised fine tuning will happen step

[19:13] by step you have done it okay let me

[19:15] repeat it quickly we have loaded the

[19:16] data set we have set our dtype right we

[19:20] are setting up all our config process

[19:23] over here here we are checking whether

[19:25] GPU is compatible or not here we are

[19:27] loading our LLM model that is Llama 2

[19:30] here we are specifically loading our

[19:32] tokenizer which is used in Llama 2

[19:35] along with this we are putting padding

[19:36] techniques then my LoRA configuration

[19:39] which will specifically be in terms of

[19:40] PEFT peft_config and then all my training

[19:43] arguments will go inside this right um

[19:47] these training arguments are with

[19:49] respect to where my output directory is

[19:51] and all learning rate and all okay
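
As a rough sketch of the two configuration objects recapped here: the batch size of 4, the logging interval of 25, the fp16/bf16 flags and the TensorBoard reporting are mentioned in the video, while every other value below (rank, alpha, dropout, learning rate, output path, epoch count) is an illustrative assumption and may differ from the notebook.

```python
# Illustrative values only; the transcript names the fields but not every value.
lora_kwargs = {
    "r": 64,                  # assumed LoRA rank
    "lora_alpha": 16,         # assumed scaling factor
    "lora_dropout": 0.1,      # assumed dropout on the LoRA layers
    "bias": "none",           # assumed: do not train bias terms
    "task_type": "CAUSAL_LM",
}

training_kwargs = {
    "output_dir": "./results",         # assumed path
    "num_train_epochs": 1,             # assumed
    "per_device_train_batch_size": 4,  # batch size mentioned in the video
    "learning_rate": 2e-4,             # assumed
    "fp16": False,                     # assumed flag values
    "bf16": False,
    "logging_steps": 25,               # logging interval mentioned in the video
    "report_to": "tensorboard",
}


def build_configs():
    """Turn the dictionaries into library objects (needs `peft` and `transformers`)."""
    # Imported lazily so the dictionaries above can be inspected without the libraries.
    from peft import LoraConfig
    from transformers import TrainingArguments

    peft_config = LoraConfig(**lora_kwargs)
    training_arguments = TrainingArguments(**training_kwargs)
    return peft_config, training_arguments
```

Keeping the values in plain dictionaries makes it easy to see that, as the video says, the LoRA and training settings are the parts you would usually change between experiments.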

[19:54] finally set supervised fine-tuning parameters

[19:57] here we have set model data set PEFT

[19:59] config text max sequence length tokenizer

[20:02] everything is put up over here and

[20:04] finally we go ahead and train this now

[20:06] once we train it it is going to run for

[20:09] 250 epochs uh I think 250 step size I have

[20:12] actually given over here sorry 25 steps

[20:15] uh logging steps let's see what is the

[20:18] batch size batch size is

[20:20] four um yeah till that much it will

[20:23] probably go so let this start so it has

[20:26] already started I guess so here you can

[20:28] see it is downloading here you'll also

[20:31] be able to see the data set

[20:34] okay sample data right now you cannot

[20:36] see it because the data set will get

[20:38] loaded okay so table of contents

[20:41] installed all the required packages

[20:43] we'll reformat all the steps are given

[20:45] side by side you can also read it out I

[20:47] know this looks like a little bit tough

[20:49] guys but at the end of the day uh I'll

[20:51] not say that it is easy and just the

[20:54] reason why I'm sharing with you this

[20:55] fine-tuning technique is because you should

[20:57] just get it in your mind

[20:59] later on you know this is the pattern

[21:01] that I'm following first execute this

[21:04] don't worry about anything as such just

[21:06] try to get a high level overview how

[21:08] things work later on I will try to break

[21:12] down each and everything in my next

[21:14] video by breaking this entire code why

[21:16] this specific parameters used because

[21:19] the main thing is to understand what is

[21:20] PEFT what is quantization what is

[21:23] precision and uh how do you

[21:26] specifically use this PEFT technique what

[21:28] is QLoRA everything what is low order

[21:31] rank index uh how to basically calculate

[21:34] that everything I will talk about it

[21:36] okay so we'll wait for some time till

[21:39] then uh just let us wait and uh

[21:43] I'll just uh come again I think

[21:46] it'll take 15 to 20 minutes to complete

[21:48] this entire fine tuning with a thousand

[21:49] records and then again I'll come back

[21:51] and we'll start doing and seeing whether

[21:53] we are able to get the good results or

[21:55] not so yes uh let's wait for some time

[21:57] thank you
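
A small worked check on the step count, under stated assumptions: the 1,000 records and the batch size of 4 come from the video, while a single epoch and no gradient accumulation are assumed. The `optimizer_steps` helper is hypothetical and only approximates how a trainer counts steps.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, grad_accum: int = 1, epochs: int = 1) -> int:
    """Approximate number of optimizer steps for a run (hypothetical helper)."""
    steps_per_epoch = math.ceil(num_samples / (batch_size * grad_accum))
    return steps_per_epoch * epochs


steps = optimizer_steps(num_samples=1000, batch_size=4)
print(steps)        # 250
print(steps // 25)  # 10 log entries when logging every 25 steps
```

So the number 250 refers to training steps within one pass over the data, not to 250 epochs.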

[21:59] so guys uh finally you can see the 250

[22:02] epochs or 250 steps have completed it took

[22:04] 25 minutes and again this is in Google

[22:06] Colab if you have the paid version of

[22:09] Google Colab it will probably take

[22:11] hardly 5 to 10 minutes to complete okay

[22:13] so over here you can see the global step

[22:16] was 250 training loss it went till

[22:19] 1.36 metrics runtime everything

[22:22] training samples per second all this

[22:24] information is basically shown okay and

[22:26] please remember this particular word

[22:28] which is called as flos okay

[22:29] total_flos because I'm going to discuss

[22:32] about this in my next video also now

[22:35] once we do this we are going to save

[22:36] this trained model right and understand

[22:39] the new model name what it will be right

[22:41] so here you can probably see

[22:44] Llama-2-7b-chat-finetune so this is my results

[22:47] with respect to run all the results

[22:49] you'll be able to see over here also

[22:51] okay so here uh in this fine tuning

[22:53] technique it is also creating

[22:55] something called as adapter adapter

[22:57] model okay please remember these words

[22:59] because in the next theoretical

[23:00] intuition we are going to discuss each

[23:02] and everything as we go ahead okay so

[23:04] please make sure that you remember it so

[23:06] we are going to save this model so we

[23:08] have written trainer.model.

[23:10] save_pretrained right now you can also

[23:12] check it out in TensorBoard but I will

[23:14] just go ahead and show you quickly that

[23:16] how it is probably going to generate it

[23:18] right so here we have created a prompt

[23:20] which is called as what is large

[23:21] language model I've used pipeline right

[23:24] so this pipeline we have already

[23:25] imported it the task will be text

[23:27] generation whatever model we have

[23:29] actually created that model will be

[23:31] there tokenizer will be used over here

[23:33] and max length we can keep to 200 to

[23:35] 250 the result uh and always understand

[23:39] as I always suggested with respect to

[23:41] Llama 2 this will be my format there will

[23:43] be an <s> then there will be an

[23:45] [INST] instruction and here I will be having my

[23:47] prompt and with respect to this

[23:50] particular prompt we are going to get

[23:51] some kind of response so whatever

[23:53] response we are going to get inside this

[23:54] result variable it will be in the form

[23:56] of list and inside that there will

[23:58] be one field which is called as

[23:59] generated_text so if I go ahead and

[24:01] search what is large language model

[24:04] you'll be able to see how we are going

[24:06] to get the result okay because we are

[24:08] running the same model over here so here

[24:10] is my prompt here we are using pipeline

[24:12] pipeline basically helps you to combine

[24:14] multiple things like task model

[24:16] tokenizers you know multiple things it

[24:18] will be able to give you right now since
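
The prompt format and the shape of the pipeline result described above can be sketched like this. `build_prompt`, `extract_answer` and `generate` are hypothetical helper names; the `pipeline` arguments are the ones named in the video, and the stand-in result at the bottom is made up so the sketch runs without a model.

```python
from typing import Dict, List


def build_prompt(question: str) -> str:
    """Wrap a question in the Llama 2 instruction markers described in the video."""
    return f"<s>[INST] {question} [/INST]"


def extract_answer(result: List[Dict[str, str]], prompt: str) -> str:
    """Read `generated_text` from the first result and drop the echoed prompt if present."""
    text = result[0]["generated_text"]
    return text[len(prompt):].strip() if text.startswith(prompt) else text.strip()


def generate(model, tokenizer, question: str, max_length: int = 200) -> str:
    """Run a model through a text-generation pipeline (needs `transformers`)."""
    # Imported lazily so the pure-Python helpers above run without the library.
    from transformers import pipeline

    pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=max_length)
    prompt = build_prompt(question)
    return extract_answer(pipe(prompt), prompt)


# Stand-in result with the same list-of-dicts shape; the answer text is made up.
prompt = build_prompt("What is a large language model?")
fake_result = [{"generated_text": prompt + " A large language model is a type of AI."}]
print(extract_answer(fake_result, prompt))  # A large language model is a type of AI.
```

Note that `max_length` is counted in tokens and includes the prompt, so a longer question leaves less room for the answer.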

[24:20] this is already running in this

[24:22] particular Colab uh and obviously

[24:25] you'll be able to see RAM and all are

[24:27] almost used up it has used disk space of

[24:30] somewhere around 39 GB right so just

[24:32] wait for some time and here you will be

[24:34] able to get the response if you quickly

[24:36] want to get the response obviously you

[24:38] need to have a good GPU right based on

[24:41] that it'll be able to give you a quick

[24:42] result right so after that you'll be

[24:44] also able to see that we'll be able to

[24:46] free up all this VRAM and all okay so
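
One way the memory clean-up mentioned here might look, as a sketch rather than the notebook's exact cell: `free_memory` is a hypothetical helper that drops the named references, runs the garbage collector, and releases the CUDA cache only if `torch` and a GPU happen to be available.

```python
import gc


def free_memory(namespace: dict, names=("model", "pipe", "trainer")) -> int:
    """Drop the named references from `namespace`, then collect garbage.

    Returns how many names were actually removed (hypothetical helper).
    """
    removed = 0
    for name in names:
        if name in namespace:
            del namespace[name]
            removed += 1
    gc.collect()
    try:
        import torch  # optional: only release the CUDA cache when torch and a GPU exist

        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass
    return removed


# Stand-in objects; in a notebook these would be the real model, pipeline and trainer.
session = {"model": object(), "pipe": object(), "tokenizer": object()}
print(free_memory(session))  # 2
print(sorted(session))       # ['tokenizer']
```

In a Colab notebook the same helper could be called as `free_memory(globals())` once the generation cells have finished.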

[24:49] let's see and let's see whether we'll be

[24:51] able to get our result in the next step

[24:53] we can also push our model to the

[24:55] Hugging Face which I will leave for

[24:57] now I will not explain it because this I

[25:00] will show you as a complete project as

[25:03] we go ahead so here you can see what is

[25:04] large language model a large language

[25:06] model is a type of artificial

[25:07] intelligence large language model often

[25:09] seen then here you can also see all the

[25:11] information is there some examples of

[25:13] large language models uh include

[25:16] this okay now what we are going to do

[25:18] let's go ahead and take any one example

[25:20] over here from this particular data set

[25:22] okay so I will just write how to own a

[25:26] plane in United States okay so this

[25:29] will be

[25:31] my prompt over here and I'll paste it over here

[25:35] let's see so this will also run and I

[25:38] will finally get my result also so the

[25:40] same question I have taken right so

[25:42] from this 1K data set so to own a plane this

[25:44] is the answer that we will probably be

[25:47] getting let's see how much time it'll

[25:49] take to probably showcase but always

[25:52] remember please keep on looking at this

[25:54] particular RAM like how much uh time it

[25:56] is probably taking and how much space it

[25:59] is taking okay so so guys here you can

[26:02] probably see the response how to own a

[26:04] plane in United States in United States

[26:06] owning a plane is this determine

[26:07] your budget so this is completely based

[26:09] on this information that is present over

[26:11] here but here I've written only 200 max

[26:14] length so I can only see 200 tokens

[26:16] that is given right so you can probably

[26:18] try with each and everything as you go

[26:20] ahead now guys uh here also you'll be

[26:22] able to see the detailed explanation of

[26:24] each and every step but the most

[26:26] interesting video after seeing this you will

[26:28] obviously be able to understand like

[26:30] what each and everything does over

[26:32] here what this PEFT does what is this

[26:34] bitsandbytes what is this LoRA

[26:36] everything we will discuss in our next

[26:38] video so I hope you like this particular

[26:39] video this was it from my side I'll see

[26:41] you in the next video have a great day

[26:42] thank you and all take care bye-bye
