[0:00] hello all my name is krishak and welcome [0:02] to my YouTube channel so guys I'm happy [0:04] to announce that I will be soon creating [0:06] a series of videos of showing you that [0:09] how you can fine-tune various llm models [0:12] using custom data set in this video we [0:14] are going to see how we can fine-tune [0:16] Lama 2 model uh with the custom data set [0:18] by using techniques like parameter [0:20] efficient transfer learning and low rank [0:23] adaptation of large language models [0:25] which is also called as Laura so all [0:27] these techniques we'll specifically use [0:29] in this particular video I will show you [0:30] the Practical [0:32] implementation uh and in the upcoming [0:34] video because I was planning that how I [0:36] can efficiently teach you this entire [0:38] fine-tuning techniques because it is a [0:39] complex topic altoe so first of all in [0:42] this video we'll see the entire [0:43] implementation quickly there will be a [0:45] template of code which will try to learn [0:47] we'll take a data set if there is data [0:49] pre-processing that is required we will [0:51] do it if there is quantisation that is [0:52] required we will specifically do it okay [0:55] and then in the upcoming video I will [0:56] try to demonstrate the entire [0:59] theoretical intuition [1:00] about this parameter efficient transfer [1:02] learning and low rank adaptation what [1:04] exactly it is and there is also another [1:06] variant which is called as chora okay [1:08] and then we will try to relate this [1:10] entire theoretical intuition with the [1:12] Practical implementation it will be [1:14] amazing to understand because that is [1:16] how I have also learned and it was very [1:18] much helpful for me in order to [1:20] understand each and everything as you [1:22] all know guys there there are lot of [1:24] Open Source models that are going to [1:25] come up in the future also and good good [1:27] models like Lama 2 Mistral falcon there [1:30] are so many models as such it is better [1:32] that we should know how to fine-tune all [1:33] these models with our own custom data [1:35] set and that is what companies will be [1:37] requiring so let's go ahead and let's [1:39] see that how you can uh fine-tune your [1:41] llama 2 model uh with this techniques [1:44] again here we'll be using Transformers [1:46] uh from hugging face and there will be a [1:48] lot many different libraries that we'll [1:49] be using with respect to this at least [1:52] get the, ft overview about these topics [1:55] and in the next topic when I discussed [1:57] about the theoretical intuition your [1:59] knowledge will get more intact and [2:01] you'll be able to understand it so let's [2:03] go ahead and let's proceed towards the [2:04] Practical implementation hello all my [2:07] name is krishn and welcome to my YouTube [2:09] channel so guys in this particular video [2:11] we are going to see the stepbystep way [2:14] of probably fine tuning your llm models [2:19] in this case I'm going to specifically [2:20] take open- Source Lama 2 model and with [2:23] the help of a custom data set we are [2:26] going to fine-tune this specific model [2:28] right over here we are going to learn [2:30] about various techniques practically not [2:33] theoretically because if you really want [2:35] theoretically you can let me know in the [2:37] comment section so we will be discussing [2:39] about something called as parameter [2:42] efficient transfer learning for NLP [2:44] which is an amazing technique to [2:47] basically fine-tune all these llm models [2:50] which will definitely be of use size [2:52] like 70 billion parameters and all so [2:55] how this parameter efficient transfer [2:57] learning actually happens we'll try to [2:58] see in the code and and we are also [3:00] going to see a technique which is called [3:02] as Laura right so Laura paper if I go [3:05] ahead and search right it is basically [3:07] called as low rank adapt adaptation of [3:11] large language models right so these are [3:13] some of the mathematical concept don't [3:15] worry in the upcoming videos I will talk [3:18] about all every theoretical intuition [3:20] about PFT about Laura right now a simple [3:25] way of fine-tuning I'm just going to [3:26] show you because many people were [3:28] requesting for this right so initially [3:31] what we will do is that we will go ahead [3:33] and install some of the important [3:34] libraries like accelerate PFT as I said [3:38] PFT is nothing but parameter efficient [3:40] transfer learning inside this only [3:42] you'll find this Laura technique which [3:44] is called as low rank adaptation of [3:46] large language models uh then we have [3:49] bits and bytes bits and bytes are [3:50] specifically used for doing quantization [3:53] now what does quantisation basically [3:54] mean all these llm models you know when [3:57] they are trained with 70 billion [3:58] parameters or 13 billion parameters by [4:01] default the weights data types are in [4:03] the form of floating values right when [4:05] we say floating values that they are [4:07] basically 32 bid values what we can [4:09] actually do and obviously since I'm [4:11] actually going to do this in Google [4:12] collab we get a very less Ram so it is a [4:16] better way that you quantize those [4:18] weights you know from float 32 probably [4:20] convert that into int 8 and then [4:23] probably based on the Ram size you'll be [4:26] able to quickly fine tune it along with [4:28] that I will be also so we'll also be [4:30] using Transformers and then you have TRL [4:33] so all this libraries will go ahead and [4:35] execute it and once we specifically [4:38] execute it you'll be able to see that [4:39] all these libraries will get installed [4:42] now in the Second Step the major thing [4:46] is that we will specifically be using [4:48] the library called as Transformers which [4:50] is specifically used for this particular [4:52] purpose and internally we'll also be [4:54] using PFT which is having some Laura [4:57] configuration and we'll use this PF [4:59] model I know you'll not be able to [5:02] understand what exactly PFT is but I'll [5:04] just tell you in some time just let me [5:06] go ahead with but at the end of the day [5:08] PFT actually uh you know uses techniques [5:12] which will try to freeze you know when [5:14] it applies transfer learning on these [5:15] llm models it is freezing most of the [5:18] weights of that llm model and only some [5:21] of the weights will be retrained and [5:23] based on that they will be able to [5:25] provide you accurate results based on [5:27] your custom data set okay uh how it is [5:30] done don't worry I'll create a amazing [5:32] dedicated video to make you understand [5:35] this mathematical intuitions okay now [5:37] over here you'll be able to see that I'm [5:39] going to import OS import torch I'm [5:41] going to use a data set I will talk [5:43] about what data set we are going to [5:45] specifically do the fine tuning but here [5:47] we are specifically using open source [5:49] llm models and then from Transformer I'm [5:51] going to use Auto model for casual LM [5:53] Auto tokenizer bits and bytes I will [5:56] talk about all these libraries as we go [5:57] ahead so let me quickly go ahead and [6:00] execute it okay now till this is getting [6:03] executed this import statement is [6:04] getting executed let's talk about some [6:06] of the important properties over here [6:08] with respect to llama 2 in the case of [6:10] llama 2 the following prompt template is [6:12] used for chat model so this is the [6:14] specific prompt template uh here we be [6:19] give an instruction in this s symbol and [6:22] then we have our system prompt which [6:23] will be closed with the CIS brackets and [6:26] then you will also be able to give your [6:29] user prompt over here and the model [6:30] answer will be coming after this after [6:32] this entire instruction okay so this is [6:35] how the entire Lama 2 models llm models [6:39] specifically require the system prompt [6:42] and the user prompt and the model answer [6:43] format right now any data set that you [6:47] specifically get right we really need to [6:49] convert that data set into this format [6:52] okay and that is how I will show you how [6:54] to probably do this there's a technique [6:57] uh you can also write your own custom [6:58] code and all there are many ways okay [7:00] now what we'll do we will reformat our [7:03] instruction data set to follow Lama 2 [7:05] template so right now we are going to [7:06] use this data set which is basically [7:09] called as open open [7:12] Assistant Guan guanako I hope I'm [7:15] pronouncing it right now here you will [7:17] be able to see this is my data set right [7:19] human can you write a short introduction [7:21] about the relevance of term uh monopsony [7:24] in economics please use example related [7:26] to this and then Mon Mon monopsony ref [7:29] first to the market so here you can see [7:31] assistant answer so here the data set is [7:34] basically in the form of human and [7:35] assistant like human has a question over [7:38] there and assistant is probably [7:39] providing uh you a specific answer so in [7:42] this format you'll be able to find out [7:44] each and every rows each and every rows [7:47] in different different languages so we [7:49] are going to take this entire data set [7:52] and then considering this entire data [7:55] set what we are going to do we are going [7:56] to reform the data set following the [7:58] Lama 2 template and out of all these [8:01] samples all this data set there are [8:02] around how many data sets are there I [8:05] guess there are around 10 10K records we [8:08] just going to take thousand uh th000 [8:10] Records or 1K records the reason is that [8:12] I really need to show you how the [8:13] fineing is basically done so if I go [8:16] ahead and click on this and if you see [8:18] this format right this format you'll be [8:22] able to see that this entire data set is [8:24] converted in this format only right [8:26] instruction is basically there the [8:28] answer is over here and this entire s is [8:30] getting closed right so all the data set [8:32] is basically converted into that [8:34] specific format now how do you convert [8:37] it right so for that already what we [8:40] have basically done is that over here to [8:42] know how this data set was created you [8:43] can check this notebook so this notebook [8:45] is there already you can see that we are [8:47] loading the data set we are applying [8:49] this we are taking the Thousand records [8:51] and then we are transforming right so in [8:53] transforming basically a simple python [8:55] code like I have to probably keep in [8:57] that specific format right so that is [8:59] the reason I'm showing you this specific [9:00] code over here just by one click you [9:02] will be able to do that okay so all the [9:04] links are actually given now you need to [9:07] follow Now understand guys see [9:10] understanding how the specific [9:12] techniques are definitely I'll create a [9:14] dedicated theoretical video [9:16] understanding all the maths equations [9:17] that is required right over here we are [9:19] trying to see that how you can also run [9:21] your own fine tun model right so note [9:24] you don't need to follow a specific prom [9:25] template if you're using the base Lama 2 [9:27] model but right now we'll not use we'll [9:29] use will not use this base Lama 2 model [9:31] okay how to F tune Lama 2 so these are [9:33] some of the steps not only with Lama 2 [9:35] with other models also this will work [9:37] but again there the format may change [9:40] you know the the format of the [9:41] instruction the format of your prompts [9:43] may change so free Google collabs offers [9:46] a 15gb graphic card right so limited [9:49] resources barely enough to store Lama to [9:51] 7 billion weights now here we are going [9:52] to use 7 billion weights but it is also [9:54] very difficult to store 15 GB right [9:56] whatever free model that we specifically [9:58] have we also need to consider the [10:00] overhead due to Optimizer State gradient [10:03] and forward activation okay so usually [10:05] in in any llm models you'll be having [10:08] gradients you'll be having forward [10:10] activations you'll be having optimizers [10:12] so there also you require some amount of [10:13] memory fine tuning is not possible here [10:16] right obviously this will not be [10:18] possible because 7 billion weights you [10:20] cannot store it in 15 GB that is the [10:23] reason we require this parameter [10:25] efficient fine-tuning technique now what [10:28] does PFT basically do it is going to [10:31] freeze most of the weights that is [10:33] present in that llm model like Lama 2 [10:35] and only with some of the weights after [10:38] applying quantization it is going to [10:40] probably perform the fight fine tuning [10:42] now parameter efficient fine tuning I [10:44] will in the my next video I will talk [10:46] about this research paper if you quickly [10:47] want this video please make sure that [10:49] you make the video likes 2,000 okay now [10:51] what we are going to do over here we are [10:53] going to use techniques like Laura and [10:54] clora as I said Laura or clora Laura is [10:57] nothing but low rank adaptation of large [11:00] language model again I'm apologist guys [11:01] if you don't know the mathematical [11:02] Concepts I will explain in the upcoming [11:04] video okay so first of all we will load [11:07] a Lama 27b chart GPT model this chart HF [11:11] model then train it on this 1K sample [11:14] which will produce a fine tune model [11:16] with which in the name of chat fine tune [11:18] we'll try to create in this clora will [11:21] use a rank of 64 with a scaling [11:23] parameter of 16 we will load the Lama 2 [11:25] model directly in 4bit Precision we are [11:27] trying to convert that 32 bit into 4 bit [11:30] so that is how we are going to do the [11:32] training and with respect to chora in [11:34] order to find the low rank index we are [11:37] going to use the rank of 64 right this [11:40] is an hyper tuning parameter you can [11:42] just consider right now this is a kind [11:44] of hyper tuning parameter with a scaling [11:47] parameter Alpha this is also called as [11:49] Alpha it will be having a scaling [11:50] parameter of 16 as I said everything [11:53] will be explained detailly when I [11:55] probably go with the mathematical [11:57] equation but right now our main name is [11:59] is to probably learn how to find T it [12:02] now what model we are going to use we [12:04] are going to use Lama 2 7bh uh 7B chat [12:07] HF then the instruction data set to use [12:10] is this particular data set we will be [12:12] downloading it from the hugging face the [12:13] model name also will be downloading it [12:15] and after finetuning it this will be my [12:17] new model name okay now these are some [12:21] of the clor parameters that is required [12:23] okay so one is laurore R 64 what is this [12:26] R this R is a rank of 64 kind of [12:30] hyperparameter Laura Alpha as I said [12:32] Alpha right I told you Alpha why because [12:35] I know the entire mathematics stuffs in [12:37] this okay just to increase the Curiosity [12:40] I'm coming up with this first video and [12:42] later on I will come up with that then [12:44] here also Dropout is basically required [12:47] now in order to do the quantization we [12:49] will be using bits and bytes parameter [12:51] so here you can see activate 4bit [12:53] precision based model so there is a [12:55] parameter which is called as _ 4bit [12:58] which is equal to true [12:59] then compute data type for 4bit base [13:01] model so here it is basically float 16 [13:04] then quantization we using fp4 on np4 so [13:08] BNB 4bit Quant type you have to keep [13:10] this particular value to np4 since it is [13:12] 4bit activate Ned quation for 4bit based [13:15] model so here we are keeping it as false [13:17] Now understand Guys these are some of [13:19] the basic parameters that we [13:21] specifically use in Lura technique [13:23] specifically in PFT then training [13:25] argument parameters our output directory [13:27] will be present in this results I'm [13:29] going to run one Epoch then we are going [13:31] to enable this fp6 and B bf16 training [13:35] okay uh it is set to True with an a100 [13:40] right so a100 uh you can set it if [13:42] you're using a100 you can set it to True [13:44] right now I'm using T4 if you have the [13:46] paid version of Google collab then you [13:48] can set it to [13:49] True bass size for uh Pur GPU for [13:52] training I hope you know what is bass [13:54] size then you have GPU for evaluation [13:56] bass size then gradient accumulation [13:58] step check points Max gr uh Max grad nor [14:02] learning rate weight DK right Optimizer [14:05] page adamw we will be using which is of [14:07] a variety of Adam itself then learning [14:09] sh learn uh LR sched type cosine because [14:12] it works on similarity right whatever [14:14] question and answers we specifically [14:16] write then maximum steps is minus one [14:18] number of training steps override number [14:20] of training epochs and after this you [14:23] are also putting logging steps is equal [14:25] to 25 now with respect to any fine [14:27] tuning technique you use something [14:29] called as supervised tuning right in [14:31] supervised tuning that is you require [14:33] some parameters right max sequent length [14:35] then packing then device map so this is [14:37] load the entire model on the GPU zero [14:39] right so this is what are the some of [14:41] the parameters don't worry uh these are [14:44] some of the parameters that you don't [14:45] need to learn each and every parameter [14:47] because already all these things are [14:49] provided by the official page itself [14:51] I've just copied and pasted it over here [14:53] right so we will go ahead and execute it [14:56] so let's go ahead and execute it so all [14:57] these parameters are set now the step [15:00] four right there are multiple four steps [15:02] right uh one more step is there later on [15:05] load everything and start the F tuning [15:07] process right first of all we want to [15:09] load the data set we defined here our [15:11] data set is already pre-processed but [15:13] usually this is where you should [15:14] reformat The Prompt right filter out bad [15:17] text combine multiple data some amount [15:18] of pre-processing is required but [15:20] already we have done that so we are not [15:21] going to do it then we are Recon we are [15:24] configuring bits and byes for four bit [15:26] quantization as I said right from 16 [15:28] from 32 or 16 bit we are converting that [15:30] into 4 bit so that it required less [15:32] space with respect to GPU for the fine [15:34] tuning purpose next we are loading the [15:36] Llama 2 model in 4bit Precision GPU with [15:39] the current corresponding tokenizer [15:41] right with that tokenizer we'll try to [15:43] load that and obviously we'll also be [15:45] loading it with the 4bit Precision [15:47] finally we are loading the configuration [15:49] of clor so uh and passing everything to [15:51] the sft trainer so here is what self [15:54] fine tuning uh s uh this sft will [15:57] basically happen right now let's go [15:59] ahead and let's do this so first of all [16:01] we are loading the data set we are [16:02] loading the tokenizer model with clora [16:05] configuration so here I have return this [16:07] B&B compute D type and we are using [16:10] torch so along with that you also [16:12] require bits and bytes config again load [16:14] we are enabling this 4 bit then all the [16:16] necessary parameters like compute D type [16:19] you'll be using H net nested Quant okay [16:22] again I'm telling you guys there is [16:24] nothing new to learn in this because all [16:25] these formats will be available in the [16:27] official documentation then we are going [16:29] to check the GPU compatibility with [16:31] float 16 if compute dipe is equal to [16:33] torch. float 16 use 4bit otherwise this [16:36] all things are there right then we are [16:39] going to load the base model see [16:41] whenever we want to load the base model [16:43] from hugging face we can use this Auto [16:44] model for casual LM right that is the [16:46] reason we have imported on top Dot from [16:49] pre-trained model name what is my model [16:51] name I've given that quation config so [16:54] here you'll be able to see in conation [16:55] config we are also given something [16:57] called as uh BNB config right so here [17:01] you'll be able to see this is the [17:02] compute [17:03] type let me just search for it somewhere [17:06] here only it will be available [17:12] so so BNB [17:14] config so here you can see this entire [17:17] bytes config is basically there so uh [17:20] based on that you'll be okay yeah [17:22] computer app okay yeah perfect so B&B [17:24] config is basically given over here then [17:26] device map is nothing but with respect [17:28] to the GPU mapping then model. config do [17:31] use cache false you can also make it [17:32] true if you want model. config [17:35] pre-training _ TP is equal to one then [17:37] we are loading the Lama tokenizer see [17:39] for any LM model we also need a [17:42] tokenizer so that it will be able to [17:43] convert any llm model the input data [17:46] that we are specifically using into word [17:48] embeddings and all so that is the reason [17:50] order tokenizer from pre-trained again [17:52] model name we are going to use this [17:53] trust remote code is one additional [17:55] parameter that is used then we going to [17:58] put a pad token with respect to the end [18:00] of statement token right so do this eore [18:03] token specifically applies the token for [18:06] the Lama itself right and here we are [18:08] giving the padding side as right fixed [18:10] weird overflow issue with fp16 training [18:13] all these parameters will be almost [18:15] fixed guys only thing that you will [18:16] probably be changing is with respect to [18:18] the configuration then load Laura [18:21] configuration here you'll be able to see [18:22] PFT config Laura config all the values [18:25] that you're putting with respect to this [18:27] Lowa configs and here here you have your [18:29] PFT [18:31] configuration now this is the most [18:33] important thing because in this training [18:34] arguments we set all the parameters [18:37] output directory number of epo this this [18:39] this learning rate PP p uh FP 16 bs6 you [18:44] can probably see over here and then [18:46] finally we are reporting it to the [18:47] tensal flow right tensor board then you [18:50] can also see that supervised fine-tuning [18:52] parameters right I'm giving my model [18:54] name I'm giving my data set my PFT [18:56] config my data set text field this PF [18:59] config has a Lowa config right then you [19:02] have a tokenizer you have the arguments [19:04] you you have packing then you have [19:06] finally trainer1 okay now this is what [19:09] is the main thing and that is where your [19:11] supervised fine tuning will happen step [19:13] by step you have done it okay let me [19:15] repeat it quickly we have loaded the [19:16] data set we have set our D type right we [19:20] are setting up all our contag process [19:23] over here here we are checking whether [19:25] GPU is compatible or not here we are [19:27] loading our llm model that is Lama 2 [19:30] here we are specifically loading our [19:32] tokenizer which is be used in Lama 2 [19:35] along with this we are putting padding [19:36] techniques then my Laura configuration [19:39] which will specifically be in terms of [19:40] PETA PFT config and then all my training [19:43] arguments will go inside this right um [19:47] the this training arguments is with [19:49] respect to where my output directory is [19:51] and all learning rate and all okay [19:54] finally set supervised tuning parameters [19:57] here we have seted model data set PFT [19:59] config text Max equal length tokenizer [20:02] everything is put up over here and [20:04] finally we go ahead and train this now [20:06] once we train it it is going to run for [20:09] 250 aox uh I think 250 step size I have [20:12] actually given over here sorry 25 steps [20:15] uh logging steps let's see what is the [20:18] bass size bass size is [20:20] four um yeah till that much it will [20:23] probably go so let this start so it has [20:26] already started I guess so here you can [20:28] see it is downloading here you'll also [20:31] be able to see the data set [20:34] okay sample data right now you cannot [20:36] see it because the data set will get [20:38] loaded okay so table of contents [20:41] installed all the required packages [20:43] we'll reformat all the steps are given [20:45] side by side you can also read it out I [20:47] know this looks like a little bit tough [20:49] guys but at the end of the day uh I'll [20:51] not say that it is easy and just the [20:54] reason why I'm sharing you this [20:55] finetuning technique because you should [20:57] just get in your mind [20:59] later on you know this is the pattern [21:01] that I'm following first execute this [21:04] don't worry about anything as such just [21:06] try to get an high level overview how [21:08] things work later on I will try to break [21:12] down each and everything in my next [21:14] video by breaking this entire code why [21:16] this specific parameters used because [21:19] the main thing is to understand what is [21:20] PFT what is quantisation what is [21:23] precision and uh how how do you [21:26] specifically use this PFT technique what [21:28] is qora everything what is low order [21:31] rank index uh how to basically calculate [21:34] that everything I will talk about it [21:36] okay so we'll wait for some time till [21:39] then uh just let let us wait and uh we [21:43] will I'll just uh come again I I think [21:46] it'll take 15 to 20 minutes to complete [21:48] this entire fine tuning with thousand [21:49] records and then again I'll come back [21:51] and we'll start doing and seeing whether [21:53] we are able to get the good results or [21:55] not so yes uh let's wait for some time [21:57] thank you [21:59] so guys uh finally you can see the 250 [22:02] EPO or 250 steps have completed it took [22:04] 25 minutes and again this is in Google [22:06] collab if you have paid version of [22:09] Google collab it will probably take [22:11] hardly 5 to 10 minutes to complete okay [22:13] so over here you can see the global step [22:16] was 250 training loss it went went till [22:19] 1.36 metrics runtime everything met [22:22] training samples per second all this [22:24] information is basically done okay and [22:26] please remember this particular word [22:28] which which is called as floss okay [22:29] total floss because I'm going to discuss [22:32] about this in my next video also now [22:35] once we do this we are going to save [22:36] this trained model right and understand [22:39] the new model name what it will be right [22:41] so here you can probably see Lama 27b [22:44] chat fine tune so this is my results [22:47] with respect to run all the results [22:49] you'll be able to see over here also [22:51] okay so here uh in this fine tuning [22:53] technique it is also creating some [22:55] something called as adapter adapter [22:57] model okay please remember these words [22:59] because in the next theoretical [23:00] intuition we are going to discuss each [23:02] and everything as we go ahead okay so [23:04] please make sure that you remember it so [23:06] we are going to save this model so we [23:08] have written trainer. model. save. [23:10] pre-rain model right now you can also [23:12] check out in the tensor board but I will [23:14] just go ahead and show you quickly that [23:16] how it is probably going to generate it [23:18] right so here we have created a prompt [23:20] which is called as what is large [23:21] language model I've used pipeline right [23:24] so this pipeline we have already [23:25] imported it the task will be task [23:27] generation whatever model we have [23:29] actually created that model will be [23:31] there tokenizer will be used over here [23:33] and max length we can keep to 2 200 to [23:35] 250 the result uh and always understand [23:39] as I always suggested with respect to [23:41] Lama 2 this will be my format there will [23:43] be an S then there will be an [23:45] instruction and here I will be having my [23:47] prompt and with respect to this [23:50] particular prompt we are going to get [23:51] some kind of response so whatever [23:53] response we are going to get inside this [23:54] result variable it will be in the form [23:56] of list and inside that there will will [23:58] be one field which is called as [23:59] generated text so if I go ahead and [24:01] search what is large language model [24:04] you'll be able to see that how we going [24:06] to get the result okay because we are [24:08] running the same model over here so here [24:10] is my prompt here we are using pipeline [24:12] pipeline basically helps you to combine [24:14] multiple things like task model [24:16] tokenizers you know multiple things it [24:18] will be able to give you right now since [24:20] this is already running in this [24:22] particular collab uh and obviously [24:25] you'll be able to see RAM and all are [24:27] almost it is used the dis space of [24:30] somewhere around 39 GB right so just [24:32] wait for some time and here you will be [24:34] able to get the response if you quickly [24:36] want to get the response obviously you [24:38] need to have a good GPU right based on [24:41] that it'll be able to give you a quick [24:42] result right so after that you'll be [24:44] also able to see that we'll be able to [24:46] delete all these vams and all okay so [24:49] let's see and let's see whether we'll be [24:51] able to get our result in the next step [24:53] we can also push our model to the [24:55] hugging face which I will keep it right [24:57] now I will not explain it because this I [25:00] will show you as an complete project as [25:03] we go ahead so here you can see what is [25:04] large language Model A large language [25:06] model is a type of artificial [25:07] intelligence large language model often [25:09] seen then here you can also see all the [25:11] information are there some example of [25:13] large language models are uh include [25:16] this okay now what we are going to do [25:18] let's go ahead and take any one example [25:20] over here from this particular data set [25:22] okay so I will just write how to own a [25:26] plane in United States okay okay so this [25:29] will be [25:31] my over here and I'll paste it over here [25:35] let's see so this will also run and I [25:38] will finally get my result also so same [25:40] same question I've have taken right so [25:42] from this 1K result so to a plane this [25:44] is the answer that we will probably be [25:47] getting let's see how much time it'll [25:49] take to probably showcase but always [25:52] remember please keep on looking at this [25:54] particular Ram like how much uh time it [25:56] is probably taking and how much space it [25:59] is taking okay so so guys here you can [26:02] probably see the response how to own a [26:04] plane in united state in United State [26:06] and owning a plane is this determine [26:07] your budget so this is completely based [26:09] on this information that is present over [26:11] here but here I've written only 200 max [26:14] length so I can only see 200 characters [26:16] that is given right so you can probably [26:18] try with each and everything as you go [26:20] ahead now guys uh here also you'll be [26:22] able to see the detailed explanation of [26:24] each and every step but the most [26:26] interesting video after seeing this will [26:28] obviously be able to understand like [26:30] what all each and everything does over [26:32] here what this PFT does what is this [26:34] bits and bites what is this Laura [26:36] everything we will discuss in our next [26:38] video so I hope you like this particular [26:39] video this was it from my side I'll see [26:41] you in the next video have a great day [26:42] thank you and all take care bye-bye