[0:00] hello all my name is krishak and welcome
[0:02] to my YouTube channel so guys I'm happy
[0:04] to announce that I will be soon creating
[0:06] a series of videos of showing you that
[0:09] how you can fine-tune various llm models
[0:12] using custom data set in this video we
[0:14] are going to see how we can fine-tune
[0:16] Lama 2 model uh with the custom data set
[0:18] by using techniques like parameter
[0:20] efficient transfer learning and low rank
[0:23] adaptation of large language models
[0:25] which is also called as Laura so all
[0:27] these techniques we'll specifically use
[0:29] in this particular video I will show you
[0:30] the Practical
[0:32] implementation uh and in the upcoming
[0:34] video because I was planning that how I
[0:36] can efficiently teach you this entire
[0:38] fine-tuning techniques because it is a
[0:39] complex topic altoe so first of all in
[0:42] this video we'll see the entire
[0:43] implementation quickly there will be a
[0:45] template of code which will try to learn
[0:47] we'll take a data set if there is data
[0:49] pre-processing that is required we will
[0:51] do it if there is quantisation that is
[0:52] required we will specifically do it okay
[0:55] and then in the upcoming video I will
[0:56] try to demonstrate the entire
[0:59] theoretical intuition
[1:00] about this parameter efficient transfer
[1:02] learning and low rank adaptation what
[1:04] exactly it is and there is also another
[1:06] variant which is called as chora okay
[1:08] and then we will try to relate this
[1:10] entire theoretical intuition with the
[1:12] Practical implementation it will be
[1:14] amazing to understand because that is
[1:16] how I have also learned and it was very
[1:18] much helpful for me in order to
[1:20] understand each and everything as you
[1:22] all know guys there there are lot of
[1:24] Open Source models that are going to
[1:25] come up in the future also and good good
[1:27] models like Lama 2 Mistral falcon there
[1:30] are so many models as such it is better
[1:32] that we should know how to fine-tune all
[1:33] these models with our own custom data
[1:35] set and that is what companies will be
[1:37] requiring so let's go ahead and let's
[1:39] see that how you can uh fine-tune your
[1:41] llama 2 model uh with this techniques
[1:44] again here we'll be using Transformers
[1:46] uh from hugging face and there will be a
[1:48] lot many different libraries that we'll
[1:49] be using with respect to this at least
[1:52] get the, ft overview about these topics
[1:55] and in the next topic when I discussed
[1:57] about the theoretical intuition your
[1:59] knowledge will get more intact and
[2:01] you'll be able to understand it so let's
[2:03] go ahead and let's proceed towards the
[2:04] Practical implementation hello all my
[2:07] name is krishn and welcome to my YouTube
[2:09] channel so guys in this particular video
[2:11] we are going to see the stepbystep way
[2:14] of probably fine tuning your llm models
[2:19] in this case I'm going to specifically
[2:20] take open- Source Lama 2 model and with
[2:23] the help of a custom data set we are
[2:26] going to fine-tune this specific model
[2:28] right over here we are going to learn
[2:30] about various techniques practically not
[2:33] theoretically because if you really want
[2:35] theoretically you can let me know in the
[2:37] comment section so we will be discussing
[2:39] about something called as parameter
[2:42] efficient transfer learning for NLP
[2:44] which is an amazing technique to
[2:47] basically fine-tune all these llm models
[2:50] which will definitely be of use size
[2:52] like 70 billion parameters and all so
[2:55] how this parameter efficient transfer
[2:57] learning actually happens we'll try to
[2:58] see in the code and and we are also
[3:00] going to see a technique which is called
[3:02] as Laura right so Laura paper if I go
[3:05] ahead and search right it is basically
[3:07] called as low rank adapt adaptation of
[3:11] large language models right so these are
[3:13] some of the mathematical concept don't
[3:15] worry in the upcoming videos I will talk
[3:18] about all every theoretical intuition
[3:20] about PFT about Laura right now a simple
[3:25] way of fine-tuning I'm just going to
[3:26] show you because many people were
[3:28] requesting for this right so initially
[3:31] what we will do is that we will go ahead
[3:33] and install some of the important
[3:34] libraries like accelerate PFT as I said
[3:38] PFT is nothing but parameter efficient
[3:40] transfer learning inside this only
[3:42] you'll find this Laura technique which
[3:44] is called as low rank adaptation of
[3:46] large language models uh then we have
[3:49] bits and bytes bits and bytes are
[3:50] specifically used for doing quantization
[3:53] now what does quantisation basically
[3:54] mean all these llm models you know when
[3:57] they are trained with 70 billion
[3:58] parameters or 13 billion parameters by
[4:01] default the weights data types are in
[4:03] the form of floating values right when
[4:05] we say floating values that they are
[4:07] basically 32 bid values what we can
[4:09] actually do and obviously since I'm
[4:11] actually going to do this in Google
[4:12] collab we get a very less Ram so it is a
[4:16] better way that you quantize those
[4:18] weights you know from float 32 probably
[4:20] convert that into int 8 and then
[4:23] probably based on the Ram size you'll be
[4:26] able to quickly fine tune it along with
[4:28] that I will be also so we'll also be
[4:30] using Transformers and then you have TRL
[4:33] so all this libraries will go ahead and
[4:35] execute it and once we specifically
[4:38] execute it you'll be able to see that
[4:39] all these libraries will get installed
[4:42] now in the Second Step the major thing
[4:46] is that we will specifically be using
[4:48] the library called as Transformers which
[4:50] is specifically used for this particular
[4:52] purpose and internally we'll also be
[4:54] using PFT which is having some Laura
[4:57] configuration and we'll use this PF
[4:59] model I know you'll not be able to
[5:02] understand what exactly PFT is but I'll
[5:04] just tell you in some time just let me
[5:06] go ahead with but at the end of the day
[5:08] PFT actually uh you know uses techniques
[5:12] which will try to freeze you know when
[5:14] it applies transfer learning on these
[5:15] llm models it is freezing most of the
[5:18] weights of that llm model and only some
[5:21] of the weights will be retrained and
[5:23] based on that they will be able to
[5:25] provide you accurate results based on
[5:27] your custom data set okay uh how it is
[5:30] done don't worry I'll create a amazing
[5:32] dedicated video to make you understand
[5:35] this mathematical intuitions okay now
[5:37] over here you'll be able to see that I'm
[5:39] going to import OS import torch I'm
[5:41] going to use a data set I will talk
[5:43] about what data set we are going to
[5:45] specifically do the fine tuning but here
[5:47] we are specifically using open source
[5:49] llm models and then from Transformer I'm
[5:51] going to use Auto model for casual LM
[5:53] Auto tokenizer bits and bytes I will
[5:56] talk about all these libraries as we go
[5:57] ahead so let me quickly go ahead and
[6:00] execute it okay now till this is getting
[6:03] executed this import statement is
[6:04] getting executed let's talk about some
[6:06] of the important properties over here
[6:08] with respect to llama 2 in the case of
[6:10] llama 2 the following prompt template is
[6:12] used for chat model so this is the
[6:14] specific prompt template uh here we be
[6:19] give an instruction in this s symbol and
[6:22] then we have our system prompt which
[6:23] will be closed with the CIS brackets and
[6:26] then you will also be able to give your
[6:29] user prompt over here and the model
[6:30] answer will be coming after this after
[6:32] this entire instruction okay so this is
[6:35] how the entire Lama 2 models llm models
[6:39] specifically require the system prompt
[6:42] and the user prompt and the model answer
[6:43] format right now any data set that you
[6:47] specifically get right we really need to
[6:49] convert that data set into this format
[6:52] okay and that is how I will show you how
[6:54] to probably do this there's a technique
[6:57] uh you can also write your own custom
[6:58] code and all there are many ways okay
[7:00] now what we'll do we will reformat our
[7:03] instruction data set to follow Lama 2
[7:05] template so right now we are going to
[7:06] use this data set which is basically
[7:09] called as open open
[7:12] Assistant Guan guanako I hope I'm
[7:15] pronouncing it right now here you will
[7:17] be able to see this is my data set right
[7:19] human can you write a short introduction
[7:21] about the relevance of term uh monopsony
[7:24] in economics please use example related
[7:26] to this and then Mon Mon monopsony ref
[7:29] first to the market so here you can see
[7:31] assistant answer so here the data set is
[7:34] basically in the form of human and
[7:35] assistant like human has a question over
[7:38] there and assistant is probably
[7:39] providing uh you a specific answer so in
[7:42] this format you'll be able to find out
[7:44] each and every rows each and every rows
[7:47] in different different languages so we
[7:49] are going to take this entire data set
[7:52] and then considering this entire data
[7:55] set what we are going to do we are going
[7:56] to reform the data set following the
[7:58] Lama 2 template and out of all these
[8:01] samples all this data set there are
[8:02] around how many data sets are there I
[8:05] guess there are around 10 10K records we
[8:08] just going to take thousand uh th000
[8:10] Records or 1K records the reason is that
[8:12] I really need to show you how the
[8:13] fineing is basically done so if I go
[8:16] ahead and click on this and if you see
[8:18] this format right this format you'll be
[8:22] able to see that this entire data set is
[8:24] converted in this format only right
[8:26] instruction is basically there the
[8:28] answer is over here and this entire s is
[8:30] getting closed right so all the data set
[8:32] is basically converted into that
[8:34] specific format now how do you convert
[8:37] it right so for that already what we
[8:40] have basically done is that over here to
[8:42] know how this data set was created you
[8:43] can check this notebook so this notebook
[8:45] is there already you can see that we are
[8:47] loading the data set we are applying
[8:49] this we are taking the Thousand records
[8:51] and then we are transforming right so in
[8:53] transforming basically a simple python
[8:55] code like I have to probably keep in
[8:57] that specific format right so that is
[8:59] the reason I'm showing you this specific
[9:00] code over here just by one click you
[9:02] will be able to do that okay so all the
[9:04] links are actually given now you need to
[9:07] follow Now understand guys see
[9:10] understanding how the specific
[9:12] techniques are definitely I'll create a
[9:14] dedicated theoretical video
[9:16] understanding all the maths equations
[9:17] that is required right over here we are
[9:19] trying to see that how you can also run
[9:21] your own fine tun model right so note
[9:24] you don't need to follow a specific prom
[9:25] template if you're using the base Lama 2
[9:27] model but right now we'll not use we'll
[9:29] use will not use this base Lama 2 model
[9:31] okay how to F tune Lama 2 so these are
[9:33] some of the steps not only with Lama 2
[9:35] with other models also this will work
[9:37] but again there the format may change
[9:40] you know the the format of the
[9:41] instruction the format of your prompts
[9:43] may change so free Google collabs offers
[9:46] a 15gb graphic card right so limited
[9:49] resources barely enough to store Lama to
[9:51] 7 billion weights now here we are going
[9:52] to use 7 billion weights but it is also
[9:54] very difficult to store 15 GB right
[9:56] whatever free model that we specifically
[9:58] have we also need to consider the
[10:00] overhead due to Optimizer State gradient
[10:03] and forward activation okay so usually
[10:05] in in any llm models you'll be having
[10:08] gradients you'll be having forward
[10:10] activations you'll be having optimizers
[10:12] so there also you require some amount of
[10:13] memory fine tuning is not possible here
[10:16] right obviously this will not be
[10:18] possible because 7 billion weights you
[10:20] cannot store it in 15 GB that is the
[10:23] reason we require this parameter
[10:25] efficient fine-tuning technique now what
[10:28] does PFT basically do it is going to
[10:31] freeze most of the weights that is
[10:33] present in that llm model like Lama 2
[10:35] and only with some of the weights after
[10:38] applying quantization it is going to
[10:40] probably perform the fight fine tuning
[10:42] now parameter efficient fine tuning I
[10:44] will in the my next video I will talk
[10:46] about this research paper if you quickly
[10:47] want this video please make sure that
[10:49] you make the video likes 2,000 okay now
[10:51] what we are going to do over here we are
[10:53] going to use techniques like Laura and
[10:54] clora as I said Laura or clora Laura is
[10:57] nothing but low rank adaptation of large
[11:00] language model again I'm apologist guys
[11:01] if you don't know the mathematical
[11:02] Concepts I will explain in the upcoming
[11:04] video okay so first of all we will load
[11:07] a Lama 27b chart GPT model this chart HF
[11:11] model then train it on this 1K sample
[11:14] which will produce a fine tune model
[11:16] with which in the name of chat fine tune
[11:18] we'll try to create in this clora will
[11:21] use a rank of 64 with a scaling
[11:23] parameter of 16 we will load the Lama 2
[11:25] model directly in 4bit Precision we are
[11:27] trying to convert that 32 bit into 4 bit
[11:30] so that is how we are going to do the
[11:32] training and with respect to chora in
[11:34] order to find the low rank index we are
[11:37] going to use the rank of 64 right this
[11:40] is an hyper tuning parameter you can
[11:42] just consider right now this is a kind
[11:44] of hyper tuning parameter with a scaling
[11:47] parameter Alpha this is also called as
[11:49] Alpha it will be having a scaling
[11:50] parameter of 16 as I said everything
[11:53] will be explained detailly when I
[11:55] probably go with the mathematical
[11:57] equation but right now our main name is
[11:59] is to probably learn how to find T it
[12:02] now what model we are going to use we
[12:04] are going to use Lama 2 7bh uh 7B chat
[12:07] HF then the instruction data set to use
[12:10] is this particular data set we will be
[12:12] downloading it from the hugging face the
[12:13] model name also will be downloading it
[12:15] and after finetuning it this will be my
[12:17] new model name okay now these are some
[12:21] of the clor parameters that is required
[12:23] okay so one is laurore R 64 what is this
[12:26] R this R is a rank of 64 kind of
[12:30] hyperparameter Laura Alpha as I said
[12:32] Alpha right I told you Alpha why because
[12:35] I know the entire mathematics stuffs in
[12:37] this okay just to increase the Curiosity
[12:40] I'm coming up with this first video and
[12:42] later on I will come up with that then
[12:44] here also Dropout is basically required
[12:47] now in order to do the quantization we
[12:49] will be using bits and bytes parameter
[12:51] so here you can see activate 4bit
[12:53] precision based model so there is a
[12:55] parameter which is called as _ 4bit
[12:58] which is equal to true
[12:59] then compute data type for 4bit base
[13:01] model so here it is basically float 16
[13:04] then quantization we using fp4 on np4 so
[13:08] BNB 4bit Quant type you have to keep
[13:10] this particular value to np4 since it is
[13:12] 4bit activate Ned quation for 4bit based
[13:15] model so here we are keeping it as false
[13:17] Now understand Guys these are some of
[13:19] the basic parameters that we
[13:21] specifically use in Lura technique
[13:23] specifically in PFT then training
[13:25] argument parameters our output directory
[13:27] will be present in this results I'm
[13:29] going to run one Epoch then we are going
[13:31] to enable this fp6 and B bf16 training
[13:35] okay uh it is set to True with an a100
[13:40] right so a100 uh you can set it if
[13:42] you're using a100 you can set it to True
[13:44] right now I'm using T4 if you have the
[13:46] paid version of Google collab then you
[13:48] can set it to
[13:49] True bass size for uh Pur GPU for
[13:52] training I hope you know what is bass
[13:54] size then you have GPU for evaluation
[13:56] bass size then gradient accumulation
[13:58] step check points Max gr uh Max grad nor
[14:02] learning rate weight DK right Optimizer
[14:05] page adamw we will be using which is of
[14:07] a variety of Adam itself then learning
[14:09] sh learn uh LR sched type cosine because
[14:12] it works on similarity right whatever
[14:14] question and answers we specifically
[14:16] write then maximum steps is minus one
[14:18] number of training steps override number
[14:20] of training epochs and after this you
[14:23] are also putting logging steps is equal
[14:25] to 25 now with respect to any fine
[14:27] tuning technique you use something
[14:29] called as supervised tuning right in
[14:31] supervised tuning that is you require
[14:33] some parameters right max sequent length
[14:35] then packing then device map so this is
[14:37] load the entire model on the GPU zero
[14:39] right so this is what are the some of
[14:41] the parameters don't worry uh these are
[14:44] some of the parameters that you don't
[14:45] need to learn each and every parameter
[14:47] because already all these things are
[14:49] provided by the official page itself
[14:51] I've just copied and pasted it over here
[14:53] right so we will go ahead and execute it
[14:56] so let's go ahead and execute it so all
[14:57] these parameters are set now the step
[15:00] four right there are multiple four steps
[15:02] right uh one more step is there later on
[15:05] load everything and start the F tuning
[15:07] process right first of all we want to
[15:09] load the data set we defined here our
[15:11] data set is already pre-processed but
[15:13] usually this is where you should
[15:14] reformat The Prompt right filter out bad
[15:17] text combine multiple data some amount
[15:18] of pre-processing is required but
[15:20] already we have done that so we are not
[15:21] going to do it then we are Recon we are
[15:24] configuring bits and byes for four bit
[15:26] quantization as I said right from 16
[15:28] from 32 or 16 bit we are converting that
[15:30] into 4 bit so that it required less
[15:32] space with respect to GPU for the fine
[15:34] tuning purpose next we are loading the
[15:36] Llama 2 model in 4bit Precision GPU with
[15:39] the current corresponding tokenizer
[15:41] right with that tokenizer we'll try to
[15:43] load that and obviously we'll also be
[15:45] loading it with the 4bit Precision
[15:47] finally we are loading the configuration
[15:49] of clor so uh and passing everything to
[15:51] the sft trainer so here is what self
[15:54] fine tuning uh s uh this sft will
[15:57] basically happen right now let's go
[15:59] ahead and let's do this so first of all
[16:01] we are loading the data set we are
[16:02] loading the tokenizer model with clora
[16:05] configuration so here I have return this
[16:07] B&B compute D type and we are using
[16:10] torch so along with that you also
[16:12] require bits and bytes config again load
[16:14] we are enabling this 4 bit then all the
[16:16] necessary parameters like compute D type
[16:19] you'll be using H net nested Quant okay
[16:22] again I'm telling you guys there is
[16:24] nothing new to learn in this because all
[16:25] these formats will be available in the
[16:27] official documentation then we are going
[16:29] to check the GPU compatibility with
[16:31] float 16 if compute dipe is equal to
[16:33] torch. float 16 use 4bit otherwise this
[16:36] all things are there right then we are
[16:39] going to load the base model see
[16:41] whenever we want to load the base model
[16:43] from hugging face we can use this Auto
[16:44] model for casual LM right that is the
[16:46] reason we have imported on top Dot from
[16:49] pre-trained model name what is my model
[16:51] name I've given that quation config so
[16:54] here you'll be able to see in conation
[16:55] config we are also given something
[16:57] called as uh BNB config right so here
[17:01] you'll be able to see this is the
[17:02] compute
[17:03] type let me just search for it somewhere
[17:06] here only it will be available
[17:12] so so BNB
[17:14] config so here you can see this entire
[17:17] bytes config is basically there so uh
[17:20] based on that you'll be okay yeah
[17:22] computer app okay yeah perfect so B&B
[17:24] config is basically given over here then
[17:26] device map is nothing but with respect
[17:28] to the GPU mapping then model. config do
[17:31] use cache false you can also make it
[17:32] true if you want model. config
[17:35] pre-training _ TP is equal to one then
[17:37] we are loading the Lama tokenizer see
[17:39] for any LM model we also need a
[17:42] tokenizer so that it will be able to
[17:43] convert any llm model the input data
[17:46] that we are specifically using into word
[17:48] embeddings and all so that is the reason
[17:50] order tokenizer from pre-trained again
[17:52] model name we are going to use this
[17:53] trust remote code is one additional
[17:55] parameter that is used then we going to
[17:58] put a pad token with respect to the end
[18:00] of statement token right so do this eore
[18:03] token specifically applies the token for
[18:06] the Lama itself right and here we are
[18:08] giving the padding side as right fixed
[18:10] weird overflow issue with fp16 training
[18:13] all these parameters will be almost
[18:15] fixed guys only thing that you will
[18:16] probably be changing is with respect to
[18:18] the configuration then load Laura
[18:21] configuration here you'll be able to see
[18:22] PFT config Laura config all the values
[18:25] that you're putting with respect to this
[18:27] Lowa configs and here here you have your
[18:29] PFT
[18:31] configuration now this is the most
[18:33] important thing because in this training
[18:34] arguments we set all the parameters
[18:37] output directory number of epo this this
[18:39] this learning rate PP p uh FP 16 bs6 you
[18:44] can probably see over here and then
[18:46] finally we are reporting it to the
[18:47] tensal flow right tensor board then you
[18:50] can also see that supervised fine-tuning
[18:52] parameters right I'm giving my model
[18:54] name I'm giving my data set my PFT
[18:56] config my data set text field this PF
[18:59] config has a Lowa config right then you
[19:02] have a tokenizer you have the arguments
[19:04] you you have packing then you have
[19:06] finally trainer1 okay now this is what
[19:09] is the main thing and that is where your
[19:11] supervised fine tuning will happen step
[19:13] by step you have done it okay let me
[19:15] repeat it quickly we have loaded the
[19:16] data set we have set our D type right we
[19:20] are setting up all our contag process
[19:23] over here here we are checking whether
[19:25] GPU is compatible or not here we are
[19:27] loading our llm model that is Lama 2
[19:30] here we are specifically loading our
[19:32] tokenizer which is be used in Lama 2
[19:35] along with this we are putting padding
[19:36] techniques then my Laura configuration
[19:39] which will specifically be in terms of
[19:40] PETA PFT config and then all my training
[19:43] arguments will go inside this right um
[19:47] the this training arguments is with
[19:49] respect to where my output directory is
[19:51] and all learning rate and all okay
[19:54] finally set supervised tuning parameters
[19:57] here we have seted model data set PFT
[19:59] config text Max equal length tokenizer
[20:02] everything is put up over here and
[20:04] finally we go ahead and train this now
[20:06] once we train it it is going to run for
[20:09] 250 aox uh I think 250 step size I have
[20:12] actually given over here sorry 25 steps
[20:15] uh logging steps let's see what is the
[20:18] bass size bass size is
[20:20] four um yeah till that much it will
[20:23] probably go so let this start so it has
[20:26] already started I guess so here you can
[20:28] see it is downloading here you'll also
[20:31] be able to see the data set
[20:34] okay sample data right now you cannot
[20:36] see it because the data set will get
[20:38] loaded okay so table of contents
[20:41] installed all the required packages
[20:43] we'll reformat all the steps are given
[20:45] side by side you can also read it out I
[20:47] know this looks like a little bit tough
[20:49] guys but at the end of the day uh I'll
[20:51] not say that it is easy and just the
[20:54] reason why I'm sharing you this
[20:55] finetuning technique because you should
[20:57] just get in your mind
[20:59] later on you know this is the pattern
[21:01] that I'm following first execute this
[21:04] don't worry about anything as such just
[21:06] try to get an high level overview how
[21:08] things work later on I will try to break
[21:12] down each and everything in my next
[21:14] video by breaking this entire code why
[21:16] this specific parameters used because
[21:19] the main thing is to understand what is
[21:20] PFT what is quantisation what is
[21:23] precision and uh how how do you
[21:26] specifically use this PFT technique what
[21:28] is qora everything what is low order
[21:31] rank index uh how to basically calculate
[21:34] that everything I will talk about it
[21:36] okay so we'll wait for some time till
[21:39] then uh just let let us wait and uh we
[21:43] will I'll just uh come again I I think
[21:46] it'll take 15 to 20 minutes to complete
[21:48] this entire fine tuning with thousand
[21:49] records and then again I'll come back
[21:51] and we'll start doing and seeing whether
[21:53] we are able to get the good results or
[21:55] not so yes uh let's wait for some time
[21:57] thank you
[21:59] so guys uh finally you can see the 250
[22:02] EPO or 250 steps have completed it took
[22:04] 25 minutes and again this is in Google
[22:06] collab if you have paid version of
[22:09] Google collab it will probably take
[22:11] hardly 5 to 10 minutes to complete okay
[22:13] so over here you can see the global step
[22:16] was 250 training loss it went went till
[22:19] 1.36 metrics runtime everything met
[22:22] training samples per second all this
[22:24] information is basically done okay and
[22:26] please remember this particular word
[22:28] which which is called as floss okay
[22:29] total floss because I'm going to discuss
[22:32] about this in my next video also now
[22:35] once we do this we are going to save
[22:36] this trained model right and understand
[22:39] the new model name what it will be right
[22:41] so here you can probably see Lama 27b
[22:44] chat fine tune so this is my results
[22:47] with respect to run all the results
[22:49] you'll be able to see over here also
[22:51] okay so here uh in this fine tuning
[22:53] technique it is also creating some
[22:55] something called as adapter adapter
[22:57] model okay please remember these words
[22:59] because in the next theoretical
[23:00] intuition we are going to discuss each
[23:02] and everything as we go ahead okay so
[23:04] please make sure that you remember it so
[23:06] we are going to save this model so we
[23:08] have written trainer. model. save.
[23:10] pre-rain model right now you can also
[23:12] check out in the tensor board but I will
[23:14] just go ahead and show you quickly that
[23:16] how it is probably going to generate it
[23:18] right so here we have created a prompt
[23:20] which is called as what is large
[23:21] language model I've used pipeline right
[23:24] so this pipeline we have already
[23:25] imported it the task will be task
[23:27] generation whatever model we have
[23:29] actually created that model will be
[23:31] there tokenizer will be used over here
[23:33] and max length we can keep to 2 200 to
[23:35] 250 the result uh and always understand
[23:39] as I always suggested with respect to
[23:41] Lama 2 this will be my format there will
[23:43] be an S then there will be an
[23:45] instruction and here I will be having my
[23:47] prompt and with respect to this
[23:50] particular prompt we are going to get
[23:51] some kind of response so whatever
[23:53] response we are going to get inside this
[23:54] result variable it will be in the form
[23:56] of list and inside that there will will
[23:58] be one field which is called as
[23:59] generated text so if I go ahead and
[24:01] search what is large language model
[24:04] you'll be able to see that how we going
[24:06] to get the result okay because we are
[24:08] running the same model over here so here
[24:10] is my prompt here we are using pipeline
[24:12] pipeline basically helps you to combine
[24:14] multiple things like task model
[24:16] tokenizers you know multiple things it
[24:18] will be able to give you right now since
[24:20] this is already running in this
[24:22] particular collab uh and obviously
[24:25] you'll be able to see RAM and all are
[24:27] almost it is used the dis space of
[24:30] somewhere around 39 GB right so just
[24:32] wait for some time and here you will be
[24:34] able to get the response if you quickly
[24:36] want to get the response obviously you
[24:38] need to have a good GPU right based on
[24:41] that it'll be able to give you a quick
[24:42] result right so after that you'll be
[24:44] also able to see that we'll be able to
[24:46] delete all these vams and all okay so
[24:49] let's see and let's see whether we'll be
[24:51] able to get our result in the next step
[24:53] we can also push our model to the
[24:55] hugging face which I will keep it right
[24:57] now I will not explain it because this I
[25:00] will show you as an complete project as
[25:03] we go ahead so here you can see what is
[25:04] large language Model A large language
[25:06] model is a type of artificial
[25:07] intelligence large language model often
[25:09] seen then here you can also see all the
[25:11] information are there some example of
[25:13] large language models are uh include
[25:16] this okay now what we are going to do
[25:18] let's go ahead and take any one example
[25:20] over here from this particular data set
[25:22] okay so I will just write how to own a
[25:26] plane in United States okay okay so this
[25:29] will be
[25:31] my over here and I'll paste it over here
[25:35] let's see so this will also run and I
[25:38] will finally get my result also so same
[25:40] same question I've have taken right so
[25:42] from this 1K result so to a plane this
[25:44] is the answer that we will probably be
[25:47] getting let's see how much time it'll
[25:49] take to probably showcase but always
[25:52] remember please keep on looking at this
[25:54] particular Ram like how much uh time it
[25:56] is probably taking and how much space it
[25:59] is taking okay so so guys here you can
[26:02] probably see the response how to own a
[26:04] plane in united state in United State
[26:06] and owning a plane is this determine
[26:07] your budget so this is completely based
[26:09] on this information that is present over
[26:11] here but here I've written only 200 max
[26:14] length so I can only see 200 characters
[26:16] that is given right so you can probably
[26:18] try with each and everything as you go
[26:20] ahead now guys uh here also you'll be
[26:22] able to see the detailed explanation of
[26:24] each and every step but the most
[26:26] interesting video after seeing this will
[26:28] obviously be able to understand like
[26:30] what all each and everything does over
[26:32] here what this PFT does what is this
[26:34] bits and bites what is this Laura
[26:36] everything we will discuss in our next
[26:38] video so I hope you like this particular
[26:39] video this was it from my side I'll see
[26:41] you in the next video have a great day
[26:42] thank you and all take care bye-bye