[0:00] what can you do to improve the
[0:02] performance of your watch language model
[0:05] for your specific use case hey everyone
[0:08] my name is Vin and in this video we're
[0:10] going to see how you can find you a
[0:12] watch language model on a custom data
[0:14] set here we're going to be using W 38b
[0:18] instruct model and we are going to be
[0:20] fine-tuning it for a rock application
[0:22] for financial data let's get started if
[0:25] you want to follow along there is a
[0:27] complete text tutorial that is available
[0:29] for m expert Pro subscribers and it is
[0:32] right under the boot camp and then fine
[0:34] tuning W3 L for R here you can find the
[0:38] complete text tutorial along with the
[0:40] source code and explanations on each of
[0:43] the steps that we're going to do along
[0:45] with a link to a Google clap notebook so
[0:48] if you want to support my work please
[0:50] consider subscribing for M expert pro
[0:52] thank you here is the process that we're
[0:55] going to go through in order to find you
[0:57] our W 3 model for our specific task
[1:01] first we're going to be building a data
[1:04] set that is based on custom prompts
[1:07] provided from a Json file that I'm going
[1:09] to show you how you can transform into
[1:12] hugging phase data set then we're going
[1:14] to be choosing and evaluating the
[1:16] initial performance of the base model in
[1:18] our case this is going to be the W 38b
[1:21] instruct model then we're going to be
[1:23] setting up an adapter and in our case
[1:26] this is going to be a war adapter that
[1:29] we're going to be using using in order
[1:30] to tune on top of the original W 3 Model
[1:35] since the W 3 Model is quite large and
[1:38] probably you're not going to be able to
[1:40] do a fine-tuning of the complete model
[1:42] on a single GPU then we are going to be
[1:46] continuing with training and monitoring
[1:48] the training process I'm going to show
[1:50] you the results that I got and this
[1:52] model was trained in roughly 2 hours for
[1:55] a single ook then we're going to be
[1:58] creating an evaluation on a previously
[2:01] created test set and based on this
[2:03] evaluation we're going to be merging the
[2:06] based model that we have and we're going
[2:08] to be pushing the model to H face Hub
[2:11] and I'm going to show you uh some
[2:14] examples on how the trained model is
[2:16] comparing the predictions to the
[2:18] untrained model the data set that we're
[2:21] going to be using is available on the
[2:23] huging face data sets it is called
[2:25] Financial Q&A 10K and here you can find
[2:29] roughly 7 ,000 examples that are
[2:33] essentially paired with a question
[2:35] context and an answer these are the
[2:38] columns that we're going to be using of
[2:39] course uh you can infer from the name
[2:42] that this is actually a financial data
[2:44] set and uh you can see that uh the two
[2:47] additional coms are filing and then
[2:49] ticker we are not going to be using
[2:51] those but we are going to be uh
[2:53] deploying the question answer and the
[2:55] context the base model that we're going
[2:58] to be using is the original
[3:00] wama 38b instruct model by meta AI which
[3:03] is also available on the H face models
[3:06] repository and this model is going to be
[3:09] a we're going to be able to put this
[3:11] model on a single GPU with a
[3:13] quantization to four bit parameters and
[3:16] I'm going to show you how to do that
[3:17] into the co notebook other than that a
[3:21] thing that you should know about this
[3:22] model is that it has a Contex length of
[3:25] 8K tokens which will be quite more than
[3:28] we need in order to find tun for our
[3:30] specific data set and this model has to
[3:33] be one of the better open models that
[3:36] you can use uh at least today so we're
[3:39] going to be fine-tuning this another
[3:41] bonus of this model is that it has a
[3:43] chat template which uh is provided by
[3:46] the tokenizer as you can see here and we
[3:49] are going to be using this chat template
[3:51] in order to further fine-tune this model
[3:54] I have the Google clap notebook now
[3:56] opened and as you can see first I'm
[3:58] starting with showing you that the
[4:00] actual GPU that I've used during this
[4:03] fine tuning was a T4 I'm going to show
[4:06] you how we can fit the model on the T4
[4:08] GPU in a bit and here I'm installing
[4:10] pretty much the latest versions of the
[4:12] torch Library the Transformers Library
[4:15] data set since we're going to be
[4:16] downloading the data set from the
[4:18] hanging face repository then the
[4:20] accelerate library and bits and bites
[4:22] which we're going to be using for the
[4:23] quantization of the model then uh for
[4:26] the war setup we're going to be using
[4:28] the P Library then we're going to be
[4:31] using the TRL sft trainer or supervised
[4:34] fine tuning uh trainer that is provided
[4:37] by this labrary and then the covered
[4:39] Library which I'm going to show you why
[4:40] we're going to be using in a bit uh we
[4:43] have a lot of imports and most of those
[4:45] are based on the fact that I'm going to
[4:48] show you a couple of uh plots uh but the
[4:51] more important thing here is that I'm
[4:54] seeding the uh torch and the numai and
[4:58] the random uh from the python with a
[5:01] seat and then I'm specifying a p token
[5:04] I'm going to show you how you can apply
[5:06] the P token to the tokenizer since the
[5:09] tokenizer at least for w 38b instruct
[5:12] model doesn't come with a PO token
[5:14] included so we're going to be doing just
[5:17] that in a bit and then uh I'm going to
[5:20] be having a a constant for the original
[5:24] model and then the new model that I'm
[5:25] going to show you how you can push to
[5:27] the Hang face Repository
[5:30] so first I'm going to start with uh
[5:32] creating the configuration for the model
[5:34] itself and here you can see that I have
[5:37] something very basic I'm loading the
[5:39] model into 4 bit and I'm using the new
[5:43] word nf4 uh format for the Quant type of
[5:47] the 4bit model and uh here I'm saying
[5:50] that the compute type which are we're
[5:52] going to be using for the computational
[5:54] part of this model is going to be a
[5:56] binary for 16 uh other than that this is
[5:59] pretty much a very standard
[6:02] configuration for wading the model into
[6:04] 4bit format uh next I'm going to show
[6:07] you that uh we are actually downloading
[6:10] the original tokenizer from The Meta
[6:12] repository and I'm adding a p token
[6:16] which is going to be this P token
[6:18] constant right here and I'm setting the
[6:20] Ping side to the right this is just in
[6:22] case uh if this is not set already and
[6:26] then I'm loading the model from the
[6:28] quantization and then after I've
[6:31] downloaded or loaded the model you'll
[6:34] see that I'm actually expanding or
[6:36] resizing the token in Bings for this
[6:38] model based on the length of the
[6:39] tokenizer since we've added a new token
[6:42] right here now why I do that uh from
[6:45] what I found if you're training with
[6:48] more than one training example per batch
[6:52] I've seen that usually the embeddings or
[6:56] the tokenizer is getting scrumbled and
[6:59] it appears that the at least the was and
[7:02] the responses don't get very good and
[7:05] what I found is that the models continue
[7:07] to Jumble or try to speak a lot and
[7:10] repeat some of the sentences if I set
[7:13] this padding token it appears that uh
[7:16] the model is actually stopping to
[7:17] generate itself as it should and uh this
[7:21] actually helped me to consider that also
[7:24] I would like to know that I've tried to
[7:26] actually fine tune the base model uh
[7:29] that that is the model that didn't uh
[7:31] include any instruct fine tuning and on
[7:34] that model also without the P token uh
[7:37] it appears that uh this model continues
[7:39] to uh repeat the text uh forever and
[7:42] ever so if you have another solution to
[7:44] this problem please let me know down
[7:46] into the to the comments of this video
[7:49] uh and you'll see that we're downloading
[7:51] the model uh you can see that the model
[7:53] was able to be wed successfully and this
[7:56] is the config you can see that we are
[7:59] actually only adding the quantization
[8:01] config right
[8:03] here uh other than that uh I'm showing
[8:06] you the beginning of SE of sequence
[8:09] token the end of sequence token and the
[8:12] new P token that we've added those are
[8:15] already into our tokenizer okay so I'm
[8:19] going to continue with the original data
[8:21] set and here I'm going to show you how
[8:24] you can essentially create your own
[8:26] custom data set so you don't have to
[8:28] rely on on some pre-processed data set
[8:31] and for example you can have a data
[8:33] frame or Json and from that uh you can
[8:36] actually create your own custom hugging
[8:39] phase data set so I'm going to start by
[8:42] downloading the original hugging face
[8:45] data set and I'm going to convert it
[8:46] into a data frame I'm going to see a
[8:49] couple of examples right here these are
[8:51] the columns that we have originally and
[8:54] the first thing is as I've already told
[8:56] you I'm going to convert this data set
[8:58] into a data frame so this is something
[9:00] that you might have in the real world uh
[9:03] for example a data frame or a CSV file
[9:05] or uh you can have some SQL or uh SQL
[9:10] database that you can convert into a CSV
[9:12] file or a data frame and from here we're
[9:15] going to be building our custom data set
[9:17] and this is how uh I'm going to do this
[9:21] so first uh something that I really like
[9:24] to do is to check whether or not this
[9:27] data set contains any new values since
[9:29] this will probably W up our gradients
[9:33] during training and our was is not going
[9:35] to be very happy with that so I see that
[9:38] pretty much uh everything is here we
[9:41] have 7,000 examples and then after this
[9:44] is complete I'm going to be building
[9:46] this function called format example in
[9:49] which I'm going to be using the question
[9:52] the answer and the context for a
[9:54] specific question along with this very
[9:57] simple system prompt on top of that I'm
[10:00] going to be calling apply chat template
[10:03] and I don't want this to get tokenized
[10:05] so in order to get these messages and
[10:09] run through this I'm going to show you
[10:12] that this is going to be running through
[10:15] every example and I'm going to be adding
[10:18] a new com com text to our data
[10:21] frame and then I'm going to continue
[10:23] with counting the actual tokens that our
[10:27] tokenizer is going to be doing in order
[10:30] to have their count into our final data
[10:33] frame and this is something that you
[10:35] might get uh for example here is a data
[10:39] frame or a sample of the first couple of
[10:42] examples five to be exact and you can
[10:44] see the question the context the answer
[10:46] and now we have the text along with a
[10:49] token count for each text I'm going to
[10:52] show you why we're going to be using
[10:54] this but let's see a simple example or
[10:57] the first example that we get
[10:59] from the text here you can see that the
[11:02] tokenizer has added all of the specific
[11:05] tokens that are actually included within
[11:08] the template you can see the system
[11:10] prompt then you can see that uh the
[11:14] question is actually
[11:17] here sorry this is uh the system prompt
[11:20] then this is the question from our
[11:22] specific case and then this is the
[11:24] context provided here between these
[11:27] triple digs uh this is ending right here
[11:31] and then we have a answer from the
[11:34] assistant so this is going to be the
[11:37] answer from our data set and then we
[11:39] have end of sequence ID token at the end
[11:42] so this is pretty much the format that
[11:44] the model is going to be receiving our
[11:46] texting and then I'm showing you a
[11:49] histogram or let's say a plot that tells
[11:53] how often tokens be between for example
[11:57] 100 uh Zer and 200 100 Etc tokens are
[12:02] relevant here and you can see our data
[12:04] set is heavily skewed towards uh 300 or
[12:08] less tokens right here which is a good
[12:11] thing since we want to reduce the number
[12:14] of tokens that we're going to be using
[12:16] in order to have a a faster training so
[12:20] this is a good for us and uh I'm going
[12:23] to be actually reducing the number of
[12:25] tokens under 512 and in our case we
[12:30] seeing that only three of the examples
[12:32] right here have more than 5 12 tokens so
[12:35] what I'm going to do is to actually
[12:37] remove those
[12:39] examples uh and then I'm going to sample
[12:42] uh 6,000 examples and based on that I'm
[12:46] going to be splitting those into a train
[12:48] validation and test sets so to continue
[12:51] with that I'm going to be using the
[12:53] train test split from the sk1 library
[12:56] I'm going to be first creating a train
[12:58] set and then the rest of the data set
[13:01] I'm going to be splitting that into a
[13:02] validation and test sets so these are
[13:05] the results that I have and from that
[13:07] I'm going to be saving roughly 4,000
[13:10] examples for training 500 for validation
[13:14] and4 testing and this essentially is
[13:18] going to be our data set that we're
[13:21] going to be building and I'm going to be
[13:23] using two Json on the data frame that we
[13:27] have uh I'm going to orient towards the
[13:29] records and I want this to be stored as
[13:32] Json wines or Json l so essentially what
[13:36] I'm going to do next is to get or W our
[13:40] custom data set that we've just created
[13:42] and this is essentially how you are
[13:45] going to be wading a Json file and this
[13:47] is the mapping between the Json files so
[13:51] what we have here is our own custom data
[13:53] set that we pre-processed enabled and
[13:55] created finally based on the Json and
[13:58] then uh at the was step we're actually
[14:00] loading our own custom data set so this
[14:03] is essentially the process that you need
[14:06] to follow in order to build a data set
[14:09] for fine-tuning your
[14:12] L next I'm going to show you that uh
[14:15] actually our data set is correctly split
[14:17] you can see the number of rows right
[14:19] here and I'm going to just be looking at
[14:22] another example of the text which is
[14:24] again a text with all of the tokens that
[14:27] are needed to be applied based on the
[14:29] chat template okay so next we're going
[14:33] to continue with testing the original
[14:36] model this is be before fine-tuning the
[14:40] base model that is I'm going to be
[14:42] creating this pipeline I'm going to be
[14:44] pipelining the model in the tokenizer
[14:47] this is for the text generation task and
[14:49] I want this to produce as much as uh
[14:52] 128 tokens at
[14:55] most so I'm going to be creating this
[14:58] helper function
[14:59] which essentially goes through the
[15:02] example right here and does the exact
[15:05] same thing that we've did before but it
[15:07] is actually removing the original or the
[15:10] uh final answer or the correct answer
[15:12] from The Prompt and this is actually the
[15:15] test prom that we're going to be
[15:16] building here is an example of that uh
[15:19] one important thing here to note is that
[15:21] I'm adding add generation prompt equal
[15:23] to true so this will actually add this
[15:28] part to the prompt
[15:30] uh which you don't have to do on your
[15:32] own and again the model is going to be
[15:35] promptly uh formatted
[15:38] promptly all right so this is the
[15:40] example right now and if I run the
[15:44] prompt through the pipeline you'll see
[15:47] that this is the original answer and
[15:50] this is the prediction for our model you
[15:53] can also see that this took us roughly
[15:55] 10 seconds uh in order to produce the
[16:00] uh prediction which is quite slow at
[16:02] least on this GPU but yeah the GPU is
[16:05] quite slow as well
[16:08] so oh this is the first example let's
[16:10] see another
[16:12] one uh how did the company Net earnings
[16:16] amount to in fisal 2022 net earnings
[16:19] were 17.1 billion in fisal 2022 so
[16:23] relatively straightforward question in a
[16:26] context let's see uh you you can see
[16:29] that the answer was pretty simple uh but
[16:32] H 3 was quite verbos at least with the
[16:36] prompt of course uh if you play around
[16:38] with the prompt you might get better
[16:40] results uh but yeah probably uh with
[16:44] some fine tuning you get still better
[16:46] results another example let's see at the
[16:50] answer and very very both answer right
[16:54] here compared to the original very
[16:56] simple answer so uh I'm going to
[17:00] essentially get the 100 example in the
[17:03] test date sets and I'm going to be
[17:04] running the predictions throughout the
[17:08] uh pipeline that we have so we can
[17:10] compare the results at the end to the
[17:12] train model and of course this model is
[17:15] quite verbos I'm not sure if it is
[17:18] correct uh at all of the prompts but at
[17:21] least in my experience I'm not very
[17:24] happy with that and probably I would go
[17:27] with further tuning the model changing
[17:29] it all together uh tuning the prompts or
[17:32] completely fine-tuning it based on the
[17:34] performance that you
[17:36] require another thing that I'm going to
[17:39] show you is uh I've seen a lot of
[17:41] examples of fine-tuning those watch
[17:44] language models but most of the times
[17:47] the wor function was calculated on the
[17:50] complete generation of the text which is
[17:53] something that we don't really want
[17:55] since we want to only judge how well the
[18:00] performance of the generation is doing
[18:03] but not the performance of the already
[18:06] inputed text so what I'm going to do is
[18:09] to get the final token of the head and
[18:12] header ID let me show you this so this
[18:16] is this token right
[18:18] here and after that I'm going to be only
[18:22] uh looking at the was after this token
[18:26] so you can see that this data cator for
[18:29] completion only uh language modeling
[18:32] task is going to be essentially masking
[18:35] the tokens with minus 100 so this will
[18:39] not be calculated during the was so this
[18:41] will also speed up the calculation or
[18:44] the training process that you have and
[18:47] all of the rest tokens are going to be
[18:49] used for calculating the loss
[18:51] essentially so pretty neat trick uh if
[18:54] you want to essentially speed up or get
[18:57] even better results with this type of
[19:00] collator which is available from the
[19:02] Transformers library of course okay so
[19:05] we have the collator we have the DAT set
[19:07] let's see what we have for the model so
[19:11] what I do in order to choose which
[19:13] layers to Target with the War uh fine
[19:17] tuning is uh pretty much I'm going to be
[19:19] choosing each linear layer right here
[19:22] and I would say that the wama
[19:24] architecture is pretty straightforward
[19:26] with the wama decoder layer so I'm going
[19:29] to be using the query key value and then
[19:33] pretty much every linear layer that we
[19:36] have right here and for the MLP part
[19:39] this was the attention part of the
[19:41] architecture if you will and for the uh
[19:44] multilayer perceptron layer whatever uh
[19:48] I'm going to be essentially targeting
[19:50] again all of the layers that are of
[19:54] course linear as well so this is
[19:56] something that is coming from the origin
[19:59] War paper I believe and if I recall
[20:01] correctly they were specifying that you
[20:03] need to Target all the linear layers
[20:05] this is how they get the best results
[20:08] possible and in our case I'm going to
[20:12] specify this linear layers right here
[20:14] within the target modules and I'm going
[20:17] to be specifying the coal language
[20:19] modeling task along with a rank of the
[20:22] war config of 32 and War Alpha of 16 and
[20:27] if you're not familiar with the War
[20:29] fine-tuning uh there is a video on my
[20:31] channel that uh pretty much describes in
[20:33] a bit more detail how war is performing
[20:37] but essentially this is uh you can think
[20:39] of of creating a smaller model on top of
[20:42] the original model and this smaller
[20:44] model you're going to be essentially
[20:46] fine-tuning only the weights of this
[20:47] small model while freezing the lch model
[20:51] on the bottom of it and when a
[20:53] prediction comes uh the prediction is
[20:56] going to go through the original model
[20:58] and then it is going to go through your
[21:00] own fine tuned adapter on top of that so
[21:03] this is the way that I pretty much think
[21:06] of when thinking of War models and then
[21:09] I'm going to be preparing this model for
[21:12] kbit training since we are using
[21:13] quantization right here and then I'm
[21:16] going to be applying the war config on
[21:19] top of the model that we have which is
[21:22] again the original W 3 Model so how many
[21:25] parameters we actually going to train
[21:26] with uh you can see here that of course
[21:29] the model offers roughly uh all the
[21:33] parameters uh are roughly 8 billion
[21:35] parameters while we're going to be
[21:37] training only about
[21:42] 1.34% or roughly 84 million parameters
[21:47] on top of that and this is uh actually a
[21:51] very good Ru of temp if the model is
[21:54] watch enough think of like five six or
[21:57] more billion parameter models then
[21:59] probably 1% or even half% of the
[22:02] parameters uh depending on some
[22:05] experiments that you might do are going
[22:07] to be enough in order to train the model
[22:09] on your specific tasks of course this
[22:11] will depend on the DAT set and the
[22:13] complexity of the task that you're going
[22:15] to be doing but roughly 1 half% 1 and a
[22:20] half% is a good R of temp for larger
[22:24] LS and next I'm going to be wading the
[22:27] tensor board with this model I'm going
[22:29] to go through the training itself in a
[22:31] bit so I want to give a big shout out to
[22:35] Philip Schmidt and I'm going to link
[22:36] down his blog into the description of
[22:38] this video but more importantly he
[22:41] specified this part right here uh which
[22:43] is very important we don't want the
[22:46] tokenizer to add any special tokens and
[22:49] we don't want any additional separator
[22:51] tokens this is provided via the DAT set
[22:53] keyword arguments of the sft trainer uh
[22:57] and again this book post is very nice
[22:59] how to findun L in 2024 with hugging
[23:02] face so go and have a read on top of
[23:05] that so back to our config as you can
[23:08] see we have a lot of configuration here
[23:11] uh I'm specifying the maximum number of
[23:13] tokens uh
[23:15] 512 uh this is based of course on the uh
[23:20] experience that we got with the token
[23:22] counts the text field that we're going
[23:24] to be using is just going to be the text
[23:27] uh we're going to be training for a
[23:28] single Epoch probably it would be great
[23:31] to train for more uh and probably you'll
[23:34] get even better results for example two
[23:36] eox might be great so uh let me know if
[23:39] you train the model for two eox and let
[23:42] me know of the results so I'm going to
[23:44] be training on the T4 so this pretty
[23:47] much allows me to have uh two examples
[23:50] per batch uh I'm going to do the same
[23:53] thing for the evaluation and I'm
[23:55] accumulating for four this is actually
[23:58] for 4 * 2 so the gradient accumulation
[24:00] is going to be doing eight samples for
[24:03] the gradient update which is uh quite
[24:06] good at least on a single GPU uh I'm
[24:09] going to be using the special item with
[24:11] wayk fix page Optimizer that is uh I
[24:15] believe coming from the bits and B
[24:17] Library as well and this is for the 8bit
[24:20] optimization so this Optimizer is quite
[24:22] good it appears to be working quite well
[24:24] and quite fast on top of that uh next
[24:27] I'm going to be ass
[24:29] evaluating every uh 20% of the training
[24:32] process and uh running through the Valu
[24:35] U sorry the validation set I have a very
[24:38] small warning rate which appears to be
[24:40] working quite all right uh also I have a
[24:44] very small warm up ratio about 10% so
[24:46] during this time uh yeah actually this
[24:49] is quite redundant since I'm using a
[24:52] constant uh warning rate schedule but
[24:55] I've tried with linear it appears to be
[24:58] doing something but not that impressed
[25:01] with it and I want the responses or the
[25:03] results to be in a safe tensor format
[25:06] and these are the arguments that I'm
[25:08] going to be essentially getting from the
[25:09] Philip Schmid blog post that I've shown
[25:11] you and I'm seeding the training process
[25:15] itself not really sure if this is going
[25:17] to be completely reproducible for you
[25:20] but it appears to be doing something for
[25:22] the seating of the values at least uh
[25:24] when you have the correctly seated data
[25:27] set and then the training itself is
[25:30] quite straightforward I'm going to be
[25:31] passing the configuration the model the
[25:33] DAT set for training for the validation
[25:35] the tokenizer and the cleor uh which is
[25:38] again going to be calculating the was
[25:41] only on the parts that are going to get
[25:43] completed by the model and then uh you
[25:47] can see that I'm essentially calling the
[25:49] dot train method and this is the result
[25:53] from this you can see that the training
[25:56] is uh some somewhat junky if you will uh
[26:00] but it goes quite well the validation
[26:04] was on the other hand is also uh
[26:07] decreasing somewhat but it is quite
[26:11] slower in the decrease rate uh I recall
[26:14] that we have only 500 examples for the
[26:16] validation probably if you increase that
[26:18] to let's say 1,000 or 2,000 you will
[26:21] probably get a much smoother validation
[26:23] most and again if you train the model
[26:26] for a bit longer you probably get some
[26:29] more of better results as well okay so
[26:32] after this is complete I'm going to be
[26:34] saving the model into our uh loal
[26:39] repository or file system and after that
[26:43] I'm going to be essentially Waring the
[26:46] model uh again this is done on the
[26:50] another actually I did this on a p100
[26:53] since the GPU memory for the T4 wasn't
[26:56] enough to Lo the model without the
[26:59] tokenization or the quantization sorry
[27:02] and I essentially wed the model with the
[27:04] p 100 uh GPU applied the P adapter on
[27:09] top of that and then merged the model
[27:11] into a single model and what I did after
[27:14] that is to essentially upload the model
[27:17] and the tokenizer to the hugging face
[27:20] Hub and I wanted this to be split into
[27:24] maximum shite of 5 GB so this is the
[27:28] public model that is available on the H
[27:30] face models it is called W 38b instruct
[27:33] Finance R and here you can find the
[27:35] complete text tutorial or sample
[27:37] examples along with some of the
[27:39] predictions that I got from this model
[27:41] uh more importantly you can find the
[27:43] files you can see these are essentially
[27:46] the tensors with the sharts of 5 GB at
[27:49] most which is quite good along with the
[27:53] generation config and the tokenizer
[27:55] itself along with a sample of the
[27:57] predictions
[27:58] and then we also have the training
[28:00] metrics that are available for the
[28:03] tensor board and uh let's see what we
[28:06] got here I'm going to show you
[28:11] something so here you can look through
[28:14] the
[28:15] complete training process you can see
[28:18] that it took at least for the validation
[28:20] was uh hour and a half and it appears to
[28:24] be performing quite well again probably
[28:27] you're going to be uh quite happy with
[28:29] deploying
[28:30] this or earning this for a bit longer
[28:34] and this is uh the training course uh
[28:36] roughly hour and 40 minutes but again
[28:39] the complete training walk is available
[28:42] within the H face repository so we have
[28:45] the trend model and now I'm going to
[28:48] essentially W our data set once more and
[28:52] just for producibility of course and I'm
[28:55] going to be downloading the model from
[28:57] the huging face up I'm going to be
[28:59] applying the quantization that I did
[29:01] with the original model so we are going
[29:03] to be doing a completely Fair uh
[29:07] comparison between the base model and
[29:08] the finetune version of the model also
[29:11] I'm going to be uh getting the tokenizer
[29:14] from our own repository since it
[29:16] contains all the padding config Etc and
[29:20] this is going to be go aheading and
[29:22] getting all of the data for our model
[29:25] again I'm going to be creating a
[29:27] pipeline and in this pipeline I'm going
[29:29] to be seing or expecting at most 128
[29:33] tokens so this is again the first uh the
[29:37] first response that I got and this is
[29:40] now the prediction of the model uh I'm
[29:43] going to show you a couple of
[29:44] comparisons in a bit but this is now
[29:47] much more aligned with what we have in
[29:49] the original data set not these bullet
[29:52] list points that we got in the original
[29:55] uh next the answer from the prediction
[29:59] here again quite uh Compact and very
[30:02] like what we get in the data set here
[30:06] next I'm going to show you another
[30:08] example uh here you can see that our
[30:10] even our fun model is quite
[30:14] verbos yeah it did uh provide a lot of
[30:17] text but again uh the response is
[30:20] correct let's see how many examples we
[30:23] are going to be getting here and how
[30:26] we're going to compare those to the
[30:28] train prediction so this is the
[30:30] predictions data frame and I'm going to
[30:32] be essentially creating or adding those
[30:35] predictions of with the train
[30:38] model uh so I'm going to be taking a
[30:41] sample of 20 examples and we're going to
[30:44] go through some examples together uh the
[30:47] first example this is the train model
[30:49] and this is the untrained one uh you can
[30:51] see that we got a much better response
[30:53] from the train model at least based on
[30:56] our qualitive uh analysis again here the
[31:00] formatting and the words appear to be
[31:03] quite
[31:04] well matched to the ones that we have
[31:08] from the uh train model compared to the
[31:11] untrained
[31:13] model okay
[31:16] next uh you can see that the train model
[31:19] is actually providing a very short
[31:23] response compared to the answer in the
[31:25] data set uh on that case I'm not really
[31:29] sure if this is completely answering the
[31:31] question but at least it appears to be
[31:34] that our model is uh very biased towards
[31:37] shorter answers on some occasions of
[31:39] course uh okay so uh here another
[31:45] example mechanical engineering from
[31:47] University of California and from
[31:50] Stanford School Etc again this appears
[31:53] to be quite well
[31:55] written and this is uh let's say an
[31:59] additional word that I would not like to
[32:02] see into my rock system uh and this is
[32:06] the case when you don't fine tune at
[32:08] least you're prompt enough with those
[32:11] types of models something that we are
[32:12] not seeing into the fine tune model
[32:15] again a very good example based on our
[32:17] fine
[32:18] tuning uh another
[32:21] example where the unra model is adding a
[32:24] bit more verbosity and uh some
[32:26] formatting that is actually not
[32:30] needed concrete number here well the
[32:34] untrained one has a lot of verbosity
[32:37] yeah you can you can go through those
[32:38] examples and you'll probably be quite
[32:42] happy with the results that you get from
[32:44] the fine tuning and probably if you do
[32:46] some more fine tuning you'll be even
[32:48] happier with the results so this is it
[32:50] for this video we've seen how you can f
[32:52] tune a w 38b instruct model on a custom
[32:56] data set and we've seen how much better
[32:59] this model is performing based on our
[33:01] fine tuning compared to the base model
[33:04] so what do you think is this model
[33:06] performing much better or is it
[33:09] exceeding your expectations let me know
[33:11] down into the comments below thanks for
[33:13] watching guys please like share and
[33:15] subscribe also join the Discord channel
[33:18] that I'm going to link down into the
[33:19] description and I'm going to see you in
[33:21] the next one bye