[0:00] what can you do to improve the [0:02] performance of your watch language model [0:05] for your specific use case hey everyone [0:08] my name is Vin and in this video we're [0:10] going to see how you can find you a [0:12] watch language model on a custom data [0:14] set here we're going to be using W 38b [0:18] instruct model and we are going to be [0:20] fine-tuning it for a rock application [0:22] for financial data let's get started if [0:25] you want to follow along there is a [0:27] complete text tutorial that is available [0:29] for m expert Pro subscribers and it is [0:32] right under the boot camp and then fine [0:34] tuning W3 L for R here you can find the [0:38] complete text tutorial along with the [0:40] source code and explanations on each of [0:43] the steps that we're going to do along [0:45] with a link to a Google clap notebook so [0:48] if you want to support my work please [0:50] consider subscribing for M expert pro [0:52] thank you here is the process that we're [0:55] going to go through in order to find you [0:57] our W 3 model for our specific task [1:01] first we're going to be building a data [1:04] set that is based on custom prompts [1:07] provided from a Json file that I'm going [1:09] to show you how you can transform into [1:12] hugging phase data set then we're going [1:14] to be choosing and evaluating the [1:16] initial performance of the base model in [1:18] our case this is going to be the W 38b [1:21] instruct model then we're going to be [1:23] setting up an adapter and in our case [1:26] this is going to be a war adapter that [1:29] we're going to be using using in order [1:30] to tune on top of the original W 3 Model [1:35] since the W 3 Model is quite large and [1:38] probably you're not going to be able to [1:40] do a fine-tuning of the complete model [1:42] on a single GPU then we are going to be [1:46] continuing with training and monitoring [1:48] the training process I'm going to show [1:50] you the results that I got and this [1:52] model was trained in roughly 2 hours for [1:55] a single ook then we're going to be [1:58] creating an evaluation on a previously [2:01] created test set and based on this [2:03] evaluation we're going to be merging the [2:06] based model that we have and we're going [2:08] to be pushing the model to H face Hub [2:11] and I'm going to show you uh some [2:14] examples on how the trained model is [2:16] comparing the predictions to the [2:18] untrained model the data set that we're [2:21] going to be using is available on the [2:23] huging face data sets it is called [2:25] Financial Q&A 10K and here you can find [2:29] roughly 7 ,000 examples that are [2:33] essentially paired with a question [2:35] context and an answer these are the [2:38] columns that we're going to be using of [2:39] course uh you can infer from the name [2:42] that this is actually a financial data [2:44] set and uh you can see that uh the two [2:47] additional coms are filing and then [2:49] ticker we are not going to be using [2:51] those but we are going to be uh [2:53] deploying the question answer and the [2:55] context the base model that we're going [2:58] to be using is the original [3:00] wama 38b instruct model by meta AI which [3:03] is also available on the H face models [3:06] repository and this model is going to be [3:09] a we're going to be able to put this [3:11] model on a single GPU with a [3:13] quantization to four bit parameters and [3:16] I'm going to show you how to do that [3:17] into the co notebook other than that a [3:21] thing that you should know about this [3:22] model is that it has a Contex length of [3:25] 8K tokens which will be quite more than [3:28] we need in order to find tun for our [3:30] specific data set and this model has to [3:33] be one of the better open models that [3:36] you can use uh at least today so we're [3:39] going to be fine-tuning this another [3:41] bonus of this model is that it has a [3:43] chat template which uh is provided by [3:46] the tokenizer as you can see here and we [3:49] are going to be using this chat template [3:51] in order to further fine-tune this model [3:54] I have the Google clap notebook now [3:56] opened and as you can see first I'm [3:58] starting with showing you that the [4:00] actual GPU that I've used during this [4:03] fine tuning was a T4 I'm going to show [4:06] you how we can fit the model on the T4 [4:08] GPU in a bit and here I'm installing [4:10] pretty much the latest versions of the [4:12] torch Library the Transformers Library [4:15] data set since we're going to be [4:16] downloading the data set from the [4:18] hanging face repository then the [4:20] accelerate library and bits and bites [4:22] which we're going to be using for the [4:23] quantization of the model then uh for [4:26] the war setup we're going to be using [4:28] the P Library then we're going to be [4:31] using the TRL sft trainer or supervised [4:34] fine tuning uh trainer that is provided [4:37] by this labrary and then the covered [4:39] Library which I'm going to show you why [4:40] we're going to be using in a bit uh we [4:43] have a lot of imports and most of those [4:45] are based on the fact that I'm going to [4:48] show you a couple of uh plots uh but the [4:51] more important thing here is that I'm [4:54] seeding the uh torch and the numai and [4:58] the random uh from the python with a [5:01] seat and then I'm specifying a p token [5:04] I'm going to show you how you can apply [5:06] the P token to the tokenizer since the [5:09] tokenizer at least for w 38b instruct [5:12] model doesn't come with a PO token [5:14] included so we're going to be doing just [5:17] that in a bit and then uh I'm going to [5:20] be having a a constant for the original [5:24] model and then the new model that I'm [5:25] going to show you how you can push to [5:27] the Hang face Repository [5:30] so first I'm going to start with uh [5:32] creating the configuration for the model [5:34] itself and here you can see that I have [5:37] something very basic I'm loading the [5:39] model into 4 bit and I'm using the new [5:43] word nf4 uh format for the Quant type of [5:47] the 4bit model and uh here I'm saying [5:50] that the compute type which are we're [5:52] going to be using for the computational [5:54] part of this model is going to be a [5:56] binary for 16 uh other than that this is [5:59] pretty much a very standard [6:02] configuration for wading the model into [6:04] 4bit format uh next I'm going to show [6:07] you that uh we are actually downloading [6:10] the original tokenizer from The Meta [6:12] repository and I'm adding a p token [6:16] which is going to be this P token [6:18] constant right here and I'm setting the [6:20] Ping side to the right this is just in [6:22] case uh if this is not set already and [6:26] then I'm loading the model from the [6:28] quantization and then after I've [6:31] downloaded or loaded the model you'll [6:34] see that I'm actually expanding or [6:36] resizing the token in Bings for this [6:38] model based on the length of the [6:39] tokenizer since we've added a new token [6:42] right here now why I do that uh from [6:45] what I found if you're training with [6:48] more than one training example per batch [6:52] I've seen that usually the embeddings or [6:56] the tokenizer is getting scrumbled and [6:59] it appears that the at least the was and [7:02] the responses don't get very good and [7:05] what I found is that the models continue [7:07] to Jumble or try to speak a lot and [7:10] repeat some of the sentences if I set [7:13] this padding token it appears that uh [7:16] the model is actually stopping to [7:17] generate itself as it should and uh this [7:21] actually helped me to consider that also [7:24] I would like to know that I've tried to [7:26] actually fine tune the base model uh [7:29] that that is the model that didn't uh [7:31] include any instruct fine tuning and on [7:34] that model also without the P token uh [7:37] it appears that uh this model continues [7:39] to uh repeat the text uh forever and [7:42] ever so if you have another solution to [7:44] this problem please let me know down [7:46] into the to the comments of this video [7:49] uh and you'll see that we're downloading [7:51] the model uh you can see that the model [7:53] was able to be wed successfully and this [7:56] is the config you can see that we are [7:59] actually only adding the quantization [8:01] config right [8:03] here uh other than that uh I'm showing [8:06] you the beginning of SE of sequence [8:09] token the end of sequence token and the [8:12] new P token that we've added those are [8:15] already into our tokenizer okay so I'm [8:19] going to continue with the original data [8:21] set and here I'm going to show you how [8:24] you can essentially create your own [8:26] custom data set so you don't have to [8:28] rely on on some pre-processed data set [8:31] and for example you can have a data [8:33] frame or Json and from that uh you can [8:36] actually create your own custom hugging [8:39] phase data set so I'm going to start by [8:42] downloading the original hugging face [8:45] data set and I'm going to convert it [8:46] into a data frame I'm going to see a [8:49] couple of examples right here these are [8:51] the columns that we have originally and [8:54] the first thing is as I've already told [8:56] you I'm going to convert this data set [8:58] into a data frame so this is something [9:00] that you might have in the real world uh [9:03] for example a data frame or a CSV file [9:05] or uh you can have some SQL or uh SQL [9:10] database that you can convert into a CSV [9:12] file or a data frame and from here we're [9:15] going to be building our custom data set [9:17] and this is how uh I'm going to do this [9:21] so first uh something that I really like [9:24] to do is to check whether or not this [9:27] data set contains any new values since [9:29] this will probably W up our gradients [9:33] during training and our was is not going [9:35] to be very happy with that so I see that [9:38] pretty much uh everything is here we [9:41] have 7,000 examples and then after this [9:44] is complete I'm going to be building [9:46] this function called format example in [9:49] which I'm going to be using the question [9:52] the answer and the context for a [9:54] specific question along with this very [9:57] simple system prompt on top of that I'm [10:00] going to be calling apply chat template [10:03] and I don't want this to get tokenized [10:05] so in order to get these messages and [10:09] run through this I'm going to show you [10:12] that this is going to be running through [10:15] every example and I'm going to be adding [10:18] a new com com text to our data [10:21] frame and then I'm going to continue [10:23] with counting the actual tokens that our [10:27] tokenizer is going to be doing in order [10:30] to have their count into our final data [10:33] frame and this is something that you [10:35] might get uh for example here is a data [10:39] frame or a sample of the first couple of [10:42] examples five to be exact and you can [10:44] see the question the context the answer [10:46] and now we have the text along with a [10:49] token count for each text I'm going to [10:52] show you why we're going to be using [10:54] this but let's see a simple example or [10:57] the first example that we get [10:59] from the text here you can see that the [11:02] tokenizer has added all of the specific [11:05] tokens that are actually included within [11:08] the template you can see the system [11:10] prompt then you can see that uh the [11:14] question is actually [11:17] here sorry this is uh the system prompt [11:20] then this is the question from our [11:22] specific case and then this is the [11:24] context provided here between these [11:27] triple digs uh this is ending right here [11:31] and then we have a answer from the [11:34] assistant so this is going to be the [11:37] answer from our data set and then we [11:39] have end of sequence ID token at the end [11:42] so this is pretty much the format that [11:44] the model is going to be receiving our [11:46] texting and then I'm showing you a [11:49] histogram or let's say a plot that tells [11:53] how often tokens be between for example [11:57] 100 uh Zer and 200 100 Etc tokens are [12:02] relevant here and you can see our data [12:04] set is heavily skewed towards uh 300 or [12:08] less tokens right here which is a good [12:11] thing since we want to reduce the number [12:14] of tokens that we're going to be using [12:16] in order to have a a faster training so [12:20] this is a good for us and uh I'm going [12:23] to be actually reducing the number of [12:25] tokens under 512 and in our case we [12:30] seeing that only three of the examples [12:32] right here have more than 5 12 tokens so [12:35] what I'm going to do is to actually [12:37] remove those [12:39] examples uh and then I'm going to sample [12:42] uh 6,000 examples and based on that I'm [12:46] going to be splitting those into a train [12:48] validation and test sets so to continue [12:51] with that I'm going to be using the [12:53] train test split from the sk1 library [12:56] I'm going to be first creating a train [12:58] set and then the rest of the data set [13:01] I'm going to be splitting that into a [13:02] validation and test sets so these are [13:05] the results that I have and from that [13:07] I'm going to be saving roughly 4,000 [13:10] examples for training 500 for validation [13:14] and4 testing and this essentially is [13:18] going to be our data set that we're [13:21] going to be building and I'm going to be [13:23] using two Json on the data frame that we [13:27] have uh I'm going to orient towards the [13:29] records and I want this to be stored as [13:32] Json wines or Json l so essentially what [13:36] I'm going to do next is to get or W our [13:40] custom data set that we've just created [13:42] and this is essentially how you are [13:45] going to be wading a Json file and this [13:47] is the mapping between the Json files so [13:51] what we have here is our own custom data [13:53] set that we pre-processed enabled and [13:55] created finally based on the Json and [13:58] then uh at the was step we're actually [14:00] loading our own custom data set so this [14:03] is essentially the process that you need [14:06] to follow in order to build a data set [14:09] for fine-tuning your [14:12] L next I'm going to show you that uh [14:15] actually our data set is correctly split [14:17] you can see the number of rows right [14:19] here and I'm going to just be looking at [14:22] another example of the text which is [14:24] again a text with all of the tokens that [14:27] are needed to be applied based on the [14:29] chat template okay so next we're going [14:33] to continue with testing the original [14:36] model this is be before fine-tuning the [14:40] base model that is I'm going to be [14:42] creating this pipeline I'm going to be [14:44] pipelining the model in the tokenizer [14:47] this is for the text generation task and [14:49] I want this to produce as much as uh [14:52] 128 tokens at [14:55] most so I'm going to be creating this [14:58] helper function [14:59] which essentially goes through the [15:02] example right here and does the exact [15:05] same thing that we've did before but it [15:07] is actually removing the original or the [15:10] uh final answer or the correct answer [15:12] from The Prompt and this is actually the [15:15] test prom that we're going to be [15:16] building here is an example of that uh [15:19] one important thing here to note is that [15:21] I'm adding add generation prompt equal [15:23] to true so this will actually add this [15:28] part to the prompt [15:30] uh which you don't have to do on your [15:32] own and again the model is going to be [15:35] promptly uh formatted [15:38] promptly all right so this is the [15:40] example right now and if I run the [15:44] prompt through the pipeline you'll see [15:47] that this is the original answer and [15:50] this is the prediction for our model you [15:53] can also see that this took us roughly [15:55] 10 seconds uh in order to produce the [16:00] uh prediction which is quite slow at [16:02] least on this GPU but yeah the GPU is [16:05] quite slow as well [16:08] so oh this is the first example let's [16:10] see another [16:12] one uh how did the company Net earnings [16:16] amount to in fisal 2022 net earnings [16:19] were 17.1 billion in fisal 2022 so [16:23] relatively straightforward question in a [16:26] context let's see uh you you can see [16:29] that the answer was pretty simple uh but [16:32] H 3 was quite verbos at least with the [16:36] prompt of course uh if you play around [16:38] with the prompt you might get better [16:40] results uh but yeah probably uh with [16:44] some fine tuning you get still better [16:46] results another example let's see at the [16:50] answer and very very both answer right [16:54] here compared to the original very [16:56] simple answer so uh I'm going to [17:00] essentially get the 100 example in the [17:03] test date sets and I'm going to be [17:04] running the predictions throughout the [17:08] uh pipeline that we have so we can [17:10] compare the results at the end to the [17:12] train model and of course this model is [17:15] quite verbos I'm not sure if it is [17:18] correct uh at all of the prompts but at [17:21] least in my experience I'm not very [17:24] happy with that and probably I would go [17:27] with further tuning the model changing [17:29] it all together uh tuning the prompts or [17:32] completely fine-tuning it based on the [17:34] performance that you [17:36] require another thing that I'm going to [17:39] show you is uh I've seen a lot of [17:41] examples of fine-tuning those watch [17:44] language models but most of the times [17:47] the wor function was calculated on the [17:50] complete generation of the text which is [17:53] something that we don't really want [17:55] since we want to only judge how well the [18:00] performance of the generation is doing [18:03] but not the performance of the already [18:06] inputed text so what I'm going to do is [18:09] to get the final token of the head and [18:12] header ID let me show you this so this [18:16] is this token right [18:18] here and after that I'm going to be only [18:22] uh looking at the was after this token [18:26] so you can see that this data cator for [18:29] completion only uh language modeling [18:32] task is going to be essentially masking [18:35] the tokens with minus 100 so this will [18:39] not be calculated during the was so this [18:41] will also speed up the calculation or [18:44] the training process that you have and [18:47] all of the rest tokens are going to be [18:49] used for calculating the loss [18:51] essentially so pretty neat trick uh if [18:54] you want to essentially speed up or get [18:57] even better results with this type of [19:00] collator which is available from the [19:02] Transformers library of course okay so [19:05] we have the collator we have the DAT set [19:07] let's see what we have for the model so [19:11] what I do in order to choose which [19:13] layers to Target with the War uh fine [19:17] tuning is uh pretty much I'm going to be [19:19] choosing each linear layer right here [19:22] and I would say that the wama [19:24] architecture is pretty straightforward [19:26] with the wama decoder layer so I'm going [19:29] to be using the query key value and then [19:33] pretty much every linear layer that we [19:36] have right here and for the MLP part [19:39] this was the attention part of the [19:41] architecture if you will and for the uh [19:44] multilayer perceptron layer whatever uh [19:48] I'm going to be essentially targeting [19:50] again all of the layers that are of [19:54] course linear as well so this is [19:56] something that is coming from the origin [19:59] War paper I believe and if I recall [20:01] correctly they were specifying that you [20:03] need to Target all the linear layers [20:05] this is how they get the best results [20:08] possible and in our case I'm going to [20:12] specify this linear layers right here [20:14] within the target modules and I'm going [20:17] to be specifying the coal language [20:19] modeling task along with a rank of the [20:22] war config of 32 and War Alpha of 16 and [20:27] if you're not familiar with the War [20:29] fine-tuning uh there is a video on my [20:31] channel that uh pretty much describes in [20:33] a bit more detail how war is performing [20:37] but essentially this is uh you can think [20:39] of of creating a smaller model on top of [20:42] the original model and this smaller [20:44] model you're going to be essentially [20:46] fine-tuning only the weights of this [20:47] small model while freezing the lch model [20:51] on the bottom of it and when a [20:53] prediction comes uh the prediction is [20:56] going to go through the original model [20:58] and then it is going to go through your [21:00] own fine tuned adapter on top of that so [21:03] this is the way that I pretty much think [21:06] of when thinking of War models and then [21:09] I'm going to be preparing this model for [21:12] kbit training since we are using [21:13] quantization right here and then I'm [21:16] going to be applying the war config on [21:19] top of the model that we have which is [21:22] again the original W 3 Model so how many [21:25] parameters we actually going to train [21:26] with uh you can see here that of course [21:29] the model offers roughly uh all the [21:33] parameters uh are roughly 8 billion [21:35] parameters while we're going to be [21:37] training only about [21:42] 1.34% or roughly 84 million parameters [21:47] on top of that and this is uh actually a [21:51] very good Ru of temp if the model is [21:54] watch enough think of like five six or [21:57] more billion parameter models then [21:59] probably 1% or even half% of the [22:02] parameters uh depending on some [22:05] experiments that you might do are going [22:07] to be enough in order to train the model [22:09] on your specific tasks of course this [22:11] will depend on the DAT set and the [22:13] complexity of the task that you're going [22:15] to be doing but roughly 1 half% 1 and a [22:20] half% is a good R of temp for larger [22:24] LS and next I'm going to be wading the [22:27] tensor board with this model I'm going [22:29] to go through the training itself in a [22:31] bit so I want to give a big shout out to [22:35] Philip Schmidt and I'm going to link [22:36] down his blog into the description of [22:38] this video but more importantly he [22:41] specified this part right here uh which [22:43] is very important we don't want the [22:46] tokenizer to add any special tokens and [22:49] we don't want any additional separator [22:51] tokens this is provided via the DAT set [22:53] keyword arguments of the sft trainer uh [22:57] and again this book post is very nice [22:59] how to findun L in 2024 with hugging [23:02] face so go and have a read on top of [23:05] that so back to our config as you can [23:08] see we have a lot of configuration here [23:11] uh I'm specifying the maximum number of [23:13] tokens uh [23:15] 512 uh this is based of course on the uh [23:20] experience that we got with the token [23:22] counts the text field that we're going [23:24] to be using is just going to be the text [23:27] uh we're going to be training for a [23:28] single Epoch probably it would be great [23:31] to train for more uh and probably you'll [23:34] get even better results for example two [23:36] eox might be great so uh let me know if [23:39] you train the model for two eox and let [23:42] me know of the results so I'm going to [23:44] be training on the T4 so this pretty [23:47] much allows me to have uh two examples [23:50] per batch uh I'm going to do the same [23:53] thing for the evaluation and I'm [23:55] accumulating for four this is actually [23:58] for 4 * 2 so the gradient accumulation [24:00] is going to be doing eight samples for [24:03] the gradient update which is uh quite [24:06] good at least on a single GPU uh I'm [24:09] going to be using the special item with [24:11] wayk fix page Optimizer that is uh I [24:15] believe coming from the bits and B [24:17] Library as well and this is for the 8bit [24:20] optimization so this Optimizer is quite [24:22] good it appears to be working quite well [24:24] and quite fast on top of that uh next [24:27] I'm going to be ass [24:29] evaluating every uh 20% of the training [24:32] process and uh running through the Valu [24:35] U sorry the validation set I have a very [24:38] small warning rate which appears to be [24:40] working quite all right uh also I have a [24:44] very small warm up ratio about 10% so [24:46] during this time uh yeah actually this [24:49] is quite redundant since I'm using a [24:52] constant uh warning rate schedule but [24:55] I've tried with linear it appears to be [24:58] doing something but not that impressed [25:01] with it and I want the responses or the [25:03] results to be in a safe tensor format [25:06] and these are the arguments that I'm [25:08] going to be essentially getting from the [25:09] Philip Schmid blog post that I've shown [25:11] you and I'm seeding the training process [25:15] itself not really sure if this is going [25:17] to be completely reproducible for you [25:20] but it appears to be doing something for [25:22] the seating of the values at least uh [25:24] when you have the correctly seated data [25:27] set and then the training itself is [25:30] quite straightforward I'm going to be [25:31] passing the configuration the model the [25:33] DAT set for training for the validation [25:35] the tokenizer and the cleor uh which is [25:38] again going to be calculating the was [25:41] only on the parts that are going to get [25:43] completed by the model and then uh you [25:47] can see that I'm essentially calling the [25:49] dot train method and this is the result [25:53] from this you can see that the training [25:56] is uh some somewhat junky if you will uh [26:00] but it goes quite well the validation [26:04] was on the other hand is also uh [26:07] decreasing somewhat but it is quite [26:11] slower in the decrease rate uh I recall [26:14] that we have only 500 examples for the [26:16] validation probably if you increase that [26:18] to let's say 1,000 or 2,000 you will [26:21] probably get a much smoother validation [26:23] most and again if you train the model [26:26] for a bit longer you probably get some [26:29] more of better results as well okay so [26:32] after this is complete I'm going to be [26:34] saving the model into our uh loal [26:39] repository or file system and after that [26:43] I'm going to be essentially Waring the [26:46] model uh again this is done on the [26:50] another actually I did this on a p100 [26:53] since the GPU memory for the T4 wasn't [26:56] enough to Lo the model without the [26:59] tokenization or the quantization sorry [27:02] and I essentially wed the model with the [27:04] p 100 uh GPU applied the P adapter on [27:09] top of that and then merged the model [27:11] into a single model and what I did after [27:14] that is to essentially upload the model [27:17] and the tokenizer to the hugging face [27:20] Hub and I wanted this to be split into [27:24] maximum shite of 5 GB so this is the [27:28] public model that is available on the H [27:30] face models it is called W 38b instruct [27:33] Finance R and here you can find the [27:35] complete text tutorial or sample [27:37] examples along with some of the [27:39] predictions that I got from this model [27:41] uh more importantly you can find the [27:43] files you can see these are essentially [27:46] the tensors with the sharts of 5 GB at [27:49] most which is quite good along with the [27:53] generation config and the tokenizer [27:55] itself along with a sample of the [27:57] predictions [27:58] and then we also have the training [28:00] metrics that are available for the [28:03] tensor board and uh let's see what we [28:06] got here I'm going to show you [28:11] something so here you can look through [28:14] the [28:15] complete training process you can see [28:18] that it took at least for the validation [28:20] was uh hour and a half and it appears to [28:24] be performing quite well again probably [28:27] you're going to be uh quite happy with [28:29] deploying [28:30] this or earning this for a bit longer [28:34] and this is uh the training course uh [28:36] roughly hour and 40 minutes but again [28:39] the complete training walk is available [28:42] within the H face repository so we have [28:45] the trend model and now I'm going to [28:48] essentially W our data set once more and [28:52] just for producibility of course and I'm [28:55] going to be downloading the model from [28:57] the huging face up I'm going to be [28:59] applying the quantization that I did [29:01] with the original model so we are going [29:03] to be doing a completely Fair uh [29:07] comparison between the base model and [29:08] the finetune version of the model also [29:11] I'm going to be uh getting the tokenizer [29:14] from our own repository since it [29:16] contains all the padding config Etc and [29:20] this is going to be go aheading and [29:22] getting all of the data for our model [29:25] again I'm going to be creating a [29:27] pipeline and in this pipeline I'm going [29:29] to be seing or expecting at most 128 [29:33] tokens so this is again the first uh the [29:37] first response that I got and this is [29:40] now the prediction of the model uh I'm [29:43] going to show you a couple of [29:44] comparisons in a bit but this is now [29:47] much more aligned with what we have in [29:49] the original data set not these bullet [29:52] list points that we got in the original [29:55] uh next the answer from the prediction [29:59] here again quite uh Compact and very [30:02] like what we get in the data set here [30:06] next I'm going to show you another [30:08] example uh here you can see that our [30:10] even our fun model is quite [30:14] verbos yeah it did uh provide a lot of [30:17] text but again uh the response is [30:20] correct let's see how many examples we [30:23] are going to be getting here and how [30:26] we're going to compare those to the [30:28] train prediction so this is the [30:30] predictions data frame and I'm going to [30:32] be essentially creating or adding those [30:35] predictions of with the train [30:38] model uh so I'm going to be taking a [30:41] sample of 20 examples and we're going to [30:44] go through some examples together uh the [30:47] first example this is the train model [30:49] and this is the untrained one uh you can [30:51] see that we got a much better response [30:53] from the train model at least based on [30:56] our qualitive uh analysis again here the [31:00] formatting and the words appear to be [31:03] quite [31:04] well matched to the ones that we have [31:08] from the uh train model compared to the [31:11] untrained [31:13] model okay [31:16] next uh you can see that the train model [31:19] is actually providing a very short [31:23] response compared to the answer in the [31:25] data set uh on that case I'm not really [31:29] sure if this is completely answering the [31:31] question but at least it appears to be [31:34] that our model is uh very biased towards [31:37] shorter answers on some occasions of [31:39] course uh okay so uh here another [31:45] example mechanical engineering from [31:47] University of California and from [31:50] Stanford School Etc again this appears [31:53] to be quite well [31:55] written and this is uh let's say an [31:59] additional word that I would not like to [32:02] see into my rock system uh and this is [32:06] the case when you don't fine tune at [32:08] least you're prompt enough with those [32:11] types of models something that we are [32:12] not seeing into the fine tune model [32:15] again a very good example based on our [32:17] fine [32:18] tuning uh another [32:21] example where the unra model is adding a [32:24] bit more verbosity and uh some [32:26] formatting that is actually not [32:30] needed concrete number here well the [32:34] untrained one has a lot of verbosity [32:37] yeah you can you can go through those [32:38] examples and you'll probably be quite [32:42] happy with the results that you get from [32:44] the fine tuning and probably if you do [32:46] some more fine tuning you'll be even [32:48] happier with the results so this is it [32:50] for this video we've seen how you can f [32:52] tune a w 38b instruct model on a custom [32:56] data set and we've seen how much better [32:59] this model is performing based on our [33:01] fine tuning compared to the base model [33:04] so what do you think is this model [33:06] performing much better or is it [33:09] exceeding your expectations let me know [33:11] down into the comments below thanks for [33:13] watching guys please like share and [33:15] subscribe also join the Discord channel [33:18] that I'm going to link down into the [33:19] description and I'm going to see you in [33:21] the next one bye