[0:00] everybody training new large language [0:02] models are training them out the box for [0:04] chat a chat trained llm is like an [0:07] intelligent student who finished general [0:09] education in 50 languages now think of [0:12] fine-tuning these llms like kicking the [0:14] broke General ed student out of the [0:16] house and choosing exactly what [0:18] specialized degree they will get in this [0:20] video I'm going to demystify the concept [0:22] of fine-tuning a language model no [0:25] programming experience will be needed to [0:27] follow along this tutorial I'll be [0:29] showing how to fine tune meta's latest [0:31] llama 38 billion parameter model on free [0:35] gpus in Google collab let's discuss a [0:38] huge problem with implementation of [0:40] every team of AI agent project we have [0:43] seen to date it started with auto GPT [0:46] and now the current leader in hype crew [0:49] AI the reason most teams of AI agents [0:52] that people are creating to attempt to [0:54] accomplish complex tasks sucks is [0:57] because they are using raw large [0:59] language models trained to respond as [1:02] intelligent chat Bots and not as action [1:04] models out of the box these models are [1:07] trained to have decent general [1:09] intelligence and serve a general [1:11] audience like chat gbt as an AI software [1:14] engineer the first step I would take to [1:16] develop a system to outperform any of [1:19] these current existing AI agent swarms [1:22] is to train a decision-making model [1:24] based on first principal reasoning I [1:26] think it's also important to reason from [1:28] first principles rather than by analogy [1:31] the normal way that we conduct Our Lives [1:32] is we Reason by analogy we're doing this [1:34] because it's like something else that [1:36] was done hold up wait a minute and what [1:39] that really means is you kind of boil [1:41] things down to the most fundamental [1:43] truths say okay what are we sure as [1:44] possible is true and then reason up from [1:47] there that takes a lot more mental [1:48] energy think of each response from an [1:50] llm as a thought we want this [1:53] intelligent AI to generate in 10 seconds [1:56] or less instead of expecting our AI to [1:58] use first principle reasoning break the [2:01] prompt into multiple needles in a [2:03] haystack of facts the model needs to [2:05] understand or tasks the AI needs to [2:08] complete all in a single response in [2:10] this video I will fine-tune a llama [2:12] model to power the first AI agent of [2:14] contact in a team of AI agents I want [2:17] the AI agent to be able to use first [2:19] principal reasoning to Output highly [2:22] logical order of steps that need to be [2:24] completed to actually provide a factual [2:27] response and better yet automate a [2:29] complex task before we get into [2:31] fine-tuning a model let's first demo raw [2:34] llama 3 8bs ability to accomplish just [2:37] the task of generating first principal [2:39] reasoning outputs if I go try to get [2:42] llama 38b to act as a decision model the [2:45] reasoning is as though it came from a [2:47] mind who wasn't trained to make actions [2:49] in the real world tell llama to respond [2:52] with just a python list and it [2:54] constantly adds notes before or after [2:56] the list making the responses unreliable [2:59] as commands for a Python program or even [3:01] other AI agents to process we can even [3:04] try these same tasks on llama 370b the [3:07] chat tune model still cannot manage to [3:10] Output the correct response format [3:12] reliably now let's look at the responses [3:14] from llama 3 8B fine-tuned on a tiny but [3:18] highquality data set that I created to [3:20] show the model exactly how I want it to [3:23] respond to agent prompts fine tuning on [3:26] just 40 parameters in this case allowed [3:28] the model to break break out of thinking [3:30] it is just a chatbot limited to [3:33] generating text now llama 3 thinks [3:35] freely about what tasks would need to be [3:38] accomplished by an AI to actually [3:41] accomplish my instructions despite half [3:43] of them claiming to in the title all of [3:46] the video fine-tuning tutorials I have [3:48] found do not show how to fine-tune on [3:50] your own data set in this video I will [3:53] show you how to create your own data set [3:55] to fine tune on instead of using some [3:58] pre-existing data set the data set you [4:00] use for fine-tuning is about quality and [4:03] not quantity since we are training a [4:06] specialized model we want to take full [4:08] control of maximizing our data set's [4:11] response examples quality on exactly the [4:14] task we need it to work at as my data [4:16] set I have a Json file called Data set. [4:19] Json inside this file I have one long [4:22] list of dictionaries each dictionary [4:25] consist of the same system prompt I'm [4:27] trying to get the model to properly [4:29] respond to as well as the prompt for the [4:32] input value and the response as the [4:34] output value to create my data set for [4:37] each response I used mixl 8X 22b to [4:40] generate rough draft responses before [4:43] adding any of these responses to my data [4:46] set I'll go through and manually edit [4:48] each to improve upon the quality and [4:50] ensure perfect formatting as a python [4:52] list your data set for fine tuning could [4:55] be 20 examples or thousands of examples [4:58] while larger fine tuning data sets can [5:01] improve upon your model's performance I [5:03] can't stress it enough the importance of [5:05] adding only highquality examples to your [5:08] data set each example in your data set [5:10] is an example showing llama 3 what you [5:13] expect a perfect response from that [5:16] input should be so if you are [5:17] fine-tuning on Mid data expect the [5:20] quality from your fine-tune model to be [5:23] mid if you want a copy of my data set to [5:25] skip making your own for this tutorial [5:27] or just have a copy of my data to ask on [5:29] to for your own fine tuning it's [5:31] available in the Pro learning docs [5:33] channel of my Discord for anyone with an [5:36] AI Austin Pro membership with my data [5:38] set of 40 examples complete I now am [5:41] ready to start loading them into my [5:42] collab notebook and start fine-tuning [5:45] llama 3 check the comment section for my [5:48] pinned comment with the link to the [5:49] Google collab notebook that I will be [5:51] going through in this video Once the [5:53] notebook loads I can drag my data set. [5:56] Json file into the main content folder [5:59] then I will select my runtime type to [6:02] use a free T4 GPU and save it to start [6:05] the runtime once it is up and running [6:07] click the play button on the first code [6:09] Block in step one to install the needed [6:12] python libraries for a T4 GPU I'll run [6:15] step two to import the libraries into my [6:18] runtime once the installations and [6:20] imports complete we'll run this next [6:22] oneline block to log into our hugging [6:25] face account with a right access token [6:27] if you don't have a hugging face access [6:30] token yet you can get one for free by [6:32] logging into your account going to [6:34] settings clicking access tokens and [6:36] create a token with right access granted [6:39] copy that and paste that into the field [6:41] to log into your hugging face in the [6:43] next block we have some python code that [6:45] loads our data set. Json file and [6:48] converts our examples into llama 3's [6:50] correct template format you'll see the [6:52] hugging phore userv value is set as my [6:56] username make sure you change this to [6:58] your actual hugging face username our [7:01] next code block will set up our [7:03] configuration settings for the [7:04] fine-tuning the fine-tuned model [7:07] variable sets the name you want to save [7:09] the model as in your hugging face [7:11] repository so feel free to change this [7:13] too we can run the configuration [7:15] settings block now and the next block to [7:18] load the Llama 38b Cur and trainer model [7:22] now we'll run the trainer to start the [7:24] fine-tuning process on our data set [7:27] you'll see the trainer going through [7:28] multiple training steps before [7:30] completing each training step is a batch [7:33] of our training data being ran each [7:35] batch in the training steps you will see [7:37] this training loss number start to drop [7:40] when our model is training it is going [7:42] through each of our prompts from the [7:44] data set and blind generating what it [7:46] expects our example response in the data [7:48] set is training loss is a value to [7:51] represent the difference between the [7:53] model in training's predicted response [7:55] to the example response in our data set [7:58] a lower training loss value means that [8:00] the predicted outputs during fine-tuning [8:03] are getting closer to the responses in [8:05] our data set this code with its current [8:07] configuration settings runs one Epoch [8:10] one Epoch equals one pass through of our [8:13] entire training data set running more [8:15] Epoch up to a certain threshold will [8:18] absolutely allow your model to achieve a [8:21] lower training loss during fine-tuning [8:23] going back up to your Laura [8:25] configurations you can change the num [8:27] train epox variable to the number of [8:30] passes through your data you want it to [8:31] run now there is a few things to note [8:34] before changing this increasing this [8:36] number will increase memory usage [8:38] meaning you can only raise it so much on [8:40] the free collab gpus before the runtime [8:43] will fail from exceeding memory another [8:46] consideration is that the benefits of [8:47] raising the epoch is diminishing meaning [8:50] at some point running more Epoch will [8:52] not decrease the training loss value the [8:55] ideal number for my training data set [8:57] was about 15 to 20 EP talks before the [9:00] training loss was practically staying [9:02] the same step eight will save the [9:04] trainer stats. Json file to your collabs [9:07] content folder step nine will quantize [9:10] your fine-tune model and save it to your [9:12] hugging face quantizing the model will [9:14] allow it to perform much faster on your [9:17] local machine this code block will take [9:19] about 20 minutes to complete in the last [9:21] step of the notebook you can test some [9:23] of your prompts to your custom model a [9:26] better option I can recommend for anyone [9:28] with a computer with at least 8 GB of [9:31] RAM and ideally 16 or more you can test [9:34] the model locally with LM Studio LM [9:37] studio is completely free to use and [9:39] easy to install inside LM Studio I can [9:42] go to the search Tab and type my hugging [9:45] face username inside there I can click [9:47] my Project's repository locate the file [9:50] with Q4 korm at the end of the file and [9:55] download that model file once downloaded [9:57] I can go to the chat tab click new chat [10:00] and load my custom fine-tuned model in [10:02] the system prompt tab I will paste in [10:05] the same exact system prompt that I used [10:07] to fine-tune my model on while this is [10:09] not going to be a tutorial on how to use [10:12] LM studio just note that there's also a [10:14] lot of settings for optimizing the speed [10:16] of your model on your machine don't [10:18] forget to hit the like button on this [10:20] video If you learned anything new about [10:22] fine-tuning this has been AI Austin I [10:25] will see you in the next one