[0:00] everybody training new large language
[0:02] models are training them out the box for
[0:04] chat a chat trained llm is like an
[0:07] intelligent student who finished general
[0:09] education in 50 languages now think of
[0:12] fine-tuning these llms like kicking the
[0:14] broke General ed student out of the
[0:16] house and choosing exactly what
[0:18] specialized degree they will get in this
[0:20] video I'm going to demystify the concept
[0:22] of fine-tuning a language model no
[0:25] programming experience will be needed to
[0:27] follow along this tutorial I'll be
[0:29] showing how to fine tune meta's latest
[0:31] llama 38 billion parameter model on free
[0:35] gpus in Google collab let's discuss a
[0:38] huge problem with implementation of
[0:40] every team of AI agent project we have
[0:43] seen to date it started with auto GPT
[0:46] and now the current leader in hype crew
[0:49] AI the reason most teams of AI agents
[0:52] that people are creating to attempt to
[0:54] accomplish complex tasks sucks is
[0:57] because they are using raw large
[0:59] language models trained to respond as
[1:02] intelligent chat Bots and not as action
[1:04] models out of the box these models are
[1:07] trained to have decent general
[1:09] intelligence and serve a general
[1:11] audience like chat gbt as an AI software
[1:14] engineer the first step I would take to
[1:16] develop a system to outperform any of
[1:19] these current existing AI agent swarms
[1:22] is to train a decision-making model
[1:24] based on first principal reasoning I
[1:26] think it's also important to reason from
[1:28] first principles rather than by analogy
[1:31] the normal way that we conduct Our Lives
[1:32] is we Reason by analogy we're doing this
[1:34] because it's like something else that
[1:36] was done hold up wait a minute and what
[1:39] that really means is you kind of boil
[1:41] things down to the most fundamental
[1:43] truths say okay what are we sure as
[1:44] possible is true and then reason up from
[1:47] there that takes a lot more mental
[1:48] energy think of each response from an
[1:50] llm as a thought we want this
[1:53] intelligent AI to generate in 10 seconds
[1:56] or less instead of expecting our AI to
[1:58] use first principle reasoning break the
[2:01] prompt into multiple needles in a
[2:03] haystack of facts the model needs to
[2:05] understand or tasks the AI needs to
[2:08] complete all in a single response in
[2:10] this video I will fine-tune a llama
[2:12] model to power the first AI agent of
[2:14] contact in a team of AI agents I want
[2:17] the AI agent to be able to use first
[2:19] principal reasoning to Output highly
[2:22] logical order of steps that need to be
[2:24] completed to actually provide a factual
[2:27] response and better yet automate a
[2:29] complex task before we get into
[2:31] fine-tuning a model let's first demo raw
[2:34] llama 3 8bs ability to accomplish just
[2:37] the task of generating first principal
[2:39] reasoning outputs if I go try to get
[2:42] llama 38b to act as a decision model the
[2:45] reasoning is as though it came from a
[2:47] mind who wasn't trained to make actions
[2:49] in the real world tell llama to respond
[2:52] with just a python list and it
[2:54] constantly adds notes before or after
[2:56] the list making the responses unreliable
[2:59] as commands for a Python program or even
[3:01] other AI agents to process we can even
[3:04] try these same tasks on llama 370b the
[3:07] chat tune model still cannot manage to
[3:10] Output the correct response format
[3:12] reliably now let's look at the responses
[3:14] from llama 3 8B fine-tuned on a tiny but
[3:18] highquality data set that I created to
[3:20] show the model exactly how I want it to
[3:23] respond to agent prompts fine tuning on
[3:26] just 40 parameters in this case allowed
[3:28] the model to break break out of thinking
[3:30] it is just a chatbot limited to
[3:33] generating text now llama 3 thinks
[3:35] freely about what tasks would need to be
[3:38] accomplished by an AI to actually
[3:41] accomplish my instructions despite half
[3:43] of them claiming to in the title all of
[3:46] the video fine-tuning tutorials I have
[3:48] found do not show how to fine-tune on
[3:50] your own data set in this video I will
[3:53] show you how to create your own data set
[3:55] to fine tune on instead of using some
[3:58] pre-existing data set the data set you
[4:00] use for fine-tuning is about quality and
[4:03] not quantity since we are training a
[4:06] specialized model we want to take full
[4:08] control of maximizing our data set's
[4:11] response examples quality on exactly the
[4:14] task we need it to work at as my data
[4:16] set I have a Json file called Data set.
[4:19] Json inside this file I have one long
[4:22] list of dictionaries each dictionary
[4:25] consist of the same system prompt I'm
[4:27] trying to get the model to properly
[4:29] respond to as well as the prompt for the
[4:32] input value and the response as the
[4:34] output value to create my data set for
[4:37] each response I used mixl 8X 22b to
[4:40] generate rough draft responses before
[4:43] adding any of these responses to my data
[4:46] set I'll go through and manually edit
[4:48] each to improve upon the quality and
[4:50] ensure perfect formatting as a python
[4:52] list your data set for fine tuning could
[4:55] be 20 examples or thousands of examples
[4:58] while larger fine tuning data sets can
[5:01] improve upon your model's performance I
[5:03] can't stress it enough the importance of
[5:05] adding only highquality examples to your
[5:08] data set each example in your data set
[5:10] is an example showing llama 3 what you
[5:13] expect a perfect response from that
[5:16] input should be so if you are
[5:17] fine-tuning on Mid data expect the
[5:20] quality from your fine-tune model to be
[5:23] mid if you want a copy of my data set to
[5:25] skip making your own for this tutorial
[5:27] or just have a copy of my data to ask on
[5:29] to for your own fine tuning it's
[5:31] available in the Pro learning docs
[5:33] channel of my Discord for anyone with an
[5:36] AI Austin Pro membership with my data
[5:38] set of 40 examples complete I now am
[5:41] ready to start loading them into my
[5:42] collab notebook and start fine-tuning
[5:45] llama 3 check the comment section for my
[5:48] pinned comment with the link to the
[5:49] Google collab notebook that I will be
[5:51] going through in this video Once the
[5:53] notebook loads I can drag my data set.
[5:56] Json file into the main content folder
[5:59] then I will select my runtime type to
[6:02] use a free T4 GPU and save it to start
[6:05] the runtime once it is up and running
[6:07] click the play button on the first code
[6:09] Block in step one to install the needed
[6:12] python libraries for a T4 GPU I'll run
[6:15] step two to import the libraries into my
[6:18] runtime once the installations and
[6:20] imports complete we'll run this next
[6:22] oneline block to log into our hugging
[6:25] face account with a right access token
[6:27] if you don't have a hugging face access
[6:30] token yet you can get one for free by
[6:32] logging into your account going to
[6:34] settings clicking access tokens and
[6:36] create a token with right access granted
[6:39] copy that and paste that into the field
[6:41] to log into your hugging face in the
[6:43] next block we have some python code that
[6:45] loads our data set. Json file and
[6:48] converts our examples into llama 3's
[6:50] correct template format you'll see the
[6:52] hugging phore userv value is set as my
[6:56] username make sure you change this to
[6:58] your actual hugging face username our
[7:01] next code block will set up our
[7:03] configuration settings for the
[7:04] fine-tuning the fine-tuned model
[7:07] variable sets the name you want to save
[7:09] the model as in your hugging face
[7:11] repository so feel free to change this
[7:13] too we can run the configuration
[7:15] settings block now and the next block to
[7:18] load the Llama 38b Cur and trainer model
[7:22] now we'll run the trainer to start the
[7:24] fine-tuning process on our data set
[7:27] you'll see the trainer going through
[7:28] multiple training steps before
[7:30] completing each training step is a batch
[7:33] of our training data being ran each
[7:35] batch in the training steps you will see
[7:37] this training loss number start to drop
[7:40] when our model is training it is going
[7:42] through each of our prompts from the
[7:44] data set and blind generating what it
[7:46] expects our example response in the data
[7:48] set is training loss is a value to
[7:51] represent the difference between the
[7:53] model in training's predicted response
[7:55] to the example response in our data set
[7:58] a lower training loss value means that
[8:00] the predicted outputs during fine-tuning
[8:03] are getting closer to the responses in
[8:05] our data set this code with its current
[8:07] configuration settings runs one Epoch
[8:10] one Epoch equals one pass through of our
[8:13] entire training data set running more
[8:15] Epoch up to a certain threshold will
[8:18] absolutely allow your model to achieve a
[8:21] lower training loss during fine-tuning
[8:23] going back up to your Laura
[8:25] configurations you can change the num
[8:27] train epox variable to the number of
[8:30] passes through your data you want it to
[8:31] run now there is a few things to note
[8:34] before changing this increasing this
[8:36] number will increase memory usage
[8:38] meaning you can only raise it so much on
[8:40] the free collab gpus before the runtime
[8:43] will fail from exceeding memory another
[8:46] consideration is that the benefits of
[8:47] raising the epoch is diminishing meaning
[8:50] at some point running more Epoch will
[8:52] not decrease the training loss value the
[8:55] ideal number for my training data set
[8:57] was about 15 to 20 EP talks before the
[9:00] training loss was practically staying
[9:02] the same step eight will save the
[9:04] trainer stats. Json file to your collabs
[9:07] content folder step nine will quantize
[9:10] your fine-tune model and save it to your
[9:12] hugging face quantizing the model will
[9:14] allow it to perform much faster on your
[9:17] local machine this code block will take
[9:19] about 20 minutes to complete in the last
[9:21] step of the notebook you can test some
[9:23] of your prompts to your custom model a
[9:26] better option I can recommend for anyone
[9:28] with a computer with at least 8 GB of
[9:31] RAM and ideally 16 or more you can test
[9:34] the model locally with LM Studio LM
[9:37] studio is completely free to use and
[9:39] easy to install inside LM Studio I can
[9:42] go to the search Tab and type my hugging
[9:45] face username inside there I can click
[9:47] my Project's repository locate the file
[9:50] with Q4 korm at the end of the file and
[9:55] download that model file once downloaded
[9:57] I can go to the chat tab click new chat
[10:00] and load my custom fine-tuned model in
[10:02] the system prompt tab I will paste in
[10:05] the same exact system prompt that I used
[10:07] to fine-tune my model on while this is
[10:09] not going to be a tutorial on how to use
[10:12] LM studio just note that there's also a
[10:14] lot of settings for optimizing the speed
[10:16] of your model on your machine don't
[10:18] forget to hit the like button on this
[10:20] video If you learned anything new about
[10:22] fine-tuning this has been AI Austin I
[10:25] will see you in the next one