[0:00] Remember how back in the day people would [0:03] Google themselves, you type your name into a search engine and you see what it knows about you? [0:08] Well, the modern equivalent of that is to do the same thing with a chatbot. [0:13] So when I ask a large language model, who is Martin Keen? [0:18] Well, the response varies greatly depending upon which model I'm asking, [0:22] because different models, they have different training data sets, they have different knowledge cutoff dates. [0:28] So what a given model knows about me, well, it differs greatly. [0:32] But how could we improve the model's answer? [0:36] Well, there's three ways. [0:38] So let's start with a model here, and we're gonna see how we can improve its responses. [0:44] Well, the first thing it could do is it could go out and it could perform a search, [0:51] a search for new data that either wasn't in its training data set, [0:54] or it was just data that became available after the model finished training, [0:58] and then it could incorporate those results from the search back into its answer. [1:03] That is called RAG or Retrieval Augmented Generation. [1:11] That's one method. [1:12] Or we could pick a specialized model, a model that's been trained on, let's say, transcripts of these videos. [1:21] That would be an example of something called fine tuning, [1:29] or we could ask the model a query that better specifies what we're looking for. [1:36] So maybe the LLM already knows plenty about the Martin Keens of the world, [1:41] but let's tell the model that we're referring to the Martin Keen who works at IBM, [1:45] rather than the Martin Keen that founded Keen Shoes. [1:50] That is an example of prompt engineering. [1:55] Three ways to get better outputs out of large language models, each with their pluses and minuses. [2:03] Let's start with RAG. [2:05] So let's break it down. [2:06] First there's retrieval. [2:08] So retrieval of external up-to-date information.
[2:12] Then there's augmentation. [2:14] That's augmentation of the original prompt with the retrieved information added in. [2:19] And then finally there's generation. [2:22] That's generation of a response based on all of this enriched context. [2:27] So we can think of it like this. [2:30] So we start with a query and the query comes in to a large language model. [2:40] Now, what RAG is gonna do is it's first going to go searching through a corpus of information. [2:48] So we have this corpus here full of some sort of data. [2:53] Now, perhaps, that's your organization's documents. [2:56] So it might be spreadsheets, PDFs, internal wikis, you know, stuff like that. [3:01] But unlike a typical search engine that just matches keywords, [3:07] RAG converts both your question, the query, and all of the documents into something called vector embeddings. [3:18] So these are all converted into vectors, [3:20] essentially turning words and phrases into long lists of numbers that capture their meaning. [3:27] So when you ask a query like, what was our company's revenue growth last quarter? [3:34] Well, RAG will find documents that are mathematically similar in meaning to your question, [3:38] even if they don't use the exact same words. [3:41] So it might find documents mentioning fourth quarter performance or quarterly sales. [3:48] Those don't contain the keywords revenue growth, but they are semantically similar. [3:54] Now, once RAG finds the relevant information, it adds this information [3:59] back into your original query before passing it to the language model. [4:06] So instead of the model just kind of guessing based on its training data, [4:09] it can now generate a response that incorporates your actual facts and figures.
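The retrieval and augmentation steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular RAG product works: the three-dimensional vectors are invented by hand (a real system would get them from an embedding model), and the names `CORPUS`, `retrieve`, and `augment` are hypothetical.

```python
import math

# Hypothetical, hand-made "embeddings". The numbers are invented for
# illustration; a real system would compute them with an embedding model.
CORPUS = {
    "Fourth quarter performance review.": [0.9, 0.1, 0.0],
    "Office holiday party is scheduled for December.": [0.0, 0.2, 0.9],
    "Quarterly sales summary for the board.": [0.8, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, corpus, top_k=2):
    """Retrieval: rank documents by similarity to the query vector."""
    ranked = sorted(
        corpus,
        key=lambda doc: cosine_similarity(query_vector, corpus[doc]),
        reverse=True,
    )
    return ranked[:top_k]


def augment(query, documents):
    """Augmentation: add the retrieved text to the original query."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Context:\n{context}\n\nQuestion: {query}"


query = "What was our company's revenue growth last quarter?"
query_vector = [0.85, 0.2, 0.05]  # assumed embedding of the query above

top_docs = retrieve(query_vector, CORPUS)
prompt = augment(query, top_docs)
print(prompt)  # this enriched prompt is what would be sent to the model
```

Neither retrieved document contains the words "revenue growth"; they rank highest only because their (made-up) vectors point in nearly the same direction as the query vector. The generation step, passing `prompt` to a language model, is left out because it depends on whichever model you use.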
[4:15] So this makes RAG particularly valuable when you are looking for information that is up to date, [4:24] and it's also very valuable when you need to add in information that is domain specific as well, [4:34] but there are some costs to this. [4:38] Let's go with the red pen. [4:40] So one cost, that would be the cost of performance [4:45] for performing all of this, because you have this retrieval step here, and that [4:50] adds latency to each query compared to a simple prompt to a model. [4:55] There are also costs related to just kind of the processing of this as well. [5:01] So if we think about what we're having to do here, we've got documents that need to be converted into vector embeddings, [5:07] and we need to store these vector embeddings in a database. [5:11] All of this adds to processing costs, it adds to infrastructure costs [5:15] to make this solution work. [5:17] All right, next up, fine tuning. [5:20] So remember how we discussed getting better answers about me by [5:24] training a model specifically on, let's say, my video transcripts. [5:26] Well, that is fine tuning in action. [5:30] So what we do with fine tuning is we take a model, but specifically an existing model, [5:40] and that existing model has broad knowledge. [5:44] And then we're gonna give it additional specialized training on a focused data set. [5:51] So this is now specialized to what we want to develop particular expertise on. [5:58] Now, during fine tuning, we're updating the model's internal parameters through additional training. [6:05] So the model starts out with some weights here, [6:10] like this, and those weights were optimized during its initial pre-training. [6:16] And as we fine tune, we're making small adjustments here to the model's weights using this specialized data set. [6:26] So this is being incorporated. [6:29] Now this process typically uses supervised learning where we provide input-output [6:34] pairs that demonstrate the kind of responses we want.
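The idea of starting from existing weights and nudging them with input-output pairs can be shown on a toy scale. The sketch below is an assumption-heavy stand-in: a two-parameter linear model plays the role of the large language model, plain gradient descent plays the role of a real training loop, and the starting weights and data pairs are invented.

```python
def predict(weights, x):
    """Toy 'model': a line with slope w and intercept b."""
    w, b = weights
    return w * x + b


def mean_squared_error(weights, pairs):
    """Average squared gap between predictions and target outputs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in pairs) / len(pairs)


def fine_tune(weights, pairs, learning_rate=0.01, epochs=200):
    """Make many small weight adjustments that reduce the error on `pairs`."""
    w, b = weights
    n = len(pairs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return (w, b)


# Hypothetical "pre-trained" weights and a small specialized data set
# of (input, desired output) pairs. All values are made up.
pretrained = (1.0, 0.0)
specialized_pairs = [(1.0, 2.1), (2.0, 4.0), (3.0, 6.2)]

loss_before = mean_squared_error(pretrained, specialized_pairs)
tuned = fine_tune(pretrained, specialized_pairs)
loss_after = mean_squared_error(tuned, specialized_pairs)

print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

A real fine-tuning run differs in scale rather than in kind: billions of weights instead of two, and text examples instead of number pairs, but the loop of predict, measure the error, and adjust the weights is the same shape.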
[6:37] So for example, if we're fine-tuning for technical support, we might provide thousands of examples of customer queries, [6:46] and those would be paired with correct technical responses. [6:50] The model adjusts its weights through backpropagation [6:53] to minimize the difference between its predicted outputs and the target responses. [6:58] So we're not just teaching the model new facts here, we're actually modifying how it processes information. [7:06] The model is learning to recognize domain-specific patterns. [7:11] So, fine-tuning shows its strength when you particularly need a model that has very deep domain expertise. [7:22] That's what we can really add in with fine tuning, [7:25] and also, it's much faster, specifically at inference time. [7:31] So when we are putting the queries in, it's faster than RAG because it doesn't need to search through external data, [7:38] and because the knowledge is kind of baked into the model's weights, you don't need to maintain a separate vector database, [7:43] but there's some downsides as well. [7:46] Well, there's certainly issues here with the training complexity of all of this. [7:54] You're going to need thousands of high quality training examples. [7:59] There are also issues with computational cost. [8:05] The computational cost for training this model can be substantial and is going to require a whole bunch of GPUs. [8:12] And there's also challenges related to maintenance as well, [8:17] because unlike RAG, where you can easily add new documents to your knowledge base at any point, [8:22] updating a fine-tuned model requires another round of training, [8:27] and then perhaps most importantly of all, there is a risk of something called catastrophic forgetting. [8:37] Now that's when the model loses some of its general capabilities while it's busy learning these specialized ones. [8:44] So finally let's explore prompt engineering.
[8:48] Now specifying Martin Keen who works at IBM versus [8:52] Martin Keen who founded Keen Shoes, that's prompt engineering, but at its most basic. [8:57] Prompt engineering goes far beyond simple clarification. [9:01] So let's think about when we input a prompt, the model receives this prompt and it processes it through a series of layers, [9:16] and these layers are essentially attention mechanisms and each one [9:21] focuses on different aspects of your prompt text that came in. [9:25] And by including specific elements in your prompt, so examples or context or how you want the format to look, [9:32] you're directing the model's attention to relevant patterns it learned during training. [9:38] So for example, telling a model to think about this step-by-step, [9:42] that activates patterns it learnt from training data where methodical reasoning led to accurate results. [9:49] So a well-engineered prompt can transform a model's output without any additional training or without data retrieval. [9:59] So take an example of a prompt. [10:02] Let's say we say, is this code secure? [10:06] Not a very good prompt. [10:08] An engineered prompt, it might read a bit more like this. [10:12] It's much more detailed. [10:13] Now, [10:14] we haven't changed the model, we haven't added new data, we've just better activated its existing capabilities. [10:23] Now I think the benefits to this are pretty obvious. [10:26] One is that we don't need to change any of our back-end infrastructure here [10:32] because there are no infrastructure changes at all in order to prompt better, it's all on the user. [10:39] There's also the benefit that by doing this, you get to see immediate responses and immediate results to what you do. [10:50] We don't have to add in new training data or any kind of data processing, [10:53] but of course there are some limitations to this as well. [10:58] Prompt engineering is as much an art as it is a science.
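To make the contrast between a basic and an engineered prompt concrete, here is a small Python sketch that assembles one from the elements mentioned above: context, a step-by-step instruction, and a requested format. The wording is invented for illustration and is not the prompt shown in the video; `build_prompt` and its parameters are hypothetical names.

```python
def build_prompt(code, role, context, output_format, step_by_step=True):
    """Assemble a detailed prompt from separate, reusable pieces."""
    parts = [f"You are {role}.", f"Context: {context}"]
    if step_by_step:
        # The step-by-step cue discussed above.
        parts.append("Think about this step by step before answering.")
    parts.append(f"Respond in this format: {output_format}")
    parts.append(f"Code to review:\n{code}")
    return "\n\n".join(parts)


basic_prompt = "Is this code secure?"

engineered_prompt = build_prompt(
    code="result = eval(user_input)",
    role="a security reviewer",
    context="This Python snippet handles text typed by untrusted users.",
    output_format="a list of issues, each with a severity and a suggested fix",
)

print(engineered_prompt)
```

Nothing about the model or its data changes here; only the text sent to it does, which is why the results show up immediately. Which pieces actually help for a given model is something you would still have to find by trying variations.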
[11:01] So there is certainly a good amount of trial and error in this sort of process to find effective prompts, [11:10] and you're also limited in what you can do here, you're limited [11:15] to existing knowledge because you're not able to actually add anything else in here. [11:23] No additional amount of prompt engineering is going to teach it truly new information. [11:28] You're not going to update anything that's outdated in the model. [11:33] So we've talked about now RAG as being one option and we talked about fine tuning as being another one. [11:44] And now, just now, we've talked about prompt engineering as well, [11:51] and I've really talked about those as three different distinct things here, [11:57] but they're commonly used actually in combination. [12:02] We might use all three together. [12:05] So consider a legal AI system. [12:07] RAG, that could retrieve specific cases and recent court decisions. [12:12] The prompt engineering part, that could make sure that we follow proper legal document formats by asking for it. [12:19] And then fine-tuning, that can help the model master firm-specific policies. [12:24] I mean, basically, we can think of it like this. [12:27] We can think that prompt engineering offers flexibility and immediate results, but it can't extend knowledge. [12:34] RAG, that can extend knowledge, it provides up-to-date information, but with computational overhead, [12:39] and then fine-tuning, [12:41] that enables deep domain expertise, but it requires significant resources and maintenance. [12:47] Basically, it comes down to picking the methods that work for you. [12:52] You know, we've sure come a long way from vanity searching on Google.