[0:00] Have you ever asked an AI who won the IPL in 2025, or told it to "explain the code I wrote last week"? And what happens? Nine times out of ten, it starts hallucinating, just making stuff up and going completely off the rails.

[0:18] If you've used any LLM in the past few years, whether it's ChatGPT, Claude, Gemini, Grok, Mistral, or whatever else, you've probably run into this one big, annoying problem. You ask something super specific: a detailed question, something about yourself, some code you wrote last week, or a spreadsheet you uploaded. The model answers super confidently, like it knows everything, but it completely misses the point. Sometimes it straight up hallucinates and gives answers with no basis in anything real.

[0:44] And look, the reason is dead simple. Large language models are pattern-matching machines. They're incredible at regurgitating what they've already been trained on. But here's the kicker: they don't know your data, your context, or your secret sauce.

[0:56] That is exactly why AI is still struggling to make a massive dent in fields like law, medicine, and compliance. Those are the places where hallucination isn't an "oops, my bad" kind of situation. It's downright dangerous. Because let's be real: take away your context, and that fancy AI model becomes generic. It becomes mid.

[1:15] But there's got to be a fix for this, right? We can't push AI this hard and just leave this massive problem hanging. So what are we going to do? We've actually got two solid ways to tackle it, and today we're going to break them down for you.

[1:34] Ways of fixing AI

All right, so we've got this context problem. How do we actually solve it?
[1:38] It turns out there are two main players, so let's dissect them.

[1:42] The first option is fine-tuning. Think of this as sending your AI model back to school, but this time the curriculum is all about you. You take that pretrained base model and continue training it on your own data: your emails, your entire codebase, your chats, your pictures. Everything gets thrown into the mix, and the model learns your specific domain until it becomes a native.

[2:04] The upside is massive. Once the model is trained, it's as if it was born for your use case. You don't need to keep spoon-feeding it extra context every single time. It just gets it. But, and it's a big but, it can be extremely painful. Seriously. GPU time is going to cost you an arm and a leg. And what happens when your data changes? New data, new code: you guessed it, back to square one, and you repeat the entire process. Plus, managing versions of these huge model checkpoints is a messy logistical nightmare. Trust me.

[2:31] That brings us to option number two, which is RAG. RAG stands for retrieval-augmented generation, and folks, this is where things get really interesting. This is the street-smart, agile cousin, and it's way simpler. You don't touch the underlying base model at all, so there's no expensive retraining. Instead, you build a clever context engine. Think of it as a super-efficient research assistant that sits around the LLM. At runtime, when a query comes in, the engine zips in and feeds the model just the right pieces of information, right when it needs them.
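To make the "research assistant that sits around the LLM" idea concrete, here's a minimal sketch. It is not the setup from the video. `generate` is a hypothetical stand-in for any LLM call, and the keyword-overlap retriever is a deliberately naive placeholder for the embedding search covered later. The point is that the model itself never changes. Only the prompt it receives does.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Naive stand-in retriever: rank docs by how many query words they share."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def rag_answer(
    query: str,
    docs: List[str],
    generate: Callable[[str], str],  # hypothetical LLM call: prompt in, text out
    k: int = 2,
) -> str:
    """Leave the model untouched; fetch context at query time and put it in the prompt."""
    context = "\n".join(keyword_retrieve(query, docs, k))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)


docs = [
    "Our refund window is 30 days from delivery.",
    "Office hours are nine to five on weekdays.",
    "Refund requests go through the billing portal.",
]
# A fake "model" that just echoes its prompt, so we can see what it was given.
print(rag_answer("How many days is the refund window?", docs, generate=lambda p: p))
```

Swap the echo lambda for a real model client and the rest stays the same. That separation is what lets RAG pick up new data without retraining.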
[3:05] Let's imagine you're a world-class chef. You know how to cook anything, but you don't know what the next order from the dining room is going to be. With RAG, the moment that order hits the kitchen, bam, someone hands you the perfect, detailed recipe for that exact dish. You didn't have to know the order in advance. You just got the precise instructions you needed. That is RAG right there. That's the power: no retraining, live updates, and way cheaper.

[3:29] So now you understand the beauty of RAG. But why does this setup work so well? Why is it becoming the go-to for so many people trying to make LLMs actually useful with their own data?

[3:39] Why does RAG work so well?

Here are the reasons. Number one is fast iterations. New docs? No sweat. Add them, re-embed them, and your RAG instantly gets smarter, with no waiting weeks for a retrain. Next is cheap infrastructure. Forget burning cash on endless GPU cycles. RAG is lean, the compute is minimal, and your wallet stays happy. Next, it's always fresh. Your info never gets stale: upload a doc, and your RAG adapts in seconds, always working from the latest intel. So you get speed, you save cash, and your AI stays current. That's a pretty powerful combo.

[4:14] Okay, now you're probably thinking: this sounds cool, but how does this RAG magic actually work under the hood? Don't worry, we've got you.

[4:20] RAG pipeline

So how does this RAG wizardry give your LLM the brains it needs without the pain of retraining? We're going to break down the entire pipeline. To make it easy to lock into your memory, we're going to use an analogy.
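Here's a minimal sketch of that "add them, re-embed them" loop. It uses an in-memory dict in place of a real vector database. Each chunk is stored under a hypothetical `doc_id#n` key, so re-indexing one document swaps out only that document's chunks. Nothing else is recomputed, and the LLM is never touched. `toy_embed` is a made-up placeholder (letter counts), not a real embedding model.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    return counts


class IncrementalIndex:
    """In-memory index where each document's chunks can be replaced independently."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self.embed = embed
        # chunk id "doc_id#n" -> (chunk text, vector)
        self.entries: Dict[str, Tuple[str, Vector]] = {}

    def upsert_document(self, doc_id: str, chunks: List[str]) -> int:
        """(Re-)embed one document; other documents' chunks are left alone."""
        self.delete_document(doc_id)  # drop stale chunks from an older version
        for n, chunk in enumerate(chunks):
            self.entries[f"{doc_id}#{n}"] = (chunk, self.embed(chunk))
        return len(chunks)

    def delete_document(self, doc_id: str) -> None:
        prefix = f"{doc_id}#"
        for key in [k for k in self.entries if k.startswith(prefix)]:
            del self.entries[key]


index = IncrementalIndex(toy_embed)
index.upsert_document("handbook", ["Refunds take 30 days.", "Support is 24/7."])
index.upsert_document("pricing", ["Pro plan costs $20."])
# A new version of the handbook replaces only the handbook's chunks.
index.upsert_document("handbook", ["Refunds take 14 days."])
print(sorted(index.entries))
```

Deleting before re-inserting matters when a document shrinks. Otherwise an old `handbook#1` chunk would linger and keep showing up in searches.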
[4:37] Imagine you're setting up the most insanely organized high-tech library ever built. For all you visual thinkers out there, we've created a crazy, massive diagram. We're going to drop a link so you can explore it on your own later, but for now, let's walk through it together. All right, let's dive in.

[4:51] Step number one is data intake. Think of this as the part where the books start showing up at the library doors. First things first: your data. Think of your company's PDFs, your email archives, critical CSVs, even your entire codebase, all your content. Consider this your raw material, the books that need to be cataloged in our super library.

[5:14] Now we move on to step two, which is chunking. This is where you break the books down into index cards. You're not going to cram the entire encyclopedia onto one shelf, right? So you take each book, each document, and you chunk it, breaking it into smaller, bite-sized pieces. Think of them as individual index cards, maybe one paragraph or one logical section per card. The key is digestible pieces. Why? Because instead of flipping through 300 pages to find a single answer, your AI librarian can search these cards much faster and more effectively. Precision, people, it's all about precision. Tools you can use for this include LangChain's text splitters and LlamaIndex.

[5:55] Now we're moving on to step three, which is embedding. Imagine this as the part where you give each card GPS coordinates. This is where the real AI magic starts to kick in.
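The chunking step might look something like this minimal sketch. It packs paragraphs into chunks under a character budget and cuts any oversized paragraph into overlapping windows. It's a simplified illustration, not how LangChain's or LlamaIndex's splitters are implemented, and `max_chars` and `overlap` are arbitrary example values.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> List[str]:
    """Split text into paragraph-based chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are packed together until the budget
    is reached. A paragraph longer than max_chars is cut into fixed windows
    that overlap by `overlap` characters, so text at the edges isn't lost.
    """
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        if len(para) > max_chars:
            # Flush what we have, then window the oversized paragraph.
            if current:
                chunks.append(current)
                current = ""
            step = max_chars - overlap
            for start in range(0, len(para), step):
                chunks.append(para[start:start + max_chars])
                if start + max_chars >= len(para):
                    break
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # keep packing paragraphs into this card
        else:
            chunks.append(current)  # card is full; start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = "Intro paragraph.\n\nSecond paragraph here.\n\n" + "x" * 120
for c in chunk_text(doc, max_chars=50, overlap=10):
    print(len(c), repr(c[:20]))
```

The overlap is a trade-off. It duplicates a little text, but a sentence that straddles a window boundary still appears whole in at least one chunk.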
We take those text chunks, those index cards, and turn them into coordinates. Think of it as assigning a super-precise GPS location to every single piece of information in your library, but for language. The trick is that cards with similar meanings get plotted at nearby locations in this massive multi-dimensional space. So words like "similar," "same," and "identical" all hang out in the same neighborhood. Popular options include Google's text embedding API and OpenAI's text-embedding-3 models, so you have a lot of horsepower to choose from.

[6:35] Now we're moving on to step number four, which is vector storage. Imagine this as organizing the high-tech shelves. Our index cards now have their GPS coordinates, so next up we need some serious shelving to store them. And this isn't your grandma's dusty bookshelf. It's a high-performance vector database. You've got names like Pinecone, Chroma, and Qdrant in the ring. Pick the one whose landing page you vibe with the most, or the one that fits your scale and budget. Seriously, they're all pretty good. Whether you've got 1,000 cards or 10 million, these databases are built for speed. They run semantic searches, finding the coordinates closest in meaning, typically in milliseconds. Blink and you'll miss it.

[7:13] Okay, now we're moving on to step number five, which is retrieval. Imagine this as the part where the librarian finds the exact cards. Your library is set up, and a user walks in with a question. So after the user asks that question, what do you think is going to happen?
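To make the "nearby coordinates" idea and the shelving step concrete, here's a brute-force sketch of what a vector database does at its core. It stores unit-length vectors and ranks them by cosine similarity to a query. The 3-dimensional vectors are invented for illustration. Real embedding models output hundreds or thousands of dimensions. Databases like Pinecone, Chroma, or Qdrant use approximate-nearest-neighbor indexes instead of scanning every vector, which is how they stay fast at millions of entries.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


class TinyVectorStore:
    """Brute-force in-memory store: fine for a demo, O(n) work per query."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Vector] = {}

    def add(self, item_id: str, vector: Vector) -> None:
        self._vectors[item_id] = normalize(vector)

    def search(self, query: Vector, k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored items with the highest cosine similarity to query."""
        q = normalize(query)
        scored = [
            (item_id, sum(a * b for a, b in zip(q, v)))
            for item_id, v in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Made-up 3-d "embeddings": the first three point in roughly the same direction.
store = TinyVectorStore()
store.add("similar", [0.9, 0.1, 0.0])
store.add("same", [0.85, 0.15, 0.05])
store.add("identical", [0.8, 0.2, 0.0])
store.add("banana", [0.0, 0.1, 0.95])

for item_id, score in store.search([0.88, 0.12, 0.02], k=3):
    print(f"{item_id}: {score:.3f}")
```

Normalizing once at insert time means each search is just dot products. Many vector databases offer cosine or dot-product distance metrics for this reason.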
First, the RAG system takes the user's query, embeds it, and turns it into a vector, just as it did with all your documents. Then it performs a similarity search against your entire vector database, effectively asking: show me the top five or six cards whose content is semantically closest to this question. Those are your golden index cards. Each one holds a relevant snippet of information and, ideally, a crucial part of the answer.

[7:51] Now we're moving on to step number six, which is synthesis. This is the part where the librarian writes the perfect answer, and where our super-smart LLM, our AI librarian, steps up to the plate. We feed it those top-ranked, relevant chunks plus the original user query. We usually give it a little nudge too, a guardrail prompt so to speak, something like: "Use only the context provided. If the answer isn't there, just say so." The LLM then reads these carefully selected cards, understands the question in that specific context, and produces a focused, accurate, contextual answer. Far less hallucination, far less wild guessing, far less making stuff up. It answers like it knows your data because, in that moment, for that query, thanks to RAG, it actually does.

[8:31] Theory is great and analogies are fun, but at Builder Central we're all about building and shipping. So now that we've walked you through how RAG actually works, how about we show you what we built using this exact approach?

[8:46] Ragbot

This isn't a full-blown, line-by-line coding tutorial on how we built this specific chatbot. We actually did a deep dive on n8n, our main tool for this, in a previous video.
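Here's one way the synthesis prompt could be assembled, as a sketch. It takes already-retrieved `(chunk, score)` pairs, keeps the best ones, numbers them, and prepends the "use only the context provided" guardrail. The `min_score` cut-off is an extra assumption not mentioned in the video. If nothing scores above it, the function returns `None`, so the app can say "I don't know" without calling the LLM at all. The threshold value is a placeholder, since useful cut-offs depend on the embedding model.

```python
from typing import List, Optional, Tuple

GUARDRAIL = (
    "Use only the context provided. "
    "If the answer isn't there, just say you don't know."
)


def build_grounded_prompt(
    question: str,
    retrieved: List[Tuple[str, float]],  # (chunk text, similarity score) pairs
    min_score: float = 0.3,  # hypothetical cut-off; tune it for your embedding model
    max_chunks: int = 5,
) -> Optional[str]:
    """Build the synthesis prompt, or return None if no chunk looks relevant."""
    ranked = sorted(retrieved, key=lambda pair: pair[1], reverse=True)
    relevant = [text for text, score in ranked if score >= min_score][:max_chunks]
    if not relevant:
        return None  # the app can reply "I don't know" without calling the LLM
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(relevant, start=1))
    return f"{GUARDRAIL}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


hits = [
    ("Refunds are processed within 30 days.", 0.82),
    ("Our office dog is named Biscuit.", 0.12),
    ("Refund requests go through the billing portal.", 0.67),
]
print(build_grounded_prompt("How do I get a refund?", hits))
```

Numbering the chunks also makes it easy to ask the model to cite which card each part of its answer came from.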
If you missed that video, make sure you check it out. The link is going to be either in the description or somewhere over here, depending on where the editor puts it. In it, we showed how you can visually build these kinds of powerful workflows with minimal to no code.

[9:05] So here's our flow. For the data source, we used Google Drive, connected via GCP. We did this because it lets us upload documents in real time, effectively turning the Drive folder into a live database. For embeddings, we used OpenAI, which was really easy and inexpensive. For storage, we used Pinecone as our vector database because it offers a fairly generous free tier. For retrieval and synthesis, we used Google's API, which handled the LLM part by synthesizing answers from the embedded chunks that were retrieved.

[9:35] So what does this actually look like in action? With this setup, you can throw pretty much any file at it: PDFs, Word docs, you name it. The bot chews it up, processes it, and then, boom, you're chatting with your own data. You can ask it something specific, like "What's the difference between function A and function B in this massive codebase I just uploaded?", and it should answer according to the document. It also works for CVs if you're hiring, those complicated recipes you can never follow, dense legal documents, your chaotic lecture notes, whatever you've got. Just be smart about it: don't upload your deepest, darkest secrets. That would be kind of stupid. Don't do that.

[10:11] So what's the end result?
You can literally drag and drop any document into the Google Drive folder, and the chatbot updates, either in real time or on a schedule you set. Then it's ready to answer your questions using that fresh, updated context. Pretty cool, right? The JSON file for the n8n workflow is in the description, so grab it and create your own Ragbot.

[10:32] In a nutshell, RAG makes your AI actually know your world. All right, ladies and gentlemen, that's our session for today. Until next time, keep building, keep experimenting, and stay tuned to Builder Central for more content like this.
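In the video's setup, n8n handles the Google Drive side. The sketch below is not that workflow. It assumes you can list a folder's files and read their bytes, and shows one common way a scheduled sync job can decide what to re-embed: compare each file's SHA-256 hash with the hash recorded at the last sync. The file names are made up.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(data: bytes) -> str:
    """Fingerprint a file's contents; any edit changes the hash."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(
    current_files: Dict[str, bytes],  # file name -> raw bytes from the folder now
    indexed_hashes: Dict[str, str],  # file name -> hash recorded at the last sync
) -> Tuple[List[str], List[str]]:
    """Return (files to re-embed, files to remove from the index)."""
    to_embed = [
        name
        for name, data in sorted(current_files.items())
        if indexed_hashes.get(name) != content_hash(data)  # new or changed
    ]
    to_delete = sorted(name for name in indexed_hashes if name not in current_files)
    return to_embed, to_delete


last_sync = {
    "handbook.pdf": content_hash(b"old handbook"),
    "pricing.csv": content_hash(b"plan,price"),
    "retired.docx": content_hash(b"gone"),
}
folder_now = {
    "handbook.pdf": b"new handbook",  # edited since the last sync
    "pricing.csv": b"plan,price",  # unchanged, so it is skipped
    "notes.md": b"fresh notes",  # newly dropped into the folder
}
print(plan_sync(folder_now, last_sync))
```

Only the changed and new files get re-embedded, which keeps each scheduled run cheap even when the folder is large.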