[0:00] Have you ever tried asking who won the
[0:02] IPL in 2025
[0:04] [Music]
[0:06] or explain the code I wrote last week
[0:11] and what happens? Nine times out of 10
[0:14] it just starts hallucinating, just making
[0:16] stuff up, going completely off the rails.
[0:18] Well, if you've used any LLM in the past
[0:20] few years, whether it's ChatGPT, Claude,
[0:22] Gemini, Grok, Mistral, whatever, you've
[0:24] probably run into this one big annoying
[0:27] problem. You ask something super
[0:29] specific like a detailed question,
[0:30] something about yourself, some code you
[0:32] wrote last week or a spreadsheet that
[0:34] you uploaded and the model answers super
[0:37] confidently, like it knows everything,
[0:38] but it completely misses the point.
[0:40] Sometimes it just straight up
[0:42] hallucinates and gives answers that
[0:44] don't even exist. And look, the reason
[0:46] is dead simple. Large language models
[0:48] are pattern matching machines. They're
[0:50] incredible at regurgitating what they've
[0:52] already been trained on. But here's the
[0:54] kicker. They don't know your data, your
[0:56] context, or your secret sauce. And this
[0:59] is exactly why AI is still struggling to
[1:01] make a massive dent in fields like law,
[1:03] medicine, and compliance. You know, the
[1:05] places where hallucination isn't just an
[1:07] oops, my bad kind of situation. It's
[1:09] downright dangerous. Because let's be
[1:11] real, when you yank out your context,
[1:13] that fancy AI model just becomes
[1:15] generic. It becomes mid. But there's got
[1:17] to be a fix for this, right? We can't be
[1:19] pushing AI this hard and just leave this
[1:22] massive huge problem hanging. So, what
[1:25] are we going to do? We've actually got
[1:26] two solid ways to tackle this. And
[1:28] today, we're going to break them down
[1:29] for you.
[1:34] Ways of fixing AI. All right, so we've
[1:36] got this context problem. How do we
[1:38] actually solve it? It turns out that
[1:40] we've got two main players in the book.
[1:42] So, let's dissect them. The first option
[1:43] that we have is fine-tuning. Think of
[1:45] this as sending your AI model back to
[1:47] school. But this time, the curriculum is
[1:49] all about you. You literally take that
[1:50] base model and keep training it on
[1:52] your own data. Like your
[1:54] emails, your entire code base, your
[1:56] chats, your pictures, everything gets
[1:58] thrown into the mix and it literally
[2:00] learns your specific domain and becomes
[2:02] a native. The upside is massive. Once
[2:04] the model is trained up, it's like it
[2:06] was literally born for your use cases.
[2:08] You don't need to keep spoon feeding it
[2:09] and giving it extra context every single
[2:11] time. It just gets it. But, and it's a
[2:13] big but, it can be extremely painful.
[2:16] Seriously. So, GPU time is going to cost
[2:18] you an arm and a leg. And what happens
[2:20] when your data changes? New data, new
[2:22] code, you guessed it, back to square
[2:24] one. Repeat the entire process. And
[2:26] plus, managing versions of these huge
[2:28] model checkpoints is a messy logistical
[2:31] nightmare. Trust me. So, that brings us
[2:34] to option number two, which is RAG. And
[2:36] RAG stands for retrieval augmented
[2:38] generation. And folks, this is where
[2:40] things get really, really interesting.
[2:42] This is the street smart agile cousin.
[2:45] Way way simpler. You don't even need to
[2:47] touch the underlying base model. No
[2:49] expensive retraining. Instead, you just
[2:51] build a clever context engine. And you
[2:54] can just think of this as a
[2:55] superefficient research assistant that
[2:57] sits around the LLM. And then at
[2:59] runtime, when a query comes in, the
[3:01] engine zips in and feeds the model just
[3:03] the right pieces of information it needs
[3:05] right when it needs them. Let's imagine
[3:07] that you're a world-class chef. You know
[3:08] how to cook anything, but you don't know
[3:10] what the next order from the dining room
[3:12] is going to be. With RAG, the moment
[3:14] that order hits the kitchen, bam,
[3:16] someone magically hands you the perfect
[3:18] detailed recipe for the exact dish. You
[3:20] didn't even have to deal on cooking. You
[3:22] just got the precise instructions that
[3:24] you needed. That is RAG right there.
[3:27] That's the power. No retraining, live
[3:29] updates, and way cheaper. So, now you
[3:31] guys understand the beauty of RAG. But
[3:33] why does this setup work so incredibly
[3:35] well? Why is it becoming the go-to for
[3:37] so many people trying to make LLMs
[3:39] actually useful with their own data? Why
[3:41] does RAG work so well? And here are the
[3:43] reasons. Number one is fast iterations.
[3:45] New docs? No sweat. Add them, re-embed
[3:48] them, and your RAG will instantly get
[3:50] smarter. No waiting for weeks for a
[3:52] retrain. Next is cheap infrastructure.
[3:54] Forget burning cash on endless GPU
[3:56] cycles. RAG is lean, with minimal compute, and
[3:58] your wallet will always stay happy. Next
[4:00] is it's always fresh. Your info never
[4:03] gets stale. Upload a doc, your RAG
[4:05] adapts in seconds and always with the
[4:07] latest intel. So you get speed, you can
[4:09] save cash, and your AI always stays
[4:12] current. That's a pretty powerful combo.
[4:14] Okay, so now you're probably thinking,
[4:15] okay, this sounds cool, but how does
[4:17] this RAG magic actually work under the
[4:20] hood? Don't worry, we've got you. RAG
[4:22] pipeline. Okay, so how does this RAG
[4:24] wizardry pull off giving your LLM the
[4:26] brains it needs without the pain of
[4:28] retraining? We're going to break down
[4:30] the entire pipeline. And to make sure
[4:33] it's super easy to lock into your
[4:35] memory, we're going to use an analogy.
[4:37] Imagine you're setting up the most
[4:39] insanely organized high-tech library
[4:41] ever built. And for all you visual
[4:43] thinkers out there, we've created this
[4:45] crazy crazy massive diagram. So, we're
[4:47] going to drop a link so you can explore
[4:48] it on your own later, but for now, let's
[4:49] walk through it together. All right,
[4:51] let's dive in. All right, so step number
[4:53] one is your data intake. Imagine this
[4:55] being the part where the books arrive at
[4:57] the library. First things first,
[4:59] your data. This is where all your books
[5:01] start showing up at the library doors.
[5:04] Think of your company's PDFs, your email
[5:06] archives, critical CSVs, even your
[5:08] entire codebase, all your content. So
[5:10] consider this to be your raw materials,
[5:12] the books that need to be cataloged in
[5:14] our super library. Now we move on to
[5:15] step two, which is chunking. Now imagine
[5:18] this is where you're breaking down the
[5:19] books into index cards. Now you're not
[5:21] just going to cram the entire
[5:22] encyclopedia onto one shelf, right? So,
[5:24] you take each book, each document, and
[5:27] you chunk it. You break it down into
[5:28] smaller bite-sized pieces. And you can
[5:30] think of them as individual index cards,
[5:32] maybe one paragraph per card or logical
[5:35] section. And the key is digestible
[5:37] pieces. Why? So, instead of your AI
[5:40] librarian having to flip through 300
[5:43] pages to find a single answer, it can
[5:45] search these cards way faster and way
[5:47] more effectively. Precision, people,
[5:49] it's all about precision. So the tools
[5:51] that you can use for this are LangChain's text
[5:52] splitters, or you can also use LlamaIndex.
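To make the index-card idea concrete, here is a minimal sketch of paragraph-based chunking in plain Python. It is not how LangChain or LlamaIndex implement their splitters; the function name `chunk_text` and the `max_chars` limit are assumptions made up for this illustration.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    Hypothetical helper for illustration; real splitters also handle
    overlap between chunks, sentence boundaries, and token counts.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # A paragraph longer than one "card" is hard-split into pieces.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                # Still fits on the current card, so keep packing.
                current = candidate
            else:
                # Card is full: file it and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Short paragraphs get packed together onto one card until the limit is reached, so related sentences tend to stay in the same chunk. A smaller `max_chars` gives more precise cards but less context per card.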
[5:55] Now we're moving on to step three which
[5:56] is embedding. Now imagine this to be the
[5:58] part where you're giving each card GPS
[6:00] coordinates. Now this is where the real
[6:03] AI magic starts to kick in. We take
[6:05] those text chunks, those index cards, and
[6:07] we turn them into coordinates. Now think
[6:09] of it as assigning a super precise GPS
[6:12] location to every single piece of
[6:13] information in your library but for
[6:15] language. The trick is that the cards
[6:18] with similar meaning get plotted in
[6:20] nearby locations in this massive
[6:22] multi-dimensional space. So words like
[6:24] similar, same, identical, they're all
[6:27] hanging out in the same neighborhood.
[6:28] Popular models that you can use for this
[6:29] are Google's text embedding API, or you
[6:31] can also use OpenAI's text-embedding-3 models.
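To show what "turning text into coordinates" looks like mechanically, here is a toy sketch that needs no API key. It is a stand-in, not a real embedding model: `toy_embed` is a hypothetical function that hashes words into a fixed-size vector, so it only captures word overlap. A trained embedding model is what actually places words like "similar" and "identical" near each other.

```python
import hashlib
import math
import re


def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Map text to a unit-length vector by hashing each word into a slot.

    Illustration only: this captures shared words, not meaning.
    """
    vec = [0.0] * dims
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % dims
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize so that a dot product between two vectors is their cosine.
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


card = toy_embed("reset password")
query = toy_embed("reset password now")
print(len(card), round(cosine(card, query), 3))
```

The shape of the result is the point here: every chunk becomes a fixed-length list of numbers, and "nearby" is measured with a similarity score such as cosine. Swapping `toy_embed` for a call to a real embedding model changes the quality of the neighborhoods, not the overall flow.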
[6:34] So you have a lot of horsepower to
[6:35] choose from. Now we're moving on to step
[6:36] number four, which is vector storage.
[6:38] Now this you can imagine as organizing
[6:40] the high-tech shelves. All right, so our
[6:42] index cards now have their GPS
[6:44] coordinates. So, next up, we're going to
[6:45] need some serious shelving to store
[6:47] them. And this isn't your grandma's
[6:49] dusty bookshelf. This is a high
[6:51] performance vector database. So, you've
[6:53] got names like Pinecone, Chroma, and
[6:55] Qdrant in the ring. Pick the one whose
[6:56] landing page you vibe with the most or
[6:58] the one that fits your scale and budget.
[7:00] Seriously, they're all pretty good. And
[7:02] it doesn't matter if you've got 1,000 cards
[7:04] or 10 million. These databases are built
[7:06] for speed. They can run semantic
[7:09] searches, finding those relevant meaning
[7:11] coordinates in milliseconds. Blink and
[7:13] you'll probably miss it. Okay, now we're
[7:15] moving on to step number five which is
[7:16] retrieval. Imagine this to be the part
[7:17] where the librarian finds the exact
[7:19] cards. Okay, so now your library is set
[7:21] up. Now a user walks in with a question.
[7:24] So after the user asks that question,
[7:25] what do you think is going to happen? So
[7:26] first the RAG system takes that user's
[7:29] query, embeds it and turns it into a
[7:31] vector just like it did with all your
[7:33] documents. Then it performs a similarity
[7:35] search against your entire vector
[7:37] database and asks something like, "Show me
[7:38] the top five or six cards whose content
[7:41] is semantically closest to this
[7:42] question." So those are going to be your
[7:44] golden index cards with each one of them
[7:46] holding a crucial part of the answer and
[7:49] also a relevant snippet of information.
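The retrieval step can be sketched as a brute-force similarity search over stored vectors. This is an assumption-laden illustration, not how Pinecone, Chroma, or Qdrant work internally: production vector databases typically use approximate nearest-neighbor indexes to stay fast at scale. The names `cosine_similarity`, `top_k`, and the tiny three-dimensional `store` are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float],
          store: list[tuple[str, list[float]]],
          k: int = 5) -> list[tuple[float, str]]:
    """Return the k (score, chunk_text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made 3-dimensional vectors standing in for real embeddings.
store = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("returns window", [0.8, 0.2, 0.1]),
]
for score, text in top_k([1.0, 0.0, 0.0], store, k=2):
    print(f"{score:.3f}  {text}")
```

The query vector has to come from the same embedding model as the stored chunks, otherwise the coordinates are not comparable.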
[7:51] Now we're moving on to step number six
[7:52] which is synthesis. This is the part
[7:54] where the librarian writes the perfect
[7:56] answer. This is where our super smart
[7:58] LLM, our AI librarian steps up to the
[8:01] plate. We feed it those top-ranked
[8:03] relevant chunks plus the original user
[8:05] query and we usually give it a little
[8:07] nudge, a guardrail prompt so to speak,
[8:10] something like, "Use only the context
[8:12] provided. If the answer isn't there,
[8:14] just say so." The LLM then reads these
[8:16] carefully selected cards, understands
[8:18] the question in that specific context,
[8:20] and spits out a focused, accurate, and
[8:22] contextual answer. Far less hallucination, far less
[8:24] wild guessing, and far less making stuff up.
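Here is a minimal sketch of how the retrieved chunks, the user's question, and the guardrail instruction quoted above might be assembled into one prompt. The function name `build_prompt` and the layout of the prompt are assumptions for illustration. The actual call to an LLM is left out because the right client and model depend on which provider you use.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine guardrail, retrieved chunks, and question into a single prompt.

    Hypothetical layout: chunks are numbered so the model (and a human
    debugging the output) can tell which card an answer came from.
    """
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Use only the context provided. "
        "If the answer isn't there, just say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "How long do refunds take?",
    ["Refunds are processed within 5 business days.",
     "Shipping is free on orders over $50."],
)
print(prompt)
```

The guardrail lowers the odds of made-up answers, but it is an instruction, not a guarantee, so it is still worth spot-checking answers against the retrieved cards.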
[8:27] It's answering like it knows your data
[8:29] because in that moment, for that query,
[8:31] thanks to RAG, it actually does. So now,
[8:34] theory is great. Analogies are fun, but
[8:36] at Builder Central, we're all about
[8:38] building and shipping. So, now that
[8:40] we've walked you through how RAG
[8:41] actually works, how about we show you
[8:42] what we actually built using the same
[8:44] exact approach.
[8:46] RAG bot. This isn't a full-blown line by
[8:49] line coding tutorial on how we built
[8:50] this specific chatbot. So, we actually
[8:53] dove deep into n8n, which was our main
[8:55] tool for this in a previous video. If
[8:57] you've missed that video, make sure you
[8:58] check it out. The link is going to be
[8:59] either in the description or somewhere
[9:00] over here, depends on where the editor
[9:02] puts it. So, we showed how you can
[9:03] visually build these kinds of powerful
[9:05] workflows with minimal to no code. So
[9:07] here's our flow. For the data source,
[9:09] what we did is we used Google Drive and
[9:11] connected it via GCP. Now the reason we
[9:13] did this is because it enables us to
[9:15] upload the documents in real time
[9:16] effectively turning it into a live
[9:18] database. For embeddings we used OpenAI
[9:20] to generate them which was really really
[9:22] easy and inexpensive. For storage we used
[9:24] Pinecone as our main vector storage
[9:26] database because they offer a fairly
[9:28] generous free storage tier. For
[9:30] retrieval and synthesis, we used Google's
[9:32] API, which handled the LLM part by
[9:33] synthesizing answers based on the
[9:35] embedded chunks that were retrieved. So,
[9:37] what does this actually look like in
[9:39] action? Well, with this setup, you can
[9:41] throw pretty much any file at it. PDFs,
[9:43] Word docs, you name it. The bot chews it
[9:45] up, processes it, and then boom, you're
[9:47] chatting with your own data. So, you
[9:49] can ask it something specific like,
[9:50] "What's the difference between function
[9:51] A and function B in this massive
[9:53] codebase I just uploaded?" And it should
[9:55] spit back the answer according to the
[9:57] document. It also works for CVs if
[9:59] you're hiring. So those complicated
[10:01] recipes you can never follow, dense
[10:04] legal documents, your chaotic lecture
[10:06] notes, whatever you've got, just, you
[10:07] know, be smart about it. Don't upload
[10:09] your deepest, darkest secrets. Okay,
[10:11] that's kind of stupid. Don't do that. So
[10:13] what would be the end result? You can
[10:15] literally just drag and drop any
[10:17] document into the Google Drive folder
[10:19] and the chatbot updates either in
[10:21] real time or on a schedule that you set
[10:24] and then it's ready to answer your
[10:26] questions using that fresh updated
[10:28] context which is pretty cool, right?
[10:29] The JSON file for n8n is in the
[10:31] description. So, make sure you use that
[10:32] and create your own RAG bot. Basically,
[10:34] in a nutshell, RAG is making your AI
[10:37] actually know your world. All right,
[10:39] ladies and gentlemen, that's our session
[10:40] for today. Until next time, keep
[10:42] building, keep experimenting, and stay
[10:44] tuned to Builder Central for more such
[10:45] content.