[0:00] We've all had this experience. [0:02] You search for something, you get thousands of results, and somehow, none of them are what you wanted. [0:08] Well, what if I told you search engines don't actually understand your questions? [0:12] At least, they didn't use to. [0:14] From simple keyword search to present-day agentic RAG, [0:18] information retrieval has evolved, and search engines didn't get smarter overnight; they grew up one step at a time. [0:26] Let's start from the beginning. [0:28] The earliest search systems were designed around the question, "Where does this word appear?" [0:33] Documents were indexed using what are called inverted indices, that is, mappings from keywords to the documents that contain them. [0:43] When a user asks a question, the search system looks up the query words and quickly returns the matching documents. [0:58] These documents may then be ranked using TF-IDF or BM25, which score documents by how frequent and how distinctive the query terms are. [1:06] This powerful keyword-matching approach still powers a lot of the internet today, but there's a fundamental limitation: it doesn't understand language. [1:16] It treats words as symbols, not meaning. [1:20] Synonyms, ambiguity and any complex intents were essentially invisible. [1:24] For example, is the search "help Python" [1:28] related to coding, or did I just get a pet snake? [1:31] It was on the user to ask the right questions with exactly the right words. [1:37] The next major leap was semantic search. [1:40] Instead of treating text as strings of words, we began representing its meaning. [1:45] This is done using vectors, or high-dimensional numeric representations that capture meaning. [1:53] For example, coffee might be represented as [0, 1, 0], whereas house might be represented as [1, 0, 0]. [2:09] These embeddings don't just come out of nowhere. [2:11] They are learned by large neural networks trained on massive text corpora.
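Looking back at the keyword stage for a moment, the inverted index mentioned earlier can be sketched in a few lines. This is a minimal illustration, not how any particular search engine implements it: the function names (`tokenize`, `build_inverted_index`, `search`) and the three sample documents are made up for the example, matching is a plain boolean AND, and there is no TF-IDF or BM25 ranking.

```python
from collections import defaultdict

PUNCTUATION = ".,!?\"'"

def tokenize(text):
    """Lowercase the text and split on whitespace, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "Python help for beginners learning to code",
    2: "How to care for a pet python snake",
    3: "Brewing espresso at home",
}
index = build_inverted_index(docs)

print(search(index, "python"))       # the word alone cannot tell code from snakes
print(search(index, "python help"))  # exact extra keywords narrow the match
print(search(index, "coffee"))       # no literal match, so nothing is returned
```

The last query shows the limitation described above: the espresso document is never returned for "coffee" because the index only knows which literal words appear where.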
[2:16] By encountering words in context, over time similar concepts end up close together even if they use different words. [2:25] If this is coffee, maybe espresso is represented here: very close in concept to coffee, but nowhere near house. [2:35] Semantic search turns your words into a kind of map, [2:38] so the system knows espresso and coffee are pointing to a very similar place. [2:43] It's essentially your friend who knows what you mean, even if you don't say it perfectly every time. [2:49] This allowed search systems to understand intent. [2:52] Even if the exact keywords were not used, you could still find relevant documents. [2:59] And this didn't replace keyword search; it actually complemented it. [3:04] Hybrid systems began to emerge, combining the precision of keyword search with the recall of semantic search. [3:10] For the first time, instead of just matching text, search was able to approximate understanding. [3:17] Then, the world shifted. [3:19] Large language models were born. [3:23] These are models trained on large corpora of text to learn patterns in the data. [3:31] LLMs don't retrieve facts. [3:34] When prompted, they predict the most likely next tokens, or words, of an answer based on the patterns they learned from the training data. [3:44] The user asks the LLM a question and it returns a text answer. [3:53] These models are super powerful and have revolutionized the business world. [3:58] However, they had a problem. [4:00] LLMs only use the specific knowledge they learned during a long and expensive training process. [4:07] Realistically, that means their knowledge is locked to the documents that specific LLM was trained on before a certain point in time. [4:20] LLMs don't know today's information, and certainly don't know your specific documents. [4:26] So what's the solution? [4:27] Well, it's actually search. [4:30] Retrieval-augmented generation, or RAG, was born. [4:34] The idea is very simple.
[4:36] The user asks a question, and the system searches an external knowledge base for relevant documents. [4:47] The retrieved content is used to augment the LLM's prompt, and a final answer is generated. [5:02] This gave LLMs a form of external memory. [5:06] Now they could cite sources, adapt to new information and even operate in specialized domains without costly retraining. [5:15] These original RAG pipelines were very linear. [5:18] Documents were embedded offline into vector databases. [5:26] They were retrieved once at query time and passed straight into the model. [5:30] It was simple, but effective. [5:32] This massive improvement significantly reduced hallucinations and enabled LLM adoption across a multitude of new domains. [5:41] But traditional RAG is nowhere near perfect. [5:44] It cannot adapt to new scenarios. [5:47] And suddenly we are back at the problem of traditional search. [5:51] The answer is only as good as the search itself. [5:55] Within a short period, countless advancements were made to RAG, developing the simple concept into a sophisticated force to be reckoned with. [6:04] Instead of a single retrieval step, pipelines added rerankers to reorder results by relevance. [6:13] User queries were rewritten or expanded to improve recall. [6:18] As before, hybrid retrieval became the norm, combining the precision of keyword search with the recall of semantic vector search. [6:29] These systems were far more accurate, but still fundamentally static. [6:34] The pipeline was predetermined, and retrieval was smarter, but still not intelligent. [6:41] Enter the next disruptor: agents. [6:45] Agents are systems that use LLMs and tools to perform tasks autonomously. [6:51] Suddenly we shifted from simple pipelines to complex decision-making systems. [6:57] Agents draw on a variety of components, such as LLMs, memory, planning, critics, retrievers and many more.
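Before moving on to agents, the original linear RAG pipeline described above (retrieve once, augment the prompt, generate) can be sketched as follows. Everything here is an assumption made for illustration: the three-dimensional vectors are hand-written stand-ins for embeddings that a real system would get from a trained model, the two passages are invented, and `generate` is a placeholder where an actual LLM call would go.

```python
import math

# Hypothetical hand-written vectors standing in for learned embeddings.
# In a real pipeline these would be computed offline by an embedding model
# and stored in a vector database.
DOC_VECTORS = {
    "Espresso is a concentrated form of coffee.": [0.1, 0.9, 0.0],
    "A house is a building people live in.": [1.0, 0.0, 0.0],
}

# Hypothetical query embedding, also hand-written for this sketch.
QUERY_VECTORS = {
    "What is espresso?": [0.0, 1.0, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    """Return the k passages whose vectors are closest to the query vector."""
    q = QUERY_VECTORS[query]
    ranked = sorted(DOC_VECTORS, key=lambda d: cosine(q, DOC_VECTORS[d]), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Augment the user's question with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

def generate(prompt):
    """Placeholder for an LLM call; no model is invoked in this sketch."""
    return f"[an LLM would answer here from a {len(prompt)}-character prompt]"

def rag_answer(query):
    """One fixed pass: retrieve, augment, generate."""
    passages = retrieve(query)
    prompt = build_prompt(query, passages)
    return prompt, generate(prompt)

prompt, answer = rag_answer("What is espresso?")
print(prompt)
print(answer)
```

Note that the order of steps is hard-coded in `rag_answer`. That fixed flow is what makes this kind of pipeline static: if the single retrieval misses, nothing downstream can recover.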
[7:15] Agents have become autonomous decision-makers, planning and executing complex tasks. [7:22] Now, instead of linear RAG retrieval, when the user asks a question, [7:27] an AI agent decides whether retrieval is needed, where to search, [7:33] what questions should be asked, and when enough information has been gathered, and then generates a final answer. [7:42] Agents can compare sources, validate claims, refine queries and iterate. [7:48] They can invoke APIs, pull data from many knowledge bases and incorporate multimodal data. [7:55] Retrieval is no longer fixed; it's a tool invoked as part of reasoning. [8:02] This opens up a world of possibilities. [8:05] Now, agentic RAG systems are capable of multistep research, cross-document synthesis and general adaptive behavior. [8:14] The system doesn't just answer questions; it reasons and figures out how to answer them. [8:20] From simple search to current agentic RAG, we have learned time and time again that the next big step isn't better answers; it's systems that know how to find them. [8:31] And the hardest part of AI isn't generation; it's deciding what to look at.
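The decision loop described for agentic RAG (is retrieval needed, where to search, is there enough information yet) can be sketched like this. It is only a toy under loud assumptions: in a real agent each decision would be made by an LLM, whereas here `needs_retrieval` and `choose_source` are simple hand-written rules, and the two knowledge bases and their contents are invented for the example.

```python
# Hypothetical knowledge bases, invented for this sketch.
KNOWLEDGE_BASES = {
    "docs": {"espresso": "Espresso is a coffee drink."},
    "faq": {"password": "Use the reset link on the login page."},
}

def needs_retrieval(question):
    """Stand-in for an LLM decision: greetings need no lookup."""
    return not question.lower().startswith("hello")

def choose_source(question, tried):
    """Stand-in for an LLM decision: pick the next untried source to search."""
    how_to = question.lower().startswith("how do i")
    preferred = ["faq", "docs"] if how_to else ["docs", "faq"]
    for source in preferred:
        if source not in tried:
            return source
    return None

def search(source, query):
    """Toy retriever: return entries whose key appears in the query."""
    q = query.lower()
    return [text for key, text in KNOWLEDGE_BASES[source].items() if key in q]

def agent_answer(question, max_steps=3):
    """Decide, search, check, and repeat until evidence is found or steps run out."""
    trace = []
    if not needs_retrieval(question):
        trace.append("skip retrieval")
        return "Answered directly without retrieval.", trace

    evidence, tried = [], []
    for _ in range(max_steps):
        source = choose_source(question, tried)
        if source is None:
            break  # every source has been tried
        tried.append(source)
        hits = search(source, question)
        trace.append(f"searched {source}: {len(hits)} hit(s)")
        evidence.extend(hits)
        if evidence:
            break  # stand-in for "enough information has been gathered"

    if not evidence:
        return "No supporting evidence found.", trace
    return f"Answer based on: {evidence[0]}", trace

answer, trace = agent_answer("How do I make espresso?")
print(trace)
print(answer)
```

Unlike a fixed pipeline, the number and order of searches here depend on what each step finds: the first source comes back empty, so the loop tries another before answering.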