[0:00] We've all had this experience.
[0:02] You search for something, you get thousands of results, and somehow, none of them are what you wanted.
[0:08] Well, what if I told you search engines don't actually understand your questions?
[0:12] At least, they didn't use to.
[0:14] From simple keyword search to present-day agentic RAG,
[0:18] information retrieval has evolved. Search engines didn't get smarter overnight; they grew up one step at a time.
[0:26] Let's start from the beginning.
[0:28] The earliest search systems were designed around the question of "Where does this word appear?"
[0:33] Documents were indexed using what are called inverted indices: mappings from keywords to the documents that contain them.
[0:43] When a user asks a question, the search system will look up these words and quickly return the matching documents.
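To make the lookup concrete, here is a minimal sketch of an inverted index. The function names, the whitespace tokenizer, and the three sample documents are assumptions made for illustration; real engines add stemming, stop-word handling, and compressed on-disk structures.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query token (boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical toy corpus.
docs = {
    1: "python coding help",
    2: "pet snake care",
    3: "python snake habitat",
}
index = build_inverted_index(docs)
print(search(index, "python"))        # documents 1 and 3
print(search(index, "python snake"))  # document 3 only
```

Note that a query for "espresso" returns nothing here, even if a document about coffee existed: the index only knows which exact words appear where.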
[0:58] These documents may then be ranked using TF-IDF or BM25 to measure how important or frequent different terms were.
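As a rough illustration of the ranking step, the sketch below scores documents with the standard Okapi BM25 formula. The parameter defaults (`k1=1.5`, `b=0.75`), the whitespace tokenizer, and the sample documents are assumptions chosen for the example, not values taken from any particular search engine.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus.
docs = [
    "python coding help and python tips",
    "pet snake care",
    "python snake habitat",
]
scores = bm25_scores("python coding", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The first document matches both query terms and ranks highest, the third matches only "python", and the second matches nothing and scores zero.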
[1:06] This powerful keyword matching approach still powers a lot of the internet today, but there's a fundamental limitation: it doesn't understand language.
[1:16] It treats words as symbols, not meaning.
[1:20] Synonyms, ambiguity and any complex intents were essentially invisible.
[1:24] For example, is the search "help Python" related to coding, or did I just get a pet snake?
[1:31] It was on the user to ask the right questions with exactly the right words.
[1:37] The next major leap was semantic search.
[1:40] Instead of treating text as bare words, we began representing it by its meaning.
[1:45] This is done using vectors: high-dimensional numerical representations that capture meaning.
[1:53] For example, coffee might be represented as [0, 1, 0], while house might be represented as [1, 0, 0].
[2:09] These embeddings don't just come out of nowhere.
[2:11] They are learned by large neural networks trained on massive text corpora.
[2:16] By encountering words in context, the network gradually places similar concepts close together, even if they use different words.
[2:25] If this is coffee, maybe espresso is represented here.
Very close in concept to coffee, but not anywhere close to house.
[2:35] Semantic search turns your words into a kind of map.
[2:38] So the system knows espresso and coffee are pointing to a very similar place.
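One common way to measure "pointing to a similar place" is cosine similarity. The sketch below reuses the toy coffee and house vectors from the example; the espresso vector is made up for illustration, and real embeddings have hundreds or thousands of dimensions produced by a trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, embeddings):
    """Return the other word whose vector is most similar to `word`'s vector."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))

# Toy 3-dimensional "embeddings"; the espresso values are an assumption.
embeddings = {
    "coffee": [0.0, 1.0, 0.0],
    "house": [1.0, 0.0, 0.0],
    "espresso": [0.1, 0.9, 0.1],
}

print(cosine_similarity(embeddings["espresso"], embeddings["coffee"]))
print(cosine_similarity(embeddings["espresso"], embeddings["house"]))
print(nearest("espresso", embeddings))
```

Espresso lands far closer to coffee than to house, even though the two words share no characters a keyword index could match on.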
[2:43] It's essentially your friend who knows what you mean, even if you don't say it perfectly every time.
[2:49] This allowed search systems to understand intent.
[2:52] Even if the exact keywords were not used, you could still find relevant documents.
[2:59] And this didn't replace keyword search; it actually complemented it.
[3:04] Hybrid systems began to emerge, bridging the precision of keyword search with semantic recall.
[3:10] For the first time, instead of just matching text, search was able to approximate understanding.
[3:17] Then, the world shifted.
[3:19] Large language models were born.
[3:23] These are models trained on large corpora of text to learn patterns in the data.
[3:31] LLMs don't retrieve facts.
[3:34] When prompted, they predict the most likely next tokens, or words, of an answer based on the patterns they learned from the training data.
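To show what "predict the most likely next token" means, here is a drastically simplified sketch. It is a bigram counter, not a neural network: it only remembers which word most often followed another in a made-up training text. Real LLMs condition on long contexts with learned parameters, but the principle of predicting from patterns rather than looking up facts is the same.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    tokens = text.lower().split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text.
corpus = "coffee is hot . espresso is strong . coffee is hot"
model = train_bigram(corpus)

print(predict_next(model, "is"))   # the follower seen most often after "is"
print(predict_next(model, "tea"))  # never seen in training, so no prediction
```

The model has nothing to say about "tea" because the word never appeared in its training text.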
[3:44] The user asks a question to the LLM and it will return a text answer.
[3:53] These models are extremely powerful and revolutionized the business world.
[3:58] However, they had a problem.
[4:00] LLMs only know what they learned during a long and expensive training process.
[4:07] Realistically, that means any knowledge is locked to only the documents that that specific LLM was trained on before a certain point in time.
[4:20] LLMs don't know today's information, and certainly don't know your specific documents.
[4:26] So what's the solution?
[4:27] Well, it's actually search.
[4:30] Retrieval augmented generation, or RAG, was born.
[4:34] The idea is very simple.
[4:36] The user asks a question, and the system searches an external knowledge base for relevant documents.
[4:47] The retrieved documents are used to augment the LLM's prompt, and a final answer is generated.
[5:02] This gave LLMs a form of external memory.
[5:06] Now they could cite sources, adapt to new information and even operate in specialized domains without the costly retraining.
[5:15] These original RAG pipelines were very linear.
[5:18] Documents were embedded offline and stored in vector databases.
[5:26] They were retrieved once at query time and passed straight into the model.
[5:30] It was simple, but effective.
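The linear retrieve-then-generate flow can be sketched in a few lines. Everything named here is an assumption for illustration: `retrieve` uses plain word overlap as a stand-in for vector search, the prompt template is invented, and `echo_llm` is a placeholder that returns its prompt so the example runs without a real model.

```python
def retrieve(query, knowledge_base, top_k=2):
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in knowledge_base:
        overlap = len(query_terms & set(doc.lower().split()))
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]

def build_prompt(query, passages):
    """Augment the user's question with the retrieved passages."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def rag_answer(query, knowledge_base, llm):
    """Retrieve once, build the prompt, and pass it straight to the model."""
    passages = retrieve(query, knowledge_base)
    return llm(build_prompt(query, passages))

def echo_llm(prompt):
    """Hypothetical stand-in for a real model call; it just returns the prompt."""
    return prompt

# Hypothetical knowledge base.
knowledge_base = [
    "The office closes at 6 pm on Fridays",
    "Espresso is brewed under pressure",
    "The cafeteria serves espresso until noon",
]
print(rag_answer("when does the office close", knowledge_base, echo_llm))
```

Whatever `retrieve` returns is all the model gets to see, so a weak retrieval step caps the quality of the final answer.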
[5:32] This massive improvement significantly reduced hallucinations and enabled LLM adoption across a multitude of new domains.
[5:41] But traditional RAG is nowhere near perfect.
[5:44] It cannot adapt to new scenarios.
[5:47] And suddenly we are back at the problem of traditional search.
[5:51] The answer is only as good as the search itself.
[5:55] Within a short period, countless advancements were made to RAG, developing the simple concept into a sophisticated force to be reckoned with.
[6:04] Instead of a single retrieval step, pipelines added rerankers to reorder results to be more relevant.
[6:13] User queries were rewritten or expanded upon to improve recall.
[6:18] As before, hybrid retrieval became the norm, combining the precision of keyword search with the recall of semantic vector search.
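One widely used way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). This is only one possible fusion method, shown as a sketch; the document ids and the two input rankings are made up, and `k=60` is a conventional default rather than a required value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical outputs of a keyword retriever and a vector retriever.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

Documents that both retrievers rank well rise to the top, and because RRF only uses rank positions, the two systems' raw scores never need to be put on a common scale.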
[6:29] These systems were far more accurate, but still fundamentally static.
[6:34] The pipeline was predetermined and retrieval was smarter, but still not intelligent.
[6:41] Enter the next disruptor: agents.
[6:45] Agents are systems that use LLMs and tools to perform tasks autonomously.
[6:51] Suddenly we shifted from simple pipelines to complex decision-making systems.
[6:57] Agents draw on a variety of components such as LLMs, memory, planning, critics, retrievers and many more.
[7:15] Agents had become autonomous decision-makers, planning and executing complex tasks.
[7:22] Now, instead of linear RAG retrieval, when the user asks a question,
[7:27] an AI agent will decide whether retrieval is needed, where to search,
[7:33] what questions should be asked, when enough information is obtained, and then generate a final answer.
[7:42] Agents can compare sources, validate claims, refine queries and iterate.
[7:48] They can invoke APIs, pull data from many knowledge bases and incorporate multimodal data.
[7:55] Retrieval is no longer fixed; it's a tool invoked as part of reasoning.
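The decision loop described above can be sketched as follows. This is a toy illustration under stated assumptions: `rule_based_decide` is a hard-coded stand-in for the LLM planner, the two search tools return canned strings, and `template_generate` replaces a real generation call. Only the control flow is the point.

```python
def agentic_answer(question, tools, decide, generate, max_steps=3):
    """Ask the planner for the next action until it chooses to answer."""
    evidence = []
    for _ in range(max_steps):
        action = decide(question, evidence)
        if action["type"] == "answer":
            break
        tool = tools[action["tool"]]           # where to search
        evidence.extend(tool(action["query"]))  # what to ask
    return generate(question, evidence)

def search_docs(query):
    """Hypothetical internal knowledge base lookup."""
    return [f"internal doc for '{query}'"]

def search_wiki(query):
    """Hypothetical public encyclopedia lookup."""
    return [f"wiki result for '{query}'"]

def rule_based_decide(question, evidence):
    """Stand-in for an LLM planner: greetings need no retrieval, others need two sources."""
    if "hello" in question.lower():
        return {"type": "answer"}
    if not evidence:
        return {"type": "search", "tool": "docs", "query": question}
    if len(evidence) < 2:
        return {"type": "search", "tool": "wiki", "query": question}
    return {"type": "answer"}

def template_generate(question, evidence):
    """Stand-in for the final LLM generation step."""
    if not evidence:
        return f"(no retrieval) reply to: {question}"
    return f"Answer to '{question}' based on {len(evidence)} sources"

tools = {"docs": search_docs, "wiki": search_wiki}
print(agentic_answer("hello there", tools, rule_based_decide, template_generate))
print(agentic_answer("What is BM25?", tools, rule_based_decide, template_generate))
```

Unlike the fixed pipeline, the number and target of retrieval calls here depends on the question and on what has been gathered so far, with `max_steps` as a safety cap on iteration.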
[8:02] This opens up a world of possibilities.
[8:05] Now, agentic RAG systems are capable of multistep research, cross-document synthesis and general adaptive behavior.
[8:14] The system doesn't just answer questions; it reasons and figures out how to answer them.
[8:20] From simple search to current agentic RAG, we have learned time and time again that the next big step isn't better answers; it's systems that know how to find them.
[8:31] And the hardest part of AI isn't generation; it's deciding what to look at.