Stanford CS25: V3 I Retrieval Augmented Language Models

Transcribed Jun 20, 2026

Intermediate 16 min read For: Students, researchers, and practitioners in machine learning and NLP with a basic understanding of language models.

AI Summary

This lecture by Douwe Kiela, CEO of Contextual AI, provides a comprehensive overview of Retrieval-Augmented Generation (RAG). It covers the evolution from simple 'frozen' RAG systems to more sophisticated, end-to-end optimized architectures, addressing key challenges like hallucination, attribution, and customization.

Chapters

1 Introduction and Context 00:05
2 The Problem with Pure Language Models 03:00
3 RAG Basics and the Open-Book Analogy 09:00
4 Retrieval Methods: Sparse, Dense, and Hybrid 15:00
5 Optimizing the Retriever for the Generator 24:00
6 End-to-End Training and Key Architectures 37:00
7 Advanced RAG, Open Questions, and Future Directions 54:00

[01:55]

Language Models Are Not New

The core idea of a language model, predicting the next token, is decades old and was not invented by OpenAI. According to the lecture, ChatGPT's key innovation was fixing the user interface to the model through instruction tuning and alignment.

[04:55]

Problems with Pure LMs

Pure language models suffer from hallucination, lack of attribution, staleness, inability to revise information, and difficulty in customization.

[06:08]

RAG as a Solution

RAG couples a language model with an external memory accessed through a retriever, so the model can ground its generation in retrieved information. This addresses many of the problems above.

[11:00]

Frozen RAG (RAG 1.0)

The most common form of RAG is 'frozen RAG', where a pre-trained retriever and a pre-trained generator are combined without any joint training. The lecture criticizes this setup as a 'Frankenstein's monster'.
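A minimal sketch of the frozen setup, assuming a toy word-overlap retriever and a caller-supplied generator function. The names `overlap_retrieve`, `build_prompt`, and `frozen_rag` are hypothetical and not taken from the lecture. Nothing here is trained; the only coupling between the two components is the prompt string.

```python
from typing import Callable, List


def overlap_retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy frozen retriever: rank passages by the number of shared lowercase words."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def frozen_rag(
    query: str,
    corpus: List[str],
    generate: Callable[[str], str],
    k: int = 2,
) -> str:
    """Retrieve, build a prompt, and call the generator. No gradients flow anywhere."""
    passages = overlap_retrieve(query, corpus, k)
    return generate(build_prompt(query, passages))
```

Passing `generate=lambda prompt: prompt` returns the assembled prompt, which is a quick way to inspect what the generator would see.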

[12:15]

Sparse vs. Dense Retrieval

Sparse retrieval (e.g., BM25) scores documents by weighted word overlap with the query, while dense retrieval (e.g., DPR) compares learned embeddings to capture semantic similarity. Hybrid search combines both and is presented as giving the best results.
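The three ideas can be sketched in a few functions. This is an illustration under stated assumptions: the BM25 scorer uses one common variant of the IDF term with whitespace tokenization, the dense side takes embeddings as given rather than computing them with a model such as DPR, and the hybrid step uses reciprocal rank fusion, which is one popular way to merge rankings and is not claimed to be the method used in the lecture.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Sparse scoring: Okapi BM25 over whitespace tokens."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Dense scoring: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def to_ranking(scores: List[float]) -> List[int]:
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: List[List[int]], c: int = 60) -> List[int]:
    """Hybrid step: each ranking contributes 1 / (c + rank) per document."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (c + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

Fusing ranks rather than raw scores sidesteps the fact that BM25 scores and cosine similarities live on different scales.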

[17:20]

FAISS Powers Vector Databases

FAISS is the open-source library that underlies most modern vector databases, enabling efficient approximate nearest neighbor search.
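For reference, the operation being accelerated is top-k similarity search over a set of vectors. The sketch below is an exact linear scan in plain Python, not FAISS itself; the function name `exact_top_k` is hypothetical. Approximate nearest neighbor indexes aim to return nearly the same results without scoring every vector.

```python
import heapq
from typing import List, Sequence, Tuple


def exact_top_k(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return the k (index, inner product) pairs with the highest scores.

    This scans every vector, so the cost grows linearly with collection size.
    """
    scored = (
        (sum(q * v for q, v in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    )
    best = heapq.nlargest(k, scored)
    return [(i, score) for score, i in best]
```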

[25:30]

Replug: Training the Retriever

Replug trains the retriever by minimizing the KL divergence between its document distribution and the generator's perplexity-based distribution, without updating the generator.
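A sketch of the loss value for a single query, assuming each retrieved document has a retriever similarity score and a score for how well the frozen generator predicts the reference continuation given that document (for example a log-likelihood). The function names and the temperature parameters `gamma` and `beta` are illustrative. In real training the gradient of this quantity would update only the retriever.

```python
import math
from typing import List


def log_softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax with a temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def retriever_kl_loss(
    retriever_scores: List[float],
    lm_scores: List[float],
    gamma: float = 1.0,
    beta: float = 1.0,
) -> float:
    """KL(P_retriever || Q_lm) over the retrieved documents of one query."""
    log_p = log_softmax(retriever_scores, gamma)  # retriever's document distribution
    log_q = log_softmax(lm_scores, beta)          # distribution implied by the frozen LM
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when the retriever already ranks documents in proportion to how much they help the generator, and grows as the two distributions disagree.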

[30:15]

Original RAG: End-to-End Training

The original RAG paper (2020) proposed end-to-end training of both retriever and generator, but it only supports a small number of documents in context.
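The end-to-end objective can be written as a marginal over the retrieved documents. The form below is the sequence-level variant as it is commonly stated; the notation is assumed here rather than taken from this summary: $x$ is the input, $y$ the output, $z$ a retrieved document, $\eta$ the retriever parameters, and $\theta$ the generator parameters.

```latex
\[
p(y \mid x) \;\approx\; \sum_{z \,\in\, \operatorname{top-}k\left(p_\eta(\cdot \mid x)\right)}
    p_\eta(z \mid x)\; p_\theta(y \mid x, z)
\]
```

Because the retriever probability $p_\eta(z \mid x)$ appears inside the sum, the generator's loss sends gradients to the retriever as well. Each term needs its own generator pass, which is one way to see why $k$ stays small.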

[40:37]

Atlas: Query-Side Fine-Tuning is Key

The Atlas paper provides a comprehensive analysis of RAG training, finding that query-side fine-tuning (updating only the query encoder) is often sufficient if the document encoder is high-quality.
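A toy sketch of what query-side fine-tuning means mechanically: the document embeddings are computed once and never touched, and only the query encoder's weights move. The single linear layer and the softmax cross-entropy loss over one positive document are assumptions made for illustration; Atlas itself derives its retriever training signal from the generator.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def encode_query(weights: Matrix, features: Vector) -> Vector:
    """Trainable query encoder: a single linear layer (illustrative only)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def doc_scores(q: Vector, doc_index: Matrix) -> List[float]:
    """Inner product between the query vector and each frozen document vector."""
    return [sum(a * b for a, b in zip(q, d)) for d in doc_index]


def retrieval_loss(weights: Matrix, features: Vector, doc_index: Matrix, positive: int) -> float:
    """Negative log-probability of the positive document under softmax(q . d)."""
    scores = doc_scores(encode_query(weights, features), doc_index)
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive]


def query_side_step(
    weights: Matrix, features: Vector, doc_index: Matrix, positive: int, lr: float = 0.1
) -> Matrix:
    """One gradient step on the query encoder; doc_index is read, never written."""
    q = encode_query(weights, features)
    scores = doc_scores(q, doc_index)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # dLoss/dq = sum_i p_i * d_i - d_positive
    grad_q = [
        sum(p * d[j] for p, d in zip(probs, doc_index)) - doc_index[positive][j]
        for j in range(len(q))
    ]
    # dLoss/dW[j][k] = grad_q[j] * features[k]
    return [
        [w - lr * grad_q[j] * features[k] for k, w in enumerate(row)]
        for j, row in enumerate(weights)
    ]
```

Since the document vectors never change, the index built over them does not need to be recomputed after each update, which is the practical appeal of this setup.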

[58:50]

Lost in the Middle Problem

The 'lost in the middle' problem refers to models' tendency to ignore information placed in the middle of a long context, which makes RAG systems brittle.

[63:10]

Advanced RAG Techniques

Advanced RAG techniques include hybrid search, hypothetical document embeddings (HyDE), and active retrieval (e.g., FLARE) where the model decides when to search.
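Active retrieval can be sketched as a control loop around the generator. The version below is a simplified, FLARE-style step under stated assumptions: `generate` and `retrieve` are hypothetical callables supplied by the caller, the generator is assumed to return per-token probabilities alongside its text, and the whole tentative sentence is used as the search query.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: generate(prompt, passages) returns a sentence plus
# per-token probabilities; retrieve(query) returns a list of passages.
Generate = Callable[[str, List[str]], Tuple[str, List[float]]]
Retrieve = Callable[[str], List[str]]


def generate_sentence_actively(
    prompt: str,
    generate: Generate,
    retrieve: Retrieve,
    threshold: float = 0.5,
) -> Tuple[str, bool]:
    """Draft a sentence; if any token is low-confidence, retrieve and redraft.

    Returns the accepted sentence and whether retrieval was triggered.
    """
    draft, token_probs = generate(prompt, [])
    if token_probs and min(token_probs) >= threshold:
        return draft, False  # confident enough, no search needed
    passages = retrieve(draft)  # the tentative sentence serves as the query
    revised, _ = generate(prompt, passages)
    return revised, True
```

The design choice is that retrieval cost is paid only when the model signals uncertainty, rather than on every generation step.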

[68:50]

Future: RAG 2.0 and Systems Thinking

The future of RAG involves moving to 'RAG 2.0' with end-to-end optimization, treating the entire system (including chunking) as differentiable, and decoupling knowledge from reasoning.

Clickbait Check

95% Legit

"The title is accurate; the lecture is a deep dive into retrieval-augmented language models, exactly as promised."

Mentioned in this Video

Contextual AI (service)
Hugging Face (service)
FAISS (tool)
BM25 (tool)
DPR (Dense Passage Retriever) (tool)
ColBERT (tool)
Dragon / Dragon+ (tool)
Replug (tool)
RAG (Retrieval-Augmented Generation) (tool)
Atlas (tool)
Retro (tool)
FLARE (tool)
LlamaIndex (tool)
LangChain (tool)
Chroma (tool)
Weaviate (tool)
Karen Spärck Jones (person)
Douwe Kiela (person)

Study Flashcards (12)

What was the main innovation of ChatGPT according to the lecture?

Difficulty: medium

To fix the user interface to the language model, making it easier for humans to interact with it.