---
title: 'Build a Large Language Model AI Chatbot using Retrieval Augmented Generation'
source: 'https://youtube.com/watch?v=XctooiH0moI'
video_id: 'XctooiH0moI'
date: 2026-06-30
duration_sec: 173
---

# Build a Large Language Model AI Chatbot using Retrieval Augmented Generation

> Source: [Build a Large Language Model AI Chatbot using Retrieval Augmented Generation](https://youtube.com/watch?v=XctooiH0moI)

## Summary

This video demonstrates how to build a large language model (LLM) application that lets users chat with their own data. The core technique is retrieval augmented generation (RAG): chunks of the custom data are inserted into the prompt so that the LLM answers based on that context. The tutorial walks through building the app in three phases: a chat interface in Streamlit, integration of a Watsonx.ai LLM through LangChain, and custom PDF data added via vector embeddings.
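
To make the RAG idea concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt step. It is not the code from the video, which uses LangChain and Chroma DB. The function names, the chunk sizes, the prompt wording, and especially the bag-of-words "embedding" are assumptions chosen so the example runs without an embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows of words (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Place the retrieved chunks into the prompt as context for the LLM."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The string returned by `build_prompt` is what would be sent to the LLM. In a real app, `embed` would call an embedding model and the similarity search would be handled by a vector database.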

### Key Points

- **Introduction to RAG** [00:00] — The technique that enables chat with custom data is retrieval augmented generation (RAG): chunks of the data are placed into the prompt as context for the LLM's answer.
- **App Dependencies** [00:14] — Dependencies include LangChain, Streamlit, and Watsonx.ai. These are explained as they are used.
- **Chat Interface Setup** [00:28] — Streamlit's chat components are used: chat input for prompts, chat message for display. Initially only the last message shows; fixed by storing messages in session state.
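- **LLM Integration** [01:14] — LangChain is used to interface with Watsonx.ai, chosen because it offers state-of-the-art models, does not train on user data, and is built for business. Requires a credentials dictionary with an API key (created from the IBM Cloud IAM menu) and the ML service URL, plus a Watsonx project ID. The model used is Llama 2 70B chat.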
- **Displaying LLM Responses** [01:43] — LLM responses are shown using Streamlit's chat message component with role 'assistant'. Messages are saved to session state for history.
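- **Adding Custom Data** [02:11] — Phase 3: A load-PDF function takes the name of the PDF and passes it to a vector store index creator with a chosen embeddings function, which chunks the PDF and loads it into a vector database (Chroma DB). The function is wrapped in `st.cache_resource` so Streamlit does not reload it each time.
- **Chat with PDF** [02:25] — The index is passed to a retrieval QA chain, and the call to the base LLM is swapped for `chain.run`, which enables chatting with the PDF (in this case, one about generative AI).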
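
The chat-history pattern from the key points above can be sketched roughly as follows. This is an approximation under stated assumptions, not the video's exact code: `add_message`, `run_app`, and `generate_reply` are hypothetical names, and `generate_reply` stands in for whatever produces the answer (the base LLM, or later the QA chain). The Streamlit calls used are `st.session_state`, `st.chat_input`, and `st.chat_message`.

```python
def add_message(messages, role, content):
    """Append one chat turn as a dict holding the role and the text."""
    messages.append({"role": role, "content": content})
    return messages


def run_app(generate_reply):
    """Render the chat UI; generate_reply is a hypothetical prompt -> answer callable."""
    import streamlit as st  # imported here so add_message works without Streamlit

    # Session state survives reruns, so the history is not lost on each prompt.
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Replay every stored message, otherwise only the latest one is visible.
    for message in st.session_state.messages:
        st.chat_message(message["role"]).markdown(message["content"])

    prompt = st.chat_input("Pass your prompt here")
    if prompt:
        # Show and store the user's prompt.
        st.chat_message("user").markdown(prompt)
        add_message(st.session_state.messages, "user", prompt)

        # Show and store the reply under the 'assistant' role.
        reply = generate_reply(prompt)
        st.chat_message("assistant").markdown(reply)
        add_message(st.session_state.messages, "assistant", reply)
```

To try it, call `run_app` at the bottom of a script with any callable, for example `run_app(lambda p: f"You said: {p}")`, and launch the script with `streamlit run`.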

### Conclusion

The tutorial successfully builds a working app that allows users to chat with their own PDF data using RAG, combining Streamlit, LangChain, Watsonx.ai, and Chroma DB for an efficient and cost-effective LLM application.

## Transcript

In this video, I'm going to show you how to build a large language model app to chat with your own data. This is arguably the cheapest and the most efficient way to get started with LLMs for your own business. But before we do that, I want to back up a little. The technique that makes this work is called retrieval augmented generation. A fancy way of saying
we chuck chunks of your data into a prompt and get the LLM to answer based on that context. The first thing that we need to do is build an app to chat. There's a bunch of dependencies that I need to import. They're mainly from LangChain, but there's a little Streamlit and Watsonx thrown in for good measure. I'll explain these as I use them, so don't stress for now.
Streamlit has a great set of chat components, so I'm going to make the most of them. Add in a chat input component to hold the prompt and then display the user message using the chat message component via markdown. This means I can now see the messages showing up in the app, but it's only displaying the last message posted, not all of them. Easy fix: create a Streamlit
state variable. I'll call that messages and append all of the user prompts into it. While I'm at it, I'll save the role type, in this case user, into the dictionary. And then I can test it out. But the history doesn't show up. Well, turns out I haven't printed out the historical messages
yet. Loop through all the messages in the session state messages variable and use the chat message component to display them. And wait, did I save the app? Of course not, I'd never make a mistake like that. Let's just try that again and look at this. Historical prompts that have been passed through.
Whoop-de-doo, where's the LLM at? Well, let's do it. I'm going to use the LangChain interface to Watsonx.ai. Why? Well, it uses state-of-the-art large language models, doesn't use your data to train, and it's built for business. But that's just scratching the surface. To do that, I'll create a credential
dictionary with an API key and use the ML service URL. You can create an API key from the IBM Cloud IAM menu. URLs for different regions are shown on the screen right now. Then, the LLM. I'm using Llama 2 70B chat because I'm pretty fond of those furry bugs. Pass through some decoding parameters and
specify the project ID from Watsonx. Now send the prompt through to the LLM and boom! Whoops, wait, it looks like it's running. But I need to show the LLM responses as well. Easy enough with the Streamlit chat message component. Note, the chat role for the LLM
response is assistant rather than user. This helps to differentiate the responses. I'll render the prompt as markdown and save the message to the session state as well. That way the history is displayed in the app. And now it works. Yeah, yeah, that's great. But where's the custom data
coming into play? Entering phase 3. I'll add in a load PDF function and specify the name of the PDF here. Then pass that to the LangChain vector store index creator and choose the embeddings function to use. This basically chunks up the PDF and loads it into a vector database, Chroma DB in this case.
Wrapping it in the st.cache_resource function means that Streamlit doesn't need to reload it each time, which makes it a whole heap faster. Anyway, I can then pass our index to the LangChain retrieval QA chain and swap out the base LLM for the Q&A chain using chain.run. And we can now chat with our PDF.
In this case, a PDF to do with generative AI. Meta, I know.
