[00:00] In this video, I'm going to show you how to build a large language model app to chat with your own data. This is arguably the cheapest and most efficient way to get started with LLMs for your own business. But before we do that, I want to back up a little. The technique that makes this work is called retrieval-augmented generation, a fancy way of saying
[00:14] we chuck chunks of your data into a prompt and get the LLM to answer based on that context. The first thing that we need to do is build an app to chat. There's a bunch of dependencies that I need to import. They're mainly from LangChain, but there's a little Streamlit and watsonx thrown in for good measure. I'll explain these as I use them, so don't stress for now.
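As a rough illustration of the "chunks of your data into a prompt" idea, here is a minimal sketch in plain Python. The function name `build_rag_prompt`, the prompt wording, and the sample chunks are all assumptions made for illustration; they are not taken from the video's code or from any library.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Place retrieved chunks ahead of the question so the LLM answers from them."""
    # Number each chunk so the context is easy to read back in the prompt.
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical chunks standing in for pieces of your own document.
sample_chunks = [
    "Generative AI refers to models that create new content.",
    "Large language models are trained on large text corpora.",
]
prompt = build_rag_prompt("What is generative AI?", sample_chunks)
print(prompt)
```

The resulting string is what would be sent to the LLM in place of the bare question.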
[00:28] Streamlit has a great set of chat components, so I'm going to make the most of them. Add in a chat input component to hold the prompt and then display the user message using the chat message component via markdown. This means I can now see the messages showing up in the app, but it's only displaying the last message posted, not all of them. Easy fix: create a Streamlit
[00:45] state variable. I'll call that messages and append all of the user prompts into it. While I'm at it, I'll save the role type, in this case user, into the dictionary. And then I can test it out. But the history doesn't show up. Well, turns out I haven't printed out the historical messages
[00:59] yet. Loop through all the messages in the session state messages variable and use the chat message component to display them. And wait, did I save the app? Of course not, I'd never make a mistake like that. Let's just try that again and look at this: historical prompts that have been passed through.
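The chat history steps described so far could look roughly like the sketch below. It assumes Streamlit's `st.chat_input`, `st.chat_message`, and `st.session_state`; the helper names `add_message` and `run_chat_page` and the placeholder text are hypothetical, not the names used in the video.

```python
def add_message(messages, role, content):
    """Append one chat turn as a dictionary holding the role and the text."""
    messages.append({"role": role, "content": content})


def run_chat_page():
    # Imported here so add_message can be used without Streamlit installed.
    import streamlit as st

    # Streamlit reruns the script on every interaction, so the history
    # has to live in session state rather than in a normal variable.
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Replay every earlier message before handling the new prompt.
    for message in st.session_state.messages:
        st.chat_message(message["role"]).markdown(message["content"])

    prompt = st.chat_input("Pass your prompt here")
    if prompt:
        # Show the new user message, then store it with its role.
        st.chat_message("user").markdown(prompt)
        add_message(st.session_state.messages, "user", prompt)
```

To try it, call `run_chat_page()` at the bottom of the script and launch it with `streamlit run` followed by the script's file name.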
[01:14] Whoop-de-doo, Nick, where's the LLM at? Well, let's do it. I'm going to use the LangChain interface to watsonx.ai. Why? Well, it uses state-of-the-art large language models, doesn't use your data to train, and it's built for business. But that's just scratching the surface. To do that, I'll create a credentials
[01:28] dictionary with an API key and the ML service URL. You can create an API key from the IBM Cloud IAM menu. URLs for different regions are shown on the screen right now. Then, the LLM. I'm using Llama 2 70B chat because I'm pretty fond of those furry bugs. Pass through some decoding parameters and
[01:43] specify the project ID from watsonx. Now send the prompt through to the LLM and boom! Whoa, wait, it looks like it's running. But I need to show the LLM responses as well. Easy enough with the Streamlit chat message component. Note, the chat role for the LLM
[01:57] response is assistant rather than user. This helps to differentiate the responses. I'll render the response as markdown and save the message to the session state as well. That way the history is displayed in the app. And now it works. Yeah, yeah, that's great. But where's the custom data
[02:11] coming into play? Entering phase 3. I'll add in a load PDF function and specify the name of the PDF here. Then pass that to the LangChain VectorstoreIndexCreator and choose the embeddings function to use. This basically chunks up the PDF and loads it into a vector database, ChromaDB in this case.
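To make "chunks up the PDF and loads it into a vector database" a bit more concrete, here is a toy sketch using only the Python standard library. It is not how LangChain or ChromaDB work internally: the bag-of-words `embed` function stands in for a real embedding model, and `chunk_text`, `cosine`, and `ToyVectorStore` are hypothetical names invented for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=100, overlap=0):
    """Split text into fixed-size character windows, optionally overlapping."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps each chunk next to its vector and returns the closest matches."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        query_vector = embed(query)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:k]]


# Hypothetical chunks; in the app these would come from the PDF.
store = ToyVectorStore([
    "Generative AI creates new text and images.",
    "Llamas are furry animals from South America.",
    "Vector databases store embeddings for search.",
])
best = store.search("What does generative AI create?", k=1)
print(best)
```

A real vector database does the same two jobs at scale: store an embedding per chunk, then return the chunks whose embeddings sit closest to the query's embedding.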
[02:25] Wrapping it in the st.cache_resource function means that Streamlit doesn't need to reload it each time, which makes it a whole heap faster. Anyway, I can then pass that index to the LangChain RetrievalQA chain and swap out the base LLM for the Q&A chain using chain.run. And we can now chat with our PDF.
[02:40] In this case, a PDF to do with generative AI. Meta, I know.
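The caching point made earlier, that the PDF index should not be rebuilt on every rerun, can be illustrated with a plain-Python analogy. This sketch swaps in `functools.lru_cache` from the standard library as a stand-in; it is not the same mechanism as Streamlit's `st.cache_resource`, and the `load_index` function, its return value, and the file name are hypothetical.

```python
import functools


@functools.lru_cache(maxsize=None)
def load_index(pdf_name):
    """Pretend to do the slow work of loading, chunking, and indexing a PDF."""
    print(f"Building index for {pdf_name} (slow step)")
    # Placeholder result standing in for a real vector index.
    return {"source": pdf_name, "chunks": ["chunk one", "chunk two"]}


# The first call does the slow work; the second reuses the stored result.
first = load_index("example.pdf")
second = load_index("example.pdf")
print(first is second)
print(load_index.cache_info())
```

In the Streamlit app itself, the equivalent move would be decorating the load function with `@st.cache_resource`, so the index survives the script reruns that happen on every chat message.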