---
title: 'How to Build a Local AI Agent With Python (Ollama, LangChain & RAG)'
source: 'https://youtube.com/watch?v=E4l91XKQSgw'
video_id: 'E4l91XKQSgw'
date: 2026-07-28
duration_sec: 1688
---

# How to Build a Local AI Agent With Python (Ollama, LangChain & RAG)

> Source: [How to Build a Local AI Agent With Python (Ollama, LangChain & RAG)](https://youtube.com/watch?v=E4l91XKQSgw)

## Summary

This tutorial demonstrates how to build a local AI agent using Python, Ollama, LangChain, and Chroma DB for retrieval-augmented generation (RAG). The agent can query a CSV file of restaurant reviews to answer questions, all running locally without any cloud APIs.

### Key Points

- **Project Overview** [00:00] — Build a local AI agent in minutes using Python, Ollama, LangChain, and Chroma DB for RAG, enabling retrieval from CSV/PDF files.
- **Demo: Querying Reviews** [00:34] — Agent queries a CSV of fake pizza restaurant reviews to answer questions like 'how is the quality of the pizza?' and 'are there vegan options?'.
- **Setup: Dependencies** [02:02] — Create a virtual environment, install langchain, langchain-ollama, langchain-chroma, and pandas.
- **Setup: Ollama Models** [04:34] — Install Ollama, pull llama3.2 for the LLM and mxbai-embed-large for embeddings.
- **Coding: Basic LLM Chain** [07:03] — Import Ollama LLM, create a chat prompt template, build a chain, and test with a simple question.
- **Coding: Vector Store Setup** [14:52] — Create a separate file to load CSV, define embeddings, initialize Chroma DB, and create a retriever.
- **Integration and Testing** [24:58] — Import retriever into main.py, use it to fetch relevant reviews before invoking the LLM chain, and run the interactive loop.

### Conclusion

This tutorial shows how to build a fully local AI agent with RAG using Python, Ollama, LangChain, and Chroma DB. The approach can be adapted to any CSV or document data, enabling private, offline question-answering.

## Transcript

in this video I'll be showing you how to
build a local AI agent in just a few
minutes using Python we'll be using Ollama
LangChain and something called Chroma
DB to act as our vector search database
because I'm going to show you how to add
retrieval augmented generation into this
app that essentially means we can
retrieve relevant information from
something like a CSV file or a PDF and
bring that into our model now all of
this is completely free you don't need
an OpenAI account you don't need a Claude
account or something you can do this all
from your local computer so let me show
you how to set it all up so I'm just
going to show you a quick demo of the
finished product and then we'll get into
the tutorial now you can see on the
right hand side of my screen here that I
just opened up a CSV file this CSV file
just contains some fake reviews for a
random pizza restaurant so we have title
date rating and review and you can see
something like best pizza in town here's
the date here's the ID or sorry the
rating of the review out of five and
then you have what the actual review is
and there's kind of some information now
what I'm going to show you is how to
build an AI agent here that can actually
go and look up relevant reviews from
this document to answer questions about
the restaurant I don't know about you
guys but whenever I go to a new place I
always look at the reviews and typically
I'm looking for an answer to my
particular question so this can kind of
do that for you so for example maybe I
want to know you know how is the quality
of the pizza okay well what it can do is
then go to this document find the
relevant reviews which you can see it
kind of pulls into here and analyzes and
then it gives me a conclusion overall
without more data or context it's
challenging to give a definitive score
on the pizza based solely on the reviews
however they do suggest a restaurant
with potentially room for improvement in
presentation and overall consistency so
there you go right I could ask something
like are there vegan options let's see
what that gives us and you can see here
in conclusion based on the reviews there
appear to be at least one vegan pizza
and possibly more vegan options
available Okay cool so that's what we're
going to build this isn't going to be
super complicated it'll be pretty fast
so stick around and let me show
you how we make it all right so we have
a few quick setup steps here and then we
can dive right into the code now the
first thing that we're going to need is
obviously some kind of CSV file now you
can use anything that you want and I'll
show you how to adjust this code for
your own example but if you want to
download the CSV file that I'm using
I'll leave a link to it in the
description and in fact all of the code
will be available from the GitHub so you
can go to the GitHub and you can
download this CSV file and just bring it
into a new folder in VS Code so to
begin open up some kind of code editor
I'm using VS Code create a new
folder you can see I have one called
local AI agent bring in the CSV file and
then I also created this
requirements.txt file which just has the
three things that we're going to need to
install in Python so let's get started
with that installing our Python
dependencies and then I'll show you the
next steps so what we need to do is open
up our terminal again I'm inside of the
directory that I want to write code in
for this video and what I'm going to do
is create a virtual environment so to do
that I'm going to type python -m venv
and then venv if you're on Mac or Linux
you can change this to python3 and what
this will do is create a new isolated
environment that we can install various
dependencies into if you don't know
anything about virtual environments and
you want to learn more I'll leave a
video on screen now that the virtual
environment has been created we need to
activate it to activate it if you're on
Windows is going to be dot slash the
name of the virtual environment slash
and then scripts with a capital S and
then slash activate when you type that
you should see that you get the name of
the virtual environment as a prefix
before your command line now if you are
on Mac or Linux then the command is
going to be source ./venv and then this is
going to be bin/activate okay so it's
different if you're on Windows it's this
one and if you are Mac or Linux it's
going to be this one and again I'll
leave a video on screen that we'll go
through this more in depth now that we
have the virtual environment activated
what we're going to do is install the
various dependencies inside of here now
if you have this requirements.txt file
then you can say pip install -r and
then you can do requirements.txt and
this will install all of the
requirements into our virtual
environment however if you don't have
the requirements.txt file you can just
type them out so you can just install
langchain you can install langchain-ollama
and you can install langchain-chroma
like that okay so we just need to
install these dependencies in order to
be able to use these in Python so that's
going to take a second installing of
those dependencies for us and then once
that's done I'll be right back okay so
those are installed and the next thing
that we're going to need to get is
something called Ollama now Ollama allows
us to run models locally on our own
computer using our own Hardware so
that's why we're able to do everything
locally here rather than having to use
something like an OpenAI API key so
please go to this page just ollama.com if
you don't already have this software and
simply download it once you download it
what you should be able to do is just
open up some kind of terminal or command
prompt and then type the command ollama
if you have any issues with this again
I'll put another video on screen that
walks through Ollama in depth and we'll
show you how to set this up but once we
have Ollama installed on our computer what
we're going to do is install an Ollama
model now Ollama again it's just this open
source software and allows us to pull
various models to our own computer and
then run them using our own Hardware now
depending on the type of Hardware you
have that will dictate the models you'll
be able to run for example you probably
can't run a 200 gigabyte model if you don't
have a graphics card in your computer so
I'm going to show you a few models that
should work on most machines if you have
a graphics card if you don't have a
graphics card and you just have a CPU
there's some very small models that you
can download and use but obviously the
performance won't be as good so what you
can do is you can actually go to the
Ollama library I'll leave this link in
the description and you can see that
there's various different models and it
kind of shows you all of the options
that they have now we're going to pull
two models to our computer we're just
going to pull llama 3.2 so this is kind
of a smaller model that we can use that
performs pretty well and then we're
going to pull an embedding model and
I'll show you the name of that in 1
second which we'll use to embed the
documents that we add into our Vector
store if that means nothing to you don't
worry just follow along with the next
steps okay so we're going to go into our
terminal and again we're going to make
sure that ollama command works and then
we're going to type ollama pull and we're
going to start by pulling the model
llama3.2 now you can pull any model
that you want you can choose you can go
look at the directory but I'm going to
go with 3.2 once that's done okay you
can see it's here because I already had
it downloaded then we can move on to the
next one now the next model that we're
going to pull is going to be an
embedding model now this embedding model
is going to be mxbai-embed-large there's
various other embedding models you can
use but this is the one we'll use for
this video okay so we're going to go and
hit enter and then again downloaded to
our computer these are not super big so
you should be able to run them on your
computer if you have any kind of GPU all
right so now that we have these models
we're good to start writing some code so
I'm going to go back into VS Code I'm
going to make a file called
main.py and in this file I'm going to
start writing some code now you'll
notice that I actually get this
autocomplete here this is coming from
GitHub Copilot you know that really
cool assistant that replaces a lot of
your manual typing work they've actually
sponsored this video and speaking of
Microsoft's GitHub Copilot I was
fortunate enough to have them sponsor a
video a few weeks ago on AI agents and
today's video where I promised to
highlight some of the standout ways that
developers are using GitHub Copilot
that you guys submitted with the coding
with Copilot hashtag so let's get into
it check out these examples of how
developers are using GitHub Copilot
like Emy who created an entire Flutter
mobile app tug Duel who created a Python
script to resize and save images Adrien
who used Copilot as a beginner when he
was working in Jupyter to learn better
ways to write functions and Yousef who
uses it to avoid manually writing
tedious documentation now personally I use
GitHub Copilot every single time I open
up VS Code and it's insane how well it
can predict what I want to do next and
save me tons of hours of manual typing
it's literally like it can read my mind
now I'm sure that you guys have more
stories on how you're using GitHub
Copilot so please share them with me
using the coding with Copilot hashtag
because I'm excited to check them out
now with that said let's get back to the
video all right so back into the code
editor here let's go ahead and get
started now we're going to begin by just
importing a few things so we're going to
say from langchain_ollama.llms
import OllamaLLM we're
then going to say from
langchain_core.prompts import ChatPromptTemplate
okay now if you're unfamiliar
with LangChain this is a framework that
just makes it a lot easier for us to
work with llms it's very popular in
Python and it has all of these
extensions like the Ollama extension that
allows us to directly use our Ollama
models and by the way what will happen
is Ollama should be running in the
background on your computer and it's
going to expose a server or like an HTTP
rest API that we'll be able to
communicate with from our program so
when you pull these models they are
actually running on your own computer
and we can trigger Ollama to utilize these
models from code in python or we can
actually just do it directly from the
command line so everything that I'm
showing you here will run 100% locally
on your own computer even though it
might not necessarily feel like that it
also means it'll be pretty fast okay so
after this we're going to specify our
model now I'm going to show you in this
code snippet here how to utilize an Ollama
model like quite quickly and then we'll
start connecting some more complexity to
it with the vector database and I'll
talk about what that means so I'm going
to say model is equal to OllamaLLM and then
inside of here I need to specify the
specific model from Ollama that I want to
use now if you're confused on what to
put here you can open up your command
prompt you can type ollama list like
this and it will show you the models
that you have available so you can see
that I have this embedding model I have
llama3.2 I have mistral I have llama2
so any of these models I can use so what
I'm going to do is just copy llama3.2
you don't need the latest part of it you
can just do the original name and you
can put it right here okay so I'm going
to use OllamaLLM with model equal to
llama3.2 and now I can start utilizing
this model and kind of invoking it so
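
The point about Ollama exposing a local HTTP API can be checked from Python. This is a minimal sketch, assuming Ollama is listening on its default address `http://localhost:11434` and that its `/api/tags` endpoint lists the pulled models; neither detail is stated in the video, and the helper names (`base_name`, `installed_models`, `fetch_tags`) are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def base_name(model_name: str) -> str:
    """Strip a tag such as ':latest' from a model name."""
    return model_name.split(":", 1)[0]


def installed_models(tags_payload: dict) -> set[str]:
    """Collect untagged model names from an /api/tags style payload."""
    return {base_name(m["name"]) for m in tags_payload.get("models", [])}


def fetch_tags(url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With the server running, `installed_models(fetch_tags())` should contain `llama3.2` and `mxbai-embed-large` once both pulls have finished. Stripping the tag mirrors the remark that the `latest` part of the name is not needed.
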
next what we're going to do is we're
going to create a template and this
template is going to be just a string
and inside of this string we're just
going to specify what we want the model
to actually do so we're going to say
something like you are an
expert in answering questions about a
pizza restaurant okay here are some
relevant reviews and then we're just
going to put inside of a variable here
reviews and say here is the question to
answer okay and then we're going to put
a question perfect then what we're going
to do is we're going to say our prompt
is equal to a ChatPromptTemplate we're
going to pass our template and actually
we don't need to pass the model I don't
know why it's doing that and now we've
created a chat prompt template where
we'll be able to pass in a reviews
variable and a question variable and
then the model can respond to that okay
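
The model and prompt setup described so far might look roughly like the sketch below. The template wording follows what is dictated in the video, but its exact line breaks are a guess, and the wrapper function `build_prompt_and_model` is a hypothetical helper added so the imports only run when needed. It uses `ChatPromptTemplate.from_template`, the form the video switches to shortly afterwards when the first run fails.

```python
# Template text follows the wording dictated in the video; the two
# placeholders in braces are filled in when the prompt is invoked.
TEMPLATE = """
You are an expert in answering questions about a pizza restaurant

Here are some relevant reviews: {reviews}

Here is the question to answer: {question}
"""


def build_prompt_and_model(model_name: str = "llama3.2"):
    """Return (prompt, model). Needs langchain-ollama and a running Ollama."""
    # Imported lazily so this file can be loaded before the packages exist.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_ollama.llms import OllamaLLM

    model = OllamaLLM(model=model_name)  # name as shown by `ollama list`
    prompt = ChatPromptTemplate.from_template(TEMPLATE)
    return prompt, model
```

The names inside the braces, `reviews` and `question`, are the keys that later have to appear in the dictionary passed to `invoke`.
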
then we're going to create a chain so
with the chain we can say prompt and
then we can put a pipe and then we can
put model now what this allows us to do
is essentially invoke this entire chain
that can combine multiple things
together to run our llm so first what
we'll do is we'll pass variables reviews
and question into this prompt this chat
prompt template that we just created and
then that will automatically get passed
to our model because we put it inside of
this chain and then it will return to us
whatever the answer is so if we want to
test this out really quickly because
this is literally all we need in
order to do this we can say
chain.invoke and then inside of a
python dictionary we need to specify the
two variables that we had inside of this
prompt so we're going to have reviews
and then question okay so we'll start
with reviews and for now we can just
make this an empty list and then we can
say question and something like what is
the best pizza place in town that might
not necessarily make sense because this
is just about one pizza place but I just
want to show you a quick demo so we're
going to say result is equal to this and
then we're going to go down and we're
going to say print result okay so we can
just test this out and make sure that
it's working and it should go ahead and
invoke our Ollama LLM and give us some
kind of response so let's go here and
run this we can do that by typing python
the name of our file which is main.py or
python3 main.py so I'm going to hit
enter give this a second to run and we
got an error some kind of formatting
issue so let's see what the problem is
okay so silly mistake here what we
actually need to say is
ChatPromptTemplate.from_template I forgot to
specify this method so of course that
was giving us an issue so let's go back
here and fix that quickly python main.py
and we should see that this works now
give it a second and you can see it says
based on our customer feedback and
ratings I would highly recommend this
the top rated pizza place One reviewer
mentioned blah blah blah blah in fact
our own team has sampled their pizza so
it just came up with something random
here because I didn't actually give it
any reviews so it's kind of
hallucinating the response but you get
the idea okay it did actually work we
were able to use Ollama and we got a
response from the model which is really
just the point of what we were testing
here okay so now what we're going to do
is we're just going to put this inside
of a while loop so essentially we can just
keep asking it questions and then we're
going to set up the vector search so we
can actually get a relevant response so
let's set up a simple Loop here we're
just going to say while true then we're
going to ask a question so we're going
to say question is equal to input so we
can get some input from the user and
we'll say you know ask your
question and then we're just going to
put a set of parentheses here and say Q
to quit so if they type Q then we can
quit we're going to say if the question
is equal to Q then we are going to break
otherwise we can invoke this chain so
we're going to say result is equal to
chain.invoke okay and for the question
we'll just put the question the user
asked so we'll replace this with
question and then we can print the
result now we also can just have a few
kind of formatting variables here so I'm
just going to say print and I'm just
going to print kind of a big line with a
few backslash n characters and then same
thing here I'm just going to print a few
backslash n's so we can kind of read
what's happening okay so we don't need
to test this but this will just allow us
to continue to ask questions until we
type in Q now what I want to do is show
you how to set up the vector search all
right so we're going to create a new
file here called vector.py you can call
this anything that you want and here's
where we're going to write the logic for
actually embedding our documents and
then looking them up or vectorizing our
documents now in case you're unfamiliar
with Vector search this essentially is
going to be a database it's going to be
hosted locally on our own computer using
something called chroma DB which we
installed earlier and this is going to
allow us to really quickly look up
relevant information that we can then
pass to our model and then our model can
use that data to give us some more
contextually relevant replies so
obviously llms are really good at kind
of synthesizing text and giving us
responses but usually they don't have
the correct data so in this case what
we're going to do is we're going to take
this entire CSV file we're going to put
it inside of this Vector enabled
database and then as soon as we ask a
question we're going to look up the
relevant documents in that database
we're going to pass those to the llm as
a list of reviews and then it will be
able to search through those reviews and
answer our question okay so that's like
the very Basics on Vector search let me
show you how we do that so we're going
to say from langchain_ollama
we're going to
import OllamaEmbeddings okay now one
thing that we need when we do this
vectorization process is an embedding
model this model will be able to take
text and convert it into a vector this
is essentially numbers that we can then
use to look up data really efficiently
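
The idea of turning text into numbers and then looking up nearby items can be illustrated without any model at all. The sketch below uses made-up three-dimensional vectors and cosine similarity. Real embeddings from `mxbai-embed-large` are far longer, and the metric and index that Chroma actually uses are not covered in the video, so treat this as a toy illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two non-zero vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the k keys whose vectors are most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up "embeddings" for three review snippets (illustrative values only).
toy_vectors = {
    "great vegan pizza": [0.9, 0.1, 0.0],
    "slow service": [0.0, 0.2, 0.9],
    "tasty plant-based slice": [0.8, 0.3, 0.1],
}
```

Calling `top_k([1.0, 0.0, 0.0], toy_vectors, 2)` ranks the two food-related snippets ahead of the service complaint, which is the same kind of lookup the retriever performs with real embeddings.
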
next we're going to say from
langchain_chroma and we're
going to import Chroma like this which
is going to be our vector store
we're then going to say from
langchain_core.documents import Document
we're going to create documents and then
pass these to our uh what do you call it
Chroma database we're then going to
import os and we're going to import
something that I forgot to install
before which is pandas as pd okay now
pandas is a library that we can use to
really easily read in our CSV file so
just quickly before I forget we do need
to install this so same as before we're
going to type pip install pandas in our
virtual environment and then we should
install that dependency and be able to
use it I'll also add it to the
requirements.txt file so if you guys
were to have downloaded this before you
would already have it okay so pandas is
installing we can just wait for that to
run and start writing some more code so
first things first we're going to load
in our CSV file we're going to use the
data in the CSV file for our Vector
store so of course we're going to need
the data so we're going to say DF
standing for data frame and this is
going to be pd.read_csv and we're
going to read in the
realistic_restaurant_reviews.csv
and obviously you know read
in whatever the name of your CSV file is
I think that I spelled that correctly
although maybe not restaurant let's see
you know what we can just do this rename
copy and then paste here to avoid any
misspellings okay anyway so we have our
data frame here next we're going to
bring in the embedding model so we're
going to say embeddings is equal to
OllamaEmbeddings and then we're going to
say model is equal to and then the name
of the model that we installed which is
mxbai-embed-large
okay now after that we're going to
specify the location where we want to
store our Vector database so I'm going
to say ./ and then
chroma_langchain_db you can call this anything
that you want but this is just going to
be a folder where we store our uh database
okay next after that we're going to say
add_documents is equal to and then
we're going to say not os.path.exists
and then the database location now what
I want to do is I want to check and see
if this database already exists if it
does that means that I've already
performed the process of converting the
CSV file into vectors and adding into
the database if it doesn't exist then it
means that I need to do that okay so we
don't need to keep doing this every
single time we can just one time
vectorize our data and then once it's
vectorized and it's in the database we
don't need to do that again we can just
start using it so below here I'm going
to say if add documents so if we do
actually need to add them then we're
going to say the following we're going
to say documents is equal to an empty
list and we're going to say IDs is equal
to an empty list as well then what we're
going to do is we're going to iterate
through our rows so we're going to say for
i comma row in df.iterrows this is
simply going to go row by row through
our CSV file and then allow us to access
the various entries now what we're going
to do is we're going to create
individual documents we're going to add
them to the documents list and then
we're going to add them to our Vector
store okay so we're going to say
document is equal to document and inside
of this document we need to pass three
things we need to pass a page content
and this page content is going to be
what we will actually be vectorizing and
what we'll be looking up so if you
wanted to adjust this for your own
example any of the content that you want
to use to actually look up the
information in the database that needs
to go in the page content so what we're
going to do is we're going to combine
the title of the review with the review
itself so that we have a bunch of
information to be able to actually query
our data okay there's all kinds of
different things you can do here but you
want to include the important
information that you'll be querying
based on in the page content so we're
going to take row at title and then
we're going to say plus a space and then
row at review okay then we're going to
specify some metadata Copilot's already
doing it for me so we have metadata and
then rating and that's row rating and
then we're going to have date and then
this is going to be
row date okay so the metadata is just
additional information that we will grab
along with the document but we won't be
querying based on the metadata okay
so hopefully that makes sense again just
additional data that will be included
with the document but it won't
necessarily be used to actually query
and then lastly we can specify an ID so
we're going to say the ID is the string
of I which is just the index of this
value in the row or in the uh what do
you call it the CSV file and just make
sure that you convert this to a string
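
The existence check and the row-to-document mapping described above could be sketched as follows. The column names `Title`, `Review`, `Rating` and `Date` are assumed from the CSV headers mentioned at the start of the video (their exact capitalisation is a guess), and the helper functions `needs_indexing`, `document_fields` and `build_documents` are hypothetical names introduced for this example.

```python
import os

DB_LOCATION = "./chroma_langchain_db"


def needs_indexing(db_location: str = DB_LOCATION) -> bool:
    """True when the database folder is missing, so nothing is embedded yet."""
    return not os.path.exists(db_location)


def document_fields(i, row) -> dict:
    """Map one CSV row to the three arguments passed to Document."""
    return {
        # The text that gets embedded and searched.
        "page_content": row["Title"] + " " + row["Review"],
        # Carried along with the document, but not used for the lookup.
        "metadata": {"rating": row["Rating"], "date": row["Date"]},
        # The row index, converted to a string.
        "id": str(i),
    }


def build_documents(df):
    """Turn a pandas DataFrame of reviews into LangChain Document objects."""
    from langchain_core.documents import Document  # lazy: needs langchain-core

    return [Document(**document_fields(i, row)) for i, row in df.iterrows()]
```

To adapt this to another dataset, only `document_fields` should need to change: put whatever text should be searchable into `page_content` and everything else into `metadata`.
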
okay so I think that should be good for
now after this what we're going to do is
we're going to say ids.append and
we're going to append str(i) and then
we're going to say documents.append and
we're going to append our document now
the reason why we need to store the IDs
is because when we actually create this
data in the vector store for some reason
we need two separate lists we need a
list of documents and then we need a
list of their Associated IDs in case for
some reason they're different so I know
it seems a bit weird that we have the ID
twice but just follow along because we
need that for this process okay so now
we've kind of prepared the data in
documents and the next thing we need to
do is add this to the vector store so we
need to create the vector store so we're
going to say vector_store is equal to
Chroma and then inside of Chroma we're
going to specify the location and the
collection name so we're going to say
collection_name is equal to
restaurant_reviews we're going to say the
persist_directory Copilot is leading
me wrong here is going to be equal to
the db_location now this just means that
we'll store it persistently rather than
just storing it in memory you don't need
to do this but I recommend that you do
store this permanently so that you don't
need to keep regenerating this chroma
database and then lastly we need to
pass the embedding_function which will
be equal to our embeddings from Ollama
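
Put together, the vector store setup described here might look like the sketch below. The three keyword arguments are the ones named in the video; the helper functions `store_settings` and `open_vector_store` are hypothetical wrappers, used so that nothing touches Ollama or the disk until they are called.

```python
def store_settings(db_location: str = "./chroma_langchain_db") -> dict:
    """Collection name and on-disk folder for the Chroma store."""
    return {
        "collection_name": "restaurant_reviews",
        "persist_directory": db_location,  # keeps the data between runs
    }


def open_vector_store(db_location: str = "./chroma_langchain_db"):
    """Create (or reopen) the persistent Chroma store with local embeddings."""
    # Lazy imports: these need langchain-chroma and langchain-ollama installed.
    from langchain_chroma import Chroma
    from langchain_ollama import OllamaEmbeddings

    embeddings = OllamaEmbeddings(model="mxbai-embed-large")
    return Chroma(embedding_function=embeddings, **store_settings(db_location))
```

One ordering detail worth keeping from the video: the "does the folder exist" check is computed before the store is opened, since opening a store with `persist_directory` set may itself create the folder.
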
okay so we're using all of this stuff
locally we have the Chroma DB locally we
have the local embeddings model and now
we have the vector store next I'm going
to do a quick if statement and I'm going
to say if add documents then we're going
to say Vector store. add documents and
then this is going to be documents is
equal to documents and IDs is equal to
IDs okay so this is how you add this you
just say Vector store. add documents you
specify the documents that you want to
add which we've already prepared here
and then you specify the corresponding
IDs and we're only doing that if this
did not already exist because if it did
already exist then we don't need to add
the documents right and we wouldn't have
already prepared this data hopefully
that makes sense but that essentially
will create the vector store for us and
automatically add the data last thing
we're going to do is we're going to make
this Vector store be usable by our llm
so I'm going to show you how to do that
we're going to say
retriever okay is equal to
vector_store.as_retriever
okay now inside of here
there's a few parameters that we can
pass for example we can specify the
number of documents that we wanted to
look up so I'm going to say search_kwargs
is equal to k and then
five now when I do this what's going to
happen is it's going to look up five
relevant reviews and then pass those
five reviews to the LLM now if we
wanted 10 reviews we would make this 10
if we wanted one review we would make
this one you can specify as many or as
few as you want obviously minimum of one
but I'm going to go with five okay so
now we have the Retriever and what this
retriever will allow us to do is look up
documents then we can pass those
documents into the prompt for our llm so
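
The conditional add and the retriever creation can be sketched as one small function. `add_documents` and `as_retriever` with `search_kwargs` are the calls named in the video; wrapping them in a function called `build_retriever` and passing the flag in as a parameter are choices made for this example, not how the video lays out the file.

```python
def build_retriever(vector_store, documents, ids, add_documents: bool, k: int = 5):
    """Embed the documents on the first run, then expose a top-k retriever."""
    if add_documents:
        # Only needed when the database folder did not exist yet. On later
        # runs the caller can pass empty lists, because this branch is skipped.
        vector_store.add_documents(documents=documents, ids=ids)
    # k is the number of reviews handed to the LLM for each question.
    return vector_store.as_retriever(search_kwargs={"k": k})
```

Raising `k` gives the model more context per question at the cost of a longer prompt. The video settles on five.
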
quickly recapping we import all the
relevant data we bring in the CSV file
we uh define the embeddings model from
Ollama we check if this location already
exists if it doesn't then we're going to
prepare all of our data by converting it
into documents we're going to initialize
the vector store if for some reason this
directory already exists then there's no
need to add the data but if it doesn't
exist then we're going to add this data
into the vector store by adding all of
our documents this will automatically
embed all of the documents for us and
add it to the vector store and then we
can create this retriever from the
vector store which will allow us to grab
documents so the last step is to Simply
use this retriever from our main.py file
so we're going to go into main.py we're
going to say from vector import
retriever because we're just going to
import it from the other file and now
before we actually invoke this chain we
can use the retriever to grab the
relevant reviews and then we can pass
the reviews as a parameter to our prompt
okay so in order to do that we're just
going to say the reviews is equal to the
Retriever and then this is going to be
Dot invoke we're just going to invoke
this with our question and then we can
simply pass the reviews that are
returned here to our chain so all that
we do here is we just say retriever.invoke
invoke we pass the question or like the
search string that we want to use to
look up the relevant reviews what will
happen is the retriever is automatically
going to embed that question it's going
to go into the vector store it's going
to look up all of the relevant reviews
using a similarity search algorithm it's
going to grab the top five reviews and
then it's going to pass this to our
chain and then we can print out the
result and hopefully we get something
meaningful based on those reviews so
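
The retrieve-then-answer step and the interactive loop might be sketched like this. It assumes `retriever` comes from `vector.py` and `chain` is the `prompt | model` pipeline built earlier. Splitting the logic into `answer` and `run_loop` is a choice made for this example, and the separator line is only an approximation of the one printed in the video.

```python
def answer(question: str, retriever, chain) -> str:
    """Fetch relevant reviews, then pass them with the question to the chain."""
    reviews = retriever.invoke(question)  # embeds the question, returns top-k docs
    return chain.invoke({"reviews": reviews, "question": question})


def run_loop(retriever, chain) -> None:
    """Keep asking for questions until the user types q."""
    while True:
        print("\n\n-------------------------------")
        question = input("Ask your question (q to quit): ")
        print("\n\n")
        if question == "q":
            break
        print(answer(question, retriever, chain))
```

In `main.py` this would be driven by something like `from vector import retriever`, `chain = prompt | model`, and then `run_loop(retriever, chain)`. Because `answer` only relies on objects with an `invoke` method, it can be exercised with simple stand-ins before Ollama is involved.
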
let's give this a run now and pray that
it works with python and then
main.py give this a second to run you
can see that it creates this chroma_langchain_db
directory it will take a second
because it does need to embed all of our
documents and now we can ask a question
so I'm going to say how are the you know
vegan options if I can spell anything
correctly which apparently I cannot okay
so let's see what we get here and you
can see that it pulls up a few different
reviews here and it says based on the
reviews provided it appears the vegan
options of the pizza restaurant are a
mixed bag on the positive side some
reviewers have raved about the vegan
pizzas saying they're hidden gems okay and
it even tells us what document it got
this from however not all reviews are
glowing One reviewer had a vastly
different experience with the vegan
cheese option calling it tasteless and
then it says overall it seems that the
vegan options are hit or miss but
there's definitely potential and then it
gave us an overall rating three out of
five based on the two positive reviews out
of the four total okay cool we can also
ask it something like you know how is
the Ambiance or something I don't know
if I spelled that correctly but let's
see what it says overall I would
say the Ambiance of the pizza restaurant
has an all Style no substance feel and
apparently they don't like the pizza
restaurant based on these reviews but
you guys get the idea it is insanely
fast it uses the vector store database
everything runs completely locally
and when we're ready to quit we can hit q and
we can exit out this was a simple
example that was just meant to
demonstrate how you can run llms locally
on your own computer using your own
Hardware obviously you can adjust the
CSV file and you can make this any type
of data that you want it also doesn't
need to be CSV data you can just convert
anything that you want into documents
like I demonstrated here and if you want
the code from this video it will be
available from the link in the
description if you guys enjoyed make
sure to leave a like subscribe to the
channel and I will see you in the next
one