Run LLMs Locally for FREE with Ollama
45sShows how to run powerful AI models on your own computer for free, appealing to privacy-conscious users and developers.
▶ Play ClipThis video teaches how to install and use Ollama, a free open-source tool for running LLMs locally. It covers installation, running models via terminal, using the HTTP API in code, and customizing models with model files.
Ollama is a free open-source tool to manage and run LLMs locally, offering privacy, security, and no cost.
Download from ollama.com, select your OS, and install. On Windows, double-click; on Mac/Linux, use command line.
Open the desktop app (starts backend server) or use terminal with 'ollama' command. Verify installation by typing 'ollama'.
Access models from Ollama library or GitHub. Consider RAM requirements; e.g., Llama 3.1 405B needs 231 GB RAM.
Use 'ollama run <model>' to download and start a model. Example: 'ollama run llama2'.
Type 'ollama list' to see installed models.
Run multiple models by repeating 'ollama run <model>'. Switch between them with the same command.
Ollama exposes an HTTP API on localhost:11434. Use 'ollama serve' to start the server manually.
Send POST requests to /api/chat with model and messages. Example code uses requests library with streaming.
Install 'ollama' Python package for simpler usage: 'client.generate(model, prompt)'.
Create a Modelfile with FROM, temperature, SYSTEM message. Use 'ollama create <name> -f <file>' to create custom model.
Use 'ollama rm <model>' to delete a model.
Ollama enables running LLMs locally for free with easy setup, API integration, and customization. It's a powerful tool for developers seeking privacy and control.
"Title promises a 15-minute tutorial and delivers exactly that, covering installation, usage, and API integration."
What is Ollama?
A free open-source tool to manage and run LLMs locally.
How do you install Ollama?
Download from ollama.com and install for your OS.
00:37
What command verifies Ollama installation?
Type 'ollama' in terminal.
01:26
How do you run a model with Ollama?
Use 'ollama run <model>' command.
03:16
How do you list installed models?
Type 'ollama list'.
04:13
What port does the Ollama HTTP API use by default?
11434.
07:58
How do you start the Ollama HTTP server manually?
Run 'ollama serve' in terminal.
07:40
What endpoint is used for chat in the Ollama API?
/api/chat.
09:00
How do you create a custom model in Ollama?
Create a Modelfile and run 'ollama create <name> -f <file>'.
12:32
How do you remove a model in Ollama?
Use 'ollama rm <model>'.
13:24
Ollama: Free Local LLM Tool
Introduces a tool that enables running LLMs locally for free with privacy and security.
HTTP API Integration
Shows how to integrate LLMs into applications via a simple HTTP API.
06:55Custom Model Creation
Demonstrates how to customize models with system prompts and parameters using Modelfiles.
11:18[00:00] in this short video I'll teach you
[00:01] everything you need to know to get up
[00:03] and running with AMA which is a
[00:05] fantastic free open-source tool that
[00:08] allows you to manage and run llms
[00:10] locally rather than having to pay for
[00:12] Chad GPT or use these hosted services
[00:15] online you can actually run all of these
[00:17] models locally on your own computer so
[00:19] you get privacy security and best of all
[00:22] they are completely free so with that in
[00:24] mind let me show you how to set this up
[00:26] get it running and I'll also explain to
[00:28] you how you can utilize this through
[00:29] code because olama provides an HTTP
[00:32] server which means you can call your
[00:34] models from really any type of
[00:35] application so first things first we do
[00:37] need to install AMA so to do that you
[00:39] can go to the website which is ama.com I
[00:42] will link it in the description and you
[00:44] can simply press on download and then
[00:46] select your operating system in my case
[00:48] I'm using Windows but of course you have
[00:50] the command for Linux and then the
[00:51] installation for Mac now once that's
[00:53] downloaded simply double click it and
[00:55] install it and then I'll show you the
[00:56] next step so once you've installed AMA
[00:59] there's a few different ways to run it
[01:00] first of all you can just open the
[01:02] desktop application so if you're on
[01:03] Windows you can go here to the search
[01:05] bar and just search for ama if you're on
[01:07] Mac you can simply go to the spotlight
[01:09] search same for Linux and just run the
[01:11] application now when you do that
[01:12] nothing's going to appear on your screen
[01:15] and the reason for that is this just
[01:16] starts a backend server that's running
[01:18] the AMA service now the other way to do
[01:20] this is to open up a command prompt or a
[01:22] terminal so you can see I'm in command
[01:24] prompt here on Windows and then to
[01:26] simply just type O llama if you do this
[01:28] you should get some kind of out put and
[01:30] if you see that it means that you've
[01:31] installed o llama correctly at this
[01:33] point I'll assume that you have this
[01:34] installed correctly and that this
[01:36] command gave you some kind of output and
[01:38] now what we can do is start running
[01:40] models so the first thing to look at is
[01:42] the different models that we have access
[01:43] to with olama now truthfully you have
[01:46] access to pretty much any open-source
[01:47] model you want and you can even write
[01:49] some custom configurations to use your
[01:51] own models or things you pull in from
[01:53] something like hugging face now if you
[01:55] go to the AMA GitHub repository which I
[01:57] will link down below you can see some of
[01:59] the common mods mod that you may want to
[02:00] download now keep in mind that since you
[02:02] are running these locally you will need
[02:04] to download the entire model and you'll
[02:06] need to have enough space on your
[02:07] computer you can see some of these are
[02:09] 43 GB for example and enough RAM to run
[02:12] and load the model depending on how
[02:14] large it is so you see if we look
[02:15] through here it defines the number of
[02:17] parameters for these models if we look
[02:19] at something like llama 3.1 we have 231
[02:22] GB and 45 billion parameters and if you
[02:26] go down here to this note it specifies
[02:28] how many gabt of ram you should have
[02:30] based on the different model parameters
[02:31] so even on my computer which has 64 GB
[02:34] of RAM it would be difficult to load the
[02:36] new llama 3.1 model with the 405 billion
[02:39] parameters so keep that in mind when you
[02:41] are choosing the models that you want to
[02:42] use for now I'm just going to go with
[02:44] the standard llama 2 model because this
[02:47] is older and it's not as large and I
[02:49] know that I can run it and most of you
[02:50] should be able to run it as well so I'm
[02:52] going to show you how we can pull that
[02:54] but if you're looking for a list of all
[02:56] of the models you have available you can
[02:57] go to the oama library so if you go to
[03:00] ama.com library and you can scroll
[03:02] through here and you'll see there are
[03:03] hundreds of different models you can
[03:05] sort them you can filter and you can
[03:07] find even multimodal models except
[03:09] things like video photos voice Etc so
[03:12] once you've decided on a Model that
[03:14] you'd like to run it's very simple to do
[03:16] so all you need to do is type a llama
[03:18] run and then the identifier of that
[03:20] model now in my case I just want to run
[03:22] llama 2 I know this is an outdated model
[03:24] I'm just doing it because it's smaller
[03:26] so I can simply type O llama run llama 2
[03:29] and if this model is not already
[03:31] installed on my system then it will
[03:33] download it and install it for me if it
[03:35] is already installed it's just going to
[03:37] bring up a prompt where it allows me to
[03:39] actually start typing to the model and
[03:41] messaging with it so notice here that
[03:43] it's just loading and it kind of gives
[03:44] me these three arrows and I can just
[03:46] start typing something to the model and
[03:47] get some kind of response and you can
[03:49] see it's pretty much instant because
[03:50] there's no latency it's running on my
[03:52] own machine now again if this wasn't
[03:54] already installed it would start pulling
[03:56] the model for you and then you would
[03:58] have to wait for it to finish it would
[03:59] install then you can run the model and
[04:01] you can start using it now after some
[04:03] experimentation it's told me that you
[04:05] can type slash bu to get out of this so
[04:07] if I type slby you can see that it will
[04:09] enclose this window and then if we want
[04:11] we can type amaama and then list and we
[04:14] can list the different models that we
[04:15] have available on our system in this
[04:16] case you can see I have llama 2 which is
[04:19] the latest version if I had any other
[04:20] models they would show up here so that's
[04:22] the basics on running models using oama
[04:24] but there's a lot more to show you so
[04:26] make sure you stick around after a quick
[04:28] word from our sponsor today's video is
[04:31] sponsored by SEO writing a tool that's
[04:33] transforming content creation across
[04:35] different niches and industries their
[04:37] new brand voice feature lets you
[04:39] generate content that matches your
[04:40] Unique Style whether you're writing
[04:42] tutorials reviews or even industry
[04:44] analysis one click generates a complete
[04:47] blog post with AI generated images and
[04:49] relevant videos embedded automatically
[04:52] potentially saving you hours of manual
[04:54] work what sets SEO writing apart is
[04:56] their deep web research with built-in
[04:58] citations when you need accurate
[05:00] up-to-date information the platform
[05:02] pulls from reliable sources and adds
[05:04] citations automatically their humanized
[05:06] text feature helps your AI generated
[05:08] content stand out while their external
[05:10] linking feature intelligently connects
[05:12] to relevant resources and for all you
[05:14] WordPress users out there there's a
[05:16] gamechanging feature that lets you
[05:18] connect your site and autopost content
[05:20] directly this feature allows for
[05:22] consistent scheduling while focusing on
[05:24] other projects now if you're ready to
[05:26] try it for yourself then use my code TW
[05:28] wt20 5 for a 25% discount click the link
[05:32] in the description and see how SEO
[05:34] writing can fit into your content
[05:36] strategy all right so we are continuing
[05:37] here and I want to show you what happens
[05:39] when you pull multiple models so again
[05:41] if we go back to the library we can
[05:43] start looking through different models
[05:44] that we may want to utilize maybe I can
[05:46] even just go back here to the GitHub if
[05:47] I want to find them a little bit easier
[05:49] and maybe I want to use the mistal model
[05:51] as well if that's the case I can just
[05:53] copy this command or the name mistl I
[05:55] can go back here I can simply run the
[05:57] command AMA run mistl it will then pull
[05:59] that manifest for me pull the model once
[06:02] that's finished I'll be able to use this
[06:03] and I'll show you how so looks like this
[06:05] has been downloaded and now I can start
[06:06] using the model if I want I can exit out
[06:09] of this and if I want to switch between
[06:10] the two different models again I just
[06:12] type O llama run and then I can specify
[06:14] the model that I want to use so if I
[06:15] want to go back to llama 2 I use llama 2
[06:18] if I want to go back to mistl I just
[06:21] type mistl and now I can start using
[06:23] Mistral so you can have as many models
[06:25] as you want and again you can list them
[06:27] by typing ol llama list and if you want
[06:29] all of the commands you can use simply
[06:31] type AMA and then it will show you which
[06:33] ones you have access to there's a lot of
[06:34] them for example you can also remove a
[06:36] model if you want to do that copy a
[06:38] model there's also customizations you
[06:39] can make to them which I'll show you in
[06:41] just one second all right so all of that
[06:43] is great but we probably want to know
[06:44] how to utilize these models from
[06:46] something like code from our
[06:47] applications sure they're great to use
[06:49] here in the terminal but a lot of times
[06:50] you want to integrate them with some
[06:52] kind of software especially if you're a
[06:53] programmer and you watch this channel so
[06:55] the interesting thing about olama is
[06:57] that it actually exposes an http API on
[07:00] Local Host that means that anything we
[07:03] just did here with commands we can
[07:04] actually trigger through the API so we
[07:06] can send request to this from something
[07:08] like curl Postman something like python
[07:11] code really any code at all that can
[07:12] send some type of HTTP request Now by
[07:15] default if you're running aama you
[07:17] should be able to see this if you're on
[07:18] Windows in kind of like the I don't know
[07:20] what you would call this Services bar
[07:22] wherever it's showing the running
[07:23] applications and you can see I have this
[07:25] little AMA logo now when olama is
[07:27] running as the desktop application by
[07:30] default that Port is going to be open so
[07:32] you'll be able to access the HTTP API
[07:34] but if for some reason this isn't
[07:36] running so for example if I quit this
[07:38] what I can do to trigger that to run is
[07:40] I can simply type AMA serve in my
[07:43] terminal if I do this it's now going to
[07:45] start running the HTTP API in this
[07:47] terminal instance and now I'll have
[07:50] access to it and here it will also show
[07:51] us what port it's running on although it
[07:53] should be standard and you can see if we
[07:54] look through here it gives us the exact
[07:56] Port so it's on
[07:58] 11,434 so if you wanted to you can copy
[08:01] that and save it for later so that we
[08:02] can use it in our code regardless now
[08:05] that the olama serve or the olama HTTP
[08:07] API is running we're able to call it and
[08:10] again just to clarify if you're running
[08:12] this as the desktop application it will
[08:14] already be running in the background but
[08:15] if for some reason you want to manually
[08:17] invoke this to run then you can run the
[08:19] command ol llama serve where it will
[08:21] give you all of this output and you'll
[08:22] be able to view all of the requests to
[08:24] the HTTP server so now that the server
[08:26] is running we can use something like the
[08:28] following python code here to send a
[08:30] request to it now this is done manually
[08:32] very intentionally I'm going to show you
[08:34] an easier way to do this in 1 second but
[08:36] it's just to illustrate that you do have
[08:37] kind of complete control over this if
[08:39] you want so you can see here in Python
[08:41] I'm using the requests and the Json
[08:43] module now just by the way if you want
[08:45] this to work on your machine you will
[08:46] need to install the request module so
[08:48] you can say pip install request or pip
[08:51] three install requests and I'm going to
[08:52] leave this code in the description uh
[08:54] Linked In A GitHub repo in case you want
[08:56] to check it out now what we do is we
[08:58] Define our base URL this is the URL of
[09:01] the server and then
[09:02] /i/ chat there's a lot of other
[09:04] endpoints that you can use here and you
[09:06] can even control deleting models adding
[09:08] models Etc but in this case we just want
[09:10] to chat with one of our models then we
[09:12] can define a payload this is the model
[09:14] that we want to chat with so in this
[09:15] case I've gone with mistl and then we
[09:17] can Define different messages here's a
[09:19] standard message with the role of a user
[09:22] next we can send a post request here
[09:23] using request. poost to our URL with our
[09:27] Chason payload which is this right here
[09:29] and enable the streaming mode which
[09:30] allows us to grab all of the responses
[09:33] as they are typed this way we can grab
[09:35] them in real time and we can show the
[09:37] model actually typing the response
[09:39] rather than waiting for the entire
[09:41] response to be generated and then
[09:43] viewing it now here's a little bit of
[09:44] code just to handle that streaming data
[09:46] for us so we're going through all of the
[09:48] lines that are returned from this
[09:50] response and then we are simply kind of
[09:52] printing them out okay so I'm going to
[09:54] show you what happens when I run this so
[09:55] we already have requests installed and
[09:58] if I go python sample request dopy just
[10:01] wait one second here it will stream in
[10:03] all of the data and then print it out so
[10:05] you can see that it's kind of printing
[10:06] it out line by line for us here as it
[10:08] gets it and there you go python is a
[10:09] high Lev language blah blah blah gives
[10:11] us the answer if we go back to the API
[10:13] we can see that the request was sent
[10:15] here it took 4.1 seconds to process and
[10:18] it returned to us that data sweet so
[10:20] there you go that is how you utilize the
[10:22] API manually but a lot of you probably
[10:24] don't want to write all of this code so
[10:26] instead we can use a very simple module
[10:28] from python called you guessed it ol
[10:31] llama so if you're using python or
[10:33] JavaScript there are packages that will
[10:34] do this for you so you can simply pip
[10:37] install olama or pip three install olama
[10:41] in your systems that you have this
[10:43] module and now you have access to the ol
[10:45] module you can simply create a client
[10:47] you can Define your model you can Define
[10:49] some kind of prompt and then you can use
[10:51] client. generate specify the model and
[10:54] the prompt and then you can grab the
[10:56] response okay so I'm going to quickly
[10:58] show this to you I can run this code
[11:00] with python package. piy and you will
[11:05] see here in just one second that we
[11:06] should be able to get the response okay
[11:09] and there you go we get the response and
[11:10] it gives us the answer so that is how
[11:12] you use the HTTP API now I'm going to
[11:15] show you how you can do some
[11:16] customizations to the models in ama so
[11:19] moving on I'll show you a quick
[11:20] customization that you can make to any
[11:22] of the models that you can pull with AMA
[11:24] so you can see on the right hand side of
[11:26] my screen that I've created something
[11:27] called a model file now I've just put
[11:29] this in a directory that's on my desktop
[11:31] you need to put the file in a location
[11:33] that you know and that you're able to
[11:34] access from your terminal and for the
[11:36] model file I've used this very simple
[11:38] syntax that I just took directly from
[11:40] the AMA website all you do is you
[11:42] specify from and then you have some kind
[11:44] of base model so in this case we're
[11:45] using llama 3.2 but you can use any
[11:48] model that you want that's available
[11:49] with a llama you can do something like
[11:51] set the temperature of the model you
[11:53] don't need to do this but there's some
[11:54] other parameters you can set as well and
[11:56] then you're able to pass something like
[11:58] a system message which is essentially
[11:59] kind of instructing the model what it's
[12:01] supposed to be doing and how it should
[12:03] handle the upcoming messages so in this
[12:05] case they've just written Ur Mario from
[12:07] Super Mario Bros answer as Mario the
[12:09] assistant only okay so we have this
[12:12] model file written notice I don't have
[12:13] any extension it's literally just called
[12:15] Model file no. txt or anything and what
[12:18] I've done is I've put my terminal in the
[12:20] same directory where this file exists
[12:23] now what I'm able to do is create a new
[12:25] model based on this model file in olama
[12:29] and have one that's set up as Mario so
[12:32] to do that I can type AMA create I can
[12:35] give this a name in this case I'll call
[12:37] it Mario you can call it anything that
[12:38] you want and then I'm going to specify
[12:40] DF which stands for file and then the
[12:42] location of my model file now in this
[12:45] case it's just simply at@ slm model file
[12:49] okay so I'm going to go ahead and create
[12:50] this and you'll see that it says success
[12:52] that's because I've already pulled model
[12:54] llama 3.2 so now if I want to utilize
[12:57] this customized model what I can do is
[12:59] type a llama run and then the name of
[13:01] the model which is Mario and now if I
[13:04] say hello you'll see that it says it's a
[13:06] me Mario and it kind of you know
[13:08] simulates like how Mario would reply so
[13:11] if you want to set up some custom models
[13:12] where they have some system prompts they
[13:15] have some different parameters set up
[13:16] with them you want to tweak them somehow
[13:18] you can do that using these model files
[13:20] then you can simply create them in olama
[13:22] now let's say you're done with this one
[13:24] and you want to remove it you can say RM
[13:26] or sorry AMA RM and then what is it the
[13:29] name of this one Mario and it will
[13:30] remove that so now if we type oama list
[13:33] you no longer see it and also it's worth
[13:35] noting that these uh models like Mario
[13:38] you can utilize them from code so in my
[13:40] python code here I can just specify
[13:42] Mario once that's created and then I can
[13:44] use that anyways guys that is it that's
[13:46] all I wanted to show you I hope you
[13:47] found this valuable if you did make sure
[13:49] to leave a like subscribe to the channel
[13:51] and I will see you in the next one
[13:54] [Music]
⚡ Saved you time reading this? Transcribe any YouTube video for free — no signup needed.