TubeSum ← Transcribe a video

Ollama Course – Build AI Apps Locally

Transcribed Jun 14, 2026 Watch on YouTube ↗
Intermediate 45 min read For: Developers and AI engineers with basic Python knowledge who want to build local LLM applications.
664.5K
Views
13.5K
Likes
442
Comments
144
Dislikes
2.1%
📈 Moderate

AI Summary

This course teaches how to use Ollama, an open-source tool for running large language models locally. It covers setup, model management, REST APIs, Python integrations, and real-world projects like a grocery list organizer, RAG system, and AI recruiter agency.

[00:00]
Course Introduction

Ollama simplifies running LLMs locally. The course covers pulling/customizing models, REST APIs, Python integrations, and projects like a grocery organizer, RAG system, and AI recruiter.

[02:00]
What is Ollama

Ollama is an open-source tool that simplifies running LLMs locally on your own hardware, abstracting technical complexity.

[06:44]
Ollama Deep Dive

Ollama uses a CLI to manage installation and execution of models. It provides a straightforward way to download, run, and interact with various LLMs without cloud services.

[08:52]
Problem Ollama Solves

Ollama addresses cost, privacy, latency, and customization issues. Local execution eliminates API costs, keeps data private, reduces latency, and allows model fine-tuning.

[13:10]
Key Features of Ollama

Model management, unified interface, extensibility, and performance optimizations including GPU acceleration.

[15:10]
Use Cases

Development/testing, education/research, and secure applications in healthcare/finance where data privacy is critical.

[17:58]
Installation on Mac

Download from ollama.com, install the application, and run 'ollama run llama3.2' to get started.

[21:00]
Interacting with Models via CLI

Use 'ollama run <model>' to start a shell. Commands like /show info display model details.

[26:28]
Model Selection and Parameters

Ollama library hosts many models. Parameters (e.g., 3B, 1B) indicate model size and complexity; larger models are more accurate but require more resources.

[33:00]
Understanding Model Parameters

Parameters are internal weights learned during training. More parameters generally mean better performance but higher computational cost.

[37:30]
Context Length and Embedding Length

Context length is max tokens per input; embedding length is vector size for token representation. Larger values capture more nuance.

[39:26]
Quantization

Technique to reduce model size by lowering weight precision (e.g., 4-bit), resulting in smaller, faster models with lower memory usage.

[42:57]
Ollama CLI Commands

Commands: ollama list, ollama pull, ollama run, ollama rm, ollama help. Models can be pulled and run interchangeably.

[47:00]
Multimodal Models (LLaVA)

LLaVA combines vision encoder and LLM for visual understanding. Example: describing an image of flowers.

[54:00]
Customizing Models with Modelfile

Create a Modelfile with FROM, PARAMETER temperature, SYSTEM message. Use 'ollama create' to build a customized model.

[60:00]
REST API Endpoints

Ollama serves at localhost:11434. Endpoints: /api/generate, /api/chat. Use curl with stream=false for complete responses.

[66:00]
UI-Based Interface with Msty

Msty app provides a ChatGPT-like UI for local models. Supports knowledge stacks for RAG with embedding models.

[84:00]
Python Library Basics

Install ollama Python library. Use ollama.list(), ollama.chat(), ollama.generate() to interact with models programmatically.

[96:00]
Streaming Responses in Python

Set stream=True in chat() and iterate over response chunks to display tokens as they arrive.

[103:00]
Grocery List Organizer Project

Use LLM to categorize and sort grocery items from a text file. Prompt instructs model to categorize into produce, dairy, etc., and sort alphabetically.

[111:00]
RAG System Overview

RAG = Retrieval Augmented Generation. Combines document retrieval with LLM to answer questions based on custom data, reducing hallucination.

[117:00]
RAG Architecture

Documents are chunked, embedded, stored in vector DB. User query is embedded, similar chunks retrieved, and passed with prompt to LLM for answer.

[120:00]
LangChain for RAG

LangChain provides abstractions for document loading, splitting, embeddings, vector stores, and retrieval. Simplifies building RAG pipelines.

[126:00]
Building RAG with Ollama and LangChain

Use Ollama embeddings (nomic-embed-text) and LLM (llama3.2) with ChromaDB. Multi-query retriever generates multiple query perspectives for better retrieval.

[148:00]
AI Recruiter Agency Project

Multi-agent system using Swarm framework with Ollama. Agents: extractor, analyzer, matcher, screener, recommender, orchestrator.

[157:00]
Base Agent Class

BaseAgent sets up OpenAI client with custom base URL for Ollama. Provides query_ollama() method and JSON parsing helper.

[162:00]
Specialized Agents

Each agent (e.g., ScreenerAgent) inherits from BaseAgent, defines instructions, and implements run() method. Agents are called by orchestrator.

[168:00]
Orchestrator Agent

Coordinates workflow: extract resume, analyze profile, match jobs, screen candidates, generate recommendations. Maintains workflow context.

[172:00]
Streamlit UI for Recruiter

Streamlit app provides tabs for upload, skills analysis, job matches, screening, and recommendations. Results saved to text file.

Ollama democratizes local AI by enabling free, private, and customizable LLM applications. With CLI, REST API, Python library, and integration with frameworks like LangChain and Swarm, you can build powerful AI solutions entirely on your own machine.

Clickbait Check

95% Legit

"Title accurately promises building AI apps locally with Ollama; course delivers exactly that with hands-on projects."

Mentioned in this Video

Tutorial Checklist

1 17:58 Download and install Ollama from ollama.com for your OS (Mac, Windows, Linux).
2 20:27 Run 'ollama run llama3.2' in terminal to download and start the model.
3 24:00 Interact with the model via CLI: ask questions, use /show info to view model details, /bye to exit.
4 30:58 Pull additional models: 'ollama pull llama3.2:1b' or 'ollama pull codegemma:2b'.
5 42:57 Manage models: 'ollama list' to see installed, 'ollama rm <model>' to delete.
6 54:00 Create a Modelfile with FROM, PARAMETER temperature, SYSTEM message. Run 'ollama create <name> -f Modelfile'.
7 60:00 Use REST API: curl -X POST http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"...","stream":false}'
8 84:00 Install Python library: pip install ollama. Use ollama.chat(model='llama3.2', messages=[{'role':'user','content':'...'}])
9 103:00 Build grocery organizer: read items from file, create prompt with categorization instructions, call ollama.generate(), save output.
10 126:00 Build RAG system: load PDF, split into chunks, embed with nomic-embed-text, store in ChromaDB, use multi-query retriever with llama3.2.
11 148:00 Build AI recruiter: define BaseAgent with custom OpenAI base URL, create specialized agents (extractor, matcher, etc.), orchestrate with Swarm, wrap in Streamlit.

Study Flashcards (10)

What is Ollama?

easy Click to reveal answer

An open-source tool that simplifies running large language models locally on your own hardware.

06:44

What command lists all installed models in Ollama?

easy Click to reveal answer

ollama list

42:57

What does the 'parameters' value (e.g., 3B) in a model indicate?

medium Click to reveal answer

The number of internal weights and biases (in billions) that determine how the model processes input and generates output.

33:00

What is quantization in the context of LLMs?

medium Click to reveal answer

A technique to reduce model size by lowering the precision of its weights (e.g., to 4 bits), resulting in smaller and faster models.

39:26

What is the default port and endpoint for Ollama's REST API?

medium Click to reveal answer

localhost:11434, with endpoints like /api/generate and /api/chat.

60:00

How do you prevent streaming when using Ollama's REST API?

easy Click to reveal answer

Set 'stream': false in the JSON payload.

62:31

What is a Modelfile used for in Ollama?

medium Click to reveal answer

To customize a model by specifying base model, parameters (e.g., temperature), and system message.

54:00

What does RAG stand for and what problem does it solve?

medium Click to reveal answer

Retrieval Augmented Generation; it allows LLMs to answer questions based on custom documents, reducing hallucination.

111:00

Name two embedding models available in Ollama.

hard Click to reveal answer

nomic-embed-text and mxbai-embed-large.

134:00

What is the role of the orchestrator agent in the AI recruiter project?

hard Click to reveal answer

It coordinates the recruitment workflow, delegating tasks to specialized agents (extractor, matcher, screener, recommender) and maintaining context.

168:00

💡 Key Takeaways

💡

Ollama's Core Purpose

Defines Ollama as a tool that abstracts technical complexity, making local LLM execution accessible.

06:44
📊

Key Advantages of Local LLMs

Highlights cost savings, privacy, reduced latency, and customization as major benefits.

08:52
🔧

Understanding Model Parameters

Explains the trade-off between model accuracy and computational resources based on parameter count.

33:00
🔧

Customizing Models with Modelfile

Demonstrates how to create tailored models by adjusting temperature and system prompts.

54:00
⚖️

RAG Reduces Hallucination

Explains how RAG injects relevant documents into the prompt to ground LLM responses.

111:00
💡

Multi-Agent AI Recruiter

Showcases a practical, production-like application using agent orchestration with local models.

148:00

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

Run LLMs Locally for Free with Ollama

45s

High educational value: shows how to run powerful AI models on your own computer without paying for cloud APIs.

▶ Play Clip

Ollama Installation & First Model

60s

Practical step-by-step guide that viewers can follow along to set up Ollama and run their first model.

▶ Play Clip

Multimodal AI: LLaVA Describes Images

60s

Visually engaging demo of a multimodal model analyzing an image and writing a poem about it, showcasing advanced capabilities.

▶ Play Clip

Customize Your Own AI Model in Minutes

60s

Empowering and surprising: viewers learn they can create a custom AI assistant with a simple text file, no coding required.

▶ Play Clip

Ollama REST API: Build AI Apps

60s

Bridges the gap between CLI and real-world applications, showing how to integrate Ollama into Python projects.

▶ Play Clip

[00:00] olama is an open-source tool that

[00:02] simplifies running large language models

[00:05] locally on your personal computer in

[00:09] this course you'll learn how to set up

[00:11] and use olama this Hands-On course

[00:14] covers polling and customizing models

[00:17] rest apis python Integrations and real

[00:21] world projects like a grocer lless

[00:23] organizer rag system and an AI recruiter

[00:26] agency Paulo created this course hello

[00:30] and welcome to this mini course I'm very

[00:33] excited that you're here because I know

[00:35] you want to get right in and start

[00:37] learning about AMA and building AI based

[00:41] Solutions using AMA first let me go

[00:43] ahead and introduce myself again which

[00:45] is always awkward okay my name is Paulo

[00:48] deson I am a software Ai and Cloud

[00:51] engineer but most importantly I am an

[00:54] online instructor and I have taught over

[00:58] 280,000 students

[01:00] all over the world in about 175

[01:03] countries uh skills ranging from

[01:06] building mobile applications Android to

[01:08] learning Ai and Cloud Technologies so

[01:12] this is what I do and I love what I do

[01:14] what is this minicourse about I want to

[01:16] show you how to leverage olama and its

[01:19] many many models so that you can build

[01:22] AI Solutions a applications locally for

[01:26] free usually the structure that I like

[01:28] to follow and has Pro proven to work is

[01:31] in the beginning or the first part of

[01:33] the courses I start with the basics the

[01:36] fundamentals as well as Theory Concepts

[01:40] because I think it's important for you

[01:42] to also understand the theory before you

[01:44] go into Hands-On and then the second

[01:46] part is usually Hands-On but sometimes I

[01:49] like to mix those two together so that

[01:52] it's more exciting and you don't get

[01:54] bored okay so I know you're very excited

[01:56] to get started and I'm excited that

[01:58] you're here okay let's go ahead and get

[02:00] started well this course is going to be

[02:02] about ol that's the main thing so we're

[02:04] going to learn how to build local large

[02:07] language model llm applications using

[02:10] AMA so the idea is that we're going to

[02:12] be able to use AMA to customize our

[02:15] models meaning that we are able to use

[02:18] different flavors of models so we can

[02:21] test them around and all of that is

[02:23] actually going to be free which is very

[02:25] exciting and also we're going to be able

[02:27] to build rag system so retrieval

[02:30] augmented generation systems that are

[02:33] powered solely by AMA models keep in

[02:37] mind all of this is going to be for free

[02:39] you're not going to have to pay for

[02:41] anything okay and we are going to learn

[02:45] about tools and function calling when we

[02:49] are using these AMA models so we can Aid

[02:52] our large language models that are going

[02:54] to be using with more things that they

[02:57] can do okay I'm going to look at all of

[02:59] that and ultimately I'm going to show

[03:01] you how to use AMA models to build

[03:04] full-fledged large language model

[03:06] applications using these models okay

[03:10] keep in mind all of these models are

[03:12] going to be for free we don't have to

[03:14] pay a dime to use these models now let's

[03:17] talk about who is this course for now

[03:20] this course is for developers AI

[03:21] Engineers open-minded Learners machine

[03:25] learning engineers and so forth as well

[03:28] as data scientists so if you are

[03:31] somebody who is willing to put in the

[03:33] work and wants to learn about AMA and

[03:38] build local llm applications then this

[03:41] course is for you now let's talk about

[03:43] course prerequisites in this course I

[03:46] assume that you know programming this is

[03:49] a very um important skill obviously and

[03:52] I expect you to at least know the basics

[03:55] you'll be seeing a lot of python in this

[03:58] course so understand understanding how

[04:00] python Works how to write python code is

[04:03] obviously very important you also need

[04:06] to understand very basics of AI machine

[04:09] learning large language model you don't

[04:11] have to be an expert but just have some

[04:14] general

[04:15] understanding keep in mind also that

[04:18] this is not a programming course because

[04:20] I'm not going to be teaching you how to

[04:22] program and most importantly I always

[04:24] tell my Learners is that you have to be

[04:26] willing to learn and if there things

[04:29] that you need to go and brush on just go

[04:32] and do that and come back to the course

[04:34] but most likely you won't need to go

[04:37] anywhere but continue in this course and

[04:39] learn as much as you can so most of my

[04:42] courses I have this mixture of two

[04:45] things I have Theory so this is where we

[04:47] talk about the fundamental concepts the

[04:50] lingo and so forth and I have Hands-On

[04:53] because it's all about actually doing

[04:55] things that way you actually understand

[04:57] and know how to get things done that's

[05:00] the whole point and so the thing is I'd

[05:02] like to do a mixture of both and

[05:05] sometimes we have more handson and

[05:07] sometimes we we may have more Theory but

[05:10] I try to balance it all out where we

[05:12] have mixture both because I know I

[05:14] believe at least that is important for

[05:16] you to have both in your learning

[05:19] process but in this course you are

[05:21] mainly going to be focusing on Hands-On

[05:24] although you will still have some Theory

[05:27] fundamentals as well all right so let's

[05:29] look at the development environment

[05:32] setup so in this case you know that this

[05:35] is all P about python which means you'll

[05:37] have to have python installed and also

[05:40] you have to have some sort of a code

[05:42] editor for me I'm going to be using VSS

[05:45] code so you don't have to use that if

[05:48] you don't like it but that's what I'm

[05:49] going to be using you can use whatever

[05:51] you want okay now I'm not going to go

[05:53] through the process of installing python

[05:55] or vs code or anything like that because

[05:57] you should be able to have that if not

[05:59] you should be able to do that on your

[06:01] own so one place that I always recommend

[06:04] people go is to this link here kin.com

[06:08] knowledgebase install python there are

[06:10] many places online where you can learn

[06:13] or they will walk you through the

[06:15] installation of python if that's your

[06:17] case uh but this seems to be one of the

[06:20] best my favorite okay so if you don't

[06:22] have python installed please go ahead

[06:24] and get that set up so that we can

[06:28] continue in the next next videos now we

[06:30] have everything set up on our machine

[06:32] and we're ready to go so we're going to

[06:34] do a deep dive into AMA so you

[06:36] understand exactly what is how it works

[06:39] and what problem um AMA solves all right

[06:42] so let's go ahead and get started now

[06:44] it's time for us to do the Deep dive in

[06:47] ama so we understand exactly what is AMA

[06:51] what is the motivation behind and the

[06:53] advantages that ama brings for us

[06:55] developers okay so first let's look at

[06:58] AMA what is it ama is an open-source

[07:02] tool that is designed to essentially

[07:04] simplify the process of running large

[07:08] language models locally meaning on your

[07:12] own Hardware the idea here is very

[07:14] simple as you know right now if you want

[07:17] to run large language models or if you

[07:20] want to use a model large language model

[07:22] in this case just a model most likely

[07:25] you'll have to use open Ai chbt and so

[07:28] forth and many others out there that are

[07:31] paid and the thing is with a Lama you

[07:34] don't have to pay for anything it's free

[07:37] and that's the beauty so the idea is

[07:38] that ama sits at the center and allows

[07:41] us developers to pick different large

[07:44] language models depending on the

[07:46] situation depending on what we want to

[07:49] do at its core AMA uses what we call the

[07:54] CLI which is command line interface what

[07:57] this does is it manages is all the

[08:00] things in the back end so the

[08:02] installation and also the execution of

[08:05] different models but all of that again

[08:08] keep in mind it's all locally you will

[08:11] see as we go through that ama abstracts

[08:13] away the technicality so the technical

[08:16] complexity that are involved when we

[08:19] want to set up these models which makes

[08:22] Advanced language processing accessible

[08:25] to a broader audience such as developers

[08:28] researchers and hobbyists in nutshell

[08:32] AMA provides a straightforward way to

[08:35] download run and interact with various

[08:40] models or LMS without us having to rely

[08:44] on cloud-based services or even dealing

[08:47] with complex setup procedures rather

[08:49] what is the problem that ama solves now

[08:52] when we talk about large language models

[08:55] we are usually talking about this thing

[08:58] called rag system retrieval augmented

[09:00] generation the idea is very simple is

[09:02] that we have documents those documents

[09:05] are chopped into smaller chunks right

[09:08] and then we pass those through a model

[09:12] that is responsible for creating

[09:14] embeddings which is the vector

[09:17] representation of these chunks those

[09:20] embeddings are saved somewhere and then

[09:22] a query comes in goes through the same

[09:25] process of embedding right and then once

[09:27] we have this embedding that is indeed

[09:29] what's actually used to query do similar

[09:33] search internally in a factor database

[09:37] so that we can then pass all that

[09:39] information right through a large

[09:41] language model a model and of course get

[09:44] the response now again usually what

[09:46] happen is that all these models that we

[09:48] need to run a rag system for instance we

[09:51] may need to pay such as when we use

[09:54] openingi trpt and all different models

[09:56] that they provide okay now the other

[09:59] thing also behind all of that is that

[10:01] when we are using AMA models these are

[10:05] ours meaning that we download them

[10:07] internally in our local machine and we

[10:10] have a lot of control okay so that is

[10:12] the beauty of or in this case that is

[10:15] the problem that they solved but that's

[10:17] not the only problem that been solved

[10:19] the other one is that privacy so in this

[10:22] case here when we run our own models

[10:26] locally uh we are making sure that our

[10:28] data doesn't need to be sent to external

[10:32] servers because remember when we passing

[10:35] data through a large language model

[10:37] using chbt in this case or openi models

[10:40] and so forth we literally are passing

[10:42] through or sending our private data to

[10:46] different servers that we don't know

[10:48] what will happen out there so this way

[10:51] we have some security issues but with

[10:53] AMA that goes away because we have this

[10:57] enhanced privacy and security this is

[10:59] very important because imagine that you

[11:00] have an application that needs to be uh

[11:04] dealing with sensitive information then

[11:07] it's important to have your own models

[11:11] your own llm so that you have this

[11:15] contained enir environment where you're

[11:17] not sending out that sensitive

[11:19] information when you want to set up

[11:20] large language models it is cumbersome

[11:22] which means is technically challenging

[11:25] and often requiring knowledge of machine

[11:28] learning

[11:30] Frameworks and also Hardware

[11:32] configurations but as you will see with

[11:34] AMA all of that is simplified so this

[11:36] whole process is simplified by handling

[11:40] the heavy lifting for you also most

[11:43] importantly cost efficiency now we're

[11:46] going to be eliminating the need for

[11:48] cloud-based Services which means you're

[11:50] going to avoid this ongoing costs

[11:53] associated with API calls or server

[11:56] usage because everything remember is

[11:58] going to be local goal so once you have

[12:01] everything set up you can run models

[12:03] without additional expenses the other

[12:05] one is lency reduction what that means

[12:08] is that local execution obviously

[12:11] reduces the latency inherent in network

[12:14] communications so when we are

[12:16] communicating with something that is

[12:18] somewhere else through the network of

[12:21] course there is always a delay even if

[12:23] the network is really fast there's

[12:25] always this latency the delay okay but

[12:28] since every everything is local we don't

[12:30] have that problem okay because we're not

[12:32] dealing with remote servers which

[12:35] results in Faster response times for

[12:38] interactive

[12:39] applications and also most importantly

[12:42] we are able to customize our own models

[12:45] because when we running these models

[12:47] locally it allows for greater

[12:50] flexibility in customizing or even

[12:53] fine-tuning those models to better suit

[12:56] specific needs without having this

[12:59] limitations imposed by thirdparty

[13:01] services so these are some of the

[13:04] advantages we have by using AMA

[13:07] models so I know that we are eager to go

[13:10] ahead and start doing the Hands-On but

[13:12] let's continue here to talk about some

[13:14] key features of AMA okay so the first

[13:17] one is what we call the Model Management

[13:20] this is a big one which means that we

[13:21] are able to easily download and if we

[13:25] want to able to switch between different

[13:29] models or between different large

[13:31] language models and that is the main

[13:33] point about AMA is that the idea is that

[13:37] we have this Center place called Lama as

[13:39] a open source as a framework as a tool

[13:42] that essentially allows us to manage all

[13:44] of the different models that we can use

[13:47] interchangeably okay and next one of

[13:50] course is we have this unified interface

[13:53] what does that mean well that means that

[13:54] we're able to interact with various

[13:57] models using one consistent set of

[14:00] commands as you will see and of course

[14:02] we also have what we call extensibility

[14:05] what that means well it that means we

[14:07] have this support for adding custom

[14:09] models and extensions if we need to

[14:13] obviously we have what we call also

[14:15] performance optimizations which means

[14:17] you're able to utilize your own Hardware

[14:20] effectively including GPU acceleration

[14:22] if available because remember all AMA

[14:25] allows us to do is to manage all this

[14:28] large language models that we can

[14:30] download locally that is the key word

[14:33] key Point here and then we can do all

[14:35] sort of things with that we have this

[14:37] performance that is optimized because

[14:39] everything is internal it's locally as

[14:42] well extensibility because we can add

[14:45] different models and extensions as we

[14:47] see fit and of course we have this

[14:49] unified interface because we're able to

[14:51] interact with different models uh but

[14:54] still using a very consistent set of

[14:56] commands as you will see but the main

[14:58] point here of of course is that we have

[15:00] this Model Management in one place we're

[15:03] able to easily download and switch

[15:05] between different large language models

[15:08] let's look at some use cases well as you

[15:10] can see you can imagine we have a lot of

[15:12] different use cases uh

[15:14] for one of the first one is development

[15:17] and testing so you can imagine that as a

[15:20] developer you are able to look for

[15:23] different large language models and

[15:25] switch them to test them out to see

[15:28] which one is is going to perform better

[15:30] depending on what you want done so

[15:33] that's huge right so this is a deal of

[15:36] course for developers who are looking

[15:38] for ways to test applications that

[15:40] integrate large language models without

[15:43] having to set up different environments

[15:46] and all that complexity so also for

[15:48] education and resource what that means

[15:50] is that makes it really easy as a

[15:53] platform for Learning and

[15:55] experimentation without the barriers of

[15:57] Entry that are asso ated with cloud

[16:00] services so it's really easy just

[16:02] download certain kinds of large language

[16:05] models and you can go ahead and do

[16:07] testing and do some research and so

[16:09] forth and we talked about this secure

[16:12] applications the beauty of having AMA is

[16:15] that it provides this managed or this

[16:18] management platform of all of these

[16:21] different large language models which

[16:23] means it's suitable for Industries like

[16:25] healthc care or Finance where data

[16:28] privacy is very critical because

[16:31] everything is your own you download the

[16:34] large language models on your own

[16:36] machines so in conclusion AMA addresses

[16:39] these challenges of accessibility

[16:42] privacy and of course as we talked about

[16:44] cost in the realm of large language

[16:48] models so by enabling local execution

[16:51] which is exactly what we do with AMA it

[16:54] democratizes the use of advanced AI

[16:57] Technologies make making them more

[17:00] accessible and practical for a wide

[17:02] range of applications which is exactly

[17:05] what you would want okay so now it's

[17:07] time for us to look into setting up AMA

[17:10] locally um and what we'll do we're going

[17:12] to do go through the installation

[17:14] process and do the setup before we do

[17:17] setup there's some things that I need to

[17:18] let you know most of you should be okay

[17:20] there are some system requirements we

[17:22] need to be aware of first of all uh AMA

[17:26] supports Mac Linux and windows

[17:29] and I'm sure other operating systems

[17:31] there as well but these are the main

[17:33] ones and also make sure that we have at

[17:36] least close to 10 GB of free storage on

[17:39] your machine this is very important

[17:41] because as you will see some of these

[17:42] models require a lot of space and also

[17:47] processor as long as you have a modern

[17:49] CPU processor you should be fine and

[17:51] most of you should have that in place

[17:54] okay so just something to keep in mind

[17:56] all right let's go ahead and get started

[17:58] with

[17:59] setting installing and setting up AMA so

[18:03] if you go to ama.com O lam.com as such

[18:08] this is what you are going to be able to

[18:10] see says here get up and running with

[18:12] large L models now because this is a

[18:15] moving Target in technology things may

[18:18] change a little bit or dramatically by

[18:21] the time you watch this video hopefully

[18:23] not but the most important thing here to

[18:25] remember is that the IDE is still the

[18:27] same you will be able to download this

[18:29] tool and use that way independent on how

[18:33] it looks okay so should look the same

[18:36] but you never know and you can see here

[18:38] at the recording of this video says here

[18:41] Lama 3.2 this is the latest Lama 3.2

[18:44] model that they provide we need to

[18:46] download AMA how do we do this well it's

[18:49] very simple all you do click here

[18:51] download Once you click that this will

[18:53] take you to download AMA page now you

[18:56] can see we have three different flavors

[18:59] we have Mac OS so the browser was able

[19:01] to pick up that I am on Mac and so

[19:05] offered Mac if you're on Linux of course

[19:07] you'll have to click and look at this so

[19:10] you'll have to install by taking this

[19:12] command and running on your terminal if

[19:15] your windows of course is also just like

[19:17] mac you have to download as an

[19:19] application and run it okay whichever

[19:21] way you have to download the AMA zip

[19:25] file so I'm on Mac I'm going to click

[19:27] download

[19:29] and then you can see that it is

[19:30] downloading here so I'm going to wait

[19:32] for a

[19:33] second all right so it was downloaded

[19:36] and let's go ahead and click and open

[19:40] real quick so this is what you should

[19:42] see a zip file at least on Mac all right

[19:45] so I'm going to go ahead and double

[19:46] click and it's going to and zip and then

[19:49] it's going to look like this as an

[19:51] application okay so what I will do I

[19:53] just double click again and it will

[19:55] install again as an application on my

[19:58] back so I'm going to say open that's

[20:00] fine and then I'm going to move to

[20:03] applications and there we go so once

[20:05] we've done that I'm going to show you

[20:07] real quick

[20:08] here this is what I see so I have now

[20:13] this window on Mac that allows us to go

[20:15] through the process I'm going to say

[20:17] next and install the command line I'm

[20:20] going to say go ahead and say install

[20:22] and I'm going to pass here my

[20:24] credentials say okay so run your first

[20:27] model so we're going to run this AMA run

[20:30] Lama 3.2 now again this number here may

[20:33] be different from when you're watching

[20:35] this doesn't matter just follow the

[20:37] instructions so I'm going to say finish

[20:38] so what happens now is going to go ahead

[20:40] in the back and run the command and

[20:43] everything is good so it's going to

[20:45] install the Llama 3.2 which is a model

[20:49] that's a large language model that's

[20:51] going to be installed with our

[20:53] installation so we can get started and

[20:55] you can see once that happens that the

[20:57] Llama is running because now now we have

[20:59] this icon at the top that is showing

[21:02] there this is Lama icon and you can quit

[21:04] from here as well okay next what we can

[21:06] do is let's go back to models this page

[21:09] this is where all the different models

[21:12] that ama has this is where they are

[21:15] aggregated and we can look into them and

[21:17] learn about them you can see at the top

[21:20] we have Lama 3.2 which is the latest one

[21:22] which is the one that was installed

[21:25] already locally in our machine well come

[21:29] to this later and see more now let's go

[21:31] ahead and click on this one so I can

[21:32] show you something real quick here okay

[21:35] the beauty here is that when you click

[21:37] on any of these large language models

[21:39] you will see essentially the same things

[21:42] okay we have these 3B we'll talk about

[21:46] this later and um 1B latest and so forth

[21:51] okay and then most importantly we also

[21:54] have all of these things that we don't

[21:57] care about at this point we have a

[21:59] readme we have the sizes it tell us

[22:01] exactly what they mean what they good at

[22:04] this 3B default for instance and how to

[22:07] run it so this is essentially what you

[22:09] have at the top here so we're going to

[22:12] cop this and we're going to open a

[22:14] command line or a terminal I have one

[22:16] open and I'm going to paste that so what

[22:18] this does well it say Lama run Lama 3.2

[22:21] so what we saying here we're going to

[22:23] use a llama as the tool the framework

[22:25] right to run llama 3.2 which

[22:29] incidentally it's already installed on

[22:31] our machine right I'm going to just

[22:33] enter and what we'll do is going to spin

[22:36] up the Lama

[22:37] 3.2 model for us and then open this

[22:42] shell where we can start interacting

[22:44] with the model just like that now I need

[22:47] you to think to go back to what we

[22:50] talked about the beauty or the

[22:52] advantages of using ol is that ama has

[22:55] different models that we can use and

[22:58] install them internally locally on our

[23:00] machine and we can easily then through

[23:03] AMA as this manager of this Model start

[23:06] interacting with different models right

[23:09] now we are interacting we're able to

[23:11] interacting through the shell here with

[23:13] Lama 3.2 model which means I can ask

[23:16] question

[23:17] like what how

[23:20] old is the

[23:22] universe look at that and it gives me an

[23:25] answer the thing to keep in mind is that

[23:28] Lama 3.2 tends to be a little bit verose

[23:31] and as you can see uh it is quite veros

[23:34] but that's okay I can say in short tell

[23:37] me how old is the universe so you can

[23:41] actually direct it to tell you exactly

[23:43] what you need right kind of kind of like

[23:46] prompting it a little bit so the

[23:47] universe is approximately 13.8 billion

[23:50] years all right so I can also say for

[23:54] instance clear see what's going to

[23:56] happen glad you could convey that

[23:57] information clearly

[23:59] I guess I have to say something like

[24:01] this clear it cleared the session okay

[24:05] now I can continue here by saying for

[24:08] instance for help I can say forward

[24:09] slash I can put the exclamation point or

[24:14] I can say help like this so this will

[24:16] give me it's actually exactly other

[24:20] commands that I can use we have the set

[24:22] show load save and clear session as we

[24:26] saw as well as Buy to exit the shell so

[24:30] let's go ahead and use this show here

[24:33] okay I'm going to say forward slash show

[24:35] what we want what is that I want to show

[24:37] I want to show info so this will give us

[24:40] information about the model that we are

[24:42] looking at which is the Lama 3.2 says

[24:46] here architecture is llama the

[24:49] parameters is 3.2b we'll talk about that

[24:51] and the context length is that and

[24:54] embedding length and so forth and some

[24:57] other licensing information so just like

[25:00] that ladies and gentlemen we're able to

[25:02] install AMA and of course because AMA

[25:06] itself is just a manager we need to have

[25:09] the actual large language model and

[25:11] that's easy to do by installing in this

[25:13] case llama 3.2 we could have installed

[25:15] any other model of course but we start

[25:18] with that one when we're installing

[25:21] AMA all right so now I can say another

[25:25] question tell me a short

[25:29] joke okay here's one why couldn't the

[25:32] bicycle stand up by itself because it

[25:34] was too tired very funny all right and

[25:38] if I want to get out of this shell I can

[25:39] say for a slash buy like this and you

[25:43] can see we're no longer in our Lama 3

[25:46] shell so I can clear and we're done okay

[25:49] very good so I hope this is a good

[25:51] introduction there is a lot that coming

[25:54] um and I hope this excites you because

[25:56] the power that we have right now is at

[25:58] all this is locally we've imported we

[26:01] have downloaded the AMA tool in this

[26:04] case manager along with one model which

[26:08] is Lama 3.2 okay make sure that you're

[26:11] able to do this and I'll see you next

[26:13] now to keep in mind that the whole idea

[26:16] of a Lama is that we have this place

[26:18] where we have many different models we

[26:20] can use interchangeably we can just pick

[26:23] one and then drop it and pick another

[26:25] one and test until we find something

[26:27] that works

[26:28] so it is important to always look at

[26:31] these

[26:31] models page here so we can look at

[26:34] what's available and most importantly as

[26:37] you can see here you can also go ahead

[26:39] and search or actually filter by

[26:41] featured most popular and uh or newest

[26:46] and so forth okay so newest here seems

[26:49] to be neotron so keep in mind again that

[26:53] these will change depending on when

[26:55] you're watching this video okay so maybe

[26:57] you're not going to see this as the

[26:59] newest but the important thing is this

[27:01] is the place for you to come and look at

[27:04] what's going on so as far as we're

[27:06] concerned right now updated 3 weeks ago

[27:09] we have the Lama

[27:10] 3.2 it is also important for us to have

[27:13] more information about those large

[27:15] language models because the idea is for

[27:17] us to be able to choose what works for

[27:20] what we trying to accomplish okay so

[27:22] let's click on Lama 3.2 so when you

[27:25] click on Lama 3.2 or on any large

[27:28] language model that is shown here you're

[27:30] taken to this description page now

[27:33] there's a lot going on here the

[27:34] important part here is that we

[27:36] understand at least the basics of what's

[27:38] Happening first of all we see that is

[27:40] under Tools 1B 3B which we'll talk about

[27:43] but also we see that it was pull 1.1

[27:47] million times okay which means this is

[27:50] essentially being used by many many many

[27:53] people and then here we have these tags

[27:56] six to three tags but then we have here

[27:59] this drop- down list that tells us

[28:02] different things first one we have

[28:04] different flavors of this Lama 3.2 we

[28:07] have this 3B which is the latest that's

[28:10] the one we have and then if we want a

[28:12] shorter smaller one is 1B one important

[28:15] to keep in mind also is that depending

[28:18] on the flavor that we get here right you

[28:21] notice that also we have the sizes so

[28:26] depending on the flavor that we have

[28:29] here we can have different sizes so this

[28:31] one 3B is 3 2.0 gigabytes okay so it's

[28:35] not too bad but 1B is 13 GB and latest

[28:39] of course is 2 GB which is this one

[28:41] latest same thing okay and we've seen

[28:44] this before we have this command that

[28:46] we're going to use to run in this case

[28:49] this particular

[28:51] model now if you go down here we have a

[28:54] read me tells a little bit more about

[28:56] this model important for us to read and

[29:00] then we have the sizes so 3B parameters

[29:03] which is

[29:04] default this says here the 3B model

[29:07] outperforms the Jamma 2 2.6 B and the 5

[29:11] 3.5 Min models on tasks such as

[29:14] following instructions summarization

[29:17] prompt rewriting tool use but what does

[29:20] this really mean so I want you to look

[29:22] at this as a guide because again we go

[29:25] back to the whole point of Ama is that

[29:28] we have different models that we can use

[29:32] and us as developers or as people who

[29:35] are wanting to use these models is that

[29:39] we need to find something that works for

[29:40] us this is why it's important for you to

[29:43] always think about testing different

[29:45] models until you find something that

[29:47] works for you what works for me in a

[29:50] situation X Y and Z it's not obviously

[29:53] going to be what's going to work for you

[29:55] in your own situation so that is

[29:56] something to keep in mind

[29:58] always all right so it tells us here the

[30:02] comparison that for 3B parameter this

[30:05] size here it's really good at following

[30:07] instructions summarization prompt

[30:09] rewriting and Tool use so use tools or

[30:13] function calling and so forth now when

[30:15] it comes to 1B parameter this is a

[30:17] smaller version of 3.2 it says here the

[30:21] model is compatitive with other 1

[30:23] through 3B parameter models its use

[30:26] cases include personal information

[30:28] management multilingual knowledge

[30:30] retrieval rewriting tasks running

[30:33] locally on edge okay so if you want to

[30:37] pull Lama 3.2 1B then you would run this

[30:42] so you can imagine if you want something

[30:44] that will be good for this use case

[30:48] obviously you will use Lama 3.2

[30:50] 1B and let's go ahead I'm going to show

[30:52] you how to do this so we can run or Lama

[30:55] run Lama 3.2 to pull down to get this

[30:59] Lama 3.2 1B so I'm going to copy

[31:03] this go back to our terminal just paste

[31:06] that

[31:08] actually I should copy the whole thing

[31:11] and paste it so run Lama 32 1B so this

[31:16] will download this particular subset of

[31:19] model this model so hit enter you can

[31:22] see it's pulling down manifest it's

[31:24] pulling everything in what may take a

[31:26] little bit depending on on this size

[31:28] remember we're actually downloading

[31:31] locally so that means it will take up

[31:33] some

[31:35] space farewell so now you can see that

[31:38] we have installed Lama 3 1B so if I were

[31:42] to go ahead and say for slash again show

[31:46] info you can see voila it's now we're

[31:50] now using architecture Lama of course

[31:52] the same thing parameters is 1 2 B which

[31:55] is exactly different than the

[31:57] full-fledged Lama 3.2 so it's a little

[32:00] smaller okay and context length and all

[32:03] that information so we can go and ask

[32:06] question how old is the

[32:09] universe like this and it's going to go

[32:11] ahead and do the same thing but because

[32:14] we

[32:14] know from Reading here we know that this

[32:18] particular one here is really good at

[32:22] writing tasks running locally on edge

[32:25] multilingual knowledge retrievals so I

[32:28] can ask things like as an example how do

[32:32] you say hello I am

[32:37] fine in Tai let's

[32:41] see okay so it was really good this is

[32:44] just a very simplistic uh test here so

[32:48] again you can see the things you can do

[32:49] you can pull in different large language

[32:52] models to see what's going on now the

[32:54] other thing I can do here I can say I'm

[32:57] going to just say buy just to get out of

[32:59] here and still here I can say

[33:03] AMA and I can say list so this command

[33:07] here is going to list all of the models

[33:10] that have been download downloaded

[33:12] locally look at this now you can see we

[33:14] have this Lama 3.2 1B this is the latest

[33:18] one okay it says exactly when this was

[33:21] downloaded and of course I have a few

[33:23] others that I've done loaded a while ago

[33:25] so L 32 latest time ago I have this

[33:29] embedding large as well which we'll talk

[33:31] about later very good so here it shows

[33:34] everything that has been downloaded in

[33:36] terms of a list of all of the

[33:40] models let's look at parameters and do a

[33:44] quick Deep dive uh so that we understand

[33:46] a little bit more about the information

[33:49] that comes with models so we're going to

[33:51] learn what are parameters as well as

[33:53] what is that they really mean right so

[33:56] when we say show info that command we

[33:59] saw this we have this model architecture

[34:03] llama parameters of course

[34:05] 3.2b context length embedding length

[34:08] quantization and so forth so what do all

[34:11] of these mean let's start with the first

[34:14] one so the first one is in this these

[34:17] llama architecture here first of all the

[34:19] architecture side of thing that means it

[34:21] was architected it's a llama large

[34:23] language model meta AI that means they

[34:26] are the ones who created it matter

[34:28] Facebook and also that means that these

[34:30] models were designed with one thing in

[34:33] mind which is efficiency they were

[34:35] designed to be extremely efficient which

[34:38] means they are strong when it comes to

[34:41] performing even at smaller scales

[34:44] compared of course to other large models

[34:47] out there so they're very efficient and

[34:50] perform really well now when we look at

[34:53] then the parameters side of things it

[34:54] says here 3.2b and we saw that sometimes

[34:58] it fluctuates to 1.2 or3 or 1 and so

[35:02] forth different numbers these are the

[35:05] internal weights and biases that the

[35:08] model learns during training but what

[35:11] they do really is that they determined

[35:14] essentially how the model processes

[35:16] input data and how it generates output

[35:19] because that's the whole point okay now

[35:22] when we say

[35:24] 32b for instance what does that mean

[35:26] well that stands for 3.2 billion

[35:30] parameters now you can imagine even if

[35:32] you don't understand exactly what those

[35:34] means that means if you have 3.2 billion

[35:37] parameters in this case these weights

[35:40] and biases everything neurons and nodes

[35:42] and everything that means you have a lot

[35:45] inside of this neural neural network

[35:48] which means there's a lot of information

[35:51] that can be passed around the

[35:52] relationships and so forth which in

[35:54] terms mean that this model the the more

[35:58] billions or the more parameters it has

[36:02] which means has more connections and

[36:03] more interactivity inside that means the

[36:07] more accurate it is when it comes to

[36:10] getting the results now it's important

[36:13] to understand also that the number of

[36:16] parameters does reflect the complexity

[36:19] and the capacity of the model as

[36:21] explained but there's also what we call

[36:23] a tradeoff which means that more

[36:26] parameters can improve performance of

[36:29] course but it require more computational

[36:33] resources so if you have a large

[36:36] language model that is 3B or 5B or 7B or

[36:40] 8B well that means it's going to be

[36:43] amazing in terms of performance but also

[36:46] means it is going to require a large

[36:48] amount of computational resources so

[36:51] when we have something like

[36:53] 32b this kind of strikes the balance

[36:56] between per performance and resource

[36:59] consumption so let's summarize this

[37:01] simply when we talk about parameters

[37:03] talk about 3B or 2B or 10p or 7p and so

[37:07] forth these are numbers inside a neural

[37:11] network that it adjusts to learn how to

[37:14] turn inputs into correct outputs so the

[37:18] largest number the more the better in

[37:20] this case but we have drawbacks in that

[37:23] you also need uh a lot of computational

[37:26] resources to run it so next we have the

[37:30] context length so this number refers to

[37:33] the maximum number of tokens which

[37:35] essentially are pieces of text that the

[37:38] large language model can ingest in a

[37:41] single input so when we have something

[37:44] like

[37:45] 131,072 this is an exceptionally long

[37:49] context length what that means is that

[37:52] it can handle very long documents

[37:55] capturing dependencies across large

[37:57] large spans of text which is really good

[37:59] that means you can take big large books

[38:03] and article lengthy articles and

[38:05] extensive conversations and it can deal

[38:08] with it with no problem when we talk

[38:10] about the embedding length we're talking

[38:12] about the size of the vector

[38:14] representation for each token in the

[38:17] input text so when we say 372 what we're

[38:22] saying is that we have

[38:24] 372 dimensions in the embedding space so

[38:28] the larger number again the more

[38:30] Dimensions we have which means the more

[38:31] relations you're going to have in this

[38:33] Vector space so what does that mean well

[38:36] that means that we have what we call the

[38:37] semantic richness which means higher

[38:40] dimensional embeddings can obviously

[38:42] capture more nuanced meanings and

[38:45] relationships between words when it

[38:48] comes to a large language model then

[38:49] that means this will reflect the model's

[38:52] ability to understand very complex

[38:55] language patterns so the large ler we

[38:57] have this High dimensional the more the

[39:00] model is going to understand complex

[39:04] language patterns now there are some

[39:06] implications again is that we have this

[39:09] competition load so larger embeddings

[39:11] increases of course the compettition

[39:14] requirements and also if we have that

[39:17] that's going to improve the model's

[39:18] ability to generate contextually

[39:21] relevant and coherent

[39:23] responses okay and the last one we have

[39:26] here is quantization now there's books

[39:28] and we can spend hours and hours talking

[39:31] about this but in general quantization

[39:33] essentially is a technique used to

[39:35] reduce the size of a neural network

[39:38] model by in this case reducing the

[39:40] Precision of its weights this number

[39:43] indicates that the model's weights are

[39:46] quantized or

[39:47] quantized to four bits so translating

[39:51] essentially we just saying we now have a

[39:54] smaller model and faster processing

[39:57] and lower memory usage so now we have a

[40:01] more efficient um

[40:03] model so going back to our Lama 3.2

[40:06] model page here to see more information

[40:09] now you understand what 3B means 1B

[40:11] means and the implications and so forth

[40:14] and one thing you will realize also is

[40:16] that at the bottom we have what we call

[40:18] benchmarks now I don't really trust a

[40:20] lot of these benchmarks because anybody

[40:22] can inflate B the benchmarks or deflate

[40:24] them to follow certain agenda but

[40:28] nonetheless it's nice to look at and see

[40:30] The Benchmark but one thing I want to

[40:32] show you here is the implication of the

[40:34] sizes the parameters if I go back

[40:37] perhaps and let's go back to models here

[40:39] real quick and let's look for something

[40:42] different let's look for Lama 3.1 you

[40:44] can see with Lama 3.1 you even have 8B

[40:49] 70b

[40:51] 45b let's click on

[40:53] 405b well let's look at this one you can

[40:57] see that if we go to

[40:59] 405b that's 229 GB of space you have to

[41:03] have in your hard dis that's a lot and

[41:07] even if you're able to have that

[41:09] internally locally that that means you

[41:12] also have to have the computational

[41:15] capacity for you to be able to run these

[41:17] models so this is something to keep in

[41:19] mind you could have the capacity of

[41:21] having 229 gabyt locally but will you

[41:25] have the capacity of the computational

[41:28] resources that are needed to run this

[41:31] now keep in mind as you learn now this

[41:33] is probably one of the best models

[41:36] versions right because it has everything

[41:39] it's large billions and billions of of

[41:42] nodes and neurons and

[41:43] everything that's something to uh keep

[41:46] in mind and so for things that most of

[41:48] us are able to do want to do 8B even 70b

[41:52] should be totally fine okay just

[41:55] something to keep in mind and as you go

[41:58] through each one of these different

[41:59] models they have different things okay

[42:02] you can see of course we have more

[42:04] information here 45b 7B 8B and so forth

[42:07] as we talked about and they tell us the

[42:09] evaluation and what they're good at when

[42:12] should you use it and all these

[42:14] different informations as well as human

[42:16] evaluation as you see here so this is

[42:18] something to always go ahead and read

[42:20] and perhaps go to uh some links here to

[42:23] read more about it now I want you to

[42:25] always have this inquisitive mindset

[42:27] whenever you're looking for models so

[42:29] that you know exactly what you're

[42:31] getting um and contract that with what

[42:34] you want to accomplish that's how you

[42:36] need to uh go by uh when it comes to

[42:39] using AMA and different models here it's

[42:42] not just one model fits all that's not

[42:46] the idea the idea is that you find

[42:47] something that will work for you okay so

[42:51] models are real fun and you can always

[42:53] um go ahead and test them out now now

[42:57] let's go ahead and run through some

[42:59] commands uh with our Lama that way we

[43:02] are well situated and we understand how

[43:04] to use some of the most important

[43:06] commands Okay Okay so let's go ahead and

[43:08] get started so we know that we always

[43:10] start with AMA we can say AMA list this

[43:13] will list all of the installed models

[43:18] internally locally so you can see we

[43:20] have

[43:21] 32b 32 callon 1B here it's the size and

[43:27] when it was modified and we have other

[43:30] ones that I have here in your case you

[43:31] should probably most likely just have

[43:34] one let's say I want to delete or remove

[43:36] a model I can say

[43:38] AMA remove also I can say AMA help which

[43:42] is going to give me some of the commands

[43:45] that I can use we have serve have create

[43:47] Show run stop pull push list we've seen

[43:51] before PS and RM for remove so let's

[43:57] start with I'm going to go back say AMA

[44:00] list to show the list of all of the

[44:02] models that we have here let's say I

[44:04] want to delete the this one here so I'm

[44:07] going to copy that and I'm going to say

[44:09] clear AMA RM to remove and pass in the

[44:15] model name hit enter and it's going to

[44:17] delete that model now if I go ahead and

[44:19] say AMA list again we should see that we

[44:22] no longer have that model very good and

[44:25] if you want to delete this

[44:27] other one here you can also do like that

[44:30] now if we want to go ahead and pull in a

[44:32] new run we can say

[44:35] AMA and I'm going to say pull and the

[44:38] name of the model as we've seen before

[44:41] so again we can go back to our models

[44:46] website all right AMA Library models and

[44:49] let's say we want to get Jamma okay so I

[44:52] go back here you can see models all

[44:55] these models let's say I want to get

[44:58] something different let's say we want

[45:00] this code Gemma if you click here you

[45:03] can see that we have different flavors

[45:06] let's say I want to be the smaller so

[45:09] you can see we can run this one here I

[45:11] can either run all of this or I can just

[45:13] pull first this one and then run it okay

[45:16] so two ways to do the same thing so I

[45:19] can say AMA pull like that it's going to

[45:21] go ahead and

[45:23] pull our model now remember this is 1.

[45:27] gigabytes and may take a little

[45:29] bit okay so once that is done I can say

[45:34] AMA list we should see that we have code

[45:37] Jamma so I can now run it right so let's

[45:42] go ahead say all Lama run coama so it's

[45:45] going to now run coama so we can

[45:48] interact with it go back here I can say

[45:50] that Co Jamma is a collection of

[45:51] powerful lightweight models that can

[45:53] perform a variety of coding tasks like

[45:56] fill in the middle code completion code

[45:58] generation so forth so let's see we can

[46:01] test it out right can see if this works

[46:03] so I can say for instance write me a

[46:06] python

[46:08] function that returns hello

[46:13] world okay there we go you can see

[46:16] Define function your name return hello

[46:19] world your name and there we go all

[46:21] right very good so you can see that

[46:23] we're able to of course download a new

[46:27] model and test it out so what I can do

[46:30] again I can just say for slash buy to

[46:33] get out of here and then I can say

[46:37] AMA

[46:39] list to see that and I can copy this and

[46:43] remove it so I'm going to say

[46:46] AMA RM to remove that and it's

[46:50] deleted and

[46:52] AMA list again you can see that we no

[46:55] longer have Jemma

[46:58] okay so back to our models here I can go

[47:00] ahead and look at models and imagine

[47:02] that we want to find a model that deals

[47:05] with images or that is able to read an

[47:09] image and tell us what's going on

[47:11] there's a model called

[47:14] lava so lava is a novel endtoend trained

[47:17] large

[47:19] multimodel that combines a vision

[47:21] encoder and funa for general purpose

[47:24] Visual and language understanding so

[47:26] this is is a good candidate for what we

[47:29] call a

[47:30] multimodel model which means is that

[47:32] it's able to deal with images as well as

[47:35] text and so forth and other types okay

[47:37] so we have just this one and then we

[47:39] have this lava Lama 3 so depending on

[47:42] the situation we can read more to figure

[47:44] out which one is the one we should

[47:46] probably look into so let's click on

[47:48] this

[47:49] lava and you can see we have 7B 13B and

[47:53] 34b so you know which one we're going to

[47:55] be using so I'm going to go ahead head

[47:57] and get at least the

[47:59] 7B which is this one here latest or 7B

[48:02] doesn't matter it's about 4.7 gigabytes

[48:05] okay so let's go ahead and pull this one

[48:07] in I'm going to copy that and let's go

[48:08] ahead and just paste it all in and get

[48:12] our lava

[48:13] 7B all right so after a little while you

[48:16] can see that we downloaded a few things

[48:18] here but most importantly we have this

[48:20] 4.1 gab so we have now our model is

[48:23] running and you can see we have our

[48:25] shell I'm going to just say what is your

[48:29] name see what's going to say I don't

[48:31] have a name of course it's just a large

[48:33] language model okay so now let's see

[48:36] what we can do one thing I'm going to do

[48:37] actually is I'm going to move over to a

[48:41] code editor vs code to make this simple

[48:44] okay so I'm just going to say byy for

[48:46] now and then I'm going to open up here

[48:49] to see vs code so I have this

[48:53] flower1 PNG we're going to use this with

[48:56] of course our lava model to see if it

[49:00] can read this image and tell us what

[49:02] it's all about okay so let me go ahead

[49:05] and open the terminal here real quick

[49:07] and so what I'm going to do first let's

[49:08] go ahead and say AMA I'm going to

[49:11] run lava

[49:15] 7B okay now it's running very good so

[49:18] what I'm going to do now is I'm going to

[49:20] say what is in this image

[49:27] and I'm just going to

[49:29] pass the image name which is

[49:32] flower one. PNG let's see what's going

[49:35] to

[49:38] happen so now it's going to think and

[49:40] tell us exactly what it sees the image

[49:43] shows a bunch of small purple white

[49:45] flowers with yellow centers which

[49:47] appears to be a type of pansy they're

[49:50] growing in what seems like soil or a

[49:53] potting medium surrounded by what looks

[49:56] like green foliage so that's pretty good

[49:59] right for if you go back here and look

[50:02] yeah looks exactly that's exactly what

[50:05] it's describing which is really really

[50:06] awesome now let's test something here

[50:09] I'm going to test to see if this model

[50:12] has this history capability meaning it's

[50:15] able to save the conversation for

[50:18] context I'm going to say and

[50:21] now write me a

[50:24] short poem

[50:27] about the about that let's see what's

[50:29] going to

[50:30] happen ah there we go so it was able to

[50:33] look look at that a purple and white

[50:35] with yellow Hearts Define they add A

[50:38] Touch of Beauty to the world around

[50:40] little patch and so forth so it knows

[50:43] now we know if we're testing that this

[50:46] model is able to save the context the

[50:50] history of the things that we've been

[50:53] asking uh the context of our

[50:55] conversation

[50:57] tell me more

[50:59] about those flowers let's see what it's

[51:03] going to

[51:05] say okay there we

[51:07] go it knows exactly what flowers we're

[51:09] talking about which is what we had here

[51:13] okay I can ask can you tell me where

[51:16] they grow

[51:21] best and keep in mind again this is all

[51:25] local large language model that we

[51:27] running we don't have to go and send

[51:29] request to a network somewhere it's all

[51:33] local very good so pansies are native to

[51:36] the Mediterranean region and all that

[51:38] great stuff very good so we see that it

[51:42] is indeed working so we're able to pull

[51:45] in a new large language model in this

[51:47] case lava which is a multimodel model

[51:51] which allows us to deal with text as

[51:53] well with images and so forth and we ask

[51:55] it to tell us what it sees right in this

[52:00] image here and it was able to give us

[52:03] the description of the image and we ask

[52:06] some follow-up questions about that

[52:09] image and it

[52:11] works so I've changed to

[52:15] AMA run Lama 3.2 model right now so I

[52:20] can do it again you can see since it's

[52:22] loaded then we can see that we are able

[52:24] to go to the

[52:27] shell so what I want to be able to do is

[52:30] show you that we can also do many many

[52:32] different things here right so because

[52:34] we are running on a large language model

[52:37] it's a model that does a lot of things I

[52:39] can perhaps say I have a sentence that

[52:41] I'm not sure the sentiment of that

[52:44] sentence right I can say tell me what is

[52:47] the

[52:48] sentiment of the

[52:51] following

[52:55] sentence I

[52:57] not willing to pay you back let's see

[53:02] what's going to happen okay again it's

[53:05] very verbose which is okay but

[53:07] essentially says that the sentiment of

[53:09] the sentence I am not willing to pay you

[53:11] back is negative and confrontational it

[53:14] implies that the speaker is unwilling or

[53:17] unable to repay a debt all right so this

[53:19] is really good now one thing we can do I

[53:21] can also say can you please be less

[53:27] verbos okay there we go so the beauty

[53:31] here is that you're talking or you're

[53:32] interacting with a large language model

[53:34] and you can actually prompt it to do

[53:37] what you want it to do in this case I

[53:40] told it to always be prompt to always be

[53:44] less verbos at least in this case and

[53:46] you can see says the sentiment of I am

[53:48] not willing to pay your back is negative

[53:50] and confrontational it implies

[53:52] resentment defensiveness and obligation

[53:55] aversion okay now the beauty here is

[53:57] also we can modify or in this case

[54:01] customize our model because we can

[54:03] actually add certain pieces of metadata

[54:06] to the model that cater to what we want

[54:09] it to be like what I mean by that is

[54:12] that we can actually create a file here

[54:15] right click I'm going to click here and

[54:17] this is what we call a model file so it

[54:20] has to be something like this model file

[54:23] no extension whatsoever and inside of

[54:26] this file file this is where we are

[54:27] going to add a few lines of text code

[54:31] essentially so at the top here I'm going

[54:33] to say all caps say from because we are

[54:38] modifying it from a different from a

[54:41] base model it's going to be from Lama

[54:46] 3.2 in this case if you have something

[54:48] different you add that and then here is

[54:51] where we're going to set a few things so

[54:53] we can set a lot of things but now I'm

[54:54] going to show you how to set in this

[54:57] case the temperature what is a

[54:59] temperature temperature is what allows

[55:01] the model to be more creative or more

[55:03] direct and matter of fact so the higher

[55:06] the number in this case from 0 to one

[55:09] the higher the number closer to one the

[55:11] more elaborate the more um creative the

[55:16] model is going to be okay so I can

[55:18] change to 0 point three for instance if

[55:21] I want it to be less

[55:23] creative and then I can add for instance

[55:27] the system message so this is just the

[55:28] prompt right that it needs to know of so

[55:33] in this case here put that into let's

[55:36] say triple quotes like that to say

[55:39] inside here we can put text a very smart

[55:44] assistant who answers questions

[55:49] succintly and

[55:51] informatively

[55:53] okay something like this so this

[55:56] is our model file which is going to

[56:00] allow us to modify or customize our

[56:03] model okay so we can add more parameters

[56:06] here more things to Aid our model but

[56:10] this is okay for now so how do we do

[56:12] this now to make it so that our model

[56:14] indeed will comply with what we put

[56:17] together here in this model file well

[56:20] it's very simple all we have to do now

[56:23] I'm going to let's go say byy real quick

[56:26] here

[56:27] all we have to do now is we need to use

[56:29] the create command from AMA to create

[56:32] some sort of a new version of our model

[56:36] which is going to be specified by this

[56:39] model file very easy really so I'm going

[56:42] to say AMA and then say create and I can

[56:45] give it a name just

[56:47] say James okay whatever you want and

[56:50] then we're going to say f to say to the

[56:53] file it's going to be under the model

[56:56] file notice the model file is actually

[56:58] the same level here so I can just go

[57:00] ahead and that so if I hit enter what's

[57:03] going to happen now let's say enter okay

[57:05] so says success you can see that it went

[57:07] ahead and transferred model data so

[57:10] using the layers and everything in the

[57:12] background and so now if I save for

[57:15] instance list you can see that we have

[57:17] this James so it really created a

[57:20] replica of the main

[57:23] model and now this is going to be

[57:26] different from the Lama 3 3.2 right

[57:30] because it will have a little bit more

[57:32] of a modified customized piece to it

[57:35] very good so now I can use Mario or

[57:38] James I should

[57:39] say to ask

[57:42] questions so look now I can say

[57:46] orama first

[57:48] run James look at that and we should run

[57:52] James how beautiful is that so look what

[57:55] will happen I can say

[57:56] what is

[57:58] your name look what's going to happen my

[58:01] name is James and look how suc it is it

[58:05] goes straight to tell to answer what

[58:08] needs to be answered are you smart let's

[58:11] see I designed to process and provide

[58:14] accurate information quickly and

[58:15] efficiently making me highly

[58:17] knowledgeable on a wide range of topics

[58:19] and all of that comes from this because

[58:23] we said that your assistant that was

[58:25] very smart

[58:26] ask answer questions suly and

[58:29] informatively very good very good you

[58:31] the power things you can do right so now

[58:34] I can ask questions let's see let's say

[58:36] tell me about all tell me about the

[58:42] oceans let's

[58:45] see okay so this is pretty good because

[58:48] it's such an open ended question it went

[58:51] ahead there are five oceans gives me all

[58:53] that information oceans play a crucial

[58:55] role in regulating the earth's climate

[58:58] weather patterns and ecosystems they

[59:00] also provide all of this information

[59:02] here okay so it is to the point that's

[59:04] pretty good this was just an example to

[59:07] show you the things you can do you can

[59:09] actually modify customize your model to

[59:15] be and do certain things that you wanted

[59:17] to do this of course was very very

[59:19] simplistic but you can see for the

[59:21] system here we can prompt it even better

[59:24] with more complex information and change

[59:27] temperature and all these different

[59:29] things let's go ahead and just say byy

[59:32] and because we're done let's go ahead

[59:34] and get rid of it so that we save some

[59:37] space so llama

[59:41] RM James okay so Lama list we should not

[59:47] see James and while I'm here let me go

[59:49] ahead and actually get rid of some of

[59:50] these as well so that we have more space

[59:59] okay very good so far we've been using

[1:00:02] the CLI the command line interface with

[1:00:06] AMA which is attached with AMA that

[1:00:08] governs everything in a back end but we

[1:00:10] can also use the rest API in fact most

[1:00:14] of the things that we're going to be

[1:00:16] using later are based on the rest API

[1:00:19] which means there is an endo that we can

[1:00:22] hit and run in this case the models

[1:00:26] so how does that work one thing to keep

[1:00:29] in mind is that in the back end as you

[1:00:31] see well you can't quite see here but at

[1:00:34] the top my bar here I do have this well

[1:00:37] you probably see this quid AMA but there

[1:00:39] is this AMA that is running in the back

[1:00:41] end so that means in the background it's

[1:00:43] running which means we're able to do all

[1:00:45] sort of things that we're doing right

[1:00:47] the all Lama application per se it's

[1:00:49] running but it's actually being served

[1:00:52] at an endpoint locally and all of that

[1:00:55] is being served at Local Host

[1:00:58] 11434 what does that mean well that

[1:01:01] means then we can generate a response

[1:01:03] using the olama rest API because it has

[1:01:06] it attached to it and it's running in

[1:01:09] the back end in the

[1:01:11] background okay so to do so it's very

[1:01:14] simple all you have to do you have

[1:01:16] access to all of this we can curl

[1:01:18] something like this so in this case you

[1:01:20] can see it's a curl and this is the end

[1:01:22] point so Local Host 1143 for API gen

[1:01:28] generate and then dasd and then we pass

[1:01:31] the payload here so here we're saying

[1:01:33] the model is going to be Lama 3.2 right

[1:01:37] and the prompt we passing along it why

[1:01:38] is the sky blue so if I were to run this

[1:01:41] or hit enter what will happen is it's

[1:01:43] going to hit that end point and of

[1:01:45] course it's going to show me this now

[1:01:48] this is not very helpful mainly because

[1:01:51] it just shows a lot of gibberish and

[1:01:53] there is this stream that's happening

[1:01:55] but if you look look closely you can see

[1:01:57] that every time this run extremely it's

[1:02:00] going to show a certain word in response

[1:02:04] so combining all these words will be the

[1:02:07] response so that means then when we

[1:02:10] write the actual rest API payload we

[1:02:14] need to also add something else to it to

[1:02:17] tell it that we don't want it to stream

[1:02:19] as it is right now so that we just get

[1:02:22] result what that means now I can let me

[1:02:27] go back down let's clear so now what we

[1:02:31] need to pass is this here that says

[1:02:34] stream false so say we don't want it to

[1:02:36] stream just give us the response if you

[1:02:39] hit enter you can see it will take a few

[1:02:41] moments and then voila we have this

[1:02:43] model everything all the metadata about

[1:02:47] the model that we're using in this case

[1:02:49] is say tell me a fun fact about Portugal

[1:02:52] and then says here here is one response

[1:02:56] did you know that the town of obid obid

[1:02:59] in Portugal is often called the fairy

[1:03:02] tale Town due to its medieval

[1:03:04] architecture and picturesque streets

[1:03:07] right it gives it some other information

[1:03:10] all right so that is a difference here

[1:03:11] if we put stream false then we get that

[1:03:14] if this is not there it assumes that

[1:03:17] stream is true that's why we get what do

[1:03:19] you saw earlier now in this case you can

[1:03:21] see the end point is generate now we can

[1:03:24] also get the chat with the model the

[1:03:27] chat end point okay so generate the

[1:03:30] difference here generate it just goes

[1:03:32] ahead and predicts what needs to happen

[1:03:34] as you see just gives us result but we

[1:03:36] can also pass in the chat endpoint so

[1:03:39] let me go and clear to do so I have that

[1:03:42] you will have access to all of this

[1:03:44] anyway so no worries and there we go we

[1:03:46] do the same thing curl now the end point

[1:03:49] here is API for SL chat and then DD and

[1:03:53] then we pass here the payload so so we

[1:03:56] pass the model the messages messages

[1:03:59] plural that's why we have here a list

[1:04:02] okay so we pass the role it's going to

[1:04:04] be user and the content is going to say

[1:04:06] tell me a fun fact about mosm Beek let's

[1:04:09] go ahead and run this so there we go and

[1:04:12] now this is a different endpoint it

[1:04:15] gives us here some here's one fun fact

[1:04:18] about gorang gorza Peninsula which has

[1:04:20] one of the largest Coral in the world um

[1:04:23] I think this is actually not correct

[1:04:26] they have to remember that sometimes

[1:04:29] these large lar models are not always

[1:04:31] correct so there's no such thing as

[1:04:33] going a peninsula I know this because

[1:04:35] I'm from there but anyway you can see

[1:04:37] that it's giving us some results large

[1:04:39] ler models and are always correct so as

[1:04:42] always we can also pass more information

[1:04:45] or more specification to our endpoint in

[1:04:49] our payload so imagine that we want to

[1:04:51] for instance we want to request a Json

[1:04:55] mode we can do so by requesting it by

[1:04:58] adding Json right so you can see here we

[1:05:02] are getting the generated API and we

[1:05:06] have the same thing model prompt now

[1:05:09] here we even said in our prompt make

[1:05:12] sure that we want to get the response as

[1:05:16] Json but also we want to say here as one

[1:05:20] of the parameters that format it has to

[1:05:22] be Json to make sure that indeed the

[1:05:25] large language model aders to this okay

[1:05:27] let's go ahead and

[1:05:28] run and you can see doesn't look that

[1:05:31] great but we want to believe that this

[1:05:32] is actually a Json okay that is coming

[1:05:35] in so you can see the curly Bryce is or

[1:05:38] opening their model and then llama 3

[1:05:41] created and all that information and we

[1:05:43] should have an actual Json but you can

[1:05:45] see response Day morning Sky color and

[1:05:50] so forth the beauty here is that if you

[1:05:52] go to this link here gith habo Lama docs

[1:05:56] API end points or so many other end

[1:05:58] points we can go ahead and hit um using

[1:06:02] the rest API okay you can always come

[1:06:05] here and look at some of them we can do

[1:06:07] essentially the same thing we were able

[1:06:09] to do in our s ey that's the whole point

[1:06:11] we can do that using the rest API for

[1:06:14] instance we can copy a model show model

[1:06:17] information you've seen that before okay

[1:06:19] you can just curl and then do that so

[1:06:21] I'm not going to do that you can do that

[1:06:22] yourself and then this is the result

[1:06:24] that you get so all these things that

[1:06:26] we've done before you're going to be

[1:06:27] able to do them as well using the

[1:06:30] endpoint API the rest API endpoints

[1:06:33] we've learned a lot so far there are a

[1:06:35] lot of things that we've learned about

[1:06:37] the CLI we learn how to use the CLI of

[1:06:40] course which is attached to orama to do

[1:06:43] all sort of things to pull in different

[1:06:46] models to run them to remove them to

[1:06:50] even modify the model or in this case

[1:06:52] customize the model by creating the

[1:06:54] model file file and then use that to

[1:06:59] create some sort of a copy or a modified

[1:07:02] version customized version of that model

[1:07:04] which is very handy as you see for use

[1:07:07] cases that you may have okay so a lot

[1:07:10] that I've thrown at you I hope this is

[1:07:12] making sense I hope you are practicing

[1:07:14] and seeing the power and most

[1:07:16] importantly here remember is that all of

[1:07:18] that is local you're not having to do

[1:07:21] anything uh passing something to the

[1:07:23] network to get anything back in that way

[1:07:27] which means it's free you're just using

[1:07:29] your own resources which is the power of

[1:07:31] AMA so as a quick full summary here AMA

[1:07:35] as we know is a platform that allows you

[1:07:37] to run large language models locally

[1:07:41] which is really awesome the great thing

[1:07:43] is that it supports various models

[1:07:45] tailored for different tasks including

[1:07:47] text generation code generation and

[1:07:50] multimodel applications so essentially

[1:07:53] AMA model support these tasks here text

[1:07:57] generation such as Lama 3 x could be 3.2

[1:08:00] 3.10 or 3.11 depends Moll and so many

[1:08:04] others uh we have also code generation

[1:08:07] one of the example was code Lama okay

[1:08:10] and multi model application in this case

[1:08:13] text and images and we saw in this case

[1:08:15] was lava the lava model now this is just

[1:08:19] to show you the breath of things or

[1:08:22] models that ama provides Okay so again

[1:08:26] your job is really to figure out what

[1:08:29] models is going to give you the best

[1:08:32] results for your use case Okay so keep

[1:08:35] in mind that you always have to test

[1:08:37] figure out which one will work for you

[1:08:40] all right so to create large language

[1:08:42] model applications using AMA we saw that

[1:08:46] first of all uh we can use large

[1:08:49] language models that come with Alama

[1:08:51] because Alama hosts per se all these

[1:08:54] different large language models we can

[1:08:56] use now we also learned that there are

[1:08:59] different ways to interact with AMA and

[1:09:02] its models so we saw that the main way

[1:09:06] is through the CLI the command line

[1:09:09] interface so it's easy straightforward

[1:09:12] and that's what we've been looking into

[1:09:14] and then the second way would be in this

[1:09:16] case to use UI based interface which

[1:09:19] we're going to look into next so

[1:09:21] essentially have a user interface where

[1:09:24] the back and we can put or apply a model

[1:09:29] AMA model and then we have a nice user

[1:09:33] interface that users or yourself can use

[1:09:36] to interact with the

[1:09:38] model now we've just finished looking at

[1:09:41] the rest API so this is the base of

[1:09:44] essentially everything else we're going

[1:09:45] to be doing after the next UI based

[1:09:48] interface way of interacting with the

[1:09:50] models okay so essentially we're able to

[1:09:53] use rest API curl and then hit the end

[1:09:57] point and get information that way about

[1:10:00] generating text or um in this case

[1:10:04] deleting things deleting models or

[1:10:06] looking up everything related to our

[1:10:08] models so essentially the same things

[1:10:10] that we're able to do using the CLI we

[1:10:13] have a way of doing that using the API

[1:10:16] the rest API and later we'll see we're

[1:10:18] going to use AMA python library now this

[1:10:21] is where we're going to be able to then

[1:10:23] use the tools in Python in code because

[1:10:27] now we have more customization more

[1:10:29] freedom per se to start building the

[1:10:32] actual large language models

[1:10:34] applications that we so want to do okay

[1:10:38] okay so now let's go ahead and look at

[1:10:40] the UI based interface so if you go to m

[1:10:44] thata MST that app as you see here this

[1:10:48] is what you will see so you can read

[1:10:49] about it say Lama 3.2 Vapor mode all

[1:10:52] this great stuff say the easiest way way

[1:10:55] to use local and online AI models so

[1:10:59] without Misty painful setup endless

[1:11:01] configurations confusing UI Docker and

[1:11:04] all these other things with MTI one app

[1:11:06] one click setup no Docker no terminal

[1:11:09] offline and private unique and Powerful

[1:11:12] feature so that's the reason why I chose

[1:11:14] this of course you can choose any other

[1:11:16] that you might find out there that I

[1:11:18] mentioned so this is what I'm going to

[1:11:19] be using and you can see there's a lot

[1:11:21] of logos here saying that it works with

[1:11:24] uh meta stuff with chat open Ai and many

[1:11:27] many others so it's really good so you

[1:11:30] can see Windows Mac and Linux and they

[1:11:34] do have a user interface so you can see

[1:11:36] exactly how it will look like so

[1:11:38] essentially it's going to be like having

[1:11:40] chat GPT uh but now we are running our

[1:11:44] own models so click here and I am on Mac

[1:11:48] of course I'm going to go and go that

[1:11:50] way if you're on Windows you have the

[1:11:52] option to do the 64 or 64 CPU only MD or

[1:11:57] Nvidia if you're Mac you have apple

[1:11:59] silicon or Intel chip if you're Linux of

[1:12:02] course you have these flavors as well so

[1:12:04] pick the one that works best for you so

[1:12:06] I'm on Mac I'm going to go to Apple

[1:12:08] silicon like that so it's going to go

[1:12:11] ahead and

[1:12:12] download okay so it's now downloaded I'm

[1:12:15] go let's go ahead and open it and I'm

[1:12:17] going to double click to start the

[1:12:19] installation so I'm going to go ahead

[1:12:21] and drop it there very good and let me

[1:12:25] go ahead and open it real

[1:12:28] quick okay so I'm going to say open all

[1:12:31] right and there you have it now once you

[1:12:33] open it this should happen and the

[1:12:36] beauty here is that you can see how

[1:12:39] would you like to get started you can

[1:12:40] set up locally AI I remote modeles

[1:12:43] provided and so forth but if you look

[1:12:45] down here look at this Gama get started

[1:12:48] quickly using AMA models all right and

[1:12:51] even found it under users meama and all

[1:12:55] the models they found it locally because

[1:12:58] it detected that we have Ama installed

[1:13:01] in our in our uh on our machine so I'm

[1:13:04] going to go ahead and say continue down

[1:13:07] here okay that's it just one click and

[1:13:12] we are done and you can see here at the

[1:13:14] bottom here make see I can make this

[1:13:16] larger yeah I can make this larger

[1:13:18] that's very

[1:13:19] good okay at the bottom here you can

[1:13:21] click smaller you can click you can see

[1:13:24] we have

[1:13:25] different levels or different models we

[1:13:28] have lava 7B which you remember and I

[1:13:31] have Lama

[1:13:32] 3.2 all right because those are the ones

[1:13:35] that we have installed at least I have

[1:13:36] installed locally if you have more than

[1:13:39] one then you're going to see all of them

[1:13:40] sure here and this is beautiful because

[1:13:43] we didn't have to do anything just one

[1:13:45] click indeed like they promised and we

[1:13:47] are there so let me choose the Lama 3.2

[1:13:50] and the moment you do that that's it and

[1:13:52] you can just start chatting just like

[1:13:54] that I'm going to say how old are you

[1:13:58] just silly to see what's going to

[1:14:00] say so this is being powered Again by

[1:14:03] our own model so I was released to the

[1:14:07] public in 2023 very good can you tell me

[1:14:12] a funny short

[1:14:16] story and just like that look how fast

[1:14:18] this is one day a man walked into a

[1:14:20] library and asked the librarian do you

[1:14:22] have any books on puff loves dogs and

[1:14:25] scroller cats scroll danger I think

[1:14:28] that's what you say the libran replied

[1:14:30] it rings a bell but I'm not sure if it's

[1:14:32] hair or

[1:14:34] not okay that's very cool so we have

[1:14:37] that so you can start talking you can

[1:14:40] start chatting with this long large

[1:14:43] language model and if I wanted to I can

[1:14:45] also go and attach a document let's

[1:14:47] click here and I'm going to add one

[1:14:50] image that we've seen before drop it

[1:14:53] there

[1:14:55] now image is attached let me go back

[1:14:58] here and I can

[1:15:00] ask tell

[1:15:02] me about the image now here's the thing

[1:15:06] the reason why it's saying this is

[1:15:08] because the model doesn't CU we're using

[1:15:10] llama 3.2 it doesn't really know how to

[1:15:12] deal with the images however if I change

[1:15:14] here go to Lava 7 and ask the same

[1:15:17] question tell me about the

[1:15:21] image Aha and now because lava is a mul

[1:15:25] model multimodal model it will know how

[1:15:29] to answer questions about the image

[1:15:33] there we go now it's working the image

[1:15:34] shows two purple flowers with yellow

[1:15:37] centers okay very good so it is

[1:15:39] describing it took a little bit of doing

[1:15:41] here perhaps it should have reset

[1:15:42] everything but it's okay you can see

[1:15:45] that now it is actually working because

[1:15:47] we're using lava 7 and it will tell me

[1:15:50] exactly what it sees now I can change to

[1:15:53] Lama 2 or

[1:15:56] 3.1 and if I say tell me

[1:16:00] about the image it's going to give me

[1:16:02] some issues of

[1:16:04] course okay very good so it can't do

[1:16:07] that but the other thing I can do here I

[1:16:09] can also add let me just delete this I'm

[1:16:11] going to add a document let's say for

[1:16:14] instance I'm going to add this document

[1:16:17] that we've seen before which is the b o

[1:16:19] i PDF there we go I'm when I say

[1:16:25] can you tell

[1:16:28] me about the PDF file okay so now I have

[1:16:34] some issues here the best way to do this

[1:16:36] is you go to attach knowledge Stacks so

[1:16:39] click here and I'm going to say add your

[1:16:41] first knowledge stack so I'm going to

[1:16:43] say my

[1:16:45] tester and the thing is because we are

[1:16:49] now invoking want to create a rag system

[1:16:53] we need to have an embedding model so

[1:16:56] that it can create those embeddings of

[1:16:59] the PDF file of the document that way it

[1:17:02] can be saved and then the large language

[1:17:04] model is able to Converse that way okay

[1:17:06] so we're going to just use whatever they

[1:17:08] give us nomic uh embedding test text I'm

[1:17:13] going to add that it's very good and at

[1:17:16] this point let's go ahead and add an

[1:17:18] actual file so I'm going go ahead and

[1:17:21] drop it again in this case here just

[1:17:23] drop it like that all right

[1:17:25] so once you do that you have to hit

[1:17:27] compose so that it a it's able to pick

[1:17:30] it up do all the things it needs to do

[1:17:32] to actually get information from our

[1:17:35] file okay embeddings and everything say

[1:17:39] compose so it's

[1:17:41] composing okay knowledge stack saved and

[1:17:44] composed successfully you can now use it

[1:17:46] for chatting very good so let me get out

[1:17:51] of here and I'm going to go here so

[1:17:54] click again here and what I want to do

[1:17:56] here is I want to be able to say click

[1:18:00] on my tester but this is my knowledge

[1:18:03] base I want to include in my chat here

[1:18:05] so make sure you click that and that

[1:18:08] will what I will do is we'll let this

[1:18:11] know that indeed we have included that

[1:18:14] knowledge base which includes all the

[1:18:16] information about our PDF file so say

[1:18:20] Give me a summary of the PDF

[1:18:27] file okay let's

[1:18:30] uh says there's no PDF file mentioned in

[1:18:33] our conversation so far the text we

[1:18:34] discussed earlier was from beneficial

[1:18:36] ownership and everything so it took a

[1:18:38] little bit uh because I'm asking PDF it

[1:18:40] doesn't really know about the PDF

[1:18:42] because all of that information was

[1:18:43] actually transformed into text and

[1:18:47] embeddings and everything for the large

[1:18:49] language model so I just can say Give me

[1:18:51] a summary

[1:18:55] of the

[1:18:59] document okay there we go so it says the

[1:19:02] document provide instruction on how to

[1:19:03] fill out certain items including

[1:19:05] identifying documents issues a company's

[1:19:08] images and everything and it is indeed

[1:19:10] beneficial ownership all of that so

[1:19:12] we've seen this before I can keep asking

[1:19:15] what are

[1:19:17] the

[1:19:20] penalties for not

[1:19:23] filing and and there you go it gives me

[1:19:27] exactly the penalty and everything that

[1:19:29] will happen if I don't file right I can

[1:19:33] just say give me the

[1:19:36] penalties maybe that's a better question

[1:19:39] perhaps okay that's better because now

[1:19:42] it's giving me exactly what I need okay

[1:19:44] based on the text from beneficial

[1:19:45] ownership information blah blah blah and

[1:19:48] everything is there failure to comply

[1:19:50] with beneficial ownership reporting

[1:19:51] requirements Cil penalty 10,000 fine up

[1:19:55] to 500,000 and so forth

[1:19:59] okay what are

[1:20:02] the deadlines for

[1:20:11] filing okay so you can see that this is

[1:20:13] working we're able to converse with our

[1:20:17] data our own document now the beauty

[1:20:20] here if you AR haven't realized yet is

[1:20:23] that all of this again is internal it's

[1:20:25] our own large language models we can of

[1:20:28] course go and spin out a different large

[1:20:30] language model OKAY adding new provider

[1:20:33] and so forth and we can use

[1:20:35] interchangeably as you see here and we

[1:20:37] passed in a knowledge base which means

[1:20:40] we said here are some documents in this

[1:20:42] case which is one document and use that

[1:20:45] to create a knowledge base so a rag

[1:20:47] system essentially and so that we can

[1:20:48] chat and ask questions about that

[1:20:51] knowledge base okay and just like chat

[1:20:54] GPT we can go ahead and create a new

[1:20:55] chat we can do all sort of things as you

[1:20:57] see here the point is not for me to go

[1:20:59] through Misty thoroughly the point is to

[1:21:03] show you what's possible to create a

[1:21:05] user interface or to use a user

[1:21:07] interface that is guided or is being

[1:21:11] fueled by our own AMA model how cool is

[1:21:16] that so what I want you to do is to play

[1:21:18] around with this and you can see it's

[1:21:20] just to show you how amazing this is

[1:21:23] because we have our own box that we can

[1:21:26] uh pass in sensitive documents and all

[1:21:29] those things without worrying about

[1:21:32] prices you know having to call an API

[1:21:35] anything external put this in a cloud

[1:21:38] base or having to pay for usage so it's

[1:21:41] all here and we can use as we see

[1:21:44] fit so we're able to see all of the ways

[1:21:47] in which we can interact with and its

[1:21:50] models so we looked at CLI command line

[1:21:53] interface and we saw that it's really

[1:21:55] easy because it's the fastest access to

[1:21:58] our models it's very easy but as you

[1:22:01] know it's not scalable in sense that

[1:22:03] you're not going to build a full-fledged

[1:22:05] application using that and then we went

[1:22:08] and looked at the API the rest API which

[1:22:11] is essentially the same but now we have

[1:22:14] a different back door per se we we using

[1:22:16] rest API to pass in certain payloads and

[1:22:21] get information so we're doing the same

[1:22:22] thing that we can do with CLI hitting

[1:22:24] the same functions per se but now we're

[1:22:26] using different end points with the rest

[1:22:30] API and we just finished looking at the

[1:22:32] UI based interface so this is way easier

[1:22:35] for us to be able to have a UI based

[1:22:39] interface and actual interface such as

[1:22:42] chat DBT that way we're able to ask

[1:22:45] questions uh change manually quickly

[1:22:49] different models that we may have and

[1:22:52] start chatting with our models so it's

[1:22:54] really really easy to put uh together or

[1:22:56] to have that working using the M that

[1:23:01] app and of course there are different

[1:23:02] flavor different tools out there that

[1:23:05] you can use that will do the same thing

[1:23:07] right that would give you this UI based

[1:23:10] interface so I chose MTI because it's

[1:23:13] just easy as you saw to install and get

[1:23:16] started so it wasn't about how to use

[1:23:18] these tools but I was about to give you

[1:23:20] the knowledge and tools that you can

[1:23:23] then uh use

[1:23:25] on your own and explore more okay so now

[1:23:29] we're going to go to the fun part which

[1:23:31] is we're going to use now sort of a a

[1:23:34] backend combination of the API rest API

[1:23:37] through the python Library the AMA

[1:23:40] python library because truth be told we

[1:23:43] want to be able to create local large

[1:23:46] language model applications using AMA

[1:23:49] models and so for that we need a way for

[1:23:53] us to be able to use python this code in

[1:23:55] this case code or any other language but

[1:23:57] in this case going to be python to be

[1:23:59] able to take advantage of these models

[1:24:02] that ama provides and so that's what

[1:24:04] we're going to be doing next which is

[1:24:06] we're going to get started with AMA

[1:24:08] python Library so we can use code to

[1:24:11] interact with AMA

[1:24:14] Models All right so I have this folder

[1:24:17] here we've seen before called AMA we got

[1:24:20] a few things here let's get started here

[1:24:22] simply by using the AMA library but

[1:24:26] actually I'm going to do the hard way

[1:24:30] first to show you that we can actually

[1:24:31] use code to get to the end point just

[1:24:34] like what we saw with the restful API so

[1:24:38] real quick so we can see what's going on

[1:24:40] so let me go ahead and create a new file

[1:24:41] here so we're going to have access to

[1:24:43] all of these code so no worry so I'm

[1:24:45] going to say start one. P1 or py

[1:24:49] okay and so what we'll do here first

[1:24:52] actually is I'm going to go ahead and

[1:24:54] make sure that we have an actual virtual

[1:24:58] environment for our python project so

[1:25:01] I'm going to say Python

[1:25:03] 3 okay like that so we should have a

[1:25:07] virtual environment there that's very

[1:25:09] good and let's go ahead and activate it

[1:25:12] so Source bnv if you're on Windows of

[1:25:16] course there's a different way to do

[1:25:18] this activate and voila so now we have

[1:25:21] our virtual environment AC activate

[1:25:25] active I should

[1:25:26] say very good so what is the first thing

[1:25:29] we're going to do well first of all I'm

[1:25:30] going to go and import let's say pip

[1:25:33] install I'm going to install

[1:25:36] requests real fast here so we have

[1:25:39] that okay very good because we're going

[1:25:41] to use that to do what we are looking

[1:25:44] for it to do all right so I'm going to

[1:25:47] go ahead and import requests so we have

[1:25:51] that and also I'm going to import Json

[1:25:53] so we have that as well okay so I'm

[1:25:56] going to create a URL here which is

[1:25:58] going to be the URL where is the end

[1:26:00] point it's going to be HTTP for our

[1:26:04] Local Host here so it's not going to be

[1:26:05] that it is 1 4

[1:26:10] 34 and in this case I want to go to the

[1:26:14] generate endpoint like this

[1:26:18] okay and for that remember we need to

[1:26:20] pass some payload data when we pass

[1:26:22] along so I'm going to say data just

[1:26:24] create a little uh dictionary

[1:26:27] here and for a dictionary I'm going to

[1:26:30] pass the model that we need it's not

[1:26:32] going to be gpt3 or

[1:26:34] gpt2 we're going to call this llama it's

[1:26:37] going to be

[1:26:38] Lama 3. two and make sure that llama is

[1:26:43] running for this to work

[1:26:45] always and then I'm going to pass the

[1:26:48] [Music]

[1:26:49] prompt I'm going to pass something that

[1:26:52] says tell me

[1:26:55] a short story and make it funny like

[1:27:01] that okay so there we go this is our

[1:27:04] payload that we're going to pass along

[1:27:07] using the rest API so I'm going to send

[1:27:10] the request so I'm going to say

[1:27:12] response pass in the request. post pass

[1:27:15] in url and Json and what I want to do

[1:27:18] here

[1:27:20] is I want to say stream

[1:27:24] to true to say I want this to be

[1:27:27] streamed okay as you will see in a

[1:27:29] little bit okay maybe this a little bit

[1:27:32] smaller so you can see everything all

[1:27:34] right okay so then let's go ahead and

[1:27:36] check the response first so I'm going to

[1:27:38] say check the

[1:27:41] response status so it's going to say if

[1:27:44] response status is as you see there I

[1:27:47] already have all these code so I'm not

[1:27:49] going to Bor you with all of the

[1:27:51] intricacies

[1:27:54] I'm going to put it all there okay so

[1:27:55] what we're doing now is that we're going

[1:27:57] to check if response code is 200 which

[1:27:59] means all is good and then we're going

[1:28:01] to go ahead and start generating thing

[1:28:02] so we're going to go and iterate over

[1:28:05] the streaming response because we said

[1:28:07] the streaming to true as it comes the

[1:28:10] response that comes in okay and so what

[1:28:12] we're doing here we're decoding the line

[1:28:14] and parsing everything until we actually

[1:28:17] print the generated text as we go

[1:28:19] through very good so let's go ahead and

[1:28:21] see if this works I'm going to go ahead

[1:28:22] and run this make sure that of course

[1:28:25] the

[1:28:27] AMA icon is running is out there

[1:28:30] otherwise this is not going to work so

[1:28:32] let's go ahead and run let's run this

[1:28:35] looks like I'm having

[1:28:36] issues right so I'm need to pass API

[1:28:40] like that okay my bad that was my

[1:28:44] problem let's go ahead and give it a try

[1:28:50] again okay and there we go you can see

[1:28:52] that is indeed working and it's really

[1:28:55] really fast as you see here generat text

[1:28:58] here's a short silly story for you once

[1:29:01] upon a time there was a chicken blah

[1:29:03] blah blah and there you have it all

[1:29:06] right very good so this is indeed

[1:29:08] working we're able to use code to

[1:29:12] actually interact with the our llama 3.2

[1:29:17] model locally right albe it we are not

[1:29:20] using the AMA uh python Library but you

[1:29:24] can see that this is actually working go

[1:29:27] ahead and play around with this and I'll

[1:29:29] see you next all right so this was

[1:29:31] really nice we're able to use the rest

[1:29:34] API in code here the restful API should

[1:29:38] sayama restful API in code to interact

[1:29:42] with Lama 3.2 this is fine but now I'm

[1:29:45] going to show you how to use the AMA

[1:29:48] Library the AMA python Library so we can

[1:29:53] go straight into code we don't have to

[1:29:54] hit these we don't have to explicitly

[1:29:58] hit these end

[1:30:00] points all right let me go ahead and

[1:30:02] create a new file here let's call this

[1:30:06] start

[1:30:07] to.py remember you will have access to

[1:30:10] all this code so do not worry okay so

[1:30:14] the first thing we need to do here of

[1:30:15] course is to make sure that we import or

[1:30:20] we get the right dependency so going to

[1:30:24] say pip install we need to install or

[1:30:28] Lama as such very

[1:30:31] simple and there we go now once we have

[1:30:35] a Lama we can start working at right so

[1:30:40] first let's go ahead and import

[1:30:42] AMA there we go and real fast here I'm

[1:30:46] going to show you how simple this is so

[1:30:49] the same thing we did before with the

[1:30:51] CLI and even with the rest API we can do

[1:30:55] the same thing here using

[1:30:57] this dependency right using this SDK

[1:31:01] perite so now I can say put inside of

[1:31:04] response and I'm going to say all Lama

[1:31:07] and I can say list so this function here

[1:31:10] as in name imply is going to list list

[1:31:14] it should list all of

[1:31:17] the you guess it all of the things that

[1:31:19] we have in this case all of the models

[1:31:21] so let's go ahead and save and give it a

[1:31:23] quick run run so real

[1:31:25] quick okay I ran and you can see that we

[1:31:28] have this Json that comes up it says

[1:31:33] llama 7 oh name lava 7 and that is the

[1:31:37] model and I should have gives all that

[1:31:40] information and I should also have the

[1:31:44] Lama

[1:31:45] 3.2 and some other things all right this

[1:31:49] is very cool very good all right

[1:31:55] very good so you can imagine we can say

[1:31:58] AMA list we can also say

[1:32:01] AMA let's say chat chat response client

[1:32:05] create and delete embed and all these

[1:32:08] other different things right so that's

[1:32:10] very very exciting I just want to show

[1:32:13] you that I'm going to just call comment

[1:32:14] that out so we don't have a lot of

[1:32:16] things here all right so let's go ahead

[1:32:19] and hit the chat endpoint or in this

[1:32:22] case the chat API

[1:32:25] well as you know it's very simple so

[1:32:28] let's go ahead and get started here all

[1:32:31] right so I'm going to say res we can put

[1:32:34] anywhere orama chat

[1:32:38] so. chat and then I'm going to pass a

[1:32:41] few things and you can see that if you

[1:32:43] have over it says it respects at least

[1:32:46] to have the model name so the model I'm

[1:32:49] going to say llama

[1:32:56] 3.2 okay just add the one that you have

[1:33:00] and then here because it's chat I'm

[1:33:01] going to pass messages okay so this

[1:33:04] comes as a list and I'm going to pass

[1:33:07] the role as a user and the content is

[1:33:09] going to say something I can say

[1:33:11] something different that perhaps why is

[1:33:14] this sky

[1:33:18] blue something like this okay so now I

[1:33:22] can actually print

[1:33:24] what comes back in this case from our

[1:33:27] chat we pass the model and the messages

[1:33:30] could be more than one message okay we

[1:33:32] can also say uh for our role here

[1:33:36] another one that says let me just put

[1:33:38] like that I can say the context pass the

[1:33:41] context pass all different things that I

[1:33:44] may want to pass okay but so far this is

[1:33:47] okay let's go ahead and see how this

[1:33:49] works I'm going to go ahead and just run

[1:33:51] from

[1:33:52] here okay to a little bit and you can

[1:33:54] see we have the results so you can see

[1:33:57] the model and everything but most

[1:33:58] importantly we have the content says

[1:34:00] content is the sky appears and

[1:34:02] everything but to make it even better I

[1:34:05] can just go ahead and get the content so

[1:34:07] let's go to

[1:34:09] result uh let's say I think I have to go

[1:34:12] to

[1:34:13] messages and they get the content like

[1:34:17] that

[1:34:20] okay let's go ahead and run again all

[1:34:23] not messages

[1:34:26] message

[1:34:32] okay all right and there you go so this

[1:34:36] guy appears blue because of phenomenom

[1:34:39] because of phenomenon called scattering

[1:34:41] and all of that great stuff so we have

[1:34:43] all the information here now you noticed

[1:34:45] before when we

[1:34:47] just look at showing the entire payload

[1:34:52] it comes in with a lot information about

[1:34:54] the model when it was created of course

[1:34:56] the content the role and all that

[1:34:59] information at the bottom we have some

[1:35:02] other pieces of data metadata so done

[1:35:05] true total duration how long it took and

[1:35:08] low duration prompt evaluation and uh

[1:35:12] also evaluation count prompt evaluation

[1:35:15] duration all these little

[1:35:18] things that we may want to keep track

[1:35:21] of all right so just like that ladies

[1:35:24] and gentle ladies and gentlemen we're

[1:35:26] able to use the chat endpoint here using

[1:35:30] uh the ol Lama python Library so go

[1:35:34] ahead and play around with this and see

[1:35:36] what we can do because this is the

[1:35:37] beginning of a very exciting thing I

[1:35:42] think now one thing you can also do you

[1:35:44] should know this is that we can pass

[1:35:46] other things in our chat function here

[1:35:50] right see we have model messages tool

[1:35:54] stream so we can say we want this to be

[1:35:56] streamed and format as well very cool so

[1:35:59] what you can do I'm going to give

[1:36:00] another example here for streaming let

[1:36:03] me go ahead and comment this out so we

[1:36:05] have that for streaming it's essentially

[1:36:08] the same thing notice I passing messages

[1:36:10] we're all user now I have a different

[1:36:12] question why is the ocean so

[1:36:15] salty and I can say stream to true

[1:36:19] because I want it to be streamed when I

[1:36:21] do that then whenever I want to print

[1:36:23] that out I have to put that through a

[1:36:25] for Loop or a loop of some sort that way

[1:36:28] we can get all these pieces that are

[1:36:31] being streamed so we can see them being

[1:36:33] flashed out in our command line okay so

[1:36:36] let's go ahead and run real quick so you

[1:36:38] can

[1:36:41] see just run like

[1:36:46] this takes a second and there you have

[1:36:50] it okay so there we go if you want to

[1:36:53] have the this capability of streaming

[1:36:55] and showing those messages as they come

[1:36:58] in this is how you would do using Code

[1:37:02] all right so go ahead and play around

[1:37:04] perhap add different messages or

[1:37:06] different questions here and see how

[1:37:09] this behaves and what you can also do

[1:37:11] you can use a different model whichever

[1:37:13] model you may have running

[1:37:16] okay so one thing to keep in mind is

[1:37:19] that all we've seen now this AMA python

[1:37:22] Library West DEC I guess I could call

[1:37:24] that it's all based or designed around

[1:37:28] the AMA rest API so internally when we

[1:37:32] call orama that chat internally we're

[1:37:35] essentially doing what we did before I

[1:37:37] showed you here so it's hitting this

[1:37:39] endpoint URL and passing in the API in

[1:37:43] this case go to chat or generate or

[1:37:45] create and so forth it's just like an AB

[1:37:48] it's just an abstracted way of hitting

[1:37:51] this end point so we don't have to

[1:37:53] explicit L do this so I made sure to put

[1:37:56] it here so that you have that okay now

[1:37:59] we can hit now we can use another now we

[1:38:02] can also use the same now we can go and

[1:38:06] use a different function or end point

[1:38:08] for instance like generate if want to

[1:38:10] generate something right we can do that

[1:38:12] what I mean by that is I can say for

[1:38:15] instance AMA generate and pass that and

[1:38:18] pass that prompt I can even pass all of

[1:38:21] that I don't think I can okay I don't

[1:38:23] need to pass any of that so we can also

[1:38:27] just generate a response instead of

[1:38:29] chatting as such so nothing new here I'm

[1:38:32] just going to leave it as is so you can

[1:38:34] see the other thing we can do is we can

[1:38:37] say for

[1:38:39] instance we can

[1:38:42] show by saying AMA let's just print and

[1:38:47] say

[1:38:48] AMA and then that

[1:38:51] show and we want to show what want to

[1:38:53] show anything about

[1:38:56] llama three like that that two okay

[1:39:00] let's go ahead and

[1:39:02] see and there you go after a few moments

[1:39:04] you can see it tells everything we need

[1:39:07] to know about our model OKAY a lot of

[1:39:10] information that's for sure okay there

[1:39:12] we

[1:39:15] go next thing we can do is we can

[1:39:18] actually use the model file as we did

[1:39:20] before but all in code to create a

[1:39:24] separate model that has Specific

[1:39:27] Instructions so it's very simple I have

[1:39:31] some parts of the code here so we create

[1:39:33] in this case the model file we don't

[1:39:36] have the actual model file but we can

[1:39:38] put all in line is here as you see here

[1:39:41] and the other thing I can do is I can

[1:39:43] pass the temperature here let's go back

[1:39:47] to this one to see how you do that just

[1:39:50] copy this and pass temperature 0.

[1:39:53] 1 for instance and then all we need to

[1:39:56] do now that we have this information is

[1:39:59] as we saw before we are going to go

⚡ Saved you time reading this? Transcribe any YouTube video for free — no signup needed.