Ollama Course – Build AI Apps Locally

2h 57m video Published Nov 26, 2024 Transcribed Jun 14, 2026 freeCodeCamp.org

freeCodeCamp.org

Intermediate 45 min read For: Developers and AI engineers with basic Python knowledge who want to build local LLM applications.

AI Trust Score 95/100

✅ Highly Legit

"Title accurately promises building AI apps locally with Ollama; course delivers exactly that with hands-on projects."

AI Summary

This course teaches how to use Ollama, an open-source tool for running large language models locally. It covers setup, model management, REST APIs, Python integrations, and real-world projects like a grocery list organizer, RAG system, and AI recruiter agency.

Chapters

1 Introduction and Setup 00:00 2 Ollama Deep Dive and CLI Usage 17:58 3 REST API and UI Interfaces 60:00 4 Python Library and Customization 84:00 5 Real-World Projects: Grocery Organizer and RAG 103:00 6 Advanced: AI Recruiter Agency with Agents 148:00

[00:00]

Course Introduction

Ollama simplifies running LLMs locally. The course covers pulling/customizing models, REST APIs, Python integrations, and projects like a grocery organizer, RAG system, and AI recruiter.

[02:00]

What is Ollama

Ollama is an open-source tool that simplifies running LLMs locally on your own hardware, abstracting technical complexity.

[06:44]

Ollama Deep Dive

Ollama uses a CLI to manage installation and execution of models. It provides a straightforward way to download, run, and interact with various LLMs without cloud services.

[08:52]

Problem Ollama Solves

Ollama addresses cost, privacy, latency, and customization issues. Local execution eliminates API costs, keeps data private, reduces latency, and allows model fine-tuning.

[13:10]

Key Features of Ollama

Model management, unified interface, extensibility, and performance optimizations including GPU acceleration.

[15:10]

Use Cases

Development/testing, education/research, and secure applications in healthcare/finance where data privacy is critical.

[17:58]

Installation on Mac

Download from ollama.com, install the application, and run 'ollama run llama3.2' to get started.

[21:00]

Interacting with Models via CLI

Use 'ollama run <model>' to start a shell. Commands like /show info display model details.

[26:28]

Model Selection and Parameters

Ollama library hosts many models. Parameters (e.g., 3B, 1B) indicate model size and complexity; larger models are more accurate but require more resources.

[33:00]

Understanding Model Parameters

Parameters are internal weights learned during training. More parameters generally mean better performance but higher computational cost.

[37:30]

Context Length and Embedding Length

Context length is max tokens per input; embedding length is vector size for token representation. Larger values capture more nuance.

[39:26]

Quantization

Technique to reduce model size by lowering weight precision (e.g., 4-bit), resulting in smaller, faster models with lower memory usage.

[42:57]

Ollama CLI Commands

Commands: ollama list, ollama pull, ollama run, ollama rm, ollama help. Models can be pulled and run interchangeably.

[47:00]

Multimodal Models (LLaVA)

LLaVA combines vision encoder and LLM for visual understanding. Example: describing an image of flowers.

[54:00]

Customizing Models with Modelfile

Create a Modelfile with FROM, PARAMETER temperature, SYSTEM message. Use 'ollama create' to build a customized model.

[60:00]

REST API Endpoints

Ollama serves at localhost:11434. Endpoints: /api/generate, /api/chat. Use curl with stream=false for complete responses.

[66:00]

UI-Based Interface with Msty

Msty app provides a ChatGPT-like UI for local models. Supports knowledge stacks for RAG with embedding models.

[84:00]

Python Library Basics

Install ollama Python library. Use ollama.list(), ollama.chat(), ollama.generate() to interact with models programmatically.

[96:00]

Streaming Responses in Python

Set stream=True in chat() and iterate over response chunks to display tokens as they arrive.

[103:00]

Grocery List Organizer Project

Use LLM to categorize and sort grocery items from a text file. Prompt instructs model to categorize into produce, dairy, etc., and sort alphabetically.

[111:00]

RAG System Overview

RAG = Retrieval Augmented Generation. Combines document retrieval with LLM to answer questions based on custom data, reducing hallucination.

[117:00]

RAG Architecture

Documents are chunked, embedded, stored in vector DB. User query is embedded, similar chunks retrieved, and passed with prompt to LLM for answer.

[120:00]

LangChain for RAG

LangChain provides abstractions for document loading, splitting, embeddings, vector stores, and retrieval. Simplifies building RAG pipelines.

[126:00]

Building RAG with Ollama and LangChain

Use Ollama embeddings (nomic-embed-text) and LLM (llama3.2) with ChromaDB. Multi-query retriever generates multiple query perspectives for better retrieval.

[148:00]

AI Recruiter Agency Project

Multi-agent system using Swarm framework with Ollama. Agents: extractor, analyzer, matcher, screener, recommender, orchestrator.

[157:00]

Base Agent Class

BaseAgent sets up OpenAI client with custom base URL for Ollama. Provides query_ollama() method and JSON parsing helper.

[162:00]

Specialized Agents

Each agent (e.g., ScreenerAgent) inherits from BaseAgent, defines instructions, and implements run() method. Agents are called by orchestrator.

[168:00]

Orchestrator Agent

Coordinates workflow: extract resume, analyze profile, match jobs, screen candidates, generate recommendations. Maintains workflow context.

[172:00]

Streamlit UI for Recruiter

Streamlit app provides tabs for upload, skills analysis, job matches, screening, and recommendations. Results saved to text file.

Ollama democratizes local AI by enabling free, private, and customizable LLM applications. With CLI, REST API, Python library, and integration with frameworks like LangChain and Swarm, you can build powerful AI solutions entirely on your own machine.

Mentioned in this Video

Ollama

tool

Msty

tool

LangChain

tool

ChromaDB

tool

Swarm (OpenAI)

tool

Streamlit

tool

Llama 3.2

model

LLaVA

model

CodeGemma

model

nomic-embed-text

model

Ollama API docs

link

Python installation guide

link

Tutorial Checklist

1 17:58 Download and install Ollama from ollama.com for your OS (Mac, Windows, Linux).

2 20:27 Run 'ollama run llama3.2' in terminal to download and start the model.

3 24:00 Interact with the model via CLI: ask questions, use /show info to view model details, /bye to exit.

4 30:58 Pull additional models: 'ollama pull llama3.2:1b' or 'ollama pull codegemma:2b'.

5 42:57 Manage models: 'ollama list' to see installed, 'ollama rm <model>' to delete.

6 54:00 Create a Modelfile with FROM, PARAMETER temperature, SYSTEM message. Run 'ollama create <name> -f Modelfile'.

7 60:00 Use REST API: curl -X POST http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"...","stream":false}'

8 84:00 Install Python library: pip install ollama. Use ollama.chat(model='llama3.2', messages=[{'role':'user','content':'...'}])

9 103:00 Build grocery organizer: read items from file, create prompt with categorization instructions, call ollama.generate(), save output.

10 126:00 Build RAG system: load PDF, split into chunks, embed with nomic-embed-text, store in ChromaDB, use multi-query retriever with llama3.2.

11 148:00 Build AI recruiter: define BaseAgent with custom OpenAI base URL, create specialized agents (extractor, matcher, etc.), orchestrate with Swarm, wrap in Streamlit.

Study Flashcards (10)

What is Ollama?

easy Click to reveal answer

An open-source tool that simplifies running large language models locally on your own hardware.