Steps By Step Tutorial To Fine Tune LLAMA 2 With Custom Dataset Using LoRA And QLoRA Techniques

Transcribed Jun 15, 2026 Watch on YouTube ↗

Intermediate 13 min read For: Machine learning practitioners and developers interested in fine-tuning LLMs with limited resources.

229.8K

Views

4.5K

Likes

177

Comments

60

Dislikes

2.0%

📊 Average

AI Summary

Krish Naik introduces a series on fine-tuning LLMs, starting with a practical demonstration of fine-tuning Llama 2 using LoRA and QLoRA techniques. The video covers parameter-efficient transfer learning, quantization, and step-by-step code implementation in Google Colab.

Chapters

1 Introduction and Overview 00:00 2 Setting Up Environment and Libraries 02:04 3 Understanding PEFT and LoRA 04:42 4 Dataset Preparation and Prompt Formatting 06:08 5 Quantization and Model Loading 09:10 6 Configuring LoRA and Training Arguments 12:02 7 Fine-Tuning with SFTTrainer 15:00 8 Inference and Results 21:59

[00:00]

Introduction to Fine-Tuning Series

Krish Naik announces a series on fine-tuning various LLMs, starting with Llama 2 using custom datasets and techniques like PEFT and LoRA.

[00:42]

Plan for This Video

This video focuses on practical implementation with a code template, dataset preprocessing, and quantization. Theoretical intuition will be covered in a follow-up video.

[01:22]

Importance of Fine-Tuning Open-Source Models

With many open-source models like Llama 2, Mistral, and Falcon, knowing how to fine-tune them with custom data is valuable for companies.

[02:30]

Techniques Covered: PEFT and LoRA

Parameter Efficient Transfer Learning (PEFT) and Low-Rank Adaptation (LoRA) are used to fine-tune large models efficiently.

[03:33]

Installing Required Libraries

Libraries include accelerate, peft, bitsandbytes for quantization, transformers, and trl.

[04:42]

Understanding PEFT

PEFT freezes most weights of the LLM and retrains only a subset, enabling fine-tuning with limited resources.

[06:08]

Llama 2 Prompt Template

Llama 2 uses a specific prompt template with system, user, and assistant sections. Datasets must be reformatted accordingly.

[07:00]

Dataset: Open Assistant Guanaco

The dataset used is Open Assistant Guanaco, containing human-assistant conversations. 1,000 samples are used for fine-tuning.

[09:10]

Resource Constraints and Quantization

Google Colab's free GPU (15GB) is insufficient for full fine-tuning of 7B model. Quantization (4-bit) reduces memory usage.

[10:53]

LoRA and QLoRA Configuration

LoRA rank is set to 64, scaling parameter (alpha) to 16. Model is loaded in 4-bit precision using bitsandbytes.

[12:02]

Training Arguments

Training arguments include output directory, 1 epoch, fp16/bf16, batch size, learning rate, and cosine scheduler.

[15:00]

Loading Model and Tokenizer

AutoModelForCausalLM loads Llama 2 in 4-bit with quantization config. Tokenizer is loaded with padding and EOS token.

[18:46]

Supervised Fine-Tuning with SFTTrainer

SFTTrainer from trl is used with model, dataset, LoRA config, tokenizer, and training arguments to perform fine-tuning.

[21:59]

Training Completion and Results

Training completed 250 steps in 25 minutes on Colab. Training loss reached 1.36. Model saved as adapter.

[23:20]

Inference with Fine-Tuned Model

Using pipeline for text generation, the model answers prompts like 'What is large language model?' and 'How to own a plane in United States?'

This practical tutorial demonstrates fine-tuning Llama 2 with LoRA/QLoRA on a custom dataset. The next video will explain the theoretical intuition behind these techniques.

Clickbait Check

90% Legit

"Title accurately describes the tutorial; video delivers step-by-step fine-tuning of Llama 2 with LoRA and QLoRA."

Mentioned in this Video

Hugging Face Transformers

tool

PEFT

tool

bitsandbytes

tool

TRL

tool

Google Colab

tool

Open Assistant Guanaco

dataset

Llama 2 7B Chat

model

Tutorial Checklist

1 03:33 Install required libraries: accelerate, peft, bitsandbytes, transformers, trl.

2 04:42 Import libraries: os, torch, datasets, transformers (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig), peft, trl.

3 06:08 Define Llama 2 prompt template with system, user, and assistant tokens.

4 07:00 Load and preprocess dataset (Open Assistant Guanaco) into Llama 2 format. Use 1,000 samples.

5 10:53 Configure LoRA: set rank=64, lora_alpha=16, target_modules, lora_dropout=0.1.

6 12:02 Configure bitsandbytes for 4-bit quantization: bnb_4bit_compute_dtype=float16, bnb_4bit_quant_type='nf4'.

7 15:00 Load Llama 2 model in 4-bit with quantization config and tokenizer with padding token.

8 18:46 Set training arguments: output_dir, num_train_epochs=1, per_device_train_batch_size=4, learning_rate=2e-4, fp16=True.

9 18:46 Initialize SFTTrainer with model, dataset, LoRA config, tokenizer, and training arguments.

10 21:59 Train the model. After training, save the adapter model.

11 23:20 Use pipeline for inference: load fine-tuned model and tokenizer, generate responses to prompts.

Study Flashcards (12)

What does PEFT stand for?

easy Click to reveal answer

Parameter Efficient Transfer Learning.