Fine-Tuning Qwen 3.5 for $11 on a Rented GPU

0h 27m video Published Apr 3, 2026 Transcribed Jul 28, 2026 CoreWorxLab

CoreWorxLab

Intermediate 6 min read For: AI developers and hobbyists interested in fine-tuning LLMs for tool calling.

AI Trust Score 90/100

✅ Highly Legit

"Title accurately reflects content: fine-tuning Qwen 3.5 on a rented GPU for ~$11 (9B model)."

AI Summary

The creator explains how they fine-tune Qwen 3.5 for tool calling using Vast AI, a cloud GPU marketplace. They detail the shift from a local RTX 3060 to rented GPUs due to increased VRAM requirements from BF16 training and complex multi-turn training data. The process includes setting up instances, running training scripts, and automating model download and instance shutdown to minimize costs.

Chapters

1 Introduction and Motivation 00:00 2 Why Cloud GPU? VRAM Requirements 01:09 3 Training Data: Multi-Turn Tool Calling 03:28 4 Avoiding Over-Tool-Calling 07:01 5 Setting Up Vast AI Instance 10:27 6 Running Training and Automation 14:21 7 Cost Analysis and Conclusion 20:01

[00:42]

Shift from local to cloud GPU

Started with Mistral 3 on a 12GB RTX 3060, but outgrew VRAM when switching to Qwen 3.5 and more sophisticated training data.

[01:22]

Why Qwen 3.5 requires more VRAM

Unsloth recommends full BF16 training for Qwen 3.5, not QLoRA 4-bit, due to quantization differences. This increases VRAM demand.

[02:28]

Training data complexity

Moved from single-turn examples to multi-turn conversations including system prompt, tool definitions, and history, pushing VRAM usage to 75GB.

[03:28]

Using Vast AI

Vast AI is a marketplace for renting GPUs. The creator uses it to access high-VRAM GPUs like A100s and H100s.

[04:09]

Training data example

A three-step tool call: look up F1 race via ESPN, find Sarah's email via Google Contacts, send email via Gmail. The model learns to chain tools.

[07:39]

Avoiding over-tool-calling

Initially the model called tools for everything. Fixed by training only on the last message in a conversation, not on previous context.

[10:12]

Achieving 99% accuracy

With the correct training approach, the model reached 99% on tests, correctly deciding when to use tools, converse, or rely on knowledge.

[10:27]

Setting up Vast AI instance

Select base image (PyTorch), storage (48GB), GPU (e.g., 5090), and consider network speed for faster model downloads.

[14:21]

Setup script automation

A script pushes files, installs Unsloth, and provides the training command. It uses the Vast AI CLI to get SSH details.

[16:25]

Training run example

Training a 2B parameter model on a 5090 with 32GB VRAM, 2600 pairs, 2 epochs, batch size 4*2, max sequence length 8000 (noted as too short).

[18:59]

Polling script to save costs

A script polls for a done.txt file, downloads the LoRA and GGUF model, then shuts down the instance to avoid extra charges.

[22:58]

Training from an airplane

Kicked off a training run on a flight using Wi-Fi; the poll script on a local server downloaded the model and shut down the instance automatically.

[23:36]

Cost breakdown

2B model: ~$2.14 (5 hours at $0.428/hr). 9B model: ~$12 (11 hours at $1.13/hr). Total trial-and-error cost: ~$100.

[26:10]

Transferable technique

The same pipeline is used for another AI application, showing the approach is reusable.

Fine-tuning Qwen 3.5 for tool calling is cost-effective using rented GPUs, with a well-automated pipeline that minimizes manual effort and cost. The technique is transferable to other applications.

Mentioned in this Video

Unsloth

tool

Vast AI

service

Qwen 3.5

tool

Mistral 3

tool

RTX 3060

tool

PyTorch

tool

Transformers

tool

LoRA

tool

GGUF

tool

SCP

tool

tmux

tool

SSH

tool

Jupyter

tool

ZimaBoard

tool

Quadro P2200

tool

GT 1030

tool

Sarah Jones

person

Tutorial Checklist

1 10:27 Log into Vast AI, ensure credits, search for available GPUs.

2 10:54 Select base image (e.g., PyTorch) and set storage space (48 GB).

3 11:12 Select a GPU (e.g., 5090 for 2B model, A100 80GB for 9B model).

4 11:50 Consider network speed for faster model downloads; rent the instance.

5 12:49 Set up SSH key under Keys in Vast AI.

6 13:07 SSH into the instance using the provided command.

7 13:33 Install Unsloth and ensure Transformers 5 is installed.

8 14:21 Use a setup script to push training files and install dependencies, or manually SCP files.

9 16:25 Run the training script with parameters: model size, training data, output folder.

10 18:59 Run a polling script to check for completion, download model, and shut down instance.

Study Flashcards (10)

Why does Qwen 3.5 require full BF16 training instead of QLoRA 4-bit?

medium Click to reveal answer

Due to higher than normal quantization differences in Qwen 3.5 models.