Launching Oobabooga via Cloud GPU with Vast.AI

0h 09m video Published Jul 8, 2023 Transcribed Jun 14, 2026 Vast AI

Beginner 3 min read For: Beginners interested in running large language models on cloud GPUs.

AI Trust Score 90/100

✅ Highly Legit

"Title accurately describes the content: launching Oobabooga on Vast.ai cloud GPUs."

AI Summary

This video demonstrates how to run large language models from Hugging Face on powerful GPUs using Vast.ai and the Oobabooga web UI. The presenter walks through selecting a template, allocating sufficient disk space, choosing an appropriate GPU, and downloading models.

Chapters

1 0:00 Introduction and Template Selection

2 1:34 Disk Space and GPU Selection

3 5:40 Instance Launch and Web UI Access

4 7:22 Model Download and Usage

[0:00]

Introduction to Vast.ai and Oobabooga

The video shows how to run large language models from Hugging Face on powerful GPUs using Oobabooga as the web UI.

[0:50]

Selecting the Oobabooga Template

Select the recommended template for Oobabooga, which sets up the environment and opens port 7860 for the Gradio web interface.

[1:34]

Allocating Disk Space

Allocate at least 80 GB of disk space upfront: many models are 60-70 GB, and disk space cannot be increased after the instance is created.
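The sizing rule above can be sketched as a small calculation. This is only an illustration: the function name and the 10 GB headroom are assumptions, not figures from the video. Only the 80 GB floor and the 60-70 GB model size come from the walkthrough.

```python
import math


def recommended_disk_gb(model_sizes_gb, headroom_gb=10, minimum_gb=80):
    """Return a disk allocation in GB that covers every model plus headroom.

    headroom_gb is an assumed safety margin for temporary download files;
    minimum_gb is the 80 GB floor suggested in the walkthrough.
    """
    needed = sum(model_sizes_gb) + headroom_gb
    return max(minimum_gb, math.ceil(needed))


# One model in the 60-70 GB range.
print(recommended_disk_gb([65]))

# Two models kept on disk side by side need more than the 80 GB floor.
print(recommended_disk_gb([65, 40]))
```

Because the allocation is fixed at launch, it is safer to size for every model you expect to download during the session rather than only the first one.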

[2:22]

Matching GPU to Model Requirements

Check the model's GPU RAM requirements (e.g., Falcon 40B needs ~60 GB) and select a GPU with sufficient RAM, such as an A100 with 80 GB.

[3:42]

Selecting a GPU Instance

Choose a GPU configuration with enough total RAM, such as 1x A100 (80 GB). Cheaper alternatives include 4x A5000 (96 GB total) or A6000 cards (48 GB each).
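The selection step can be sketched as a filter over candidate offers. The `Offer` class, the helper name, and the hourly prices are hypothetical placeholders, not Vast.ai data or its API. Summing RAM across cards is also a simplification, since it assumes the model can be split across GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str
    gpu_count: int
    ram_per_gpu_gb: int
    price_per_hour: float  # placeholder value, not a real quote

    @property
    def total_ram_gb(self) -> int:
        # Simplification: treats RAM on all cards as one pool.
        return self.gpu_count * self.ram_per_gpu_gb


def offers_that_fit(offers, required_ram_gb):
    """Keep offers with enough total GPU RAM, cheapest first."""
    fitting = [o for o in offers if o.total_ram_gb >= required_ram_gb]
    return sorted(fitting, key=lambda o: o.price_per_hour)


OFFERS = [
    Offer("A100", 1, 80, 1.50),
    Offer("A5000", 4, 24, 1.00),
    Offer("A6000", 1, 48, 0.60),
]

# ~60 GB is the Falcon 40B requirement quoted in the walkthrough.
for offer in offers_that_fit(OFFERS, required_ram_gb=60):
    print(f"{offer.gpu_count}x {offer.name}: {offer.total_ram_gb} GB total")
```

With these placeholder numbers a single 48 GB A6000 is filtered out, which shows why the total RAM of the configuration matters more than the card model.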

[5:40]

Instance Creation and Opening Interface

The instance takes about 3-5 minutes to load. Once ready, click the open button to access the Oobabooga web UI on port 7860.
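If you would rather not refresh the page by hand while the instance loads, a small poller can tell you when the web UI port starts accepting connections. This is a minimal sketch using only the Python standard library. The function name and default timings are assumptions, and the host and port you actually reach may differ from 7860 if the provider maps the port externally.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=5.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=3.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example call (hypothetical address, replace with your instance's values):
# wait_for_port("203.0.113.10", 7860)
```

The default timeout of 300 seconds matches the upper end of the 3-5 minute load time mentioned above. A successful connection only shows that the port is open, not that a model is loaded.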

[7:22]

Downloading a Model

In the Models tab, paste the Hugging Face username/model name (e.g., from the LLM leaderboard) and click download. After download, load the model into GPU RAM.
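The download field expects the `username/model` form rather than a full page URL. The sketch below shows one way to normalize whatever was copied from a model page. The helper name is hypothetical, and `tiiuae/falcon-40b` is used only as an example identifier.

```python
def normalize_model_id(pasted: str) -> str:
    """Turn a pasted Hugging Face URL or identifier into 'username/model'."""
    text = pasted.strip()
    prefix = "https://huggingface.co/"
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Drop empty segments and anything after the model name (e.g. /tree/main).
    parts = [p for p in text.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected 'username/model', got {pasted!r}")
    return f"{parts[0]}/{parts[1]}"


print(normalize_model_id("https://huggingface.co/tiiuae/falcon-40b"))
print(normalize_model_id("  tiiuae/falcon-40b "))
```

Both calls yield the same identifier, which is the string to paste into the Models tab.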

[8:48]

Billing and Monitoring

Check the billing tab to estimate the credits needed for long runs, and set an auto-billing threshold so the instance is not stopped when credits run low.
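A rough credit estimate for a long run is simple arithmetic: hourly price times hours, plus a margin. The function name, the 10% margin, and the example price are assumptions for illustration. Use the rate shown for your own instance.

```python
def credits_needed(price_per_hour, hours, safety_margin=0.10):
    """Estimate credits for a run: hourly price times hours, plus a margin.

    safety_margin is an assumed buffer (10% by default), not a platform rule.
    """
    return round(price_per_hour * hours * (1 + safety_margin), 2)


# Hypothetical example: a $1.50/hour instance kept running for 24 hours.
print(credits_needed(1.50, 24))
```

Comparing this figure with the current balance in the billing tab shows whether the run can finish without a top-up.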

Following these steps, you can run large language models on cloud GPUs via Vast.ai using Oobabooga, provided you allocate enough disk space and GPU RAM.

Mentioned in this Video

Oobabooga (tool)

Vast.ai (tool)

Hugging Face LLM Leaderboard (link)

Tutorial Checklist

1 0:50 Log into Vast.ai and select the recommended Oobabooga template.

2 1:34 Allocate at least 80 GB of disk space (set the slider to 81 GB).

3 2:22 Check the model's GPU RAM requirement (e.g., from Hugging Face) and select a GPU with enough RAM (e.g., 1x A100 with 80 GB).

4 5:40 Wait for instance to load (3-5 minutes), then click the open button to access the Oobabooga web UI.

5 7:22 In the Models tab, paste the Hugging Face username/model name and click download.

6 8:04 After download, load the model into GPU RAM and start querying in the Text Generation tab.

Study Flashcards (5)

What is the minimum disk space recommended for running one large language model on Vast.ai?

At least 80 GB.