
Launching Oobabooga via Cloud GPU with Vast.AI

Transcribed Jun 14, 2026 Watch on YouTube ↗
Beginner 3 min read For: Beginners interested in running large language models on cloud GPUs.
32.8K views · 61 likes · 5 comments · 4 dislikes · 0.2% (Average)

AI Summary

This video demonstrates how to run large language models from Hugging Face on powerful GPUs using Vast.ai and the Oobabooga web UI. The presenter walks through selecting a template, allocating sufficient disk space, choosing an appropriate GPU, and downloading models.

[0:00]
Introduction to Vast.ai and Oobabooga

The video shows how to run large language models from Hugging Face on powerful GPUs using Oobabooga as the web UI.

[0:50]
Selecting the Oobabooga Template

Select the recommended template for Oobabooga, which sets up the environment and opens port 7860 for the Gradio web interface.

[1:34]
Allocating Disk Space

Allocate at least 80 GB of disk space upfront: the default is only 16 GB, many models are 60-70 GB to download, and disk space cannot be added later.
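
The sizing rule above can be sketched as a small helper. This is only an illustration: the function name and the 10 GB headroom are assumptions, not something stated in the video, and the 16 GB floor is the default allocation the presenter mentions.

```python
import math

# Default disk allocation mentioned in the video, in GB.
DEFAULT_DISK_GB = 16


def disk_to_allocate_gb(model_sizes_gb, headroom_gb=10):
    """Return a whole-GB allocation covering all model downloads plus headroom.

    headroom_gb is an assumed safety margin, not a figure from the video.
    """
    total = sum(model_sizes_gb) + headroom_gb
    return max(DEFAULT_DISK_GB, math.ceil(total))


# One ~70 GB model with the assumed 10 GB of headroom.
print(disk_to_allocate_gb([70]))
```

Because disk space cannot be added after the instance is created, it is safer to list every model you expect to try before choosing the slider value.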

[2:22]
Matching GPU to Model Requirements

Check the model's GPU RAM requirement on its model page and select a GPU that meets it. If a model such as Falcon 40B needs, say, 60 GB of GPU RAM, a 10 GB card will not work, while an A100 with 80 GB will.

[3:42]
Selecting a GPU Instance

Choose a GPU with enough RAM, like a 1x A100 (80 GB) or multi-GPU options. Cheaper alternatives include 4x A5000 (96 GB) or A6000 (48 GB).
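
As a rough sketch of this matching step, the snippet below filters the GPU options quoted in the video by total GPU RAM. The dictionary and function names are hypothetical, and the figures are the ones the presenter reads out, not live Vast.ai listings. Total RAM on a multi-GPU instance only helps if the model can be split across the cards, so treat the result as a first filter.

```python
# Total GPU RAM per option, in GB, as quoted in the video.
OFFERS_GB = {
    "1x A100": 80,
    "2x A100": 160,
    "4x A5000": 96,
    "1x A6000": 48,
    "1x RTX 4090": 24,
    "1x RTX 3090": 24,
}


def offers_that_fit(required_gb, offers=OFFERS_GB):
    """Return option names with at least required_gb of total GPU RAM, smallest first."""
    fitting = [(gb, name) for name, gb in offers.items() if gb >= required_gb]
    return [name for gb, name in sorted(fitting)]


# The video's hypothetical 60 GB requirement.
print(offers_that_fit(60))
```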

[5:40]
Instance Creation and Opening Interface

The instance takes about 3-5 minutes to load. Once ready, click the open button to access the Oobabooga web UI on port 7860.
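
If you would rather script the wait than watch the console, a minimal polling sketch might look like the following. The function name, the timeout, and the polling interval are assumptions; the host and port you pass in must come from your own instance, since the externally reachable address may differ from the container's port 7860.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=600.0, interval_s=5.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True on the first successful connection, False once timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Not reachable yet (refused, timed out, or unresolved); retry shortly.
            time.sleep(interval_s)
    return False
```

A call such as `wait_for_port("your-instance-host", 7860)` would block for up to ten minutes, which comfortably covers the three to five minutes mentioned in the video.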

[7:22]
Downloading a Model

In the Models tab, paste the Hugging Face username/model name (e.g., from the LLM leaderboard) and click download. After download, load the model into GPU RAM.
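
The value to paste has the form username/model. A small sketch for pulling it out of a copied Hugging Face URL is shown below; the helper name is hypothetical, and `tiiuae/falcon-40b` is used only as an illustration of the Falcon 40B model mentioned in the video.

```python
from urllib.parse import urlparse


def repo_id_from(text):
    """Return 'username/model' from a Hugging Face model URL or a bare identifier."""
    text = text.strip()
    # Only parse as a URL when a scheme is present; otherwise treat as a path.
    path = urlparse(text).path if "://" in text else text
    parts = [p for p in path.split("/") if p]
    # Tolerate a pasted host without a scheme.
    if parts and parts[0] == "huggingface.co":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"expected username/model, got {text!r}")
    return f"{parts[0]}/{parts[1]}"


print(repo_id_from("https://huggingface.co/tiiuae/falcon-40b"))
```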

[8:48]
Billing and Monitoring

Check the billing tab to estimate the credits needed for multi-day runs, and set an auto-billing threshold so the instance is not stopped when your balance runs low.
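
The estimate itself is simple arithmetic, sketched below. The function name and the hourly rate are placeholders; read the actual rate for your instance from the Vast.ai console, and note that this sketch ignores any separate storage or bandwidth charges.

```python
def credits_needed(hourly_rate_usd, days):
    """Estimate credits for running an instance continuously for the given days."""
    hours = days * 24
    return round(hourly_rate_usd * hours, 2)


# Hypothetical rate of $1.50/hour for three days.
print(credits_needed(1.50, 3))
```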

By following these steps, you can easily run large language models on cloud GPUs via Vast.ai using Oobabooga, ensuring proper disk space and GPU RAM allocation.

Clickbait Check

90% Legit

"Title accurately describes the content: launching Oobabooga on Vast.ai cloud GPUs."


Tutorial Checklist

1 0:50 Log into Vast.ai and select the recommended Oobabooga template.
2 1:34 Allocate at least 80 GB of disk space (slider to 81 GB).
3 2:22 Check the model's GPU RAM requirement (e.g., from Hugging Face) and select a GPU with enough RAM (e.g., 1x A100 with 80 GB).
4 5:40 Wait for instance to load (3-5 minutes), then click the open button to access the Oobabooga web UI.
5 7:22 In the Models tab, paste the Hugging Face username/model name and click download.
6 8:04 After download, load the model into GPU RAM and start querying in the Text Generation tab.

Study Flashcards (5)

What is the minimum disk space recommended for running one large language model on Vast.ai?

Difficulty: easy

At least 80 GB.

2:07

Why can't you add disk space later to a Vast.ai instance?

Difficulty: medium

All disk space must be allocated upfront; it cannot be added later.

1:59

What GPU RAM does the 1x A100 selected in the video provide?

Difficulty: easy

80 gigabytes.

3:58

How do you download a model from Hugging Face in Oobabooga?

Difficulty: medium

Paste the Hugging Face username and model name into the Models tab and click download.

7:38

What port does the Oobabooga template open for the web interface?

Difficulty: medium

Port 7860.

6:00

💡 Key Takeaways

🔧

Disk Space Allocation

Emphasizes the critical need to allocate sufficient disk space upfront, as it cannot be added later.

1:34
⚖️

GPU RAM Matching

Explains the importance of matching GPU RAM to model requirements to avoid failures.

2:22
📊

GPU Selection Options

Lists various GPU options with different RAM sizes and costs, helping users choose appropriately.

3:42
🔧

Model Download Process

Shows the simple process of downloading models from Hugging Face directly within Oobabooga.

7:22

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

Run LLMs on Cloud GPUs in Minutes

45s

Shows how to quickly set up powerful language models on cloud GPUs, appealing to AI enthusiasts and developers.


GPU RAM: The Key to Running LLMs

45s

Explains the critical importance of GPU RAM for large language models, a common pain point for beginners.


Cheapest GPUs for LLMs Revealed

50s

Compares GPU options by price and VRAM, helping viewers save money while running models.


Download Any Hugging Face Model

45s

Demonstrates the simple process to download and run any model from Hugging Face, a key skill for AI practitioners.


[00:00] hello uh welcome to vast in this video I

[00:03] want to show you how you can run some of

[00:07] the best large language models that

[00:10] exist from hugging face or other places

[00:12] on very powerful GPUs and so

[00:18] let me get started today we'll be using

[00:21] Oobabooga as the web UI which is a

[00:25] great interface for prompting the models

[00:28] and also kind of loading and managing

[00:30] them that's some open source software

[00:32] that we'll load up in an instance on

[00:34] vast

[00:36] so I'll first kind of Click into the

[00:38] console and make sure that you are

[00:41] logged into your account and that you

[00:43] have credits if you've never done this

[00:46] before with Vast we have a different

[00:47] video that can go over a lot of some of

[00:49] the basics

[00:50] but for Oobabooga

[00:54] you're going to come in here and select

[00:57] our recommended template for that

[01:00] and uh that's gonna have the description

[01:03] here it kind of shows you which language

[01:06] models that you can run with it and uh

[01:09] it's going to have some specific options

[01:11] and an onstart script that you don't

[01:13] want to mess with that's going to set up

[01:15] your environment correctly so this will

[01:18] work it's also going to open a port for

[01:22] the open button and that will launch the

[01:25] gradio web interface so that it all just

[01:28] works so really all you need to do is

[01:30] Select that uh template

[01:34] now one of the most important things

[01:38] um

[01:39] is to make sure you allocate enough

[01:41] disk storage I just reset my filters

[01:44] because the default is only 16 gigabytes

[01:46] which is not going to be enough

[01:49] a lot of these large language models are

[01:51] 60 70 gigabytes to download and your

[01:54] instance will start to throw errors if

[01:57] it runs out of disk space you also need

[01:59] to allocate all the disk space that you

[02:01] want to use up front for this instance

[02:03] you cannot add it later

[02:05] so with that in mind you're probably

[02:07] going to want if you're just going to

[02:08] try one language model at least about 80

[02:13] gigabytes so I'm just going to move the

[02:15] slider to 81 and get that all set up the

[02:20] other important thing to understand when

[02:22] you're running these large language

[02:23] models is to match the GPU with the

[02:27] model that you want to run for example

[02:30] if you're looking at hugging face

[02:34] Hugging Face has an actual LLM

[02:37] leaderboard and so you can see some of

[02:40] the most popular models here and

[02:45] um

[02:46] how you can run them

[02:48] and basically what you will need to do

[02:51] is to load these into Oobabooga once we

[02:54] have that running so we'll come back to

[02:56] this

[02:57] but know that the model that you each

[03:00] one of these models that you're trying

[03:02] to run for example if you want to run

[03:05] Falcon 40 billion you need to read

[03:09] through and understand how much GPU Ram

[03:12] this is going to require because if this

[03:15] requires say

[03:17] 60 gigabytes of GPU RAM and you select a

[03:20] GPU that only has 10 gigabytes of GPU

[03:24] Ram it is not going to work so you need

[03:26] to make sure that the the large language

[03:28] model that you want to run it's going to

[03:31] have

[03:32] um uh you need to figure out what exact

[03:35] specifications it needs

[03:37] and then select an appropriate GPU

[03:42] what I like to do is to just actually

[03:44] select the GPU that has the most GPU Ram

[03:47] which is one of our A100 uh

[03:51] SXM4s or PCIes so I'm going to go ahead

[03:54] and select a 1x SXM4 these have 80

[03:58] gigabytes of GPU Ram so they have uh one

[04:03] of the more powerful cards that are out

[04:05] right now from Nvidia and 80 gigabytes

[04:09] is is enough for most large language

[04:11] models you can also select a multi-gpu

[04:14] instance so if I needed even more space

[04:17] I could have a 2X a100 that would be I

[04:20] have 160 gigabytes of GPU RAM available

[04:23] for the large language models or I could

[04:26] select sort of a cheaper option like an

[04:29] A5000 and this 4x A5000

[04:36] has actually 96 gigabytes of GPU RAM and

[04:41] it is a little bit cheaper than a single

[04:44] a100 you can also look at an a6000 they

[04:47] have 48 gigabytes of GPU Ram

[04:51] and an A40

[04:53] has 45 gigabytes of GPU Ram

[04:57] the consumer graphics cards like the

[04:59] 4090 and 3090 are only going to have 24

[05:02] gigabytes of GPU Ram each so again this

[05:06] is just something that you want to be

[05:08] really aware of and make sure that

[05:10] you're selecting a GPU that's going to

[05:11] have enough space so I'm going to go

[05:13] ahead and select a 1X a100 and now this

[05:18] is going to load I have 80 gigabytes

[05:21] allocated on this instance and I have

[05:25] selected the Oobabooga web UI which is

[05:31] our recommended template and so if I

[05:34] jump into my instances here I can see

[05:36] that this is being created and set up

[05:37] for me

[05:40] it's going to take a three four or five

[05:43] minutes to load maybe a little bit

[05:45] longer it's really going to depend on

[05:46] the internet connection speed of the

[05:48] machine and

[05:50] um

[05:51] uh the size of the image this one loaded

[05:54] about three and a half minutes for me

[05:56] and now the open button is going to open

[06:00] port 7860 which was put in the

[06:04] environment variables when we set this

[06:06] up

[06:07] um and uh

[06:09] there's a few things that were installed

[06:11] and set up on the onstart script but

[06:13] anyways this is all just stuff that's in

[06:16] the template that we have set up for

[06:18] Oobabooga

[06:20] and I'm going to go ahead and open that

[06:22] interface up and here it is

[06:26] so uh here's where you can actually

[06:27] query the model that you set up the most

[06:30] important thing is going to be

[06:32] downloading and setting up the model so

[06:35] um this software is not

[06:38] developed or maintained by vast this is

[06:43] open source software so to understand

[06:45] how to use this software you're going to

[06:48] want to find the open source project for

[06:53] this

[06:54] and and load that

[07:02] so here's the

[07:05] GitHub that's going to have a readme

[07:08] um

[07:09] the of course the installation steps you

[07:12] don't have to worry about because

[07:13] um you're using a Docker image and

[07:15] everything is is

[07:18] pre-loaded

[07:21] um

[07:22] so you can place the models into the

[07:24] model folder or when you're using the

[07:27] web UI you can just simply go to the

[07:29] models tab where it was before and

[07:32] here's where you can download the custom

[07:34] model so for hugging face you just use

[07:38] the username and model so for example if

[07:43] I wanted to try to to use just looking

[07:45] at the leaderboard if I wanted to use

[07:47] this model I would simply select the

[07:51] username and the name of the model like

[07:53] that and copy and paste it into the web

[07:55] UI

[07:57] and hit download and now it is going to

[08:00] start downloading

[08:02] this model once that model is downloaded

[08:04] I will be able to load the model in here

[08:06] into the GPU

[08:09] Ram in the instance

[08:11] sort of memory so then I can query it I

[08:14] can go back to text generation and

[08:16] actually start using it there's probably

[08:18] some other things that you can do and

[08:22] become familiar with with this interface

[08:23] this is a very nice way to run llms so

[08:27] that you don't have to use a command

[08:28] line

[08:32] um so there's quite a bit here and but

[08:34] again you're going to want to read about

[08:36] this in the Oobabooga

[08:40] GitHub

[08:44] so if I go back and look at my instance

[08:48] um

[08:49] you can see that it's running

[08:51] you can also click on the billing tab if

[08:54] you just want to see you know if you're

[08:55] going to run something for multiple days

[08:57] you can get an idea of how many credits

[08:59] you're going to need you can set up your

[09:01] auto billing threshold so that your

[09:05] instance is not stopped when your

[09:08] balance gets low

[09:10] and that's the basics of running

[09:13] Oobabooga on Vast thanks for your time
