[0:00] Looking to build a local AI powerhouse
[0:02] but don't know where to start? Then this
[0:04] is going to be a great guide for you as
[0:05] we put together a quad GPU setup today
[0:08] that really does have some excellent
[0:10] performance and is using some great
[0:12] cost-effective consumer desktop parts.
[0:14] Especially if you're looking at running
[0:15] multiple GPUs, you want to optimize what
[0:18] you're spending so you spend the most
[0:20] money on GPUs and not the rest of the
[0:22] system. If you're looking for LLM
[0:25] inference, this is going to be a great
[0:27] build for you. We're going to be able to
[0:28] run four 3090s on our AM5 platform. I'm
[0:33] going to show you some benchmarks. We're
[0:34] going to talk about the parts and
[0:36] components, and I'm also going to
[0:37] present an alternative if you
[0:39] already have an AM4 system and some DDR4
[0:42] lying around that can save you a ton of
[0:44] money. Let's get started. The system
[0:46] we're putting together is going to be
[0:47] based on a B650 Eagle AX. And if you
[0:51] are an owner of an AM4 platform, I'm also
[0:53] going to show you the B550, a really
[0:55] good option to save some cost.
[0:57] Additionally, of course, we're going to
[0:58] be using our quad 3090s. For the
[1:01] performance you get, these are still some of the
[1:03] best bang for the buck that you can get
[1:05] out there. We're going to look at all
[1:06] this on a cost-per-gigabyte sheet, and
[1:08] that will help illustrate that pretty
[1:10] well. As far as the rest of the
[1:12] components, I picked up a Samsung 990
[1:14] EVO Plus during a Prime special. That's
[1:17] a 1 TB NVMe drive we'll be throwing in here.
[1:20] We've also got our AMD Ryzen 5 9600X,
[1:24] one of the cheapest ways to get into
[1:25] DDR5-class systems. The RAM that we're
[1:27] going to be using is some G.Skill Trident
[1:29] Z5. All of this is going to go
[1:32] into our GPU rig frame. The first thing
[1:35] that we're going to do is check out the
[1:36] motherboard. This is the Gigabyte B650
[1:38] Eagle AX. And the reason this
[1:42] motherboard is one to strongly consider
[1:46] is that it has
[1:54] four full-length PCIe slots. That's
[1:57] pretty cool and very unusual in an AMD
[2:01] desktop system. Now, only the first one
[2:04] is going to operate at the full x16. The
[2:07] rest are going to operate at x1. For
[2:10] inference work, that's fine. This
[2:12] also allows you to have a lead GPU. The
[2:14] GPUs in the x1 slots would not be useful for
[2:17] something like video generation,
[2:19] where you need full bandwidth, but the lead
[2:21] GPU could still perform very well.
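To put rough numbers on the x16-versus-x1 split, here is a minimal Python sketch (not from the video) that estimates per-direction PCIe bandwidth from each generation's raw transfer rate and its 128b/130b encoding. It ignores packet and protocol overhead, so treat the figures as upper bounds; `pcie_gb_per_s` is an illustrative name.

```python
# Rough per-direction PCIe bandwidth estimate (an upper bound).
# Assumption: only 128b/130b line encoding is accounted for; real throughput
# is lower once packet headers and flow control are included.

# Raw transfer rate per lane in gigatransfers per second, by PCIe generation.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 through 5.0 carry 128 payload bits in every 130 transferred bits.
ENCODING = 128 / 130


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    gbit_per_lane = TRANSFER_RATE_GT_S[gen] * ENCODING
    return gbit_per_lane * lanes / 8  # bits -> bytes


if __name__ == "__main__":
    for gen, lanes in [(4, 16), (4, 1), (3, 1)]:
        print(f"PCIe {gen}.0 x{lanes}: ~{pcie_gb_per_s(gen, lanes):.2f} GB/s")
```

Running it prints roughly 31.51, 1.97, and 0.98 GB/s. Once the weights are loaded, layer-split inference mostly passes small per-token activations between cards, which is why x1 links usually hold up there, although loading a model over x1 takes longer. The same arithmetic shows that a Gen 5 x2 link and a Gen 4 x4 link both land near 7.9 GB/s, and that a Gen 3 x1 link (about 7.9 Gb/s) sits a little under 10 GbE line rate.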
[2:23] Everything else on it is pretty much
[2:24] standard. It's not really a
[2:27] super fancy motherboard, but in the
[2:30] $150-ish price range, it is also
[2:33] rather affordable. And we're going to be
[2:35] populating this with our Ryzen 5 9600X.
[2:38] At about $190 to $195, it gets us
[2:42] all the benefits that you're going
[2:44] to get with the 9000 series at one of
[2:46] the cheapest price points you can find.
[3:14] Next, we're going to go ahead and pull
[3:16] out our Samsung 990 EVO Plus.
[3:20] Now, this will negotiate PCIe 5.0 x2 or PCIe 4.0 x4.
[3:25] It is a cheap NVMe drive, but for our purposes,
[3:28] it's going to be just fine.
[3:45] So, the RAM that I've got here is some
[3:47] G.Skill. This is DDR5-6000, and
[3:51] it's 64 GB of it. Now,
[3:54] I don't think that it's wildly important
[3:56] to get the fastest RAM out there. I
[3:58] would recommend not spending money
[4:00] trying to do that. Instead, always go for
[4:02] capacity with your RAM. Make sure
[4:04] it does have AMD EXPO support, though, so
[4:06] that you can just really quickly click,
[4:09] click, and have everything tuned and set
[4:11] so you can get optimal performance.
[4:14] I've got links to all this stuff on the
[4:16] website, in the written article, with
[4:18] high-resolution photography of all of
[4:20] these parts and components, which should
[4:21] help you if you are putting this
[4:23] together. We want to populate
[4:25] the A2 and B2 slots.
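As a back-of-the-envelope check on why capacity beats speed here, this minimal Python sketch computes theoretical peak DDR bandwidth as transfers per second times 8 bytes per 64-bit channel times the number of channels. The two-channel default reflects typical AM5 desktop boards; the results are theoretical peaks, not measurements.

```python
# Theoretical peak DRAM bandwidth: MT/s x bytes per channel x channel count.
# Assumption: a 64-bit (8-byte) memory channel and a dual-channel desktop platform.

def ddr_peak_gb_per_s(mega_transfers: int, channels: int = 2, bytes_per_channel: int = 8) -> float:
    """Return theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers * 1e6 * bytes_per_channel * channels / 1e9


if __name__ == "__main__":
    for speed in (4800, 6000):
        print(f"DDR5-{speed}, dual channel: {ddr_peak_gb_per_s(speed):.1f} GB/s")
```

It prints 76.8 GB/s for DDR5-4800 and 96.0 GB/s for DDR5-6000. Both are roughly a tenth or less of an RTX 3090's ~936 GB/s of VRAM bandwidth, so for a GPU-first inference build, extra RAM capacity (for loading or partially offloading larger models) usually matters more than a faster kit.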
[4:44] And we're going to put just a little
[4:47] pea-sized drop of thermal paste right in the center. So,
[4:50] this is a good piece of information for
[4:52] you. If you're putting a desktop
[4:54] motherboard, which has the CPU kind of
[4:57] centered a little bit further this
[4:58] direction, into this frame, it's going to
[5:02] be absolutely different than if you're
[5:04] using a gigantic server motherboard like
[5:06] this, like we had used for the quad rig
[5:08] in the past, which had all the PCIe slots
[5:11] located much further down this way and
[5:13] the CPU situated up this way. So, if you
[5:16] put this in, you can see that, yeah, a
[5:18] tower cooler like an SP3 cooler could go
[5:22] here and blow out that direction. But
[5:24] for this, we needed a low-profile
[5:26] cooler. I've got this one temporarily in
[5:28] here. It's going to be fine for now.
[5:30] It's a 65 W CPU, so it's not really
[5:33] going to generate a ton of heat. And
[5:35] I've ordered one that I'll be
[5:37] replacing this one with. You can find
[5:39] the link to that in the description
[5:40] below. All right.
[5:52] And there we go. And it's good to have a
[5:55] little extra power switch here.
[6:00] Since we're not in a real case, this
[6:01] will make it easy to turn it on and off.
[6:12] And both of those I'll gather up and put
[6:15] underneath there like that. When you're
[6:16] looking at PCIe slot power delivery,
[6:20] this is an important consideration when
[6:22] you're using desktop components. This
[6:24] motherboard delivers the full 75 watts
[6:26] to this top slot, but it does not
[6:28] deliver the full 75 watts to the
[6:30] remaining three slots. So, this can
[6:33] cause a problem for the power delivery
[6:35] to really powerful GPUs like 3090s.
[6:39] There are some ways around this. One of
[6:41] those ways is to use powered PCIe
[6:44] risers. Now, if
[6:46] you do end up going that route, you
[6:48] probably want to have high-powered GPUs
[6:51] that necessitate it in the first place.
[6:53] You also want to get ones that have
[6:54] 6-pin or Molex power, and you do not
[6:57] want to get the ones that use SATA
[6:59] power. You can also
[7:02] overcome this by just using the
[7:04] traditional methods that
[7:05] shard the model across the
[7:08] GPUs and distribute the workload evenly
[7:11] amongst them. If you are
[7:13] looking at a quad-GPU setup like this, that
[7:16] effectively means each GPU sees about a
[7:18] quarter of full utilization. So
[7:21] with 350 W GPUs, each
[7:23] one of those is only running at about a
[7:25] quarter of its capacity. This does not
[7:27] hold true for certain LLM runners like
[7:30] vLLM, which are built to squeeze every
[7:33] last little ounce of performance out of
[7:35] a system, with sharply diminishing returns
[7:37] as you approach the limits. So if
[7:40] you are thinking of using vLLM, you
[7:42] will probably
[7:43] want a lot of
[7:45] RAM to augment your system, and you'll
[7:47] also probably want
[7:49] to consider moving to server-grade
[7:51] components. If, however, you are only
[7:53] looking at Ollama, LM Studio, and
[7:56] llama.cpp, this is a great option for you.
[7:59] They will actually spread out the
[8:01] workload very evenly, and the slot power issue can be
[8:04] overcome with a generous power limit
[8:06] applied. So for our GPUs here, if we
[8:09] apply a power limit of about 175 W to the
[8:12] cards in the remaining three lower
[8:14] slots, we can head off any potential power
[8:16] shortfall. This also indicates that
[8:19] if you are looking for a really good
[8:21] cost-to-performance ratio, GPUs
[8:24] in the 175-watt category,
[8:26] things like a 5060 Ti, could be a great
[8:29] option as well. So for inference
[8:32] workloads, you won't be pushing your
[8:34] GPUs at 100% power utilization. The
[8:37] load usually ends up divided by
[8:39] however many GPUs you have,
[8:41] so in this instance, divided by
[8:43] four. And given the 350-watt
[8:46] TDP that these cards typically have,
[8:48] if you set a power limit of 225
[8:51] or 200 watts, each is perfectly fine to
[8:53] run off of a single cable. I would
[8:56] definitely say if you want to run
[8:57] multiple GPUs that are high power and
[9:00] you want to do things like image
[9:02] generation, training, or other tasks on
[9:05] them, make sure you have two independent
[9:07] connectors going to each one of these.
[9:10] But if you're just doing inference like
[9:12] we're going to set this up for, you
[9:13] don't necessarily have to worry about
[9:15] that because, like I mentioned, the power
[9:17] load will be divided by however many
[9:18] GPUs you've got. Let's get these wired
[9:20] up.
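The power reasoning above can be sketched in Python. This minimal example encodes the rule of thumb from this build (with layer-split runners such as llama.cpp or Ollama, the GPUs largely take turns, so average draw per card is roughly TDP divided by the GPU count) and builds `nvidia-smi -i <index> -pl <watts>` commands that cap every card except the lead GPU. Treating index 0 as the lead x16 card is a hypothetical assumption, so confirm indices with `nvidia-smi` first; changing power limits generally needs root, must stay within the card's supported range, and typically does not persist across reboots.

```python
# Sketch of the inference power-budget rule of thumb plus nvidia-smi cap commands.
# Assumptions (hypothetical, adjust for your rig):
#   - layer-split inference keeps roughly one GPU busy at a time;
#   - GPU index 0 is the lead card in the full-power x16 slot;
#   - 175 W is the cap for cards on the reduced-power slots.

def average_draw_per_gpu(tdp_watts: float, num_gpus: int) -> float:
    """Rule-of-thumb average draw per card when the GPUs work in sequence."""
    return tdp_watts / num_gpus


def power_limit_commands(num_gpus: int, cap_watts: int, lead_index: int = 0) -> list[str]:
    """Build one nvidia-smi power-limit command per non-lead GPU."""
    return [
        f"sudo nvidia-smi -i {index} -pl {cap_watts}"
        for index in range(num_gpus)
        if index != lead_index
    ]


if __name__ == "__main__":
    print(f"Average draw per 3090: ~{average_draw_per_gpu(350, 4):.1f} W")
    for command in power_limit_commands(num_gpus=4, cap_watts=175):
        print(command)
```

It prints about 87.5 W as the average per 3090, followed by three commands for GPUs 1 to 3. The cap sits well above that average because prompt processing and load spikes can briefly push a card harder.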
[9:22] So, I've got really high-quality risers
[9:24] here. These would work for most of the
[9:27] use cases that you have out there. They
[9:28] are Gen 4-spec. There are, of course,
[9:30] Gen 5-spec ones out there as well, but I
[9:33] would urge you, if you are looking at
[9:35] PCIe 3.0 x1, to consider that you could go
[9:38] with much cheaper risers in that scenario
[9:41] and be okay. So, these will be
[9:43] fine. I'm just using them because I've
[9:45] got them here handy and they're going to
[9:47] demonstrate really well. But definitely
[9:49] check the description below for some
[9:50] links to some ideas for different ones
[9:52] that are significantly less expensive
[9:55] than these. These are pretty expensive
[9:57] still. So, let's get these separated
[9:59] out. I've got two long ones and two
[10:00] short ones. And so we're going to have
[10:03] short one, short one, long one, long one
[10:05] as far as the arrangement. So we'll get
[10:08] that first GPU plugged up here.
[10:40] looking nice.
[10:43] Very nice. Very nice. And you do have
[10:46] two M.2 slots. Now, sometimes these
[10:48] share lanes with some of the PCIe slots, so
[10:50] I'm not sure if these are going to stay
[10:52] functional. Really, all that we've got
[10:53] left to do is power it up and get the
[10:56] system installed, do some benchmarking,
[10:58] and then we'll go through some of the
[11:00] components. Like I mentioned, if you do
[11:01] have AM4 and DDR4 lying around, you can
[11:05] go substantially cheaper. We're also
[11:07] going to measure wattage on this while we're
[11:09] running it so that we can have a pretty
[11:11] good idea about what the operating
[11:13] expenses look like.
[11:16] And power on.
[11:21] Okay. So, go ahead, hit yes. So, you can
[11:24] see we've got our 64 gigs of RAM. We've
[11:26] got our Ryzen 5 9600X. The RAM is
[11:30] reading as 4800, and that's because EXPO
[11:33] is not set. With EXPO set here, we
[11:35] should be okay. You can see here we've
[11:37] got PCIe 4.0 at x16, x1, x1, and x1.
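Once the OS and drivers are in, the same link layout can be checked from Linux. This minimal Python sketch asks `nvidia-smi` for each GPU's current PCIe generation and width through its `--query-gpu` fields (`pcie.link.gen.current`, `pcie.link.width.current`) and parses the CSV; the function names are illustrative. NVIDIA cards commonly drop to a lower link generation at idle to save power, so read the generation while a model is running.

```python
import csv
import io
import subprocess

# nvidia-smi query fields for the negotiated PCIe link of each GPU.
FIELDS = "index,name,pcie.link.gen.current,pcie.link.width.current"


def query_links() -> str:
    """Return raw CSV from nvidia-smi (requires the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_links(csv_text: str) -> list[dict]:
    """Turn nvidia-smi CSV rows into dicts with integer gen/width values."""
    links = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, name, gen, width = row
        links.append({"index": int(index), "name": name, "gen": int(gen), "width": int(width)})
    return links


if __name__ == "__main__":
    for link in parse_links(query_links()):
        print(f"GPU {link['index']} ({link['name']}): PCIe {link['gen']}.0 x{link['width']}")
```

On this build you would expect one x16 link and three x1 links under load.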
[11:43] If you haven't had a chance yet,
[11:44] follow along with the guides on getting
[11:46] set up with Open WebUI, Ollama, and
[11:48] llama.cpp. Let's jump in and start doing
[11:51] some benchmarks. We have our llama.cpp
[11:53] benchmarks in, and we're going to run our
[11:56] Ollama benchmarks on the system. This
[11:58] will give us a really good idea of the
[12:00] difference in performance on the same machine
[12:02] between
[12:04] Ollama and llama.cpp. I urge you to follow
[12:06] along with some of the guides on the
[12:08] website to get yourself up and running
[12:10] with at least llama.cpp.
[12:13] And we'll run that against gpt-oss-120b.
[12:22] And my script here runs three
[12:24] iterations. So you can get a pretty good
[12:27] reading of that.
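The benchmark script used here hasn't been published, so as a stand-in, here is a minimal Python sketch of the same idea against Ollama's local HTTP API: send a non-streaming `/api/generate` request several times and compute speeds from the `prompt_eval_count`/`prompt_eval_duration` and `eval_count`/`eval_duration` fields, whose durations are in nanoseconds. The model tag and prompt are placeholders. Repeating an identical prompt can hit Ollama's prompt cache, which skews the prompt-processing number, and this sketch reports 0.0 when a field is missing.

```python
import json
import statistics
import urllib.request

# Default local Ollama endpoint; change host/port if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate_once(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the JSON response."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tokens_per_second(result: dict) -> tuple[float, float]:
    """Return (prompt processing tok/s, generation tok/s); durations are nanoseconds."""

    def rate(count_key: str, duration_key: str) -> float:
        count = result.get(count_key, 0)
        nanos = result.get(duration_key, 0)
        return count / (nanos / 1e9) if nanos else 0.0

    return rate("prompt_eval_count", "prompt_eval_duration"), rate("eval_count", "eval_duration")


def benchmark(model: str, prompt: str, iterations: int = 3) -> tuple[float, float]:
    """Average both speeds over several runs, like the three-iteration script described above."""
    runs = [tokens_per_second(generate_once(model, prompt)) for _ in range(iterations)]
    return statistics.mean(r[0] for r in runs), statistics.mean(r[1] for r in runs)


if __name__ == "__main__":
    # "gpt-oss:120b" is an assumed Ollama tag; use whatever `ollama list` shows.
    pp, tg = benchmark("gpt-oss:120b", "Explain PCIe lanes in one paragraph.")
    print(f"prompt processing: {pp:.1f} tok/s, generation: {tg:.1f} tok/s")
```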
[12:32] And we got 1,785 tokens per second for prompt
[12:36] processing (eval speed) and 102.20 tokens per second
[12:39] for generation. So, I've
[12:42] already got these things filled in, and
[12:44] I wanted to talk about the results of
[12:46] this so that we could see really what
[12:48] this looks like as far as a head-to-head
[12:50] comparison on the same machine between
[12:52] Ollama and llama.cpp. And I've got some
[12:56] charts that I think will help illustrate
[12:57] that pretty well. Click that really
[12:59] quick. Sorry for the insane
[13:02] brightness of this. So, our prompt
[13:04] processing is up here, our text generation is
[13:06] down here, and it's Ollama on the Ryzen system
[13:10] versus llama.cpp on the same Ryzen
[13:13] system. You can see that llama.cpp's prompt
[13:15] processing is faster across the board,
[13:18] and for some models it's
[13:20] significantly faster. If
[13:22] you look at Qwen3, it's a
[13:24] little bit tighter. Gemma 3 is also a
[13:26] little bit tighter. But you can see that
[13:28] when you get to gpt-oss, there
[13:31] are big gaps. So if you are interested
[13:34] in running gpt-oss-20b or gpt-oss-120b, llama.cpp
[13:38] really will give you some extra
[13:40] tokens per second. Now on text
[13:42] generation, you can see Qwen3 over
[13:44] here, the 30B-A3B. Pretty good results
[13:48] that we got there: 120 on the Ryzen rig
[13:51] for Ollama, but a really good 132 on
[13:54] llama.cpp. Now, I would say Gemma 3 is very
[14:00] dense and very hard to process. So going
[14:02] fast with Gemma 3 is just very difficult
[14:05] to do with most of the runtimes out
[14:07] there: 26 tokens versus 26.5 tokens
[14:11] on the token generation side. So, not a
[14:14] huge difference there, not statistically
[14:16] significant, but once again, you get
[14:17] over to gpt-oss: 136 to 178, a
[14:22] massive difference. And also 100 to 125.
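To put those gaps in relative terms, this small Python sketch converts the generation speeds quoted above (Ollama versus llama.cpp on the same Ryzen rig) into percentage speedups. The model labels are an interpretation of the transcript, in particular that 136 to 178 is gpt-oss-20b and 100 to 125 is gpt-oss-120b.

```python
# Generation speed (tokens/s) quoted in the video: (Ollama, llama.cpp), same Ryzen rig.
# Model labels are an interpretation of the transcript, not confirmed names.
RESULTS = {
    "Qwen3 30B-A3B": (120.0, 132.0),
    "Gemma 3": (26.0, 26.5),
    "gpt-oss-20b": (136.0, 178.0),
    "gpt-oss-120b": (100.0, 125.0),
}


def percent_speedup(baseline: float, candidate: float) -> float:
    """How much faster candidate is than baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    for model, (ollama, llama_cpp) in RESULTS.items():
        print(f"{model}: +{percent_speedup(ollama, llama_cpp):.1f}%")
```

It prints +10.0% for Qwen3, +1.9% for Gemma 3, +30.9% for gpt-oss-20b, and +25.0% for gpt-oss-120b.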
[14:27] Huge difference. So, if you're
[14:29] interested in running the script that
[14:31] I've got here, drop a comment down
[14:33] below. I'm going to put this up. I
[14:35] vibe-coded this, so if it blows up, that's not
[14:37] on me. But I'm definitely going to
[14:39] release this somewhere so you guys can
[14:41] run it, and also open source it. Open vibe
[14:43] it. That should be what we call it. I
[14:45] think there's a pretty good reason to go
[14:47] with llama.cpp. I think I've made the
[14:50] case here. Now, let's compare just one
[14:53] other thing: comparing the EPYC rig
[14:56] running Ollama to llama.cpp, which is of
[14:59] course faster, on the Ryzen rig. Now, you
[15:02] can see these are huge spreads on the
[15:06] prompt processing. And I would say don't
[15:08] look at this one and infer anything from
[15:11] it. I think this run actually didn't use the
[15:14] right number of prompt processing
[15:16] tokens. I want to be consistent
[15:18] between the two, so I'll need to go back
[15:19] and rerun this test with the tools
[15:23] that I have created here. But definitely,
[15:25] on the text generation side, I think you
[15:27] actually have a very good result that is
[15:30] accurate: 104 versus 132. Definitely
[15:34] there we see the Ryzen rig beating out
[15:36] the EPYC rig. Now, llama.cpp needs to go
[15:40] head-to-head with llama.cpp on the
[15:42] other rig, and as soon as I take
[15:44] this apart, I will definitely be putting
[15:45] that together. And I've got a huge rig
[15:47] that I'm building that is like, this is
[15:49] going to be crazy. Definitely make sure
[15:50] you like and subscribe. 25.4 to 26.5.
[15:53] Again, Gemma 3 27B. I use it all the
[15:56] time. Definitely a very hard model to
[15:59] run and process. gpt-oss-20b: 50-ish.
[16:04] 50. Yeah. This is huge. So
[16:07] definitely, again, the Ryzen rig kills
[16:10] because of that single-thread
[16:12] performance, and that does give you a
[16:14] distinct advantage over something like an
[16:15] EPYC 7702. And on the Ollama EPYC rig, 92.8
[16:20] versus 125. So, while you can get lots
[16:24] of cheaper RAM and run bigger models,
[16:27] which is a real benefit in and of
[16:28] itself and not to be trivialized,
[16:32] and it's well worth doing.
[16:34] You also get all those great power
[16:36] delivery features that come with
[16:38] wonderful, very well-engineered server
[16:40] motherboards versus desktop class. But
[16:44] at the same time, you don't get the
[16:46] single-thread performance that you get
[16:48] with something like a Ryzen. So, it's
[16:50] not really surprising to me to see this
[16:53] system be quite a performer. And I
[16:56] expected that because of the fact that
[16:57] it has very good single-thread speed,
[17:00] better than the 7702. And as we saw in
[17:04] prior testing, it definitely has an
[17:06] impact. So, the idle power draw of
[17:08] this system, in a very untuned state
[17:10] where I've done literally nothing to try
[17:12] to tame it, is about 150 watts.
[17:16] That's actually higher than the server
[17:18] setup. That's interesting, right?
[17:19] Especially since the server has a BMC and a
[17:22] lot more sticks of RAM. If
[17:24] you're looking at this fan, this CPU
[17:26] cooler is going to be replaced. A much
[17:29] better option than this is going to be
[17:31] on the way, and it should lower the
[17:33] noise. Hopefully it lowers the temperatures
[17:35] too, so the fan doesn't have to run at
[17:37] an audible level the entire time.
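Since operating cost came up, this minimal Python sketch turns the roughly 150 W idle draw mentioned above into monthly energy and cost. The $0.15/kWh rate is a placeholder assumption, not a figure from the video; plug in your own tariff, and note that draw under load will be higher than idle.

```python
# Convert a steady power draw into monthly energy use and cost.
# Assumption: the $0.15/kWh rate below is a placeholder, not a quoted figure.

def monthly_kwh(watts: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Energy used in kWh for a constant draw over the given period."""
    return watts / 1000.0 * hours_per_day * days


def monthly_cost(watts: float, usd_per_kwh: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Electricity cost for the same period at a flat rate."""
    return monthly_kwh(watts, hours_per_day, days) * usd_per_kwh


if __name__ == "__main__":
    idle_watts = 150  # idle draw measured on this rig, untuned
    print(f"{monthly_kwh(idle_watts):.0f} kWh/month, ~${monthly_cost(idle_watts, 0.15):.2f} at $0.15/kWh")
```

With those inputs it prints "108 kWh/month, ~$16.20 at $0.15/kWh".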
[17:39] Certainly having a fan wall along the
[17:41] back of it that is blowing air into it
[17:44] so that the motherboard does stay nice
[17:46] and chill isn't a bad idea
[17:48] either. I've got that set up with those
[17:50] little Silent X orange, well, kind of
[17:52] yellow fans. And this definitely
[17:54] isn't as quiet as when I had the water
[17:57] cooler on the server motherboard.
[17:59] So if you do have a water
[18:01] cooler, the prior mounting guide that
[18:03] I've got linked on this page is definitely
[18:06] something that you should check out.
[18:07] I've got some modifications you can make
[18:09] that allow you to fit extra-large GPUs and
[18:12] also fit a water cooler at the top.
[18:14] Pretty cool, very silent, and it was
[18:17] definitely a bit overkill, but keeping
[18:19] the noise down is a big thing to me. So,
[18:21] if it's a big thing to you too, you
[18:23] might want to consider that. If you're
[18:25] looking at whether or not you would be
[18:27] able to fit five cards onto here, maybe
[18:29] you've asked yourself that. Let me show
[18:31] you something that is a little bit
[18:32] crazy. This is the B550. Now, I
[18:36] mentioned this kind of early on. This is
[18:39] the B650. This is a newer build. This is
[18:42] DDR5. If you have DDR4, if you've got an
[18:45] AM4 CPU, I've got an AM4 CPU that I'm
[18:49] going to be putting onto this, and it's
[18:50] got DDR4, and it's sitting over there
[18:53] right now, but this is such a cool
[18:54] board. I don't know how I missed this
[18:56] board up until now, but this, as a
[18:58] consumer board, might be one of the
[19:01] coolest boards I've ever seen. Why?
[19:04] Because it has five freaking full-length
[19:07] slots. Freaking awesome. Now, I'm going
[19:10] to put a 5950X on it and it's going to
[19:13] go into a server case. One of the really
[19:15] cool things about this, also, is that it
[19:17] gives you the opportunity to
[19:19] get really high-performance networking
[19:21] on a desktop-class system, something
[19:23] that's actually pretty hard to do. Would
[19:24] that take up one of the
[19:26] slots, probably the top slot, if you
[19:29] really wanted to go fast? Well, it
[19:30] would. But even if you look at the speed
[19:32] you could attain with a dedicated
[19:34] network card on any one of these other
[19:36] slots that are going to operate at x1,
[19:39] you'd be able to run 10-gigabit
[19:40] networking, though a PCIe 3.0 x1 slot caps it a bit under the full 10 Gb/s. Pretty cool for a $99
[19:44] motherboard. So if you've got an AM4
[19:46] system, this is probably a good option to
[19:48] consider. It saves a lot of cost on
[19:51] new RAM and, of course, a new CPU, and it's
[19:53] also a cheaper motherboard, at
[19:55] $99 versus about $150. Not that huge of
[19:58] a difference, but five slots, that
[20:00] really is actually pretty cool. So, I
[20:03] would say that one is definitely a
[20:06] strong consideration if you have an AM4
[20:08] system. Right now, if you had to go out
[20:10] and buy DDR4, it's almost the same price
[20:13] as DDR5. Like, the prices on RAM right
[20:16] now are kind of crazy. So, that's
[20:18] me trying to help you out. As this video
[20:20] ages, my guess is it probably won't stay
[20:22] that way for more than about six months.
[20:24] But the next six months or so
[20:25] are probably going to be a tight period
[20:27] for buying system RAM. So, pretty cool
[20:30] consideration, good alternatives. I like
[20:32] to try to save people money. You can
[20:34] find links to that on the
[20:35] digitalspaceport.com website, and links to that
[20:37] article in the description below and
[20:39] the pinned comment. So, while I've got the
[20:42] 3090s in here, certainly this
[20:45] motherboard, in my opinion, has kind of
[20:47] a good use case for having a strong lead
[20:50] GPU that would be good for doing things
[20:52] like image generation, video generation.
[20:54] Most of that's going to run best on just
[20:55] a single GPU. So, having a 24 GB GPU, if
[20:59] you can find one, is going to yield the
[21:00] best results you can get. It will also give
[21:02] you better quality, because you really need
[21:05] 24 GB to get the best out of what
[21:08] is available to run. Honestly, you need
[21:10] 80 GB. But getting a GPU with 80 GB of
[21:12] VRAM in it is insanely expensive, like
[21:15] way more than all of this. Now, consider the
[21:17] remaining three GPUs in a four-GPU kind of
[21:20] setup. Again, I'm using four
[21:22] because I've got a freakish amount
[21:24] of GPUs, but you could definitely start
[21:26] with just one or two in a very similar
[21:28] system to this and grow it over
[21:29] time. For those slots, the AMD 9060 XT 16 GB and
[21:34] the 5060 Ti are pretty decent options. We're
[21:37] going to look at these in the comparison
[21:38] sheet and what the prices would look
[21:40] like for doing a quad rig setup of that
[21:42] versus 3090s. I'm also going to put
[21:45] the 3060 12 GB in there, a surprising one, so
[21:47] I think you definitely want to
[21:49] stay tuned. We're going to jump into
[21:51] that sheet here and take a look at some
[21:53] of the performance that you get for the
[21:55] dollars that you spend. We'll start off
[21:58] looking at the AM5 build. That is the
[22:01] one that we put together here. And you
[22:03] can see the 64 GB of DDR5 is definitely
[22:07] not cheap. You've got a bunch of
[22:10] other components. And like I mentioned,
[22:12] check out that AM4 build. We'll get
[22:14] to that sheet here after this. So, as of
[22:17] the time of me putting this together,
[22:20] we're looking at right around $1,156
[22:23] just for the base system components.
[22:25] Now, this gets interesting if you look
[22:27] at dual-GPU and quad-GPU setups
[22:31] for this rig. So, we'll start with the
[22:33] 3060s. And with two of those at about
[22:36] $225 each, you're going to have 24 GB of
[22:39] VRAM. Total system cost is $1,606,
[22:43] and your price per GB of VRAM is $66.92.
[22:47] Might not seem bad, but just keep
[22:49] watching. Now, as you move on to the 16
[22:52] GB cards here, the 9060 XT 16 GB
[22:56] and the 5060 Ti 16 GB, you do pay
[23:01] a bit more for the Nvidia. So, that's
[23:03] $1,896
[23:04] and $2,016 respectively, but the price
[23:08] per gigabyte of VRAM is actually sub $60
[23:11] here for the 9060 XT at $59.25.
[23:17] The Nvidia is $63.
[23:20] The surprise for a lot of people might
[23:22] be the cost per gigabyte of VRAM for 24
[23:26] gigabyte GPUs. The reason why is that
[23:28] each card carries a lot of VRAM, so the base system cost is spread across more gigabytes:
[23:31] at about $750 each, that's
[23:34] about $1,500 for two of them, for 48 total
[23:37] gigabytes of VRAM. And there are a lot of
[23:40] additional benefits to having more and
[23:42] more VRAM, so definitely consider this.
[23:45] The system cost overall does go up to
[23:47] $2,656, but the price per gigabyte of
[23:49] VRAM goes down to $55.33.
[23:52] And when you add additional GPUs, each
[23:55] incremental addition kind of
[23:58] helps defray some of the cost. It's
[24:01] optimal to get as many GPUs running
[24:04] on a single system as possible. So if we
[24:07] go in the same order again with four GPUs, you'll see
[24:09] that the VRAM totals double. So
[24:12] we're at 48, 64, and 96 GB. The costs
[24:16] go up, from $2,056
[24:20] all the way up to $4,156.
[24:22] However, the price per gigabyte of VRAM
[24:25] drops quite a bit, all the way down
[24:27] to $42.83,
[24:29] $41.19,
[24:31] $44.97, and $43.29.
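The sheet's cost-per-gigabyte figure is total system cost divided by total VRAM. This minimal Python sketch reproduces that arithmetic for the AM5 build using the $1,156 base and the per-card prices quoted for the 3060 ($225) and 3090 ($750); the other cards' individual prices weren't stated, so they're left out. `cost_floor` shows the value the ratio approaches as the base cost is spread over more and more cards.

```python
# Cost per GB of VRAM = (base system + GPUs) / total VRAM.
# Figures below are the ones quoted in the video for the AM5 build.
BASE_AM5_USD = 1156
GPUS = {"RTX 3060 12GB": (225, 12), "RTX 3090 24GB": (750, 24)}  # (price USD, VRAM GB)


def cost_per_gb(base: float, gpu_price: float, vram_gb: int, count: int) -> float:
    """Total system cost divided by total VRAM across `count` identical GPUs."""
    return (base + gpu_price * count) / (vram_gb * count)


def cost_floor(gpu_price: float, vram_gb: int) -> float:
    """Limit as the GPU count grows: the base cost is amortized away."""
    return gpu_price / vram_gb


if __name__ == "__main__":
    for name, (price, vram) in GPUS.items():
        row = ", ".join(f"{n}x: ${cost_per_gb(BASE_AM5_USD, price, vram, n):.2f}" for n in (2, 4))
        print(f"{name}: {row} (floor ${cost_floor(price, vram):.2f}/GB)")
```

It prints $66.92 and $42.83 for two and four 3060s (floor $18.75/GB), and $55.33 and $43.29 for two and four 3090s (floor $31.25/GB). The floor is only theoretical, since the board's slot count caps how many cards you can add.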
[24:34] So there are a lot of very good
[24:36] considerations that you want to make if
[24:38] you are looking at the number of GPUs and
[24:40] the price per gigabyte, because
[24:42] gigabytes of VRAM are what you should
[24:44] always optimize for. This is
[24:47] definitely the most frequent question on
[24:48] the channel. And yes, you always want to
[24:51] go for more gigabytes of VRAM, even
[24:54] if you go for a single GPU. Always
[24:57] optimize for as many gigabytes as
[24:59] possible, and never go under 12 GB.
[25:03] You really don't want an 8 GB GPU. It's
[25:05] not going to be a good performer
[25:06] for you for the most part. Now, let's
[25:08] move on to what I think is the more
[25:10] exciting option, which is if you happen to
[25:13] have DDR4 and an AM4 CPU lying around.
[25:17] I think this is cool because it lowers
[25:19] that cost. And, you know, my guess is
[25:21] you probably have a lot of these other
[25:22] components lying around
[25:24] too. You're looking at like
[25:26] $650.
[25:28] That's pretty affordable, honestly. So,
[25:31] especially if you just need the
[25:33] motherboard, look at the cost per
[25:35] gigabyte of VRAM. We'll look at dual,
[25:38] quad, and, since it actually has five
[25:40] slots, we'll actually take a look at
[25:41] five also. For two GPUs, you're looking at $45.83
[25:46] for the 3060 all the way down to $44.79
[25:51] for the 3090. The 5060 Ti
[25:55] is $47.19, and the 9060 XT 16 GB setup with
[26:01] two of those comes out to $43.44.
[26:04] So you edge out a bit on the cost per
[26:07] gigabyte of VRAM by going with the 9060
[26:10] XT. However, it's very close, and there's
[26:13] a very compelling argument to also
[26:14] consider a 3090. Interestingly,
[26:17] the 5060 Ti is probably one of the weaker
[26:21] card recommendations that you would be
[26:23] looking at in most of these setups if
[26:25] you wanted to scale out. I would also
[26:27] say you're definitely limited by 448 GB/s of
[26:31] memory bandwidth with the 5060 Ti,
[26:34] and that's always going to be the big
[26:35] impactor. It's the same with
[26:37] the DGX Spark, same with Framework. I
[26:39] mean, memory bandwidth is always
[26:41] going to dictate where
[26:43] you run into slowdowns. And I
[26:46] mean, the 5060 Ti's 448 gigabytes per
[26:49] second, that's actually pretty
[26:51] respectable compared to what you get
[26:53] with the 9060 XT, which I believe is 322
[26:58] GB per second. Quite a big difference.
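A common rule of thumb makes the bandwidth point concrete: for a dense model, each generated token streams all of the weights from memory once, so bandwidth divided by model size gives an upper bound on tokens per second. This minimal Python sketch computes that bound, plus a layer-split variant where each GPU holds a share of the weights and the per-token time is the sum of each share divided by that card's bandwidth. The 12 GB model is hypothetical; the bandwidths are the RTX 5060 Ti's 448 GB/s, the ~322 GB/s quoted for the 9060 XT, and the published 936 GB/s (RTX 3090) and 360 GB/s (RTX 3060 12 GB). It ignores compute, KV cache, and inter-GPU transfers, so real speeds come in lower, and mixture-of-experts models, which read only their active experts per token, don't follow this dense-model bound.

```python
# Memory-bandwidth ceiling for token generation on a dense model.
# Assumption: every token reads all weights once; compute, KV cache, and
# inter-GPU traffic are ignored, so real speeds will be lower.

def decode_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s when one device holds all the weights."""
    return bandwidth_gb_s / weights_gb


def layer_split_ceiling(shares: list[tuple[float, float]]) -> float:
    """Upper bound when layers are split across GPUs that run one after another.

    shares: (weights_gb_on_gpu, bandwidth_gb_s) per GPU.
    """
    seconds_per_token = sum(weights / bandwidth for weights, bandwidth in shares)
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    weights = 12.0  # hypothetical dense model size in GB
    print(f"5060 Ti: {decode_ceiling(448, weights):.1f} tok/s, 9060 XT: {decode_ceiling(322, weights):.1f} tok/s")
    four_3090s = layer_split_ceiling([(12.0, 936)] * 4)  # 48 GB split evenly
    three_plus_3060 = layer_split_ceiling([(12.0, 936)] * 3 + [(12.0, 360)])
    print(f"4x 3090: {four_3090s:.1f} tok/s, 3x 3090 + 1x 3060: {three_plus_3060:.1f} tok/s")
```

It prints about 37.3 versus 26.8 tok/s for the single cards, and 19.5 tok/s for four 3090s versus 13.9 tok/s when one of them is swapped for a 3060: in this toy case the 3060's share takes almost as long as the three 3090 shares combined. Giving a slower card fewer layers (llama.cpp exposes this through `--tensor-split`) can soften that, within each card's VRAM.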
[27:01] So as you add GPUs, your speed will not
[27:05] increase unless
[27:07] you are doing a specific type of
[27:08] parallelism, such as tensor parallelism. But most of the time you
[27:11] want to run the bigger models, so the runner is
[27:12] going to shard the model across the GPUs. And as
[27:15] it does that, it's going to
[27:17] unfortunately be limited to the
[27:19] performance of the GPU that is the
[27:21] slowest. So if you tossed a bunch of
[27:24] 3090s into a rig and then tossed in one
[27:26] 3060, you're
[27:30] not going to like
[27:31] yourself, because that 3060 is going to
[27:32] be your performance dictator. So, I
[27:35] would urge you to consider that
[27:37] mixing GPUs that are too different
[27:39] can sometimes have
[27:42] negative impacts. Looking at the quad
[27:44] GPU setup, you have total system costs
[27:47] ranging from $1,550
[27:49] for the four 3060s all the way up to
[27:53] $3,650 for four 3090s. So, 96 GB of total
[27:59] system VRAM for $3,650. I think that's really
[28:03] pretty compelling. I mean, that's
[28:05] good. And your price per gigabyte
[28:08] of VRAM there is $32.29,
[28:10] $33.28,
[28:12] $37.03,
[28:13] and all the way up to $38.02.
[28:18] If you're looking at the five-GPU option, I
[28:20] mean, you know, why not? You
[28:23] can see that you can get up to 120
[28:27] gigabytes of total system
[28:28] VRAM with 24 GB GPUs for $4,400
[28:33] total. That comes out to $36.67
[28:37] per gigabyte. And I think probably
[28:40] an interesting one here also is the 3060
[28:42] 12 GB, which gets you all the way up to 60 GB
[28:45] of total system VRAM. And at a cost of
[28:48] $1,775,
[28:50] that comes down to sub $30. That's
[28:52] $29.58.
[28:54] So a setup of four plus one good GPU could be
[28:59] compelling. Those kinds of things I
[29:02] think are pretty interesting. And I
[29:04] think this is cheaper by a pretty good
[29:07] margin than what you can get with servers
[29:09] right now, mainly because of the
[29:11] insanity that is DDR4 ECC pricing, which
[29:15] has gone through the roof recently.
[29:18] Yeah, you're definitely not going to get that
[29:20] setup on a server for $3,650; you're looking closer to about
[29:25] $5,200, $5,250, and that's getting lucky with
[29:29] some RAM, too. So, when you're
[29:31] considering alternatives, I would
[29:33] definitely say this is a fairly decent
[29:35] one to consider, especially given that
[29:37] you have a variety of price points based
[29:38] upon how much VRAM and which type of
[29:40] GPUs you're putting in. You can also go
[29:42] all the way back to Pascal. Technically,
[29:44] you can go back to Maxwell, but both of
[29:46] these card generations are on the cusp
[29:48] of being retired from driver
[29:50] support. This isn't actually as bad as
[29:53] it might sound, but definitely moving
[29:55] forward, new features that come out may
[29:57] not be supported on those cards. Those
[29:59] GPUs also sometimes don't get very good
[30:01] performance for the watts that they're
[30:02] using. The Volta generation cards are
[30:05] just very hard to find, so I don't
[30:07] think you're going to have great luck
[30:09] finding any of those. They just
[30:10] seemingly are not out there in very
[30:12] significant numbers. Starting with the
[30:14] Ampere lineup is pretty much where I
[30:16] think it makes the most sense for most people
[30:19] to get in. So, that's my take on
[30:22] it. And certainly do consider that you don't
[30:24] have to go with 3090s. You could put a
[30:26] different GPU in there just as easily.
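If you're weighing older cards, the generation maps onto CUDA compute capability, which recent drivers report via `nvidia-smi --query-gpu=name,compute_cap --format=csv`. This minimal Python sketch maps common compute capabilities to architecture names and flags anything older than Ampere; the table is not exhaustive, and the function names are illustrative.

```python
# Map CUDA compute capability (major, minor) to NVIDIA architecture names.
# Not exhaustive; covers the generations discussed plus their neighbors.
ARCH_BY_CC = {
    (5, 0): "Maxwell", (5, 2): "Maxwell", (5, 3): "Maxwell",
    (6, 0): "Pascal", (6, 1): "Pascal", (6, 2): "Pascal",
    (7, 0): "Volta", (7, 2): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere", (8, 6): "Ampere", (8, 7): "Ampere",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper",
    (10, 0): "Blackwell", (12, 0): "Blackwell",
}


def parse_cc(text: str) -> tuple[int, int]:
    """Parse a string like '8.6' into (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def architecture(cc_text: str) -> str:
    """Architecture name for a compute capability string, or 'unknown'."""
    return ARCH_BY_CC.get(parse_cc(cc_text), "unknown")


def older_than_ampere(cc_text: str) -> bool:
    """Ampere starts at compute capability 8.0."""
    return parse_cc(cc_text) < (8, 0)


if __name__ == "__main__":
    for cc in ("5.2", "6.1", "7.0", "8.6", "12.0"):
        label = "pre-Ampere" if older_than_ampere(cc) else "Ampere or newer"
        print(cc, architecture(cc), label)
```

For example, an RTX 3090 reports 8.6 (Ampere) and a GTX 1080 Ti reports 6.1 (Pascal).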
[30:29] And so, we've covered a lot in this
[30:31] absolutely insane build, setup,
[30:34] benchmark, and valuation deep dive into
[30:37] building for newbies. So, I hope you've
[30:40] enjoyed this. And I know it's a lot of
[30:42] information. Don't feel like you can't
[30:44] go back and hit rewatch. And thanks for
[30:46] all the shares, thanks for all the
[30:47] likes, and a huge shout-out and many
[30:50] thanks to everybody who is a
[30:52] channel member. Also, the people that
[30:54] buy me a coffee. You guys really do make
[30:56] all of this possible. Everybody have a
[30:58] great day. Let me know what you think
[30:59] and I will check you out next time.