[0:00] This AI is not Nemotron 3 Super. No, this [0:05] is Nemotron 3 Ultra, Nvidia's newest free [0:09] and open AI model, and I've been [0:11] delighted, disappointed, and confused by [0:14] it. But I think I got it now. You see, [0:17] you can look at the benchmarks all you [0:19] want, but we are fellow scholars here. [0:21] We don't just believe stuff. We test it [0:24] for ourselves. That is the way of the [0:27] scholar. So, I had an early look at it [0:29] and ran some of my experiments day and [0:32] night. My first impression is that it is [0:34] incredibly fast. Blazing fast. Love [0:38] that. But then my coding experiments did [0:40] not go that well. When I ask it to write [0:43] a light simulation program (this is my [0:45] original area of research), I get a [0:47] black screen. Nothing. When I ask it to [0:50] fix it, it does a bunch of things and the result is the [0:52] same. And then I said, "Okay, let's [0:55] debug this by hand." It had some [0:57] mistakes. After fixing those, well, we [1:00] get something. But maybe it's a scene [1:02] that does not work at all. Other, even [1:05] smaller systems can do this task with [1:07] relative ease. And the other thing is, [1:09] goodness, it wrote up more than a [1:12] thousand lines of code. You don't need [1:15] that much. My handwritten solution from [1:17] my research is about 250 lines and [1:20] renders this scene. Fully open source, [1:23] free for everyone, forever. Now, let's [1:26] write a real-time strategy game. Yes. Oh, [1:29] no. [1:31] Black screen again. Almost. We got a [1:33] square. But if you ask DeepSeek 4 Flash [1:36] with the same prompt, you get something [1:39] really cool. But not here. So, what is [1:41] going on here? Well, I went back and [1:43] forth with Nvidia and reported some of [1:46] the issues, and later there were some [1:48] improvements. But still, this kind of [1:50] coding is not something I would [1:52] personally use this for. So I said, you [1:54] know, maybe let's not use this AI. 
But [1:57] then I thought, wait, it is super fast [2:00] and probably good at other things. So I [2:02] gave it agentic things. Fixing broken [2:05] installations on my machine from the [2:07] terminal, excellent. Whipping up quick [2:10] experiments, organizing files, [2:12] excellent, super fast. And over time, I [2:15] found myself reaching for it more and [2:18] more. And I found it to be useful [2:21] basically for everything other than [2:23] challenging coding tasks. Now that is [2:26] excellent because this might be the [2:28] openest AI model ever. Weights are open. [2:31] The research paper on how it was made is [2:34] open. Training data and recipes are [2:36] being released, at least for the [2:38] redistributable parts. Now that is [2:40] pretty crazy. Now hold on to your papers, [2:43] fellow scholars, because it gets even [2:45] better. Licensing. Super important [2:48] question, very overlooked. We are always [2:51] hoping for Apache 2.0. This is the do [2:54] whatever you want license. For me, this [2:57] is 10 out of 10. Now, Nvidia started [3:00] publishing their models under their own [3:02] proprietary license, which I would rate [3:05] 7 out of 10. Derivative works and [3:07] commercial use are fine. On the other [3:09] hand, it needs a bit of attribution and [3:12] it is a little stricter on patent grants. Now, [3:15] this has the OpenMDW license. This is [3:19] basically Apache 2.0 tailored for [3:22] machine learning weights. This is [3:25] absolutely fantastic news. Glorious. I [3:29] think this might be a 9 out of 10, maybe [3:32] as close to 10 out of 10 as you can get [3:35] from a big company like Nvidia. It allows [3:38] basically everything, but it is less battle [3:40] tested. And my understanding is that if [3:42] you sue claiming this model infringes [3:45] your rights, you lose the license. Huge [3:48] improvement. Double thumbs up. Thank [3:50] you. Now, can you run it yourself? Hm. [3:53] Yes and no. Yes, because it is completely [3:56] open. 
Download it. It is yours forever. [3:59] No limits, no funny business. However, [4:02] no, because I would love to run it [4:04] locally, too. But it's huge. 550 billion [4:09] parameters. You need hundreds of [4:11] gigabytes of GPU memory for that. This [4:13] is why I will probably use it on Lambda. [4:16] Also, 1 million token long context [4:18] window. [4:20] Great. Have a large codebase with a [4:22] bug hiding somewhere? No worries. [4:25] Massive books? Easy. Okay. How about [4:28] images and videos? Well, it does not [4:30] have vision capabilities. Not multimodal, [4:33] text only. Oh man, how much I would love [4:39] a multimodal version of this. Goodness, [4:41] please. [4:43] Okay, and I also had a realization. You [4:45] don't need one model to do everything. [4:48] You need a roster of models that cover [4:51] your use cases. For instance, I can't [4:54] add vision capabilities to Nemotron 3 [4:56] Ultra, but I can bolt Gemma 4 to it with [5:00] a screwdriver. It's like a seeing-eye [5:02] dog guiding a smarter blind man along. [5:06] It is hilarious and it kind of works. [5:09] Kind of. So, we finally have more [5:12] competition in the open AI model space [5:14] and that is glorious. So, how does it [5:17] work? Well, one trick is that it is [5:19] huge, but not all of it runs at once. [5:23] 550 billion parameters total, but only [5:26] about 10% of that is active per token. [5:30] These are specialist mini brains, and only a few [5:32] are activated at a time. We call [5:34] that mixture of experts. But you wise [5:38] fellow scholars know that already. So [5:40] what else? Now they also use Mamba [5:43] layers. Why Mamba? Is this like a snake [5:46] or like the fruity chew? I don't know. I [5:50] don't even know why I brought this up. [5:52] So what do these do? Well, traditional [5:54] AI systems have a bit of a memory [5:57] problem. 
They work like a student who [6:00] constantly rereads the textbook over and [6:03] over again when they are given a [6:05] question. But memory is precious. So [6:08] instead, these layers read the book only once and take [6:11] highly compressed notes. So this kind of [6:14] memory remembers important details about [6:17] the conversation. However, it is also [6:19] smart enough to throw away the filler [6:22] words. Thus, this system can process [6:25] massive amounts of data efficiently. It [6:28] also uses low precision numbers, so you [6:30] have to do less number crunching when [6:33] running this. They call it NVFP4. And [6:36] this doesn't rely on predicting tokens [6:38] one by one. No, it has multiple heads [6:42] that draft multiple future tokens at the [6:45] same time. Once again, many things that [6:48] make it blazing fast. And we get all of [6:51] this for free forever. What a time to be [6:54] alive. Thank you to everyone who worked [6:57] on this and absolutely everyone [6:59] everywhere who is working on open-source [7:01] projects and open models. You are all [7:04] heroes. And look, this system is great, [7:07] but it could be tiny. It could be bad, [7:10] ugly. I don't care. As long as it is [7:12] open science and open models, it pushes [7:15] humanity forward. Thank you. What a time [7:18] to be alive. Here you see me running the [7:21] full DeepSeek AI model through Lambda [7:24] GPU cloud. 671 [7:28] billion parameters running super fast [7:30] and super reliably. This is insane. I [7:34] love it and I use it on a regular basis. [7:37] Lambda provides you with powerful Nvidia [7:40] GPUs to run your own chatbots and [7:43] experiments. Seriously, try it out now [7:45] at lambda.ai/papers [7:48] or click the link in the description.
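
A note on the numbers quoted in the video: the "hundreds of gigabytes" and "only about 10% active" claims can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration, not Nvidia's own accounting. It assumes the 550 billion total and roughly 10% active figures from the video, counts weights only (no activations or context cache), and treats a 4-bit format as exactly 4 bits per parameter, ignoring any per-block scaling overhead. The function name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 550e9      # total parameter count quoted in the video
ACTIVE_FRACTION = 0.10    # "about 10%" active per token, as quoted in the video

# Parameters that actually take part in computing one token (mixture of experts).
active_params = TOTAL_PARAMS * ACTIVE_FRACTION

if __name__ == "__main__":
    # All weights must still be stored, even though only a fraction is used per token.
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit (idealized)", 4)]:
        print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB of weights")
    print(f"Active per token: about {active_params / 1e9:.0f} billion parameters")
```

Under these assumptions, even an idealized 4-bit copy of the weights lands in the hundreds of gigabytes, which matches the point in the video: the mixture of experts trick reduces the compute per token, not the amount of memory needed to hold the model.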