[0:00] This AI is not Nemotron 3 Super. No, this
[0:05] is Nemotron 3 Ultra, Nvidia's newest free
[0:09] and open AI model, and I've been
[0:11] delighted, disappointed, and confused by
[0:14] it. But I think I got it now. You see,
[0:17] you can look at the benchmarks all you
[0:19] want, but we are fellow scholars here.
[0:21] We don't just believe stuff. We test it
[0:24] for ourselves. That is the way of the
[0:27] scholar. So, I had an early look at it
[0:29] and ran some of my experiments day and
[0:32] night. First impression is that it is
[0:34] incredibly fast. Blazing fast. Love
[0:38] that. But then my coding experiments did
[0:40] not go that well. When I ask it to write
[0:43] a light simulation program, which is my
[0:45] original area of research, I get a
[0:47] black screen. Nothing. When I ask it to
[0:50] fix it, it does a bunch of things and
[0:52] same. And then I said, "Okay, let's
[0:55] debug this by hand." It had some
[0:57] mistakes. After fixing that, well, we
[1:00] get something. But maybe it's a scene
[1:02] that does not work at all. Other even
[1:05] smaller systems can do this task with
[1:07] relative ease. And the other thing is,
[1:09] goodness, it wrote up more than a
[1:12] thousand lines of code. You don't need
[1:15] that much. My handwritten solution from
[1:17] my research is about 250 lines and
[1:20] renders this scene. Fully open source,
[1:23] free for everyone, forever. Now, let's
[1:26] write a real-time strategy game. Yes. Oh,
[1:29] no.
[1:31] Black screen again. Almost. We got a
[1:33] square. But if you ask DeepSeek 4 Flash
[1:36] with the same prompt, you get something
[1:39] really cool. But not here. So, what is
[1:41] going on here? Well, I went back and
[1:43] forth with Nvidia and reported some of
[1:46] the issues and later there were some
[1:48] improvements. But still, this kind of
[1:50] coding is not something I would
[1:52] personally use this for. So I said, you
[1:54] know, maybe let's not use this AI. But
[1:57] then I thought, wait, it is super fast
[2:00] and probably good at other things. So I
[2:02] gave it agentic things. Fixing broken
[2:05] installations on my machine from the
[2:07] terminal, excellent. Whipping up quick
[2:10] experiments, organizing files,
[2:12] excellent, super fast. And over time, I
[2:15] found myself reaching out to it more and
[2:18] more. And I found it to be useful
[2:21] basically for everything other than
[2:23] challenging coding tasks. Now that is
[2:26] excellent because this might be the
[2:28] openest AI model ever. Weights are open.
[2:31] The research paper on how it was made is
[2:34] open. Training data and recipes are
[2:36] being released at least for the
[2:38] redistributable parts. Now that is
[2:40] pretty crazy. Now hold on to your papers
[2:43] fellow scholars because it gets even
[2:45] better. Licensing. Super important
[2:48] question, very overlooked. We are always
[2:51] hoping for Apache 2.0. This is the do
[2:54] whatever you want license. For me, this
[2:57] is 10 out of 10. Now, Nvidia started
[3:00] publishing their models under their own
[3:02] proprietary license, which I would rate
[3:05] 7 out of 10. Derivative works and
[3:07] commercial use are fine. On the other
[3:09] hand, it needs a bit of attribution and
[3:12] is a little stricter on patent grants. Now,
[3:15] this has the OpenMDW license. This is
[3:19] basically Apache 2.0 tailored for
[3:22] machine learning weights. This is
[3:25] absolutely fantastic news. Glorious. I
[3:29] think this might be a 9 out of 10, maybe
[3:32] as close to 10 out of 10 as you can get
[3:35] from a big company like Nvidia. Allows
[3:38] basically everything, but less battle
[3:40] tested. And my understanding is that if
[3:42] you sue claiming this model infringes
[3:45] your rights, you lose the license. Huge
[3:48] improvement. Double thumbs up. Thank
[3:50] you. Now, can you run it yourself? Hm.
[3:53] Um, yes and no. Yes, because completely
[3:56] open. Download it. It is yours forever.
[3:59] No limits, no funny business. However,
[4:02] no, because I would love to run it
[4:04] locally, too. But it's huge. 550 billion
[4:09] parameters. You need hundreds of
[4:11] gigabytes of GPU memory for that. This
[4:13] is why I will probably use it on Lambda.
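To put a rough number on "hundreds of gigabytes", here is a back-of-the-envelope sketch. It counts weights only and assumes every parameter is stored at the same bit width. The function name and the list of bit widths are illustrative choices, not anything from Nvidia. Real deployments also need memory for the KV cache, activations, and quantization scale factors, so treat these figures as lower bounds.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weight-only memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 550e9  # parameter count quoted in the video

# Assumed storage widths: 16-bit, 8-bit, and 4-bit weights.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{gb:,.0f} GB")
```

Even at 4 bits per weight, the estimate stays in the hundreds of gigabytes, which is why a single consumer GPU is not enough here.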
[4:16] Also, 1 million token long context
[4:18] window.
[4:20] Great. Have a larger code base with a
[4:22] bug hiding somewhere? No worries.
[4:25] Massive box. Easy. Okay. How about
[4:28] images and videos? Well, it does not
[4:30] have vision capabilities. Not multimodal,
[4:33] text only. Oh man, how much I would love
[4:39] a multimodal version of this. Goodness,
[4:41] please.
[4:43] Okay, and I also had a realization. You
[4:45] don't need one model to do everything.
[4:48] You need a roster of models that cover
[4:51] your use cases. For instance, I can't
[4:54] add vision capabilities to Nemotron 3
[4:56] Ultra, but I can bolt Gemma 4 to it with
[5:00] a screwdriver. It's like a seeing eye
[5:02] dog guiding a smarter blind man along.
[5:06] It is hilarious and it kind of works.
[5:09] Kind of. So, we finally have more
[5:12] competition in the open AI model space
[5:14] and that is glorious. So, how does it
[5:17] work? Well, one trick is that it is
[5:19] huge, but not all of it runs at once.
[5:23] 550 billion parameters total, but only
[5:26] about 10% of that is active per token.
[5:30] These are specialist mini brains, and
[5:32] only a few are activated at a time. We call
[5:34] that mixture of experts. But you wise
[5:38] fellow scholars know that already. So
[5:40] what else? Now they also use Mamba
[5:43] layers. Why Mamba? Is this like a snake
[5:46] or like the fruity chew? I don't know. I
[5:50] don't even know why I brought this up.
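Before moving on, the mixture-of-experts idea from a moment ago can be sketched in a few lines. This is a toy, not Nvidia's implementation: the router weights are random, the "experts" are simple scaling functions, and the choice of 1 active expert out of 10 is only an assumption made to mirror the "about 10% active" figure. All names here (`moe_layer`, `router`, `experts`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, router_weights, experts, top_k):
    """Route one token vector x to its top_k experts and mix their outputs."""
    # Router: one score per expert (dot product of x with that expert's row).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k highest-scoring experts; the others never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Gate weights are a softmax over the chosen experts' scores only.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


random.seed(0)
DIM, NUM_EXPERTS, TOP_K = 4, 10, 1  # illustrative sizes, not the real model's
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Toy experts: expert i multiplies its input by (i + 1).
experts = [(lambda v, s=i + 1: [s * vi for vi in v]) for i in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_layer(token, router, experts, TOP_K)
print("experts used:", used, "out of", NUM_EXPERTS)
```

The point of the sketch is the cost structure: all experts must sit in memory, but only the chosen ones do any computation for a given token.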
[5:52] So what do these do? Well, traditional
[5:54] AI systems have a bit of a memory
[5:57] problem. They work like a student who
[6:00] constantly rereads the textbook over and
[6:03] over again when they are given a
[6:05] question. But memory is precious. So
[6:08] instead, these layers read the book only once and take
[6:11] highly compressed notes. So this kind of
[6:14] memory remembers important details about
[6:17] the conversation. However, it is also
[6:19] smart enough to throw away the filler
[6:22] words. Thus, this system can process
[6:25] massive amounts of data efficiently. It
[6:28] also uses low precision numbers, so you
[6:30] have to do less number crunching when
[6:33] running this. They call it NVFP4. And
[6:36] this doesn't rely on predicting tokens
[6:38] one by one. No, it has multiple heads
[6:42] that draft multiple future tokens at the
[6:45] same time. Once again, many things that
[6:48] make it blazing fast. And we get all of
[6:51] this for free forever. What a time to be
[6:54] alive. Thank you to everyone who worked
[6:57] on this and absolutely everyone
[6:59] everywhere who is working on open-source
[7:01] projects and open models. You are all
[7:04] heroes. And look, this system is great,
[7:07] but it could be tiny. It could be bad,
[7:10] ugly. I don't care. As long as it is
[7:12] open science and open models, it pushes
[7:15] humanity forward. Thank you. What a time
[7:18] to be alive. Here you see me running the
[7:21] full DeepSeek AI model through Lambda
[7:24] GPU cloud. 671
[7:28] billion parameters running super fast
[7:30] and super reliably. This is insane. I
[7:34] love it and I use it on a regular basis.
[7:37] Lambda provides you with powerful Nvidia
[7:40] GPUs to run your own chatbots and
[7:43] experiments. Seriously, try it out now
[7:45] at lambda.ai/papers
[7:48] or click the link in the description.