
NVIDIA's New Free AI - A Gift To Humanity

Transcribed Jun 28, 2026
Intermediate · 3 min read · For: AI enthusiasts and developers interested in open-source models and practical AI usage.
67.0K views · 2.5K likes · 175 comments · 64 dislikes · 3.9% (📈 Moderate)

AI Summary

The video reviews Nvidia's Nemotron 3 Ultra, a fast, open AI model with a permissive license but weak results on the author's coding tests. The author finds it strong at quick tasks and weak at complex coding, then walks through the techniques behind its speed.

[0:00]
First Impressions

Nemotron 3 Ultra is incredibly fast, but the first coding experiment fails: a light simulation program renders a black screen, asking the model to fix it does not help, and even after debugging by hand the scene does not work properly.

[1:26]
Coding Issues

A real-time strategy game attempt produces a nearly black screen again (just a square), while DeepSeek 4 Flash succeeds with the same prompt. For the earlier light simulation, Nemotron wrote over 1,000 lines of code, versus about 250 lines in the author's handwritten solution.

[1:57]
Useful Use Cases

Excels at fixing broken installations, organizing files, and quick experiments. Author finds it useful for everything except challenging coding tasks.

[2:28]
Openness and Licensing

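Weights and the research paper are open, and training data and recipes are being released for the redistributable parts. Uses the OpenMDW license (described as basically Apache 2.0 tailored for machine learning weights), which the author rates 9/10. Allows commercial use, but per the author's understanding the license is lost if you sue claiming the model infringes your rights.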

[3:53]
Running Locally

Open and downloadable, but huge (550 billion parameters) requiring hundreds of GB GPU memory. Author uses Lambda for cloud access. 1 million token context window.
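The "hundreds of GB" figure can be sanity-checked with simple arithmetic. The sketch below is a rough estimate for the weights alone, assuming every parameter is stored at one uniform bit width. It ignores the context cache, activations, and the small per-block scale factors that 4-bit formats carry, so real requirements will be higher. The function name and the format table are illustrative, not taken from the video.

```python
# Rough weight-memory estimate for a 550-billion-parameter model.
# Assumption: all parameters share one bit width; cache, activations,
# and quantization scale overhead are ignored.
TOTAL_PARAMS = 550e9  # parameter count quoted in the video

# Bits per parameter for a few storage widths (4-bit stands in for NVFP4).
BITS_PER_PARAM = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Even at 4 bits per parameter the weights come to roughly 275 GB under these assumptions, which is consistent with the author's choice to run the model in the cloud.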

[4:28]
No Vision Capabilities

Model is text-only, no multimodal abilities. Author suggests combining with other models (e.g., Gemma 4) for vision.
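One way to picture "combining with other models" is a two-step pipeline: a vision model turns the image into text, and the text-only model answers from that description. The sketch below is only an illustration of that idea. The `describe` and `answer` callables are hypothetical stand-ins for whatever model clients you actually use, and the stub functions exist only so the example runs without any model.

```python
from typing import Callable

# Hypothetical interfaces: in practice these would wrap calls to a vision
# model and a text-only model. Neither signature comes from a real library.
DescribeImage = Callable[[bytes], str]
AnswerText = Callable[[str], str]


def answer_about_image(image: bytes, question: str,
                       describe: DescribeImage, answer: AnswerText) -> str:
    """Let a vision model describe the image, then pass the text to a text-only model."""
    description = describe(image)
    prompt = f"Image description: {description}\nQuestion: {question}"
    return answer(prompt)


# Toy stubs so the sketch runs without any model behind it.
def fake_describe(image: bytes) -> str:
    return f"an image of {len(image)} bytes"


def fake_answer(prompt: str) -> str:
    return f"[text model saw] {prompt}"


if __name__ == "__main__":
    print(answer_about_image(b"abc", "What is this?", fake_describe, fake_answer))
```

The text model only ever sees the description, so anything the vision model leaves out is lost. That may be why the author says the combination only "kind of" works.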

[5:09]
Technical Details

Uses mixture of experts (about 10% of parameters active per token), Mamba layers for memory-efficient long contexts, low-precision NVFP4 numbers, and multiple heads that draft several future tokens at the same time.
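The "compressed notes" idea behind the Mamba layers can be illustrated with a toy recurrence that folds a whole sequence into one running state, contrasted with a cache that keeps every past input. This is a deliberately simplified sketch and not the actual Mamba layer: real Mamba layers use vector-valued states and input-dependent parameters, and the `decay` and `gain` constants here are made-up illustration values.

```python
from typing import Iterable


def run_fixed_state(inputs: Iterable[float], decay: float = 0.5, gain: float = 1.0) -> float:
    """Fold a stream into one running state: h = decay * h + gain * x."""
    h = 0.0
    for x in inputs:
        # Older information fades by `decay`; the new input is mixed in.
        h = decay * h + gain * x
    return h


def run_growing_cache(inputs: Iterable[float]) -> list[float]:
    """Contrast: keep every past input, so memory grows with sequence length."""
    cache: list[float] = []
    for x in inputs:
        cache.append(x)
    return cache


if __name__ == "__main__":
    stream = [float(i) for i in range(1000)]
    print("fixed state holds 1 number:", run_fixed_state(stream))
    print("cache holds", len(run_growing_cache(stream)), "numbers")
```

The fixed state costs the same memory for 1,000 inputs as for 10, which is the property that makes very long contexts cheaper. The trade-off is that details must survive compression into that state.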

[6:48]
Conclusion

Author praises open science and models, thanking contributors. Notes that even if imperfect, open models push humanity forward.

Nemotron 3 Ultra is a fast, open AI model with excellent licensing that falls short in complex coding. It excels at simpler tasks and represents a step forward for open AI, but requires powerful hardware or cloud services to run.

Clickbait Check

70% Legit

"The title 'NVIDIA's New Free AI - A Gift To Humanity' is slightly exaggerated: the model is free and open, but its coding shortcomings mean it's not a universal gift."

💡 Key Takeaways

💡

Speed Advantage

Demonstrates that the model's primary strength is speed, which makes it useful for quick tasks like fixing installations or organizing files.

2:00
📊

Open Licensing

The OpenMDW license is a major milestone for open AI, offering freedom similar to Apache 2.0 but tailored for ML weights.

2:28
🔧

Mixture of Experts Efficiency

Explains how the model uses only about 10% of its parameters per token, making it efficient despite its huge size.

5:23
⚖️

Importance of Open Models

Emphasizes that open science and models, regardless of imperfections, drive progress in humanity.

6:48
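The mixture-of-experts takeaway above (only about 10% of parameters active per token) can be sketched as top-k routing: a router scores every expert, only the best-scoring few actually run, and their outputs are blended. This is a minimal toy version under stated assumptions. The video does not give the model's expert count, the value of k, or its router design, so the ten scalar experts and hand-written scores below are purely illustrative.

```python
import math
from typing import Callable

Expert = Callable[[float], float]


def softmax(scores: list[float]) -> list[float]:
    """Turn raw router scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x: float, experts: list[Expert], router_scores: list[float], k: int) -> float:
    """Run only the selected experts and blend their outputs by router weight."""
    weights = route_top_k(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Ten toy experts; expert i simply multiplies its input by i.
EXPERTS: list[Expert] = [lambda x, scale=i: scale * x for i in range(10)]
# Hand-written scores standing in for a learned router; expert 4 scores highest.
ROUTER_SCORES = [0.1, 0.3, 0.2, 0.0, 2.0, 0.5, 0.4, 0.1, 0.2, 0.3]

if __name__ == "__main__":
    # With k=1 of 10 experts, only one tenth of the experts run for this input.
    print(moe_forward(3.0, EXPERTS, ROUTER_SCORES, k=1))
```

In this toy setup one of ten experts runs per input, so the compute per token scales with the active experts rather than the total, while all ten still have to be held in memory.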

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

NVIDIA's Free AI Fails My Coding Test (60s)
Shows a major company's free AI unexpectedly failing at a simple coding task, sparking curiosity and debate.

This Free AI is Better Than You Think (Open License!) (50s)
Reveals the AI's true strengths in non-coding tasks and its incredibly permissive open-source license, exciting the AI community.

Combine AI Models Like a Pro (Seeing Eye Dog Trick) (30s)
The clever analogy of using one AI to guide another offers a novel and shareable insight for AI users.

How NVIDIA's AI Achieves Blazing Fast Speed (60s)
Breaks down complex AI architecture (Mixture of Experts, Mamba) with a simple analogy, educating viewers in an engaging way.

Why Open Source AI is a Gift to Humanity (30s)
An inspiring message about the importance of open science and models, appealing to the viewer's sense of community and progress.

[00:00] This AI is not Nemotron 3 Super. No, this

[00:05] is Nemotron 3 Ultra, Nvidia's newest free

[00:09] and open AI model, and I've been

[00:11] delighted, disappointed, and confused by

[00:14] it. But I think I got it now. You see,

[00:17] you can look at the benchmarks all you

[00:19] want, but we are fellow scholars here.

[00:21] We don't just believe stuff. We test it

[00:24] for ourselves. That is the way of the

[00:27] scholar. So, I had an early look at it

[00:29] and ran some of my experiments day and

[00:32] night. First impression is that it is

[00:34] incredibly fast. Blazing fast. Love

[00:38] that. But then my coding experiments did

[00:40] not go that well. When I ask it to write

[00:43] a light simulation program (this is my

[00:45] original area of research), I get a

[00:47] black screen. Nothing. When I ask it to

[00:50] fix it, it does a bunch of things and

[00:52] same. And then I said, "Okay, let's

[00:55] debug this by hand." It had some

[00:57] mistakes. After fixing that, well, we

[01:00] get something. But maybe it's a scene

[01:02] that does not work at all. Other even

[01:05] smaller systems can do this task with

[01:07] relative ease. And the other thing is,

[01:09] goodness, it wrote up more than a

[01:12] thousand lines of code. You don't need

[01:15] that much. My handwritten solution from

[01:17] my research is about 250 lines and

[01:20] renders this scene. Fully open source,

[01:23] free for everyone, forever. Now, let's

[01:26] write a real-time strategy game. Yes. Oh,

[01:29] no.

[01:31] Black screen again. Almost. We got a

[01:33] square. But if you ask DeepSeek 4 Flash

[01:36] with the same prompt, you get something

[01:39] really cool. But not here. So, what is

[01:41] going on here? Well, I went back and

[01:43] forth with Nvidia and reported some of

[01:46] the issues and later there were some

[01:48] improvements. But still, this kind of

[01:50] coding is not something I would

[01:52] personally use this for. So I said, you

[01:54] know, maybe let's not use this AI. But

[01:57] then I thought, wait, it is super fast

[02:00] and probably good at other things. So I

[02:02] gave it agentic things. Fixing broken

[02:05] installations on my machine from the

[02:07] terminal, excellent. Whipping up quick

[02:10] experiments, organizing files,

[02:12] excellent, super fast. And over time, I

[02:15] found myself reaching out to it more and

[02:18] more. And I found it to be useful

[02:21] basically for everything other than

[02:23] challenging coding tasks. Now that is

[02:26] excellent because this might be the

[02:28] openest AI model ever. Weights are open.

[02:31] The research paper on how it was made is

[02:34] open. Training data and recipes are

[02:36] being released at least for the

[02:38] redistributable parts. Now that is

[02:40] pretty crazy. Now hold on to your papers

[02:43] fellow scholars because it gets even

[02:45] better. Licensing. Super important

[02:48] question, very overlooked. We are always

[02:51] hoping for Apache 2.0. This is the do

[02:54] whatever you want license. For me, this

[02:57] is 10 out of 10. Now, Nvidia started

[03:00] publishing their models under their own

[03:02] proprietary license, which I would rate

[03:05] 7 out of 10. Derivative works and

[03:07] commercial use are fine. On the other

[03:09] hand, it needs a bit of attribution and

[03:12] is a little stricter on patent grants. Now,

[03:15] this has the OpenMDW license. This is

[03:19] basically Apache 2.0 tailored for

[03:22] machine learning weights. This is

[03:25] absolutely fantastic news. Glorious. I

[03:29] think this might be a 9 out of 10, maybe

[03:32] as close to 10 out of 10 as you can get

[03:35] from a big company like Nvidia. Allows

[03:38] basically everything, but less battle

[03:40] tested. And my understanding is that if

[03:42] you sue claiming this model infringes

[03:45] your rights, you lose the license. Huge

[03:48] improvement. Double thumbs up. Thank

[03:50] you. Now, can you run it yourself? Hm.

[03:53] Um, yes and no. Yes, because completely

[03:56] open. Download it. It is yours forever.

[03:59] No limits, no funny business. However,

[04:02] no, because I would love to run it

[04:04] locally, too. But it's huge. 550 billion

[04:09] parameters. You need hundreds of

[04:11] gigabytes of GPU memory for that. This

[04:13] is why I will probably use it on Lambda.

[04:16] Also, 1 million token long context

[04:18] window.

[04:20] Great. Have a larger code base with a

[04:22] bug hiding somewhere. No worries.

[04:25] Massive box. Easy. Okay. How about

[04:28] images and videos? Well, it does not

[04:30] have vision capabilities. Not multimodal,

[04:33] text only. Oh man, how much I would love

[04:39] a multimodal version of this. Goodness,

[04:41] please.

[04:43] Okay, and I also had a realization. You

[04:45] don't need one model to do everything.

[04:48] You need a roster of models that cover

[04:51] your use cases. For instance, I can't

[04:54] add vision capabilities to Nemotron 3

[04:56] Ultra, but I can bolt Gemma 4 to it with

[05:00] a screwdriver. It's like a seeing eye

[05:02] dog guiding a smarter blind man along.

[05:06] It is hilarious and it kind of works.

[05:09] Kind of. So, we finally have more

[05:12] competition in the open AI model space

[05:14] and that is glorious. So, how does it

[05:17] work? Well, one trick is that it is

[05:19] huge, but not all of it runs at once.

[05:23] 550 billion parameters total, but only

[05:26] about 10% of that is active per token.

[05:30] These are specialist mini brains that

[05:32] are being activated at a time. We call

[05:34] that mixture of experts. But you wise

[05:38] fellow scholars know that already. So

[05:40] what else? Now they also use Mamba

[05:43] layers. Why Mamba? Is this like a snake

[05:46] or like the fruity chew? I don't know. I

[05:50] don't even know why I brought this up.

[05:52] So what do these do? Well, traditional

[05:54] AI systems have a bit of a memory

[05:57] problem. They work like a student who

[06:00] constantly rereads the textbook over and

[06:03] over again when they are given a

[06:05] question. But memory is precious. So

[06:08] instead read the book only once and take

[06:11] highly compressed notes. So this kind of

[06:14] memory remembers important details about

[06:17] the conversation. However, it is also

[06:19] smart enough to throw away the filler

[06:22] words. Thus, this system can process

[06:25] massive amounts of data efficiently. It

[06:28] also uses low precision numbers, so you

[06:30] have to do less number crunching when

[06:33] running this. They call it NVFP4. And

[06:36] this doesn't rely on predicting tokens

[06:38] one by one. No, it has multiple heads

[06:42] that draft multiple future tokens at the

[06:45] same time. Once again, many things that

[06:48] make it blazing fast. And we get all of

[06:51] this for free forever. What a time to be

[06:54] alive. Thank you to everyone who worked

[06:57] on this and absolutely everyone

[06:59] everywhere who is working on open-source

[07:01] projects and open models. You are all

[07:04] heroes. And look, this system is great,

[07:07] but it could be tiny. It could be bad,

[07:10] ugly. I don't care. As long as it is

[07:12] open science and open models, it pushes

[07:15] humanity forward. Thank you. What a time

[07:18] to be alive. Here you see me running the

[07:21] full DeepSeek AI model through Lambda

[07:24] GPU cloud. 671

[07:28] billion parameters running super fast

[07:30] and super reliably. This is insane. I

[07:34] love it and I use it on a regular basis.

[07:37] Lambda provides you with powerful Nvidia

[07:40] GPUs to run your own chatbots and

[07:43] experiments. Seriously, try it out now

[07:45] at lambda.ai/papers

[07:48] or click the link in the description.
