---
title: 'NVIDIA''s New Free AI - A Gift To Humanity'
source: 'https://youtube.com/watch?v=zJvN8PDX1is'
video_id: 'zJvN8PDX1is'
date: 2026-06-28
duration_sec: 0
---

# NVIDIA's New Free AI - A Gift To Humanity

> Source: [NVIDIA's New Free AI - A Gift To Humanity](https://youtube.com/watch?v=zJvN8PDX1is)

## Summary

The video reviews Nvidia's Nemotron 3 Ultra, a fast and open AI model with strong licensing but mixed coding performance. The author tests it on a range of tasks, highlighting its strengths in quick tasks and weaknesses in complex coding, and discusses its technical aspects.

### Key Points

- **First Impressions** [0:00] — Nemotron 3 Ultra is incredibly fast, but coding experiments fail: a light simulation produces a black screen, and asking the model to fix it does not help. The model writes over 1,000 lines of code where the author's handwritten solution takes about 250.
- **Coding Issues** [1:26] — A real-time strategy game attempt yields a near-black screen again (only a square appears), while DeepSeek 4 Flash succeeds with the same prompt. Some issues improved after the author reported them to Nvidia.
- **Useful Use Cases** [1:57] — Excels at fixing broken installations, organizing files, and quick experiments. Author finds it useful for everything except challenging coding tasks.
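- **Openness and Licensing** [2:28] — Weights and research paper are open, and training data and recipes are being released for the redistributable parts. Uses the OpenMDW license (essentially Apache 2.0 tailored to machine learning weights), which the author rates 9/10. It allows basically everything, but by the author's understanding you lose the license if you sue claiming the model infringes your rights.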
- **Running Locally** [3:53] — Open and downloadable, but huge (550 billion parameters) requiring hundreds of GB GPU memory. Author uses Lambda for cloud access. 1 million token context window.
- **No Vision Capabilities** [4:28] — Model is text-only, no multimodal abilities. Author suggests combining with other models (e.g., Gemma 4) for vision.
- **Technical Details** [5:09] — Uses mixture of experts (about 10% of parameters active per token), Mamba layers for memory-efficient context handling, low-precision NVFP4 numbers, and multiple heads that draft several future tokens simultaneously.
- **Conclusion** [6:48] — Author praises open science and models, thanking contributors. Notes that even if imperfect, open models push humanity forward.
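
The mixture-of-experts idea from the Technical Details point can be illustrated with a toy router. This is a minimal sketch, not Nvidia's implementation: the expert count, the `top_k` value, the gate scores, and the stand-in expert functions are all assumptions chosen so that 1 of 10 experts runs per token, mirroring the "about 10% active" figure. In a real model the active fraction also depends on attention and other shared parameters.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_scores, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalise the gate scores over the chosen experts only.
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, gate_scores, top_k):
    """Weighted sum of the outputs of the selected experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_token(gate_scores, top_k))


# Hypothetical toy setup: 10 experts, 1 active per token.
NUM_EXPERTS = 10
TOP_K = 1
# Stand-in "experts": expert i multiplies its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)
# In a real model these scores come from a learned gating layer.
gate_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(gate_scores, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS
output = moe_forward(2.0, experts, gate_scores, TOP_K)
print(f"selected experts: {selected}, active fraction: {active_fraction:.0%}")
```

Only the selected experts are evaluated, which is why a very large model can still spend a small share of its compute on each token.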

### Conclusion

Nemotron 3 Ultra is a fast, open AI model with excellent licensing, but it falls short in complex coding. It excels at simpler tasks and represents a step forward for open AI, but requires powerful hardware or cloud services to run.
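
To see why the hardware requirement lands in the "hundreds of gigabytes" range, a rough back-of-the-envelope estimate helps. The sketch below counts weight storage only for the 550 billion parameters mentioned in the video, at a few assumed bit widths. It deliberately ignores the KV cache, activations, and the per-block scale factors that 4-bit formats such as NVFP4 carry, so real requirements will be higher.

```python
TOTAL_PARAMS = 550e9  # parameter count quoted in the video

# Assumed storage widths in bits per weight (illustrative, not measured).
BITS_PER_WEIGHT = {"FP16/BF16": 16, "FP8": 8, "4-bit": 4}


def weight_memory_gb(num_params, bits_per_weight):
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>9}: {weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Even at 4 bits per weight, the weights alone come to roughly 275 GB, which is more than any single current GPU holds and fits the author's choice to use a cloud provider.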

## Transcript

This AI is not Nemotron 3 Super. No, this
is Nemotron 3 Ultra, Nvidia's newest free
and open AI model, and I've been
delighted, disappointed, and confused by
it. But I think I got it now. You see,
you can look at the benchmarks all you
want, but we are fellow scholars here.
We don't just believe stuff. We test it
for ourselves. That is the way of the
scholar. So, I had an early look at it
and ran some of my experiments day and
night. First impression is that it is
incredibly fast. Blazing fast. Love
that. But then my coding experiments did
not go that well. When I ask it to write
a light simulation program, this is my
original area of research and I get a
black screen. Nothing. When I ask it to
fix it, it does a bunch of things and
same. And then I said, "Okay, let's
debug this by hand." It had some
mistakes. After fixing that, well, we
get something. But maybe it's a scene
that does not work at all. Other even
smaller systems can do this task with
relative ease. And the other thing is,
goodness, it wrote up more than a
thousand lines of code. You don't need
that much. My handwritten solution from
my research is about 250 lines and
renders this scene. Fully open source,
free for everyone, forever. Now, let's
write a realtime strategy game. Yes. Oh,
no.
Black screen again. Almost. We got a
square. But if you ask DeepSeek 4 Flash
with the same prompt, you get something
really cool. But not here. So, what is
going on here? Well, I went back and
forth with Nvidia and reported some of
the issues and later there were some
improvements. But still, this kind of
coding is not something I would
personally use this for. So I said, you
know, maybe let's not use this AI. But
then I thought, wait, it is super fast
and probably good at other things. So I
gave it agentic things. Fixing broken
installations on my machine from the
terminal, excellent. Whipping up quick
experiments, organizing files,
excellent, super fast. And over time, I
found myself reaching out to it more and
more. And I found it to be useful
basically for everything other than
challenging coding tasks. Now that is
excellent because this might be the
openest AI model ever. Weights are open.
The research paper on how it was made is
open. Training data and recipes are
being released at least for the
redistributable parts. Now that is
pretty crazy. Now hold on to your papers
fellow scholars because it gets even
better. Licensing. Super important
question, very overlooked. We are always
hoping for Apache 2.0. This is the do
whatever you want license. For me, this
is 10 out of 10. Now, Nvidia started
publishing their models under their own
proprietary license, which I would rate
7 out of 10. Derivative works and
commercial use is fine. On the other
hand, it needs a bit of attribution and
a little stricter on patent grants. Now,
this has the OpenMDW license. This is
basically Apache 2.0 tailored for
machine learning weights. This is
absolutely fantastic news. Glorious. I
think this might be a 9 out of 10, maybe
as close to 10 out of 10 as you can get
from a big company like Nvidia. Allows
basically everything, but less battle
tested. And my understanding is that if
you sue claiming this model infringes
your rights, you lose the license. Huge
improvement. Double thumbs up. Thank
you. Now, can you run it yourself? Hm.
Um, yes and no. Yes, because completely
open. Download it. It is yours forever.
No limits, no funny business. However,
no, because I would love to run it
locally, too. But it's huge. 550 billion
parameters. You need hundreds of
gigabytes of GPU memory for that. This
is why I will probably use it on Lambda.
Also, 1 million token long context
window.
Great. Have a larger code base with a
bug hiding somewhere. No worries.
Massive box. Easy. Okay. How about
images and videos? Well, it does not
have vision capabilities. Not multimodal,
text only. Oh man, how much I would love
a multimodal version of this. Goodness,
please.
Okay, and I also had a realization. You
don't need one model to do everything.
You need a roster of models that cover
your use cases. For instance, I can't
add vision capabilities to Nemotron 3
Ultra, but I can bolt Gemma 4 to it with
a screwdriver. It's like a seeing eye
dog guiding a smarter blind man along.
It is hilarious and it kind of works.
Kind of. So, we finally have more
competition in the open AI model space
and that is glorious. So, how does it
work? Well, one trick is that it is
huge, but not all of it runs at once.
550 billion parameters total, but only
about 10% of that is active per token.
These are specialist mini brains that
are being activated at a time. We call
that mixture of experts. But you wise
fellow scholars know that already. So
what else? Now they also use Mamba
layers. Why Mamba? Is this like a snake
or like the fruity chew? I don't know. I
don't even know why I brought this up.
So what do these do? Well, traditional
AI systems have a bit of a memory
problem. They work like a student who
constantly rereads the textbook over and
over again when they are given a
question. But memory is precious. So
instead read the book only once and take
highly compressed notes. So this kind of
memory remembers important details about
the conversation. However, it is also
smart enough to throw away the filler
words. Thus, this system can process
massive amounts of data efficiently. It
also uses low precision numbers, so you
have to do less number crunching when
running this. They call it NVFP4. And
this doesn't rely on predicting tokens
one by one. No, it has multiple heads
that draft multiple future tokens at the
same time. Once again, many things that
make it blazing fast. And we get all of
this for free forever. What a time to be
alive. Thank you to everyone who worked
on this and absolutely everyone
everywhere who is working on open-source
projects and open models. You are all
heroes. And look, this system is great,
but it could be tiny. It could be bad,
ugly. I don't care. As long as it is
open science and open models, it pushes
humanity forward. Thank you. What a time
to be alive. Here you see me running the
full DeepSeek AI model through Lambda
GPU cloud. 671
billion parameters running super fast
and super reliably. This is insane. I
love it and I use it on a regular basis.
Lambda provides you with powerful Nvidia
GPUs to run your own chatbots and
experiments. Seriously, try it out now
at lambda.ai/papers
or click the link in the description.
