---
title: 'DeepSeek’s New AI Is A Game Changer'
source: 'https://youtube.com/watch?v=LpXhy2iiaQE'
video_id: 'LpXhy2iiaQE'
date: 2026-06-28
duration_sec: 0
---

# DeepSeek’s New AI Is A Game Changer

> Source: [DeepSeek’s New AI Is A Game Changer](https://youtube.com/watch?v=LpXhy2iiaQE)

## Summary

This video explores a new DeepSeek paper that introduces a technique allowing AI to 'point' at objects in images while reasoning, much as humans count with a finger. According to the video, the approach needs about 90% fewer visual tokens than most frontier models while matching or beating almost all of them on benchmarks. Training relies on policy distillation: a student model learns from multiple expert models, each specialized in a different visual task.

### Key Points

- **The Problem with Text-Based Visual Reasoning** [0:00] — Traditional AI describes images with words, which is error-prone and computationally expensive. The new technique uses visual primitives like bounding boxes and points, enabling more accurate and faster reasoning.
- **Pointing Like a Human** [1:10] — The AI can point at objects while thinking, like a human using a finger to count. This makes it more accurate and faster, reducing token usage by 90%.
- **Topological Reasoning and Transparency** [1:38] — The technique enables topological reasoning, e.g., tracing a path through a maze or identifying where a crown connects to an octopus, with a transparent thought process.
- **Performance: Matches Frontier Models** [3:02] — The free system matches or beats almost all frontier models on benchmarks, with in-house benchmarks excluded to avoid bias.
- **How It Works: Policy Distillation** [4:23] — The method uses policy distillation: training a student model by learning from multiple expert models, each specialized in different visual tasks (e.g., bounding boxes, maze tracing).
- **Limitations** [5:48] — Limitations include needing a word cue to initiate 'pointy thinking', struggles with thin structures like blades of grass, and limited generalization to completely new scenarios.
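
The student-and-teachers idea from the policy distillation point can be sketched in a few lines. This is a toy illustration under loud assumptions, not the paper's objective: the two teachers, their three "actions", and the `distill` helper are all hypothetical, and the student here matches each teacher's output distribution directly instead of first proposing its own attempt.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical teachers: each one covers a single task and prefers one of
# three toy actions (emit a box, emit a point, emit plain text).
TEACHERS = {
    "boxes": [0.90, 0.05, 0.05],
    "mazes": [0.05, 0.90, 0.05],
}


def distill(teachers, steps=500, lr=0.5):
    # One logit vector per task stands in for the student's parameters.
    student = {task: [0.0, 0.0, 0.0] for task in teachers}
    for _ in range(steps):
        for task, teacher_probs in teachers.items():
            student_probs = softmax(student[task])
            # Gradient of KL(teacher || student) with respect to the logits.
            grad = [s - t for s, t in zip(student_probs, teacher_probs)]
            student[task] = [w - lr * g for w, g in zip(student[task], grad)]
    return student


student = distill(TEACHERS)
for task, teacher_probs in TEACHERS.items():
    gap = kl_divergence(teacher_probs, softmax(student[task]))
    print(f"{task}: remaining KL gap = {gap:.2e}")
```

After training, the single student reproduces the preferences of both teachers, which is the "one AI that can do all of these things" goal in miniature.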

## Transcript

Hmm, why does this DeepSeek work exist?
I mean, it adds vision capabilities to
the DeepSeek AI system, but that's not
new. A lot of other AI systems have
vision capabilities. You just drop an
image here and it works. Even video and
even for open models. So, why do we need
this paper? Well, they did something
incredible here and it is an absolute
game changer. Why? You see, if you ask a
previous technique to count the number
of people in this photo, it will think
something like this. Okay, there are
people on the upper left and a bunch of
stripy guys in two rows. That is kind of
three rows. Some of them are standing,
some of them are sitting.
Ah, it's just so confusing to just count
them up using only words. Two problems
with this one. One, this is prone to
error. Two, you have to think a lot.
Just describing stuff. Why? What would
we, humans, do? Of course, we would use
our finger and would point at the image.
One, two, three, and so on.
Done. Don't describe images like a poet.
Point like a human. Now, that is exactly
what this new technique does. It allows
an AI system to point at things while
thinking and it is absolutely brilliant.
This makes it more accurate and it also
makes it faster. In a world where
hardware and tokens cost a fortune, it
is fantastic to have something that
gives us results faster and cheaper.
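
To make the counting-by-pointing idea concrete, here is a minimal sketch. The `Point` record, the coordinates, and the `count_by_pointing` helper are hypothetical illustrations; the video does not describe the actual output format of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A hypothetical 'pointing' primitive: a pixel location plus a label."""
    x: int
    y: int
    label: str


def count_by_pointing(points, label):
    """Count distinct pointed-at locations that carry the given label."""
    return len({(p.x, p.y) for p in points if p.label == label})


# Made-up trace: locations the model pointed at while counting.
trace = [
    Point(40, 32, "person"),
    Point(95, 30, "person"),
    Point(150, 35, "person"),
    Point(95, 30, "person"),  # pointed at twice, counted once
    Point(210, 120, "ball"),
]

print(count_by_pointing(trace, "person"))  # 3
```

The count falls out of the list of points, so no long verbal description of the scene is needed, and every counted item can be checked against the image.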
But, it turns out thinking with visual
primitives has even more advantages. It
can also do topological reasoning. For
instance, if you give it a maze with a
start and end point, you not only get a
correct answer to your questions, but
you can also trace back the whole
thought process visually.
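
One way to picture a traceable maze answer is as an ordered list of points that anyone can check step by step. The sketch below assumes a tiny text maze and a hypothetical `is_valid_trace` checker; it illustrates verifying a trace, not how the model produces one.

```python
# Toy maze: S = start, E = end, # = wall, . = open cell.
MAZE = [
    "S.#",
    "#.#",
    "#.E",
]


def is_valid_trace(maze, path):
    """Check that (row, col) points walk from S to E through open cells."""
    if not path:
        return False
    rows, cols = len(maze), len(maze[0])
    # Every point must lie inside the maze and not on a wall.
    for r, c in path:
        if not (0 <= r < rows and 0 <= c < cols) or maze[r][c] == "#":
            return False
    # Consecutive points must be direct neighbours (no jumps, no diagonals).
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return False
    (sr, sc), (er, ec) = path[0], path[-1]
    return maze[sr][sc] == "S" and maze[er][ec] == "E"


good = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
bad = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # walks through walls

print(is_valid_trace(MAZE, good))  # True
print(is_valid_trace(MAZE, bad))   # False
```

Because the answer is a sequence of points and not a paragraph of text, a wrong step shows up at a specific location, which is what makes mistakes easier to find.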
I love that. Also, here you can ask
where the crown connects and look.
To the octopus. Yeah, it answers
correctly, but you can also see how it
came to that conclusion. Now, make no
mistake. These are simple examples. I'll
show you in a moment if it is as good as
these billion-dollar frontier models.
Also, if something goes wrong, this will
make it easier to find mistakes and fix
them to create an even better model.
This puts us one step closer to AI
systems we can actually understand that
do not just give us a soup of numbers.
So good. So, how good is it? Well, hold
on to your papers, fellow scholars, and
I dropped my papers here. Look, it needs
about 90% fewer visual tokens than most
frontier models. Now, wait, wait, wait.
It doesn't matter how little you think
if you just say three as an answer
without thinking. Thinking time doesn't
matter if it is incorrect. So, how
accurate is it?
Are you kidding me? This free system
matches or beats almost everything. And
once again, we are talking about this,
which is free, going up against
billion-dollar systems here. Wow. Now,
we are fellow scholars here, so at this
point we ask, are these results real?
You know, benchmarks are being gamed
left and right. Now, here is what many
people missed. Average over seven
benchmarks, but in-house benchmarks
excluded.
That is the key. They did not rig their
own benchmarks. You know why? Well,
everyone loves it because it's one of
the oldest tricks in the book. If you
are not performing well, just create a
new benchmark that fits you. Let's make
a you-ness benchmark. You will always be
world first in being you. And this is
not the case here. Amazing. This is free
and open research. So, this technique
can potentially be added to many
existing models, including free ones.
This paper does not have a model
attached that I know of. It describes
the concept of how to do it in detail.
It's a blueprint, if you will. More
intelligence for all of us for free.
The world needs more papers like this.
Love it. But, this all sounds like
magic. How did they do this? Well, look,
this is their own policy distillation
objective. We need exactly this. You
see, normally, we have a bunch of expert
AI models. Now, at the risk of
simplifying things, imagine that one of
these guys is great at boxes. Nobody
does boxes better than this guy. The
other one is great at tracing mazes with
points. But, that's not what we want.
What we want is one AI that can do all
of these things. And that is where this
comes into play. We train a student
model that learns from all of these
teachers. It says what it would try to
do, then the teachers say, "Okay, here's
what I would have done." Do this enough
and the student will be pretty good at
all of these different kinds of visual
thinking. This is why they used the name
distilling the knowledge of a bunch of
expert teachers into a student. So,
where does this put us? Okay, so here's
what I think. Dear fellow scholars, this
is Two Minute Papers with Dr. Károly
Zsolnai-Fehér. You know, we always
thought that we would make AI systems
smarter by giving them higher-resolution
images to train on. More pixels, more
smarts. It turns out not true.
Sometimes, that's not what we need at
all. DeepSeek just cut down those
visual tokens by 90% and still beat
frontier models. Less is more. Now, is
this perfect? All problems solved? No.
Limitations. One, the AI does not
automatically do this kind of pointy
thinking. It needs a word as a cue for
this kind of thinking. Two, bounding
boxes are nice for people, but if you
are counting blades of grass or strands
of hair, now, in this case, not having
those in very high resolution is a
problem.
[laughter]
Yep, once again, the Two Minute Papers
special, thin structures. Every time,
man. It's so painful. And three, this
kind of topological reasoning does not
generalize as well as we'd like. It
might not be as robust when you show it
something completely new. So, careful
with the misleading media headlines,
careful with the hype everywhere. There
is still plenty to improve here. But, I
feel that this might be a breakthrough.
And that makes it
maybe the third one this month in AI
research. What a time to be alive. Also,
with large AI companies going to IPO,
they are about to become ventures that
look to maximize their profits. More
money needed every quarter. So, it's
going to become more and more crucial to
own your own AI systems with free open
weights models. And this one makes them
better.
Love it. Here you see me running the
full DeepSeek AI model through Lambda
GPU Cloud. 671
billion parameters running super fast
and super reliably. This is insane. I
love it and I use it on a regular basis.
Lambda provides you with powerful Nvidia
GPUs to run your own chatbots and
experiments. Seriously, try it out now
at lambda.ai/papers
or click the link in the description.
