---
title: 'My Honest Thoughts about Deepseek'
source: 'https://youtube.com/watch?v=UV1WDNe4J5w'
video_id: 'UV1WDNe4J5w'
date: 2026-07-28
duration_sec: 1041
---

# My Honest Thoughts about Deepseek

> Source: [My Honest Thoughts about Deepseek](https://youtube.com/watch?v=UV1WDNe4J5w)

## Summary

DeepSeek released its latest flagship model, V4, which is powerful, open-source, and cost-efficient. This development challenges America's lead in AI, not just because China caught up, but due to the economic and geopolitical implications of a nearly frontier-level model at a fraction of the cost.

### Key Points

- **DeepSeek V4 release** [00:00] — DeepSeek dropped V4, a massive, powerful, open-source model at a fraction of the cost, potentially ending America's AI lead.
- **DeepSeek's history** [01:27] — 18 months ago, DeepSeek R1 changed the world as an open-source thinking model, causing a stock market drop and showing efficiency.
- **Model specs** [02:52] — V4 comes in Pro and Flash versions, with 1.6 trillion total parameters (49B active) and 284B total (13B active) respectively, trained on 33 trillion tokens.
- **Benchmark performance** [04:22] — V4 is slightly behind top US models like Opus 4.7 and GPT 5.5 but competitive, especially considering its much lower cost.
- **Cost advantage** [05:18] — DeepSeek V4 is significantly cheaper than US frontier models, making it attractive for most enterprise use cases.
- **Export controls** [07:18] — Export controls limit China's access to top GPUs, but China innovates algorithmically, achieving frontier results with nerfed hardware.
- **Distillation attacks** [09:25] — Anthropic reported Chinese labs stealing data via distillation, but DeepSeek's 150,000 exchanges are minimal and their open-source paper suggests genuine innovation.
- **Economic threat** [12:52] — US companies may adopt cheaper Chinese open-source models, risking economic and security dependencies.
- **US response needed** [16:16] — The US should embrace open-source and improve efficiency to compete with DeepSeek's cost advantage.
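
The parameter counts in the model specs point above can be turned into a quick sanity check. The following is a minimal sketch, not DeepSeek's implementation: it computes the share of parameters active per token from the figures quoted in the summary, then shows a toy top-k router of the general kind that mixture-of-experts models use. The function names, the example router scores, and the choice of `k=2` are assumptions made for illustration.

```python
import math

# (total, active) parameter counts in billions, as quoted in the summary.
MODELS = {
    "V4 Pro": (1600, 49),
    "V4 Flash": (284, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are used for a single token."""
    return active_b / total_b


def top_k_route(router_scores: list[float], k: int) -> dict[int, float]:
    """Toy router: keep the k highest-scoring experts and softmax their scores.

    Returns a mapping of expert index -> mixing weight (weights sum to 1).
    Hypothetical helper for illustration; real routers are learned layers.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_scores[i] for i in chosen)  # subtract max for stability
    exps = {i: math.exp(router_scores[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {share:.1%} of parameters active per token")

    # Made-up router scores for four experts; only two experts are run.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With the quoted figures, Pro activates roughly 3% of its parameters per token and Flash roughly 4.6%, which is why a very large model can still be comparatively cheap to run.
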

### Conclusion

DeepSeek V4's combination of near-frontier performance and drastically lower cost poses a significant challenge to US AI dominance, potentially shifting enterprise adoption towards Chinese open-source models and requiring a strategic US response.

## Transcript

DeepSeek just dropped their latest
flagship model V4. It's massive,
powerful, open-source, and a fraction of
the cost. And it might be the model that
ends America's lead in artificial
intelligence. Not because China caught
up, but because of what happens next.
So, usually at this point, I would do a
model overview. I would tell you about
the model. I would show you the
benchmarks. I would test it and show you
what I think. But as I looked at it, I
realized there was actually a much
bigger story here. America has the best
chips. It has the most money flowing
into AI labs. Yet, China was able to
release a frontier level model that
matches the best of them. Completely
open-source, completely open weights,
and at a fraction of the cost and
resources. They are literally working
with nerfed Nvidia GPUs. That's not
supposed to be possible and the fallout
will be bigger than people realize. And
so today, I'm going to tell you what's
special about DeepSeek V4, why it
matters, and what it means for the
world. But first, if you love seeing
videos about AI models, go ahead and hit
the like and subscribe button. I want to
reach as many people as possible and
teach them about artificial
intelligence, get them excited about it.
And hitting the like and subscribe
button helps the channel more than you
realize. So, thank you for doing that in
advance. Okay, so what is so special
about DeepSeek V4? First, let me tell
you about who DeepSeek actually is. If
you remember, about 18 months ago, they
dropped a model that literally changed
the world. It was called DeepSeek R1,
and it was an open-source, open-weights
model that could think. Remember back
then, models that could think were only
developed by the closed source AI labs
in the United States. They dropped R1,
showed the world that other countries
and open-source labs could develop
models that were at the frontier, and
the stock market dropped 20% pretty much
overnight. And what was really special
about DeepSeek R1 was how efficiently
they were able to train it: at a fraction
of the price compared with the hundreds of
billions of dollars paid by the frontier US AI
labs. And so people thought, "Wow, if
they can train it at a fraction of the
price, then maybe Nvidia GPUs are not
actually worth that much." But it turns
out they were very wrong about that.
When something gets cheaper, we
actually use a lot more of it. That's
called the Jevons paradox. Okay. And now
fast forward to today. DeepSeek is back
with V4. And they wrote an incredibly
thorough white paper explaining how they
did all of it, including being super
honest about some of their failures,
much more honest than any of the
closed-source AI labs in the United States. All
right, so here's the post. It came out
late last night. Let me tell you about
it. DeepSeek V4 preview is here and it
comes in two flavors, Pro and Flash.
First, it has a million token context
length. That is amazing because that is
the frontier. So immediately check that
box. They are at the frontier of context
limits. Next, it is a 1.6 trillion
total parameter model with 49 billion
active parameters. This is called
mixture of experts. It basically allows
you to have a massive model but run only
the small parts of the model that are
specific to the question or the prompt
that you're giving it. They also have V4
Flash, which is their workhorse model.
It's going to be smaller, it's going to
be faster, and it's going to be much
cheaper. This is 284
billion total parameters with 13 billion
active. And if we look at this
screenshot, we can see both of them were
trained with about 33 trillion tokens of
training data. So some of the
characteristics of these models: they
have enhanced agentic capabilities. It
is comparable to the state-of-the-art
agentic coding models like Opus 4.7 and
GPT 5.5, literally the models that were
just released in the last week from
Anthropic and OpenAI. It has rich world
knowledge and world-class reasoning. It beats
all current open models in math, STEM, and
coding, rivaling top closed-source
models. All right. So let me show you
some of the major benchmarks. Here we
have MMLU Pro which is knowledge and
reasoning. We can see here in the dark
green bar, this is DeepSeek, with orange
being Opus 4.6, purple being GPT 5.4, and
then in the stripes, those are the new
models, Opus 4.7 and GPT 5.5. But what
we're seeing is although it is slightly
behind here, it's right up there. Okay?
And remember, it is a fraction of the
price here. GPQA Diamond, same thing.
SWE-bench Verified. And basically what
you're seeing across the board is yes it
is behind but just a little bit. And
that's the real story here. Most use
cases, the vast majority of use cases do
not require the absolute frontier level
intelligence. And the fact that DeepSeek
is so much more efficient and so much
cheaper is actually the problem for the
United States. And so let's talk about
cost because that is really what we need
to be scared of. And if you're not sure
why, I'm going to explain. Let's look at
the cost first. This is AI model price
versus performance. On the Y axis, we
have intelligence. Just think: the higher,
the smarter. On the X axis, it is the
price. The more to the left it is, the
cheaper it is. Cheaper is better. And so
what you want is to be up here in this
top left. You want to be as cheap as
possible and as intelligent as possible.
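
The "top left" idea is what is usually called a Pareto frontier: a model is on it when no other model is both cheaper and smarter. Here is a minimal sketch of how that could be computed. The model names, prices, and scores are made-up placeholders, not the values from the chart in the video, and `ModelPoint` and `pareto_frontier` are hypothetical names.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    price: float         # cost per million tokens; lower is better
    intelligence: float  # benchmark index; higher is better


def pareto_frontier(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the models that no cheaper-or-equal model matches or beats."""
    frontier = []
    best_so_far = float("-inf")
    # Walk from cheapest to most expensive; on a price tie, smarter goes first.
    for p in sorted(points, key=lambda p: (p.price, -p.intelligence)):
        if p.intelligence > best_so_far:
            frontier.append(p)
            best_so_far = p.intelligence
    return frontier


if __name__ == "__main__":
    # Placeholder data for illustration only.
    candidates = [
        ModelPoint("cheap-workhorse", 0.3, 45.0),
        ModelPoint("near-frontier-open", 3.0, 55.0),
        ModelPoint("mid-priced-closed", 5.0, 50.0),   # costs more, scores less
        ModelPoint("frontier-closed", 30.0, 60.0),
    ]
    for p in pareto_frontier(candidates):
        print(p.name, p.price, p.intelligence)
```

In this toy data the mid-priced model drops off the frontier because a cheaper model scores higher, which is the kind of position the chart is meant to expose.
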
And so what do we see? We see GPT 5.5,
which was just released, at the very
top. We have Opus 4.7 right next to it.
And I'm just measuring intelligence
right now. GPT 5.4 extra high right over
here. And then we have DeepSeek V4 Pro,
a little bit behind, a little bit lower
on intelligence, but much much cheaper.
And then look at Flash down here.
Certainly a big drop in intelligence.
Still really good, but this is an
absolute workhorse model price right
here. This is pennies per million
tokens. Now, I want to show you how the
rivalry between the US frontier labs and
Chinese frontier labs has gone over the
last few years. So we have GPT-4 that
came out in May 2023 and we had this
massive gap. This is the Elo score in
Arena. And then Qwen, then GLM-4, and at
this point, right after o1-preview came
out. Remember, o1 was the first thinking
model. Right after that, just a few months
later, DeepSeek R1 changed the world and
closed the gap almost completely. The US
labs did shoot ahead and there's been
this back-and-forth ebb and flow between
them. Every time the US shoots ahead,
Chinese open source catches up. They
have always been behind, but that might
not always be the case. And so that
brings us to a geopolitical question.
Are export controls actually working?
Export controls basically means the US,
specifically Nvidia, is not allowed to
sell its top chips, its best GB300 and a
few others to China directly. Now, there
are a lot of rumors that China is going
around those export controls and
importing them into other countries and
smuggling them. And there's an entire
story there. We're not going to get into
that today, though. But are export
controls working, assuming that they are
actually enforced? Well, the answer is
kind of yes and kind of no. Export
controls are working because China
doesn't have the same compute resources
that the United States has. This is just
a fact. Even if they're able to smuggle
in chips, it is difficult and they
certainly don't have as much compute as
we have in the United States. But if
they did, imagine what they'd be able to
do. Because on the flip side, the export
controls kind of aren't working because
they are innovating on the algorithm
side. They are coming up with incredible
algorithmic unlocks that make training
and running inference of these models
including DeepSeek incredibly efficient.
And so even using nerfed GPUs, even
using Chinese native GPUs, they're still
able to create a frontier level model.
And in fact, Nvidia, specifically
Jensen, has made arguments for selling
our top GPUs to them. China is going to
be developing and producing their own AI
chips. They should be built on American
technology. And that argument is
actually why DeepSeek V4 is actually
such a big deal and such a big threat to
the US economy. But just the flip side
to it, they're going to make their own
chips. They're going to make their own
incredible models and they are going to
be very attractive to US companies and
our allies. But more on that in a
minute. All right, I want to talk about
distillation hacking cuz it's all
related. Just a few weeks ago, Anthropic
put out a report basically saying they
have proof that the top Chinese AI labs
have been distillation attacking them
for their Claude model. And what does
that actually mean? The simplest way to
explain what a distillation attack is is
the Chinese AI labs are essentially
trying to steal the data from Claude and
from ChatGPT. They're asking it
questions, getting the answers, and then
using those question-and-answer pairs to
train their own models. Those
question-and-answer pairs are everything.
That's the IP of companies like
Anthropic and OpenAI. And just
yesterday, the US government put out a
statement on distillation attacks. This
is Director Michael Kratsios. The US has
evidence that foreign entities, primarily
in China, are running industrial-scale
distillation campaigns to steal American
AI. We will be taking action to protect
American innovation. Now, this was
already reported by Anthropic a few
weeks ago. So, this is not really new
news, but the US government actually
saying yes, it's happening is the new
part of it. And I'm going to explain why
this ties into this overall story in a
moment. These foreign entities are using
tens of thousands of proxies and
jailbreaking techniques and coordinated
campaigns to systematically extract
American breakthroughs. But here's the
thing. If you look at Anthropic's report,
the Chinese labs and specifically
DeepSeek didn't really steal all that
much data. And there is actually an
argument that they weren't stealing at
all. Maybe it's against the terms of
service, but a lot of it can be
explained by simple benchmark
comparisons. If you're a Frontier Lab
and you want to know how well does my
model do against my competitor model,
well, the only way to know is to run
benchmarks against both. And those
benchmarks look exactly the same as a
distillation attack. All right, so this
is the report from Anthropic. I just
want to very briefly show one thing. The
scale of DeepSeek's distillation attack
is just 150,000 exchanges. That is not
much. Now, Moonshot, the company behind
Kimi, had 3.4 million and MiniMax has
13 million. So certainly DeepSeek, of the
Chinese labs, has been doing this quote
unquote distillation attack far less than
the other labs. And 150,000 exchanges is
not really enough to explain the level
of quality that DeepSeek has been able to
achieve. And then you pair that with the
fact that they've open sourced the whole
thing. They have an incredibly detailed
and thorough white paper that explains
exactly how they were able to achieve
it. It just doesn't mesh. And so back to
whether export controls are actually
working. Well, Twitter user Jukcon pointed out
something very interesting in the report
because of course like I said DeepSeek
put out a very thorough report. It says
due to constraints in high-end compute
capacity, the current service capacity
for Pro is very limited. After the 950
super nodes are launched at scale in the
second half of this year, the price of
Pro is expected to be reduced
significantly. So they are very compute
constrained. They were able to bake and
produce this model, but they can't even
serve it in the most optimized way. And
they're also charging more than they
would have otherwise. So the price is
going to continue to drop and the price
is what I want to focus on now. So why
is the price and efficiency of Deep Seek
V4 such a big deal? Yes, it is nearly
state-of-the-art. Nearly, not quite. It
is almost as good as the top models, Opus
4.7 and GPT 5.5. But here's the thing, it
doesn't need to be as good. And just
being nearly as good is good enough for
almost everybody including enterprise
companies in the United States. And that
is what matters. Imagine you are a CEO
of a company in the United States or one
of our ally countries and you're looking
at Opus 4.7. You're looking at GPT 5.5
and you're looking at the costs and you
see GPT 5.5 is $30 per million output
tokens. You see Opus 4.7 is similarly
priced. And then you look at DeepSeek
and it can accomplish all of your use
cases because you're not doing frontier
scientific research. You're not trying
to crack some of the hardest coding
problems in the world. You have a
business and you're trying to run your
business. And you look at the price and
it is literally a fraction. And you get
to control it more precisely. It's open
source. You can fine-tune it all you
like. You can make it exactly how you
like, host it how you like, and your
bill will be a fraction of the size it
would be otherwise. The calculus that
these CEOs are making becomes very
obvious. Why would you pay so much more
for a US frontier lab to serve you their
model over an open-source Chinese model?
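
To make that calculus concrete, here is a rough back-of-the-envelope sketch. The $30 per million output tokens figure is the GPT 5.5 price quoted above. The open-model price and the monthly token volume are hypothetical placeholders, since the video only says the price is "a fraction", so substitute real numbers before drawing conclusions. `output_cost_usd` is a name invented for this example.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of generating `output_tokens` at a given price per million tokens."""
    return output_tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    FRONTIER_PRICE = 30.0    # USD per million output tokens, quoted in the video
    OPEN_MODEL_PRICE = 3.0   # ASSUMPTION: placeholder, not a published price
    MONTHLY_TOKENS = 500_000_000  # ASSUMPTION: hypothetical enterprise workload

    frontier_bill = output_cost_usd(MONTHLY_TOKENS, FRONTIER_PRICE)
    open_bill = output_cost_usd(MONTHLY_TOKENS, OPEN_MODEL_PRICE)

    print(f"Frontier model: ${frontier_bill:,.2f} per month")
    print(f"Open model:     ${open_bill:,.2f} per month")
    print(f"Ratio:          {frontier_bill / open_bill:.1f}x")
```

Only output tokens are counted here. A real comparison would also include input-token pricing and, for a self-hosted open model, the cost of the hardware that serves it.
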
And that's where the problem comes in
because more and more enterprise companies
in the US and our ally countries are going
to think about this and make the
decision to build on top of Chinese
open-source technology. And that's the
big argument. Remember Jensen just had
the argument that hey China is going to
be building their own chips. They're
going to be building their own models.
They might as well be built on US chips.
Well, the same argument is on the flip
side with US companies building on top
of Chinese open-source models. That is
a big security risk for the United
States because if Chinese companies
decide to change their architecture or
cut us off suddenly, we're in a really
bad spot. And so, let's think about
this. We have trillions of dollars
pouring into the AI industry in the
United States. We have infrastructure
buildout happening more quickly than any
infrastructure buildout in history. So
if you have all of this investment that
requires a return and all of a sudden
we're not getting that return, there is
the potential for the US economy to
collapse, especially because we are
betting so heavily on artificial
intelligence being the future of our
economy. And then think about
culturally. Think about how social media
changed the world and social media came
from the United States. We were able to
control the narrative in a lot of
places. Now flip that on its head.
Imagine we're all built on Chinese
models and they're dictating what the
models are able to say and what they're
not able to say. These are big questions
that we're going to have to grapple with
if US companies decide to build their AI
strategy on top of Chinese open-source
models and they are looking very
attractive right now. All right, so
where do we go from here? Well, I think
there needs to be two big initiatives in
the United States. Number one, we need
to go much harder on open source. The
frontier labs in the US are not
open-source friendly for the most part, maybe
with the exception of Google, but Google
is building small open-source models,
not the same level and not the same
capability as a DeepSeek V4. And then we
also need to work on efficiency even if
we are to maintain closed source and
they're being served by OpenAI and
Anthropic. They need to get much cheaper
much more quickly because US enterprise
companies need to look at these
different models and it needs to make
sense costwise. That's going to enable
the entire world to build on top of US
artificial intelligence. So, if DeepSeek
is doing everything right, Anthropic
might be doing everything wrong, at
least lately. I made a video about it,
so check out the video on the screen
right now. People say I went a little
bit too hard on them, but I've been
really frustrated.