My Honest Thoughts about Deepseek

0h 17m video Published Apr 25, 2026 Transcribed Jul 28, 2026 Matthew Berman

Intermediate 4 min read For: Tech enthusiasts, AI professionals, and business leaders interested in AI geopolitics.

AI Trust Score 90/100

✅ Highly Legit

"Title accurately reflects the video's honest analysis of DeepSeek V4's impact."

AI Summary

DeepSeek released its latest flagship model, V4, which is powerful, open-source, and cost-efficient. This development challenges America's lead in AI, not just because China caught up, but due to the economic and geopolitical implications of a nearly frontier-level model at a fraction of the cost.

Chapters

1 Introduction and Context 00:00
2 DeepSeek's History and Model Specs 01:27
3 Benchmarks and Cost Analysis 04:22
4 Geopolitical Implications and Distillation Attacks 07:18
5 Economic Threat and US Response 12:52

[00:00]

DeepSeek V4 release

DeepSeek dropped V4, a massive, powerful, open-source model at a fraction of the cost, potentially ending America's AI lead.

[01:27]

DeepSeek's history

18 months ago, DeepSeek R1 changed the world as an open-source thinking model, causing a stock market drop and showing efficiency.

[02:52]

Model specs

V4 comes in Pro and Flash versions, with 1.6 trillion total parameters (49B active) and 284B total (13B active) respectively, trained on 33 trillion tokens.
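
The total-versus-active split is easier to read as a ratio. Here is a quick sketch that uses only the parameter counts quoted in the video; the dictionary and function names are illustrative and do not come from DeepSeek.

```python
# Parameter counts as quoted in the video (not independently verified).
MODELS = {
    "V4 Pro": {"total": 1.6e12, "active": 49e9},
    "V4 Flash": {"total": 284e9, "active": 13e9},
}


def active_fraction(total: float, active: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    return active / total


for name, p in MODELS.items():
    frac = active_fraction(p["total"], p["active"])
    print(f"{name}: {frac:.1%} of parameters active")
```

By these figures, roughly 3% of Pro's parameters and under 5% of Flash's are active for any given token.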

[04:22]

Benchmark performance

V4 is slightly behind top US models like Opus 4.7 and GPT 5.5 but competitive, especially considering its much lower cost.

[05:18]

Cost advantage

DeepSeek V4 is significantly cheaper than US frontier models, making it attractive for most enterprise use cases.

[07:18]

Export controls

Export controls limit China's access to top GPUs, but China innovates algorithmically, achieving frontier results with nerfed hardware.

[09:25]

Distillation attacks

Anthropic reported Chinese labs stealing data via distillation, but DeepSeek's 150,000 exchanges are minimal and their open-source paper suggests genuine innovation.
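
To put the 150,000 figure in proportion, the transcript also quotes 3.4 million exchanges for Moonshot and 13 million for MiniMax. Here is a rough sketch of the comparison, assuming those three numbers as stated in the video; the dictionary and function names are made up for illustration.

```python
# Exchange counts attributed to Anthropic's report, as quoted in the video.
EXCHANGES = {
    "DeepSeek": 150_000,
    "Moonshot": 3_400_000,
    "MiniMax": 13_000_000,
}


def share_of_total(counts: dict[str, int]) -> dict[str, float]:
    """Return each lab's fraction of the combined exchange count."""
    total = sum(counts.values())
    return {lab: n / total for lab, n in counts.items()}


for lab, share in share_of_total(EXCHANGES).items():
    print(f"{lab}: {share:.1%}")
```

On these numbers, DeepSeek accounts for under 1% of the three labs' combined exchanges.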

[12:52]

Economic threat

US companies may adopt cheaper Chinese open-source models, risking economic and security dependencies.

[16:16]

US response needed

The US should embrace open-source and improve efficiency to compete with DeepSeek's cost advantage.

DeepSeek V4's combination of near-frontier performance and drastically lower cost poses a significant challenge to US AI dominance, potentially shifting enterprise adoption towards Chinese open-source models and requiring a strategic US response.

Mentioned in this Video

DeepSeek V4

tool

DeepSeek R1

tool

Claude (Anthropic)

tool

ChatGPT (OpenAI)

tool

Jensen Huang

person

Michael Kratsios

person

Jukcon (Twitter user)

person

Study Flashcards (8)

What is the total parameter count and active parameter count of DeepSeek V4 Pro?

easy

1.6 trillion total parameters with 49 billion active parameters.

03:15

What is the context length of DeepSeek V4?

easy

1 million tokens.

03:01

What is Jevons Paradox?

medium

When things get cheaper, we use a lot more of them.

02:31

How many exchanges did DeepSeek reportedly use in distillation attacks according to Anthropic?

medium

150,000 exchanges.

11:27

What are the two flavors of DeepSeek V4?

easy

Pro and Flash.

02:58

What is the training data size for DeepSeek V4?

medium

33 trillion tokens.

03:55

What is the price of GPT 5.5 per million output tokens?

hard

$30 per million output tokens.

13:41

What is the active parameter count for DeepSeek V4 Flash?

medium

13 billion active parameters.

03:46

💡 Key Takeaways

💡

DeepSeek V4 release

Highlights the launch of a model that could shift AI dominance.

⚖️

Jevons Paradox

Explains why cheaper AI leads to increased usage, not reduced demand.

02:31

💡

Cost advantage

Emphasizes the economic threat of DeepSeek's low cost to US AI labs.

05:18

💡

Export controls dilemma

Shows that despite chip restrictions, China innovates algorithmically.

07:18

💡

Enterprise adoption risk

Warns that US companies may choose cheaper Chinese models, creating dependency.

12:52

Full Transcript

[00:00] DeepSeek just dropped their latest

[00:02] flagship model V4. It's massive,

[00:06] powerful, open-source, and a fraction of

[00:10] the cost. And it might be the model that

[00:12] ends America's lead in artificial

[00:15] intelligence. Not because China caught

[00:17] up, but because of what happens next.

[00:20] So, usually at this point, I would do a

[00:22] model overview. I would tell you about

[00:23] the model. I would show you the

[00:25] benchmarks. I would test it and show you

[00:27] what I think. But as I looked at it, I

[00:29] realized there was actually a much

[00:31] bigger story here. America has the best

[00:33] chips. It has the most money flowing

[00:36] into AI labs. Yet, China was able to

[00:40] release a frontier level model that

[00:43] matches the best of them. Completely

[00:45] open-source, completely open weights,

[00:48] and at a fraction of the cost and

[00:50] resources. They are literally working

[00:52] with nerfed Nvidia GPUs. That's not

[00:55] supposed to be possible and the fallout

[00:57] will be bigger than people realize. And

[00:59] so today, I'm going to tell you what's

[01:01] special about DeepSeek V4, why it

[01:03] matters, and what it means for the

[01:06] world. But first, if you love seeing

[01:08] videos about AI models, go ahead and hit

[01:11] the like and subscribe button. I want to

[01:14] reach as many people as possible and

[01:15] teach them about artificial

[01:17] intelligence, get them excited about it.

[01:19] And hitting the like and subscribe

[01:20] button helps the channel more than you

[01:23] realize. So, thank you for doing that in

[01:25] advance. Okay, so what is so special

[01:27] about DeepSeek V4? First, let me tell

[01:29] you about who DeepSeek actually is. If

[01:32] you remember, about 18 months ago, they

[01:34] dropped a model that literally changed

[01:37] the world. It was called DeepSeek R1,

[01:40] and it was an open-source open weights

[01:42] model that could think. Remember back

[01:45] then, models that could think were only

[01:48] developed by the closed source AI labs

[01:50] in the United States. They dropped R1,

[01:53] showed the world that other countries

[01:56] and open-source labs could develop

[01:59] models that were at the frontier, and

[02:01] the stock market dropped 20% pretty much

[02:04] overnight. And what was really special

[02:06] about DeepSeek R1 was how efficiently

[02:09] they were able to train it, at a fraction of

[02:11] the price of the hundreds of billions

[02:13] of dollars paid by the frontier US AI

[02:16] labs. And so people thought, "Wow, if

[02:19] they can train it at a fraction of the

[02:20] price, then maybe Nvidia GPUs are not

[02:23] actually worth that much." But it turns

[02:25] out they were very wrong about that.

[02:27] When things get cheaper in price, we

[02:29] actually use a lot more of it. That's

[02:31] called Jevons Paradox. Okay. And now

[02:33] fast forward to today. DeepSeek is back

[02:36] with V4. And they wrote an incredibly

[02:39] thorough white paper explaining how they

[02:41] did all of it, including being super

[02:44] honest about some of their failures.

[02:46] Much more honest than any of the closed

[02:48] source AI labs in the United States. All

[02:50] right, so here's the post. It came out

[02:52] late last night. Let me tell you about

[02:54] it. DeepSeek V4 preview is here and it

[02:58] comes in two flavors, Pro and Flash.

[03:01] First, it has a million token context

[03:03] length. That is amazing because that is

[03:06] the frontier. So immediately check that

[03:08] box. They are at the frontier of context

[03:10] limits. Next, it is a 1.6 trillion

[03:15] total parameter model with 49 billion

[03:18] active parameters. This is called

[03:20] mixture of experts. It basically allows

[03:22] you to have a massive model but run only

[03:26] small parts of the model that are

[03:29] specific to the question or the prompt

[03:32] that you're giving it. They also have V4

[03:35] Flash, which is their workhorse model.

[03:37] It's going to be smaller, it's going to

[03:39] be faster, and it's going to be much

[03:40] cheaper. This is 284

[03:43] billion total parameters with 13 billion

[03:46] active. And if we look at this

[03:48] screenshot, we can see both of them were

[03:51] trained with about 33 trillion tokens of

[03:55] training data. So some of the

[03:56] characteristics of these models, they

[03:59] have enhanced agentic capabilities. It

[04:01] is comparable to the state-of-the-art

[04:03] agentic coding models like Opus 4.7 and

[04:06] GPT 5.5. Literally the models that were

[04:10] just released in the last week from

[04:11] Anthropic and OpenAI. It has rich world

[04:14] knowledge and world-class reasoning. It beats

[04:17] all current open models in math, STEM, and

[04:19] coding, rivaling top closed-source

[04:21] models. All right. So let me show you

[04:22] some of the major benchmarks. Here we

[04:24] have MMLU Pro which is knowledge and

[04:26] reasoning. We can see here in the dark

[04:29] green bar, this is DeepSeek, with orange

[04:31] being Opus 4.6, purple being GPT 5.4, and

[04:34] then in the stripes, those are the new

[04:37] models, Opus 4.7 and GPT 5.5. But what

[04:40] we're seeing is although it is slightly

[04:43] behind here, it's right up there. Okay?

[04:46] And remember, it is a fraction of the

[04:48] price. Here, GPQA Diamond, same thing.

[04:50] SWE-bench Verified. And basically what

[04:53] you're seeing across the board is yes it

[04:56] is behind but just a little bit. And

[04:59] that's the real story here. Most use

[05:02] cases, the vast majority of use cases do

[05:05] not require the absolute frontier level

[05:08] intelligence. And the fact that DeepSeek

[05:10] is so much more efficient and so much

[05:12] cheaper is actually the problem for the

[05:16] United States. And so let's talk about

[05:18] cost because that is really what we need

[05:22] to be scared of. And if you're not sure

[05:24] why, I'm going to explain. Let's look at

[05:26] the cost first. This is AI model price

[05:29] versus performance. On the Y-axis, we

[05:32] have intelligence. Just think the higher

[05:34] the smarter. On the X-axis, it is the

[05:38] price. The more to the left it is, the

[05:42] cheaper it is. Cheaper is better. And so

[05:44] what you want is to be up here, in this

[05:47] top left. You want to be as cheap as

[05:50] possible and as intelligent as possible.

[05:53] And so what do we see? We see GPT 5.5,

[05:56] which was just released. At the very

[05:58] top, we have Opus 4.7 right next to it.

[06:02] And I'm just measuring intelligence

[06:04] right now. GPT 5.4 extra high right over

[06:07] here. And then we have DeepSeek V4 Pro,

[06:10] a little bit behind, a little bit lower

[06:12] on intelligence, but much much cheaper.

[06:17] And then look at Flash down here.

[06:19] Certainly a big drop in intelligence.

[06:22] Still really good, but this is an

[06:24] absolute workhorse model price right

[06:26] here. This is pennies per million

[06:29] tokens. Now, I want to show you how the

[06:31] rivalry between the US frontier labs and

[06:35] Chinese frontier labs has gone over the

[06:37] last few years. So we have GPT-4 that

[06:41] came out in May 2023 and we had this

[06:44] massive gap. This is the ELO score in

[06:46] Arena and then Qwen, then GLM-4, and at

[06:51] this point right after o1-preview came

[06:53] out. Remember, o1 was the first thinking

[06:56] model. Right after that, just a few months

[06:58] later, DeepSeek R1 changed the world and

[07:00] closed the gap almost completely. The US

[07:04] labs did shoot ahead and there's been

[07:06] this back and forth ebb and flow between

[07:08] them. Every time the US shoots ahead,

[07:11] Chinese open source catches up. They

[07:13] have always been behind, but that might

[07:15] not always be the case. And so that

[07:18] brings us to a geopolitical question.

[07:21] Are export controls actually working?

[07:25] Export controls basically means the US,

[07:28] specifically Nvidia, is not allowed to

[07:31] sell its top chips, its best GB300 and a

[07:36] few others to China directly. Now, there

[07:39] is a lot of rumors that China is going

[07:41] around those export controls and

[07:43] importing them into other countries and

[07:45] smuggling them. And there's an entire

[07:47] story there. We're not going to get into

[07:48] that today, though. But are export

[07:50] controls working? assuming that they are

[07:53] actually enforced. Well, the answer is

[07:56] kind of yes and kind of no. Export

[07:58] controls are working because China

[08:00] doesn't have the same compute resources

[08:02] that the United States has. This is just

[08:05] a fact. Even if they're able to smuggle

[08:08] in chips, it is difficult and they

[08:11] certainly don't have as much compute as

[08:13] we have in the United States. But if

[08:16] they did, imagine what they'd be able to

[08:18] do. Because on the flip side, the export

[08:22] controls kind of aren't working because

[08:24] they are innovating on the algorithm

[08:26] side. They are coming up with incredible

[08:29] algorithmic unlocks that make training

[08:31] and running inference of these models

[08:34] including DeepSeek incredibly efficient.

[08:37] And so even using nerfed GPUs, even

[08:41] using Chinese native GPUs, they're still

[08:45] able to create a frontier level model.

[08:48] And in fact, Nvidia, specifically

[08:49] Jensen, has made arguments for selling

[08:53] our top GPUs to them. China is going to

[08:55] be developing and producing their own AI

[08:58] chips. They should be built on American

[09:01] technology. And that argument is

[09:03] actually why Deepseek V4 is actually

[09:06] such a big deal and such a big threat to

[09:09] the US economy. But just the flip side

[09:12] to it, they're going to make their own

[09:14] chips. They're going to make their own

[09:17] incredible models and they are going to

[09:19] be very attractive to US companies and

[09:21] our allies. But more on that in a

[09:24] minute. All right, I want to talk about

[09:25] distillation hacking cuz it's all

[09:27] related. Just a few weeks ago, Anthropic

[09:30] put out a report basically saying they

[09:32] have proof that the top Chinese AI labs

[09:36] have been distillation attacking them

[09:39] for their Claude model. And what does

[09:41] that actually mean? The simplest way to

[09:43] explain what a distillation attack is is

[09:45] the Chinese AI labs are essentially

[09:48] trying to steal the data from Claude and

[09:50] from ChatGPT. They're asking it

[09:53] questions, getting the answers, and then

[09:55] using those question-and-answer pairs to

[09:58] train their own models. Those

[09:59] question-and-answer pairs are everything.

[10:01] That's the IP of companies like

[10:03] Anthropic and OpenAI. And just

[10:05] yesterday, the US government put out a

[10:08] statement on distillation attacks. This

[10:10] is Director Michael Kratsios. The US has

[10:13] evidence that foreign entities primarily

[10:15] in China are running industrial-scale

[10:17] distillation campaigns to steal American

[10:19] AI. We will be taking action to protect

[10:21] American innovation. Now, this was

[10:24] already reported by Anthropic a few

[10:26] weeks ago. So, this is not really new

[10:27] news, but the US government actually

[10:30] saying yes, it's happening is the new

[10:32] part of it. And I'm going to explain why

[10:35] this ties into this overall story in a

[10:37] moment. These foreign entities are using

[10:39] tens of thousands of proxies and

[10:40] jailbreaking techniques and coordinated

[10:42] campaigns to systematically extract

[10:44] American breakthroughs. But here's the

[10:46] thing. If you look at Anthropic's report,

[10:49] the Chinese labs and specifically

[10:51] DeepSeek didn't really steal all that

[10:54] much data. And there is actually an

[10:56] argument that they weren't stealing at

[10:58] all. Maybe it's against the terms of

[11:00] service, but a lot of it can be

[11:02] explained by simple benchmark

[11:04] comparisons. If you're a frontier lab

[11:07] and you want to know how well does my

[11:09] model do against my competitor model,

[11:11] well, the only way to know is to run

[11:14] benchmarks against both. And those

[11:16] benchmarks look exactly the same as a

[11:18] distillation attack. All right, so this

[11:20] is the report from Anthropic. I just

[11:22] want to very briefly show one thing. The

[11:24] scale of DeepSeek's distillation attack

[11:27] is just 150,000 exchanges. That is not

[11:31] much. Now, Moonshot, the company behind

[11:33] Kimi, had 3.4 million and MiniMax has

[11:37] 13 million. So certainly DeepSeek, of the

[11:40] Chinese labs, has been doing this quote

[11:43] unquote distillation attack far less than

[11:45] the other labs. And 150,000 exchanges is

[11:50] not really enough to explain the level

[11:53] of quality that DeepSeek has been able to

[11:55] achieve. And then you pair that with the

[11:57] fact that they've open sourced the whole

[11:59] thing. They have an incredibly detailed

[12:01] and thorough white paper that explains

[12:03] exactly how they were able to achieve

[12:05] it. It just doesn't mesh. And so back to

[12:09] "are export controls actually working?"

[12:11] Well, Twitter user Jukcon pointed out

[12:13] something very interesting in the report

[12:16] because of course like I said DeepSeek

[12:18] put out a very thorough report. It says

[12:21] due to constraints in high-end compute

[12:23] capacity, the current service capacity

[12:25] for Pro is very limited. After the 950

[12:28] super nodes are launched at scale in the

[12:31] second half of this year, the price of

[12:33] Pro is expected to be reduced

[12:34] significantly. So they are very compute

[12:37] constrained. They were able to bake and

[12:39] produce this model, but they can't even

[12:42] serve it in the most optimized way. And

[12:44] they're also charging more than they

[12:46] would have otherwise. So the price is

[12:48] going to continue to drop and the price

[12:50] is what I want to focus on now. So why

[12:52] is the price and efficiency of DeepSeek

[12:56] V4 such a big deal? Yes, it is nearly

[13:00] state-of-the-art. Nearly, not quite. It

[13:03] is almost as good as the top models Opus

[13:06] 4.7 and GPT 5.5. But here's the thing, it

[13:10] doesn't need to be as good. And just

[13:14] being nearly as good is good enough for

[13:17] almost everybody including enterprise

[13:20] companies in the United States. And that

[13:22] is what matters. Imagine you are a CEO

[13:26] of a company in the United States or one

[13:28] of our ally countries and you're looking

[13:31] at Opus 4.7. You're looking at GPT 5.5

[13:35] and you're looking at the costs and you

[13:37] see GPT 5.5 is $30 per million output

[13:41] tokens. You see Opus 4.7 is similarly

[13:43] priced. And then you look at DeepSeek

[13:46] and it can accomplish all of your use

[13:49] cases because you're not doing frontier

[13:51] scientific research. You're not trying

[13:53] to crack some of the hardest coding

[13:55] problems in the world. You have a

[13:57] business and you're trying to run your

[13:59] business. And you look at the price and

[14:01] it is literally a fraction. And you get

[14:03] to control it more precisely. It's open

[14:06] source. You can fine-tune it all you

[14:08] like. You can make it exactly how you

[14:09] like, host it how you like, and your

[14:12] bill will be a fraction of the size it

[14:15] would be otherwise. The calculus that

[14:17] these CEOs are making becomes very

[14:20] obvious. Why would you pay so much more

[14:23] for a US frontier lab to serve you their

[14:27] model over an open-source Chinese model?

[14:30] And that's where the problem comes in

[14:32] because more and more US and our ally

[14:34] countries enterprise companies are going

[14:37] to think about this and make the

[14:39] decision to build on top of Chinese

[14:41] open-source technology. And that's the

[14:44] big argument. Remember Jensen just had

[14:47] the argument that hey China is going to

[14:48] be building their own chips. They're

[14:50] going to be building their own models.

[14:51] They might as well be built on US chips.

[14:54] Well, the same argument is on the flip

[14:57] side with US companies building on top

[14:59] of Chinese open-source models. That is

[15:01] a big security risk for the United

[15:03] States because if Chinese companies

[15:05] decide to change their architecture or

[15:08] cut us off suddenly, we're in a really

[15:10] bad spot. And so, let's think about

[15:12] this. We have trillions of dollars

[15:15] pouring into the AI industry in the

[15:17] United States. We have infrastructure

[15:20] buildout happening more quickly than any

[15:22] infrastructure buildout in history. So

[15:24] if you have all of this investment that

[15:26] requires a return and all of a sudden

[15:29] we're not getting that return, there is

[15:31] the potential for the US economy to

[15:33] collapse, especially because we are

[15:36] betting so heavily on artificial

[15:38] intelligence being the future of our

[15:41] economy. And then think about

[15:43] culturally. Think about how social media

[15:45] changed the world and social media came

[15:48] from the United States. We were able to

[15:50] control the narrative in a lot of

[15:52] places. Now flip that on its head.

[15:55] Imagine we're all built on Chinese

[15:57] models and they're dictating what the

[16:00] models are able to say and what they're

[16:02] not able to say. These are big questions

[16:05] that we're going to have to grapple with

[16:06] if US companies decide to build their AI

[16:09] strategy on top of Chinese open-source

[16:11] models and they are looking very

[16:14] attractive right now. All right, so

[16:16] where do we go from here? Well, I think

[16:18] there needs to be two big initiatives in

[16:21] the United States. Number one, we need

[16:23] to go much harder on open source. The

[16:26] frontier labs in the US are not

[16:29] open-source friendly for the most part, maybe

[16:31] with the exception of Google, but Google

[16:33] is building small open-source models,

[16:36] not the same level and not the same

[16:38] capability as a DeepSeek V4. And then we

[16:41] also need to work on efficiency even if

[16:45] we are to maintain closed source and

[16:47] they're being served by OpenAI and

[16:49] Anthropic. They need to get much cheaper

[16:52] much more quickly because US enterprise

[16:54] companies need to look at these

[16:56] different models and it needs to make

[16:57] sense cost-wise. That's going to enable

[17:00] the entire world to build on top of US

[17:04] artificial intelligence. So, if DeepSeek

[17:06] is doing everything right, Anthropic

[17:09] might be doing everything wrong, at

[17:10] least lately. I made a video about it,

[17:13] so check out the video on the screen

[17:15] right now. People say I went a little

[17:17] bit too hard on them, but I've been

[17:19] really frustrated.

Topics #deepseek #artificial intelligence #open-source ai #ai geopolitics #ai economics