[0:00] DeepSeek just dropped their latest
[0:02] flagship model, V4. It's massive,
[0:06] powerful, open-source, and a fraction of
[0:10] the cost. And it might be the model that
[0:12] ends America's lead in artificial
[0:15] intelligence. Not because China caught
[0:17] up, but because of what happens next.
[0:20] So, usually at this point, I would do a
[0:22] model overview. I would tell you about
[0:23] the model. I would show you the
[0:25] benchmarks. I would test it and show you
[0:27] what I think. But as I looked at it, I
[0:29] realized there was actually a much
[0:31] bigger story here. America has the best
[0:33] chips. It has the most money flowing
[0:36] into AI labs. Yet, China was able to
[0:40] release a frontier level model that
[0:43] matches the best of them. Completely
[0:45] open-source, completely open weights,
[0:48] and at a fraction of the cost and
[0:50] resources. They are literally working
[0:52] with nerfed Nvidia GPUs. That's not
[0:55] supposed to be possible and the fallout
[0:57] will be bigger than people realize. And
[0:59] so today, I'm going to tell you what's
[1:01] special about DeepSeek V4, why it
[1:03] matters, and what it means for the
[1:06] world. But first, if you love seeing
[1:08] videos about AI models, go ahead and hit
[1:11] the like and subscribe button. I want to
[1:14] reach as many people as possible and
[1:15] teach them about artificial
[1:17] intelligence, get them excited about it.
[1:19] And hitting the like and subscribe
[1:20] button helps the channel more than you
[1:23] realize. So, thank you for doing that in
[1:25] advance. Okay, so what is so special
[1:27] about DeepSeek V4? First, let me tell
[1:29] you about who DeepSeek actually is. If
[1:32] you remember, about 18 months ago, they
[1:34] dropped a model that literally changed
[1:37] the world. It was called DeepSeek R1,
[1:40] and it was an open-source, open-weights
[1:42] model that could think. Remember back
[1:45] then, models that could think were only
[1:48] developed by the closed-source AI labs
[1:50] in the United States. They dropped R1,
[1:53] showed the world that other countries
[1:56] and open-source labs could develop
[1:59] models that were at the frontier, and
[2:01] the stock market dropped 20% pretty much
[2:04] overnight. And what was really special
[2:06] about DeepSeek R1 was how efficiently
[2:09] they were able to train it, at a fraction of
[2:11] the price compared with the hundreds of billions
[2:13] of dollars paid by the frontier US AI
[2:16] labs. And so people thought, "Wow, if
[2:19] they can train it at a fraction of the
[2:20] price, then maybe Nvidia GPUs are not
[2:23] actually worth that much." But it turns
[2:25] out they were very wrong about that.
[2:27] When things get cheaper in price, we
[2:29] actually use a lot more of it. That's
[2:31] called Jevons paradox. Okay. And now
[2:33] fast forward to today. DeepSeek is back
[2:36] with V4. And they wrote an incredibly
[2:39] thorough white paper explaining how they
[2:41] did all of it, including being super
[2:44] honest about some of their failures.
[2:46] Much more honest than any of the
[2:48] closed-source AI labs in the United States. All
[2:50] right, so here's the post. It came out
[2:52] late last night. Let me tell you about
[2:54] it. DeepSeek V4 preview is here and it
[2:58] comes in two flavors, Pro and Flash.
[3:01] First, it has a million token context
[3:03] length. That is amazing because that is
[3:06] the frontier. So immediately check that
[3:08] box. They are at the frontier of context
[3:10] limits. Next, it is a 1.6 trillion
[3:15] total parameter model with 49 billion
[3:18] active parameters. This is called
[3:20] mixture of experts. It basically allows
[3:22] you to have a massive model but run only
[3:26] the small parts of the model that are
[3:29] specific to the question or the prompt
[3:32] that you're giving it. They also have V4
[3:35] Flash, which is their workhorse model.
[3:37] It's going to be smaller, it's going to
[3:39] be faster, and it's going to be much
[3:40] cheaper. This is 284
[3:43] billion total parameters with 13 billion
[3:46] active. And if we look at this
[3:48] screenshot, we can see both of them were
[3:51] trained with about 33 trillion tokens of
[3:55] training data. So some of the
[3:56] characteristics of these models, they
[3:59] have enhanced agentic capabilities. It
[4:01] is comparable to the state-of-the-art
[4:03] agentic coding models like Opus 4.7 and
[4:06] GPT 5.5. Literally the models that were
[4:10] just released in the last week from
[4:11] Anthropic and OpenAI. It has rich world
[4:14] knowledge and world-class reasoning. It beats
[4:17] all current open models in math, STEM, and
[4:19] coding, rivaling top closed-source
[4:21] models. All right. So let me show you
[4:22] some of the major benchmarks. Here we
[4:24] have MMLU-Pro, which is knowledge and
[4:26] reasoning. We can see here in the dark
[4:29] green bar, this is DeepSeek, with orange
[4:31] being Opus 4.6, purple being GPT 5.4, and
[4:34] then in the stripes, those are the new
[4:37] models, Opus 4.7 and GPT 5.5. But what
[4:40] we're seeing is although it is slightly
[4:43] behind here, it's right up there. Okay?
[4:46] And remember, it is a fraction of the
[4:48] price here. GPQA Diamond, same thing.
[4:50] SWE-bench Verified. And basically what
[4:53] you're seeing across the board is yes it
[4:56] is behind but just a little bit. And
[4:59] that's the real story here. Most use
[5:02] cases, the vast majority of use cases do
[5:05] not require the absolute frontier level
[5:08] intelligence. And the fact that DeepSeek
[5:10] is so much more efficient and so much
[5:12] cheaper is actually the problem for the
[5:16] United States. And so let's talk about
[5:18] cost because that is really what we need
[5:22] to be scared of. And if you're not sure
[5:24] why, I'm going to explain. Let's look at
[5:26] the cost first. This is AI model price
[5:29] versus performance. On the y-axis, we
[5:32] have intelligence. Just think: the higher,
[5:34] the smarter. On the x-axis, it is the
[5:38] price. The more to the left it is, the
[5:42] cheaper it is. Cheaper is better. And so
[5:44] what you want is to be up here, in this
[5:47] top left. You want to be as cheap as
[5:50] possible and as intelligent as possible.
[5:53] And so what do we see? We see GPT 5.5,
[5:56] which was just released, at the very
[5:58] top. We have Opus 4.7 right next to it.
[6:02] And I'm just measuring intelligence
[6:04] right now. GPT 5.4 extra high right over
[6:07] here. And then we have DeepSeek V4 Pro,
[6:10] a little bit behind, a little bit lower
[6:12] on intelligence, but much much cheaper.
[6:17] And then look at Flash down here.
[6:19] Certainly a big drop in intelligence.
[6:22] Still really good, but this is an
[6:24] absolute workhorse model. Price right
[6:26] here: this is pennies per million
[6:29] tokens. Now, I want to show you how the
[6:31] rivalry between the US frontier labs and
[6:35] Chinese frontier labs has gone over the
[6:37] last few years. So we have GPT-4 that
[6:41] came out in May 2023 and we had this
[6:44] massive gap. This is the Elo score in
[6:46] Arena. And then Qwen, then GLM-4, and at
[6:51] this point, right after o1-preview came
[6:53] out. Remember, o1 was the first thinking
[6:56] model. Right after that, just a few months later,
[6:58] DeepSeek R1 changed the world and
[7:00] closed the gap almost completely. The US
[7:04] labs did shoot ahead and there's been
[7:06] this back and forth, ebb and flow between
[7:08] them. Every time the US shoots ahead,
[7:11] Chinese open source catches up. They
[7:13] have always been behind, but that might
[7:15] not always be the case. And so that
[7:18] brings us to a geopolitical question.
[7:21] Are export controls actually working?
[7:25] Export controls basically mean the US,
[7:28] specifically Nvidia, is not allowed to
[7:31] sell its top chips, its best GB300 and a
[7:36] few others to China directly. Now, there
[7:39] are a lot of rumors that China is going
[7:41] around those export controls and
[7:43] importing them into other countries and
[7:45] smuggling them. And there's an entire
[7:47] story there. We're not going to get into
[7:48] that today, though. But are export
[7:50] controls working, assuming that they are
[7:53] actually enforced? Well, the answer is
[7:56] kind of yes and kind of no. Export
[7:58] controls are working because China
[8:00] doesn't have the same compute resources
[8:02] that the United States has. This is just
[8:05] a fact. Even if they're able to smuggle
[8:08] in chips, it is difficult and they
[8:11] certainly don't have as much compute as
[8:13] we have in the United States. But if
[8:16] they did, imagine what they'd be able to
[8:18] do. Because on the flip side, the export
[8:22] controls kind of aren't working because
[8:24] they are innovating on the algorithm
[8:26] side. They are coming up with incredible
[8:29] algorithmic unlocks that make training
[8:31] and running inference of these models
[8:34] including DeepSeek incredibly efficient.
[8:37] And so even using nerfed GPUs, even
[8:41] using Chinese native GPUs, they're still
[8:45] able to create a frontier level model.
[8:48] And in fact, Nvidia, specifically
[8:49] Jensen, has made arguments for selling
[8:53] our top GPUs to them. China is going to
[8:55] be developing and producing their own AI
[8:58] chips. They should be built on American
[9:01] technology. And that argument is
[9:03] why DeepSeek V4 is actually
[9:06] such a big deal and such a big threat to
[9:09] the US economy. But just the flip side
[9:12] to it, they're going to make their own
[9:14] chips. They're going to make their own
[9:17] incredible models and they are going to
[9:19] be very attractive to US companies and
[9:21] our allies. But more on that in a
[9:24] minute. All right, I want to talk about
[9:25] distillation hacking because it's all
[9:27] related. Just a few weeks ago, Anthropic
[9:30] put out a report basically saying they
[9:32] have proof that the top Chinese AI labs
[9:36] have been distillation attacking them
[9:39] for their Claude model. And what does
[9:41] that actually mean? The simplest way to
[9:43] explain what a distillation attack is is
[9:45] the Chinese AI labs are essentially
[9:48] trying to steal the data from Claude and
[9:50] from ChatGPT. They're asking it
[9:53] questions, getting the answers, and then
[9:55] using those question-and-answer pairs to
[9:58] train their own models. Those
[9:59] question-and-answer pairs are everything.
[10:01] That's the IP of companies like
[10:03] Anthropic and OpenAI. And just
[10:05] yesterday, the US government put out a
[10:08] statement on distillation attacks. This
[10:10] is Director Michael Kratsios. The US has
[10:13] evidence that foreign entities primarily
[10:15] in China are running industrial-scale
[10:17] distillation campaigns to steal American
[10:19] AI. We will be taking action to protect
[10:21] American innovation. Now, this was
[10:24] already reported by Anthropic a few
[10:26] weeks ago. So, this is not really new
[10:27] news, but the US government actually
[10:30] saying yes, it's happening is the new
[10:32] part of it. And I'm going to explain why
[10:35] this ties into this overall story in a
[10:37] moment. These foreign entities are using
[10:39] tens of thousands of proxies and
[10:40] jailbreaking techniques and coordinated
[10:42] campaigns to systematically extract
[10:44] American breakthroughs. But here's the
[10:46] thing. If you look at Anthropic's report,
[10:49] the Chinese labs and specifically
[10:51] DeepSeek didn't really steal all that
[10:54] much data. And there is actually an
[10:56] argument that they weren't stealing at
[10:58] all. Maybe it's against the terms of
[11:00] service, but a lot of it can be
[11:02] explained by simple benchmark
[11:04] comparisons. If you're a frontier lab
[11:07] and you want to know how well does my
[11:09] model do against my competitor model,
[11:11] well, the only way to know is to run
[11:14] benchmarks against both. And those
[11:16] benchmarks look exactly the same as a
[11:18] distillation attack. All right, so this
[11:20] is the report from Anthropic. I just
[11:22] want to very briefly show one thing. The
[11:24] scale of DeepSeek's distillation attack
[11:27] is just 150,000 exchanges. That is not
[11:31] much. Now, Moonshot, the company behind
[11:33] Kimi, had 3.4 million and MiniMax had
[11:37] 13 million. So certainly DeepSeek, of the
[11:40] Chinese labs, has been doing this quote
[11:43] unquote distillation attack far less than
[11:45] the other labs. And 150,000 exchanges is
[11:50] not really enough to explain the level
[11:53] of quality that DeepSeek has been able to
[11:55] achieve. And then you pair that with the
[11:57] fact that they've open-sourced the whole
[11:59] thing. They have an incredibly detailed
[12:01] and thorough white paper that explains
[12:03] exactly how they were able to achieve
[12:05] it. It just doesn't mesh. And so, back to
[12:09] our question: are export controls actually
[12:11] working? Well, Twitter user Jukcon pointed out
[12:13] something very interesting in the report
[12:16] because of course like I said DeepSeek
[12:18] put out a very thorough report. It says
[12:21] due to constraints in high-end compute
[12:23] capacity, the current service capacity
[12:25] for Pro is very limited. After the 950
[12:28] super nodes are launched at scale in the
[12:31] second half of this year, the price of
[12:33] Pro is expected to be reduced
[12:34] significantly. So they are very compute
[12:37] constrained. They were able to bake and
[12:39] produce this model, but they can't even
[12:42] serve it in the most optimized way. And
[12:44] they're also charging more than they
[12:46] would have otherwise. So the price is
[12:48] going to continue to drop and the price
[12:50] is what I want to focus on now. So why
[12:52] is the price and efficiency of DeepSeek
[12:56] V4 such a big deal? Yes, it is nearly
[13:00] state-of-the-art. Nearly, not quite. It
[13:03] is almost as good as the top models, Opus
[13:06] 4.7 and GPT 5.5. But here's the thing, it
[13:10] doesn't need to be as good. And just
[13:14] being nearly as good is good enough for
[13:17] almost everybody including enterprise
[13:20] companies in the United States. And that
[13:22] is what matters. Imagine you are a CEO
[13:26] of a company in the United States or one
[13:28] of our ally countries and you're looking
[13:31] at Opus 4.7. You're looking at GPT 5.5
[13:35] and you're looking at the costs and you
[13:37] see GPT 5.5 is $30 per million output
[13:41] tokens. You see Opus 4.7 is similarly
[13:43] priced. And then you look at DeepSeek
[13:46] and it can accomplish all of your use
[13:49] cases because you're not doing frontier
[13:51] scientific research. You're not trying
[13:53] to crack some of the hardest coding
[13:55] problems in the world. You have a
[13:57] business and you're trying to run your
[13:59] business. And you look at the price and
[14:01] it is literally a fraction. And you get
[14:03] to control it more precisely. It's open
[14:06] source. You can fine-tune it all you
[14:08] like. You can make it exactly how you
[14:09] like, host it how you like, and your
[14:12] bill will be a fraction of the size it
[14:15] would be otherwise. The calculus that
[14:17] these CEOs are making becomes very
[14:20] obvious. Why would you pay so much more
[14:23] for a US frontier lab to serve you their
[14:27] model over an open-source Chinese model?
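To put rough numbers on that calculus, here is a minimal sketch in Python. Only the $30 per million output tokens figure comes from the video. The open-model price and the monthly token volume are hypothetical placeholders chosen for illustration, not published DeepSeek pricing or a real workload.

```python
def monthly_bill(output_tokens: int, price_per_million: float) -> float:
    """Dollar cost of generating `output_tokens` at a per-million-token price."""
    return output_tokens / 1_000_000 * price_per_million


# Quoted in the video for GPT 5.5: dollars per million output tokens.
FRONTIER_PRICE = 30.00
# Hypothetical placeholder, NOT a published DeepSeek price.
OPEN_MODEL_PRICE = 3.00
# Hypothetical workload: 2 billion output tokens per month.
MONTHLY_OUTPUT_TOKENS = 2_000_000_000

frontier = monthly_bill(MONTHLY_OUTPUT_TOKENS, FRONTIER_PRICE)
open_model = monthly_bill(MONTHLY_OUTPUT_TOKENS, OPEN_MODEL_PRICE)

print(f"frontier model bill: ${frontier:,.0f}")
print(f"open model bill:     ${open_model:,.0f}")
print(f"ratio:               {frontier / open_model:.1f}x")
```

Swapping in real prices and your own token volume gives the comparison a CEO would actually be looking at. Input-token pricing and self-hosting costs are left out of this sketch.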
[14:30] And that's where the problem comes in
[14:32] because more and more enterprise companies
[14:34] in the US and our allied countries are going
[14:37] to think about this and make the
[14:39] decision to build on top of Chinese
[14:41] open-source technology. And that's the
[14:44] big argument. Remember Jensen just had
[14:47] the argument that hey China is going to
[14:48] be building their own chips. They're
[14:50] going to be building their own models.
[14:51] They might as well be built on US chips.
[14:54] Well, the same argument is on the flip
[14:57] side with US companies building on top
[14:59] of Chinese open-source models. That is
[15:01] a big security risk for the United
[15:03] States because if Chinese companies
[15:05] decide to change their architecture or
[15:08] cut us off suddenly, we're in a really
[15:10] bad spot. And so, let's think about
[15:12] this. We have trillions of dollars
[15:15] pouring into the AI industry in the
[15:17] United States. We have infrastructure
[15:20] buildout happening more quickly than any
[15:22] infrastructure buildout in history. So
[15:24] if you have all of this investment that
[15:26] requires a return and all of a sudden
[15:29] we're not getting that return, there is
[15:31] the potential for the US economy to
[15:33] collapse, especially because we are
[15:36] betting so heavily on artificial
[15:38] intelligence being the future of our
[15:41] economy. And then think about
[15:43] culturally. Think about how social media
[15:45] changed the world and social media came
[15:48] from the United States. We were able to
[15:50] control the narrative in a lot of
[15:52] places. Now flip that on its head.
[15:55] Imagine we're all built on Chinese
[15:57] models and they're dictating what the
[16:00] models are able to say and what they're
[16:02] not able to say. These are big questions
[16:05] that we're going to have to grapple with
[16:06] if US companies decide to build their AI
[16:09] strategy on top of Chinese open-source
[16:11] models and they are looking very
[16:14] attractive right now. All right, so
[16:16] where do we go from here? Well, I think
[16:18] there need to be two big initiatives in
[16:21] the United States. Number one, we need
[16:23] to go much harder on open source. The
[16:26] frontier labs in the US are not
[16:29] open-source friendly for the most part, maybe
[16:31] with the exception of Google, but Google
[16:33] is building small open-source models,
[16:36] not the same level and not the same
[16:38] capability as a DeepSeek V4. And then we
[16:41] also need to work on efficiency. Even if
[16:45] we are to maintain closed source, with
[16:47] models being served by OpenAI and
[16:49] Anthropic, they need to get much cheaper
[16:52] much more quickly because US enterprise
[16:54] companies need to look at these
[16:56] different models and it needs to make
[16:57] sense cost-wise. That's going to enable
[17:00] the entire world to build on top of US
[17:04] artificial intelligence. So, if DeepSeek
[17:06] is doing everything right, Anthropic
[17:09] might be doing everything wrong, at
[17:10] least lately. I made a video about it,
[17:13] so check out the video on the screen
[17:15] right now. People say I went a little
[17:17] bit too hard on them, but I've been
[17:19] really frustrated.