[0:00] AI systems today are really powerful and [0:03] can do a lot. No question about that. [0:05] But, how do they really work? We have so [0:08] many questions. Do they think like [0:10] humans? How do they beat the best human [0:13] chess player? How do they beat the world [0:15] champion video game players? And how is [0:17] it possible that an AI chooses not to [0:20] play the game, but just collapses, and can [0:23] trick the brain of another AI into [0:25] malfunctioning? Why does Claude think about [0:28] blackmailing people? I mean, what is [0:30] going on here? If you look at the [0:32] activations inside an AI system like [0:35] Claude, you see a bunch of gibberish, [0:38] millions of numbers. Researchers have tried [0:40] to make sense of it for years and years [0:42] now, but the results were very thin and [0:45] situational. We now see that it [0:47] understands that if you look at an image [0:49] and you see floppy ears, a black snout, [0:52] and so on, then it might be a dog, a [0:55] good boy. But, we asked a bunch of [0:57] questions and still have no answers to those. [1:00] But, now Anthropic has excellent new [1:03] research with new insights on this. This [1:05] is when Anthropic is at its best, in my [1:08] opinion. I love seeing it. Here's the [1:10] idea. Take this bunch of numbers that [1:12] the AI thinks about and ask another AI [1:15] to translate it into text. Translate [1:18] from machine to human. And it did [1:22] something. Okay, but these systems often [1:25] make stuff up. So, how do we know if [1:27] this is a good translation? We don't. [1:30] So, what do we do here? Try it [1:32] separately with a bunch of different [1:34] models and see if they translate it the [1:36] same way. Is that a good idea? Mm, not [1:40] quite. Imagine you are a teacher and you [1:42] give a problem to your students and all [1:45] of your students write the same answer. [1:47] Can you conclude it must be true? Well, [1:50] not necessarily. 
There are common [1:53] mistakes in any area and it is possible [1:56] that it is exactly the mistake they all [1:58] made. So, what do you do? Now, here [2:00] comes the genius idea. First, one AI [2:04] translates numbers to text. Then, the [2:07] second AI gets only the text, and [2:10] you ask it to translate it back to [2:13] numbers. Uh-huh. [2:15] And what happened here was kind of [2:17] insane. You see, h is the original [2:20] thought inside Claude, as numbers. AR_θ(z) [2:24] is the text translated back to [2:26] numbers. And then, we look at the [2:29] difference between the two. Translate [2:31] forward, then translate back, and see [2:34] how much difference there is. This is to [2:37] be minimized to ensure the translation [2:39] works reliably. Do the whole round trip, [2:42] come back, and if you end up close to [2:44] the same place, you know that the path [2:47] is likely correct. But, here comes the [2:49] part where I fell off the chair when [2:51] reading this paper. And it is not what's [2:53] in this formula. No. [2:56] It is what is missing from the formula. [2:59] You see, absolutely nothing here in this [3:01] formula says that the result should be [3:04] readable. Not at all. Readability [3:07] emerges because both translators start [3:10] as Claude, and Claude finds English [3:12] easier than gibberish. But, it gets [3:15] better. With this tool, they picked the [3:17] brain of Claude and found many amazing [3:19] things. I will highlight what I think [3:21] are the three best ones. Dear fellow [3:24] scholars, this is Two Minute Papers with [3:26] Dr. Károly Zsolnai-Fehér. One, it plans [3:29] ahead. When writing a rhyme, Claude [3:31] picks the final word before writing the [3:34] whole sentence. They caught it while it [3:36] was thinking "rabbit", and it went to find [3:39] something that rhymes with it. Then, [3:41] they replaced "rabbit" with "mouse", and it [3:45] actually rhymed with "mouse" instead. [3:47] Sometimes, not always. 
Really cool. Two, [3:51] this is going to be super fun. [3:53] Researchers gave it a math problem for [3:55] which the answer is 491. [3:58] And then, [4:00] they gave it a rigged calculator that [4:03] returns 492 [4:05] instead. So, what did it do? Well, it [4:08] had an initial hunch for the solution, [4:11] and then when the calculator said [4:13] otherwise, it ignored it. [4:15] >> [laughter] [4:16] >> That is incredible. And three, now hold [4:19] on to your papers, fellow scholars, [4:20] because it knows when it is being [4:23] tested, and it gets crazier. It does not [4:26] tell you that it knows. You have to peer [4:28] into its mind to find that out. This [4:31] sounds like something straight out of a [4:33] science fiction movie. What a time to be [4:36] alive. Now, okay, limitations. Let's not [4:39] get carried away here. One, this is not [4:42] nearly as easy as it sounds. For [4:44] instance, you need to find the right [4:46] layer in the neural network to train on. [4:48] Also, when minimizing the squared L2 [4:51] norm here in this formula, the [4:52] translation forward is done by one AI [4:56] and backwards by another. So, based on [4:59] my experience doing similar things, in [5:02] simple words, this is very finicky. Lots [5:05] of trial and error. The result is going [5:07] to be noisy. Two, despite the headlines [5:10] you see in the media, this is not a [5:12] perfect AI mind reader. No, this is a [5:16] natural language autoencoder. Okay, what [5:19] does that mean? Well, it is more like a [5:21] noisy translator. It catches real [5:24] things, yes, but it sometimes makes up [5:27] some of the specifics. Three, the cost [5:30] is bearable. For a 27 billion parameter [5:34] model, you train for 1 and 1/2 days on 16 [5:37] H100 GPUs. And for a frontier model, the [5:41] cost is substantial. But, despite all [5:44] this, this work is lovely, amazing, and [5:47] it makes something previously impossible [5:50] possible. 
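To make the round trip from earlier concrete (translate the numbers to text, translate the text back, then measure the squared distance between where you started and where you landed), here is a minimal sketch in Python. This is a toy illustration only, not Anthropic's implementation: the real translators are language models, while `verbalize`, `reconstruct`, and the five-word vocabulary with one axis per word are hypothetical stand-ins invented for this example.

```python
import numpy as np

DIM = 8
# Hypothetical vocabulary; each word owns one axis of the toy activation space.
VOCAB = ["dog", "rhyme", "rabbit", "test", "math"]
DIRECTIONS = np.eye(DIM)[: len(VOCAB)]


def verbalize(h, k=2):
    """Toy forward translator (assumption): activation vector -> short text."""
    scores = DIRECTIONS @ h
    top = np.argsort(scores)[::-1][:k]  # indices of the k strongest words
    return " ".join(f"{VOCAB[i]}:{scores[i]:.2f}" for i in top)


def reconstruct(z):
    """Toy backward translator (assumption): text -> activation vector."""
    h_hat = np.zeros(DIM)
    for item in z.split():
        word, weight = item.split(":")
        h_hat += float(weight) * DIRECTIONS[VOCAB.index(word)]
    return h_hat


def round_trip_loss(h):
    """Squared L2 norm between h and its reconstruction from the text."""
    return float(np.sum((h - reconstruct(verbalize(h))) ** 2))


# An activation that the toy vocabulary can describe completely.
h = 2.0 * DIRECTIONS[2] + 1.0 * DIRECTIONS[1]
print(verbalize(h))        # rabbit:2.00 rhyme:1.00
print(round_trip_loss(h))  # 0.0

# Add a component that no word covers: the text cannot carry it back.
h_extra = h.copy()
h_extra[7] = 0.5
print(round_trip_loss(h_extra))  # 0.25
```

Notice that nothing in `round_trip_loss` rewards readable text. It only measures how close you end up after the round trip, and whatever the text fails to capture shows up as leftover error.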
Two more papers down the [5:52] line, and I bet it will be done much [5:54] cheaper and better. What a time to be [5:57] alive. And now, please, use this to tell [6:00] me why ChatGPT keeps thinking about [6:03] goblins. Now, some of these videos come [6:05] out a bit later because I try to be a [6:08] bit more rigorous with them. You know, a [6:10] quick media headline brings in a lot of [6:12] clicks, especially if you write it [6:15] with AI. Then you can be super quick, [6:17] and people do that. But these videos, [6:20] they come from the heart. Subscribe and [6:22] hit the bell if you think this is the [6:24] way to do it. Here you see me running [6:26] the full DeepSeek AI model through [6:29] Lambda GPU Cloud. 671 [6:33] billion parameters running super fast [6:36] and super reliably. This is insane. I [6:39] love it, and I use it on a regular [6:42] basis. Lambda provides you with powerful [6:44] Nvidia GPUs to run your own chatbots and [6:48] experiments. Seriously, try it out now [6:51] at lambda.ai/papers, [6:54] or click the link in the description.