[0:00] AI systems today are really powerful and
[0:03] can do a lot. No question about that.
[0:05] But, how do they really work? We have so
[0:08] many questions. Do they think like
[0:10] humans? How do they beat the best human
[0:13] chess player? How do they beat the world
[0:15] champion video game players? And how is
[0:17] it possible that an AI chooses to not
[0:20] play the game, but just collapses, and can
[0:23] trick the brain of another AI into
[0:25] malfunctioning? Why does Claude think about
[0:28] blackmailing people? I mean, what is
[0:30] going on here? If you look at the
[0:32] activations inside an AI system like
[0:35] Claude, you see a bunch of gibberish,
[0:38] millions of numbers. Researchers tried
[0:40] to make sense of it for years and years
[0:42] now, but the results were very thin and
[0:45] situational. We now see that it
[0:47] understands that if you look at an image
[0:49] and you have floppy ears, a black snout,
[0:52] and so on, then it might be a dog, a
[0:55] good boy. But, we asked a bunch of
[0:57] questions and still no answers to those.
[1:00] But, now Anthropic has excellent new
[1:03] research with new insights on this. This
[1:05] is when Anthropic is at its best, in my
[1:08] opinion. I love seeing it. Here's the
[1:10] idea. Take this bunch of numbers that
[1:12] the AI thinks about and ask another AI
[1:15] to translate it into text. Translate
[1:18] from machine to human. And it did
[1:22] something. Okay, but these systems often
[1:25] make stuff up. So, how do we know if
[1:27] this is a good translation? We don't.
[1:30] So, what do we do here? Try it
[1:32] separately with a bunch of different
[1:34] models and see if they translated the
[1:36] same way. Is that a good idea? Mm, not
[1:40] quite. Imagine you are a teacher and you
[1:42] give a problem to your students and all
[1:45] of your students write the same answer.
[1:47] Can you conclude it must be true? Well,
[1:50] not necessarily. There are common
[1:53] mistakes in any area and it is possible
[1:56] that it is exactly the mistake they all
[1:58] made. So, what do you do? Now, here
[2:00] comes the genius idea. First, one AI
[2:04] translates numbers to text. Then, a
[2:07] second AI secretly gets the text and
[2:10] you ask it to translate it back to
[2:13] numbers. Uh-huh.
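The round trip just described can be sketched with a toy example. This is only an illustration under loud assumptions: the four-word vocabulary, the one-direction-per-word setup, and the function names `to_text`, `to_numbers`, and `round_trip_gap` are all made up here and are not the models or code from the research.

```python
import numpy as np

# Hypothetical toy vocabulary: each word gets its own made-up direction.
VOCAB = ["dog", "cat", "rabbit", "mouse"]
DIRECTIONS = np.eye(len(VOCAB))

def to_text(h):
    """Numbers -> text: name the direction that h points along most."""
    return VOCAB[int(np.argmax(DIRECTIONS @ h))]

def to_numbers(text):
    """Text -> numbers: this side sees only the text, never h."""
    return DIRECTIONS[VOCAB.index(text)]

def round_trip_gap(h, text):
    """Squared distance between h and the numbers rebuilt from text."""
    return float(np.sum((h - to_numbers(text)) ** 2))

h = np.array([0.1, -0.05, 0.9, 0.05])   # a noisy "rabbit" thought
z = to_text(h)                          # forward translation
good_gap = round_trip_gap(h, z)         # small: the text captured h
bad_gap = round_trip_gap(h, "mouse")    # large: a wrong description
print(z, good_gap, bad_gap)
```

In this toy setup, a description that matches the numbers comes back close to where it started, while a wrong description lands far away. That gap is the signal the round trip relies on.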
[2:15] And what happened here was kind of
[2:17] insane. You see, h is the original
[2:20] thought inside Claude. Numbers. AR_θ(z)
[2:24] is the text z translated back to
[2:26] numbers. And then, we look at the
[2:29] difference between the two. Translate
[2:31] forward, then translate back, and see
[2:34] how much difference there is. This is to
[2:37] be minimized to ensure the translation
[2:39] works reliably. Do the whole round trip,
[2:42] come back, and if you end up close to
[2:44] the same place, you know that the path
[2:47] is likely correct. But, here comes the
[2:49] part where I fell off the chair when
[2:51] reading this paper. And it is not what's
[2:53] in this formula. No.
[2:56] It is what is missing from the formula.
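Written out, the formula as narrated would look roughly like this. The symbols h and AR_θ(z) and the squared 2-norm come from the narration; the name L(θ) for the quantity being minimized is an assumption added here for illustration.

```latex
\mathcal{L}(\theta) \;=\; \bigl\lVert\, h - \mathrm{AR}_{\theta}(z) \,\bigr\rVert_2^2
```

Here z stands for the text that the first translator produced from h.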
[2:59] You see, absolutely nothing here in this
[3:01] formula says that the result should be
[3:04] readable. Not at all. Readability
[3:07] emerges because both translators start
[3:10] as Claude, and Claude finds English
[3:12] easier than gibberish. But, it gets
[3:15] better. With this tool, they picked the
[3:17] brain of Claude and found many amazing
[3:19] things. I will highlight what I think
[3:21] are the three best ones. Dear fellow
[3:24] scholars, this is Two Minute Papers with
[3:26] Dr. Károly Zsolnai-Fehér. One, it plans
[3:29] ahead. When writing a rhyme, Claude
[3:31] picks the final word before writing the
[3:34] whole sentence. They caught it while it
[3:36] was thinking "rabbit", and it went to find
[3:39] something that rhymes with it. Then,
[3:41] they replaced "rabbit" with "mouse", and it
[3:45] actually rhymed with "mouse" instead.
[3:47] Sometimes, not always. Really cool. Two,
[3:51] this is going to be super fun.
[3:53] Researchers gave it a math problem for
[3:55] which the answer is 491.
[3:58] And then,
[4:00] they gave it a rigged calculator that
[4:03] returns 492
[4:05] instead. So, what did it do? Well, it
[4:08] had an initial hunch for the solution,
[4:11] and then when the calculator said
[4:13] otherwise, it ignored it.
[4:15] >> [laughter]
[4:16] >> That is incredible. And three, now hold
[4:19] on to your papers, fellow scholars,
[4:20] because it knows when it is being
[4:23] tested, and it gets crazier. It does not
[4:26] tell you that it knows. You have to peer
[4:28] into its mind to get to know that. This
[4:31] sounds like something straight out of a
[4:33] science fiction movie. What a time to be
[4:36] alive. Now, okay, limitations. Let's not
[4:39] get carried away here. One, this is not
[4:42] nearly as easy as it sounds. For
[4:44] instance, you need to find the right
[4:46] layer in the neural network to train on.
[4:48] Also, when minimizing the squared L2
[4:51] norm here in this formula, the
[4:52] translation forward is done by one AI
[4:56] and backwards by another. So, based on
[4:59] my experience doing similar things, in
[5:02] simple words, this is very finicky. Lots
[5:05] of trial and error. The result is going
[5:07] to be noisy. Two, despite the headlines
[5:10] you see in the media, this is not a
[5:12] perfect AI mind reader. No, this is a
[5:16] natural language autoencoder. Okay, what
[5:19] does that mean? Well, it is more like a
[5:21] noisy translator. It catches real
[5:24] things, yes, but it sometimes makes up
[5:27] some of the specifics. Three, the cost
[5:30] is bearable. For a 27 billion parameter
[5:34] model, you train for a day and a half on 16
[5:37] H100 GPUs. And for a frontier model, the
[5:41] cost is substantial. But, despite all
[5:44] these, this work is lovely, amazing, and
[5:47] it makes something previously impossible
[5:50] possible. Two more papers down the
[5:52] line, and I bet it will be done much
[5:54] cheaper and better. What a time to be
[5:57] alive. And now, please, use this to tell
[6:00] me why ChatGPT keeps thinking about
[6:03] goblins. Now, some of these videos come
[6:05] out a bit later because I try to be a
[6:08] bit more rigorous with them. You know, a
[6:10] quick media headline brings in a lot of
[6:12] clicks, especially if you write them
[6:15] with AI. Then you can be super quick,
[6:17] and people do that. But these videos,
[6:20] they come from the heart. Subscribe and
[6:22] hit the bell if you think this is the
[6:24] way to do it.