[0:00] The number of AI agents on the internet
[0:02] is increasing at such an insane rate. I
[0:06] don't think I've seen anything like
[0:07] this. This is crazy. And this is an area
[0:10] that is quite new, and the technology is
[0:12] still pretty rough. Improving rapidly,
[0:15] but pretty rough. And the promise of
[0:17] agents is incredible. An agent could book the
[0:19] cheapest plane ticket for you, or run 24
[0:22] hours a day to manage your schedule,
[0:24] submit insurance claims, continuously
[0:27] scan a codebase for vulnerabilities and
[0:29] patch them. Well, that is the good part, but at
[0:31] the same time, you get so many news
[0:34] headlines about spam, security issues,
[0:36] and system breakdowns. And it gets even
[0:39] tougher when you have not one agent, but
[0:42] multiple agents. Imagine two agents
[0:45] organizing a holiday for you. The flight
[0:47] agent hallucinates a cheaper airport 400
[0:51] miles away from your real destination.
[0:53] Then, the hotel agent says, "Let's book
[0:56] something super cheap nearby." Well,
[0:59] super cheap is often non-refundable. And
[1:02] congratulations.
[1:04] You now have a non-refundable room you
[1:07] will never see.
[1:09] And so many of these problems come from
[1:11] the fact that agent coordination is
[1:13] super difficult. Now, check out what
[1:16] this paper says we should do. Here is a
[1:18] math problem. The first agent writes a plan.
[1:21] The next one critiques it, and the third
[1:24] one solves the problem. And at this
[1:26] point, I said, "Okay.
[1:28] I see nothing interesting here. This is
[1:31] what everyone does with agents." Yes,
[1:34] but here's the key. Most agents
[1:36] communicate a bit like we do, in words.
[1:40] Wait a second. Why should we do that?
[1:42] Look at this neural interface for
[1:45] brain-to-text communication. Yes, this
[1:48] really works. You just think about a
[1:50] letter in the alphabet, and it magically
[1:53] appears. And if you keep doing this a
[1:55] lot, you start asking: the alphabet is
[1:58] optimized for writing,
[2:00] so why use that? Why not use one that is
[2:03] optimized for thinking? And what would
[2:05] that even look like? Hint, it would look
[2:08] like this. We talked about this 500
[2:10] videos ago, paper in the description.
[2:13] Now, if you look at the agents, the
[2:15] first one does some work, packs it up,
[2:17] and passes it to the next one. So do the
[2:20] second and the third ones. Every
[2:22] time an agent wants to
[2:24] communicate something, it has to write
[2:26] out full sentences, decode tokens one by
[2:30] one, and the next guy has to read and
[2:33] re-encode the whole thing.
[2:36] Why are we doing that? Who said they
[2:38] should talk in plain English? And this
[2:41] is the part where I fell off the chair.
[2:43] Now, hold on to your papers, fellow
[2:44] scholars, because this work says, "Huh,
[2:47] forget English. You know what? Forget
[2:49] letters entirely." It says, "Instead,
[2:52] let's link up their brains." Kind of.
[2:56] Instead of using English words, they
[2:58] pass raw, undecoded numbers directly to
[3:01] the next agent. Send raw brain signals,
[3:04] if you will. Call it cross-agent latent
[3:07] state transfer. So, the theory is that
[3:10] these three agents can work together
[3:12] over round one, round two, and round three
[3:15] much more cheaply than the text-based agents.
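To make the difference between the two handoffs concrete, here is a minimal toy sketch. It is not the paper's implementation: the tiny embedding table, the sizes `DIM` and `VOCAB`, and the helper names are all assumptions made up for illustration. The text path collapses each hidden state to one token and re-embeds it, while the latent path passes the numbers through untouched.

```python
import math
import random

random.seed(0)

DIM = 8     # size of the toy hidden state (assumption)
VOCAB = 16  # size of the toy vocabulary (assumption)

# Hypothetical embedding table: one random vector per vocabulary token.
EMBED = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decode_to_token(hidden):
    """Text path, sender side: collapse a hidden state to its best-scoring token id."""
    scores = [dot(hidden, e) for e in EMBED]
    return max(range(VOCAB), key=lambda i: scores[i])


def reencode(token_id):
    """Text path, receiver side: only the token arrives, so look up its embedding."""
    return EMBED[token_id]


def text_handoff(hidden_states):
    """Decode every state to a token, then re-encode it on the other side."""
    return [reencode(decode_to_token(h)) for h in hidden_states]


def latent_handoff(hidden_states):
    """Latent path: hand over the raw vectors without decoding them."""
    return [list(h) for h in hidden_states]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Five made-up hidden states that the sending agent wants to communicate.
sender_states = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

text_received = text_handoff(sender_states)
latent_received = latent_handoff(sender_states)

# Total distance between what was sent and what the receiver ends up with.
text_error = sum(distance(s, r) for s, r in zip(sender_states, text_received))
latent_error = sum(distance(s, r) for s, r in zip(sender_states, latent_received))

print(f"text handoff error:   {text_error:.3f}")
print(f"latent handoff error: {latent_error:.3f}")
```

In this toy, the latent handoff arrives with zero error, while the text handoff loses whatever did not fit into a single token. Real systems are far more involved, so treat this only as an intuition pump for why skipping the decode and re-encode steps can save both work and information.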
[3:18] They refine an answer, and you get
[3:20] better answers with the same amount of
[3:22] computation. So, is it better? Hmm,
[3:26] let's see. Dear fellow scholars, this is
[3:28] Two Minute Papers with Dr. Károly
[3:30] Zsolnai-Fehér. Well, when given
[3:32] competition-level math questions, it
[3:34] goes from 73% to 86%.
[3:38] That is crazy.
[3:40] We are talking about free sub-10-billion-parameter
[3:42] models, not expensive frontier
[3:45] systems. And here is where it gets the
[3:48] Michelin star status. Look at that.
[3:52] Ooh.
[3:53] Token usage down 75%.
[3:57] They all evaporated into the latent
[4:00] space. Loving it. So, this can bring
[4:03] smaller systems within striking
[4:05] distance of much bigger, more expensive
[4:08] models on difficult math problems. So, I
[4:11] bet it costs a fortune to train, right?
[4:14] Well, look at that.
[4:17] Four bucks. Basically, you spend your
[4:19] coffee money on these agents and in
[4:21] return they punch a hole in space-time.
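The figures quoted above can be put in perspective with a little arithmetic. The sketch below uses only the numbers mentioned in the video (73% to 86% accuracy, token usage down 75%). The function name `summarize` is made up for illustration, and it assumes the 75% reduction is measured against the text-based baseline.

```python
def summarize(acc_before, acc_after, token_reduction):
    """Derive a few comparison figures from accuracies (in percent)
    and a fractional token reduction (0.75 means 75% fewer tokens)."""
    points_gained = acc_after - acc_before
    relative_gain = points_gained / acc_before
    error_before = 100 - acc_before
    error_after = 100 - acc_after
    # Share of the original mistakes that disappear.
    error_reduction = (error_before - error_after) / error_before
    # How many times fewer tokens are used.
    token_ratio = 1 / (1 - token_reduction)
    return {
        "points_gained": points_gained,
        "relative_gain": relative_gain,
        "error_reduction": error_reduction,
        "token_ratio": token_ratio,
    }


stats = summarize(73, 86, 0.75)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Read this way, the jump is 13 percentage points, or roughly an 18% relative gain in accuracy. The error rate drops from 27% to 14%, so nearly half of the mistakes are gone, and a 75% cut in tokens means about four times fewer tokens.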
[4:25] Love it. Additionally, it might even
[4:27] unlock... Wait, wait, wait. I shouldn't say
[4:30] unlock. That's AI speak. So, it might
[4:33] give us a new scaling law. More rounds,
[4:37] better results. And at this point, I
[4:39] thought we might have a deadly flaw
[4:41] here. And it's really subtle. So, the
[4:44] training data for each agent's role is
[4:47] written by a giant AI model. So, if they
[4:50] perform well, you have to ask, are
[4:53] things better because of the brain
[4:55] linking or is it good distillation from
[4:58] an excellent teacher? So, which one is
[5:01] it? A good teacher or a good
[5:03] architecture? Well, fellow scholars, we
[5:06] are in luck. This is a really good
[5:08] paper. So, the scientists thought about
[5:11] this too. And look, goodness, a
[5:14] controlled comparison gives the same
[5:16] teacher to other architectures and this
[5:19] one. And the new one still outperforms them.
[5:22] So, yes, the brain linking really works.
[5:26] What a time to be alive. Okay, now,
[5:29] let's not get too excited. This is
[5:31] Two Minute Papers and we respect the
[5:33] science here. Limitations. One, tests
[5:36] were on smaller models. We don't yet
[5:39] know how these insights scale up to
[5:41] bigger ones. If they don't, then this
[5:44] puts small models on steroids.
[5:47] Still good. If they do, potentially a huge
[5:50] game-changer. Two, there is an optimal
[5:52] latent thought length, and that is about
[5:55] 80 steps. This is somewhat of a limit on
[5:58] how much thinking an agent can do per
[6:01] round.
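As a rough sketch of how such a per-round budget might sit inside a round-based loop, consider the toy below. Everything in it is hypothetical: `latent_step` is a stand-in update rule rather than a real model, and `MAX_LATENT_STEPS`, `run_agent`, and `run_rounds` are names invented here. Only the figure of about 80 steps and the three-agent, multi-round setup come from the video.

```python
MAX_LATENT_STEPS = 80  # roughly the optimum quoted in the video


def latent_step(state):
    """Hypothetical stand-in for one latent reasoning step.
    It nudges every value toward 1.0 and decodes no tokens."""
    return [0.9 * x + 0.1 for x in state]


def run_agent(state, requested_steps):
    """Let one agent think in latent space, capped at MAX_LATENT_STEPS."""
    steps = min(requested_steps, MAX_LATENT_STEPS)
    for _ in range(steps):
        state = latent_step(state)
    return state, steps


def run_rounds(initial_state, num_agents=3, num_rounds=3, requested_steps=200):
    """Pass the latent state from agent to agent, for several rounds."""
    state = list(initial_state)
    total_steps = 0
    for _ in range(num_rounds):
        for _ in range(num_agents):
            state, used = run_agent(state, requested_steps)
            total_steps += used
    return state, total_steps


final_state, total_steps = run_rounds([0.0, 0.5, 2.0])
print(f"latent steps used: {total_steps}")
print(f"final state: {final_state}")
```

Here each agent asks for 200 steps but is held to 80, so three agents over three rounds spend 720 latent steps in total. In this sketch, the way to buy more thinking is to add rounds via `num_rounds`, not to stretch a single round.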
[6:02] I am thinking, you know, if it solves
[6:04] mathematical Olympiad problems already,
[6:07] how bad can that be? And sure enough,
[6:10] after 80 steps, you don't get a lot of extra value
[6:12] anyway, but I wanted to mention it.
[6:14] Okay? So, code and models are available
[6:17] for free. Note that this is still very
[6:20] rough, very early, but it shows
[6:22] potential. And this is still research.
[6:25] Please do not think you just plug this
[6:27] in and everything will fly immediately.
[6:30] We need new tools for the era of LLMs,
[6:33] and Weights & Biases now has Weave, a
[6:36] lightweight toolkit to confidently
[6:38] iterate on LLM applications. Use traces
[6:41] to debug how data flows through each
[6:43] step of your app, and use evaluations to
[6:46] measure your progress. It is the best.
[6:49] Try it out now at wandb.me/papers,
[6:53] or click the link in the description
[6:55] below.