AI Agents Are Exploding!
45s
High-energy opening about the explosive growth of AI agents grabs attention, and the relatable example of two agents messing up a holiday booking is both funny and concerning.
The video discusses the rapid growth of AI agents on the internet and their ability to automate tasks like booking flights or managing schedules. It highlights a key problem: agents typically communicate in human language, which leads to inefficiencies and errors. The solution proposed is 'cross-agent latent state transfer,' where agents pass raw numerical data instead of text, leading to significant improvements in accuracy and cost.
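The difference between the two handoff styles can be illustrated with a toy sketch. This is not the paper's implementation: the four-word vocabulary, the `think` transform, and all function names are hypothetical stand-ins chosen for illustration. It only shows that decoding a state to one word and re-encoding it discards information that a direct latent handoff keeps.

```python
import math

# Hypothetical toy vocabulary; real agents use a learned tokenizer.
VOCAB = ["plan", "check", "solve", "done"]


def think(state):
    """Stand-in for one agent's forward pass (an arbitrary fixed transform)."""
    return [math.tanh(0.5 * x + 0.1 * i) for i, x in enumerate(state)]


def decode_to_text(state):
    """Text handoff, step 1: collapse the state to the single best-scoring word."""
    best = max(range(len(state)), key=lambda i: state[i])
    return VOCAB[best]


def encode_from_text(word):
    """Text handoff, step 2: the receiver re-encodes the word as a one-hot vector."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def text_handoff(state):
    # Decode to a word, then re-encode: only the winning word survives.
    return encode_from_text(decode_to_text(think(state)))


def latent_handoff(state):
    # Pass the undecoded numbers to the next agent unchanged.
    return think(state)


start = [0.2, -0.4, 0.9, 0.1]
via_text = text_handoff(start)
via_latent = latent_handoff(start)

print("received via text:  ", via_text)
print("received via latent:", [round(x, 3) for x in via_latent])
```

In this toy, the text path delivers a one-hot vector while the latent path delivers all four graded values. A real system would pass model hidden states rather than hand-written lists.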
The number of AI agents online is increasing rapidly, though the technology is still rough and improving fast.
Agent coordination is difficult, leading to errors like hallucinating an airport 400 miles away and booking non-refundable rooms.
Proposes passing raw undecoded numbers (latent states) between agents instead of text, calling it 'cross-agent latent state transfer'.
On competition-level math problems, accuracy improved from 73% to 86% using sub-10-billion-parameter models, with token usage down 75%.
Training cost was only $4, making it extremely cheap.
Controlled comparison showed the new architecture outperforms others even with the same teacher model, confirming that latent transfer works.
Tests were on small models; scaling to larger ones is unknown. Optimal latent thought length is about 80 steps. This is early research; code and models are available for free.
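The headline figures above can be unpacked with a little arithmetic. The sketch below uses only the numbers quoted in the video (73%, 86%, and a 75% token drop); the variable names are illustrative. It shows that the gain is 13 percentage points, about 18% in relative terms, that the error rate falls by roughly 48%, and that a 75% drop means one quarter of the original tokens.

```python
# Figures quoted in the video summary.
acc_before, acc_after = 0.73, 0.86
token_drop = 0.75

# Absolute gain in percentage points.
point_gain = round((acc_after - acc_before) * 100)

# Gain relative to the starting accuracy.
relative_gain = (acc_after - acc_before) / acc_before

# Share of previously wrong answers that are now right.
err_before, err_after = 1 - acc_before, 1 - acc_after
error_reduction = (err_before - err_after) / err_before

# How many times fewer tokens are used after a 75% drop.
token_ratio = 1 / (1 - token_drop)

print(f"gain: {point_gain} points, {relative_gain:.1%} relative")
print(f"error rate cut by {error_reduction:.1%}")
print(f"tokens: {token_ratio:.0f}x fewer")
```

The error-rate view is often the more informative one here: going from 27% wrong to 14% wrong is close to halving the mistakes.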
"The title accurately reflects the key insight—latent state transfer as a 'better language'—though it slightly oversells the maturity of the research."
What is the name of the method where AI agents pass raw numerical data instead of text?
Cross-agent latent state transfer
3:07
By what percentage did token usage drop in the latent state transfer method?
75%
3:57
How much did it cost to train the latent state transfer agents?
$4
4:17
What was the accuracy improvement on competition-level math problems using latent state transfer?
From 73% to 86%
3:34
What is the optimal 'latent thought length' per round?
80 steps
5:55
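The round structure and the step limit from the quiz answers can be sketched as a loop. This is a minimal illustration under stated assumptions, not the paper's code: the planner, critic, and solver roles follow the video's description of the three agents, while `latent_step`, the hard clamp at 80, and every function name are hypothetical. The video only says that about 80 steps is optimal and that more adds little value.

```python
MAX_LATENT_STEPS = 80  # the "about 80 steps" figure quoted in the video


def latent_step(state):
    """Hypothetical stand-in for one latent thinking step."""
    return [0.9 * x + 0.1 for x in state]


def run_agent(state, steps):
    """Run one agent for at most MAX_LATENT_STEPS steps (clamp is an assumption)."""
    steps = min(steps, MAX_LATENT_STEPS)
    for _ in range(steps):
        state = latent_step(state)
    return state, steps


def run_rounds(state, roles, rounds, steps_per_agent):
    """Pass the latent state through every role, once per round."""
    log = []
    for round_no in range(1, rounds + 1):
        for role in roles:
            state, used = run_agent(state, steps_per_agent)
            log.append((round_no, role, used))
    return state, log


final_state, log = run_rounds(
    [0.0, 1.0],
    ["planner", "critic", "solver"],
    rounds=3,
    steps_per_agent=120,  # deliberately above the cap
)

for entry in log:
    print(entry)
```

Requesting 120 steps per agent shows the clamp in action: every log entry records 80 steps used, across three roles and three rounds.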
Latent state transfer
Introduces a revolutionary concept for AI agent communication that bypasses human language.
3:07
Token usage reduction
75% drop in tokens highlights major efficiency gains for multi-agent systems.
3:57
Ultra-low training cost
Demonstrates that small models can achieve significant improvements for just $4.
4:17
Optimal latent thought length
Identifies a practical limit (80 steps) that guides how much reasoning agents can do per round.
5:55
Multi-agent coordination failures
Illustrates real-world problems when agents communicate in human language.
1:04
[00:00] The number of AI agents on the internet
[00:02] is increasing at such an insane rate. I
[00:06] don't think I've seen anything like
[00:07] this. This is crazy. And this is an area
[00:10] that is quite new, and the technology is
[00:12] still pretty rough. Improving rapidly,
[00:15] but pretty rough. And the promise of
[00:17] agents is incredible. It would book the
[00:19] cheapest plane ticket for you, or run 24
[00:22] hours a day to manage your schedule,
[00:24] submit insurance claims, continuously
[00:27] scan a codebase for vulnerabilities and
[00:29] patch it. Well, this is the good, but at
[00:31] the same time, you get so many news
[00:34] headlines about spam, security issues,
[00:36] and system breakdowns. And it gets even
[00:39] tougher when you have not one agent, but
[00:42] multiple agents. Imagine two agents
[00:45] organizing a holiday for you. The flight
[00:47] agent hallucinates a cheaper airport 400
[00:51] miles away from your real destination.
[00:53] Then, the hotel agent says, "Let's book
[00:56] something super cheap nearby." Well,
[00:59] super cheap is often non-refundable. And
[01:02] now congratulations.
[01:04] You now have a non-refundable room you
[01:07] will never see.
[01:09] And so many of these problems come from
[01:11] the fact that agent coordination is
[01:13] super difficult. Now, check out what
[01:16] this paper says we should do. Here is a
[01:18] math problem. First agent writes a plan.
[01:21] The next one critiques it, and the third
[01:24] one solves the problem. And at this
[01:26] point, I said, "Okay.
[01:28] I see nothing interesting here. This is
[01:31] what everyone does with agents." Yes,
[01:34] but here's the key. Most agents
[01:36] communicate a bit like we do, in words.
[01:40] Wait a second. Why should we do that?
[01:42] Look at this neural interface for
[01:45] brain-to-text communication. Yes, this
[01:48] really works. You just think about a
[01:50] letter in the alphabet, and it magically
[01:53] appears. And if you keep doing this a
[01:55] lot, you start asking: the alphabet is
[01:58] optimized for writing.
[02:00] Why use that? Why not use one that is
[02:03] optimized for thinking? And what would
[02:05] that even look like? Hint, it would look
[02:08] like this. We talked about this 500
[02:10] videos ago, paper in the description.
[02:13] Now, if you look at the agents, the
[02:15] first one does some work, packs it up,
[02:17] and passes it to the next one. So do the
[02:20] second and the third ones. Every
[02:22] time an agent wants to
[02:24] communicate something, it has to write
[02:26] out full sentences, decode tokens one by
[02:30] one, and the next guy has to read and
[02:33] re-encode the whole thing.
[02:36] Why are we doing that? Who said they
[02:38] should talk in plain English? And this
[02:41] is the part where I fell off the chair.
[02:43] Now, hold on to your papers, fellow
[02:44] scholars, because this work says, "Huh,
[02:47] forget English. You know what? Forget
[02:49] letters entirely." It says, "Instead,
[02:52] let's link up their brains." Kind of.
[02:56] Instead of using English words, they
[02:58] pass raw undecoded numbers directly to
[03:01] the next agent. Send raw brain signals,
[03:04] if you will. Call it cross-agent latent
[03:07] state transfer. So, the theory is that
[03:10] these three agents can work together
[03:12] round one, round two, and round three
[03:15] much cheaper than the text-based agents.
[03:18] They refine an answer, and you get
[03:20] better answers with the same amount of
[03:22] computation. So, is it better? Hmm,
[03:26] let's see. Dear fellow scholars, this is
[03:28] Two Minute Papers with Dr. Károly
[03:30] Zsolnai-Fehér. Well, when given
[03:32] competition-level math questions, it
[03:34] goes from 73% to 86%.
[03:38] That is crazy.
[03:40] We are talking free sub-10 billion
[03:42] parameter models, not expensive frontier
[03:45] systems. And here is where it gets the
[03:48] Michelin star status. Look at that.
[03:52] Ooh.
[03:53] Token usage down 75%.
[03:57] They all evaporated into the latent
[04:00] space. Loving it. So, this can improve
[04:03] smaller systems to be in striking
[04:05] distance of much bigger, more expensive
[04:08] models on difficult math problems. So, I
[04:11] bet it costs a fortune to train, right?
[04:14] Well, look at that.
[04:17] Four bucks. Basically, you spend your
[04:19] coffee money on these agents and in
[04:21] return they punch a hole in space-time.
[04:25] Love it. Additionally, it might even
[04:27] unlock... Wait, wait, wait. I shouldn't say
[04:30] unlock. That's AI speak. So, it might
[04:33] give us a new scaling law. More rounds,
[04:37] better results. And at this point, I
[04:39] thought we might have a deadly flaw
[04:41] here. And it's really subtle. So, the
[04:44] training for each agent's role is
[04:47] written by a giant AI model. So, if they
[04:50] perform well, you have to ask, are
[04:53] things better because of the brain
[04:55] linking or is it good distillation from
[04:58] an excellent teacher? So, which one is
[05:01] it? A good teacher or a good
[05:03] architecture? Well, fellow scholars, we
[05:06] are in luck. This is a really good
[05:08] paper. So, the scientists thought about
[05:11] this too. And look, goodness, a
[05:14] controlled comparison gives the same
[05:16] teacher to other architectures and this
[05:19] one. And the new one still outperforms.
[05:22] So, yes, the brain linking really works.
[05:26] What a time to be alive. Okay, now,
[05:29] let's not get too excited. This is
[05:31] Two Minute Papers, and we respect the
[05:33] science here. Limitations. One, tests
[05:36] were on smaller models. We don't yet
[05:39] know how these insights scale up to
[05:41] bigger ones. If they don't, then this
[05:44] puts small models on steroids.
[05:47] Still good. If yes, potential huge
[05:50] game-changer. Two, there is an optimal
[05:52] latent thought length, and that is about
[05:55] 80 steps. This is somewhat of a limit on
[05:58] how much thinking an agent can do per
[06:01] round.
[06:02] I am thinking, you know, if it solves
[06:04] mathematical Olympiad problems already,
[06:07] how bad can that be? And sure enough,
[06:10] after 80, you don't get a lot of value
[06:12] anyway, but I wanted to mention it.
[06:14] Okay? So, code and models are available
[06:17] for free. Note that this is still very
[06:20] rough, very early, but it shows
[06:22] potential. And this is still research.
[06:25] Please do not think you just plug this
[06:27] in and everything will fly immediately.
[06:30] We need new tools for the era of LLMs,
[06:33] and Weights & Biases now has Weave, a
[06:36] lightweight toolkit to confidently
[06:38] iterate on LLM applications. Use traces
[06:41] to debug how data flows through each
[06:43] step of your app, and use evaluations to
[06:46] measure your progress. It is the best.
[06:49] Try it out now at wandb.me/papers,
[06:53] or click the link in the description
[06:55] below.