---
title: 'Scientists Found A Better Language For AI Agents'
source: 'https://youtube.com/watch?v=dUmT0OIGoqE'
video_id: 'dUmT0OIGoqE'
date: 2026-06-28
duration_sec: 417
---

# Scientists Found A Better Language For AI Agents

> Source: [Scientists Found A Better Language For AI Agents](https://youtube.com/watch?v=dUmT0OIGoqE)

## Summary

The video discusses the rapid growth of AI agents on the internet and their promise to automate tasks like booking flights or managing schedules. It highlights a key problem: agents typically communicate in human language, so each agent has to decode its output token by token and the next agent has to read and re-encode it, which is costly and error-prone. The proposed solution is 'cross-agent latent state transfer,' where agents pass raw, undecoded numbers (latent states) instead of text, which improved accuracy and cut token usage in the reported tests.
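
To make the difference between the two handoff styles concrete, here is a minimal toy sketch in NumPy. It is not the paper's architecture: the embedding table `embed`, the single-layer `agent_think`, and the helpers `text_handoff` and `latent_handoff` are hypothetical names invented for illustration, and all sizes are arbitrary. It only shows that decoding to discrete tokens and re-embedding changes what the next agent receives, while a latent handoff delivers the hidden states unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM = 50, 16                     # toy sizes, chosen only for illustration
embed = rng.normal(size=(VOCAB, DIM))   # hypothetical embedding table shared by both agents


def agent_think(inputs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for an agent's forward pass: input vectors -> hidden states."""
    return np.tanh(inputs @ weights)


def text_handoff(hidden: np.ndarray) -> np.ndarray:
    """Decode each hidden state to one token id, then re-embed it for the next agent."""
    token_ids = np.argmax(hidden @ embed.T, axis=-1)  # one discrete id per step
    return embed[token_ids]                           # what the next agent reads


def latent_handoff(hidden: np.ndarray) -> np.ndarray:
    """Pass the undecoded hidden states straight to the next agent."""
    return hidden


w_a = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)  # toy weights for agent A
prompt = embed[rng.integers(0, VOCAB, size=5)]    # five input token embeddings
hidden_a = agent_think(prompt, w_a)

via_text = text_handoff(hidden_a)
via_latent = latent_handoff(hidden_a)

# The latent path preserves agent A's states; the text path replaces them
# with the embeddings of whichever tokens scored highest.
print("latent handoff unchanged:", np.allclose(via_latent, hidden_a))
print("text handoff distance:   ", float(np.linalg.norm(via_text - hidden_a)))
```

In a real system the receiving agent would also need to be trained to consume another agent's latent states; this sketch leaves that part out entirely.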

### Key Points

- **Explosion of AI agents** [0:00] — The number of AI agents online is increasing rapidly, though the technology is still rough and improving fast.
- **Multi-agent coordination problems** [0:31] — Agent coordination is difficult, leading to errors like hallucinating an airport 400 miles away and booking non-refundable rooms.
- **Solution: latent state transfer** [2:56] — Proposes passing raw undecoded numbers (latent states) between agents instead of text, calling it 'cross-agent latent state transfer'.
- **Performance improvement** [3:34] — On competition-level math problems, accuracy improved from 73% to 86% using sub-10 billion parameter models, with token usage down 75%.
- **Low cost** [4:17] — Training cost only $4.
- **Controlled validation** [5:11] — Controlled comparison showed the new architecture outperforms others even with the same teacher model, confirming that latent transfer works.
- **Limitations** [5:29] — Tests were on small models; whether the gains carry over to larger ones is unknown. There is an optimal latent thought length of about 80 steps, beyond which extra steps add little value. This is early research; code and models are available for free.
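
As a quick sanity check on what the reported figures imply, the arithmetic can be sketched as below. The 73%, 86%, and 75% values come from the video; the function name `summarize_gains` and the 10,000-token baseline are assumptions made up for illustration, and the video does not say exactly which baseline the token reduction is measured against.

```python
def summarize_gains(baseline_acc: float, new_acc: float,
                    token_reduction: float, baseline_tokens: int) -> dict:
    """Derive a few headline ratios from the accuracy and token figures."""
    error_before = 1.0 - baseline_acc
    error_after = 1.0 - new_acc
    return {
        # Absolute gain in percentage points.
        "accuracy_gain_points": (new_acc - baseline_acc) * 100,
        # Share of previously wrong answers that are now right.
        "relative_error_reduction": (error_before - error_after) / error_before,
        # Tokens left after the reduction, for a hypothetical baseline budget.
        "tokens_after": baseline_tokens * (1.0 - token_reduction),
        # How many times fewer tokens are used.
        "token_ratio": 1.0 / (1.0 - token_reduction),
    }


# Figures from the video; baseline_tokens is a hypothetical budget.
stats = summarize_gains(0.73, 0.86, 0.75, baseline_tokens=10_000)
print(f"accuracy gain: {stats['accuracy_gain_points']:.0f} points")
print(f"errors removed: {stats['relative_error_reduction']:.0%}")
print(f"tokens: 10000 -> {stats['tokens_after']:.0f} ({stats['token_ratio']:.0f}x fewer)")
```

Read this way, a 13-point gain from a 73% starting point removes roughly half of the remaining errors, and a 75% token reduction means about four times fewer tokens.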

## Transcript

The number of AI agents on the internet
is increasing at such an insane rate. I
don't think I've seen anything like
this. This is crazy. And this is an area
that is quite new, and the technology is
still pretty rough. Improving rapidly,
but pretty rough. And the promise of
agents is incredible. It would book the
cheapest plane ticket for you, or run 24
hours a day to manage your schedule,
submit insurance claims, continuously
scan a codebase for vulnerabilities and
patch it. Well, this is the good, but at
the same time, you get so many news
headlines about spam, security issues,
and system breakdowns. And it gets even
tougher when you have not one agent, but
multiple agents. Imagine two agents
organizing a holiday for you. The flight
agent hallucinates a cheaper airport 400
miles away from your real destination.
Then, the hotel agent says, "Let's book
something super cheap nearby." Well,
super cheap is often non-refundable. And
now congratulations.
You now have a non-refundable room you
will never see.
And so many of these problems come from
the fact that agent coordination is
super difficult. Now, check out what
this paper says we should do. Here is a
math problem. First agent writes a plan.
The next one critiques it, and the third
one solves the problem. And at this
point, I said, "Okay.
I see nothing interesting here. This is
what everyone does with agents." Yes,
but here's the key. Most agents
communicate a bit like we do, in words.
Wait a second. Why should we do that?
Look at this neural interface for
brain-to-text communication. Yes, this
really works. You just think about a
letter in the alphabet, and it magically
appears. And if you keep doing this a
lot, you start asking. The alphabet is
optimized for writing.
Why use that? Why not use one that is
optimized for thinking? And what would
that even look like? Hint, it would look
like this. We talked about this 500
videos ago, paper in the description.
Now, if you look at the agents, the
first one does some work, packs it up,
and passes it to the next one. So do the
second and the third ones. Every
time an agent wants to
communicate something, it has to write
out full sentences, decode tokens one by
one, and the next guy has to read and
re-encode the whole thing.
Why are we doing that? Who said they
should talk in plain English? And this
is the part where I fell off the chair.
Now, hold on to your papers, fellow
scholars, because this work says, "Huh,
forget English. You know what? Forget
letters entirely." It says, "Instead,
let's link up their brains." Kind of.
Instead of using English words, they
pass raw undecoded numbers directly to
the next agent. Send raw brain signals,
if you will. Call it cross-agent latent
state transfer. So, the theory is that
these three agents can work together
round one, round two, and round three
much cheaper than the text-based agents.
They refine an answer, and you get
better answers with the same amount of
computation. So, is it better? Hmm,
let's see. Dear fellow scholars, this is
Two Minute Papers with Dr. Károly
Zsolnai-Fehér. Well, when given
competition-level math questions, it
goes from 73% to 86%.
That is crazy.
We are talking free sub-10 billion
parameter models, not expensive frontier
systems. And here is where it gets the
Michelin star status. Look at that.
Ooh.
Token usage down 75%.
They all evaporated into the latent
space. Loving it. So, this can improve
smaller systems to be in striking
distance of much bigger, more expensive
models on difficult math problems. So, I
bet it costs a fortune to train, right?
Well, look at that.
Four bucks. Basically, you spend your
coffee money on these agents and in
return they punch a hole in space-time.
Love it. Additionally, it might even
unlock... Wait, wait, wait. I shouldn't say
unlock. That's AI speak. So, it might
give us a new scaling law. More rounds,
better results. And at this point, I
thought we might have a deadly flaw
here. And it's really subtle. So, the
training for each agent's role is
written by a giant AI model. So, if they
perform well, you have to ask, are
things better because of the brain
linking or is it good distillation from
an excellent teacher? So, which one is
it? A good teacher or a good
architecture? Well, fellow scholars, we
are in luck. This is a really good
paper. So, the scientists thought about
this too. And look, goodness, a
controlled comparison gives the same
teacher to other architectures and this
one. And the new one still outperforms.
So, yes, the brain linking really works.
What a time to be alive. Okay, now,
let's not get too excited. This is
Two Minute Papers and we respect the
science here. Limitations. One, tests
were on smaller models. We don't yet
know how these insights scale up to
bigger ones. If they don't, then this
puts small models on steroids.
Still good. If yes, potential huge
game-changer. Two, there is an optimal
latent thought length, and that is about
80 steps. This is somewhat of a limit on
how much thinking an agent can do per
round.
I am thinking, you know, if it solves
mathematical Olympiad problems already,
how bad can that be? And sure enough,
after 80, you don't get a lot of value
anyway, but I wanted to mention it.
Okay? So, code and models are available
for free. Note that this is still very
rough, very early, but it shows
potential. And this is still research.
Please do not think you just plug this
in and everything will fly immediately.
We need new tools for the era of LLMs,
and Weights & Biases now has Weave, a
lightweight toolkit to confidently
iterate on LLM applications. Use traces
to debug how data flows through each
step of your app, and use evaluations to
measure your progress. It is the best.
Try it out now at wandb.me/papers,
or click the link in the description
below.