---
title: 'What Happens After A 1,000,000x AI Compute Leap? | Jeff Dean'
source: 'https://youtube.com/watch?v=yz6I23VRbdg'
video_id: 'yz6I23VRbdg'
date: 2026-06-28
duration_sec: 0
---

# What Happens After A 1,000,000x AI Compute Leap? | Jeff Dean

> Source: [What Happens After A 1,000,000x AI Compute Leap? | Jeff Dean](https://youtube.com/watch?v=yz6I23VRbdg)

## Summary

In this interview, Jeff Dean, Google's Chief Scientist, discusses the challenges and future of AI, including the shift towards inference-heavy workloads, the potential of extremely low-precision hardware, and the importance of continual learning. He also shares insights on data center reliability, the role of distillation in open-source models, and his excitement about multi-agent systems and infinite context windows.

### Key Points

- **Introduction: Jeff Dean's background** [00:00] — Jeff Dean is Google's chief scientist, co-creator of MapReduce and TensorFlow, and former lead of Google Brain. The opening teaser previews the later discussion of data center failures and cosmic-ray bit flips.
- **Training data shortage is exaggerated** [03:19] — Training data is not running out; there is still much video data and synthetic data potential, along with algorithmic improvements to extract more from existing data.
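- **Shift from training to inference compute** [06:22] — The host cites a claim that roughly 90% of modern data center work is no longer training. Dean notes that data centers also run many non-ML applications, but among machine learning workloads training is a shrinking share while inference (RL rollouts, user requests, agents) grows. That motivates inference-specialized, low-precision hardware such as Google's TPU 8i and 8T chips, and even FP4 turns out to work.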
- **Merging pre-training and post-training** [09:12] — Interleaving observation (pre-training) and action (learning from consequences) could lead to more capable models, though safety (red teaming) remains a challenge.
- **Future compute leaps enable autonomous engineering** [12:10] — With a millionfold compute increase over 10 years, multi-agent systems could design complex artifacts (e.g., airplanes) in days instead of years, as shown by AI autonomously building an OS running Doom.
- **Distillation's role in open-source AI** [15:01] — Open-source models rely heavily on distillation from larger frontier models; without new frontier models, open progress would slow.
- **Continual learning and infinite context windows** [27:29] — Continual learning remains an unsolved problem, and efficient context windows for billion-token inputs would enable 'lifetime AI' systems.

## Transcript

There used to be a chat group internally
called data centers on fire that would
have like exciting uh exciting events
happening.
>> A distant supernova goes off, a cosmic
ray hits a memory cell and a zero flips
to a one. Does that really happen?
>> Oh yeah.
>> So my question is do you enjoy these
Chuck Norris style jokes about you?
>> It could be true. Um, one problem that
you tried to solve many times but
have never been able to crack.
I cannot believe that this is happening
but I got to talk to a legendary
engineer the chief scientist of Google
Jeff Dean. He led Google Brain, one of
the most legendary AI labs in history.
He co-created MapReduce, which taught
thousands of computers to work together
as one. He co-built TensorFlow, the
engine behind a huge chunk of AI
research. And for all this, they call
him the Chuck Norris of computer
science. Yes, I will tell him a joke
about that too. Now, when I see
interviews with these executives,
everyone is asking about China and taxes
and all that. Look, I know nothing about
that. I am just a student who loves to
talk about research. So, my goal was to
try to go a bit deeper and ask him
questions that maybe only he knows the
answer to, which is incredible. I'll
also ask him about problems that even he
couldn't solve yet. And I will ask him
about some of the secret sauce at Google
and see if we get something and more.
And I am so happy to share it with you
fellow scholars so we can learn
together. I am not sure if I saw Jeff
smile and laugh this much before. So, I
hope he enjoyed it too. And once again,
this is an incredible honor. I cannot
believe that I was sitting there. There
were some production issues with the
video part. I apologize for those. Also,
I was super nervous. I could barely hold
on to my papers. Now, fellow scholars,
let's learn together with Jeff Dean.
Thank you so much for doing this, Jeff.
We talked a bit last year and I learned
so much from you. It was incredible. And
then I got a message that we we get to
do this and I was so happy.
>> So thank you so much for this and and we
get to share your knowledge.
>> A small part of your knowledge with with
the fellow scholars. So that's that's
absolutely
>> it was great chatting with you last
year. I'm looking forward to this.
>> Thank you. Thank you. So everyone says
that we are running out of training data
for LLMs, but you you said that there is
still plenty of data out there.
>> What did you mean?
>> Yeah, I mean I think everyone has this
view that uh we're running out of
training data and um it's true we've
like used quite a lot of of the public
text data in the world. Um but I think
there's lots of interesting video data
that we're not really training on yet.
uh there's lots of interesting kind of
um ways to generate synthetic data and
then use that for training
>> and then I also think we can start doing
things like uh making more passes over
the data that we do have to make more
and more capable models and also come up
with algorithmic techniques that enable
us to get a lot more information from
every piece of data that we do have. So
I'm not too worried about that as like
an impediment to making progress. It
seems like there's lots and lots of
things we can do.
>> People also say that
with so much simulation data as as you
mentioned sooner or later most of the
data will be AI generated which is then
used to train a different AI and then
suddenly everyone starts to you know
learn on the same thing but you said
wait it still helps I think the argument
was that uh if you have enough compute
you can crunch through a lot of data and
if there is just a little needle in the
haystack that's useful the system is
able to learn from it. Is that true?
because my previous crappy little
experiment uh it it was not true at all.
So you had to be very careful with the
data.
>> Yeah. I mean I think it is true in
general. I mean there's a lot of details
to get right to make this a reality.
Think about for example doing RL
training and rollouts to uh you know
figure out how to solve some fairly
high-level phrased uh coding question, right?
So you might explore a hundred or a
thousand different ways of generating
solutions to these problems and you
might have some, you know, some filters
that you apply to these things like does
the code even compile? Well, you can
throw out 800 of them right off the bat.
>> Uh does it pass the unit tests? Does it
like perform well? And so you can really
start to hone in on like which of these
you know potentially many solutions to
the problem is the one that actually
sort of generates the highest you know
characteristics that you're looking for
the reward in some sense
>> and that I think is is definitely true
like more compute will generate you more
interesting solutions and then those can
then be put into the training data they
can be enriched with like data
augmentation techniques you know I
generated the solution in Python
>> now I could generate a solution in Go
and have more Go programming language
training data.
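
The filtering loop described above (sample many candidate programs, throw out the ones that do not compile, keep the ones that pass the unit tests, then pick the one with the highest reward) can be sketched roughly as follows. This is a minimal illustration, not Google's pipeline: the `solve` entry point, the `(input, expected)` test format, and the length-based reward are hypothetical stand-ins, and a real system would run candidates in a sandbox rather than calling `exec` directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    source: str  # Python source text of one sampled solution


def compiles(candidate: Candidate) -> bool:
    """Cheapest filter: reject candidates that are not even valid Python."""
    try:
        compile(candidate.source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def passes_tests(candidate: Candidate, tests: List[Tuple[Any, Any]]) -> bool:
    """Run the (hypothetical) `solve` function on (input, expected) pairs."""
    namespace: Dict[str, Any] = {}
    try:
        exec(candidate.source, namespace)  # assumption: unsafe outside a sandbox
        solve = namespace["solve"]
        return all(solve(arg) == expected for arg, expected in tests)
    except Exception:
        return False


def best_candidate(
    candidates: List[Candidate],
    tests: List[Tuple[Any, Any]],
    reward: Callable[[Candidate], float],
) -> Optional[Candidate]:
    """Apply the filters in order of cost, then keep the highest-reward survivor."""
    survivors = [c for c in candidates if compiles(c) and passes_tests(c, tests)]
    return max(survivors, key=reward, default=None)


rollouts = [
    Candidate("def solve(x) return x * 2"),                   # does not compile
    Candidate("def solve(x):\n    return x + 2"),             # compiles, fails tests
    Candidate("def solve(x):\n    return x * 2"),             # passes
    Candidate("def solve(x):\n    y = x\n    return y + x"),  # passes, but longer
]
tests = [(1, 2), (3, 6)]
# Hypothetical reward: prefer shorter source among passing candidates.
best = best_candidate(rollouts, tests, reward=lambda c: -len(c.source))
```

The surviving `best` candidate is the kind of sample that could then be added back to the training data, or translated into another language as a form of augmentation.
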
>> That's like an incredible kind of
augmentation like augmentation before
with convolutional neural networks, you
know, it was just just shift the image
by a couple pixels and whatnot and here
the augmentation can be like completely
different programming language and and
whatnot.
>> Yeah, I mean I think you know a lot of
times we think about coding based
problems as you go from natural language
which is
>> often very underspecified. It's like you
know make me a cool Space Invaders game
or something. Um, but actually if you
have a program that already works that
does what you want and you want to
translate it, that's awesome because in
effect your prompt is the fully
specified behavior of the system you
want and you just want it in a different
language for whatever reason. Maybe
better performance or better safety
characteristics or whatever. So that
we've seen internally with some tools
that have been written in Python and
people have been able to sort of just
say
>> please use all the tests for this code
and the actual Python codebase and make
different versions of it and found you
know much faster solutions.
>> So you can you can suddenly get so much
more out of the same amount of data
basically.
>> Yeah. So that's that's why you're not
worried about the data. Okay. Nice. Now
Bill Dally has said that something like
90% of what happens in modern data
centers is not training anymore, which I
found really surprising. It's
inference. Like, there's less
training and more using, relatively
speaking.
>> Um how does that shift the way you
design hardware at Google?
>> Yeah, I mean,
first, there's a lot of other things
that are not either inference or
training that happen in data centers
like all the applications we run and
search and Gmail and so on. But of the
sort of machine learning workloads you
know, it is the case that training
uh is becoming, you know, a smaller proportion
of the overall compute that we want to
do because there's so much you know
inference workload you want to do and
the inference workload includes both
like offline inference, uh, sort of RL
rollouts during RL training uh and then
also online inference for handling user
requests or agent-based behavior.
Because of that shift and the different
characteristics of those two kinds of
computations, it makes a ton more sense
to now specialize much more for
inference workloads in hardware for
example. Um because the characteristics
are quite different. You need lower
precision. You
>> you know are handling a very large
volume of requests on this particular
model. The model weights don't
necessarily change uh at inference time.
Um all these things lead to very
different solutions for hardware and
much more energy efficiency can be
gained by specializing and so I think
you'll see a lot more in this area uh
you know now and in the future. We've
already done this with our TPU uh 8i and
8T chips that we announced a couple um
maybe a month ago.
>> Um but you'll see even more
specialization I think.
>> And that's pretty crazy that you said
that even FP4 kind of works. And I when
I first heard it I was like it cannot
possibly work. It cannot do anything useful,
and it does.
>> Yeah. If you told that to a computer
scientist from 15 years ago, they'd be
like,
>> that's not enough numbers.
>> Yeah. Yeah. Exactly.
>> And I look at every now and then at
these papers and you you you have these
these different transforms that are the
the the distance preserving transforms,
rotations between the points and all
kinds of compression. But still FP4,
that's unbelievable. It's not many bits
for exponent or mantissa or sign
>> and it and it and and it's high quality,
you know, intelligence that comes out of
it. So, it's just
>> it's a good sign that it works.
>> Yeah. Yeah. But I I I don't know if we
can get lower. Uh what what do you think
like even lower?
>> Possible. I mean I think um you know
people are seeing and experimenting with
things where you have some even lower
precision and then every so many
weights of that you know lower precision
you have a scaling factor and that seems
like you get a little bit of a higher
precision thing that's kind of shared
across all the other lower-bit
precision, uh, formats, whatever they might
be: two-bit integer, one-bit integer. You
know I haven't heard anyone say two bit
float because I'm not sure what that
would mean
>> but um yeah I
that plus a scaling factor seems to be
able to get you pretty far.
>> And the question is like how often do
you need the scaling factor? Is it every
64 or 128 or 256 weights?
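
The idea of very low-precision weights that share one higher-precision scaling factor per block can be sketched as below. This is a simplified illustration using signed integers and a symmetric per-block scale, not FP4 and not any particular TPU format; the function names, the default block size of 64, and the 16-bit scale width are assumptions chosen to mirror the numbers mentioned in the conversation.

```python
from typing import List, Tuple

Block = Tuple[List[int], float]  # (low-precision integer codes, shared scale)


def quantize_block(weights: List[float], bits: int = 4) -> Block:
    """Quantize one block of weights to signed integers with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def quantize(weights: List[float], block_size: int = 64, bits: int = 4) -> List[Block]:
    """Split the weights into blocks; each block gets its own scaling factor."""
    return [
        quantize_block(weights[i:i + block_size], bits)
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Recover approximate weights: integer code times the block's scale."""
    return [code * scale for codes, scale in blocks for code in codes]


def bits_per_weight(bits: int, block_size: int, scale_bits: int = 16) -> float:
    """Average storage cost once the shared scale is amortized over the block."""
    return bits + scale_bits / block_size
```

Under these assumptions, a 4-bit code with a 16-bit scale every 64 weights costs `bits_per_weight(4, 64)` = 4.25 bits per weight, while a scale every 256 weights costs 4.0625. Smaller blocks track outliers better at the price of more overhead, which is the trade-off behind the "every 64 or 128 or 256 weights" question.
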
>> Pre and post training are typically
separate steps today. Do you see that
split holding or do you expect the two
to merge as as capabilities increase?
>> Yeah, I mean I feel like it's a little
intellectually dissatisfying that they
are these distinct phases and you do one
and then you do the other. it like
conceptually the right uh thing to do is
to have interleaved periods where you're
sort of observing data and then periods
where you're trying to use that new
knowledge you've gotten from the data
you
>> like like with DQN this experience
replay kind of thing
>> yeah and then you want to now take
actions in some environment maybe it's a
simulated environment maybe it's the
world with a robot or whatever it is and
then you know learn from those actions
because I think you get a lot more
benefit from actually um taking actions
and observing the consequences or trying
to write code and seeing does the code
work
>> than you do from just passively sitting
there and seeing tokens streamed by you
which is really what most of
pre-training is these days.
>> It's really
interesting that you say that, in an
interleaved manner, because when I
hear merging the two, what comes to my
mind is continuous, like continuous
learning
>> but at the same time people have to test
models you cannot just chuck it out
there you know you finish training you
finish the post-training and then maybe the
red teaming steps and and and you know
safety and everything and then you
package it up and you say okay this is
good to go but if there's continuous
learning then there are new challenges
because how do you know that this
>> intermediate state is actually safe.
Maybe some more research there too.
>> Yeah, I mean I think uh first like a
bunch of discrete steps where maybe you
do this a 100 times or a thousand times
starts to look more like an integral
than a summation.
>> Um and so um
>> I do think interleaving in that way will make
sense
>> but you're right
>> like you have a bunch of things you need
to do for a live model that is serving
user requests. You need to make sure
that it's safe. Um so it may be that the
continual learning happens and then
there's some uh application of uh you
know safety protocols and red teaming as
you say uh and then you release a new
version of that but then that model
still continues to learn kind of behind
the scenes and then before the newest
version of it is is provided to users
you redo the sort of final safety
testing and red teaming.
>> Jensen likes to
say that compute capabilities advanced 1
millionx over the last 10 years. So if
in the next 10 years, assuming we get
another 1 millionx, what would we be
able to do that we cannot do now?
>> Yeah. I mean it's like imagining the
future is always a hard thing because
this field is moving quickly.
>> I mean I think if you think back, you
know, 10, it was 10 years.
>> 10 years.
>> 10 years. If you think back 10 years,
you know, we were kind of just starting
to have language models that were the
sequence to sequence paper had appeared.
You know, it was just before the
transformer.
>> LSTMs, maybe
>> LSTMs were were popular.
>> Um, and now those models sort of look uh
>> ancient and not nearly as
capable as the models we have today. So,
I think if you project forward that
level of advancement, you're going to
see
>> huge investments in both like new kinds
of hardware
um you know new kinds of research
techniques uh there's just a lot more
attention being paid to the field. So I
I see that progress rate not slowing
down um over the next 10 years. And so
that's going to be incredible like the
multi-agent workflows we're now able to
start to
>> kind of get to work on very complicated
tasks like you saw in the I/O uh keynote
>> being able to write an operating system
>> autonomously with a relatively simple
prompt.
>> Crazy. uh you know obviously there's a
lot of operating systemy like things in
the training data so it's not completely
out of distribution but you know the
fact that it's able to build an OS that
can run Doom uh successfully is is
pretty amazing
>> I couldn't couldn't believe it I mean
last year I heard a talk from Stephen
Balaban, the Lambda CEO,
>> and he had this neural OS like hey you
know it it does more and more like like
forget the UI forget forget the maybe
the drivers I don't know but but just
let's let's have a neural OS and I was
like, "Yeah, that that sounds like an
amazing science fiction idea. I would
love to see it, but I don't know. I
mean, it sounds far off." A year later
and we got you, you know, not exactly
like that. I know but but if if you look
at the derivatives over time
>> I mean I would say one thing I'm
particularly excited about is
you know can we with these tools
accomplish so much more in you know
science Demis was mentioning in the
keynote or in you know complicated
engineering tasks that often would take
you know lots and lots of people
multiple years to accomplish. Could you
actually have a system that with the
correct access to the right kinds of
simulation environments and a learning
set of agents that are trying to
accomplish the task and break it down
into smaller tasks,
>> could you design an airplane in, you
know, five days instead of, you know,
many many years? That would be amazing.
>> 1 millionx and we we can we can try
again.
>> Yeah. I mean, we're not there yet, but
that would be a pretty pretty amazing
capability. Or designing new new
computer chips or computer systems, new
hardware. Um, you know, I'm pretty
excited about that.
>> Yeah, incredible times. Are open models
standing on the shoulders of giants? And
by that I mean if frontier models
suddenly stopped being released, would
open models improve as quickly as they
do now or is their progress mostly
driven by distillation?
>> Yeah, I mean I think certainly a bunch
of the progress is driven by
distillation. For example, our own Gemma
models are definitely distilled from
higher quality larger scale models. Um
and I think a lot of other open source
models are getting benefit from
distillation data. Uh distillation has
always been a you know amazing way to
get really capable models into a smaller
footprint thing and you know uh that's
how our Flash models are quite capable
for their size relative to the Pro
models is we're able to use the Pro
model to
>> to teach the Flash models. So I mean
I think really the the question is
uh not so much one of closed versus
open. It's you know if we want small
incredibly capable models we have to
keep building larger scale models that
are maybe less inference efficient but
are more capable and then use
distillation
>> to uh you know transfer the knowledge
into the smaller models, whether
they are open or closed.
>> Now I'm
wondering you might be the only one who
can answer that. So I I really want to
ask this. Everyone has their their
flagship models and yes the distilled
models like pretty much every company
does this tiered level thing. the
quicker, faster models were always
well below the frontier models and
at some point I think 3.1 where there
was one version where where the the
quick one was suddenly so so close to
the frontier one there was like a 3%
difference
>> in in in tough benchmarks and and I just
heard someone saying I don't even know
who that was that that yeah it's not
like just distillation there is some
magic sauce in there that's been in the
works for years. So, can I hear a bit
about that?
>> Sure. Well, not too much. I mean, there
is always some magic sauce that we don't
reveal, but distillation is definitely
one of the key things that makes those,
you know, much smaller models much
cheaper, much faster, much more
affordable um models be, you know,
nearly as good as those frontier models.
And then we push ahead and build an even
better frontier model. And then we have
to then do the process again where we
now transfer the knowledge in the
really capable frontier model back
into a lighter weight one. And I think
um you know this is this is really
important because the flash models are
really the workhorse of what people
generally want to use because they're
you know they're almost as capable. We
saw it. Yeah.
>> Yeah. And uh
>> and they're they're quite good.
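
The conversation does not say how Google's distillation works, so the following is only the textbook soft-label formulation: the student is trained to match the teacher's temperature-softened output distribution by minimizing a KL divergence. The function names, the temperature of 2.0, and the example logits are all illustrative assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) over softened next-token distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [4.0, 1.0, 0.5]       # hypothetical logits from a large model
good_student = [3.5, 1.2, 0.4]  # close to the teacher
poor_student = [0.5, 1.0, 4.0]  # far from the teacher
```

In this sketch `distillation_loss(teacher, good_student)` is smaller than `distillation_loss(teacher, poor_student)`, so minimizing the loss pulls the small model toward the large model's full distribution rather than just its top answer.
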
>> Yeah. It's unbelievable how close they
can get like this. This didn't used to
be like that at all. All right. What
trends in machine learning are you most
excited about right now? You you have a
separate talk about like exciting trends
in machine learning or something like
that.
>> Yeah. I mean
>> what's what's the newer version of that?
>> Yeah, the newer version I guess I mean
there's a few different trends that I
think are really exciting. The one is um
uh
so first I think continual learning is
still a little bit nascent but I think
looking at ways to make models that are
more interleaved in the way they alternate between
seeing data passively and taking
action and learning from that seems like
a really important thing. Uh you know
agents and multi-agent use of uh these
systems is really really important. Um,
as one trend of that though, I think as
you see, uh, you know, we're going to
need a lot more inference hardware and
capability for that because those
systems that are working autonomously in
the background actually consume lots of
tokens in order to sort of
>> do the the kind of important work
they've been asked to do. Um, you know,
I think, uh, being able to build really
efficient inference hardware will enable
a lot of of things. So looking at you
know co-design of model architectures
and hardware architectures to make sort
of the best use of um things and have
really good properties in terms of very
low latency you know much higher
performance per watt performance per
dollar are things we we really care
about.
um you know I think looking at how do
you you know the context window of these
models is an important characteristic
but
uh I think there's a lot we could do if
we come up with mechanisms that are sort
of cascaded series of things that kind
of give you the illusion that you have
all information in the context window
>> like you'd like to have the whole
internet at your model's fingertips
>> or on a personal level if you've opted
in you know all of your email and your
photos and your the videos you've
watched and things like that. Um, but
you can't really do it with the sort of
quadratic attention mechanism. But I
think you can build a series of kind of
retrieval and lighter weight mechanisms
and then ways of cascading from you know
here are the 30,000 documents out of 10
billion that seem most relevant and then
you know have a lighter weight model
that looks at those and decides these
117 things seem really relevant to what
you're trying to do and puts those in
the sort of more expensive context
window of a a bigger model perhaps. Uh
that's going to be kind of exciting. And
how do you orchestrate and interleave all
that stuff so it gives you the illusion
uh without you having to even think
about it?
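
The cascade described above, where a cheap stage narrows a huge corpus, a somewhat costlier stage narrows it again, and only the final handful reaches the expensive context window, can be sketched like this. The two scoring functions are toy stand-ins (word overlap and Jaccard similarity), not real retrieval or reranking models, and the stage sizes are shrunk from the "30,000" and "117" in the conversation to fit a five-document example.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def cheap_score(query: str, doc: str) -> float:
    """Stage 1 stand-in: count of query words that appear in the document."""
    return float(len(set(query.split()) & set(doc.split())))


def careful_score(query: str, doc: str) -> float:
    """Stage 2 stand-in: Jaccard similarity, penalizing off-topic extra words."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q | d) if q | d else 0.0


def cascade(
    docs: Sequence[str],
    query: str,
    stages: List[Tuple[Scorer, int]],
) -> List[str]:
    """Run stages from cheapest to most expensive, keeping the top `keep` each time."""
    survivors = list(docs)
    for scorer, keep in stages:
        survivors = sorted(
            survivors, key=lambda d: scorer(query, d), reverse=True
        )[:keep]
    return survivors


corpus = [
    "tpu inference hardware",
    "cat pictures",
    "low precision inference on tpu",
    "inference",
    "gardening tips",
]
context = cascade(corpus, "tpu inference", [(cheap_score, 3), (careful_score, 2)])
```

Only the documents left in `context` would be placed into the large model's attention window. The expensive scorer never sees the documents the cheap stage discarded, which is what keeps the total cost far below attending over everything.
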
>> Interesting. So it's very advanced games
to be played with the context window
because obviously very expensive. So the
attention mechanism, you get big O of n
squared.
>> Uh are we still there, or do we have
some, I mean, I've heard of some n log n
things. Can we go lower? There's like a
whole series.
>> Obviously we can go lower but the
question is what what the trade-offs are
right like what do you have to pay for
that? Yep. um where are we in that?
>> Yeah, I mean I think there's actually
quite a large body of work there
probably, you know, a hundred papers on
more efficient context uh algorithms
than the N squared one.
>> I mean the N squared one works really
well. uh so it has a pretty high bar but
I do think there is traction in finding
things that are you know much lower cost
whether it's you know reducing
algorithmic factors or very large
constant factors on the base n squared
algorithm I think all of these are
pretty exciting you can actually combine
many of these these approaches
>> um and and get uh you know much cheaper
attention over many more tokens
>> yeah I think that's one of the most
important things because if it was
cheaper in some sense and and and and
you could still find the the needles in
the in the haststack over very long
contexts. Then you could you could have
some sort of lifetime AI thing.
>> Yeah, totally. Like I'd like my whole
life of all the digital things I've seen
uh in there. Uh as a say internal Google
developer, I'd love for the entire
Google codebase to be in there, which is
you know probably 10 billion lines of
code, probably, you know,
100 billion tokens.
>> I just want my wine list.
>> I just want 100 billion. All I want is a
100 billion tokens of attention. It's
all I need.
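
To see why 100 billion tokens of plain quadratic attention is out of reach, a rough count helps. The sketch below only counts query-key pairs for a single attention layer and head, ignoring constant factors, causal masking, and everything else in the model, so treat the numbers as orders of magnitude and the function names as illustrative.

```python
import math


def attention_pairs(n_tokens: int) -> int:
    """Pairs scored by full attention: every token attends to every token."""
    return n_tokens * n_tokens


def n_log_n(n_tokens: int) -> float:
    """Growth of a hypothetical n log n alternative, for comparison."""
    return n_tokens * math.log2(n_tokens)


for n in (30_000, 1_000_000, 100_000_000_000):
    print(
        f"{n:>15,} tokens: n^2 = {attention_pairs(n):.2e}, "
        f"n log n = {n_log_n(n):.2e}"
    )
```

Doubling the context quadruples the pair count, and at 100 billion tokens the quadratic term reaches 10^22 pairs per layer and head, which is why cheaper attention variants and retrieval cascades come up together in this part of the conversation.
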
>> Amazing. I think we got to do this one.
So, Google's data centers run an
enormous number of machines. And at that
scale, anything that can go wrong will
go wrong. Like I hear that wires wear
down,
>> hard drives fall apart, motherboards
overheat. Um, is that something that
actually happens day by day? And do you
have any good stories?
>> Absolutely. I mean, I don't have that
many personal stories, but there used to
be a chat group internally called Data
Centers on Fire that would have like
exciting uh exciting events happening
and sometimes exciting videos. Um yeah,
I mean I think
>> at scale lots of things that are very
very unexpected happen and usually those
are the combination of one thing fails
and something else fails simultaneously
or in cascade of during the yeah you
have a cascaded failure of some sort.
You know, sometimes that means some
software system stops working. Sometimes
it means like the the bus bar overheats
and you get too much power to the to the
rack and like it catches on fire. I mean
that's a much rarer thing. But um you
know you have to be prepared for this
and I think one of the things even from
the very earliest days of Google is we
have really focused on how do you build
reliable systems out of unreliable
parts. Yes.
>> Right. Like in the earliest Google days,
we were buying consumer machines without
uh ECC memory, not only not
ECC, not even parity.
>> we were buying consumer motherboards
that didn't have like redundant power
supplies and you can do that if you can
handle things at a higher level and
that's generally what we try to do in
all cases is
>> I actually wanted to ask you about that
the ECC thing because here here's one of
my favorite failure modes if if that's
true but you you tell me the distant
supernova goes off, a cosmic ray hits a
memory cell and a zero flips to a one.
Does that really happen?
>> Oh yeah. Yeah, absolutely. I mean, alpha
particles definitely can flip uh you
know DRAM state. We've actually observed
this because we have monitoring data of
how many ECC uh errors and like single
bit errors that are corrected and
two-bit errors that are not corrected
are happening in all of our machines.
And you can actually see this where some
clusters that are pointing in a
particular direction in the earth have a
much higher rate for a you know a brief
period like 10-minute period or
something and then the other ones in the
other side of the earth do not have
that. So it's definitely something that
happens.
>> How worried should I be? Because MacBook
Pros don't have ECC memory as far as I
know like for for one machine is it so
vanishingly you know unlikely that you
shouldn't care but for data center or
>> I mean for one machine it's generally
not too bad. I mean I think they have
parity so at least they detect it typically
if it's a single bit error
>> so detection but not fixing
>> right but ECC usually gives you single
bit error correction and double-bit
error detection. Yeah.
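
Single-bit correction with double-bit detection (often abbreviated SECDED) can be shown with a toy extended Hamming code that protects 4 data bits using 8 stored bits. Real ECC memory commonly uses much wider codewords, so this is only a scaled-down sketch of the principle; the bit layout and function names are choices made for this example.

```python
from typing import List, Optional, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits. Index 0 is overall parity; indexes 1-7 are Hamming positions."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return [overall] + word


def decode(code: List[int]) -> Tuple[Optional[List[int]], str]:
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall = 0
    for bit in code:
        overall ^= bit  # 0 when the total parity is still even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1  # one flip: syndrome is its position (0 = parity bit)
        status = "corrected"
    else:
        return None, "uncorrectable"  # two flips: detected but not fixable
    return [code[3], code[5], code[6], code[7]], status


stored = encode([1, 0, 1, 1])
stored[5] ^= 1                    # simulate one flipped bit
recovered, status = decode(stored)
stored[2] ^= 1                    # simulate a second flip in the same word
failed, failed_status = decode(stored)
```

After the first flip the decoder returns the original data with status `corrected`; after the second it reports `uncorrectable`, which corresponds to the corrected single-bit and uncorrected two-bit error counts mentioned in the monitoring data.
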
>> So with that you don't have to worry
about it too much
>> um at a single machine level but even at
you know tens of thousands of machines
you'd have to start thinking about that.
So you know one of the things we did
when we were using machines without even
parity is we built an entire
software-based checksumming system for
large amounts of our data. So
>> doing it by hand
>> doing it by hand essentially and like we
would you know for crawling web pages
and putting them in the index
>> you know if you detect that this
particular record is corrupted it's
usually generally okay to just you know
ignore that record.
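
A software-level version of that approach, where each record carries a checksum and records that fail verification are simply skipped, might look like the sketch below. The conversation does not say which checksum Google used, so CRC-32 from Python's standard `zlib` module is an assumption, as are the record layout and function names.

```python
import zlib
from typing import Iterable, Iterator, Tuple

Record = Tuple[bytes, int]  # (payload, checksum stored alongside it)


def seal(payload: bytes) -> Record:
    """Compute the checksum once, when the record is first written."""
    return payload, zlib.crc32(payload)


def valid_payloads(records: Iterable[Record]) -> Iterator[bytes]:
    """Yield only payloads whose checksum still matches; drop corrupted ones."""
    for payload, stored_checksum in records:
        if zlib.crc32(payload) == stored_checksum:
            yield payload
        # A mismatch means the bytes changed after sealing; skip the record.


def flip_first_bit(record: Record) -> Record:
    """Simulate a memory bit flip in the payload (the checksum is untouched)."""
    payload, checksum = record
    return bytes([payload[0] ^ 1]) + payload[1:], checksum


crawled = [seal(b"<html>page one</html>"), seal(b"<html>page two</html>")]
crawled[1] = flip_first_bit(crawled[1])
indexable = list(valid_payloads(crawled))
```

Here `indexable` ends up containing only the first page. For a web index that is usually acceptable, since losing one crawled record is cheaper than serving corrupted data.
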
>> Now I have something interesting for
you. I call it lightning round. So,
please try to answer in one sentence.
One word is okay. One one sentence.
>> Can I make run-on sentences?
>> We'll see. We'll see. So, I I read that
Jeff Dean's pin code is the last four
digits of pi. I I give this one an eight
out of 10. So, my question is, do you
enjoy these Chuck Norris style jokes
about you?
>> It could be true. Um uh I I do enjoy
them. I mean, it's an April Fool's joke
gone awry by my colleagues in 2009, but
it's very both flattering and kind of
embarrassing.
>> I think I think he felt the same way
about them, too. But he he he enjoyed
them, too. Legend. All right. One big
thing that you were wrong about and came
around.
>> I think AI is going to influence health
care quite dramatically, but I think it
is harder not necessarily for technical
reasons, but for you know, how do you
actually get things in regulated
industries that are super important and
have all kinds of privacy constraints
and safety concerns,
>> but I think ultimately that will happen.
It's just taking longer than than I I
hoped. Yes. Because I think there's
tremendous world benefit to do it. Um,
but we need to do it carefully and
safely.
>> Vim or Emacs or something else? Hint,
there's only one good answer.
>> Emacs. Was that it? Oh, no.
>> Look, I I'm a Vim person, but but I'm
I'm not
>> Maybe I'm I'm an embarrassment of a Vim
person because I I I looked at Emacs,
too, and I was like, that's pretty cool,
too, but I I don't want to learn both.
It's it's just so much time. So,
>> yeah, it's true. One can spend a lot of
time customizing Emacs.
>> The .vimrc I wrote
up and then it never ends.
Yeah. One problem that you tried
to solve many times but have never been
able to crack.
>> I mean I think in some sense we still
don't have an answer to how do you do
continual learning appropriately? That's
something I've thought about a little.
I've dabbled a little bit with some
some techniques along with colleagues.
>> But I think uh you know if we're able to
crack that it's going to be amazing. Um,
but it's not there yet.
>> Last one. Favorite Two-Minute Papers
episode.
>> Oh,
yeah. I mean, I assume the the
Transformer one was a good one.
>> All right. All right. Well, that's
that's a good one. Okay, Jeff, I
learned a lot today. Thank you so much.
This chatting with you again.
>> Thank you so much.
>> Thank you.
>> Here you see me running the full
DeepSeek AI model through Lambda GPU
cloud. 671
billion parameters running super fast
and super reliably. This is insane. I
love it and I use it on a regular basis.
Lambda provides you with powerful NVIDIA
GPUs to run your own chatbots and
experiments. Seriously, try it out now
at lambda.ai/papers
or click the link in the description.
