[0:00] continual learning, long-term reasoning,
[0:02] [music] uh some aspects of memory, these
[0:05] are still unsolved. I think all of these
[0:07] are going to be required for AGI.
[0:09] Depending on what your AGI timeline is,
[0:11] you know, mine's like 2030 or something
[0:13] like this, then [music] if you start off
[0:15] on a deep tech journey today, you have
[0:18] to just consider AGI appearing in the
[0:21] middle of that journey. It's not bad
[0:22] necessarily, but you have to take that
[0:24] into account. [music] You have to have
[0:26] an active system uh that can actively
[0:29] solve problems for you to get to AGI.
[0:31] So, agents are that path, and I think
[0:33] we're just getting going.
[0:39] Demis Hassabis has had one of the most
[0:43] unusual careers in tech. He was a chess
[0:46] prodigy as a kid, then designed his
[0:49] first hit video game, Theme Park, at 17.
[0:53] He then went back to school, got a PhD
[0:56] in cognitive neuroscience, published
[0:58] foundational work on how memory and
[0:59] imagination work in the brain, and then
[1:02] in 2010 co-founded DeepMind with one
[1:05] mission, solve intelligence.
[1:08] And I think they've done it.
[1:11] Since then,
[1:13] uh his lab has gone on to do things most
[1:15] people thought were decades away.
[1:17] AlphaGo beat a world champion at Go.
[1:19] AlphaFold cracked protein structure
[1:21] prediction, a 50-year grand challenge in
[1:24] biology, and they gave it away for free
[1:27] to every scientist on Earth.
[1:29] That work won him the Nobel Prize in
[1:31] chemistry last year. Today, Demis leads
[1:34] Google DeepMind, where he's building
[1:37] Gemini and pushing toward the same goal
[1:39] he set when he was a teenager,
[1:42] artificial general intelligence. Please
[1:44] welcome Demis Hassabis.
[1:53] So, you've been thinking about AGI
[1:54] longer than almost anyone. Uh when you
[1:56] look at the current paradigm,
[1:58] large-scale pre-training, RLHF, chain of
[2:00] thought, how much of the final
[2:02] architecture for AGI do you think we
[2:05] already have, and what's fundamentally
[2:07] missing right now? Well, first of all,
[2:09] thank thanks, Gary, for that great
[2:10] introduction, and it's great to be here.
[2:12] Thanks for for welcoming here. It's
[2:13] amazing space, actually. I'll have to
[2:15] come back here often. Very inspiring
[2:17] that you all get to work in in in this
[2:19] space. So, the question is I think the
[2:22] the components that you just mentioned,
[2:24] I'm pretty sure will be part of the
[2:26] final architecture for AGI. So, I think
[2:29] they've come such a long way now, uh and
[2:32] we've proven out so many things about
[2:33] what they can do.
[2:35] Uh I can't see a world in which we'll
[2:37] sort of realize in a couple of years
[2:39] this was a dead end. That doesn't make
[2:40] sense to me. But, there still might be
[2:42] one or two things missing on top of uh
[2:45] of of of what you've you know, what we
[2:47] already know works. So, um
[2:50] continual learning, long-term reasoning,
[2:52] uh some aspects of memory, these are
[2:55] still unsolved. Um and how to get the
[2:58] systems to be more consistent across the
[3:00] board. Um I think all of these are going
[3:02] to be required for AGI. Now, it might be
[3:05] that the existing techniques can just
[3:07] scale up to that with some innovation
[3:09] and some incremental innovation. Um but,
[3:12] it could be that there's still one or
[3:13] two big ideas left uh that need to be
[3:16] cracked. I don't think it's more than
[3:18] one or two if there are out there. And I
[3:20] think, you know, my betting is uh about
[3:23] 50/50 if that's the case. So, of course,
[3:25] at DeepMind at Google DeepMind we work
[3:27] on both those things. I guess that's all
[3:29] I mean, working with a bunch of
[3:31] identical systems. The wildest thing to
[3:32] me is to what degree It's the same
[3:35] weights ev- over and over. So, this idea
[3:37] of continual learning is so interesting
[3:39] because like you know, right now we're
[3:41] sort of cobbling it together with duct
[3:43] tape, you know? Yes. These dream cycles
[3:45] at night and things like that.
[3:47] >> Yeah. It's pretty cool, the dream
[3:48] cycles, and we we used to think about
[3:50] this with consolidation with episodic
[3:52] memory. It's actually that's what I
[3:53] studied for my PhD is how the
[3:54] hippocampus works and integrates, you
[3:57] know, new knowledge gracefully into the
[4:00] existing knowledge base. So, the brain
[4:02] does that amazingly well. It it it does
[4:04] it through you know, during sleep uh
[4:06] especially things like REM sleep,
[4:08] replaying back episodes that that are
[4:10] important so that you can learn from it.
[4:12] In fact, our very first Atari program
[4:15] DQN, one of the ways it was able to
[4:17] master Atari games was by doing
[4:19] experience replay. So, we sort of
[4:21] borrowed that from from neuroscience and
[4:23] replayed successful trajectories
[4:26] uh many times, you know, that's
[4:28] way back in 2013 now in the in the dark
[4:30] ages of AI. It was uh a really important
[4:33] thing. And and I agree with you, we're
[4:34] kind of using duct tape right now. So,
[4:36] like shove it all in the context window.
[4:38] Um this but this seems a bit
[4:40] unsatisfying, right? And actually, even
[4:42] though uh we're working on machines, not
[4:45] biological brains, and so you
[4:47] potentially you could have, you know,
[4:49] millions or tens of millions size
[4:51] context window or memory, and it can be
[4:53] perfect, there's still a cost to looking
[4:56] it up and finding the right thing uh
[4:59] that that's actually relevant for the
[5:01] specific uh decision you've got to make
[5:03] right now. And that's non-trivial that
[5:05] cost, even if you can potentially store
[5:07] it all. I think there's actually a lot
[5:09] of room for innovation in in areas like
[5:11] memory. Yeah. I mean, the one thing is
[5:13] like it feels like a million
[5:14] token context one is actually bigger
[5:17] than I mean, it's plenty big, honestly.
[5:18] You can do so.
[5:20] It's plenty big for for for most things
[5:22] that it should be used for. I mean, if
[5:24] you think about the context window is
[5:26] sort of equivalent to working memory,
[5:28] you know, humans have we have like a few
[5:30] digits, you know, it's like a a dozen
[5:32] digits maybe, you know, average of
[5:34] seven. We got million or, you know, 10
[5:36] million context windows, but the problem
[5:38] is is that we're trying to store
[5:40] everything in that. You know, things
[5:42] that aren't in not important, things
[5:43] that are wrong. It's pretty brute force
[5:45] currently, and that doesn't seem uh
[5:47] right. And then the problem is if you're
[5:49] an agent trying to try and process live
[5:51] video, and you're just going to naively
[5:53] record all the tokens, then actually a
[5:56] million tokens isn't that much. It's
[5:58] only like 20 minutes. So, actually you
[6:01] need more if you want something that's
[6:02] going to understand your, you know, your
[6:05] what's going on in your life over maybe
[6:06] a month or two. DeepMind has
[6:09] historically leaned into reinforcement
[6:11] learning and search.
[6:13] AlphaGo, AlphaZero, and MuZero. How much
[6:16] of that philosophy is actually embedded
[6:17] in how you're building Gemini today? Is
[6:21] RL still underrated? Yeah, I think
[6:24] potentially it is. It sort of goes in in
[6:26] abs and way waves. You know, we've
[6:28] worked on agents since the beginning of
[6:30] DeepMind. In fact, we also That's what
[6:32] we said we were working on. And so, all
[6:33] of the Atari work and AlphaGo, most
[6:36] specifically, they're agent systems. And
[6:38] what we meant by that is systems that
[6:40] are able to, you know, accomplish goals
[6:42] on their own.
[6:43] And make active decisions and and make
[6:46] plans. And so, of course, we were doing
[6:48] it in the domain of games to to to make
[6:51] it tractable. And then doing
[6:53] increasingly complex games, things like
[6:55] StarCraft after AlphaGo, AlphaStar. So,
[6:59] we basically did all the games that are
[7:01] out there.
[7:02] And then of course, the question is can
[7:04] you generalize those models to be world
[7:06] models or models of language, not just
[7:08] models of simple games or even complex
[7:11] games. And that's what the last few
[7:13] years has been about. But really, you
[7:15] can think of a lot of the things we're
[7:17] doing today, all the leading models with
[7:19] thinking modes and chain of thought
[7:21] reasoning as aspects of what was sort of
[7:23] pioneered with AlphaGo coming back now.
[7:26] And I actually think there's a lot of
[7:28] work we did back then that is relevant
[7:31] today. And we're sort of re-looking at
[7:33] some of those old ideas
[7:35] at scale today in a more general way.
[7:38] Including things like Monte Carlo tree
[7:39] search and other other ways of doing
[7:41] augmenting the RL
[7:43] on top of the the reinforcement learning
[7:45] we're ready to do today. And I think a
[7:47] lot of those ideas both from AlphaGo and
[7:49] AlphaZero are really really relevant to
[7:52] to where we are with today's foundation
[7:54] models. And I think a lot of that is
[7:56] what we're going to see of the advances
[7:58] the next few years. One question I would
[8:00] have like obviously today you need
[8:03] bigger and bigger models to be smarter
[8:05] and smarter, but then we're also seeing
[8:07] distillation working and then smaller
[8:09] models can be like quite a bit faster. I
[8:11] think you know, you guys have incredible
[8:13] flash models that are like nine like
[8:15] you're finding that they're 95% as good
[8:18] as the frontier and at like 1/10 the
[8:21] price. Is that right? I think that's one
[8:23] of our core strengths is I mean you have
[8:25] to build the biggest models to to to to
[8:27] have the frontier capabilities, but I
[8:30] think one of our biggest strengths has
[8:31] been
[8:32] distilling and packing that power into
[8:34] smaller and smaller models very quickly.
[8:37] Obviously we we you know, we invented
[8:39] the kind of distillation process and and
[8:41] people like Jeff and Oriol and and
[8:43] others and we're still world experts in
[8:46] that. And we also have a huge need to
[8:50] do it because we've got to serve the
[8:52] biggest probably AI surfaces there are.
[8:56] Obviously there's search with AI
[8:57] overviews and AI mode and there's Gemini
[8:59] app and now increasingly every single
[9:01] product at Google has you know, maps and
[9:04] YouTube and so on has some aspect of
[9:07] Gemini or Gemini related technology in
[9:09] it. And so that's billions of users a
[9:12] dozen more than a dozen billion user
[9:14] products
[9:15] and they have to be served extremely
[9:16] fast, extremely efficiently and cheaply
[9:18] and with low latency. So that that gives
[9:21] us a really important incentive to to
[9:24] make these flash and even smaller models
[9:26] flashlight models extremely efficient.
[9:29] And hopefully that ends up then being
[9:30] really useful for many of the workloads
[9:32] that all of you use for. I'm curious
[9:35] about how much smarter these smaller
[9:38] models can actually be. Like, are there
[9:39] limits to the distillation process?
[9:41] Like, could a 50B or 400B model be as
[9:45] smart as like a mythos for today? Yeah,
[9:48] I didn't I didn't see any I don't think
[9:49] we've got to any kind of or at least
[9:51] none of us know yet if we've got to any
[9:53] kind of informational limit. I mean,
[9:55] maybe at some point that will be the
[9:56] case where there's just an information
[9:58] density that can't we can't get beyond.
[10:00] But, I think for now there's the
[10:03] assumption we make is that you know, a
[10:05] year later one of our
[10:07] leading, you know, pro models or
[10:09] frontier models goes out, half a year
[10:12] later, a year later you'll have them in
[10:14] the the really tiny almost edge models.
[10:17] And you'll also see some of that
[10:18] goodness in our Gemma models, which
[10:20] hopefully you're all enjoying our Gemma
[10:21] four models, which I think are really
[10:23] amazing power for their sizes. So,
[10:26] again, that uses a lot of this these
[10:29] distillation techniques and and the idea
[10:31] of how to make things really efficient
[10:32] in these very small models. So, I don't
[10:34] really see any limit yet in terms of
[10:36] like some kind of theoretical limit. I
[10:37] think we're still pretty far off of
[10:39] that. That's a mean I mean that is
[10:40] really good.
[10:41] >> Yes.
[10:42] Uh you know, one of the weirder things
[10:43] that we're seeing right now is like
[10:45] engineers can do like 500 to 1,000 times
[10:48] the amount of work that they were doing
[10:50] like 6 months ago, I guess. I mean, the
[10:53] people in this room there are people who
[10:54] are doing about like a thousand X the
[10:56] work that like I Steve Yegge talks about
[10:59] this. It's like a thousand X the work
[11:01] that a Google engineer from the 2000s
[11:03] was doing. I think it's very exciting. I
[11:05] mean, I think models have many uses. One
[11:07] is obviously cost, but the speed can
[11:10] allow, you know, if you think about
[11:11] coding even or other things, you can
[11:13] iterate a lot faster. Also, especially
[11:15] if there's if you're collaborating with
[11:17] the system. I think there's a there's a
[11:20] a lot of need for having fast systems
[11:23] that maybe are not quite frontier. Like
[11:26] you said, like 95% 90%, but that's
[11:28] plenty good enough and actually gain
[11:29] back more than the 10% on the the
[11:32] iteration speed. So, and then the other
[11:34] big thing I think is running these
[11:36] things on the edge. Again, for
[11:37] efficiency reasons, but also for privacy
[11:40] and security reasons, too. Um if you
[11:42] think about different devices that you
[11:44] might run these systems on that that,
[11:47] you know, process very personal
[11:48] information, can also think about
[11:50] robotics, as well. Um you know, robots
[11:52] in your house. I think you're going to
[11:54] want very efficient, uh very powerful,
[11:57] uh local models, which maybe are
[11:59] orchestrated
[12:00] you know, with some bigger models,
[12:02] frontier models that are in the in the
[12:04] cloud, but you only delegate to that in
[12:06] certain circumstances. And perhaps you,
[12:09] you know, you process all of the
[12:11] audio-visual feed, let's say, locally,
[12:13] and that stays local. I could imagine uh
[12:16] that would be a very good sort of um end
[12:18] state. Y Combinator Startup School is
[12:20] back. We're hand-selecting the most
[12:22] promising builders in the world and
[12:24] flying them out to San Francisco on July
[12:26] 25th and 26th to discuss the cutting
[12:29] edge of tech. Apply now for a spot.
[12:31] Okay, back to the video. Going back to
[12:34] context and memory, models currently
[12:36] stateless, but, you know, continue like
[12:38] what would the developer experience even
[12:40] be like for someone who's using a
[12:42] continual learning model? Like, you
[12:44] know, any idea like how you'd steer it?
[12:46] I think it's really interesting. I think
[12:48] that's one of the not having continual
[12:50] learning currently is one of the things
[12:51] holding back agents from doing full
[12:55] uh tasks, you know? I think they're
[12:56] really useful for aspects of tasks right
[12:59] now, and you can patch them together and
[13:01] do some really cool things, but they
[13:03] don't adapt well with the context that
[13:06] you're in. And I think that's the
[13:08] missing piece for them being really kind
[13:10] of fire and forget, and they'll figure
[13:12] it out themselves. You know, I think
[13:14] they need to be able to learn um about
[13:16] the specific context um that you're
[13:19] going to put them in. So, um I think we
[13:22] have to crack that to get full general
[13:25] intelligence. Where are we on reasoning?
[13:27] So, models can do really impressive
[13:28] chain of thought now, but they still
[13:30] fail on things a smart undergrad
[13:32] wouldn't. What specifically needs to
[13:34] change and what progress do you expect
[13:36] in reasoning? There's a lot of
[13:39] innovation left in in think the thinking
[13:41] paradigms, I would say. Again, I think
[13:43] we're fairly we're doing fairly
[13:44] simplistic things, fairly brute force.
[13:48] One could imagine
[13:50] I think there's a lot of scope for
[13:51] example in monitoring the chain of
[13:52] thought, maybe interjecting midway
[13:55] through a thought process. I often get
[13:57] the impression with our systems and and
[13:59] our competitor systems that they're
[14:01] almost overthinking. They're almost
[14:03] getting into sort of loops of things.
[14:05] Like one thing I sometimes like to do is
[14:08] is play chess against Gemini. And you
[14:10] know, it's the all the leading
[14:12] foundation models are pretty poor at
[14:13] games, which is quite interesting. It's
[14:15] very
[14:17] cool to kind of look at the thinking
[14:18] traces cuz obviously these are can be a
[14:20] well-understood. You know, I can tell
[14:22] quite quickly if it's going off on a
[14:24] tangent and it's very sort of provable
[14:26] what the what the the thinking is doing,
[14:29] whether it's useful or not. And so, what
[14:32] we see is that, you know, sometimes it
[14:34] will it will it will consider a move. It
[14:36] will realize it's a blunder, but it
[14:38] can't find anything better, so it kind
[14:39] of goes back to that move and does it
[14:41] anyway. So, it you know, you just
[14:43] shouldn't be seeing that
[14:45] happening in a in a very precise
[14:47] reasoning system. So, there's just sort
[14:49] of huge gaps, I think, still, but it may
[14:52] only be one or two tweaks that are
[14:53] required to fix those kind of gaps just
[14:55] to be clear, but I think that's pretty
[14:57] pretty obvious they're all there. And
[14:59] that's why you get this kind of jagged
[15:01] intelligence. You know, on the one hand,
[15:03] it can solve gold medal problems in IMO,
[15:07] which is super hard, but on the other
[15:08] hand, as we've all seen, it can still
[15:10] make basic elementary maths errors if
[15:13] you pose the question in a certain way,
[15:16] right? So, or elementary reasoning
[15:18] errors. So, there's just something to me
[15:19] about the almost an introspection about
[15:22] its own thought process that I feel like
[15:24] there's there's something maybe missing
[15:26] there. Agents are really big. Some would
[15:28] say they're hyped. I personally think
[15:29] they're just getting started. It's
[15:31] [laughter] totally insane. What does
[15:33] DeepMind's internal research tell you
[15:34] about where agent capabilities actually
[15:36] are right now versus, you know, sort of
[15:38] the hype out there? I think we are I
[15:40] agree with you. I think we're just at
[15:42] the beginning. You have to have an
[15:43] active system that can actively solve
[15:46] problems for you to get to AGI. That was
[15:48] always clear to us. So, agents are that
[15:51] path and I think we're just getting
[15:52] going. I think all of us are getting
[15:54] used to how do we best work and you're
[15:56] leading the way in a lot of this in your
[15:58] own personal experiments. I'm sure many
[15:59] of you are doing that. I think how do
[16:01] you incorporate it into your
[16:03] workflow in a way that isn't just sort
[16:06] of a nice to have, but actually starting
[16:09] to do fundamental things. My I think My
[16:10] impression is at the moment we're all
[16:11] experimenting we're experimenting a lot
[16:13] of things, but we're only in the maybe
[16:15] the last couple of months starting to
[16:16] find the really valuable places. And the
[16:19] technology probably only getting good
[16:21] enough for that to be the case, right?
[16:23] Where that it's not a kind of toy nice
[16:25] demonstration, but actually really
[16:27] adding value to your to your to your
[16:29] time and efficiency.
[16:31] I had often wondered I see a lot of
[16:33] people working on
[16:34] like setting off, you know, dozens of
[16:37] agents for like 40 hours, but I'm not
[16:39] sure I've seen the output that yet of
[16:42] that quite justify that level of input
[16:45] going in, but I think it will come. So,
[16:47] I still think we're in the
[16:48] experimentation phase. We haven't seen a
[16:50] triple-A game that tops the App Store
[16:53] charts that was sort of vibe coded yet,
[16:56] right? I've seen and I've programmed and
[16:58] I'm sure many we've all done little nice
[16:59] demonstrations and it's like amazing. I
[17:01] can do a prototype of theme park in half
[17:04] an hour now which took me six months
[17:06] back when I was 17. It's kind of
[17:08] mind-blowing and I and I wish I I got
[17:10] this feeling if I spent the whole summer
[17:12] working on it, you could make something
[17:14] really incredible, but it still needs
[17:16] craft and, you know, human sort of soul
[17:19] into it and taste. I think that's that's
[17:21] something that can that's you have to
[17:23] make sure you still bring that to to
[17:26] whatever it is you're building. And I
[17:27] think it still shows like it's not quite
[17:29] there yet because why haven't we seen
[17:32] a kid making a hit game that's that
[17:34] sells 10 million copies, right? That
[17:36] should be possible given the effort
[17:38] that's gone in. So something's still
[17:40] somehow missing. Maybe it's to do with
[17:42] the process, or maybe it's to do with
[17:43] the tools. I'm not quite sure. You will
[17:45] probably know better than me cuz I'm
[17:46] sure you're all experimenting on that.
[17:48] But I haven't seen the result yet which
[17:50] I would expect once this is really
[17:53] delivering that full value. Which I
[17:55] think will come in the next 6 to 12
[17:57] months. Some of it is like how much of
[17:58] it will be autonomous versus I mean, I
[18:00] don't think we'd see autonomous first.
[18:02] We would actually probably see people in
[18:04] this room operating at 1000X, and then
[18:07] That's what you should see first, and
[18:09] then many of you, you know, they'll be
[18:11] like games companies or, you know, other
[18:14] types of companies that have built some
[18:16] kind of best-selling app, best-selling
[18:18] game using these tools. That's what you
[18:21] should see first, and then more of that
[18:23] will get automated. I mean, some of it
[18:25] is like there's a human in there, and
[18:27] then the human doesn't want to say that
[18:29] the the the agents did it yet. I think
[18:32] part of it might be though that um this
[18:35] if we want to discuss like creativity,
[18:37] what I often say about that is like if
[18:39] we look at the things we've done like
[18:41] AlphaGo. So obviously very famously
[18:44] you'll all know about the move 37 in
[18:45] game two, and for me I was waiting for a
[18:47] moment like that to start the science
[18:49] projects like AlphaFold. We started
[18:51] AlphaFold like the day we got back from
[18:53] Seoul, which is 10 years ago now. I'm
[18:55] going to Korea after this to celebrate
[18:58] the 10-year anniversary of AlphaGo. But
[19:01] it's not enough to come up with move 37.
[19:03] Like that's pretty cool, very useful,
[19:06] um but can it invent go?
[19:08] That's what I want a system that can
[19:10] invent go if you give it a high-level
[19:12] description, you know, like a game you
[19:14] can learn the rules of in 5 minutes, but
[19:17] it takes a many lifetimes to master.
[19:19] It's beautiful aesthetically,
[19:22] um but you can play it in a few hours in
[19:24] an afternoon. So, you know, maybe you
[19:26] could imagine that would be the
[19:27] high-level description I would give and
[19:29] then I'd want the the return the thing I
[19:31] get back is go.
[19:33] Right? And um clearly today's systems, I
[19:36] think can't do that. So, the question is
[19:39] why? Um and I think there's something
[19:41] still missing there. Well, someone in
[19:43] this room might might make it.
[19:44] >> Then the answer would be there's nothing
[19:45] missing. It just was the way we were
[19:47] using the systems. And that might
[19:49] actually be the answer. It might be that
[19:51] today's systems are capable of that with
[19:53] a brilliant enough creative person using
[19:56] it and providing that impetus that the
[19:59] soul of the project and being able to
[20:01] probably being
[20:03] au fait enough with the tools to like
[20:06] almost be at one with the tools. I could
[20:07] imagine that would be happening if you
[20:09] experimented with the tools all day and
[20:11] all night like probably many of you are
[20:12] doing that and you combine that with
[20:14] proper deep creativity.
[20:17] Um something, you know, more incredible
[20:18] could be done. Switching gears to open
[20:20] source, I mean or open open and open
[20:22] weights. I mean, the recent release of
[20:24] Gemma, you're making highly capable open
[20:28] and accessible ones that can actually
[20:29] run locally. What do you think that
[20:31] means for you will AI be something that
[20:35] is in the hands of the users instead of
[20:36] primarily in the cloud? And does that
[20:39] change who gets to, you know, build with
[20:41] these models? We're huge proponents of
[20:44] in general of open source and open
[20:46] science. And you mentioned AlphaFold at
[20:48] the beginning, you know, we put that all
[20:49] out there for free. And all of our
[20:51] science work even still today we publish
[20:54] in, you know, the big journals. We
[20:56] wanted to create uh world-leading models
[20:58] for their their sizes. Right? And so,
[21:00] that's what we hopefully we've done with
[21:02] Gemma. And we're, you know, very
[21:03] committed to that path. And hopefully
[21:05] you will experiment and build and and
[21:07] enjoy and using Gemini. I think it's
[21:08] been like 40 million downloads now and
[21:11] uh it's just in you know 2 and 1/2
[21:13] weeks. So we're really excited about
[21:14] that. And I also think it's important
[21:16] for there to be Western stacks on open
[21:19] source. You know, obviously a lot of the
[21:20] Chinese models are excellent and and
[21:23] they're currently well well leading in
[21:24] open source and we think Gemini is very
[21:26] competitive for its sizes
[21:28] uh in in all those respects. And for us,
[21:31] I mean there is a question of resources,
[21:33] talent, and compute. Like nobody has
[21:35] enough spare compute to just make two,
[21:38] you know, uh frontier models at maximum
[21:41] size, right? With different attributes.
[21:43] So that's pretty difficult. But also for
[21:45] what for now what we've we've decided is
[21:47] that our edge models, the things we want
[21:49] to use for Android and glasses and
[21:52] robotics, um it's best that they're open
[21:54] models because they're vulnerable anyway
[21:57] on the once you put them out on the
[21:58] surfaces. So they might as well be
[22:00] actually fully open, right? So we've
[22:03] sort of made a decision to kind of unify
[22:05] that
[22:06] uh at the at the kind of we call it nano
[22:09] size level. So that actually works for
[22:11] us uh strategically as well. Um and you
[22:15] know, we hope as many people as possible
[22:16] build on it. And of course, we'll be
[22:18] building on that, too. Earlier uh before
[22:20] we came on, I got to show you a demo of
[22:22] uh my version of Samantha from Her,
[22:24] which is Yes. uh harrowing for me to try
[22:26] to demo something to you. Yeah, very
[22:28] good. Um and it worked, which is
[22:30] amazing. Gemini was built multimodal and
[22:32] I spent a lot of time with a bunch of
[22:34] the models and I mean the depth of the
[22:36] context and the tool use with speech
[22:40] directly to model, like there's nothing
[22:42] like bar none, like the best one
[22:44] actually.
[22:44] >> Yeah. Yeah, I think I think that's the
[22:46] sort of still a slightly
[22:47] underappreciated aspect of of of the
[22:49] Gemini series is we we started it being
[22:52] multimodal from the start. That made it
[22:54] a little bit more difficult actually to
[22:56] begin with cuz then just focusing on
[22:57] text, for example. But I we believe
[23:00] we're going to gain from that in the
[23:02] long run. And I think we're seeing that
[23:03] now for
[23:05] things like world model building, so
[23:08] stuff like Genie that we build on top of
[23:10] Gemini. I think it's going to be really
[23:12] important for things like robotics. So
[23:14] this is why Gemini robotics which many
[23:15] of you probably played around with, I
[23:17] think it's going to be built on
[23:18] multimodal foundation models, the
[23:20] robotics models. And we think we have a
[23:23] sort of competitive advantage with with
[23:25] Gemini being so strong at multimodal.
[23:27] We're using it increasingly in things
[23:29] like Waymo. Um but also if you imagine
[23:32] devices and assistants that digital
[23:35] assistants that come with you into the
[23:36] real world, you know, maybe on your
[23:38] phone or glasses or some other device,
[23:41] it needs to understand the physical
[23:43] world around you and intuitive physics
[23:46] and and the and the physical context
[23:48] you're in. And that's what our systems
[23:50] are extremely good at and I think you
[23:52] found that's why you've enjoyed using it
[23:53] in your setup. We're planning to
[23:55] continue on that and I think we're
[23:57] far and away the strongest models on on
[23:59] those types of
[24:01] problems. So the cost of inference is
[24:03] dropping fast. What becomes possible
[24:05] when inference is essentially free and
[24:07] how does that change what your team is
[24:09] actually optimizing for? Yeah, I'm not
[24:11] sure inference
[24:13] will ever be essentially free. I mean
[24:15] there's sort of Jevons' paradox and
[24:17] other things about like I think we'll
[24:18] just end up using all of us will end up
[24:21] using whatever we can get our hands on
[24:24] and you could imagine
[24:26] millions of agents, swarms of agents
[24:28] working together on things. So that's
[24:30] one way to use the inference or you
[24:32] could imagine
[24:33] single agents or smaller groups of
[24:35] agents thinking for in multiple
[24:38] directions and then ensembling that. So
[24:40] we're experimenting with all these
[24:41] things, probably many of you are. All of
[24:43] that will use up any inference I think
[24:46] that's available. I mean one day maybe
[24:48] it can be almost cost zero, certainly
[24:51] the energy if we solve fusion or you
[24:53] know, superconductors or you know,
[24:54] optimal batteries or some set of those
[24:56] things which I think we will do with
[24:58] material science, And energy costs will
[25:00] be essentially zero, but there'll still
[25:02] be the physical creation of the chips
[25:04] and other things. There'll There'll be
[25:06] some bottleneck um at least for the next
[25:09] few decades, I think. And so if that's
[25:11] the case, there'll still be rationing on
[25:14] the inference side. You still have to
[25:16] use it, I think, efficiently. Yeah.
[25:18] Well, luckily the smaller models are
[25:19] getting smarter and smarter, which is
[25:20] fantastic. Uh we got a lot of bio and
[25:23] biotech founders in the audience. I can
[25:26] see a few. AlphaFold 3 took us beyond
[25:28] proteins to a broad spectrum of
[25:29] biomolecules. Uh how close are we to
[25:32] modeling full cellular systems, or is
[25:34] that still a fundamentally harder
[25:36] problem in a class of its own? Well, I
[25:39] Isomorphic Labs, which we spun out from
[25:41] from from from DeepMind after we did
[25:44] AlphaFold 2,
[25:45] um it's it's which is going amazingly
[25:47] well. It's it's it's trying to build out
[25:49] uh not just AlphaFold. It's just one
[25:51] piece of the drug discovery process, uh
[25:54] as many you know, but we're trying to do
[25:56] the the adjacent biochemistry and
[25:58] chemistry to design the right compounds
[26:00] with the right properties and so on.
[26:02] We'll have some big announcements for
[26:03] you know, very soon to talk about on the
[26:05] on that front. I think that's going
[26:07] really well. Eventually, you want a
[26:09] whole virtual cell. So I've talked about
[26:11] this in many of my science talks about a
[26:13] full working simulation of a cell that
[26:16] you can perturb, and then the you know,
[26:18] the the outputs of that would be close
[26:21] enough to experimental that it's useful,
[26:23] right? You could skip out a lot of the
[26:25] the search steps and generate lots of
[26:27] synthetic data to train other models
[26:30] that then would predict things about,
[26:31] you know, real cells. And um
[26:34] I think we're about 10 years away
[26:36] probably from something like a virtual
[26:37] cell, like a full virtual cell. You
[26:39] know, we're starting out This is we're
[26:41] working on the DeepMind side, science
[26:42] side, on a you know, virtual nucleus,
[26:45] cell nucleus first cuz relatively
[26:47] self-contained. The trick with all of
[26:49] these things is can you pick uh a slice
[26:51] of the complexity, you know, eventually
[26:53] you want to want to model a human body,
[26:55] but can you model it down to the right
[26:57] level of detail and what slice can you
[27:01] take out of it that will be
[27:02] self-contained enough? You can kind of
[27:04] model and approximate the inputs and
[27:07] outputs into that self-contained system
[27:09] and then just focus on the
[27:10] self-contained system. So, a nucleus is
[27:12] quite interesting from that perspective.
[27:15] Um, then the other issue is just there's
[27:16] not enough data yet. So, you need data
[27:20] and I talked to various, you know, top
[27:22] scientists about who work on electron
[27:24] microscopes and other imaging things. If
[27:27] we could image a live cell without
[27:29] killing the cell, that would be
[27:32] game-changing obviously cuz then you
[27:33] could convert it into a vision problem
[27:35] which we would know how to solve. Right?
[27:37] And but at the moment, there are at
[27:39] least I'm not aware of any techniques
[27:41] that can give you a kind of, you know,
[27:42] nanometer resolution
[27:45] but without destroying but in, you know,
[27:48] in a live dynamic cell. So, you can see
[27:50] all the interactions. Right? You can
[27:51] take static images at that resolution
[27:53] obviously.
[27:55] Really detailed now and that's quite
[27:56] exciting but it's not enough to turn it
[27:59] just into just into a complex vision
[28:02] problem.
[28:03] So, that's one way it could be solved.
[28:05] So, it could be a hardware driven data
[28:06] driven solution or it could be that we
[28:09] build better
[28:11] learn simulators of these dynamical
[28:14] systems. So, that's that's the more
[28:15] modeling way of solving it. You've been
[28:18] looking at all kinds of science and not
[28:19] just bio. There's material science, drug
[28:22] discovery, climate modeling,
[28:23] mathematics. If you had a rank which
[28:26] scientific domain will transform the
[28:27] most dramatically the next 5 years,
[28:29] what's in your list?
[28:30] >> all sounds exciting and that's why, I
[28:32] mean, that that for me has been my main
[28:34] passion and always the reason why I've
[28:36] worked on AI for my whole career for 30
[28:38] plus years now is to use AI as the
[28:41] ultimate tool. I always thought AI would
[28:43] be the ultimate tool for science and to
[28:45] invite such advanced scientific
[28:48] scientific discovery, and things like
[28:49] medicine, and just our understanding of
[28:51] the universe around us. So, actually,
[28:53] when you mentioned our original way we
[28:55] used to articulate our mission
[28:56] statement, which is still the way we
[28:58] think about it, is there was two steps
[28:59] to it. One was Step one was solve
[29:01] intelligence, i.e. build AGI, and then
[29:03] step two was use it to solve everything
[29:05] else. We had to change that a bit over
[29:07] time cuz people were like, "Do you
[29:08] really mean solve everything else?" And
[29:10] we did mean that, and I think people are
[29:12] sort of understanding what that means
[29:14] today. But, specifically, I was meaning
[29:16] solve other what I call root node
[29:18] problems in science. So, areas of
[29:20] science that would unlock whole new
[29:22] branches or avenues of discovery. And
[29:24] AlphaFold is the prototypical example of
[29:26] what we want to do. So, over 3 million
[29:28] researchers around the world, pretty
[29:30] much every biology researcher in the
[29:31] world uses AlphaFold now. And I was told
[29:34] by some of my, you know, former
[29:36] executive friends that, you know, almost
[29:39] every drug discovered from now on will
[29:42] have used AlphaFold at some point in its
[29:44] in the drug discovery process. So,
[29:46] that's something we're very proud of,
[29:48] and it's the sort of impact that we hope
[29:50] to have with with AI. But, I do think
[29:52] it's just the beginning. I I don't
[29:54] really see any area of science or
[29:56] engineering that this won't be able to
[29:57] help be helpful with. And the ones you
[29:59] mentioned, I think we're almost like an
[30:02] AlphaFold one moment. So, it's we've got
[30:04] very promising results, but it's not
[30:05] quite solved the the grand challenge yet
[30:08] in that domain. But, I think we're going
[30:10] to have a lot to talk about in the next
[30:12] couple of years on all those areas you
[30:13] mentioned, materials, which I I think is
[30:15] very exciting, all the way to
[30:17] mathematics. In in science, I mean, it
[30:19] feels Promethean. It's like, here is
[30:21] this capability, and you I think so. I
[30:24] mean, of course, along with that,
[30:25] including including what the the the
[30:27] parable of Prometheus, we have to also
[30:29] be careful with how we use that and what
[30:32] we use it for, and also the misuse that
[30:35] can happen with those same tools. A lot
[30:37] of people in this room are trying to
[30:37] build companies applying AI to science.
[30:40] For them, what's the difference between
[30:41] a startup that actually advances the
[30:43] frontier in your view versus one that's
[30:45] just wrapping an API around a foundation
[30:47] model and calling it AI for science?
[30:49] Well, look, I think there's one of the
[30:51] things I would recommend. I'm trying to
[30:52] think about and I think you mentioned
[30:53] this to me before. What would I do today
[30:55] myself if I was sitting in your place in
[30:58] Y Combinator, you know, looking at
[30:59] things. One thing you have to do is
[31:01] obviously intercept where the AI tech is
[31:04] going. So, that's one hard part of it.
[31:06] But, I do think there's huge scope for
[31:09] combining where AI is going with some
[31:12] other deep technology area. I just think
[31:14] that that sweet spot is is whether it's
[31:16] materials or medicine or other really
[31:18] hard areas of science. I think that
[31:21] those kinds of interdisciplinary teams,
[31:24] especially if it involves the world of
[31:25] atoms as well,
[31:27] there's not going to be a shortcut to
[31:28] that, at least in the foreseeable
[31:30] future. Those areas that are pretty safe
[31:33] from just getting swamped by whatever
[31:35] the next update is to the foundation
[31:37] models. So, I think if you're looking
[31:39] for things like that, that's one of the
[31:41] more defensible areas I would say. And
[31:43] I've always loved deep tech, so I'm kind
[31:45] of biased towards deep tech things. I
[31:48] think nothing
[31:49] that's really long-lasting and
[31:51] worthwhile is easy. And so, I'm always
[31:54] been drawn to to deep technologies.
[31:57] Obviously, AI was like that back in 2010
[31:59] when we started out, right? It was It
[32:01] was thought to just we know we know it
[32:03] doesn't work kind of thing is what I was
[32:05] told by investors and even in academia
[32:07] it was considered to be a very niche
[32:10] subject that we sort of tried in the
[32:12] '90s and we know doesn't work. But, if
[32:14] you, you know, if you have belief and
[32:16] conviction in your idea why it's
[32:18] different this time or what special
[32:19] combination from your background that
[32:22] you had, ideally you're expert in both
[32:24] those areas, both the machine learning
[32:26] and the other area you're applying it to
[32:27] or you can create a founding team with
[32:29] that expertise, I think there's huge
[32:31] impact to be made there and huge value
[32:33] to be built there. That's of important
[32:35] message. I mean, even I mean, it's hard
[32:38] it's easy to forget. Like, basically,
[32:39] once you've done it, you've done it.
[32:40] But, before you've done it, people are
[32:42] arrayed against you. Oh, sure. I mean,
[32:44] no one believes in it, which is why I
[32:45] think you got to you've also got to work
[32:47] in things that you're genuinely
[32:49] passionate about. Like, for me, I would
[32:52] have worked on AI no matter what
[32:54] happened. I just decided from a very
[32:56] young age it was the thing that um could
[32:59] be the most consequential thing I could
[33:01] think of. It's turned out that way, but
[33:02] it might not have. Maybe we would have
[33:03] been 50 years too early. And it was also
[33:06] the most interesting thing I could think
[33:08] of working on. And so, I would have
[33:10] still be working on AI today even if we
[33:13] were still, you know, in a little garage
[33:15] somewhere and it still wasn't quite
[33:16] working. I would have still been trying
[33:18] to find Maybe I'd have been back in
[33:19] academia or something, but I would have
[33:20] found some way of of continuing to work
[33:23] on it. So, I mean, AlphaFold was like an
[33:24] example of a spike that you pursued and
[33:27] it worked. You know, what makes a
[33:29] scientific domain ripe for an AlphaFold
[33:31] style breakthrough? And is there a
[33:32] pattern, a certain objective function?
[33:35] >> The way I I I I should write this up at
[33:37] some point when I have 5 minutes spare,
[33:39] but the lesson I've learned from all the
[33:42] Alpha projects we've done, specifically
[33:44] AlphaGo and AlphaFold, is um I think the
[33:48] techniques we have and the problems I
[33:49] look like to look for are great in if
[33:52] this if the situation can be described
[33:54] as massive combinatorial search space.
[33:56] The more massive the better in some
[33:58] ways. So, no brute force or special case
[34:00] algorithm will will solve it. And that's
[34:02] true of Go moves and of, you know,
[34:04] different configurations of proteins,
[34:07] far more than the atoms in the universe,
[34:08] both of those. And then, um you have a
[34:11] clear objective function. So, you know,
[34:13] you can think of it as minimizing the
[34:14] free energy in the proteins or, you
[34:16] know, the winning the game of Go. So,
[34:18] you need to be able to you need to
[34:19] specify your objective function clearly
[34:21] so you can hill climb. And then, um
[34:24] enough data and or simulator that can
[34:27] generate you uh lots of uh in
[34:29] distribution
[34:31] uh uh synthetic data. If those things
[34:33] are true, then I think um with today's
[34:36] methods, you can go a long way into
[34:38] tackling and finding the kind of needle
[34:40] in the haystack that you need uh to for
[34:42] the solution that you're trying to look
[34:44] for. And I think of just drug discovery,
[34:45] by the way, in the same way, right?
[34:47] There is a compound out there that would
[34:49] solve this disease if one could find it,
[34:52] if one could only find it, right? And
[34:54] that wouldn't have any side effects and
[34:55] so on. And as long as the laws of
[34:57] physics allows it, then the only
[35:00] question is how do you find it in an
[35:01] efficient way, in a tractable way? I
[35:04] think we showed for the first time,
[35:05] actually, with AlphaGo, that these
[35:07] systems could uh find those kinds of
[35:10] needles in a haystack, in that case, you
[35:12] know, the perfect Go move. I guess uh to
[35:14] get a little meta, I mean, we're we're
[35:15] talking about humans using these methods
[35:18] to create AlphaFold, but then there's a
[35:20] meta level, which is humans using AI to
[35:23] explore the space of possible
[35:25] hypotheses. How close are we to AI
[35:27] systems that can do genuine scientific
[35:29] reasoning, not just pattern matching on
[35:32] data?
[35:32] >> we're close. Um
[35:35] we're working on these general systems
[35:36] like that like I think we we have this
[35:38] system called co-scientist, and we have
[35:41] other algorithms like AlphaFold that can
[35:43] go a little bit beyond what the basic
[35:45] Gemini will do. And obviously, all the
[35:47] frontier labs are experimenting in this
[35:49] way. I've yet to seen anything so far,
[35:52] and we we all tinker with the same
[35:53] things, you know, some math problems
[35:54] that are a little bit harder than IMO
[35:56] and so on. I haven't seen anything yet
[35:59] um that is a true genuine, you know,
[36:02] massive discovery. That's my personal
[36:04] opinion. I think it's coming. I think it
[36:07] may be related to uh this earlier
[36:10] this thing we discussed about
[36:11] creativity, and and actually going on
[36:14] beyond the bounds of what's known. So,
[36:16] clearly, that's just not pattern
[36:17] matching at that point, cuz there is no
[36:19] pattern to match to, and it's a bit more
[36:21] than extrapolation. It's some kind of
[36:23] analogical reasoning, and I don't think
[36:25] these systems have that, or at least
[36:27] we're not using them in the in the right
[36:29] way to do that. So, the way I often say
[36:31] that in science is can it come up with a
[36:33] hypothesis that's really interesting,
[36:36] not just solve one. When I say just,
[36:38] we're not talking about just like
[36:39] solving the Riemann hypothesis or
[36:41] something. This would be obviously
[36:42] amazing, or one of the Millennium Prize
[36:44] problems, and maybe we're a couple of
[36:46] years out from doing that. Um but, I'd
[36:48] like to solve P equals NP. That's That's
[36:50] my favorite one. But, can you But, even
[36:52] harder than that would be to come up
[36:54] with a new set of of Millennium Prize
[36:56] problems that were regarded by top
[36:59] mathematicians to be as, you know, deep
[37:01] and meaningful and worthy of lifetime of
[37:04] study and effort to solve. Right? I
[37:07] think that's another level harder. And
[37:10] uh we don't have um you know, I still
[37:12] don't think we know how to do that. I
[37:14] don't think it's it's magical, though. I
[37:16] do think these systems will be
[37:18] eventually be able to do that. Maybe
[37:20] we're missing one or two things. And
[37:21] then, the way we would test that is, you
[37:23] know, I sometimes call it my Einstein
[37:25] test, which is, you know, can you train
[37:27] a system with the knowledge of cutoff of
[37:29] 1901, and then will it come up with you
[37:33] know, what Einstein did in 1905,
[37:34] including special relativity, you know,
[37:36] his annus mirabilis. Can Can it do that,
[37:39] right? Uh and then, I think we could run
[37:42] that test. May- Maybe we should just run
[37:44] that test and keep seeing if that's
[37:46] possible. And once that is, then I think
[37:48] we're on the verge of these systems
[37:49] being able to invent something new,
[37:51] truly novel. So, last last question. For
[37:54] the people who are deeply technical in
[37:55] this room who want to work on something,
[37:59] you know, even close to the scale that
[38:01] what you have created with you know,
[38:02] it's one of the largest AI efforts in
[38:04] the world, and you've been a pioneer for
[38:06] all these years. So, for that, I think
[38:08] everyone in this room thanks you and the
[38:10] folks at DeepMind very, very deeply from
[38:12] the bottom of our hearts. Thank you.
[38:14] What's the thing that you know now about
[38:16] building at the frontier that you wish
[38:17] you'd known at 25?
[38:20] I think we covered some of it in terms
[38:22] of actually you you work out that going
[38:24] after hard problems and deep problems um
[38:28] it's no more difficult in some ways than
[38:29] than going after a shallower, simpler,
[38:32] more superficial problem. They're
[38:33] they're they're just differently
[38:34] difficult. There's different things that
[38:36] are hard about each of those things, but
[38:38] I think given life's very short and you
[38:41] you know, you only have so much time and
[38:43] energy, you might as well put your life
[38:45] force into something that will really
[38:47] make a
[38:48] difference if you hadn't done it, if you
[38:50] hadn't been there to push it. So, I
[38:52] would just think of it through that
[38:54] lens. And then the other thing is if
[38:56] you're if you are and then we talked
[38:57] about deep tech and I love
[38:59] interdisciplinary
[39:00] uh work and I think that's going to be
[39:02] even more prevalent in the next few
[39:04] years in combinations of fields and uh
[39:07] uh finding the the the the connections
[39:09] between those fields. And it's going to
[39:11] be even easier to do that with AI. And
[39:13] then the only other thing I would say is
[39:15] if you know, if you have your depending
[39:17] on what your AGI timeline is, you know,
[39:19] mine's like 20 30 or something like
[39:20] this, then if you start off on a deep
[39:23] tech journey today
[39:26] usually that you're talking about a
[39:27] 10-year journey for for true deep tech
[39:30] in my opinion. So, then now you have to
[39:32] just consider AGI appearing in the
[39:35] middle of that journey. So, what does
[39:37] that mean? It doesn't it's not bad
[39:38] necessarily, but you have to take that
[39:40] into account, right? To will it be able
[39:43] to leverage it? What will the AGI system
[39:45] do with it? And it goes a little bit
[39:47] back to what you said earlier about
[39:48] AlphaFold and general AI systems. So,
[39:51] one thing I can think see happening is
[39:53] Gemini, Claude, or one of these general
[39:55] systems making use of AlphaFold like
[39:58] specialized systems as tools. I don't
[40:00] think we're going to have it just in one
[40:02] giant brain cuz it will have too much
[40:04] regression in if I put all the proteins
[40:07] into you know,
[40:08] Gemini, that wouldn't make sense. We
[40:10] don't need Gemini to do protein folding.
[40:12] Going back to your information
[40:13] efficiency, it will definitely affect
[40:15] its language skills or something like
[40:16] that, right? In a bad way. So, much
[40:19] better I think is to have really good
[40:21] general purpose tool usage models that
[40:23] will then
[40:24] maybe they could even train those
[40:26] specific tools, but they would be in a
[40:27] separate
[40:29] system. So, I think that's kind of
[40:31] interesting to think through the
[40:32] implications of that and then what you
[40:33] might build today. Also, physical things
[40:36] too like what kinds of factories would
[40:37] you build, what sorts of
[40:40] you know, finance systems and so on. So,
[40:42] I just think you need to really take
[40:44] that seriously and and and on the one
[40:46] hand is like an imagine what that world
[40:47] would look like and then build something
[40:49] that would be useful if that comes in
[40:51] halfway through.
[40:53] Demis Hassabis everyone.
[40:54] >> [applause]