[0:14] Yeah, we good. [0:17] >> Okay, folks, we're at capacity. Let's [0:20] kick off. I don't want you waiting here [0:22] for 25 more minutes before we hit some [0:24] arbitrary deadline. So, welcome. My name [0:28] is Matt. Uh I'm a teacher and I suppose [0:31] now I teach AI. Um [0:35] we have a link up here, if you've not [0:37] already been to this, which has the [0:39] exercises for the um stuff we're going [0:41] to do today. This is going to be around [0:43] two hours. So we might just sort of kick [0:44] off two hours from now. Is that right [0:46] Mike? [0:48] >> Yeah. Perfect. Um, and the theory behind [0:52] this talk or at least the thesis under [0:53] which I've been operating for the last [0:55] kind of six months or so is that [0:59] we all think that AI is a new paradigm, [1:01] right? AI is obviously changing a lot of [1:03] things. You guys are obviously [1:04] interested in this and that's why you've [1:05] come to this talk. And [1:09] I feel that [1:12] when we talk about AI being a new [1:14] paradigm, we forget that actually [1:17] software engineering fundamentals, the [1:19] stuff that's really crucial to working [1:21] with humans, also work super well with [1:24] AI. And this is what my keynote is on [1:27] tomorrow. Really, I'm going to sort of [1:28] be fleshing that out a lot more. And in [1:30] this workshop, I'm hopefully going to be [1:32] able to direct your attention to those [1:34] things and uh hopefully show you that [1:38] I'm right, but we'll see. Um, can I get [1:41] a quick heads up first? How many of you [1:44] guys have ever coded with [1:47] AI? Raise your hand if you've ever coded [1:48] with AI. Perfect. Okay. Uh, keep your [1:51] hand raised. [1:53] Uh, let's all uh share those armpits [1:56] with the world. Um, [1:58] how many of you code every day with AI? [2:01] Cool. Okay. Uh, keep your hand raised [2:04] if you've ever been frustrated with AI. [2:08] Okay. Very good. 
You can put your hands [2:10] down. Thank you for that show of [2:12] obedience. I really appreciate that. Um, [2:14] we are also being live streamed to the [2:15] Gilgood room as well. I've not uh did we [2:18] send someone up to the Gilgood room to [2:20] just check they're okay? Don't know. But [2:22] I see you. Uh, and there is a way that [2:25] you can participate, which is we have [2:27] um a Q&A. I [2:30] have a sort of hatred of Q&As because [2:31] they're not very democratic. Mostly [2:33] the sort of um most talkative people get [2:36] to um get to participate and share. And [2:39] so we're going to be going through this [2:41] um Q&A here. So: "Why do we have to wait [2:43] till 3:45? The room is packed. The doors [2:45] are closed." 100% agree. And so if you [2:48] want to uh ask a question, [2:50] I would like you to pile into this [2:52] async and then we can vote on each [2:53] other's questions and hopefully get the [2:55] best questions to surface for the [2:57] entire room to enjoy. [3:00] So I want to talk about first the kind [3:02] of weird constraints that LLMs have and [3:07] those weird constraints are sort of what [3:09] we have to base a lot of our work [3:11] around. Now, [3:14] there's a guy called Dex Horthy who runs a [3:16] company called HumanLayer, and he came [3:18] up with this idea, which is that [3:21] when you're working with LLMs, they have [3:24] a smart zone and a dumb zone. When [3:28] you're first kind of like working with [3:30] an LLM and it's like you just started a [3:32] new conversation, you start from [3:34] nothing. That's when the LLM is going to [3:35] do its best work because in that [3:37] situation, the attention relationships [3:39] are the least strained. Every time you [3:41] add a token to an LLM, it's kind of like [3:44] you're adding a team to a football [3:45] league. 
Think of the number of [3:47] matches that get added every time you [3:50] add a team to a football league. It just [3:51] scales quadratically. And that's [3:54] because you have attention relationships [3:55] going from essentially each token to every [3:58] other, covering both position and the sort [4:00] of meaning of the individual token. And [4:02] so this means that by around sort of 40% [4:05] or, I would say, around 100k, which is [4:08] kind of my new marker for this because [4:09] it doesn't matter whether you're using a 1 [4:11] million uh context window or 200k, it's [4:15] always going to be about this, [4:17] it starts to just get dumber. So as you [4:21] continually keep adding stuff to the [4:23] same context window, it just gets dumber [4:25] and dumber until it's making kind of [4:26] stupid decisions. Raise your hand if [4:28] that feels familiar to you. Yeah. Cool. [4:31] So this means that we kind of want to [4:34] size our tasks in a way that sticks [4:37] within the smart zone, right? We don't [4:39] want the AI to bite off more than it can [4:41] chew. And this goes back to old advice, [4:44] like Martin Fowler in Refactoring; uh [4:46] The Pragmatic Programmer talks [4:48] about this too. Don't bite off more than you [4:50] can chew. Keep your tasks small so that [4:53] you as a developer, a human developer, [4:55] don't freak out and don't start [4:57] going into the dumb zone. [5:01] But how do you tackle big tasks? How do [5:04] you take a large task like I don't know [5:07] cloning a company or something or just [5:09] doing something crazy? And how do you [5:12] break it into small tasks so they all [5:13] fit into the smart zone? One way of [5:16] course you could do it is I mean kind of [5:18] what the AI companies maybe want you to [5:20] do, or the natural way of doing it, is [5:21] just keep going and going and going. You [5:23] end up in the dumb zone, which charges you [5:24] tons of tokens per request. 
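As a rough illustration of the football-league analogy: if every token can relate to every other token, the number of pairwise relationships grows with the square of the context length. The sketch below is plain arithmetic, not a model of how any particular LLM behaves; the function name and the sample sizes are made up for illustration.

```python
def pairwise_relationships(n_tokens: int) -> int:
    """Number of distinct token pairs, like matches in a single round-robin league."""
    return n_tokens * (n_tokens - 1) // 2


# Each 10x increase in context length gives roughly a 100x increase in pairs.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {pairwise_relationships(n):>13,} pairs")
```

The point is only the shape of the curve: doubling the context roughly quadruples the number of relationships.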
You then [5:26] compact back down. We'll talk about [5:29] compacting properly in a minute. And you [5:31] keep going, keep going, keep going, [5:32] compact back down, keep going, keep [5:33] going, keep going. And I think that [5:36] doesn't really work very well because [5:38] of the sediment; we'll talk about that [5:40] in a minute. So the theory here is then, [5:43] and this is what I was doing for a [5:44] while, is I would use these kind of [5:48] multi-phase plans where I would say, [5:50] okay, we have this sort of number four [5:53] thing here, this large task. Let's [5:55] break it down into small sections so [5:57] that we can then kind of chunk it up and [5:59] do each little bit of work in the smart [6:01] zone. Raise your hand if you've ever [6:03] used a multi-phase plan before. Yeah, [6:06] really common practice, right? This is [6:08] kind of how we've been doing it. [6:09] Certainly, this is how I was doing it up [6:11] until December last year really. [6:14] And any developer worth their salt will [6:16] look at this and go, "This is a loop, [6:19] right? This is a loop. We've just got [6:21] phase one, phase two, phase three, phase [6:23] four. Why don't we just have phase n, [6:27] right?" [6:29] Phase n, where we essentially just say, [6:31] okay, we have, let's say, a plan [6:33] operating in the background and then we [6:35] just loop over the top of it and we go [6:37] through until it's complete. And this is [6:39] where um raise your hand if you've heard [6:41] of Ralph Wiggum as a software practice. [6:44] Okay, cool. Raise your hand if you've [6:45] not heard of Ralph Wiggum as a software [6:46] practice. Actually, that's more like it. [6:48] Okay. 
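The "phase n" idea, one plan in the background and a loop over the top of it, can be sketched like this. It is a minimal sketch under stated assumptions: `Task`, `run_in_fresh_context`, and `run_plan` are hypothetical names, and the function that would hand work to a coding agent is a stub that just marks the task done.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    done: bool = False


def run_in_fresh_context(task: Task) -> None:
    """Stub for handing one small task to an agent that starts from a cleared context."""
    task.done = True


def run_plan(plan: list[Task], max_iterations: int = 50) -> int:
    """Loop over the plan, one small task per iteration, until nothing is left."""
    iterations = 0
    while iterations < max_iterations:
        remaining = [task for task in plan if not task.done]
        if not remaining:
            break
        run_in_fresh_context(remaining[0])
        iterations += 1
    return iterations


plan = [Task("phase 1"), Task("phase 2"), Task("phase 3"), Task("phase 4")]
print(run_plan(plan), "iterations")
```

The `max_iterations` guard is an addition for safety in the sketch; the talk does not specify one.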
So there's this idea called Ralph [6:50] Wiggum uh which is kind of um sort of [6:52] based on this, which is essentially [6:56] all you need to do is sort of specify [6:58] the end of the journey, where you just [7:00] say okay we create a PRD, a product [7:02] requirements document, to say okay let's [7:05] describe where we're going and then we [7:07] just say to the AI just make a small [7:09] change, make a small change that gets us [7:11] closer and closer to there. And Ralph [7:14] works okay but I prefer a little bit [7:16] more structure. So that's kind of where we [7:18] got to in terms of thinking about the [7:21] smart zone. And that's kind of where I [7:23] want you to first start thinking about [7:25] here. Another weird constraint of LLMs is [7:29] that LLMs are kind of like the guy from [7:30] Memento, right? They just continually [7:32] forget. They just keep resetting [7:34] back to the base state. Let me pull up [7:36] this diagram. [7:38] I sort of I I I really should use [7:41] slides, but I just prefer just like [7:42] randomly scrolling around an infinite uh [7:45] tldraw canvas. Thank you, Steve. [7:48] Um, [7:49] so let's say another concept I want you [7:52] to have is that every session with an [7:53] LLM kind of goes through the same [7:55] stages. You have first of all the system [7:57] prompt here. This gray box here is [8:00] essentially the stuff that's always in [8:02] your context. You want this to be as [8:04] small as possible because if you have a [8:06] ton of stuff in here, if you have 250k [8:09] tokens, like I have seen people put in [8:11] there, then you're just going to go [8:13] straight into the dumb zone without even [8:15] being able to do anything. So you want [8:17] this to be tiny. You then go into a kind [8:20] of exploratory phase. This blue is sort [8:22] of where the coding agent is going out [8:24] and exploring the codebase. 
Then you go [8:27] into implementation and then you go into [8:29] testing and kind of making sure that it [8:32] works, running your feedback loops and [8:33] things like this. Raise your hand if [8:35] that feels familiar based on what you've [8:36] done. Yep. Sort of like the main [8:40] cornerstones of any session. And when [8:42] you clear the context, you go right back [8:45] to the system prompt. Boof, you go right [8:47] back there. So you delete everything [8:49] that's come before. [8:51] And raise your hand if you've heard of [8:54] compacting as well. Yeah. Okay. There [8:56] are some people who've not heard of [8:57] compacting. So let's just quickly show [8:59] what that means. For instance, I've just [9:02] been having a little chat with my LLM. [9:06] Uh, I want to make sure we sort of, you [9:09] know, just cover the basics so we're all [9:10] sort of on the same wavelength here. [9:12] I've just been having a chat with my [9:13] LLM. I've been talking about a thing [9:15] that I want to build. How's the font [9:17] size? Should I bump it up? Folks in the [9:19] back. Bump bump bump bump bump. [9:24] I'm using Claude Code for this session, [9:25] but you don't need to use Claude Code. Uh, [9:28] in fact, it's often nice not to use Claude [9:30] Code. Um, so I've been having a chat [9:33] with the LLM just sort of planning out [9:34] what I'm going to do next. It's asking [9:35] me a bunch of questions and [9:38] I highly recommend you do this. There's [9:40] this tiny little status line here that [9:43] tells me how many tokens I'm using. The [9:45] exact number of tokens I'm using. Um I [9:47] have an article on my website AI Hero if [9:50] you want to copy this. This is oh wow [9:53] that shakes, doesn't it? Um, this [9:57] is essential information on every coding [9:59] session because you need to know exactly [10:01] how many tokens you're using so that you [10:02] know how close you are to the dumb zone. 
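A status readout like the one described could be as simple as comparing the running token count against the rough 100k marker mentioned earlier. This is a sketch, not how any particular coding tool computes its status line; `SMART_ZONE_LIMIT` and `context_status` are hypothetical names, and the threshold is a rule of thumb from the talk rather than a hard boundary.

```python
SMART_ZONE_LIMIT = 100_000  # rough marker from the talk, not a hard rule


def context_status(tokens_used: int, limit: int = SMART_ZONE_LIMIT) -> str:
    """Summarise how close a session is to the assumed smart-zone limit."""
    percent = tokens_used / limit * 100
    zone = "smart zone" if tokens_used < limit else "dumb zone"
    return f"{tokens_used:,} tokens ({percent:.0f}% of {limit:,}) - {zone}"


print(context_status(25_000))
print(context_status(120_000))
```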
[10:05] Absolutely essential. And so let's watch [10:07] it. So I've got two options. I can [10:09] either clear [10:12] and go back to nothing or I can compact. [10:15] And when I compact then it's going to [10:18] squeeze all of that conversation, which [10:20] admittedly isn't very much, into a much [10:22] smaller space. And this in diagram terms [10:26] kind of looks like this where you take [10:27] all of the information from the session [10:29] and you essentially create a history out [10:31] of it, a written record of what [10:33] happened. [10:36] And devs love compacting for some [10:38] reason, but I hate it. I much prefer my [10:42] AI to behave like the guy from Memento [10:45] because this state is always the same. [10:48] Always the same. Every time you do it, [10:49] you clear and you go back to the [10:51] beginning. And so if you're able to do [10:52] that and you're able to optimize for [10:53] that, then you're in a great spot. [10:56] So that's kind of the two things I want [10:58] you to think about with LLMs, the two [10:59] constraints that we're working with. [11:01] They have a smart zone and a dumb zone. [11:04] And they're like the guy from Memento. [11:06] So let's take a look at the first [11:08] exercise. And while I'm doing this, [11:11] the way I want this to work is I'm going [11:12] to [11:15] be sort of walking through it up here. [11:17] And I want you folks to be kind of like [11:19] tapping away and doing things as well. [11:21] So that was just a little lecture bit. [11:23] Let's now actually get and do some [11:24] coding. For anyone who arrived late or [11:26] anyone in the Gilgood room, uh go to [11:29] this link, [11:32] this link up here [11:35] to see the exercises and clone the repo. [11:38] You absolutely do not have to. You can [11:39] just watch me do it if you fancy it. But [11:41] let's go there myself and let's see what [11:42] exercises await us. 
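The difference between clearing and compacting can be sketched with a toy session model. Everything here is an assumption for illustration: the `Session` class, the token counts, and the size of the compacted summary are made up, and real tools decide the summary size themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    system_prompt_tokens: int
    history: list[int] = field(default_factory=list)  # token count per message

    @property
    def total_tokens(self) -> int:
        return self.system_prompt_tokens + sum(self.history)

    def clear(self) -> None:
        """Drop everything except the system prompt: always the same starting state."""
        self.history = []

    def compact(self, summary_tokens: int) -> None:
        """Replace the whole history with one written summary of the given size."""
        self.history = [summary_tokens]


session = Session(system_prompt_tokens=2_000, history=[5_000, 8_000, 10_000])
session.compact(summary_tokens=3_000)
print(session.total_tokens)  # system prompt plus the summary
session.clear()
print(session.total_tokens)  # system prompt only
```

Clearing always lands on the same state, which is the property the talk prefers; compacting lands on a state that depends on whatever the summary happened to keep.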
[11:45] So essentially, I've built a um this is [11:48] from my course. This is a uh a course [11:52] management platform essentially a kind [11:54] of CMS for instructors for students and [11:56] this is what we're going to be building [11:57] a feature in. So I'm going to take you [12:00] from essentially the idea for the [12:02] feature all the way up to building a PRD [12:04] for the feature all the way up to [12:06] implementing the feature and hopefully [12:08] you can take inspiration from this [12:10] process and use it in your own work. So [12:15] uh let's kick off. episode. [12:17] We're going to start by using a skill [12:19] which is very close to my heart. It's [12:21] the grill me skill. And this grill me [12:24] skill is wonderfully small, wonderfully [12:28] tiny. And it helps prevent one of I [12:31] think the main issues when you're [12:32] working with an AI, which is [12:34] misalignment. [12:37] The uh the sort of silent idea that I'm [12:41] talking against here, that I'm arguing [12:43] against is the specs to code movement. [12:45] Has anyone heard of the specs to code [12:46] movement? Raise your hand. It's not [12:48] really a movement. I suppose it's just [12:49] sort of people saying specs to code. Um, [12:53] what it is is people say, okay, you can [12:55] write a program or you want to build an [12:57] app. The best way to build that app is [13:00] to take some specifications. [13:02] So to write some sort of like document [13:05] and then turn that document into code. [13:09] So just turn it into code. How do you do [13:10] that? You pass it to AI. if there's [13:13] something wrong with the resulting code. [13:14] You don't look at the code, you look [13:16] back at the specs, you change the specs [13:18] and you sort of just keep going like [13:20] this. This is kind of like vibe coding [13:22] by another name where you're essentially [13:24] ignoring the code. You don't need to [13:26] worry about the code. 
You just sort of [13:27] keep editing the specs and eventually [13:29] you just keep going. And I tried this. I [13:31] really tried it and it sucks. It doesn't [13:33] work because you need to keep a handle [13:36] on the code. You need to understand [13:38] what's in it. You need to shape it [13:39] because the code is your battleground. [13:41] And so [13:44] this again is where we're going. Let's [13:45] let's get some exercises. So what I'd [13:48] like you to do is go to this page, the [13:49] grill me skill. And inside the repo [13:53] here, we have a Slack message [13:57] from our pal. Where is it? It's in the [14:00] root of the repo. And it's under [14:05] where is it? [14:07] Clientbrief.mmd. [14:09] It's a Slack message from Sarah Chen. [14:11] For some reason, Claude always [14:12] chooses Sarah Chen as the name. I don't [14:13] know why. Um, it's saying that in [14:16] Cadence, our um course platform, our [14:20] retention numbers are not great. [14:21] Students sign up, do a few lessons, then [14:22] they drop off. I'd love to add some [14:24] gamification to the platform. And so, [14:27] when you're presented with an idea like [14:29] this, you need to find some way of [14:30] turning it into reality. Let's say Sarah [14:32] Chen is your client. You're on a tight [14:34] budget. You need to get this done fast. [14:35] How do you go and do it? Um, raise your [14:39] hand if you would um enter plan mode [14:42] when you're doing this. Anyone a big [14:43] user of plan mode? Yep. Um let's [14:46] actually shout out quickly any other [14:48] ideas about what you would do with this. [14:49] What would be [14:52] your first port of call? [14:54] >> Yeah, [14:55] >> sorry. [15:00] >> Yes, exactly. Let's imagine that Sarah [15:01] Chen's gone on holiday. You have no idea, [15:03] right? Uh she's just posted this thing. [15:05] You need to action it before you go. 
[15:07] Well, my first port of call is I go for this [15:10] particular skill. I'm going to clear my [15:12] context. [15:15] I'm going to uh get rid of you. You [15:19] don't need to be there. And I'm going to [15:21] say [15:22] um I'm going to invoke a skill, which is [15:25] the grill me skill. Let's quickly check. [15:28] Raise your hands if you don't know what [15:30] this is. [15:32] Cool. Oh, sorry. Sorry. Let me be more [15:34] specific. Raise your hands if you don't [15:36] know what I'm doing here when I uh do a [15:39] forward slash and then type something. [15:42] Does everyone kind of understand what [15:43] that is? I'm invoking a skill. I'm [15:45] invoking the grill me skill. And what [15:48] I'm going to do is I'm going to say [15:49] grill me and I'm going to pass in the [15:51] client brief. [15:54] So now the LLM really has only a couple [15:58] of things here. It just has the skill [15:59] and it has the description of what I [16:01] want to do. [16:04] And this is virtually how I start every [16:06] piece of work with AI. And while it's [16:09] exploring the codebase, [16:11] I'm just going to show you what the [16:12] grill me skill does. So this is inside [16:15] the repo so you can check it out. It's [16:17] extremely short. "Interview me [16:20] relentlessly about every aspect of this [16:22] plan until we reach a shared [16:23] understanding. Walk down each branch of [16:24] the design tree, resolving dependencies [16:27] one by one. For each question, provide [16:29] your recommended answer. Ask the [16:31] questions one at a time." Uh blah blah [16:33] blah. What this does, and what I noticed [16:36] when I was working with AI, especially [16:38] in plan mode actually, is it would [16:42] really eagerly try to produce a plan for [16:44] me. It would say, "Okay, I think I've [16:46] got enough. I'm just going to go off and plan." 
[16:49] And what I found was that [16:53] I was really trying to find the words [16:55] for what I wanted instead [16:57] of that. And Frederick P. Brooks, in [17:00] The Design of Design, has a great quote uh [17:03] talking about the design concept. When [17:06] you're working on something new with [17:07] someone, when you're uh all trying to [17:10] build something together, [17:12] then there's this idea that's [17:14] shared between all participants, and that [17:16] is the design concept. And that's what I [17:18] realized I needed with Claude. I needed [17:22] I needed to reach a shared understanding. [17:25] I didn't need an asset. I didn't need a [17:27] plan. I needed to be on the same [17:28] wavelength as the AI, as my agent. And [17:31] this is an extremely effective way of [17:33] doing it. So hopefully there we go. [17:35] Nice. It has done its exploration. First [17:38] of all, it's invoked a sub-agent which [17:41] spent uh 93.7K tokens on Opus. [17:47] Um and it's asked me the first question. [17:51] Cool. We can see that even though the [17:52] sub-agent burned a ton of tokens, I [17:55] haven't actually um uh increased my [17:58] token usage that much. Raise your hand [18:00] if you don't know what sub-agents are. [18:02] It's an important question. Everyone [18:05] kind of clear what sub-agents are? Okay, [18:06] I'll give a brief definition, which is [18:08] that this sub-agent thing here, [18:10] this Explore sub-agent, it has [18:12] essentially gone and called another LLM [18:14] which has an isolated context window, [18:18] and then that LLM has reported a summary [18:20] back. So a sub-agent is kind of like a [18:22] delegation. You're delegating a task to [18:24] a sub-agent. It goes, eagerly does all [18:26] the things, explores a ton of stuff and [18:28] then just drip feeds the important stuff [18:30] back up to the orchestrator agent, to the [18:33] parent agent. So, okay. 
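The delegation pattern just described, where the sub-agent works in its own context and only a summary comes back, can be sketched as follows. This is a toy model under stated assumptions: `explore_subagent`, `delegate`, and `count_tokens` are hypothetical names, the "tokenizer" is just a word count, and no real agent framework is being called.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def explore_subagent(files: dict[str, str]) -> tuple[str, int]:
    """Hypothetical sub-agent: reads every file in its own isolated context.

    Returns a short summary and the number of tokens it spent getting there.
    """
    own_context = list(files.values())
    tokens_spent = sum(count_tokens(text) for text in own_context)
    summary = "Explored: " + ", ".join(sorted(files))
    return summary, tokens_spent


def delegate(parent_context: list[str], files: dict[str, str]) -> int:
    """Run the sub-agent and add only its summary to the parent's context."""
    summary, tokens_spent = explore_subagent(files)
    parent_context.append(summary)
    return tokens_spent


parent = ["client brief"]
spent = delegate(parent, {"a.py": "x " * 500, "b.py": "y " * 300})
print(spent, "tokens spent by the sub-agent")
print(sum(count_tokens(m) for m in parent), "tokens in the parent context")
```

The sub-agent's spend and the parent's growth are decoupled, which is why a large exploration barely moves the parent's token count.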
So, hopefully [18:36] you guys have seen the same thing. It's [18:37] done an explore. And we now have our [18:40] first question. Points economy. What [18:42] actions earn points and how much? Okay. [18:45] At this point, you can ask it, by the [18:47] way, questions to um deepen your [18:49] understanding of the repo. I obviously [18:50] know this repo really well because I [18:52] wrote it, but you might not um know [18:54] what's going on. So, let's say my [18:57] recommendation, keep it simple, two point [18:59] sources to start. What's so nice about [19:01] this is that not only does it give us a [19:03] question that kind of aligns us here, we [19:06] get a recommendation, too. And often [19:08] what I'll find is the AI's [19:09] recommendations are really good. And so [19:11] I'll just say skip video watch events, [19:13] they're noisy and gameable. I agree. [19:16] Sarah's asked while keep lessons in the [19:17] bread and butter. [19:20] Yeah, [19:21] looks good, pal. [19:24] Now, what I usually do is I usually [19:26] dictate to the AI. I'm usually actually [19:29] chatting to the AI instead of uh typing [19:31] here, but uh this is a relatively new [19:33] laptop and I couldn't get my dictation [19:35] software working on it um because [19:37] Windows is crap. Um [19:41] so should points be retroactive? There [19:43] are existing lesson progress records [19:45] with completed-at timestamps. This is [19:47] a really nasty question, right? Should [19:49] we actually go back and backfill all of [19:51] the lesson progress events? This is the [19:53] kind of question that you need to be [19:55] aligned on if you're going to fulfill [19:56] the feature properly. This is not [19:58] something I considered and Sarah Chen [19:59] certainly didn't consider. Do I want it [20:02] to be retroactive? Hm. Let's actually do [20:05] a vote inside here. Should we go back [20:08] and backfill all the records? 
Raise your [20:09] hand if you think we should backfill all [20:10] the records. [20:13] Raise your hand if you think we [20:14] shouldn't backfill all the records. [20:17] There are a lot of uh fence sitters in [20:19] the room. I'm going to say, [20:22] you know, this is the kind of discussion [20:23] you're sort of having with the AI. [20:24] You're getting further aligned. Yes, I'm [20:25] just going to go with this [20:26] recommendation because I'm lazy. [20:31] Notice, too, how I'm able to keep in the [20:33] loop here with AI. I'm not, you know, [20:35] it's it's pinging me these questions [20:36] pretty quickly. [20:39] I'm not having to go off and check [20:40] Twitter or something. Levels. What's the [20:43] progression curve? Yeah, that looks [20:45] about right, for instance. Yes. Okay. So [20:48] hopefully you should be able to go and [20:49] um kind of work through this with the AI [20:52] and essentially try to reach an [20:55] alignment. And this grill me skill this [20:57] can last a long time. This can I've had [21:00] it ask me 40 questions. I've had it ask [21:02] me 80 questions. I've had some people it [21:04] asks a hundred questions to literally [21:06] you're sat there for an hour chatting to [21:08] the AI. And what you end up with is [21:11] essentially this conversation history [21:13] that works really nicely and works [21:15] really nicely as an asset of the design [21:17] concept that you're creating. This can [21:20] also function like this. You can uh have [21:22] a meeting with someone who's a maybe a [21:24] domain expert. Maybe I have a meeting [21:26] with Sarah. I feed that meeting [21:28] transcript into uh I don't know Gemini [21:31] meetings or whatever you guys are using. [21:33] You take that, you feed it into a [21:35] grilling session and you grill through [21:37] the assumptions that you didn't have. 
[21:39] So, this ends up being a really nice [21:40] kind of um a really nice way of just [21:44] taking inputs from the world and then [21:46] just turning and validating them. So, [21:49] okay, [21:51] let's see. I really want to get to the [21:53] end of this, but I also don't want to [21:54] just like be sat here talking to the AI [21:56] in front of you for uh a thousand days. [21:58] So, I'm just going to say yes. [22:03] Let's see what happens. So, I tell you [22:05] what. Um, while you guys sort of have a [22:07] little fiddle with this locally, let's [22:09] start a little Q&A session now. And [22:13] let's see how's this going to work. "Can [22:15] we keep the door closed? I'll turn up [22:16] the microphone. It's quite noisy." Uh, [22:20] let's see. Mike, can we uh Door closed? [22:23] Oh, it has been closed. Mark has [22:24] answered. Beautiful. So, what I'd like [22:27] you to do is "Is there any air con?" Yeah, [22:30] there is some air con. I think there is [22:32] some air con. You guys aren't being lit [22:35] here. I'm being I'm being fried alive [22:37] here. Uh so what I'd like you to do is [22:40] go on to the Slido which you can join [22:42] here. If you're not taking the [22:44] exercise, go on to the Slido, have a [22:46] little fiddle and vote on some good [22:48] questions. I'm just going to chat to the [22:50] AI for a second uh until we reach a [22:53] stopping point. So do streaks earn [22:54] points? [22:56] Um, streaks are standalone. [23:06] Let's see what else it comes up with. [23:12] Where does gamification UI live? Let's [23:15] have it in the dashboard. [23:19] I'm just going to scan these and blast [23:20] through them basically. So, how we doing [23:22] with our Slido? [23:24] Okay. [23:26] "Have I tried Spec Kit, OpenSpec or [23:28] Taskmaster instead of the grill me [23:30] skill? Do I find them more verbose or a [23:32] structured alternative?" This is a great [23:33] question. 
So there are a ton of [23:35] different frameworks out there that [23:36] allow you to um sort of build up this [23:39] planning process for you. I personally [23:42] believe you at at this stage when [23:45] there's no clear winner, when there's no [23:46] kind of like one true way and when [23:48] things are changing all the time, you [23:50] need to own as much of your planning [23:52] stack as you possibly can. What I've [23:55] noticed and a lot of my students is [23:59] they tend to overuse a certain stack. [24:03] they get into trouble and they because [24:06] they don't own the stack and they don't [24:07] have observability over the whole thing, [24:09] they just go, "This isn't working. This [24:12] sucks." Whereas if um if you have [24:15] control over the whole thing, then at [24:17] least you know how to fix it or [24:19] potentially know how to fix it. So I'm [24:22] even though I'm sort of giving you uh a [24:26] stack basically, I believe in inversion [24:28] of control and you should be in control [24:30] of the stack. [24:32] So, can I press zero, please? [24:38] >> Sorry. [24:40] >> Sorry, that was a lot of sort of [24:41] mumbling. Can I [24:42] >> feedback? You have four options on the [24:44] bottom of you to hit dismiss. [24:48] >> Thank you. [24:50] I'm so sorry. Well, you didn't want to [24:52] give Claude good feedback. Why? What's [24:54] wrong with you? [24:58] Okay cool. [24:59] Uh many of the questions asked by the [25:01] grill me skill are not necessarily [25:02] appropriate for a developer rather a PO [25:04] in larger teams who should use it. Yeah. [25:06] Um raise your hand if um you've ever [25:10] done pair programming. Anyone ever done [25:12] pair programming? Right. Keep put your [25:15] hands down and raise your hand again if [25:16] you've ever done a pair programming [25:18] session with an AI. [25:20] Right. How did it go? Was it good? You [25:23] enjoy it? 
I think pair programming [25:25] sessions with AI is a great idea because [25:27] you've got a third person in the room [25:28] who will relentlessly quiz you and ask [25:30] you questions. It should if you don't [25:32] know the answer, it should be you, the [25:33] domain expert and the AI in the same [25:35] room. If you have a question about [25:37] implementation, it should be you, a [25:39] fellow developer and the AI in the same [25:41] room. You know, you can be sort of [25:43] working through these questions in your [25:44] team. And I think actually we're going [25:47] to look at implementation in a bit and [25:49] we're going to see how you can make [25:50] implementation so much faster. And but I [25:54] think the really crucial decisions, the [25:55] ones you need humans for, you actually [25:57] need a lot of humans and it doesn't [25:59] really matter how many humans are in [26:01] there. You can actually throw a bunch [26:02] like a kind of like mob programming with [26:04] AI essentially. [26:07] Uh what's my favorite metaprompting [26:08] tool? I think I kind of answered that. [26:10] Uh there's no air con. Let's just live [26:12] with it. Uh, how do I use the [26:14] conversation as an asset after the grill [26:16] me session? Well, we're going to get [26:18] there. [26:20] Um, okay. So, I really want to [26:24] I want to speed this up sort of [26:25] artificially. [26:28] >> Just what [26:30] >> I This is the thing. So, someone just [26:32] said, "Okay, Ralph loop this." But this [26:33] is crucial because I can't loop over [26:36] this, right? 
I can't um I think of there [26:40] as being two types of tasks in the AI [26:42] age, where you have human-in-the-loop [26:45] tasks, where a human needs to sit there [26:47] and do it, which is this, we are the human [26:51] in the loop, with multiple humans in the [26:52] loop, and there are AFK tasks. There are [26:55] tasks where the human can be away from [26:56] the keyboard and it doesn't matter. [26:58] Implementation, as we'll see, can be [27:00] turned into an AFK task, but planning, [27:03] this alignment phase, has to be human in [27:06] the loop. Has to be. [27:09] So, I've got to do it, unfortunately. [27:11] Um, I don't know. Uh, give me a long [27:16] list of all your recommendations. [27:20] I'm running a workshop right now, [27:24] so I artificially [27:26] need you to [27:28] pull more weight. [27:31] So, let's see what it does. Uh, let's [27:34] answer a couple more questions while [27:35] it's doing its thing. [27:37] "What is my opinion on PMs or other [27:39] non-dev roles vibe coding tasks?" [27:45] Um, I'm going to return to this later. I [27:48] think I'm going to leave this [27:49] unanswered. [27:51] A bit of mystery. [27:53] "I notice I'm not using the ask user [27:55] questions UI for grill me. Why?" Um, [27:57] there's a specific uh UI that you can [28:00] bring up in Claude Code, which I'll [28:02] answer this just quickly. Uh, ask me a [28:05] question using the ask user question [28:09] tool. [28:10] And this UI um is just sort of broken in [28:13] Claude and I really hate it. [28:16] You notice I'm using Claude, but I don't [28:19] like Claude very much. Like you [28:21] really are free with this method to [28:22] choose any um system you like. And this [28:24] is what the UI looks like. It's very [28:26] pleasing when you first encounter it, [28:27] but then you realize it is actually [28:28] broken in a ton of different ways. [28:32] All right, what did it come back with? [28:33] Oh, blimey. [28:35] Oh no. 
[28:38] So, [28:40] while this is doing its thing, let me do [28:41] some teaching in the meantime. The plan [28:44] here is that we take our grill me skill [28:47] and we need to essentially find some way [28:49] of turning it into a destination. [28:53] We need to go down to the uh we [28:57] essentially need to we're figuring out [28:59] the shape of this. That's what we're [29:01] doing. figuring out the shape of the [29:03] tasks during the grilling session. And [29:06] in order to turn it into a bunch of [29:09] actionable actions for the AI, we [29:12] essentially need to figure out the [29:14] destination. We need to know where we're [29:15] going. We need to know the shape of this [29:17] entire thing. So I think of there as [29:19] being two essential documents that we [29:21] need. We need a document that documents [29:25] the destination. [29:27] Oh no, [29:29] it's so not bright enough. There we go. [29:33] Still not bright enough. There we go. We [29:35] need something to document the [29:36] destination and we need something to [29:39] document the journey. In other words, we [29:41] need something a document that's going [29:43] to figure out what this even looks like [29:46] in all of its user stories and figure [29:48] out a definition of done. And then we [29:50] need to figure out what the split looks [29:52] like. So that's where we're going to go [29:54] to next. So once we finish with the [29:56] grilling session. [29:59] Yeah, it looks great. Fantastic. I love [30:01] it. It answered it answered 22 of its [30:04] own questions. There you go. That's [30:05] quite representative of what a grilling [30:07] session looks like. [30:09] So at this point now I have used 25k [30:14] tokens and all of that or loads of that [30:17] stuff is gold. I want to keep that [30:19] around. I've I've got 25k great tokens [30:23] there. And what I want to do is kind of [30:25] summarize it in some kind of destination [30:27] document. 
So this is, um, the next [30:30] exercise, where we're going to, [30:35] uh, we're going to write a product [30:37] requirements document. And the product [30:40] requirements document, or the PRD, [30:43] essentially that's its function. It's [30:46] the destination document. And it sort of [30:48] doesn't matter what shape it is. I've [30:51] got a shape that I prefer and that I [30:53] quite like, but you can just choose your [30:55] own shape or whatever your company uses. [31:00] Don't be too [31:03] worried about that. [31:05] All we're really doing is summarizing [31:07] the design concept that we have so far. [31:10] And [31:12] so let's try this. So I'm going [31:15] to initiate this. I'm going to zoom [31:17] all the way to the bottom. All I'm going [31:19] to do is just say write a PRD. [31:23] And we can take a look at that skill [31:24] now. [31:26] Write a PRD. [31:29] So this skill, [31:31] it does a few things. It first asks the [31:35] user for a long, detailed description of [31:36] the problem. You can use write a PRD [31:38] without grilling first, but I just like [31:39] to grill first and then write the PRD [31:41] afterwards. Then you can, um, get it to [31:44] explore the repo, which we've kind of [31:46] already done. Then we get it to [31:49] interview the user relentlessly. So have [31:50] a kind of grilling session again. And [31:52] then we start, um, putting together a PRD [31:56] template. So this is available in the [31:57] repo if you want to check it out. And [31:59] essentially this is what it looks like. [32:01] We've got a problem statement, the [32:02] problem the user is facing, the solution [32:04] to the problem, and a set of user [32:06] stories. And these user stories sort of [32:08] define what this is. You know, [32:11] you guys have probably seen things like [32:12] this if you've been a developer at all.
[32:14] Um, you know, Cucumber is a [32:16] language you can use to write these in, [32:17] or we just sort of, um, uh, write them [32:20] ourselves, essentially. Then we have a [32:22] list of implementation decisions that [32:24] were made and, crucially, a list of [32:26] testing decisions too. So [32:31] I'm going to run this. Okay. And so it's [32:33] finished its thing. Ah, [32:37] Windows, let me close the thing. Thank [32:39] you. I don't know why I bought a Windows [32:41] laptop. I think I just like the [32:43] challenge. Um, [32:46] so the first thing that it's going to [32:47] give me is a set of proposed modules it [32:51] wants to modify. [32:54] Now there's a deep reason why I'm [32:55] thinking about this. So at this [32:58] stage we have an idea. We have sort of [33:02] specced out the idea. We've reached a [33:04] sort of understanding of what we're [33:06] trying to do. And then we need to start [33:09] thinking about the code, because [33:11] this is not specs-to-[33:14]code. This is not where we're ignoring [33:16] the code. We actually keep the code in [33:18] mind throughout the whole process. And [33:21] the way I like to do this is I like to [33:23] just sort of think about a set of [33:24] proposed modules to modify. We're going [33:26] to return to this idea of [33:29] continually designing your system and [33:31] keeping your system in mind. So [33:33] it's saying: recommend tests for the [33:34] gamification service, the only deep [33:36] module with meaningful logic. Do these [33:38] modules look right? Yeah, that's good. [33:44] And it's going to ping out a PRD. [33:48] Now for ease of setup, I've got it so [33:51] that it creates a set of issues locally. [33:54] So it's just going to create essentially [33:55] a PRD inside this issues directory.
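A minimal sketch of the PRD shape described here, rendered to a local markdown file body. The section names follow the talk (problem statement, solution, user stories, implementation decisions, testing decisions); the `PRD` class, `render_prd` function, and the example content are hypothetical and are not taken from the workshop repo.

```python
from dataclasses import dataclass, field


@dataclass
class PRD:
    """Hypothetical container for the PRD sections named in the talk."""
    title: str
    problem_statement: str
    solution: str
    user_stories: list[str] = field(default_factory=list)
    implementation_decisions: list[str] = field(default_factory=list)
    testing_decisions: list[str] = field(default_factory=list)


def _numbered(items: list[str]) -> str:
    # User stories are numbered so later issues can refer to them.
    return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))


def _bulleted(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items)


def render_prd(prd: PRD) -> str:
    """Render the PRD as markdown text, one heading per section."""
    sections = [
        f"# PRD: {prd.title}",
        "## Problem Statement\n\n" + prd.problem_statement,
        "## Solution\n\n" + prd.solution,
        "## User Stories\n\n" + _numbered(prd.user_stories),
        "## Implementation Decisions\n\n" + _bulleted(prd.implementation_decisions),
        "## Testing Decisions\n\n" + _bulleted(prd.testing_decisions),
    ]
    return "\n\n".join(sections) + "\n"


# Illustrative content only; the real PRD comes out of the grilling session.
example = PRD(
    title="Gamification",
    problem_statement="Learners sign up for courses but lose momentum.",
    solution="Award points and track streaks for completed lessons.",
    user_stories=["As a learner, I want to earn points when I finish a lesson."],
    implementation_decisions=["Put scoring logic in one gamification service."],
    testing_decisions=["Unit test the gamification service."],
)
markdown = render_prd(example)
```

The exact shape matters less than having one place that fixes the destination; any company template with the same information would serve.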
But [33:59] the way I usually do it, and you can [34:02] check this out yourself, is you can go [34:04] to my um essentially what I consider my [34:06] work repo, which is [34:08] github.com/mattpocco/course [34:12] video manager up here. And in here, this [34:16] is essentially a app that I create um [34:19] that I use all the time to record my [34:21] videos and things like this. I think [34:22] I've recorded like I pulled down the [34:25] sets. I think I've recorded like a [34:26] thousand videos in here or something [34:27] nuts. Um, and you can see here that it's [34:30] got 744 closed issues. And this is [34:33] essentially all of the uh PRDs and all [34:36] of the implementation issues that I've [34:38] put into here. So, this is how I usually [34:39] like to do it. [34:42] So, that's what I'm doing with the There [34:45] we go. Yeah, I'm just going to say yes [34:47] and uh [34:49] and get that issue out. Let's see. It is [34:52] inside here. So, we got the problem [34:54] statement. people sign up for courses, [34:58] uh the solution, the user stories, uh 18 [35:00] user stories, looks nice, some [35:02] implementation decisions, level [35:03] thresholds, etc. This is enough [35:05] information. We've kind of clarified [35:07] where we're going and what we're doing. [35:09] So that's what we do. We essentially [35:11] have a grilling session and we've [35:12] created an asset out of it. Now, raise [35:15] your hand. Should I be reviewing this [35:17] document? Raise your hand if you think I [35:20] should be reviewing the document. [35:22] Yeah, I don't I don't look at these. I [35:24] don't look at these. The reason I don't [35:27] look at these is because what am I [35:29] testing at this point? What am I like [35:31] when I read it? [35:33] What am I testing? What am I what are [35:34] the failure modes I'm trying to test [35:36] for? I know that LLMs are great at [35:38] summarization because they are they're [35:40] really good at summarization. 
I have [35:42] reached the same wavelength as the LLM, [35:45] right? Using the grill me skill, we [35:46] have a shared design concept. So if I [35:48] have a shared design concept, all I'm [35:50] doing is essentially checking [35:53] the LLM's ability to summarize. [35:56] So I don't tend to read these. [36:00] Let's have a Q&A, because I [36:02] can feel you guys are itching for it. [36:03] And then I think we might have, like, I [36:06] don't know, just a five minute comfort [36:07] break, just to rest my voice and so you [36:08] can catch up with the exercises for a [36:10] minute, if that's all right. So let's [36:11] have a little Q&A sesh. Uh, if I don't [36:15] like Claude Code, which one do I [36:17] actually like? Um, [36:20] uh, have you ever heard the phrase, um, [36:23] uh, democracy is the worst way to run a [36:25] country apart from all the other ways? [36:27] That's how I feel about Claude Code. [36:30] Uh, we've answered that one. [36:33] Uh, [36:34] what's your thoughts on developers [36:36] needing to very deeply understand [36:37] TypeScript now that "fix the TS, make no [36:40] mistakes" exists? I don't understand the [36:42] phrasing of this, but I think I [36:44] understand the meaning, which is that [36:48] I believe that code is very important, [36:50] and this is kind of going to feed [36:52] through the whole session, and that bad [36:54] codebases make bad agents. If you have [36:57] a garbage codebase, you're going to get [36:59] garbage out of the agent that's working [37:01] in that codebase. We'll talk more about [37:02] that in a bit. And so I think [37:04] understanding these tools very deeply, [37:06] understanding code deeply, is going to [37:08] make you a much, much better developer [37:10] and get more out of AI. [37:14] Uh, and that answers that question too. [37:16] Sweet. [37:19] Uh, get out of it. There you are.
[37:24] Now that we have 1 million tokens [37:25] available, do we ever actually want to [37:27] take advantage of that? [37:30] I've noticed that the dumb zone has [37:31] become less dumb lately. Okay, great [37:33] question. This goes back to our kind of [37:35] initial idea on the dumb zone. [37:41] Uh, [37:44] I, um, I recorded my Claude Code course [37:46] using a 200k context window, and on the [37:49] day that I launched the course, they [37:50] announced the 1 million context window. [37:53] My take on this is that what Claude Code [37:54] did is they essentially just did this. [37:58] They shipped a lot more dumb zone to you, [38:01] essentially. Now, this is good for tasks [38:03] where you want to retrieve things from a [38:06] large context window. If you want to [38:07] pass five copies of War and Peace or [38:09] something to it, and you want to find [38:11] out all the things that, uh, [38:14] uh, I can't remember a character from War [38:15] and Peace. Uh, why did I start with that? [38:18] It's good for retrieval. It's less good [38:20] for coding. So, I consider that [38:24] about 100K at the moment is the smart [38:28] zone. The smart zone will get bigger, and [38:30] that will be a really nice improvement. [38:33] So folks, we're going to take like a [38:34] five minute comfort break, if that's all [38:36] right, just for my voice, and so maybe you [38:38] can have a little move around or [38:39] something, or grab a drink. I can just [38:41] notice some sleepy eyes and I want to [38:42] make sure that we're awake for the next [38:44] bit, if that's all right. So we'll take [38:46] five minutes and I'll see you back here [38:49] then. All right. [38:51] So we have our PRD, which I'm not going to [38:55] read, a kind of destination document. [38:58] Let's quickly scan for any good [38:59] questions before we zoom ahead. [39:02] And [39:05] rediscovering the role of software [39:07] engineer in today's world. Top three [39:08] disciplines you recommend.
Um, taekwondo [39:12] is good, I've heard. I have no idea how [39:13] to answer this question. Um, thank you [39:16] for asking it though. Um, top three [39:18] disciplines I recommend. [39:21] >> I mean, sorry, [39:22] >> plumbing. [39:23] >> Plumbing is a good one. Yeah. Yeah. [39:24] Yeah. I don't know if that's a [39:25] discipline. The plumbers I've hired are [39:27] not usually very disciplined. Um, [39:30] right. [39:32] So, okay, we now have our destination. [39:34] Okay. Um, [39:37] perfect. [39:39] So, how do we actually get to our [39:40] destination? We have a sort of [39:42] vague PRD. How do we split it so that we [39:46] don't put things into the dumb zone? In [39:49] other words, we have our number four. [39:50] How do we split it into this kind of [39:52] multi-phase plan? Well, probably what [39:54] you would do at this point is you would [39:55] say, "Okay, Claude, give me a [39:57] multi-phase plan that gets me to this [39:59] destination." Right? That sort of makes [40:01] sense. This is what we've been doing [40:02] before, but I have, um, a sort of better [40:04] way of doing it now, which is that [40:08] I like creating a Kanban board out of [40:12] this. Raise your hand if you don't know [40:14] what a Kanban board is. [40:17] Cool. Okay. A Kanban board is essentially [40:19] just a set of tickets that you put on [40:22] the wall that have blocking [40:23] relationships to each other. So, we're [40:25] going to see what it kind of looks like [40:26] here. This is how we've worked, um, as [40:30] developers for a long time, really since [40:31] agile came around. And what it does, we [40:35] can see it here. It has proposed that we [40:38] split this setup into, um, five different [40:42] tasks. Here we have the first one, which [40:44] is the schema and the gamification [40:46] service. Yeah, that looks pretty good. [40:48] This is blocked by nothing. And we can [40:51] even see here that it's given it [40:53] a type of AFK, too.
Remember I talked [40:55] about human in the loop and AFK earlier. [40:57] This is an AFK task. This is something [40:58] we can just pass off to an agent to do [41:00] its thing. Streak tracking. Okay, that [41:02] looks good. [41:04] Uh, then wire points and streaks into [41:07] lessons quiz completion. This is blocked [41:08] by one and two. Retroactive backfill. [41:11] This is blocked only by one. And then [41:14] this one here is blocked by all of the [41:16] tasks. Cool. [41:19] Hm. Now, you could say, [41:23] why don't we just make this sort of [41:25] generation of the issues, why don't we [41:26] just hand that over to the AI? Why do I [41:28] need to be involved here? Right? Because [41:30] it's given us quite a good selection of [41:32] tools here. Why do I need to review this [41:34] and sort of figure out what's next? Now, [41:37] my take here is that this is really [41:40] cheap to do, like very quick to do once [41:42] I've done the PRD. And I can immediately [41:44] see some issues here. [41:47] There's a really, really important [41:49] technique when you're kind of figuring [41:51] out what the shape of this journey [41:53] should look like. And [41:57] it sort of comes to this very classic [42:00] idea, uh, which comes from The Pragmatic [42:02] Programmer, called tracer bullets or [42:04] vertical slices. [42:07] And tracer bullets have really transformed [42:09] the way I think about actually [42:11] getting AI to pick its own tasks. [42:14] Systems have layers, right? There are [42:17] layers in your system. These might be [42:19] different deployable units. You might [42:21] have a database that lives somewhere. [42:23] You might have an API that lives maybe [42:24] close to the database but in a separate [42:26] bit. You might have a front end that [42:27] lives somewhere totally different, like a [42:29] CDN. Or within these deployable units, [42:32] you might have different layers within [42:34] those.
For instance, in the codebase that [42:36] we're working in, we have a ton of [42:38] different services. We have a [42:41] quiz service, a team service, user [42:43] service, coupon service, course service. [42:45] And these services have dependencies on [42:47] each other. So they're kind of like [42:48] individual layers. Well, what I noticed [42:53] is that AI loves to code horizontally. [42:57] So it loves to code layer by layer. So [43:00] in other words, in phase one, it will do [43:02] all of the database stuff, all of the [43:03] schema, all of the, you know, all the [43:06] stuff related to that unit. Then it will [43:08] go into phase two and do all of the API [43:10] stuff. Then it will add the front end on [43:12] top of that. [43:14] Can anyone tell me what's wrong [43:16] with that picture? Why is that not a [43:18] good thing to do? Raise your hand if you [43:20] have an answer. [43:21] >> Yeah. [43:21] >> Have the whole feedback loop. [43:23] >> Exactly. You don't get feedback on your [43:26] work until you've really started or [43:29] completed phase three. [43:32] So [43:35] until you get to phase three, you're [43:36] not actually testing that all the layers [43:38] work together. [43:41] You haven't got an integrated system [43:42] that you can test against. And so [43:45] instead you need to think about vertical [43:47] layers. You need to think about thin [43:49] slices of functionality that cross all [43:52] of the layers that you need to. And this [43:55] is a much better way to work, much [43:57] better way for the AI to work too, [43:59] because it means at the end of phase one, [44:01] or during phase one, it can get feedback [44:02] on its entire flow.
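The feedback argument can be made concrete with a small sketch. Assuming a system with three layers named `database`, `api`, and `frontend` (illustrative names, not taken from the workshop codebase), the hypothetical function below reports the first phase after which every layer has been touched, which is the earliest point at which an integrated flow can be tested.

```python
from typing import Optional

# Assumed layer names for illustration only.
LAYERS = frozenset({"database", "api", "frontend"})


def first_integrated_phase(phases: list[set[str]],
                           layers: frozenset = LAYERS) -> Optional[int]:
    """Return the 1-based phase after which all layers have been touched.

    Returns None if the plan never covers every layer.
    """
    touched: set[str] = set()
    for number, phase_layers in enumerate(phases, start=1):
        touched |= phase_layers
        if layers <= touched:
            return number
    return None


# Horizontal plan: one layer per phase, so nothing is testable end to end
# until the last phase.
horizontal = [{"database"}, {"api"}, {"frontend"}]

# Vertical plan: each phase is a thin slice that crosses every layer.
vertical = [
    {"database", "api", "frontend"},
    {"database", "api", "frontend"},
    {"api", "frontend"},
]

horizontal_feedback = first_integrated_phase(horizontal)
vertical_feedback = first_integrated_phase(vertical)
```

With these inputs the horizontal plan only becomes testable end to end at phase 3, while the vertical plan is testable at phase 1.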
So what this means [44:05] to me [44:07] is, inside the PRD-to-issues skill up [44:11] here, I have got: break a PRD into [44:15] independently grabbable issues using [44:17] vertical slices (tracer bullets), written [44:19] as local markdown files. We first locate [44:21] the PRD, [44:23] uh, again explore the codebase if this is [44:25] a fresh session, then we draft vertical slices, [44:28] so we break the PRD into tracer bullet [44:30] issues. A tracer bullet, by the way, is, [44:33] uh, essentially, when you're like an [44:35] anti-aircraft gunner, it's quite a [44:37] violent idea actually, uh, and you're [44:40] looking up in the sky and it's night, if [44:42] you're just shooting normal bullets, you [44:43] have no idea what you're firing at, [44:45] right? You could just be, you know, you [44:46] see the plane, but you don't see where [44:47] your bullets are going. With tracer bullets, [44:49] they attach a tiny bit of [44:50] phosphorescence or phosphor or something [44:53] to make it glow as it goes. So, this [44:56] means that every sixth bullet or [44:57] something, you actually see a line in [44:58] the sky. So, you have feedback on where [45:01] you're aiming. So [45:03] the idea here is that we increase our [45:05] level of feedback and we get near-[45:07]instant feedback on what we're building, [45:09] because without that the AI is kind of [45:11] coding blind until it reaches the later [45:13] phases. We've got some vertical slice [45:15] rules. We quiz the user and then we [45:17] create the issue files. So what I see [45:20] here is that even though I've told [45:24] it to do vertical slices, it's proposing [45:27] to [45:29] create the gamification service [45:32] first, on its own. That's just one slice [45:35] there. And that to me feels like a [45:36] horizontal slice. What I want to see in [45:38] the first vertical slice especially is, I [45:41] want to see the schema changes, or some [45:42] schema changes.
I want to see some new [45:45] service being created and I want a [45:47] minimal representation of that on the [45:48] front end. So I want it to go through [45:50] the vertical slices, not just the [45:52] horizontal. Does that make sense? Okay. [45:55] So I'm going to give the AI a [45:57] rollicking. [45:59] Uh bad boy. No, [46:02] I'm not going to waste tokens just being [46:04] just memeing. Um so the first slice is [46:08] too horizontal. I'll just start with [46:10] that and see if it picks it up. Does [46:12] that make sense as a concept? And I [46:14] think having that um what I really like [46:17] about going back to those old books is [46:20] that we are really trying to in this day [46:22] and age like get [46:25] uh verbalize best software practices in [46:28] English. And these books, 20-year-old [46:30] books have already done that. And it's [46:32] an absolute gold mine if you want to [46:34] throw that into prompts. But even with [46:35] that, it's not going to um not going to [46:37] do a perfect job each time. So, award [46:41] points for lesson completion visible on [46:43] dashboard. Yes, that's a beautiful [46:45] vertical slice because it's definitely a [46:48] big chunk of stuff. It's doing a lot of [46:49] stories there, but we're going to see [46:51] something visible at the end and the AI [46:53] will then just be able to add to that. [46:55] You see why that's preferable to the [46:56] first one. Cool. Uh, looks great. [47:01] So, we're getting closer now. And anyone [47:03] following at home as well, you're not at [47:05] home, but you get the idea. um we'll [47:08] hopefully see the same thing too and [47:09] start developing the same instincts. [47:11] Let's open up for questions just while [47:13] I'm sort of creating these GitHub issues [47:17] or not GitHub issues uh local issues. [47:20] When will I stop using Windows? Never. [47:22] What is your uh Okay, we'll get to that [47:24] later. 
[47:26] How does AI, um, decide when to stop [47:28] grilling? Because AI can ask [47:29] incessantly. Can we have a smarter way [47:31] to decide the stop point? Yeah, [47:33] those grilling [47:35] sessions can be super intense. And the [47:37] thing about these skills is you can tune [47:38] them if you want to. If you feel like [47:40] the AI is just absolutely hammering you, [47:42] hammering you, hammering you, then you [47:43] can just tell it to pull back a [47:46] little bit or get it to do, you know, [47:48] stop points and that kind of thing. So, [47:49] if that's a failure mode that you run [47:50] into a lot, then you just, you know, [47:52] change the skill. [47:55] Uh, do I still use, uh, "be extremely [47:57] concise, sacrifice grammar for the sake of [47:58] concision"? Um, there was a tip that I [48:01] gave folks, um, five months ago, which was [48:05] to basically increase the [48:07] readability of your plans. So when [48:09] you're using plan mode, then you can put [48:11] it in your CLAUDE.md, [48:13] and you can say, okay, yeah, approve that. [48:17] Let's open up CLAUDE.md. [48:21] Do I have a CLAUDE.md? Maybe I don't. I [48:23] really don't use Claude very much. I'm [48:25] just going to put a dummy inside here. [48:28] Um, when talking to me, [48:33] uh, sacrifice grammar for the sake of [48:35] concision. [48:40] And this, um, prompt was, uh, really useful [48:43] to me when I was reading the plans, [48:45] because it meant that the plans would [48:46] come out and they would be very concise, [48:48] really nice, easy to read. [48:50] But I've since dropped this [48:54] idea in preference to a grilling session, [48:57] because what I noticed was I [48:59] didn't want to read the plans. I wanted [49:00] to get on the same wavelength as the [49:02] LLM. I wanted it to ask aggressive [49:03] questions to me. And when I stopped [49:05] reading the plans, I stopped needing [49:06] them to be concise.
So I think of the [49:09] plans, really, and the destination document, [49:11] as, uh, the end state. And I don't need [49:13] that end state to be concise. Hopefully [49:15] that answers your question. [49:19] Uh, what do I think will be the outcome [49:22] of the Mexican standoff of future roles [49:23] of PMs and other roles converging? Uh, I [49:26] have no idea. I'm not a pundit. I have [49:28] no idea. [49:30] Uh, okay. So, we should, uh, after a [49:34] couple of approvals, [49:37] uh, end up with a set of issues. Now, [49:40] these issues that we're creating, [49:43] they're designed to be independently [49:44] grabbable, which means that this Kanban [49:47] board ends up looking kind of like this, [49:51] where you have [49:53] essentially a set of tickets with a [49:55] whole load of blocking relationships. [49:57] So, this one needs to be done before [49:59] this one. This one needs to be done [50:00] before this one. And this one, let's say [50:03] we got another one over here. This one [50:05] needs to be done before this one. This [50:07] means that you can start to parallelize. [50:11] You can start to get agents working at [50:13] the same time on these tasks, because, [50:15] yeah, this one needs to be done first, [50:18] and then these two [50:21] can be grabbed at the same time by [50:24] independent agents. Raise your hand if [50:26] you've done any kind of parallelization [50:28] work with agents. Okay, cool. So this [50:32] allows you, um, to turn those plans into, [50:35] kind of like, directed [50:38] acyclic graphs, essentially, where you [50:39] are able to, um, essentially have [50:43] three phases here, where you have phase [50:47] one. [50:49] Let me move that, [50:51] uh, above this line here. You do this one. [50:56] Then phase two, you do the two below it. [50:58] And then phase three, you do this third [51:00] one and add it onto there.
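A minimal sketch of how blocking relationships turn into parallel phases. The ticket names `A` to `D` are hypothetical stand-ins for the tickets drawn on the board, and `plan_phases` is an illustrative helper, not part of any skill shown in the workshop. Each phase contains every ticket whose blockers are already done, so tickets in the same phase can be picked up by independent agents at the same time.

```python
def plan_phases(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Group tickets into phases; tickets in one phase can run in parallel.

    `blocked_by` maps each ticket to the set of tickets that must finish first.
    Raises ValueError if the relationships contain a cycle or name a ticket
    that is not on the board, since then nothing can ever become ready.
    """
    remaining = {ticket: set(blockers) for ticket, blockers in blocked_by.items()}
    done: set[str] = set()
    phases: list[list[str]] = []
    while remaining:
        # A ticket is ready once all of its blockers are done.
        ready = sorted(t for t, blockers in remaining.items() if blockers <= done)
        if not ready:
            raise ValueError("cycle or unknown blocker in the board")
        phases.append(ready)
        done.update(ready)
        for ticket in ready:
            del remaining[ticket]
    return phases


# Hypothetical board: A first, then B and C in parallel, then D.
board = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}
phases = plan_phases(board)
```

For this board the result is three phases, `[["A"], ["B", "C"], ["D"]]`, matching the phase one, phase two, phase three picture above.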
And when you [51:03] think about it, [51:05] this is a relatively simple plan, but [51:07] you could have many different plans [51:08] operating all at once. It means that you [51:10] can do really nice parallelization, and [51:12] we'll talk more about that in a bit. But [51:14] that's why I prefer a Kanban board setup [51:17] like this to a sequential plan, because a [51:20] sequential plan can really only be [51:22] picked up by one agent. So this, where [51:26] did it go? Over here. [51:29] Yeah, this plan here, this is really [51:32] only one loop, right? Only one agent can [51:35] work on these, because we have numbered [51:36] phases and they're not parallelizable. [51:38] Does that make sense? Cool. [51:42] So, we've got our issues. Ah, come on. [51:44] Stop asking me for... Oh, no. It's creating [51:46] them on GitHub. I really don't want [51:48] that. [51:49] Oh, no. You fool. [51:53] Create them in issues instead. [51:58] No, that's not precise enough. Uh, you [52:00] fool. Create them in local markdown [52:04] files instead, referencing the local [52:08] version. [52:11] Sorry about this. [52:15] So, once we get to this point, we have a [52:18] bunch of issues locally that we can [52:21] start, um, looping over and implementing. [52:25] And it's at this point that the human [52:27] leaves the loop. So, so far, [52:31] let me pull up a proper overview of [52:33] this kind of flow that we're exploring [52:35] here. [52:37] So far, [52:40] we have taken an idea, [52:43] zoom this in a bit for the folks at the [52:44] back, [52:47] and we've grilled ourselves about the [52:49] idea. [52:51] We can skip over research and prototype, [52:52] but we've turned that into a PRD, into a [52:54] destination document. We've then turned [52:57] that PRD into a Kanban board, and all of [53:00] those steps are human reviewed.
And now, [53:05] at the implementation stage, we step back [53:08] and we let an agent, um, work through that [53:10] Kanban board, or multiple agents work [53:12] through the Kanban board. [53:15] Now, what this means is that, yeah, we've [53:17] spent a lot of time planning here, but [53:19] it means that we've queued up a lot of [53:21] work for the agent. We can think of this [53:23] as kind of like the day shift and the [53:24] night shift. This is the day shift for [53:26] the human, right? Planning everything, [53:28] getting all the, uh, all the stuff ready, [53:30] and then once we kick it over to the [53:32] night shift, the AI can just work AFK. [53:34] But what does that look like? [53:37] Well, so I'm just going to... Oh, yeah. [53:40] Just allow it. It's perfect. [53:42] So this looks like, if we head to the [53:45] next exercise, [53:47] which is, [53:51] uh, in fact the last exercise here, [53:52] running your AFK agent. [53:55] Now, [53:57] I've called this, uh, Ralph, really because [53:59] it is essentially a Ralph loop, [54:02] and this prompt here, I want to walk [54:04] through this really closely. [54:06] The first thing it's doing here is we're [54:08] essentially going to run Claude and [54:10] we're going to basically try to [54:12] encourage it to work, um, completely AFK. [54:16] I'll show you what the sort of script [54:17] for this looks like in a minute, but you [54:19] say, okay, local issue files from issues [54:21] are provided at the start of context. [54:24] The way we do that is, if you look inside [54:26] once.sh here inside the repo, [54:29] we have, [54:31] uh, it's essentially just a bash script [54:34] where we grab all of the issues, um, which [54:37] are inside markdown files, and we cat [54:40] them into a local variable. So that [54:42] issues variable contains all of the [54:43] issues that are in our entire backlog. [54:47] Then we grab the last five commits. I'll [54:50] explain why in a minute.
And then we [54:52] grab the prompt and we just run Claude [54:54] Code with permission mode accept edits, [54:58] and then essentially just pass it [55:00] all of the information. This is what the [55:02] implement looks like. So that's what a [55:05] very, very simple version of this sort of [55:07] loop looks like. And of course this is [55:08] not a loop. This is just running it [55:10] once. [55:12] The loop is in the AFK version up here, [55:15] which is, uh, a fair bit more complicated. [55:18] And the crucial part here is we're [55:20] running it in a Docker sandbox as well. So [55:23] I don't want you to install Docker on [55:25] your laptops, because [55:27] you need to download a special [55:28] image and we're going to tank the [55:30] conference Wi-Fi if we do that. So I [55:32] am going to demo this to you, but you, um, [55:34] won't need to run this yourself. But [55:35] I'll talk through this in a minute. But [55:37] essentially, with this once script here, [55:44] we're just essentially running one [55:46] version of the thing that we're going to [55:48] loop again and again and again. So this [55:50] is kind of like the human in the loop [55:51] version. And this is essential. Running [55:54] this again and again is essential, [55:55] because you're going to see what the [55:56] agent does and see how it ends up [55:59] working. And any tuning that you need to [56:01] add to the prompt, then you can do that. [56:03] Let's go to the prompt. [56:06] Um, [56:09] so local issue files are being passed [56:11] in. You're going to work on the AFK [56:13] issues only. That makes sense. If all [56:16] AFK tasks are complete, output this no [56:18] more tasks thing. And then the next [56:20] thing, pick the next task. So [56:26] what we're doing here is we're [56:27] essentially running a backlog, or [56:30] curating a backlog, that our AFK agent is [56:32] going to pick up.
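A minimal sketch of the context assembly that the single-run script is described as doing: concatenate the local issue files, grab the last five commits, and combine both with the prompt. The actual script in the repo is a bash script; this Python version, its function names, and the section labels in the combined text are assumptions for illustration. The call that hands the result to the coding agent is deliberately left out, because the exact command line is not shown in the talk.

```python
import subprocess
from pathlib import Path


def read_issues(issues_dir: Path) -> str:
    """Concatenate every markdown issue file, in filename order."""
    parts = []
    for path in sorted(issues_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        parts.append(f"--- {path.name} ---\n{text}")
    return "\n\n".join(parts)


def last_commits(count: int = 5) -> str:
    """Return the most recent commits, one line each (requires a git repo)."""
    result = subprocess.run(
        ["git", "log", "-n", str(count), "--oneline"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_context(prompt: str, issues: str, commits: str) -> str:
    """Combine backlog, recent history, and instructions into one input."""
    return "\n\n".join([
        "# Issues\n" + issues,          # the whole local backlog
        "# Recent commits\n" + commits,  # what previous runs already did
        "# Instructions\n" + prompt,     # the Ralph-style prompt
    ])


if __name__ == "__main__":
    # Assumed layout: ./issues/*.md and ./prompt.md next to this script.
    context = build_context(
        prompt=Path("prompt.md").read_text(encoding="utf-8"),
        issues=read_issues(Path("issues")),
        commits=last_commits(5),
    )
    print(context)  # pipe this into the coding agent of your choice
```

Keeping `build_context` free of side effects makes it easy to inspect exactly what the agent will see before letting it run unattended.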
That's the purpose of [56:34] all of these, um, setups in the beginning, [56:38] all the way to this Kanban [56:41] board here. We're just essentially [56:42] creating a backlog of tasks for the [56:44] night shift to pick up, and the night [56:48] shift, this sort of Ralph prompt here, [56:50] it's got its own idea about what a good [56:53] task looks like. So next, I [56:56] did talk about parallelization. I will [56:58] show you this later, but this is [56:59] essentially a sequential loop here. [57:01] We're just going to run one coding agent [57:03] at a time. This is a good way to just [57:04] sort of, um, get your feet wet, [57:06] essentially. [57:08] So, it's prioritizing critical bug [57:10] fixes, development infrastructure, then [57:13] tracer bullets, then polishing, quick [57:15] wins, and refactors. And then we just [57:17] have a very simple kind of instruction [57:19] on how to complete the task. So, we [57:22] explore the repo, use TDD to complete [57:24] the task. I'll get to that later. [57:27] And we then run some feedback loops. So [57:30] let's just try this and let's just [57:31] see what happens. So good. It's created [57:34] the issue files. We should be good to [57:35] go. I'm going to cancel out of this. I'll [57:38] clear and I'm going to run, [57:40] uh, where is it? Ralph once.sh. And you [57:44] can feel free, if you're following along, [57:46] to do the same thing. [57:48] So we can see it's just running Claude [57:50] inside here with the prompt and with all [57:53] of the issues that have been passed in. [57:56] And while it's doing its thing, [57:59] you probably have some questions about [58:01] this setup and about the decisions that [58:03] I've made to essentially delegate all of [58:06] my coding to AI, right? So, let's [58:09] do a quick Q&A while it's, uh, getting its [58:11] feet under. [58:14] Uh, okay. [58:17] I'm going to just [58:19] remove those. [58:23] How do you retain negative decisions,
[58:25] things that you decided against and [58:26] rationale when persisting the results from [58:28] the grilling session. A great question. [58:31] That's a very simple answer which is the [58:33] in the PRD uh write a PRD section there [58:37] is a stuff at the bottom a section of [58:39] the things that are out of scope. So the [58:41] things we're not going to tackle in this [58:43] PRD which is very important for giving a [58:45] definition of done. [58:47] Feel free to ping on the slido if you've [58:49] got any more questions. [58:51] Uh what's my front end workflow? Okay, [58:53] that's a great question. I'm gonna I'm [58:55] gonna answer that in a minute, I think. [58:58] How to deal with agents producing more [59:00] code than we can review? How to properly [59:02] parallelize and use multiple agents in a [59:05] separate way? Okay, that's um there's [59:06] two questions there. Um raise your hand [59:10] if you feel like you're doing more code [59:12] review now than you used to. [59:16] Yeah, definitely. Um I don't think [59:19] there's a way to avoid this. [59:22] If we delegate all of our coding to [59:25] agents, [59:27] you notice that the implementation here [59:29] is really the only AFK bit. We then also [59:32] need to QA the work and code review the [59:34] work, right? And if we are running these [59:38] loops where it's essentially going to [59:40] implement four issues in one, it's hard [59:42] to pair that with the dictum that you [59:46] should keep pull requests small and [59:47] self-contained, right? like small [59:50] self-contained pull requests means [59:52] you're needing to do fewer loops or [59:55] shorter loops or something. Or maybe you [59:57] do like a big stack of PRs, but that [59:59] seems horrible as well. That's still [60:00] just more separated code to review. I [60:03] don't honestly know what the answer to [60:05] this is yet.
I think we just need to be [60:07] ready to be doing more code review [60:09] essentially, which is not fun. That's [60:11] not a fun thing to say. That's not like [60:12] I don't know. I don't feel good saying [60:14] that, but I do think it's probably the [60:17] the way things are going. It's a great [60:19] question. [60:22] Uh, [60:23] can we grab a couple of questions from [60:25] the room as well? Let's not we won't do [60:27] the mic, but uh raise your hand if [60:28] you've got a question for me [60:29] immediately. [60:31] >> Yeah. [60:32] >> So, the approach looks very linear from [60:34] an idea to QA. [60:38] Of course, the real world is a lot more [60:39] messy. So you have all these ideas that [60:42] are in parallel and full picture and [60:46] while you're working on something else [60:48] comes in. [60:49] >> Yeah. [60:50] >> How do you deal with the messiness? How [60:51] do you feedback? [60:53] >> Great question. So the question was if [60:56] this all looks great if you're a solo [60:57] developer, but actually how do you [60:58] implement this in a team? How do you [61:00] gather team feedback on this? And my [61:03] answer to that is that if you have an [61:04] idea up there and essentially the sort [61:08] of journey from the idea to the [61:11] destination is something you need to [61:13] figure out with the team, right? So all [61:15] of this stuff up here, this is kind of [61:17] like team stuff, you know what I mean? [61:19] So if you have an idea and you do a [61:22] grilling session on it and you have a [61:23] question that you don't know how to [61:24] answer, then you need to loop in your [61:26] team as we described before. Then you [61:28] might need to go, okay, we just need to [61:30] build a prototype of this. We need to [61:32] actually hash this out. We need [61:33] something that the domain experts can [61:35] fiddle with. Oh, okay. We might need to [61:37] integrate a a third party library into [61:39] this. 
We might need to do some research. [61:41] We might need to actually kind of like [61:43] um ping this back and forth and find a [61:45] third party service that we can get the [61:47] most out of. We might need to go back [61:48] with the information that we gathered [61:50] there to the idea phase. So all the way [61:53] up to the sort of PRD and the journey, [61:55] that's something you need to involve [61:56] your team with. That's something where [61:58] these assets are going to be shared and [62:00] argued over and you're going to have [62:02] requests for comment on them and that [62:04] that loop is going to just keep grinding [62:06] and grinding until you figure out where [62:08] you're going. Once you figure out where [62:10] you're going, then you can start doing [62:12] the kanban board, the implementation. [62:13] But this is essentially super arguable [62:15] and the you'll be bouncing back and [62:17] forth between the phases. Does that make [62:18] sense? Yeah. [62:20] >> Would you not need a PRD for your [62:22] prototype? [62:23] >> Say it again. Sorry. [62:23] >> Would you not want to have a PRD for your [62:25] prototype? The question was, do you want [62:27] to go through this whole session just to [62:29] sort of create a prototype? Do you not [62:30] need a PRD for your prototype as well? [62:32] Let's just quickly talk about prototypes [62:34] for a second. Um, there was a question [62:36] about how do you make this work for [62:37] front end? [62:39] Like how do you because front end is [62:41] like really sensitive to human eyes. You [62:43] need human eyes looking at the front end [62:45] all the time to make sure that it looks [62:47] good. AI doesn't really have any eyes. [62:50] It can look at code, but it front end is [62:54] multimodal.
And so my experiences with [62:57] trying to plug AI into um let's say [63:01] agent browser or Playwright MCP to give [63:03] it you can give it tools to allow it to [63:06] look through a front end and sort of [63:07] look at images but in my experience the [63:10] um it's not very good at that yet and it [63:12] can't create a nice front end in a [63:15] mature codebase. It can sort of spit one [63:17] out. But what it can do is you say okay [63:20] uh I want some ideas on how uh this [63:22] front end might look. give me three [63:24] prototypes um that I can click between [63:27] in a throwaway uh throwaway route that I [63:30] can decide which one looks best and you [63:33] take the asset of that prototype and you [63:34] then feed it back into the grilling [63:36] session or you get feedback on it blah [63:38] blah blah blah blah answer your question [63:40] kind of thing the prototype is just you [63:42] know it's messy it's supposed to give [63:44] you feedback early on in the process so [63:46] that's a great way of working with front [63:47] end code great way of looking at [63:48] software architecture in general let's [63:50] go one more question yeah [63:51] >> yes in your system How do you integrate [63:54] respecting an architecture a design with [63:57] API contracts and fitting with a larger [64:00] system [64:02] security constraints? All kinds of [64:03] constraints like that. [64:04] >> Yeah, there's a lot in that question. [64:07] The question was how do you conform with [64:09] existing architecture? How do you do um [64:12] how do you make it conform to the code [64:13] standards like of your codebase or [64:16] >> Yeah. architecture design API security [64:19] rules that constraints your designs. [64:22] >> Yeah. [64:23] I'm going to answer that in a bit if [64:25] that's okay. So hopefully we have [64:28] started to get some stuff cooking. [64:31] It's just pinging on the explore phase [64:34] here.
[64:37] Tempted to just start running it AFK. [64:40] Maybe I will, maybe I won't. Um, what [64:44] it's essentially doing is it's exploring [64:46] the repo. It's going to then start [64:47] implementing based on what we wanted. [64:49] Let's actually have one more question [64:50] just while it's running. Yeah. [64:58] Yeah. So the question was why do you not [65:02] get AI to QA? [65:05] AI to QA. I just got jargon overload for [65:08] a second. Um why do you not get AI to uh [65:11] test its own code? Now of course you [65:14] absolutely can. And I think while it's [65:16] doing while it's cooking here, okay, [65:18] it's got a clear picture of the [65:20] codebase. It's assessing the issues. [65:22] It's doing issue O2 is the next task. [65:24] I'm again going to show you that in a [65:26] bit. I think the sort of uh because you [65:28] definitely should do an automated review [65:31] step as part of implementation. So you [65:34] have your implementation. You should [65:35] then because tokens are pretty cheap and [65:37] AI is actually really good at reviewing [65:39] stuff. You should get it to review its [65:41] own code before you then QA it. I found [65:43] that that catches a ton of different [65:45] bugs. And [65:48] the way that works is I will just do a [65:50] little diagram is if you have let's say [65:53] an implementation that's sort of like [65:54] used up a bunch of tokens in the smart [65:56] zone. If you get it to sort of try to do [66:00] its reviewing, it's going to be doing [66:02] the reviewing in the dumb zone. And so [66:05] the reviewer will be dumber than the [66:07] thing that actually implemented it. If [66:08] we imagine this is the uh let's be [66:11] consistent, that's the review. That's [66:14] the implementation. [66:15] Whereas, if you clear the context, [66:19] then you're essentially going to be able [66:22] to just review in the smart zone, which [66:24] is where you want to be. 
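The point about reviewing after clearing the context can be sketched as a comparison between two ways of starting the review. This is a toy model under loud assumptions: a context is just a list of messages, the four-characters-per-token estimate is a crude rule of thumb, and none of these names come from the workshop repo.

```typescript
// A context is modeled as the list of messages the model would see.
type Context = { messages: string[] };

// Rough size estimate: about four characters per token (a crude assumption).
function estimateTokens(context: Context): number {
  const chars = context.messages.reduce((sum, message) => sum + message.length, 0);
  return Math.ceil(chars / 4);
}

// Reviewing in the same session: the request lands on top of the whole implementation history.
function reviewInSameContext(implementation: Context, diff: string): Context {
  return { messages: [...implementation.messages, `Review this diff:\n${diff}`] };
}

// Reviewing after clearing: the reviewer starts from the diff alone.
function reviewInFreshContext(diff: string): Context {
  return { messages: [`Review this diff:\n${diff}`] };
}
```

In the same-session case the reviewer inherits every token the implementer spent, so it starts further toward the dumb zone; the fresh reviewer carries only the diff.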
[66:27] Let's see how our implementation is [66:28] doing. Okay, good. It's generating a [66:31] migration. That looks pretty nice. We're [66:33] getting some code spitting out. [66:37] And while I'm sort of like, aha, here we [66:41] go. TDD. [66:43] Let's talk about TDD and then I think [66:45] we'll have a little another little [66:47] break. TDD I found is absolutely [66:50] essential for getting the most out of [66:52] agents. Uh raise your hand if uh you [66:54] know what TDD is. Cool. Okay. TDD is [66:58] test-driven development. What it's [66:59] essentially doing is it's doing a [67:02] something called red green refactor. And [67:04] if you look in the codebase, you'll be [67:05] able to find a um a skill which really [67:09] describes how to do red green refactor. [67:11] and teaches the AI how to do it. So what [67:14] it's doing is it's writing a failing [67:16] test first. So it's saying, okay, I've [67:19] broken down the idea of what I'm doing [67:21] and I'm just going to write a single [67:23] test that fails and then I need to make [67:26] the implementation pass. I have found [67:29] that first of all, this adds tests to [67:31] the codebase and this this tends to add [67:33] good tests to the codebase. And so we've [67:36] got this kind of gamification service. [67:38] It looks like it's using some existing [67:41] stuff to create a test database. Test [67:43] fails because the module doesn't exist [67:44] yet. Okay, we've confirmed red. And then [67:47] it goes and hopefully runs it and it [67:50] passes. I found that uh raise your hand [67:54] if you've ever had AI write bad tests. [67:58] Yeah, it tends to try to cheat at the [68:00] tests because it's sort of doing it in [68:03] layers. it will do the entire [68:04] implementation and then it will do the [68:06] entire test layer just below it. Uh I'm [68:09] just going to say yes, you're allowed to [68:10] use npx vitest.
And using this technique, [68:14] it generally is a lot harder to cheat [68:18] because it's sort of instrumenting the [68:21] code before it's then writing the code. [68:24] So I find that TDD is so so good for [68:27] places where you can pull it off. And in [68:29] fact, it's so good that I sort of warp [68:31] my whole uh technique around getting TDD [68:34] to work better. I can see some drooping [68:36] eyes. It is so hot in here. You can [68:39] imagine how hot it is up here. Let's [68:40] take another five minute comfort break. [68:41] Let's come back at quarter two. I think [68:46] have a nice generous one. And we'll be [68:48] back in about six, seven minutes and [68:50] I'll talk about how uh I think about [68:53] modules, think about constructing a [68:55] codebase to make this possible. I've [68:57] just been sort of fiddling with the AI [68:59] here and we have end up with some with a [69:01] commit. So we have something to test. [69:04] Issue number two is complete. Here's [69:06] what was done. This is kind of what it [69:08] looks like when a Ralph loop completes [69:10] is you end up with a little summary. Um [69:12] and we have now something we can QA [69:15] because we did the feedback loops or [69:17] because we did the tracer bullets [69:18] because we were uh said okay give us [69:21] something reviewable at the end of this [69:22] we can immediately go and QA it. Now, [69:24] there's nothing uh less exciting than [69:26] watching someone else QA something, but [69:29] hopefully we can have a little play. [69:31] Let's just check that it uh works at [69:33] all. In fact, before I go there, I just [69:36] want to sort of work through what just [69:38] happened, which is we see that it's [69:41] created some stuff on the dashboard [69:45] and it then ran the feedback loops. So, [69:47] it then ran the tests and the types. 
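As a rough illustration of the red-green cycle described above, here is a tiny self-contained sketch. The threshold values, `levelForPoints`, and the hand-rolled `check` helper are all hypothetical; the real repo uses its own gamification service and test runner. The idea is that the checks at the bottom are written first and fail (red) while `levelForPoints` does not exist, and the function is then written to be the smallest thing that makes them pass (green).

```typescript
// Hypothetical thresholds: points needed to reach levels 1 through 4.
const LEVEL_THRESHOLDS = [0, 100, 250, 500] as const;

// Green step: the smallest implementation that satisfies the checks below.
// Counts how many thresholds the point total has reached.
function levelForPoints(points: number): number {
  let level = 0;
  for (const threshold of LEVEL_THRESHOLDS) {
    if (points >= threshold) level += 1;
  }
  return level;
}

// Minimal stand-in for a test runner assertion.
function check(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Red step: these were conceptually written before levelForPoints existed.
check(levelForPoints(0), 1, "new student starts at level 1");
check(levelForPoints(99), 1, "just under the second threshold");
check(levelForPoints(100), 2, "exactly on the second threshold");
check(levelForPoints(500), 4, "top threshold reached");
```

Because each check pins down behavior before the code exists, it is harder for the implementation and the tests to be written to agree with each other after the fact.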
[69:51] Now TDD is obviously really important [69:53] and it's really important because these [69:55] feedback loops are essential to AI [69:59] essential to get AI to produce anything [70:01] reasonable because without this AI is [70:04] totally coding blind right you have to [70:07] have to um if if your codebase doesn't [70:10] have feedback loops you're never ever [70:13] ever going to get decent AI decent [70:15] output out of AI and often what you'll [70:18] find is that the quality of your [70:20] feedback loops influences how good [70:23] your AI can code. Essentially, that is [70:24] the ceiling. So, if you're getting bad [70:27] outputs from your AI, you often need to [70:29] increase the quality of your feedback [70:31] loops. We'll talk about how to do that [70:33] in a minute. [70:35] Now, so it ran uh npm run test, npm run [70:39] type check. It got one type error and it [70:41] needed to fix it with a nice bit of [70:43] TypeScript magic. Very good. Yeah. typeof [70:46] level thresholds number. Okay. [70:49] You see why I stopped teaching [70:50] TypeScript because just AI knows [70:51] everything now. Um, [70:54] so and it ran the tests and it passed [70:57] and it's looking good. So we now end up [70:58] with 284 tests in this repo. Pretty [71:01] good. [71:03] I I do find uh front end really hard to [71:06] test here. We're essentially just [71:07] testing the service. So we've created a [71:10] gamification service if we look up here [71:13] and then we have a test for that [71:14] service. You can see the the service and [71:16] the test itself. Now, if I was doing [71:18] code review here, I would then go to re [71:20] I would first go to review the tests, [71:22] make sure the tests were testing [71:24] reasonable things and then go and kind [71:27] of review the code itself just to make [71:29] sure that it's it's not doing anything [71:31] too crazy, right?
The essential thing is [71:33] I need to actually um look at the [71:35] dashboard. I'm going to log in as a [71:39] student. Oh, if it'll let me. Maybe it [71:41] won't let me. Come on, son. There we go. [71:44] Let's log in as Emma Wilson. Head into [71:47] courses. [71:49] Uh, let's say I've got an introduction [71:50] to TypeScript. [71:52] Continue learning. [71:54] Uh, yes, I completed this lesson. [71:57] Something went wrong. I imagine it's [71:59] because I don't have [72:02] uh SQLite error. I don't have the right [72:05] table. So, I need a table point events. [72:08] Point events is a strange table name. [72:09] I'm not sure quite what it was thinking [72:10] there. Uh, let's suspend. Let's run uh [72:14] npmdb migrate or push, I think. [72:19] Can't remember which one it was, but you [72:22] kind of get the idea, right? I I'm not [72:23] going to subject you to uh watching me [72:25] do QA because it's so dull. Um but at [72:28] this point, I would essentially go back [72:30] in. I would um let me open the project [72:33] back up. [72:35] Uh, and I would this this is a crucial [72:38] moment. Um, and it's so important to um [72:42] QA it manually here because QA Oh dear. [72:45] Oh dear. What's going wrong? There we [72:46] go. QA is how I then um impose my [72:51] uh opinions back onto the codebase, how [72:54] I impose my taste. What you'll often [72:56] find is that um there are teams out [72:59] there who are trying to automate [73:00] everything like every part of this [73:02] process and they will tend to [73:06] uh if you try to like automate the sort [73:08] of creation of the idea, automate uh the [73:11] QA, automate the research, automate the [73:13] prototype, you end up with uh apps that [73:16] I feel just lack taste and are bad. [73:22] maybe they just don't work or they they [73:24] don't even work as intended or there's [73:26] just no AI. 
You need a human touch when [73:28] you're building this stuff because [73:29] without that you just end up with slop [73:32] and we are not producing slop here. [73:33] We're trying to produce high quality [73:34] stuff and so that's what the QA is for. [73:39] So I'm going to do two things in this [73:42] final section which is I'm going to [73:44] first tell you how to [73:46] there's probably a question in your mind [73:48] here which is let's say I have a [73:50] codebase that I'm working on and it's a [73:53] bad codebase. It's a codebase that's [73:55] like really complicated uh that AI just [73:58] never does good work in and maybe [74:00] actually most humans that go into that [74:01] codebase don't do good work. How what [74:04] how do I improve that codebase? And the [74:07] second thing is I'll show you my setup [74:08] for parallelization. [74:10] So let's go with um bad code first. [74:14] Now where is it? Where's the diagram? [74:17] Here it is. [74:19] In his book um A Philosophy of [74:22] Software Design, John Ousterhout talks [74:24] about [74:25] the ideal type of module. [74:29] And let's imagine that you have a [74:30] codebase that looks like this. Each of [74:32] these uh blocks here are individual [74:34] files. And these files export things [74:37] from them. You know, they have um things [74:39] that you pull from the files that you [74:41] then use in other things. And so you [74:42] might have these weird dependencies [74:44] where this file over here might rely on [74:46] this file or might rely on that file for [74:48] instance. Now, if these files are small [74:51] and they don't kind of ex like export [74:54] many things, then John would call these [74:57] shallow modules essentially where [74:59] they're not very um they kind of look [75:02] like uh this. If I actually no I can't [75:06] can't make a good diagram of it. They're [75:08] essentially lots and lots of small [75:09] chunks.
Now this is hard for the AI to [75:12] navigate because it doesn't really [75:14] understand the dependencies between [75:15] everything. It can't work out where [75:16] everything is. You know it has to sort [75:18] of manually track through the entire [75:20] graph and go okay this relies on this [75:22] one relies on this one. This one relies [75:24] on this one. [75:26] And it's then also hard to test this as [75:28] well because where do you draw your test [75:29] boundaries here? Do you test each module [75:32] individually? [75:35] Like just literally draw a test [75:36] boundary. No, don't do that. Around this [75:39] one and then maybe another test boundary [75:42] around the next one and then the next [75:43] one [75:46] or should you sort of do big groups of [75:48] it? Should you say, okay, we're going to [75:50] test all of these related modules [75:51] together and just sort of, you know, [75:53] hope and pray that they work. [75:57] Now this means that if I think that bad [76:01] tests mostly look like that where the AI [76:04] essentially tries to sort of wrap every [76:06] tiny function in its own test boundary [76:09] and then just sort of test that those [76:11] individually work. But what that does is [76:13] it means that when let's say this module [76:16] over here calls those two. So it depends [76:19] on both of these. Then this module might [76:22] misorder the functions or there might be [76:24] sort of stuff inside that poor module [76:27] that's worth testing on its own. And if [76:29] you then wrap this in a test boundary, [76:31] what do you do? Do you mock the other [76:32] two modules? How does that work? [76:37] So actually figuring out how to um build [76:40] a codebase that is easy to test is [76:44] essential here because if our codebase [76:46] is easy to test then our code our [76:48] feedback loops are going to be better [76:50] and the AI is going to do better work in [76:52] our codebase. Does that make sense? 
So [76:54] what does a good codebase looks like? [76:56] Look like well not like that. It looks [77:00] like this [77:02] where you have [77:04] what John Ousterhout calls deep modules. [77:07] Modules that have a little interface on [77:10] there that expose a small simple [77:11] interface that have a lot of [77:13] functionality inside them. Now [77:18] what this means is that these are easy [77:20] to test because you just let's say that [77:22] there's a dependency between this one [77:23] and this one. My arrow working? Yeah, [77:26] there we go. [77:29] Then [77:30] what you do is you just wrap a big test [77:32] boundary around that one module around [77:34] this one up here. And you're going to [77:36] catch a lot of good stuff [77:40] because there's lots of functionality [77:42] that you're testing and really the [77:44] caller, the person calling the module is [77:46] going to have a simple interface to work [77:47] from. So it's not not too tricky. That [77:50] makes sense. Deep modules versus shallow [77:52] modules. This is good. This shallow [77:55] version is bad. And what I find is that [77:58] unaided [78:00] um or if you don't [78:03] uh if you don't watch AI carefully, it's [78:06] going to produce a codebase that looks [78:07] like this. So you need to be really [78:09] really careful when you're directing it. [78:11] And that's why too is that if we look [78:13] inside the PRD, [78:16] uh where is the PRD gone? It's inside the [78:18] issues. It's inside the gamification [78:20] system. Uh not found. Of course, it's [78:23] not. Here it is. [78:25] Then I have [78:27] uh inside here data model the modules. [78:32] So it's specifically saying okay this [78:34] gamification service is a new deep [78:36] module which we're going to test around. [78:38] It's going to have this particular [78:41] interface and it's going to have um okay [78:44] we're modifying the progress service [78:46] too.
We're modifying the lesson route [78:48] modifying the dashboard routes etc. So, [78:50] it's I'm being really specific about the [78:52] modules that I'm editing and I'm making [78:54] sure that I keep that module map in my [78:56] mind at all times throughout the [78:58] planning and then throughout the [78:59] implementation. That make sense? Very, [79:02] very useful. It's useful for one other [79:04] reason, too. Not only does it make your [79:05] app more testable, but you get to do a [79:08] little mental trick. [79:11] And I'm going to refill my water while [79:13] you wait for what that is. [79:17] Uh, let me [79:20] Let me get a question from you guys. So, [79:21] raise your hands if you feel like. [79:26] Uh, if you feel like you're working [79:28] harder than ever before with AI. [79:32] Yeah. Uh, raise your hands if you feel [79:35] like you know your codebase less well [79:38] than you used to. [79:40] Yeah. [79:43] This is a real thing. um because we're [79:45] moving fast, because we're delegating [79:47] more things, we end up losing a sense of [79:50] our codebase. And if we lose the sense [79:52] of our codebase, we're not going to be [79:55] able to improve it. And we're [79:56] essentially delegating the shape of it [79:57] to AI. I don't think that's good. But [80:00] then how do we [80:03] how do we make it so that we can move [80:04] fast while still keeping enough space in [80:06] our brains? I think that this is a way [80:09] to do it because what you're doing here [80:12] is not only are you thinking about [80:14] creating big shapes in your codebase, [80:16] big services. [80:19] What I think you should do is design the [80:22] interface for these modules, but then [80:24] delegate the implementation. [80:27] In other words, these modules can become [80:29] like gray boxes where you just need to [80:31] know the shape of them.
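One way to picture "design the interface, delegate the implementation" is a deep module with a deliberately small public surface. The sketch below is an assumption-heavy illustration: the method names, point values, and in-memory storage are invented here and are not the workshop's gamification service.

```typescript
// The small public surface: this is the part a human designs and keeps in their head.
interface GamificationService {
  recordLessonCompleted(studentId: string, lessonId: string): void;
  getSummary(studentId: string): { points: number; level: number };
}

// Everything inside the factory is the gray box that could be delegated to an agent.
function createGamificationService(): GamificationService {
  const POINTS_PER_LESSON = 10; // hypothetical scoring rule
  const POINTS_PER_LEVEL = 50; // hypothetical leveling rule
  const completedLessons = new Map<string, Set<string>>();

  return {
    recordLessonCompleted(studentId, lessonId) {
      const lessons = completedLessons.get(studentId) ?? new Set<string>();
      lessons.add(lessonId); // a Set means repeating a lesson does not score twice
      completedLessons.set(studentId, lessons);
    },
    getSummary(studentId) {
      const lessonCount = completedLessons.get(studentId)?.size ?? 0;
      const points = lessonCount * POINTS_PER_LESSON;
      return { points, level: Math.floor(points / POINTS_PER_LEVEL) + 1 };
    },
  };
}
```

Tests wrap the two interface methods and nothing else, so the internals can be rewritten freely as long as the observable behavior holds.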
You need to know [80:33] what they do and sort of how they [80:34] behave, but you can delegate the [80:36] implementation of those modules. I found [80:38] this is really nice. I don't necessarily [80:40] need to code review everything inside that [80:42] module. I don't necessarily need to know [80:44] everything of what it's doing. I just [80:46] need to know that it behaves a certain [80:47] way under certain conditions and that it [80:49] does its thing. So, it's kind of like, [80:52] okay, I've got a big overview of my [80:54] codebase and I understand kind of the [80:55] shapes inside it, understand what the [80:57] interfaces all do, but I can delegate [81:00] what's inside. I found that has been a [81:03] really nice way to retain my sense of [81:05] the codebase while preserving my sanity. [81:08] Make sense? [81:12] And so you might ask, how do I take a [81:14] codebase that looks like this and then [81:18] turn it into a codebase that looks like [81:20] this? How do I deepen the modules? Well, [81:23] we have hopefully it's in here. Pretty [81:25] sure it is. We have a skill and that [81:28] skill is called improve codebase [81:30] architecture. [81:32] Nice and direct. [81:35] Uh let's run it. What this skill is [81:38] going to do is it's essentially just [81:39] going to do a scan of our codebase and [81:41] looking for what's available here. And [81:43] feel free to run this yourself if you're [81:44] um uh running the exercises. And it's [81:49] exploring the architecture, exploring um [81:52] essentially how to work within this [81:53] codebase. and it's going to attempt to [81:57] uh find places to deepen the modules. [82:00] Pretty simple. One really cool um thing [82:04] that it found here is part of my uh part [82:07] of my course video manager app is a [82:09] video editor. A video editor built in [82:11] the browser, which is really hardcore. [82:13] Uh it's a decent bit of engineering.
And [82:16] I wanted a way that I could wrap the [82:18] entire front end all the way to the back [82:21] end in like a single big module so that [82:23] I could test the fact that I press [82:25] something on the front end and it goes [82:26] all the way to the back end. And so I [82:28] found a way essentially by using a kind [82:30] of discriminated union between the two [82:32] types here by sort of I was able to use [82:35] this uh skill to essentially have a huge [82:39] great big module that just tested from [82:41] the outside or was testable from the [82:43] outside this video editor [82:45] infrastructure. And it meant that AI [82:47] could see the entire flow, could act on [82:49] the entire flow and test on the entire [82:51] flow. And honestly, it was just night [82:53] and day in terms of the uh ability of AI [82:56] to actually make changes because AI [82:57] working on a video editor is pretty [82:59] brutal if you don't give it good tests. [83:01] So that is honestly I if you take one [83:04] thing away from today, just try running [83:06] this skill on your repo and see what [83:08] happens. Let's go to Slido. Let's ask a [83:11] uh check a couple of questions just [83:13] while this is running. [83:15] So let's see. Have you tried Claude's [83:17] auto mode with claude enable auto mode? [83:19] Uh that way you can avoid many of the [83:20] obvious permission checks. We'll talk [83:21] about permission checks in a second. Do [83:24] I keep the markdown plans and issues for [83:27] later reference? [83:29] Okay, this is a great question. So [83:34] let's say that you uh have a great idea, [83:38] you turn it into a PRD [83:40] and you then implement that PRD [83:43] and the PRD is essentially done. Raise [83:45] your hand if you keep that information [83:48] in the repo. So you turn it into a [83:49] markdown file. Raise your hand if you [83:51] want to keep that around. [83:53] Cool. Okay.
And raise your hand if you [83:55] if you don't want to keep it around. If [83:57] you want to get rid of it as soon as [83:58] possible. Yeah. This is I think an [84:02] a question that doesn't have a clear [84:03] answer. What I'm really scared of [84:08] with any documentation decision is that [84:11] let's say that we have a PRD for this [84:13] gamification system. We keep it in the [84:14] repo. We go on, go on, go on. Let's say [84:17] a month later, we want some edits to the [84:19] gamification system. And we go in with [84:22] Claude and it finds this old PRD and [84:24] says, "Yes, I found the original [84:26] documentation for the PRD system." Well, [84:28] it turns out that the actual code has [84:30] changed so much from the original PRD [84:32] that it's almost unrecognizable. The [84:33] names of things have changed. The um [84:35] file structure has changed. Even the [84:37] requirements may have changed. We might [84:38] have actually tested it with users. This [84:40] is doc rot where the documentation for [84:43] something is rotting away in your repo [84:46] and influencing Claude badly or Claude [84:49] agents badly. So I tend to not keep it [84:53] around. I tend to get rid of it. And for [84:55] me because my setup uses GitHub issues, [84:58] I just mark it as closed. It can fetch [85:00] it if it wants to, but it's got a visual [85:01] indicator that it's done. So I tend to [85:03] prefer ditching these. [85:07] Thoughts on the Beads framework from [85:08] Steve? Uh I've not tested it, but it [85:11] seems like sort of um another way to [85:13] manage Kanban boards and issues. Seems [85:15] uh very good, but I've not tried it. [85:18] Um [85:22] uh let me just quickly check the uh [85:24] setup here. Let's take a couple of [85:27] questions from the room. Anybody got any [85:29] questions at this point about anything [85:30] that we've covered so far, especially [85:31] this last bit? Yes. [85:40] like code. How about migrations?
Like [85:43] with migration files, we can also squash [85:45] them off [85:47] >> like database migrations. [85:49] >> Yeah, [85:51] >> I don't know. [85:53] >> I hope that answers your question. I'm [85:54] so sorry. No, no, I think database [85:56] migrations are a different thing because [85:57] you have a sort of running record of [85:59] exactly what changed and it's more [86:01] deterministic and I think [86:04] yeah, it's an interesting analogy. I'm [86:06] not sure. Let's talk about it [86:07] afterwards. [86:08] That's a good way of saying I have no [86:10] idea. [86:11] >> Yeah. Yeah. [86:16] >> Sorry guys. Um I'm just trying to listen [86:18] to this guy's question. [86:30] >> Yeah. The question the question here is [86:33] um should I um in the sort of early [86:37] planning stage be trying to optimize the [86:39] plan? This is something I actually see a [86:41] lot of people doing and it's a really [86:43] good um idea. So when you [86:49] let's go back to the phases. So let's [86:51] say that you have all of these phases [86:53] here [86:55] and you uh you get to the point where [86:58] you've sort of figured out everything [86:59] with the LLM. you understand where [87:01] you're going. You've created this sort [87:02] of journey destination document here. [87:05] How do you then uh like should you then [87:09] try to optimize and optimize and [87:10] optimize that PRD until it's the perfect [87:12] PRD you can possibly imagine? I don't [87:15] think there's a lot of value in that [87:18] because I think the journey is really [87:20] just sort of a hint of where you want to [87:22] go and the place that you need to be [87:24] putting the work is in QA and you can [87:27] sort of do that AFK I suppose but in my [87:29] experience you're not going to get a lot [87:30] of juice out of it like it's the the [87:33] thing that really matters is getting [87:34] alignment with the AI which is you do in [87:37] the grilling session initially.
[87:40] Let's have one more question. You got [87:41] any more? Yeah. How do you get in your [87:44] workflow to get it to code the way you [87:46] want it to code? So by the time you get [87:48] to code review, it's at least familiar, [87:50] uses the libraries you wanted it to use. [87:52] >> Yeah. Um, we had this question before [87:54] actually, which was like uh how do you [87:56] uh enforce your coding standards on the [87:59] agent? Essentially, how do you get it to [88:00] code how you want it to code? Now, [88:03] there's essentially two different ways [88:04] of doing it. Um, you've got [88:09] Come on. Push [88:11] and you've got pull. [88:14] What do I mean by push and pull? [88:17] Um, push is where you push instructions [88:20] to the LLM. So you say, okay, if you put [88:23] something in CLAUDE.md, [88:25] uh, talk like a pirate, that instruction [88:28] is always going to be sent to the agent, [88:30] right? So that is a push action. You're [88:32] pushing tokens to it. Pull is where you [88:35] give the agent an opportunity to pull [88:38] more information. [88:40] And [88:42] that's for instance like skills. So a [88:44] skill is something that can sit in the [88:46] repo and it has a little description [88:47] header that says okay agent you may pull [88:50] this when you want to. [88:53] My current thinking about [88:55] code review and about coding standards [88:57] looks like this. When you have an [89:00] implement. [89:03] What's going on? There we go. [89:04] Implementer. [89:06] I'm going to make this less red in a [89:08] second. Um, then you want the coding [89:12] standards to be available via pull. If [89:15] it has a question, you want it to be [89:16] able to sort of answer it. But if you [89:18] then have an automated reviewer [89:21] afterwards, then you want it to push. [89:24] You want to push that information to the [89:25] reviewer. You want to say, "These are [89:27] our coding standards."
um make sure that [89:29] this code um follows them. So if you [89:32] have skills for instance, then you want [89:34] to push that stuff to the reviewer so [89:36] the reviewer has both the code that's [89:38] written and the coding standards to [89:40] compare to. [89:42] Hopefully that answers your question. I [89:43] can show you an automated version of [89:44] this as well. Actually, um yeah, let's [89:47] do that now just while it's fresh in my [89:48] mind. I recently um spent [89:54] uh maybe a week or so uh building this [89:57] thing called Sandcastle. [89:59] I was sort of unhappy with [90:02] the options out there for [90:05] um running agents AFK. And what this [90:07] does is it's essentially a TypeScript [90:10] library for running these loops. So you [90:12] have uh a run function that creates a [90:16] worktree, um sandboxes it in a Docker [90:19] container and then allows you to run a [90:22] prompt inside there. And in that [90:24] worktree then it's just a git branch and you [90:26] have that code and you can then merge it [90:28] later. If I open up [90:32] um there are some really really nice [90:35] ways of viewing this and it essentially [90:37] allows you to run these kind of [90:38] automated loops and allows you to [90:41] parallelize across multiple different [90:43] agents really simply. So I'll go into my [90:46] Sandcastle file, go into main.ts here [90:48] and let's just walk through this. [90:51] So this is kind of like I showed you um [90:54] a sort of version of the Ralph loop [90:56] earlier. This is where we take it from [90:58] sequential into parallel. [91:01] We have here first of all a planner that [91:04] has a plan prompt here [91:06] that looks at the backlog and chooses a [91:10] certain number of issues to work on in [91:12] parallel. Remember I showed you that [91:13] Kanban board where it had all the [91:14] blocking relationships. It works out all [91:16] of the phases.
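To make the planner's job concrete, here is a minimal sketch of turning blocking relationships into phases. It is not the code from main.ts: the `Issue` shape, the `planPhases` function, and the example backlog are all made up for illustration, and the real planner in the talk is a prompt rather than a function.

```typescript
// Hypothetical issue shape: the talk only says issues have blocking relationships.
interface Issue {
  id: number;
  title: string;
  blockedBy: number[]; // ids of issues that must be finished first
}

// Group issues into phases. Every issue in a phase has all of its blockers
// in an earlier phase, so the issues within one phase can run in parallel.
function planPhases(issues: Issue[]): Issue[][] {
  const done = new Set<number>();
  let remaining = [...issues];
  const phases: Issue[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((issue) =>
      issue.blockedBy.every((id) => done.has(id))
    );
    if (ready.length === 0) {
      // Nothing is unblocked: a cycle, or a blocker that is not in the backlog.
      throw new Error("Blocking cycle or unknown blocker in backlog");
    }
    phases.push(ready);
    for (const issue of ready) done.add(issue.id);
    remaining = remaining.filter((issue) => !done.has(issue.id));
  }
  return phases;
}

// Made-up backlog for illustration only.
const backlog: Issue[] = [
  { id: 1, title: "Add points table", blockedBy: [] },
  { id: 2, title: "Award points on quiz completion", blockedBy: [1] },
  { id: 3, title: "Show points on dashboard", blockedBy: [1] },
  { id: 4, title: "Add streak bonus", blockedBy: [2, 3] },
];

const phases = planPhases(backlog);
// Phases by issue id: [[1], [2, 3], [4]]
console.log(phases.map((phase) => phase.map((issue) => issue.id)));
```

Issues 2 and 3 land in the same phase, which is the set you would hand to parallel agents. This sketch treats a blocker that is missing from the backlog as an error; a real setup might instead treat already-closed issues as done.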
So this one will say okay [91:19] uh let's say we have uh you can ignore [91:21] all this glue code here. This is [91:23] essentially just a set of issues, GitHub [91:26] issues with a title and with a branch [91:30] for you to work on. And then for each [91:34] issue, we create a sandbox [91:38] and then we run an implement in that [91:40] sandbox passing in the issue number, [91:42] issue title and the branch. This is like [91:43] the loop that we ran just before. [91:46] Then if it created some commits, we then [91:49] review those commits. This is [91:51] essentially the loop. What do we do with [91:54] those commits? We pass those into a [91:58] merger agent [92:01] which takes in a merge prompt, takes in [92:03] the branches that were created, takes in [92:04] the issues, and it just merges them in. [92:06] If there are any issues with the merge, [92:08] you know, with the types and tests and [92:09] that kind of thing, it solves them. And [92:11] this has been my uh flow for quite a [92:13] while now for working on most projects. [92:16] It works super super well. And uh yeah, [92:19] I recommend you check out Sandcastle if [92:20] you want to sort of learn more. And to [92:23] answer your question properly: in [92:26] the reviewer [92:28] uh I would push the coding standards, in [92:30] the implementer I would allow it to pull. [92:33] And I'm actually using uh Sonnet for [92:35] implementation and Opus for um reviewing [92:39] because I consider reviewing sort of, I [92:40] need the smarts. Then [92:44] any question? Actually, let me uh [92:46] before we do more questions, let's go [92:48] back here. Okay, where are we at? Okay, [92:53] we're sort of zooming everywhere in this [92:55] uh talk because I'm kind of having to [92:56] run things in parallel. So, let's go [92:58] back to the improved codebase [93:00] architecture. It has finally finished [93:01] running and it's found a bunch of [93:04] architectural improvement candidates.
So [93:06] it's got essentially a cluster of [93:08] different modules that are all kind of [93:10] related that could probably be tested as [93:12] a unit. Got number one the quiz scoring [93:14] service. There's some reordering logic [93:17] extraction as well. It has arguments for [93:20] why they're coupled and it has a [93:22] dependency category as well. So local [93:23] substitutable in SQLite with in-memory [93:26] test DB. [93:28] Quiz scoring service currently has zero [93:30] tests. This is the biggest gap. So this [93:32] is what it looks like when we come back [93:33] from uh improved codebase architecture. [93:37] Okay. [93:39] So we have nominally kind of 17 minutes [93:43] left. I don't know about you, but I'm [93:45] knackered. [93:47] Um I want to [93:50] let me kind of sum up for you [93:52] because I think we're sort of reaching [93:54] the end of our stamina. I'm going to be [93:56] available for the full time if you want [93:57] to um come and ask me questions. Um, I [93:59] might do one more check of the slider, [94:00] but let's kind of sum up where we've got [94:02] to. [94:04] So, [94:06] this is essentially the flow [94:10] where throughout this whole process, [94:12] we're bearing in mind the shape of our [94:14] codebase. This is not a spec-to-code [94:17] compiler. This is not an AI that's sort [94:19] of just like churning out code. We are [94:21] being very intentional with the kind of [94:23] modules and the shape of the codebase [94:24] that we want. We are making sure that we [94:26] are as aligned as possible by using the [94:29] grilling session by really hammering out [94:31] our idea. We're not overindexing into [94:34] the PRD. We're not trying to read every [94:35] part of it. We're not thinking too much [94:37] about it even. We're then just turning [94:39] that into a set of parallelizable issues [94:41] which can be worked on by agents in [94:42] parallel.
We implement it and we QA and [94:46] code review the hell out of it and then [94:48] keep going back to that implementation. [94:50] One thing I didn't really mention is [94:51] that in the QA phase, what the QA phase [94:54] is for is creating more issues for that [94:56] Kanban board. So while it's implementing [94:59] even, you can be QAing the stuff and [95:00] going back adding more issues. And the [95:02] Kanban board just allows you to add [95:03] blocking issues kind of um sort of [95:06] infinitely really. And then once that's [95:08] all done, once you've got code that [95:09] you're happy with, once you've got work [95:10] that you're happy with, then you can [95:12] share it with your team and you can get [95:13] a full review. So this is kind of like [95:16] once you get here, this is kind of one [95:17] developer or maybe a couple of [95:18] developers sort of managing this and [95:21] then it's kind of up to you to figure [95:22] out how to merge it back in. [95:27] Of course, all of this can be customized [95:30] by you. This is just something that I [95:32] have found works. I'm not trying to like [95:34] sell you on a kind of approach here. [95:37] What I recommend if you take one thing [95:39] away from this session is that you [95:41] should head to [95:42] Amazon and just buy a ton of those old [95:44] books because I mean I just found it so [95:47] enlightening reading them. Uh you know [95:51] pre-AI writing is always really [95:53] fun to read anyway and [95:56] I just on every single page I found that [95:58] there was something useful and something [96:00] interesting to read. So thank you so [96:03] much. Thank you for putting up with the [96:04] heat. Um hopefully your body [96:05] temperatures will reset soon. Uh thank [96:08] you very much.
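For reference, the push and pull split from the coding standards question can be sketched as two prompt builders. This is an illustration under assumptions, not how any particular tool assembles prompts: the `Skill` shape, both function names, and the example skill are hypothetical.

```typescript
// Hypothetical skill shape: a short description the agent always sees,
// plus a body it only reads if it decides to pull it.
interface Skill {
  name: string;
  description: string;
  body: string;
}

// Pull: the implementer is only told which standards exist.
// The full text stays out of the prompt until the agent asks for it.
function buildImplementerPrompt(task: string, skills: Skill[]): string {
  const index = skills
    .map((skill) => `- ${skill.name}: ${skill.description}`)
    .join("\n");
  return [`Task: ${task}`, "Skills you may load if you need them:", index].join("\n");
}

// Push: the reviewer always receives the full standards next to the diff,
// so it has both the code and the standards to compare against.
function buildReviewerPrompt(diff: string, skills: Skill[]): string {
  const standards = skills
    .map((skill) => `## ${skill.name}\n${skill.body}`)
    .join("\n\n");
  return [
    "Review this diff against our coding standards.",
    "# Coding standards",
    standards,
    "# Diff",
    diff,
  ].join("\n\n");
}

// Made-up skill for illustration only.
const errorHandling: Skill = {
  name: "error-handling",
  description: "How we report and handle errors",
  body: "Return typed results from services; do not throw across module boundaries.",
};

const implementerPrompt = buildImplementerPrompt("Implement issue #12", [errorHandling]);
const reviewerPrompt = buildReviewerPrompt("diff --git a/src/quiz.ts b/src/quiz.ts", [errorHandling]);

console.log(implementerPrompt.includes(errorHandling.body)); // false: only the description is sent
console.log(reviewerPrompt.includes(errorHandling.body)); // true: the full standard is pushed
```

The design choice is about tokens: the implementer's context stays small unless it has a question, while the reviewer pays the token cost up front because comparing against the standards is its whole job.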