AI Summary
Matt Pocock presents a workshop on integrating AI into software engineering workflows, emphasizing that traditional software engineering fundamentals remain crucial when working with AI. He introduces concepts like the 'smart zone' and 'dumb zone' of LLMs, and demonstrates a structured workflow from idea to implementation using AI agents.
Chapters
LLMs have a smart zone (early in conversation, fewer tokens) and a dumb zone (as context grows, performance degrades). Tasks should be sized to stay within the smart zone.
A skill that relentlessly interviews the user about every aspect of a plan until shared understanding is reached. It prevents misalignment and ensures the AI and human are on the same wavelength.
After grilling, a Product Requirements Document (PRD) is created to document the destination. It includes problem statements, user stories, implementation decisions, and testing decisions.
Instead of horizontal layers (all DB, then all API, then all frontend), use vertical slices that cross all layers to get early feedback on the entire flow.
Implementation can be delegated to an AFK (away from keyboard) agent that picks tasks from a kanban board, implements them, runs feedback loops (tests, types), and produces commits for review.
Test-Driven Development is essential for AI coding. Writing a failing test first prevents the AI from cheating and ensures good test coverage.
Deep modules (small interface, lots of functionality) are easier for AI to test and work with compared to shallow modules (many small files with complex dependencies).
For implementation, allow AI to pull coding standards (via skills). For automated review, push standards to the reviewer for comparison.
The key to effective AI-assisted coding is applying software engineering fundamentals: keep tasks small, maintain shared understanding, use vertical slices for feedback, and design deep modules. The workflow involves human-in-the-loop planning followed by AFK implementation and thorough QA.
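The "deep module" idea in the summary can be sketched in Python. This is a minimal illustration, not code from the workshop repo. The class name `Gamification`, the method `record_lesson_completed`, the 10-points-per-lesson rule and the level thresholds are all hypothetical. What matters is the shape: callers and tests touch one method and one return type, and the de-duplication, scoring and levelling stay hidden behind it.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Progress:
    points: int
    level: int


class Gamification:
    """A deep module: one small method in front of de-duplication, points and levels.

    All rules below are hypothetical placeholders, not code from the workshop repo.
    """

    POINTS_PER_LESSON = 10                # hypothetical value
    LEVEL_THRESHOLDS = (0, 50, 150, 300)  # hypothetical: minimum points for levels 1 to 4

    def __init__(self) -> None:
        self._completed: Set[Tuple[str, str]] = set()
        self._points: Dict[str, int] = {}

    def record_lesson_completed(self, user_id: str, lesson_id: str) -> Progress:
        """The whole public interface: report a completion, get progress back."""
        key = (user_id, lesson_id)
        if key not in self._completed:
            # Re-completing the same lesson earns nothing, so points cannot be farmed.
            self._completed.add(key)
            self._points[user_id] = self._points.get(user_id, 0) + self.POINTS_PER_LESSON
        points = self._points.get(user_id, 0)
        # Level = number of thresholds reached (threshold 0 means everyone is level 1).
        level = sum(1 for threshold in self.LEVEL_THRESHOLDS if points >= threshold)
        return Progress(points=points, level=level)


if __name__ == "__main__":
    g = Gamification()
    for lesson in ("intro", "setup", "intro"):
        print(g.record_lesson_completed("ada", lesson))
```

A test for this module only has to call `record_lesson_completed` and compare the returned `Progress`. The summary credits this kind of small interface with making deep modules easier for AI to test.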
Clickbait Check
95% Legit: "Title accurately describes a full walkthrough of an AI coding workflow, delivering exactly what it promises."
Study Flashcards (10)
What is the 'smart zone' of an LLM?
easy
The early part of a conversation where the LLM performs best because attention relationships are least strained.
03:00
What happens to LLM performance as tokens are added to the context?
medium
It gets dumber: each added token adds attention relationships with every other token, so the number of relationships grows quadratically and performance degrades as the context fills.
03:41
What is the purpose of the 'Grill Me' skill?
easy
To relentlessly interview the user about every aspect of a plan until a shared understanding is reached, preventing misalignment.
12:17
What is a 'tracer bullet' in software development?
medium
A vertical slice of functionality that crosses all layers, providing early feedback on the entire flow.
42:07
Why is TDD important when coding with AI?
medium
It prevents the AI from cheating by writing a failing test first, ensuring good test coverage and reliable code.
66:43
What is a 'deep module' according to John Ousterhout?
hard
A module with a small, simple interface but a lot of functionality inside, making it easier to test and work with.
74:14
What is the difference between 'push' and 'pull' for enforcing coding standards with AI?
hard
Push sends instructions to the LLM up front (e.g., in CLAUDE.md); pull lets the LLM fetch information when it needs it (e.g., via skills).
88:03
What is the recommended context window size for staying in the smart zone?
medium
Stay under roughly 100k tokens, whether the model's maximum context window is 200k or 1 million.
04:05
What is the role of the PRD in the workflow?
easy
It documents the destination: problem statement, solution, user stories, implementation decisions, and testing decisions.
30:09
How does the AFK agent loop work?
hard
It picks the next task from a kanban board, implements it with TDD, runs feedback loops (tests, types), and produces a commit summary.
53:55
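The AFK loop from the last flashcard can be written as an ordinary loop. This is a hedged sketch, not the workshop's implementation. `implement`, `feedback_ok` and `commit` are hypothetical callbacks that stand in for an agent run, the test and type-check feedback loops, and a commit for review. The retry budget `max_attempts` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    title: str
    done: bool = False


def next_task(board: List[Task]) -> Optional[Task]:
    """Pick the first unfinished card on the (hypothetical) kanban board."""
    return next((task for task in board if not task.done), None)


def afk_loop(
    board: List[Task],
    implement: Callable[[Task], None],  # hypothetical: one agent run in a fresh context
    feedback_ok: Callable[[], bool],    # hypothetical: do tests and type checks pass?
    commit: Callable[[str], None],      # hypothetical: record a commit for human review
    max_attempts: int = 3,              # assumption: retry budget per task
) -> List[str]:
    """Work through the board one task at a time; return a summary per task."""
    summaries: List[str] = []
    while (task := next_task(board)) is not None:
        for attempt in range(1, max_attempts + 1):
            implement(task)
            if feedback_ok():
                break
        else:
            # Feedback loops never went green: stop and leave it for a human.
            summaries.append(f"BLOCKED: {task.title}")
            break
        task.done = True
        message = f"{task.title} (attempts: {attempt})"
        commit(message)
        summaries.append(message)
    return summaries


if __name__ == "__main__":
    board = [Task("points schema"), Task("dashboard widget")]
    outcomes = iter([False, True, True])  # the first task needs one retry
    log: List[str] = []
    print(afk_loop(board, lambda t: None, lambda: next(outcomes), log.append))
```

The sketch stops at the first blocked task instead of skipping it. That is a deliberate choice: later tasks should not build on work whose feedback loops never passed.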
💡 Key Takeaways
Smart Zone vs Dumb Zone
Fundamental insight about LLM performance degradation with context growth, guiding task sizing.
03:00
Grill Me Skill for Alignment
Practical technique to achieve shared understanding with AI, preventing misalignment.
12:17
Tracer Bullets / Vertical Slices
Key principle from The Pragmatic Programmer applied to AI coding for early feedback.
42:07
TDD Essential for AI
TDD prevents AI from cheating and ensures good test coverage, critical for reliable AI output.
66:43
Deep Modules Improve AI Performance
Applying software design principles (deep modules) directly improves AI's ability to code and test.
74:14
Full Transcript
[00:14] Yeah, we good.
[00:17] >> Okay, folks, we're at capacity. Let's
[00:20] kick off. I don't want you waiting here
[00:22] for 25 more minutes before some
[00:24] arbitrary deadline. So, welcome. My name
[00:28] is Matt. Uh I'm a teacher and I suppose
[00:31] now I teach AI. Um
[00:35] we have a link up here if you've not
[00:37] already been to this, which has the
[00:39] exercises for the um stuff we're going
[00:41] to do today. This is going to be around
[00:43] two hours. So we might just sort of kick
[00:44] off two hours from now. Is that right
[00:46] Mike?
[00:48] >> Yeah. Perfect. Um, and the theory behind
[00:52] this talk or at least the thesis under
[00:53] which I've been operating for the last
[00:55] kind of six months or so is that
[00:59] we all think that AI is a new paradigm,
[01:01] right? AI is obviously changing a lot of
[01:03] things. You guys are obviously
[01:04] interested in this and that's why you've
[01:05] come to this talk. And
[01:09] I feel that
[01:12] when we talk about AI being a new
[01:14] paradigm, we forget that actually
[01:17] software engineering fundamentals, the
[01:19] stuff that's really crucial to working
[01:21] with humans, also works super well with
[01:24] AI. And this is what my keynote is on
[01:27] tomorrow. Really, I'm going to sort of
[01:28] be fleshing that out a lot more. And in
[01:30] this workshop, I'm hopefully going to be
[01:32] able to direct your attention to those
[01:34] things and uh hopefully show you that
[01:38] I'm right, but we'll see. Um, can I get
[01:41] a quick heads up first? How many of you
[01:44] guys um are coding have ever coded with
[01:47] AI? Raise your hand if you've ever coded
[01:48] with AI. Perfect. Okay. Uh, keep your
[01:51] hand raised.
[01:53] Uh, let's all uh share those armpits
[01:56] with the world. Um,
[01:58] how many of you code every day with AI?
[02:01] Cool. Okay. Uh, ra keep your hand raised
[02:04] if you've ever been frustrated with AI.
[02:08] Okay. Very good. You can put your hands
[02:10] down. Thank you for that show of
[02:12] obedience. I really appreciate that. Um,
[02:14] we are also being live streamed to the
[02:15] Gilgood room as well. I've not uh did we
[02:18] send someone up to the Gilgood room to
[02:20] just check they're okay? Don't know. But
[02:22] I see you. Uh, and there is a way that
[02:25] you can participate which is we have the
[02:27] um a Q&A. We're going to be doing kind I
[02:30] have a sort of hatred of Q&As because
[02:31] they're not very democratic. The mostly
[02:33] the sort of um most talkative people get
[02:36] to um get to participate and share. And
[02:39] so we're going to be going through this
[02:41] um Q&A here. So why do we have to wait
[02:43] till 3:45? The room is packed. The doors
[02:45] are closed. 100% agree. And so if you
[02:48] want to uh ask a question, we're going
[02:50] to be I would like you to pile into this
[02:52] async and then we can vote on each
[02:53] other's questions and hopefully get the
[02:55] best questions surfaced for the
[02:57] entire room to enjoy.
[03:00] So I want to talk about first the kind
[03:02] of weird constraints that LLMs have and
[03:07] those weird constraints are sort of what
[03:09] we have to base a lot of our work
[03:11] around. Now,
[03:14] there's a guy called Dex Horthy who runs a
[03:16] company called HumanLayer, and he came
[03:18] up with this idea, which is that
[03:21] when you're working with LLMs, they have
[03:24] a smart zone and a dumb zone. When
[03:28] you're first kind of like working with
[03:30] an LLM and it's like you just started a
[03:32] new conversation, you start from
[03:34] nothing. That's when the LLM is going to
[03:35] do its best work because in that
[03:37] situation, the attention relationships
[03:39] are the least strained. Every time you
[03:41] add a token to an LLM, it's kind of like
[03:44] you're adding a team to a football
[03:45] league. You think of the number of
[03:47] matches that get added every time you
[03:50] add a team to a football league. It just
[03:51] scales quadratically. And that's
[03:54] because you have attention relationships
[03:55] going from essentially each token to every
[03:58] other token, based both on position and on the sort
[04:00] of meaning of the individual token. And
[04:02] so this means that by around sort of 40%
[04:05] or around I would say around 100k is
[04:08] kind of my new marker for this because
[04:09] it doesn't matter whether you're using 1
[04:11] million uh context window or 200k. It's
[04:15] always going to be about this.
[04:17] It starts to just get dumber. So as you
[04:21] continually keep adding stuff to the
[04:23] same context window, it just gets dumber
[04:25] and dumber until it's making kind of
[04:26] stupid decisions. Raise your hand if
[04:28] that feels familiar to you. Yeah. Cool.
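The football-league analogy can be made concrete with a little arithmetic. This is a minimal sketch with three assumptions. First, the league is a double round-robin, so every team plays every other team home and away. Second, the attention count is one score per (query, key) pair in a full attention matrix; causal attention in real models computes roughly half of these. Third, `SMART_ZONE_LIMIT` is a hypothetical name for the speaker's rough 100k marker and is not a model constant. Counting pairs only shows how fast the work grows. It does not model answer quality.

```python
SMART_ZONE_LIMIT = 100_000  # hypothetical: the speaker's rough marker, not a model constant


def league_matches(teams: int) -> int:
    """Double round-robin: every team plays every other team home and away."""
    return teams * (teams - 1)


def attention_scores(tokens: int) -> int:
    """Scores in a full n x n attention matrix (causal attention computes roughly half)."""
    return tokens * tokens


def zone(tokens_used: int, limit: int = SMART_ZONE_LIMIT) -> str:
    """Classify a session by context usage against the rough threshold."""
    return "smart" if tokens_used < limit else "dumb"


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, league_matches(n), attention_scores(n))
    # Doubling the context quadruples the number of attention scores.
    print(attention_scores(200_000) // attention_scores(100_000))
    print(zone(40_000), zone(150_000))
```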
[04:31] So this means that we kind of want to
[04:34] size our tasks in a way that sticks
[04:37] within the smart zone, right? We don't
[04:39] want the AI to bite off more than it can
[04:41] chew. And this goes back to old advice
[04:44] like Martin Fowler in Refactoring uh
[04:46] like uh The Pragmatic Programmer talks
[04:48] about this. Don't bite off more than you
[04:50] can chew. Keep your tasks small so that
[04:53] you as a developer, a human developer
[04:55] don't freak out and don't start acting
[04:57] and going into the dumb zone.
[05:01] But how do you tackle big tasks? How do
[05:04] you take a large task like I don't know
[05:07] cloning a company or something or just
[05:09] doing something crazy? And how do you
[05:12] break it into small tasks so they all
[05:13] fit into the smart zone? One way of
[05:16] course you could do is I mean kind of
[05:18] what the AI companies maybe want you to
[05:20] do or the natural way of doing it is
[05:21] just keep going and going and going. You
[05:23] end up in the dumb zone charging you
[05:24] tons of tokens per request. You then
[05:26] compact back down. We'll talk about
[05:29] compacting properly in a minute. And you
[05:31] keep going, keep going, keep going,
[05:32] compact back down, keep going, keep
[05:33] going, keep going. And I think that
[05:36] doesn't really work very well because of
[05:38] the sediment; we'll talk about that
[05:40] in a minute. So the theory here is then,
[05:43] and this is what I was doing for a
[05:44] while, is I would use these kind of
[05:48] multi-phase plans where I would say,
[05:50] okay, we have this sort of number four
[05:53] thing here, this large large task. Let's
[05:55] break it down into small sections so
[05:57] that we can then kind of chunk it up and
[05:59] do each little bit of work in the smart
[06:01] zone. Raise your hand if you've ever
[06:03] used a multi-phase plan before. Yeah,
[06:06] really common practice, right? This is
[06:08] kind of how we've been doing it.
[06:09] Certainly, this is how I was doing it up
[06:11] until December last year really.
[06:14] And any developer worth their salt will
[06:16] look at this and go, "This is a loop,
[06:19] right? This is a loop. We've just got
[06:21] phase one, phase two, phase three, phase
[06:23] four. Why don't we just have phase n,
[06:27] right?"
[06:29] Phase n where we essentially just say,
[06:31] okay, we have, let's say, a plan
[06:33] operating in the background and then we
[06:35] just loop over the top of it and we go
[06:37] through until it's complete. And this is
[06:39] where um raise your hand if you've heard
[06:41] of Ralph Wiggum as a software practice.
[06:44] Okay, cool. Raise your hand if you've
[06:45] not heard of Ralph Wiggum as a software
[06:46] practice. Actually, that's more like it.
[06:48] Okay. So there's this idea called Ralph
[06:50] Wiggum uh which is kind of um sort of
[06:52] based on this which is essentially
[06:56] all you need to do is sort of specify
[06:58] the end of the journey where you just
[07:00] say okay we create a PRD a product
[07:02] requirements document to say okay let's
[07:05] describe where we're going and then we
[07:07] just say to the AI just make a small
[07:09] change make a small change that gets us
[07:11] closer and closer to there and Ralph
[07:14] works okay but I prefer a little bit
[07:16] more structure so that's kind of where we
[07:18] got to in terms of thinking about the
[07:21] smart zone. And that's kind of where I
[07:23] want you to first start thinking about
[07:25] here. Another weird constraint of LLMs is
[07:29] that they're kind of like the guy from
[07:30] Memento, right? They just continually
[07:32] forget. They just keep resetting
[07:34] back to the base state. Let me pull up
[07:36] this diagram.
[07:38] I sort of I I I really should use
[07:41] slides, but I just prefer just like
[07:42] randomly scrolling around an infinite uh
[07:45] tldraw canvas. Thank you, Steve.
[07:48] Um,
[07:49] so let's say another concept I want you
[07:52] to have is that every session with an
[07:53] LLM kind of goes through the same
[07:55] stages. You have first of all the system
[07:57] prompt here. This gray box here is
[08:00] essentially the stuff that's always in
[08:02] your context. You want this to be as
[08:04] small as possible because if you have a
[08:06] ton of stuff in here, if you have 250k
[08:09] tokens, like I have seen people put in
[08:11] there, then you're just going to go
[08:13] straight into the dumb zone without even
[08:15] being able to do anything. So you want
[08:17] this to be tiny. You then go into a kind
[08:20] of exploratory phase. This blue is sort
[08:22] of where the coding agent is going out
[08:24] and exploring the codebase. Then you go
[08:27] into implementation and then you go into
[08:29] testing and kind of making sure that it
[08:32] works, running your feedback loops and
[08:33] things like this. Raise your hand if
[08:35] that feels familiar based on what you've
[08:36] done. Yep. Sort of the like the the main
[08:40] cornerstones of any session. And when
[08:42] you clear the context, you go right back
[08:45] to the system prompt. Bof, you go right
[08:47] back there. So you delete everything
[08:49] that's come before.
[08:51] And raise your hand if you've heard of
[08:54] compacting as well. Yeah. Okay. There
[08:56] are some people who've not heard of
[08:57] compacting. So let's just quickly show
[08:59] what that means. For instance, I've just
[09:02] been having a little chat with my LLM.
[09:06] Uh, I want to make sure we sort of, you
[09:09] know, just cover the basics so we're all
[09:10] sort of on the same wavelength here.
[09:12] I've just been having a chat with my
[09:13] LLM. I've been talking about a thing
[09:15] that I want to build. How's the font
[09:17] size? Should I bump it up? Folks in the
[09:19] back. Bump bump bump bump bump.
[09:24] I'm using Claude Code for this session,
[09:25] but you don't need to use Claude Code. Uh,
[09:28] in fact, it's often nice not to use Claude
[09:30] Code. Um, so I've been having a chat
[09:33] with the LLM just sort of planning out
[09:34] what I'm going to do next. It's asking
[09:35] me a bunch of questions and I can I
[09:38] highly recommend you do this. There's
[09:40] this tiny little status line here that
[09:43] tells me how many tokens I'm using. The
[09:45] exact number of tokens I'm using. Um I
[09:47] have an article on my website AI Hero if
[09:50] you want to copy this. This is oh wow
[09:53] that is that shakes doesn't it? Um, this
[09:57] is essential information on every coding
[09:59] session because you need to know exactly
[10:01] how many tokens you're using so that you
[10:02] know how close you are to the dumb zone.
[10:05] Absolutely essential. And so let's watch
[10:07] it. So I've got two options. I can
[10:09] either clear
[10:12] and go back to nothing or I can compact.
[10:15] And when I compact then it's going to
[10:18] squeeze all of that conversation which
[10:20] admittedly isn't very much into a much
[10:22] smaller space. And this in diagram terms
[10:26] kind of looks like this where you take
[10:27] all of the information from the session
[10:29] and you essentially create a history out
[10:31] of it, a written record of what
[10:33] happened.
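A toy model of a context window shows the difference between clearing and compacting. This is a sketch under stated assumptions and does not reflect any tool's API. `Session`, `clear`, `compact` and `naive_summary` are hypothetical names. `naive_summary` is a stand-in for the LLM-written summary a real agent would produce.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    """Toy model of a coding-agent context window (hypothetical, not any tool's API)."""
    system_prompt: str
    history: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.history.append(message)

    def context(self) -> List[str]:
        # The system prompt is always present; everything else is session history.
        return [self.system_prompt, *self.history]

    def clear(self) -> None:
        # "Memento" reset: back to the same base state every time.
        self.history = []

    def compact(self, summarize: Callable[[List[str]], str]) -> None:
        # Squeeze the history into a single written record of what happened.
        self.history = [summarize(self.history)] if self.history else []


def naive_summary(messages: List[str]) -> str:
    # Stand-in for an LLM summary: keep only the first three words of each message.
    return "SUMMARY: " + " | ".join(" ".join(m.split()[:3]) for m in messages)


if __name__ == "__main__":
    s = Session("You are a coding agent.")
    s.add("explore the codebase for lesson progress")
    s.add("implement the points table")
    s.compact(naive_summary)
    print(s.context())
    s.clear()
    print(s.context())
```

`clear` always lands in exactly the same state, which is the "Memento" behaviour the speaker prefers. `compact` carries a lossy summary forward into every later turn.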
[10:36] And devs love compacting for some
[10:38] reason, but I hate it. I much prefer my
[10:42] AI to behave like the guy from Memento
[10:45] because this state is always the same.
[10:48] Always the same. Every time you do it,
[10:49] you clear and you go back to the
[10:51] beginning. And so if you're able to do
[10:52] that and you're able to optimize for
[10:53] that, then you're in a great spot.
[10:56] So that's kind of the two things I want
[10:58] you to think about with LLMs, the two
[10:59] constraints that we're working with.
[11:01] They have a smart zone and a dumb zone.
[11:04] And they're like the guy from Memento.
[11:06] So let's take a look at the first
[11:08] exercise. And I'm while I'm doing this,
[11:11] the way I want this to work is I'm going
[11:12] to sort of show you how um I'm going to
[11:15] be sort of walking through it up here.
[11:17] And I want you folks to be kind of like
[11:19] tapping away and doing things as well.
[11:21] So that was just a little lecture bit.
[11:23] Let's now actually get and do some
[11:24] coding. For anyone who arrived late or
[11:26] anyone in the Gilgood room, uh go to
[11:29] this link,
[11:32] this link up here
[11:35] to see the exercises and clone the repo.
[11:38] You absolutely do not have to. You can
[11:39] just watch me do it if you fancy it. But
[11:41] let's go there myself and let's see what
[11:42] exercises await us.
[11:45] So essentially, I've built a um this is
[11:48] from my course. This is a uh a course
[11:52] management platform essentially a kind
[11:54] of CMS for instructors for students and
[11:56] this is what we're going to be building
[11:57] a feature in. So I'm going to take you
[12:00] from essentially the idea for the
[12:02] feature all the way up to building a PRD
[12:04] for the feature all the way up to
[12:06] implementing the feature and hopefully
[12:08] you can take inspiration from this
[12:10] process and use it in your own work. So
[12:15] uh let's kick off.
[12:17] We're going to start by using a skill
[12:19] which is very close to my heart. It's
[12:21] the grill me skill. And this grill me
[12:24] skill is wonderfully small, wonderfully
[12:28] tiny. And it helps prevent one of I
[12:31] think the main issues when you're
[12:32] working with an AI, which is
[12:34] misalignment.
[12:37] The uh the sort of silent idea that I'm
[12:41] talking against here, that I'm arguing
[12:43] against is the specs to code movement.
[12:45] Has anyone heard of the specs to code
[12:46] movement? Raise your hand. It's not
[12:48] really a movement. I suppose it's just
[12:49] sort of people saying specs to code. Um,
[12:53] what it is is people say, okay, you can
[12:55] write a program or you want to build an
[12:57] app. The best way to build that app is
[13:00] to take some specifications.
[13:02] So to write some sort of like document
[13:05] and then turn that document into code.
[13:09] So just turn it into code. How do you do
[13:10] that? You pass it to AI. If there's
[13:13] something wrong with the resulting code.
[13:14] You don't look at the code, you look
[13:16] back at the specs, you change the specs
[13:18] and you sort of just keep going like
[13:20] this. This is kind of like vibe coding
[13:22] by another name where you're essentially
[13:24] ignoring the code. You don't need to
[13:26] worry about the code. You just sort of
[13:27] keep editing the specs and eventually
[13:29] you just keep going. And I tried this. I
[13:31] really tried it and it sucks. It doesn't
[13:33] work because you need to keep a handle
[13:36] on the code. You need to understand
[13:38] what's in it. You need to shape it
[13:39] because the code is your battleground.
[13:41] And so
[13:44] this again is where we're going. Let's
[13:45] let's get some exercises. So what I'd
[13:48] like you to do is go to this page, the
[13:49] the grill me skill. And inside the repo
[13:53] here, we have a Slack message
[13:57] from our pal. Where is it? It's in the
[14:00] root of the repo. And it's under
[14:05] where is it?
[14:07] Clientbrief.mmd.
[14:09] It's a Slack message from Sarah Chen.
[14:11] For some reason, Claude always
[14:12] chooses Sarah Chen as the name. I don't
[14:13] know why. Um, it's saying that in
[14:16] Cadence, our um course platform, our
[14:20] retention numbers are not great.
[14:21] Students sign up, do a few lessons, then
[14:22] they drop off. I'd love to add some
[14:24] gamification to the platform. And so,
[14:27] when you're presented with an idea like
[14:29] this, you need to find some way of
[14:30] turning it into reality. Let's say Sarah
[14:32] Chen is your client. You're on a tight
[14:34] budget. You need to get this done fast.
[14:35] How do you go and do it? Um, raise your
[14:39] hand if you would. um enter plan mode
[14:42] when you're doing this. Anyone a big
[14:43] user of plan mode? Yep. Um let's
[14:46] actually shout out quickly any other
[14:48] ideas about what you would do with this
[14:49] or raise your hand if you what would be
[14:52] your first port of call.
[14:54] >> Yeah,
[14:55] >> sorry.
[15:00] >> Yes, exactly. Let's imagine that Sarah
[15:01] Chen's gone on holiday. You have no idea,
[15:03] right? Uh she's just posted this thing.
[15:05] You need to action it before you go.
[15:07] Well, my first port of call is I go for this
[15:10] particular skill. I'm going to clear my
[15:12] context.
[15:15] I'm going to uh get rid of you. You
[15:19] don't need to be there. And I'm going to
[15:21] say
[15:22] um I'm going to invoke a skill, which is
[15:25] the grill me skill. Let's quickly check.
[15:28] Raise your hands if you don't know what
[15:30] this is.
[15:32] Cool. Oh, sorry. Sorry. Let me be more
[15:34] specific. Raise your hands if you don't
[15:36] know what I'm doing here when I uh do a
[15:39] forward slash and then type something.
[15:42] Anyone everyone kind of understand what
[15:43] that is? I'm invoking a skill. I'm
[15:45] invoking the grill me skill. And what
[15:48] I'm going to do is I'm going to say
[15:49] grill me and I'm going to pass in the
[15:51] client brief.
[15:54] So now the LLM really has only a couple
[15:58] of things here. It just has the skill
[15:59] and it has the description of what I
[16:01] want to do.
[16:04] And this is virtually how I start every
[16:06] piece of work with AI. And while it's
[16:09] exploring the codebase,
[16:11] I'm just going to show you what the
[16:12] grill me skill does. So this is inside
[16:15] the repo so you can check it out. It's
[16:17] extremely short. Interview me
[16:20] relentlessly about every aspect of this
[16:22] plan until we reach a shared
[16:23] understanding. Walk down each branch of
[16:24] the design tree, resolving dependencies
[16:27] one by one. For each question, provide
[16:29] your recommended answer. Ask the
[16:31] questions one at a time. uh blah blah
[16:33] blah. What this does, and what I noticed
[16:36] when I was working with AI, especially
[16:38] in plan mode actually, is it would
[16:42] really eagerly try to produce a plan for
[16:44] me. It would say, "Okay, I think I've
[16:46] got enough. I'll just go off and plan."
[16:49] And what I found was that
[16:53] I was really trying to find the words
[16:55] for this for for what I wanted instead
[16:57] of that. And Frederick P. Brooks, in The
[17:00] Design of Design, has a great quote uh
[17:03] talking about the design concept when
[17:06] you're working on something new with
[17:07] someone when you're uh all trying to
[17:10] build something together
[17:12] then there's this shared idea that's
[17:14] shared between all participants and that
[17:16] is the design concept and that's what I
[17:18] realized I needed with Claude I needed
[17:22] I needed to reach a shared understanding
[17:25] I didn't need an asset I didn't need a
[17:27] plan I needed to be on the same
[17:28] wavelength as the AI as my agent. And
[17:31] this is an extremely effective way of
[17:33] doing it. So hopefully there we go.
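The grill-me instructions describe a dependency-ordered interview: walk each branch of the design tree, resolve dependencies one by one, ask one question at a time, and give a recommended answer with each question. The Python sketch below assumes the design tree is known up front, whereas a real agent discovers it as it goes. `grill`, `DesignTree` and the example questions are hypothetical and only loosely based on the gamification brief. The ordering comes from the standard-library `graphlib.TopologicalSorter`, which needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# question id -> (agent's recommended answer, question ids it depends on)
DesignTree = Dict[str, Tuple[str, List[str]]]


def grill(tree: DesignTree, answer: Callable[[str, str], str]) -> Dict[str, str]:
    """Ask one question at a time, resolving each dependency before its dependents."""
    # TopologicalSorter takes {node: predecessors}; every dependency must also be a key.
    graph = {question: set(deps) for question, (_, deps) in tree.items()}
    decisions: Dict[str, str] = {}
    for question in TopologicalSorter(graph).static_order():
        recommendation, _ = tree[question]
        # The human sees the recommendation and either accepts it or overrides it.
        decisions[question] = answer(question, recommendation)
    return decisions


if __name__ == "__main__":
    # Hypothetical tree loosely based on the gamification brief.
    tree: DesignTree = {
        "points": ("keep it simple: two point sources to start", []),
        "retroactive": ("<agent's recommendation>", ["points"]),
        "levels": ("<agent's recommendation>", ["points"]),
        "ui": ("<agent's recommendation>", ["levels"]),
    }

    def human(question: str, recommendation: str) -> str:
        return "dashboard" if question == "ui" else recommendation

    for question, decision in grill(tree, human).items():
        print(f"{question}: {decision}")
```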
[17:35] Nice. It has done its exploration. First
[17:38] of all, it's invoked a sub agent which
[17:41] spent uh 97 93.7K tokens on Opus.
[17:47] Um and it's asked me the first question.
[17:51] Cool. We can see that even though the
[17:52] sub agent burned a ton of tokens, I
[17:55] haven't actually um uh increased my
[17:58] token usage that much. Raise your hand
[18:00] if you don't know what sub agents are.
[18:02] It's an important question. Everyone
[18:05] kind of clear what sub agents are? Okay,
[18:06] I'll give a brief definition which is
[18:08] that this this sub agents thing here,
[18:10] this Explore sub agent, it has
[18:12] essentially gone and called another LLM
[18:14] which has an isolated context window
[18:18] and then that LLM has reported a summary
[18:20] back. So a sub agent is kind of like a
[18:22] delegation. You're delegating a task to
[18:24] a sub agent. It eagerly goes and does all
[18:26] the thing, explores a ton of stuff and
[18:28] then just drip feeds the important stuff
[18:30] back up to the orchestrator agent to the
[18:33] parent agent. So, okay. So, hopefully
[18:36] you guys have seen the same thing. It's
[18:37] done an explore. And we now have our
[18:40] first question. Points economy. What
[18:42] actions earn points and how much? Okay.
[18:45] At this point, you can ask it, by the
[18:47] way, questions to um deepen your
[18:49] understanding of the repo. I obviously
[18:50] know this repo really well because I
[18:52] wrote it, but you might not um know
[18:54] what's going on. So, let's say my
[18:57] recommendation, keep it simple, two point
[18:59] sources to start. What's so nice about
[19:01] this is that not only does it give us a
[19:03] question that kind of aligns us here, we
[19:06] get a recommendation, too. And often
[19:08] what I'll find is the AI's
[19:09] recommendations are really good. And so
[19:11] I'll just say skip video, watch events,
[19:13] they're noisy and gameable. I agree.
[19:16] Sarah's asked while keep lessons in the
[19:17] bread and butter.
[19:20] Yeah,
[19:21] looks good, pal.
[19:24] Now, what I usually do is I usually
[19:26] dictate to the AI. I'm usually actually
[19:29] chatting to the AI instead of uh typing
[19:31] here, but uh this is a relatively new
[19:33] laptop and I couldn't get my dictation
[19:35] software working on it um because
[19:37] Windows is crap. Um
[19:41] so should points be retroactive? There
[19:43] are existing lesson progress records
[19:45] with completion timestamps. This is
[19:47] a really nasty question, right? Should
[19:49] we actually go back and backfill all of
[19:51] the lesson progress events? This is a
[19:53] kind of question that you need to be
[19:55] aligned on if you're going to fulfill
[19:56] the feature properly. This is not
[19:58] something I considered and Sarah Chen
[19:59] certainly didn't consider. Do I want it
[20:02] to be retroactive? Hmm. Let's actually do
[20:05] a vote inside here. Should we go back
[20:08] and backfill all the records? Raise your
[20:09] hand if you think we should backfill all
[20:10] the records.
[20:13] Raise your hand if you think we
[20:14] shouldn't backfill all the records.
[20:17] There are a lot of uh fence sitters in
[20:19] the room. I'm going to say,
[20:22] you know, this is the kind of discussion
[20:23] you're sort of having with the AI.
[20:24] You're getting further aligned. Yes, I'm
[20:25] just going to go with this
[20:26] recommendation because I'm lazy.
[20:31] Notice, too, how I'm able to keep in the
[20:33] loop here with AI. I'm not, you know,
[20:35] it's it's pinging me these questions
[20:36] pretty quickly.
[20:39] I'm not having to go off and check
[20:40] Twitter or something. Levels. What's the
[20:43] progression curve? Yeah, that looks
[20:45] about right, for instance. Yes. Okay. So
[20:48] hopefully you should be able to go and
[20:49] um kind of work through this with the AI
[20:52] and essentially try to reach an
[20:55] alignment. And this grill me skill this
[20:57] can last a long time. This can I've had
[21:00] it ask me 40 questions. I've had it ask
[21:02] me 80 questions. Some people have had it
[21:04] ask a hundred questions, where literally
[21:06] you're sat there for an hour chatting to
[21:08] the AI. And what you end up with is
[21:11] essentially this conversation history
[21:13] that works really nicely and works
[21:15] really nicely as an asset of the design
[21:17] concept that you're creating. This can
[21:20] also function like this. You can uh have
[21:22] a meeting with someone who's a maybe a
[21:24] domain expert. Maybe I have a meeting
[21:26] with Sarah. I feed that meeting
[21:28] transcript into uh I don't know Gemini
[21:31] meetings or whatever you guys are using.
[21:33] You take that, you feed it into a
[21:35] grilling session and you grill through
[21:37] the assumptions that you didn't have.
[21:39] So, this ends up being a really nice
[21:40] kind of um a really nice way of just
[21:44] taking inputs from the world and then
[21:46] just turning and validating them. So,
[21:49] okay,
[21:51] let's see. I really want to get to the
[21:53] end of this, but I also don't want to
[21:54] just like be sat here talking to the AI
[21:56] in front of you for uh a thousand days.
[21:58] So, I'm just going to say yes.
[22:03] Let's see what happens. So, I tell you
[22:05] what. Um, while you guys sort of have a
[22:07] little fiddle with this locally, let's
[22:09] start a little Q&A session now. And
[22:13] let's see how's this going to work. Can
[22:15] we keep the door closed? I'll turn up
[22:16] the microphone. It's quite noisy. Uh,
[22:20] let's see. Mike, can we uh Door closed?
[22:23] Oh, it has been closed. Mark has
[22:24] answered. Beautiful. So, what I'd like
[22:27] you to do is there any air con? Yeah,
[22:30] there is some air con. I think there is
[22:32] some air con you guys aren't being lit
[22:35] here. I'm being I'm being fried alive
[22:37] here. Uh so what I'd like you to do is
[22:40] go on to Slido, which you can join
[22:42] here. If you're not taking the
[22:44] exercise, go on to Slido, have a
[22:46] little fiddle and vote on some good
[22:48] questions. I'm just going to chat to the
[22:50] AI for a second uh until we reach a
[22:53] stopping point. So do streaks earn
[22:54] points?
[22:56] Um, streaks are standalone.
[23:06] Let's see what else it comes up with.
[23:12] "Where does the gamification UI live?" Let's
[23:15] have it in the dashboard.
[23:19] I'm just going to scan these and blast
[23:20] through them, basically. So how are we doing
[23:22] with our Slido?
[23:24] Okay.
[23:26] "Have I tried Spec Kit, OpenSpec, or
[23:28] Taskmaster instead of the grill-me
[23:30] skill? Do I find them more verbose, or a
[23:32] more structured alternative?" This is a great
[23:33] question. So there are a ton of
[23:35] different frameworks out there that
[23:36] build up this
[23:39] planning process for you. I personally
[23:42] believe that at this stage, when
[23:45] there's no clear winner, when there's no
[23:46] one true way, and when
[23:48] things are changing all the time, you
[23:50] need to own as much of your planning
[23:52] stack as you possibly can. What I've
[23:55] noticed in a lot of my students is
[23:59] that they tend to overuse a certain stack,
[24:03] they get into trouble, and because
[24:06] they don't own the stack and they don't
[24:07] have observability over the whole thing,
[24:09] they just go, "This isn't working. This
[24:12] sucks." Whereas if you have
[24:15] control over the whole thing, then at
[24:17] least you know how to fix it, or
[24:19] potentially know how to fix it. So
[24:22] even though I'm giving you a
[24:26] stack, basically, I believe in inversion
[24:28] of control: you should be in control
[24:30] of the stack.
[24:32] So, can I press zero, please?
[24:38] >> Sorry.
[24:40] >> Sorry, that was a lot of
[24:41] mumbling. Can I...
[24:42] >> Feedback? You have four options at the
[24:44] bottom; you can hit dismiss.
[24:48] >> Thank you.
[24:50] I'm so sorry. Well, you didn't want to
[24:52] give Claude good feedback? Why? What's
[24:54] wrong with you?
[24:58] Okay, cool.
[24:59] "Many of the questions asked by the
[25:01] grill-me skill are not necessarily
[25:02] appropriate for a developer, but rather a PO.
[25:04] In larger teams, who should use it?" Yeah.
[25:06] Raise your hand if you've ever
[25:10] done pair programming. Anyone ever done
[25:12] pair programming? Right. Put your
[25:15] hands down, and raise your hand again if
[25:16] you've ever done a pair programming
[25:18] session with an AI.
[25:20] Right. How did it go? Was it good? Did you
[25:23] enjoy it? I think pair programming
[25:25] sessions with AI are a great idea, because
[25:27] you've got a third person in the room
[25:28] who will relentlessly quiz you and ask
[25:30] you questions. If you don't
[25:32] know the answer, it should be you, the
[25:33] domain expert, and the AI in the same
[25:35] room. If you have a question about
[25:37] implementation, it should be you, a
[25:39] fellow developer, and the AI in the same
[25:41] room. You can be
[25:43] working through these questions as a
[25:44] team. And actually, we're going
[25:47] to look at implementation in a bit, and
[25:49] we're going to see how you can make
[25:50] implementation so much faster. But I
[25:54] think for the really crucial decisions, the
[25:55] ones you need humans for, you actually
[25:57] need a lot of humans, and it doesn't
[25:59] really matter how many humans are in
[26:01] there. You can throw a bunch in,
[26:02] like a kind of mob programming with
[26:04] AI, essentially.
[26:07] "What's my favorite metaprompting
[26:08] tool?" I think I kind of answered that.
[26:10] "There's no air con." Let's just live
[26:12] with it. "How do I use the
[26:14] conversation as an asset after the
[26:16] grill-me session?" Well, we're going to get
[26:18] there.
[26:20] Okay. So I really want to
[26:24] speed this up,
[26:25] artificially.
[26:28] >> Just what
[26:30] >> This is the thing. So someone just
[26:32] said, "Okay, Ralph loop this." But this
[26:33] is crucial: I can't loop over
[26:36] this, right? I think of there
[26:40] as being two types of tasks in the AI
[26:42] age. You have human-in-the-loop
[26:45] tasks, where a human needs to sit there
[26:47] and do it, which is this: we are the humans
[26:51] in the loop, with multiple humans in the
[26:52] loop. And there are AFK tasks,
[26:55] tasks where the human can be away from
[26:56] the keyboard and it doesn't matter.
[26:58] Implementation, as we'll see, can be
[27:00] turned into an AFK task, but planning,
[27:03] this alignment phase, has to be human in
[27:06] the loop. It has to be.
[27:09] So I've got to do it, unfortunately.
[27:11] I don't know... "Give me a long
[27:16] list of all your recommendations.
[27:20] I'm running a workshop right now,
[27:24] so I artificially
[27:26] need you to
[27:28] pull more weight."
[27:31] So let's see what it does. Let's
[27:34] answer a couple more questions while
[27:35] it's doing its thing.
[27:37] "What is my opinion on PMs or other
[27:39] non-dev roles vibe coding tasks?"
[27:45] I'm going to return to this later. I
[27:48] think I'm going to leave this
[27:49] unanswered.
[27:51] A bit of mystery.
[27:53] "I notice I'm not using the AskUserQuestion
[27:55] UI for grill-me. Why?"
[27:57] There's a specific UI that you can
[28:00] bring up in Claude Code, which... I'll
[28:02] answer this just quickly. "Ask me a
[28:05] question using the AskUserQuestion
[28:09] tool."
[28:10] And this UI is just sort of broken in
[28:13] Claude, and I really hate it.
[28:16] You'll notice I'm using Claude, but I don't
[28:19] like Claude very much. You
[28:21] really are free with this method to
[28:22] choose any system you like. And this
[28:24] is what the UI looks like. It's very
[28:26] pleasing when you first encounter it,
[28:27] but then you realize it is actually
[28:28] broken in a ton of different ways.
[28:32] All right, what did it come back with?
[28:33] Oh, blimey.
[28:35] Oh no.
[28:38] So,
[28:40] while this is doing its thing, let me do
[28:41] some teaching in the meantime. The plan
[28:44] here is that we take our grill-me session,
[28:47] and we need to find some way
[28:49] of turning it into a destination.
[28:53] We're
[28:57] essentially
[28:59] figuring out the shape of this. That's what we're
[29:01] doing: figuring out the shape of the
[29:03] tasks during the grilling session. And
[29:06] in order to turn it into a set of
[29:09] actionable tasks for the AI, we
[29:12] essentially need to figure out the
[29:14] destination. We need to know where we're
[29:15] going. We need to know the shape of this
[29:17] entire thing. So I think of there as
[29:19] being two essential documents that we
[29:21] need. We need a document that documents
[29:25] the destination.
[29:27] Oh no,
[29:29] it's not bright enough. There we go.
[29:33] Still not bright enough. There we go. We
[29:35] need something to document the
[29:36] destination, and we need something to
[29:39] document the journey. In other words, we
[29:41] need a document that's going
[29:43] to figure out what this even looks like,
[29:46] in all of its user stories, and set
[29:48] out a definition of done. And then we
[29:50] need to figure out what the split looks
[29:52] like. So that's where we're going to go
[29:54] next, once we finish with the
[29:56] grilling session.
[29:59] "Yeah, it looks great. Fantastic. I love
[30:01] it." It answered 22 of its
[30:04] own questions. There you go. That's
[30:05] quite representative of what a grilling
[30:07] session looks like.
[30:09] So at this point I have used 25k
[30:14] tokens, and loads of that
[30:17] stuff is gold. I want to keep that
[30:19] around. I've got 25k great tokens
[30:23] there. And what I want to do is
[30:25] summarize it in some kind of destination
[30:27] document. So this is the next
[30:30] exercise, where we're going to
[30:35] write a product
[30:37] requirements document. And the product
[30:40] requirements document, or PRD,
[30:43] has essentially that function: it's
[30:46] the destination document. And it
[30:48] doesn't really matter what shape it is. I've
[30:51] got a shape that I prefer and that I
[30:53] quite like, but you can just choose your
[30:55] own shape or whatever your company uses.
[31:00] Don't be too
[31:03] worried about that.
[31:05] All we're really doing is summarizing
[31:07] the design concept that we have so far.
[31:10] And
[31:12] so let's try this. I'm going
[31:15] to initiate this. I'm going to zoom
[31:17] all the way to the bottom. All I'm going
[31:19] to do is just say "write a PRD."
[31:23] And we can take a look at that skill
[31:24] now.
[31:26] Write a PRD.
[31:29] So this skill
[31:31] does a few things. It first asks the
[31:35] user for a long, detailed description of
[31:36] the problem. You can use write a PRD
[31:38] without grilling first, but I just like
[31:39] to grill first and then write the PRD
[31:41] afterwards. Then you can get it to
[31:44] explore the repo, which we've kind of
[31:46] already done. Then we get it to
[31:49] interview the user relentlessly, so it has
[31:50] a kind of grilling session again. And
[31:52] then we start putting together a PRD
[31:56] from a template. This is available in the
[31:57] repo if you want to check it out. And
[31:59] essentially this is what it looks like.
[32:01] We've got a problem statement (the
[32:02] problem the user is facing), the solution
[32:04] to the problem, and a set of user
[32:06] stories. And these user stories
[32:08] define what this is. You
[32:11] have probably seen things like
[32:12] this if you've been a developer at all.
[32:14] You can write these in Gherkin,
[32:16] the language Cucumber uses,
[32:17] or we just
[32:20] write them ourselves. Then we have a
[32:22] list of implementation decisions that
[32:24] were made and, crucially, a list of
[32:26] testing decisions too.
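The actual template lives in the workshop repo and isn't reproduced here. As a rough sketch only, the sections listed above (problem statement, solution, user stories, implementation decisions, testing decisions) could be modeled as a TypeScript type and rendered to markdown. The `Prd` interface, `renderPrd` function, and the example values are hypothetical placeholders, loosely based on decisions made in this session, not the generated PRD.

```typescript
// Hypothetical shape for a PRD "destination document".
// Field names are illustrative; the real template may differ.
interface Prd {
  title: string;
  problemStatement: string;
  solution: string;
  userStories: string[]; // e.g. "As a <role>, I want <goal>, so that <benefit>"
  implementationDecisions: string[];
  testingDecisions: string[];
}

// Render each section under its own markdown heading.
function renderPrd(prd: Prd): string {
  const bullets = (items: string[]) => items.map((item) => `- ${item}`).join("\n");
  return [
    `# ${prd.title}`,
    "## Problem Statement",
    prd.problemStatement,
    "## Solution",
    prd.solution,
    "## User Stories",
    bullets(prd.userStories),
    "## Implementation Decisions",
    bullets(prd.implementationDecisions),
    "## Testing Decisions",
    bullets(prd.testingDecisions),
  ].join("\n\n");
}

// Placeholder content (hypothetical), loosely echoing this session's answers.
const example: Prd = {
  title: "Gamification",
  problemStatement: "Placeholder: the problem the user is facing.",
  solution: "Placeholder: award points and track streaks.",
  userStories: ["As a learner, I want to earn points for completing a lesson, so that I stay motivated."],
  implementationDecisions: ["Streaks are standalone and do not earn points.", "Gamification UI lives in the dashboard."],
  testingDecisions: ["Test the gamification service, the only deep module with meaningful logic."],
};

console.log(renderPrd(example));
```

Keeping the PRD as plain markdown means it can live equally well in a local `issues` directory or in a GitHub issue.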
[32:31] I'm going to run this. Okay, and so it's
[32:33] finished its thing. Ah,
[32:37] Windows, let me close the thing. Thank
[32:39] you. I don't know why I bought a Windows
[32:41] laptop. I think I just like the
[32:43] challenge.
[32:46] So the first thing that it's going to
[32:47] give me is a set of proposed modules it
[32:51] wants to modify.
[32:54] Now, there's a deep reason why I'm
[32:55] thinking about this. At this
[32:58] stage we have an idea. We have sort of
[33:02] specced out the idea. We've reached a
[33:04] shared understanding of what we're
[33:06] trying to do. And then we need to start
[33:09] thinking about the code, because at this
[33:11] point... this is not specs to
[33:14] code. This is not where we're ignoring
[33:16] the code. We actually keep the code in
[33:18] mind throughout the whole process. And
[33:21] the way I like to do this is to
[33:23] think about a set of
[33:24] proposed modules to modify. We're going
[33:26] to return to this idea of
[33:29] continually designing your system and
[33:31] keeping your system in mind. So it's
[33:33] saying it recommends tests for the
[33:34] gamification service, as it's the only deep
[33:36] module with meaningful logic. "These
[33:38] modules look right. Yeah, that's good."
[33:44] And it's going to ping out a PRD.
[33:48] Now, for ease of setup, I've got it so
[33:51] that it creates a set of issues locally.
[33:54] So it's just going to create
[33:55] a PRD inside this issues directory. But
[33:59] the way I usually do it, and you can
[34:02] check this out yourself, is in
[34:04] what I consider my
[34:06] work repo, which is
[34:08] github.com/mattpocock/course-video-manager,
[34:12] up here. And this
[34:16] is an app that I created
[34:19] that I use all the time to record my
[34:21] videos and things like this. I think
[34:22] I've recorded, I pulled up the
[34:25] stats, I think I've recorded like a
[34:26] thousand videos in here or something
[34:27] nuts. And you can see here that it's
[34:30] got 744 closed issues. And these are
[34:33] essentially all of the PRDs and all
[34:36] of the implementation issues that I've
[34:38] put into here. So this is how I usually
[34:39] like to do it.
[34:42] So that's what I'm doing with the... There
[34:45] we go. Yeah, I'm just going to say yes
[34:47] and
[34:49] get that issue out. Let's see. It is
[34:52] inside here. So we've got the problem
[34:54] statement: people sign up for courses...
[34:58] the solution, the user stories (18
[35:00] user stories, looks nice), some
[35:02] implementation decisions, level
[35:03] thresholds, etc. This is enough
[35:05] information. We've clarified
[35:07] where we're going and what we're doing.
[35:09] So that's what we do. We essentially
[35:11] have a grilling session, and we
[35:12] create an asset out of it. Now, raise
[35:15] your hand: should I be reviewing this
[35:17] document? Raise your hand if you think I
[35:20] should be reviewing the document.
[35:22] Yeah, I don't look at these.
[35:24] I don't read them. The reason I don't
[35:27] look at these is: what am I
[35:29] testing at this point? When I read it,
[35:31] what am I actually
[35:33] testing? What are
[35:34] the failure modes I'm trying to test
[35:36] for? I know that LLMs are great at
[35:38] summarization; they're
[35:40] really good at it. I have
[35:42] reached the same wavelength as the LLM,
[35:45] right? Using the grill-me skill, we
[35:46] have a shared design concept. So if I
[35:48] have a shared design concept, all I'm
[35:50] doing is
[35:53] checking the LLM's ability to summarize.
[35:56] So I don't tend to read these.
[36:00] Let's have a Q&A, because I
[36:02] can feel you guys are itching for it.
[36:03] And then I think we might have,
[36:06] I don't know, a five-minute comfort
[36:07] break, just to rest my voice and so you
[36:08] can catch up with the exercises for a
[36:10] minute, if that's all right. So let's
[36:11] have a little Q&A sesh. "If I don't
[36:15] like Claude Code, which one do I
[36:17] actually like?"
[36:20] Have you ever heard the phrase,
[36:23] "Democracy is the worst way to run a
[36:25] country, apart from all the other ways"?
[36:27] That's how I feel about Claude Code.
[36:30] We've answered that one.
[36:34] "What are your thoughts on developers
[36:36] needing to very deeply understand
[36:37] TypeScript now that 'fix the TS, make no
[36:40] mistakes' exists?" I don't understand the
[36:42] phrasing of this, but I think I
[36:44] understand the meaning. My answer is that
[36:48] I believe code is very important,
[36:50] and this is going to feed
[36:52] through the whole session: bad
[36:54] codebases make bad agents. If you have
[36:57] a garbage codebase, you're going to get
[36:59] garbage out of the agent that's working
[37:01] in that codebase. We'll talk more about
[37:02] that in a bit. And so I think
[37:04] understanding these tools very deeply,
[37:06] understanding code deeply, is going to
[37:08] make you a much, much better developer
[37:10] and help you get more out of AI.
[37:14] And that answers that question too.
[37:16] Sweet.
[37:19] Get out of it. There you are.
[37:24] "Now that we have 1 million tokens
[37:25] available, do we ever actually want to
[37:27] take advantage of that?
[37:30] I've noticed that the dumb zone has
[37:31] become less dumb lately." Okay, great
[37:33] question. This goes back to our
[37:35] initial idea of the dumb zone.
[37:44] I recorded my Claude Code course
[37:46] using a 200k context window, and on the
[37:49] day that I launched the course, they
[37:50] announced the 1 million context window.
[37:53] My take on this is that what Claude Code
[37:54] did is essentially just this:
[37:58] they shipped a lot more dumb zone to you.
[38:01] Now, this is good for tasks
[38:03] where you want to retrieve things from a
[38:06] large context window. If you want to
[38:07] pass five copies of War and Peace or
[38:09] something to it, and you want to find
[38:11] out all the things that...
[38:14] I can't remember a character from War
[38:15] and Peace. Why did I start with that?
[38:18] It's good for retrieval. It's less good
[38:20] for coding. So I consider
[38:24] about 100k, at the moment, to be the smart
[38:28] zone. The smart zone will get bigger, and
[38:30] that will be a really nice improvement.
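To make the sizing rule concrete, here is a minimal sketch that treats roughly 100k tokens as the edge of the smart zone (the speaker's current rule of thumb, which will move as models improve) and checks whether a task's estimated context would push a session past it. `SMART_ZONE_LIMIT`, `zoneFor`, and `fitsInSmartZone` are hypothetical names, and the per-task token estimate is an input you would have to guess; nothing here measures real model quality.

```typescript
// Assumption: ~100k tokens is the current edge of the "smart zone".
// This is a rule of thumb from the talk, not a property of any model.
const SMART_ZONE_LIMIT = 100_000;

type Zone = "smart" | "dumb";

// Classify a session by how many tokens of context it has used so far.
function zoneFor(tokensUsed: number, limit: number = SMART_ZONE_LIMIT): Zone {
  return tokensUsed <= limit ? "smart" : "dumb";
}

// Would taking on a task (with a guessed token cost) keep us in the smart zone?
function fitsInSmartZone(
  tokensUsed: number,
  estimatedTaskTokens: number,
  limit: number = SMART_ZONE_LIMIT,
): boolean {
  return zoneFor(tokensUsed + estimatedTaskTokens, limit) === "smart";
}

console.log(zoneFor(25_000)); // prints smart (the 25k-token grilling session)
console.log(zoneFor(400_000)); // prints dumb (deep into a 1M-token window)
console.log(fitsInSmartZone(25_000, 50_000)); // prints true
```

The point of the sketch is the design choice, not the number: if a task does not fit, split it further rather than relying on a bigger window.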
[38:33] So, folks, we're going to take a
[38:34] five-minute comfort break, if that's all
[38:36] right, just for my voice, and so maybe you
[38:38] can have a little move around or
[38:39] something, or grab a drink. I can
[38:41] notice some sleepy eyes, and I want to
[38:42] make sure that we're awake for the next
[38:44] bit, if that's all right. So we'll take
[38:46] five minutes and I'll see you back here
[38:49] then. All right.
[38:51] So we have our PRD, which I'm not going to
[38:55] read: a kind of destination document.
[38:58] Let's quickly scan for any good
[38:59] questions before we zoom ahead.
[39:05] "Rediscovering the role of the software
[39:07] engineer in today's world: top three
[39:08] disciplines you recommend?" Taekwondo
[39:12] is good, I've heard. I have no idea how
[39:13] to answer this question. Thank you
[39:16] for asking it, though. Top three
[39:18] disciplines I recommend...
[39:21] >> I mean, sorry,
[39:22] >> plumbing.
[39:23] >> Plumbing is a good one. Yeah.
[39:24] I don't know if that's a
[39:25] discipline. The plumbers I've hired are
[39:27] not usually very disciplined.
[39:30] Right.
[39:32] So, okay, we now have our destination.
[39:34] Okay.
[39:37] Perfect.
[39:39] So how do we actually get to our
[39:40] destination? We have a sort of
[39:42] vague PRD. How do we split it so that we
[39:46] don't put things into the dumb zone? In
[39:49] other words, we're at step number four:
[39:50] how do we split it into this kind of
[39:52] multi-phase plan? Well, probably what
[39:54] you would do at this point is
[39:55] say, "Okay, Claude, give me a
[39:57] multi-phase plan that gets me to this
[39:59] destination." Right? That sort of makes
[40:01] sense. This is what we've been doing
[40:02] before, but I have a better
[40:04] way of doing it now, which is that
[40:08] I like creating a kanban board out of
[40:12] this. Raise your hand if you don't know
[40:14] what a kanban board is.
[40:19] just a set of tickets that you put on
[40:22] the wall that have blocking
[40:23] relationships to each other. So, we're
[40:25] going to see what it kind of looks like
[40:26] here. This is how we've worked um as
[40:30] developers for a long time, really since
[40:31] agile came around. And what it does, we
[40:35] can see it here. It has proposed that we
[40:38] split this setup into um five different
[40:42] tasks. Here we have the first one which
[40:44] is the schema and the gamification
[40:46] service. Yeah, that looks pretty good.
[40:48] This is blocked by nothing. And we can
[40:51] even see here that it's a it's given it
[40:53] a type of AFK, too. Remember I talked
[40:55] about human in the loop and AFK earlier.
[40:57] This is an AFK task. This is something
[40:58] we can just pass off to an agent to do
[41:00] its thing. Streak tracking. Okay, that
[41:02] looks good.
[41:04] Uh then wire points and streaks into
[41:07] lessons quiz completion. This is blocked
[41:08] by one and two. Retroactive backfill.
[41:11] This is blocked only by one. And then
[41:14] this one here is blocked by all of the
[41:16] tasks. Cool.
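The "blocked by" relationships above are what make an issue grabbable. Here is a minimal sketch, assuming each local issue records the ids of the issues that block it: an issue is ready when it isn't done and all of its blockers are. The `Issue` interface and `grabbable` function are hypothetical, streak tracking is assumed to be unblocked (the board's exact entry isn't read out), and the fifth issue's title is a placeholder.

```typescript
// Hypothetical local issue: an id, a title, and the ids that block it.
interface Issue {
  id: number;
  title: string;
  blockedBy: number[];
}

// The five proposed tasks. Assumptions: issue 2 is unblocked,
// and issue 5's title is a placeholder.
const issues: Issue[] = [
  { id: 1, title: "Schema + gamification service", blockedBy: [] },
  { id: 2, title: "Streak tracking", blockedBy: [] },
  { id: 3, title: "Wire points and streaks into lesson/quiz completion", blockedBy: [1, 2] },
  { id: 4, title: "Retroactive backfill", blockedBy: [1] },
  { id: 5, title: "Final task (placeholder)", blockedBy: [1, 2, 3, 4] },
];

// An issue can be picked up when it is not done and every blocker is done.
function grabbable(all: Issue[], done: Set<number>): Issue[] {
  return all.filter(
    (issue) => !done.has(issue.id) && issue.blockedBy.every((id) => done.has(id)),
  );
}

console.log(grabbable(issues, new Set<number>()).map((i) => i.id).join(", ")); // prints 1, 2
console.log(grabbable(issues, new Set([1])).map((i) => i.id).join(", ")); // prints 2, 4
```

Recomputing the ready set after every completed issue, rather than fixing an order up front, is what lets more than one agent pull from the same board.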
[41:19] Hmm. Now, you could say,
[41:23] why don't we just make this
[41:25] generation of the issues... why don't we
[41:26] just hand that over to the AI? Why do I
[41:28] need to be involved here? Right? Because
[41:30] it's given us quite a good selection of
[41:32] tasks here. Why do I need to review this
[41:34] and figure out what's next? Now,
[41:37] my take here is that this is really
[41:40] cheap to do, very quick to do once
[41:42] I've done the PRD. And I can immediately
[41:44] see some issues here.
[41:47] There's a really, really important
[41:49] technique when you're
[41:51] figuring out what the shape of this journey
[41:53] should look like. And
[41:57] it comes down to this very classic
[42:00] idea, which comes from The Pragmatic
[42:02] Programmer, called tracer bullets, or
[42:04] vertical slices.
[42:07] And tracer bullets have really transformed
[42:09] the way I think about
[42:11] getting AI to pick its own tasks.
[42:14] Systems have layers, right? There are
[42:17] layers in your system. These might be
[42:19] different deployable units. You might
[42:21] have a database that lives somewhere.
[42:23] You might have an API that lives maybe
[42:24] close to the database, but in a separate
[42:26] bit. You might have a frontend that
[42:27] lives somewhere totally different, like a
[42:29] CDN. Or within these deployable units,
[42:32] you might have different layers within
[42:34] those. For instance, in the codebase that
[42:36] we're working in, we have a ton of
[42:38] different services. We have a
[42:41] quiz service, a team service, a user
[42:43] service, a coupon service, a course service.
[42:45] And these services have dependencies on
[42:47] each other, so they're kind of like
[42:48] individual layers. Well, what I've noticed
[42:53] is that AI loves to code horizontally.
[42:57] It loves to code layer by layer.
[43:00] In other words, in phase one, it will do
[43:02] all of the database stuff, all of the
[43:03] schema, all of the
[43:06] stuff related to that unit. Then it will
[43:08] go into phase two and do all of the API
[43:10] stuff. Then it will add the frontend on
[43:12] top of that.
[43:14] Can anyone tell me what's wrong
[43:16] with that picture? Why is that not a
[43:18] good thing to do? Raise your hand if you
[43:20] have an answer.
[43:21] >> Yeah.
[43:21] >> You don't have the whole feedback loop.
[43:23] >> Exactly. You don't get feedback on your
[43:26] work until you've really started or
[43:29] completed phase three.
[43:32] Until you get to phase three,
[43:35] you're not
[43:36] actually testing that all the layers
[43:38] work together.
[43:41] You haven't got an integrated system
[43:42] that you can test against. And so
[43:45] instead you need to think about vertical
[43:47] slices. You need to think about thin
[43:49] slices of functionality that cross all
[43:52] of the layers that you need to. And this
[43:55] is a much better way to work, and a much
[43:57] better way for the AI to work too,
[43:59] because it means at the end of phase one,
[44:01] or during phase one, it can get feedback
[44:02] on its entire flow. So what this means
[44:05] to me
[44:07] is this: inside the PRD-to-issues skill up
[44:11] here, I've got "Break a PRD into
[44:15] independently grabbable issues using
[44:17] vertical slices (tracer bullets)." It's written
[44:19] as local markdown files. We first locate
[44:21] the PRD,
[44:23] again explore the codebase if this is
[44:25] a fresh session, and we draft vertical slices,
[44:28] so we break the PRD into tracer bullet
[44:30] issues. A tracer bullet, by the way, is
[44:33] this: when you're an
[44:35] anti-aircraft gunner (it's quite a
[44:37] violent idea, actually)
[44:40] and you're looking up in the sky and it's night, if
[44:42] you're just shooting normal bullets, you
[44:43] have no idea what you're firing at,
[44:45] right? You
[44:46] see the plane, but you don't see where
[44:47] your bullets are going. With tracer bullets,
[44:49] they attach a tiny bit of
[44:50] phosphor or something
[44:53] to make it glow as it goes. So this
[44:56] means that every sixth bullet or
[44:57] something, you actually see a line in
[44:58] the sky, so you have feedback on where
[45:01] you're aiming. So the idea
[45:03] here is that we increase our
[45:05] level of feedback and get near-instant
[45:07] feedback on what we're building,
[45:09] because without that the AI is kind of
[45:11] coding blind until it reaches the later
[45:13] phases. We've got some vertical slice
[45:15] rules. We quiz the user, and then we
[45:17] create the issue files. So what I see
[45:20] here is that even though I've
[45:24] told it to do vertical slices, it's proposing
[45:27] to
[45:29] create the gamification service
[45:32] first, on its own. That's just one slice
[45:35] there, and that to me feels like a
[45:36] horizontal slice. What I want to see in
[45:38] the first vertical slice especially is
[45:41] some
[45:42] schema changes. I want to see some new
[45:45] service being created, and I want a
[45:47] minimal representation of that on the
[45:48] frontend. So I want it to cut through
[45:50] all the layers vertically, not just
[45:52] horizontally. Does that make sense? Okay.
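The skill enforces this with prompt rules, not code, but the check being described can be sketched as a small lint: tag each proposed slice with the layers it touches and flag any slice that misses one. The three layers come from the example above (schema, service, frontend); `Slice`, `isVerticalSlice`, and `missingLayers` are hypothetical names, and the layer tags are assigned by hand for illustration.

```typescript
// Layers from the talk's example; a real codebase would define its own.
type Layer = "schema" | "service" | "frontend";
const ALL_LAYERS: Layer[] = ["schema", "service", "frontend"];

interface Slice {
  title: string;
  touches: Layer[]; // hand-tagged for this sketch
}

// A slice is vertical if it crosses every layer, so the whole flow
// can be exercised (and give feedback) as soon as the slice lands.
function isVerticalSlice(slice: Slice, layers: Layer[] = ALL_LAYERS): boolean {
  return layers.every((layer) => slice.touches.includes(layer));
}

function missingLayers(slice: Slice, layers: Layer[] = ALL_LAYERS): Layer[] {
  return layers.filter((layer) => !slice.touches.includes(layer));
}

const proposed: Slice = { title: "Create gamification service", touches: ["service"] };
const revised: Slice = {
  title: "Award points for lesson completion, visible on dashboard",
  touches: ["schema", "service", "frontend"],
};

for (const slice of [proposed, revised]) {
  const verdict = isVerticalSlice(slice)
    ? "vertical"
    : `horizontal (missing: ${missingLayers(slice).join(", ")})`;
  console.log(`${slice.title}: ${verdict}`);
}
```

A slice can be large, as the revised one is, and still pass: the criterion is that it reaches something visible end to end, not that it is small.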
[45:55] So I'm going to give the AI a
[45:57] rollicking.
[45:59] "Bad boy." No,
[46:02] I'm not going to waste tokens just
[46:04] memeing. So: "The first slice is
[46:08] too horizontal." I'll just start with
[46:10] that and see if it picks it up. Does
[46:12] that make sense as a concept? And
[46:14] what I really like
[46:17] about going back to those old books is
[46:20] that in this day and age
[46:22] we are really trying to
[46:25] verbalize software best practices in
[46:28] English. And these books, 20-year-old
[46:30] books, have already done that. And it's
[46:32] an absolute gold mine if you want to
[46:34] throw that into prompts. But even with
[46:35] that, it's not going to
[46:37] do a perfect job each time. "Award
[46:41] points for lesson completion, visible on
[46:43] dashboard." Yes, that's a beautiful
[46:45] vertical slice. It's definitely a
[46:48] big chunk of stuff, it's doing a lot of
[46:49] stories there, but we're going to see
[46:51] something visible at the end, and the AI
[46:53] will then just be able to add to that.
[46:55] You see why that's preferable to the
[46:56] first one. Cool. "Looks great."
[47:01] So we're getting closer now. And anyone
[47:03] following at home as well (you're not at
[47:05] home, but you get the idea) will
[47:08] hopefully see the same thing too and
[47:09] start developing the same instincts.
[47:11] Let's open up for questions while
[47:13] I'm creating these GitHub issues,
[47:17] or not GitHub issues, local issues.
[47:20] "When will I stop using Windows?" Never.
[47:22] "What is your..." Okay, we'll get to that
[47:24] later.
[47:26] "How does AI decide when to stop
[47:28] grilling? Because AI can ask
[47:29] incessantly. Can we have a smarter way
[47:31] to decide the stop point?" Yeah, those
[47:33] grilling
[47:35] sessions can be super intense. And the
[47:37] thing about these skills is you can tune
[47:38] them if you want to. If you feel like
[47:40] the AI is just absolutely hammering you,
[47:42] hammering you, hammering you, then you
[47:43] can just tell it to pull back a
[47:46] little bit, or get it to add
[47:48] stop points, that kind of thing. So
[47:49] if that's a failure mode that you run
[47:50] into a lot, then you just
[47:52] change the skill.
[47:55] "Do I still use 'Be extremely
[47:57] concise. Sacrifice grammar for the sake
[47:58] of concision'?" There was a tip that I
[48:01] gave folks five months ago
[48:05] to basically increase the
[48:07] readability of your plans when
[48:09] you're using plan mode: you can put
[48:11] it in your CLAUDE.md.
[48:13] Okay, yeah, approve that.
[48:17] Let's open up CLAUDE.md.
[48:21] Do I have a CLAUDE.md? Maybe I don't. I
[48:23] really don't use CLAUDE.md very much. I'm
[48:25] just going to put a dummy inside here:
[48:28] "When talking to me,
[48:33] sacrifice grammar for the sake of
[48:35] concision."
[48:40] And this prompt was really useful
[48:43] to me when I was reading the plans,
[48:45] because it meant that the plans would
[48:46] come out very concise,
[48:48] really nice and easy to read.
[48:50] But I've since dropped this
[48:54] idea in favor of a grilling session,
[48:57] because what I noticed was that I
[48:59] didn't want to read the plans. I wanted
[49:00] to get on the same wavelength as the
[49:02] LLM. I wanted it to ask me aggressive
[49:03] questions. And when I stopped
[49:05] reading the plans, I stopped needing
[49:06] them to be concise. So I think of the
[49:09] plans, really the destination document,
[49:11] as the end state. And I don't need
[49:13] that end state to be concise. Hopefully
[49:15] that answers your question.
[49:19] "What do I think will be the outcome
[49:22] of the Mexican standoff over future roles,
[49:23] with PMs and other roles converging?" I
[49:26] have no idea. I'm not a pundit. I have
[49:28] no idea.
[49:30] Okay. So we should, after a
[49:34] couple of approvals,
[49:37] end up with a set of issues. Now,
[49:40] these issues that we're creating
[49:43] are designed to be independently
[49:44] grabbable, which means that this kanban
[49:47] board ends up looking kind of like this,
[49:51] where you have
[49:53] essentially a set of tickets with a
[49:55] whole load of blocking relationships.
[49:57] So this one needs to be done before
[49:59] this one. This one needs to be done
[50:00] before this one. And let's say
[50:03] we've got another one over here. This one
[50:05] needs to be done before this one. This
[50:07] means that you can start to parallelize.
[50:11] You can start to get agents working at
[50:13] the same time on these tasks, because
[50:15] yes, this one needs to be done first,
[50:18] but then these two
[50:21] can be grabbed at the same time by
[50:24] independent agents. Raise your hand if
[50:26] you've done any kind of parallelization
[50:28] work with agents. Okay, cool. So this
[50:32] allows you to turn those plans
[50:35] into directed
[50:38] acyclic graphs, essentially, where you
[50:39] are able to
[50:43] have three phases here. You have phase
[50:47] one:
[50:49] let me grab and move that
[50:51] above this line here; you do this one.
[50:56] Then in phase two you do the two below it.
[50:58] And then in phase three you do this third
[51:00] one, and add it onto there. And when you
[51:03] think about it, this
[51:05] is a relatively simple plan, but you
[51:07] could have many different plans
[51:08] operating all at once. It means that you
[51:10] can do really nice parallelization, and
[51:12] we'll talk more about that in a bit. But
[51:14] that's why I prefer a kanban board set up
[51:17] like this to a sequential plan: a
[51:20] sequential plan can really only be
[51:22] picked up by one agent. So where
[51:26] did it go? Over here.
[51:29] Yeah, this plan here is really
[51:32] only one loop, right? Only one agent can
[51:35] work on these, because we have numbered
[51:36] phases and they're not parallelizable.
[51:38] Does that make sense? Cool.
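The phases drawn on the whiteboard can be derived mechanically from the blocking relationships. A minimal sketch, assuming the board is a map from task id to the ids that block it: this is Kahn's topological sort processed one level at a time, so every task in a phase depends only on earlier phases and the tasks within a phase could go to parallel agents. `Board` and `toPhases` are hypothetical names, and tasks `A` to `D` stand in for the four whiteboard tickets.

```typescript
// Hypothetical board: task id -> ids of the tasks that block it.
type Board = Record<string, string[]>;

// Kahn's algorithm, level by level: each returned phase contains tasks
// whose blockers all sit in earlier phases.
function toPhases(board: Board): string[][] {
  const tasks = Object.keys(board);
  const remaining: Record<string, number> = {}; // unfinished blockers per task
  const unblocks: Record<string, string[]> = {}; // blocker -> tasks waiting on it
  for (const task of tasks) {
    remaining[task] = board[task].length;
    for (const dep of board[task]) {
      if (!Object.prototype.hasOwnProperty.call(board, dep)) {
        throw new Error(`Unknown blocker: ${dep}`);
      }
      if (!unblocks[dep]) unblocks[dep] = [];
      unblocks[dep].push(task);
    }
  }
  const phases: string[][] = [];
  let current = tasks.filter((task) => remaining[task] === 0);
  let placed = 0;
  while (current.length > 0) {
    phases.push(current);
    placed += current.length;
    const next: string[] = [];
    for (const task of current) {
      for (const dependent of unblocks[task] ?? []) {
        remaining[dependent] -= 1;
        if (remaining[dependent] === 0) next.push(dependent);
      }
    }
    current = next;
  }
  // Anything never placed is part of a cycle, so the board isn't a DAG.
  if (placed !== tasks.length) throw new Error("Cycle detected: the board is not a DAG");
  return phases;
}

const whiteboard: Board = { A: [], B: ["A"], C: ["A"], D: ["B", "C"] };
console.log(toPhases(whiteboard).map((phase) => phase.join(" + ")).join(" -> ")); // prints A -> B + C -> D
```

A numbered, sequential plan is the degenerate case where every phase holds exactly one task, which is why only one agent can work through it.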
[51:42] So, we've got our issues. Ah, come on.
[51:44] Stop asking me for... Oh, no. It's creating
[51:46] them on GitHub. I really don't want
[51:48] that.
[51:49] Oh, no. You fool.
[51:53] Create them in issues instead.
[51:58] No, that's not precise enough. Uh, you
[52:00] fool. Create them in local markdown
[52:04] files instead referencing the local
[52:08] version.
[52:11] Sorry about this.
[52:15] So, once we get to this point, we have a
[52:18] bunch of issues locally that we can
[52:21] start looping over and implementing.
[52:25] And it's at this point that the human
[52:27] leaves the loop. So, so far,
[52:31] let me pull up a proper overview of
[52:33] this kind of flow that we're exploring
[52:35] here.
[52:37] So far,
[52:40] we have taken an idea,
[52:43] zoom this in a bit for the folks at the
[52:44] back,
[52:47] and we've grilled ourselves about the
[52:49] idea.
[52:51] We can skip over research and prototype,
[52:52] but we've turned that into a PRD, into a
[52:54] destination document. We've then turned
[52:57] that PRD into a kanban board, and all of
[53:00] those steps are human-reviewed. And now,
[53:05] in the implementation stage, we step back
[53:08] and we let an agent work through that
[53:10] kanban board, or multiple agents work
[53:12] through the kanban board.
[53:15] Now, what this means is that yeah, we've
[53:17] spent a lot of time planning here, but
[53:19] it means that we've queued up a lot of
[53:21] work for the agent. We can think of this
[53:23] as kind of like the day shift and the
[53:24] night shift. This is the day shift for
[53:26] the human, right? Planning everything,
[53:28] getting all the stuff ready
[53:30] and then once we kick it over to the
[53:32] night shift, the AI can just work AFK.
[53:34] But what does that look like?
[53:37] Well, so I'm just going to Oh, yeah.
[53:40] Just allow it. It's perfect.
[53:42] So this looks like if we head to the
[53:45] next exercise
[53:47] which is
[53:51] in fact the last exercise here:
[53:52] running your AFK agent.
[53:55] Now,
[53:57] I've called this Ralph, really, because
[53:59] it is essentially a Ralph loop,
[54:02] and I want to walk through this prompt
[54:04] here really closely.
[54:06] The first thing it's doing here is we're
[54:08] essentially going to run Claude, and
[54:10] we're going to try to
[54:12] encourage it to work completely AFK.
[54:16] I'll show you what the script
[54:17] for this looks like in a minute, but it
[54:19] says, okay, local issue files from issues
[54:21] are provided at the start of context.
[54:24] The way we do that is, if you look inside
[54:26] once.sh here inside the repo,
[54:29] we have
[54:31] essentially just a bash script
[54:34] where we grab all of the issues, which
[54:37] are inside markdown files, and we cat
[54:40] them into a local variable. So that
[54:42] issues variable contains all of the
[54:43] issues that are in our entire backlog.
[54:47] Then we grab the last five commits. I'll
[54:50] explain why in a minute. And then we
[54:52] grab the prompt, and we just run Claude
[54:54] Code with the permission mode acceptEdits
[54:58] and essentially just pass it
[55:00] all of the information. This is what the
[55:02] implementation looks like. So that's what a
[55:05] very, very simple version of this sort of
[55:07] loop looks like. And of course, this is
[55:08] not a loop. This is just running it
[55:10] once.
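A rough TypeScript sketch of the context-assembly step described for once.sh. The talk describes a bash script; this is an illustrative translation, and `IssueFile`, `buildAgentContext`, the section order and the headings are all hypothetical. It only builds the text handed to the agent; invoking Claude Code (with the acceptEdits permission mode, as described) is left out.

```typescript
// Hypothetical shape of one local issue file, e.g. issues/02-some-feature.md.
interface IssueFile {
  path: string;
  body: string;
}

// Build the single context string passed to the agent on one run.
// Contents follow the talk's description of once.sh (every issue in the
// backlog, the last five commits, the prompt); the ordering and the
// section headings are assumptions for illustration.
function buildAgentContext(
  issues: IssueFile[],
  recentCommits: string[], // newest first, e.g. lines of `git log --oneline`
  prompt: string,
  maxCommits = 5,
): string {
  const issueText = issues
    .map((issue) => `--- ${issue.path} ---\n${issue.body.trim()}`)
    .join("\n\n");
  const commitText = recentCommits.slice(0, maxCommits).join("\n");
  return [
    "# Local issues",
    issueText,
    "# Last commits",
    commitText,
    "# Task",
    prompt,
  ].join("\n\n");
}

const context = buildAgentContext(
  [{ path: "issues/02-example.md", body: "Example issue body" }],
  ["abc1234 Add example commit"],
  "Work on AFK issues only.",
);
console.log(context);
```

In practice the commits would come from something like `git log --oneline -5` and the issue bodies from the markdown files in the issues folder.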
[55:12] The loop is in the AFK version up here,
[55:15] which is a fair bit more complicated.
[55:18] And the crucial part here is we're
[55:20] running it in a Docker sandbox as well. So
[55:23] I don't want you to install Docker on
[55:25] your laptops, because you'd
[55:27] need to download a special
[55:28] image, and we're going to tank the
[55:30] conference Wi-Fi if we do that. So I
[55:32] am going to demo this to you, but you
[55:34] won't need to run this yourself.
[55:35] I'll talk through this in a minute. But
[55:37] essentially, with this once script here,
[55:44] we're just running one
[55:46] iteration of the thing that we're going to
[55:48] loop again and again and again. So this
[55:50] is kind of like the human-in-the-loop
[55:51] version. And this is essential. Running
[55:54] this again and again is essential
[55:55] because you're going to see what the
[55:56] agent does and see how it ends up
[55:59] working, and you can then add any tuning
[56:01] the prompt needs.
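The difference between the once script and the AFK loop can be sketched as a loop that repeats a single run until the agent reports an empty backlog. `ralphLoop`, `RunOnce` and the `NO_MORE_TASKS` sentinel are hypothetical names (the real prompt defines its own "no more tasks" marker), and the Docker sandboxing mentioned in the talk is outside the scope of this sketch.

```typescript
// Hypothetical sentinel: the real prompt defines its own "no more tasks" marker.
const NO_MORE_TASKS = "NO_MORE_TASKS";

// One agent run (e.g. a wrapper around once.sh) returning the agent's output.
type RunOnce = (iteration: number) => string;

interface LoopResult {
  iterations: number;
  drained: boolean; // true once the agent reports there are no AFK tasks left
}

// The AFK loop: repeat the single human-in-the-loop run until the backlog is
// empty, with a hard cap so a confused agent cannot spin forever.
function ralphLoop(runOnce: RunOnce, maxIterations: number): LoopResult {
  for (let i = 1; i <= maxIterations; i++) {
    const output = runOnce(i);
    if (output.includes(NO_MORE_TASKS)) {
      return { iterations: i, drained: true };
    }
  }
  return { iterations: maxIterations, drained: false };
}

// Fake agent for illustration: finishes one task per run, then reports done.
let remainingTasks = 3;
const fakeAgent: RunOnce = (i) =>
  remainingTasks-- > 0 ? `Run ${i}: completed a task` : NO_MORE_TASKS;

console.log(JSON.stringify(ralphLoop(fakeAgent, 10))); // {"iterations":4,"drained":true}
```

Keeping the single run as its own function is what lets you run it by hand first, tune the prompt, and only then wrap it in the loop.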
[56:03] Let's go to the prompt.
[56:09] So local issue files are being passed
[56:11] in. You're going to work on the AFK
[56:13] issues only. That makes sense. If all
[56:16] AFK tasks are complete, output this "no
[56:18] more tasks" thing. And then the next
[56:20] thing: pick the next task. So
[56:26] what we're doing here is we're
[56:27] essentially running a backlog, or
[56:30] curating a backlog, that our AFK agent is
[56:32] going to pick up. That's the purpose of
[56:34] all of these setups at the beginning,
[56:38] all the way to this kanban
[56:41] board here. We're just essentially
[56:42] creating a backlog of tasks for the
[56:44] night shift to pick up, and the night
[56:48] shift, this Ralph prompt here,
[56:50] has its own idea about what a good
[56:53] task looks like. So, picking the next task: I
[56:56] did talk about parallelization. I will
[56:58] show you this later, but this is
[56:59] essentially a sequential loop here.
[57:01] We're just going to run one coding agent
[57:03] at a time. This is a good way to just
[57:04] get your feet wet,
[57:06] essentially.
[57:08] So, it's prioritizing critical bug
[57:10] fixes, then development infrastructure, then
[57:13] tracer bullets, then polish and quick
[57:15] wins, and refactors. And then we just
[57:17] have a very simple instruction
[57:19] on how to complete the task. So, we
[57:22] explore the repo and use TDD to complete
[57:24] the task. I'll get to that later.
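The prompt leaves the choice of task to the agent, but the priority order it describes can be illustrated with a deterministic picker. This is a sketch under assumptions: the `Issue` fields, the category names and treating "polish and quick wins" as one tier ahead of refactors are hypothetical, and dependencies between issues are ignored here.

```typescript
// Priority order from the prompt described in the talk (tier names are hypothetical).
const PRIORITY = [
  "critical-bug",
  "dev-infrastructure",
  "tracer-bullet",
  "polish-quick-win",
  "refactor",
] as const;
type Category = (typeof PRIORITY)[number];

// Hypothetical issue metadata parsed from a local markdown issue file.
interface Issue {
  id: string;
  afk: boolean; // false = needs a human in the loop, so the agent skips it
  done: boolean;
  category: Category;
}

// Return the highest-priority open AFK issue, or null when there is nothing
// left (which is when the agent should print its "no more tasks" marker).
function pickNextTask(issues: Issue[]): Issue | null {
  const open = issues.filter((issue) => issue.afk && !issue.done);
  if (open.length === 0) return null;
  // Ties keep the earlier issue in the list.
  return open.reduce((best, issue) =>
    PRIORITY.indexOf(issue.category) < PRIORITY.indexOf(best.category) ? issue : best,
  );
}

const backlog: Issue[] = [
  { id: "03", afk: true, done: false, category: "refactor" },
  { id: "02", afk: true, done: false, category: "tracer-bullet" },
  { id: "04", afk: false, done: false, category: "critical-bug" }, // HITL: skipped
  { id: "01", afk: true, done: true, category: "dev-infrastructure" }, // already done
];

console.log(pickNextTask(backlog)?.id); // 02
```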
[57:27] And we then run some feedback loops. So
[57:30] let's just try this and
[57:31] see what happens. Good. It's created
[57:34] the issue files. We should be good to
[57:35] go. I'm going to cancel out of this,
[57:38] clear, and I'm going to run,
[57:40] where is it? Ralph once.sh. And you
[57:44] can feel free if you're following along
[57:46] to do the same thing.
[57:48] So we can see it's just running Claude
[57:50] inside here with the prompt and with all
[57:53] of the issues that have been passed in.
[57:56] And while it's doing its thing,
[57:59] you probably have some questions about
[58:01] this setup and about the decisions that
[58:03] I've made to essentially delegate all of
[58:06] my coding to AI, right? So, let's let's
[58:09] do a quick Q&A while it's getting its
[58:11] feet under it.
[58:14] Uh okay.
[58:17] I'm going to just
[58:19] remove those.
[58:23] How do you retain negative decisions,
[58:25] things that you decided against, and the
[58:26] rationale, when persisting the results from
[58:28] the grilling session? A great question.
[58:31] There's a very simple answer, which is that
[58:33] in the write-a-PRD skill there
[58:37] is a section at the bottom listing
[58:39] the things that are out of scope. So the
[58:41] things we're not going to tackle in this
[58:43] PRD which is very important for giving a
[58:45] definition of done.
[58:47] Feel free to ping on the slido if you've
[58:49] got any more questions.
[58:51] What's my front end workflow? Okay,
[58:53] that's a great question. I'm
[58:55] going to answer that in a minute, I think.
[58:58] How to deal with agents producing more
[59:00] code than we can review? How to properly
[59:02] parallelize and use multiple agents in a
[59:05] separate way? Okay, there are
[59:06] two questions there. Raise your hand
[59:10] if you feel like you're doing more code
[59:12] review now than you used to.
[59:16] Yeah, definitely. I don't think
[59:19] there's a way to avoid this.
[59:22] If we delegate all of our coding to
[59:25] agents,
[59:27] you'll notice that the implementation here
[59:29] is really the only AFK bit. We then also
[59:32] need to QA the work and code review the
[59:34] work, right? And if we are running these
[59:38] loops where it's essentially going to
[59:40] implement four issues in one go, it's hard
[59:42] to reconcile that with the dictum that you
[59:46] should keep pull requests small and
[59:47] self-contained, right? Small,
[59:50] self-contained pull requests mean
[59:52] you need to do fewer loops or
[59:55] shorter loops or something. Or maybe you
[59:57] do a big stack of PRs, but that
[59:59] seems horrible as well. That's still
[1:00:00] just more separated code to review. I
[1:00:03] honestly don't know what the answer to
[1:00:05] this is yet. I think we just need to be
[1:00:07] ready to be doing more code review,
[1:00:09] essentially, which is not fun. That's
[1:00:11] not a fun thing to say. I
[1:00:12] don't know. I don't feel good saying
[1:00:14] that, but I do think it's probably
[1:00:17] the way things are going. It's a great
[1:00:19] question.
[1:00:22] Uh,
[1:00:23] can we grab a couple of questions from
[1:00:25] the room as well? We won't do
[1:00:27] the mic, but raise your hand if
[1:00:28] you've got a question for me
[1:00:29] immediately.
[1:00:31] >> Yeah.
[1:00:32] >> So, the approach looks very linear, from
[1:00:34] an idea to QA.
[1:00:38] Of course, the real world is a lot more
[1:00:39] messy. You have all these ideas
[1:00:42] running in parallel, and
[1:00:46] while you're working on one thing, something else
[1:00:48] comes in.
[1:00:49] >> Yeah.
[1:00:50] >> How do you deal with the messiness? How
[1:00:51] do you feedback?
[1:00:53] >> Great question. So the question was if
[1:00:56] this all looks great if you're a solo
[1:00:57] developer, but actually how do you
[1:00:58] implement this in a team? How do you
[1:01:00] gather team feedback on this? And my
[1:01:03] answer to that is that if you have an
[1:01:04] idea up there, the sort
[1:01:08] of journey from the idea to the
[1:01:11] destination is something you need to
[1:01:13] figure out with the team, right? So all
[1:01:15] of this stuff up here, this is kind of
[1:01:17] like team stuff, you know what I mean?
[1:01:19] So if you have an idea and you do a
[1:01:22] grilling session on it and you have a
[1:01:23] question that you don't know how to
[1:01:24] answer, then you need to loop in your
[1:01:26] team as we described before. Then you
[1:01:28] might need to go, okay, we just need to
[1:01:30] build a prototype of this. We need to
[1:01:32] actually hash this out. We need
[1:01:33] something that the domain experts can
[1:01:35] fiddle with. Oh, okay. We might need to
[1:01:37] integrate a third-party library into
[1:01:39] this. We might need to do some research.
[1:01:41] We might need to actually kind of
[1:01:43] ping this back and forth and find a
[1:01:45] third-party service that we can get the
[1:01:47] most out of. We might need to go back
[1:01:48] with the information that we gathered
[1:01:50] there to the idea phase. So all the way
[1:01:53] up to the sort of PRD and the journey,
[1:01:55] that's something you need to involve
[1:01:56] your team with. That's something where
[1:01:58] these assets are going to be shared and
[1:02:00] argued over and you're going to have
[1:02:02] requests for comment on them and that
[1:02:04] loop is going to just keep grinding
[1:02:06] and grinding until you figure out where
[1:02:08] you're going. Once you figure out where
[1:02:10] you're going, then you can start doing
[1:02:12] the kanban board and the implementation.
[1:02:13] But this part is essentially super arguable,
[1:02:15] and you'll be bouncing back and
[1:02:17] forth between the phases. Does that make
[1:02:18] sense? Yeah.
[1:02:20] >> Would you not need a PRD for your
[1:02:22] prototype?
[1:02:23] >> Say it again. Sorry.
[1:02:23] >> Would you not want to have a PRD for your
[1:02:25] prototype? The question was, do you want
[1:02:27] to go through this whole session just to
[1:02:29] sort of create a prototype? Do you not
[1:02:30] need a PRD for your prototype as well?
[1:02:32] Let's just quickly talk about prototypes
[1:02:34] for a second. Um, there was a question
[1:02:36] about how do you make this work for
[1:02:37] front end?
[1:02:39] Like, how do you, because front end is
[1:02:41] really sensitive to human eyes. You
[1:02:43] need human eyes looking at the front end
[1:02:45] all the time to make sure that it looks
[1:02:47] good. AI doesn't really have any eyes.
[1:02:50] It can look at code, but front end is
[1:02:54] multimodal. And so, when
[1:02:57] trying to plug AI into, let's say,
[1:03:01] agent browser or Playwright MCP, you
[1:03:03] can give it tools to allow it to
[1:03:06] look through a front end and sort of
[1:03:07] look at images, but in my experience
[1:03:10] it's not very good at that yet, and it
[1:03:12] can't create a nice front end in a
[1:03:15] mature codebase. It can sort of spit one
[1:03:17] out. But what it can do is, you say, okay,
[1:03:20] I want some ideas on how this
[1:03:22] front end might look. Give me three
[1:03:24] prototypes that I can click between
[1:03:27] in a throwaway route, so that I
[1:03:30] can decide which one looks best. And you
[1:03:33] take the asset of that prototype and you
[1:03:34] then feed it back into the grilling
[1:03:36] session, or you get feedback on it, and so
[1:03:38] on. Does that answer your question?
[1:03:40] The prototype is just,
[1:03:42] you know, messy. It's supposed to give
[1:03:44] you feedback early on in the process. So
[1:03:46] that's a great way of working with front end
[1:03:47] code, and a great way of looking at
[1:03:48] software architecture in general. Let's
[1:03:50] take one more question. Yeah.
[1:03:51] >> Yes. In your system, how do you integrate
[1:03:54] respecting an architecture, a design, with
[1:03:57] API contracts, fitting into a larger
[1:04:00] system,
[1:04:02] security constraints? All kinds of
[1:04:03] constraints like that.
[1:04:04] >> Yeah, there's a lot in that question.
[1:04:07] The question was, how do you conform with
[1:04:09] existing architecture? How do you
[1:04:12] make it conform to the code
[1:04:13] standards of your codebase, or
[1:04:16] >> Yeah. Architecture, design, API, security
[1:04:19] rules that constrain your designs.
[1:04:22] >> Yeah.
[1:04:23] I'm going to answer that in a bit if
[1:04:25] that's okay. So hopefully we have
[1:04:28] started to get some stuff cooking.
[1:04:31] It's just pinging on the explore phase
[1:04:34] here.
[1:04:37] Tempted to just start running it AFK.
[1:04:40] Maybe I will, maybe I won't. What
[1:04:44] it's essentially doing is it's exploring
[1:04:46] the repo. It's going to then start
[1:04:47] implementing based on what we wanted.
[1:04:49] Let's actually have one more question
[1:04:50] just while it's running. Yeah.
[1:04:58] Yeah. So the question was why do you not
[1:05:02] get AI to QA?
[1:05:05] AI to QA. I just got jargon overload for
[1:05:08] a second. Why do you not get AI to
[1:05:11] test its own code? Now, of course, you
[1:05:14] absolutely can. And while it's
[1:05:16] cooking here, okay,
[1:05:18] it's got a clear picture of the
[1:05:20] codebase. It's assessing the issues.
[1:05:22] It's decided issue 02 is the next task.
[1:05:24] I'm again going to show you that in a
[1:05:26] bit. I think you
[1:05:28] definitely should do an automated review
[1:05:31] step as part of implementation. So you
[1:05:34] have your implementation. Then,
[1:05:35] because tokens are pretty cheap and
[1:05:37] AI is actually really good at reviewing
[1:05:39] stuff, you should get it to review its
[1:05:41] own code before you QA it. I've found
[1:05:43] that catches a ton of different
[1:05:45] bugs.
[1:05:48] The way that works, and I'll just do a
[1:05:50] little diagram, is this: if you have, let's say,
[1:05:53] an implementation that has
[1:05:54] used up a bunch of tokens in the smart
[1:05:56] zone, and you get it to try to do
[1:06:00] its reviewing, it's going to be doing
[1:06:02] the reviewing in the dumb zone. And so
[1:06:05] the reviewer will be dumber than the
[1:06:07] thing that actually implemented it. If
[1:06:08] we imagine this is the, let's be
[1:06:11] consistent, that's the review. That's
[1:06:14] the implementation.
[1:06:15] Whereas if you clear the context,
[1:06:19] then you're
[1:06:22] able to review in the smart zone, which
[1:06:24] is where you want to be.
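The smart-zone argument for reviewing in a cleared context can be illustrated by comparing what each reviewer actually receives. This is a hypothetical sketch: the `Message` shape, the rough 4-characters-per-token estimate and the function names are assumptions, and no real model API is called. The fresh reviewer also gets the coding standards pushed to it, in line with the "push standards to the reviewer" idea.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Very rough token estimate (about 4 characters per token for English text).
// Illustrative only; real tokenizers differ.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((total, m) => total + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Review in the same conversation: the request lands after everything the
// implementer already read and wrote, i.e. deep into the context window.
function sameContextReview(history: Message[], diff: string): Message[] {
  return [...history, { role: "user", content: `Review this diff:\n${diff}` }];
}

// Review in a cleared context: only the coding standards (pushed to the
// reviewer) and the diff, so the review starts near the beginning of the window.
function freshContextReview(standards: string, diff: string): Message[] {
  return [
    { role: "system", content: `Review the diff against these standards:\n${standards}` },
    { role: "user", content: `Review this diff:\n${diff}` },
  ];
}

// Hypothetical implementation transcript: lots of exploration and tool output.
const history: Message[] = Array.from({ length: 50 }, (_, i): Message => ({
  role: i % 2 === 0 ? "assistant" : "user",
  content: "exploring files, running tests, reading output... ".repeat(40),
}));
const diff = "+ export function levelForPoints(points: number): number { /* ... */ }";
const standards = "Prefer deep modules; test at the module boundary.";

console.log("same context:", estimateTokens(sameContextReview(history, diff)));
console.log("fresh context:", estimateTokens(freshContextReview(standards, diff)));
```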
[1:06:27] Let's see how our implementation is
[1:06:28] doing. Okay, good. It's generating a
[1:06:31] migration. That looks pretty nice. We're
[1:06:33] getting some code spitting out.
[1:06:37] And while I'm sort of like, aha, here we
[1:06:41] go. TDD.
[1:06:43] Let's talk about TDD, and then I think
[1:06:45] we'll have
[1:06:47] another little break. TDD, I've found, is absolutely
[1:06:50] essential for getting the most out of
[1:06:52] agents. Raise your hand if you
[1:06:54] know what TDD is. Cool. Okay. TDD is
[1:06:58] test-driven development. What it's
[1:06:59] essentially doing is
[1:07:02] something called red-green-refactor. And
[1:07:04] if you look in the codebase, you'll be
[1:07:05] able to find a skill which
[1:07:09] describes how to do red-green-refactor
[1:07:11] and teaches the AI how to do it. So what
[1:07:14] it's doing is it's writing a failing
[1:07:16] test first. So it's saying, okay, I've
[1:07:19] broken down the idea of what I'm doing
[1:07:21] and I'm just going to write a single
[1:07:23] test that fails and then I need to make
[1:07:29] the implementation pass. I have found,
[1:07:31] first of all, that this adds tests to
[1:07:33] the codebase, and it tends to add
[1:07:33] good tests to the codebase. And so we've
[1:07:36] got this kind of gamification service.
[1:07:38] It looks like it's using some existing
[1:07:41] stuff to create a test database. Test
[1:07:43] fails because the module doesn't exist
[1:07:44] yet. Okay, we've confirmed red. And then
[1:07:47] it goes and hopefully runs it and it
[1:07:50] passes. I've found that, well, raise your hand
[1:07:54] if you've ever had AI write bad tests.
[1:07:58] Yeah, it tends to try to cheat at the
[1:08:00] tests because it's sort of doing it in
[1:08:03] layers. It will do the entire
[1:08:04] implementation and then it will do the
[1:08:06] entire test layer just below it. I'm
[1:08:09] just going to say yes, you're allowed to
[1:08:10] use npx vitest. And using this technique,
[1:08:14] it's generally a lot harder to cheat,
[1:08:18] because it's sort of instrumenting the
[1:08:21] code before it then writes the code.
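A minimal red-green illustration in TypeScript. In the workshop repo this would be a Vitest test run with `npx vitest`; here a tiny hand-rolled assertion keeps the example dependency-free. `levelForPoints` and the threshold values are hypothetical, loosely inspired by the gamification feature being built.

```typescript
// RED: the test is written first. Before levelForPoints existed, running this
// file would fail, which is the "confirmed red" step.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

function testLevelForPoints(): void {
  expectEqual(levelForPoints(0), 1, "0 points is level 1");
  expectEqual(levelForPoints(99), 1, "99 points is still level 1");
  expectEqual(levelForPoints(100), 2, "100 points reaches level 2");
  expectEqual(levelForPoints(1000), 4, "1000 points reaches level 4");
}

// GREEN: the smallest implementation that makes the test pass.
// Hypothetical thresholds: minimum points needed for levels 1..4.
const LEVEL_THRESHOLDS = [0, 100, 250, 1000];

function levelForPoints(points: number): number {
  let level = 1;
  for (let i = 0; i < LEVEL_THRESHOLDS.length; i++) {
    if (points >= LEVEL_THRESHOLDS[i]) level = i + 1;
  }
  return level;
}

// REFACTOR would come next, re-running the same test after each change.
testLevelForPoints();
```

Because the expected behaviour is pinned down before the implementation exists, the agent cannot simply write tests that mirror whatever code it already produced.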
[1:08:24] So I find that TDD is so, so good for
[1:08:27] places where you can pull it off. And in
[1:08:29] fact, it's so good that I sort of warp
[1:08:31] my whole technique around getting TDD
[1:08:34] to work better. I can see some drooping
[1:08:36] eyes. It is so hot in here. You can
[1:08:39] imagine how hot it is up here. Let's
[1:08:40] take another five-minute comfort break.
[1:08:41] Let's come back at quarter to two. I think
[1:08:46] we'll have a nice generous one, and we'll be
[1:08:48] back in about six, seven minutes, and
[1:08:50] I'll talk about how I think about
[1:08:53] modules, how I think about constructing a
[1:08:55] codebase to make this possible. I've
[1:08:57] just been fiddling with the AI
[1:08:59] here, and we've ended up with a
[1:09:01] commit. So we have something to test.
[1:09:04] Issue number two is complete. Here's
[1:09:06] what was done. This is kind of what it
[1:09:08] looks like when a Ralph loop completes:
[1:09:10] you end up with a little summary.
[1:09:12] And we now have something we can QA.
[1:09:15] Because we did the feedback loops,
[1:09:17] because we did the tracer bullets,
[1:09:18] because we said, okay, give us
[1:09:21] something reviewable at the end of this,
[1:09:22] we can immediately go and QA it. Now,
[1:09:24] there's nothing less exciting than
[1:09:26] watching someone else QA something, but
[1:09:29] hopefully we can have a little play.
[1:09:31] Let's just check that it works at
[1:09:33] all. In fact, before I go there, I just
[1:09:36] want to sort of work through what just
[1:09:38] happened, which is we see that it's
[1:09:41] created some stuff on the dashboard
[1:09:45] and it then ran the feedback loops. So,
[1:09:47] it then ran the tests and the types.
[1:09:51] Now, TDD is obviously really important,
[1:09:53] and it's really important because these
[1:09:55] feedback loops are essential
[1:09:59] to get AI to produce anything
[1:10:01] reasonable. Without them, AI is
[1:10:04] totally coding blind, right? If
[1:10:07] your codebase doesn't
[1:10:10] have feedback loops, you're never, ever,
[1:10:13] ever going to get decent
[1:10:15] output out of AI. And often what you'll
[1:10:18] find is that the quality of your
[1:10:20] feedback loops determines how well
[1:10:23] your AI can code. Essentially, that is
[1:10:24] the ceiling. So, if you're getting bad
[1:10:27] outputs from your AI, you often need to
[1:10:29] increase the quality of your feedback
[1:10:31] loops. We'll talk about how to do that
[1:10:33] in a minute.
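One way to wire feedback loops back to an agent is to run every check and turn the failures into text for its next attempt. This is a sketch with hypothetical names (`FeedbackLoop`, `runFeedbackLoops`); the checks are stubbed rather than shelling out to `npm run test` or a type-check script, and running all checks instead of stopping at the first failure is a design choice, not something the talk specifies.

```typescript
interface CheckResult {
  ok: boolean;
  output: string;
}

// A feedback loop such as the test suite or the type check. `run` is injected
// so the sketch stays independent of how commands are actually executed.
interface FeedbackLoop {
  name: string;
  run: () => CheckResult;
}

// Run every loop so the agent sees all of the problems at once, and turn the
// failures into text that can be fed straight back to it.
function runFeedbackLoops(loops: FeedbackLoop[]): { passed: boolean; feedback: string } {
  const failures = loops
    .map((loop) => ({ name: loop.name, result: loop.run() }))
    .filter(({ result }) => !result.ok)
    .map(({ name, result }) => `${name} failed:\n${result.output}`);
  return failures.length === 0
    ? { passed: true, feedback: "All feedback loops passed." }
    : { passed: false, feedback: failures.join("\n\n") };
}

// Stubbed loops mirroring the demo run: tests pass, the type check fails once.
const loops: FeedbackLoop[] = [
  { name: "test", run: () => ({ ok: true, output: "all tests passed" }) },
  {
    name: "typecheck",
    run: () => ({ ok: false, output: "error TS2322: Type 'string' is not assignable to type 'number'." }),
  },
];

const { passed, feedback } = runFeedbackLoops(loops);
console.log(passed); // false
console.log(feedback);
```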
[1:10:35] Now, it ran npm run test and npm run
[1:10:39] typecheck. It got one type error, and it
[1:10:41] needed to fix it with a nice bit of
[1:10:43] TypeScript magic. Very good. Yeah, typeof
[1:10:46] LEVEL_THRESHOLDS[number]. Okay.
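The transcript only hints at the type fix (something like `typeof LEVEL_THRESHOLDS[number]`), so this is a guess at the general pattern rather than the actual change the agent made. With an `as const` array, `(typeof LEVEL_THRESHOLDS)[number]` is the union of its element types; the values and helper functions below are hypothetical.

```typescript
// `as const` keeps the literal values, so the element type is the union
// 0 | 100 | 250 | 1000 instead of plain number. Values are hypothetical.
const LEVEL_THRESHOLDS = [0, 100, 250, 1000] as const;

// Indexed access type: the union of every element of the tuple.
type LevelThreshold = (typeof LEVEL_THRESHOLDS)[number];

// Only the known thresholds are accepted; describeThreshold(42) would not compile.
function describeThreshold(threshold: LevelThreshold): string {
  return `Level unlocked at ${threshold} points`;
}

// Type guard that narrows an arbitrary number to LevelThreshold at runtime.
function isLevelThreshold(value: number): value is LevelThreshold {
  return (LEVEL_THRESHOLDS as readonly number[]).indexOf(value) !== -1;
}

const points: number = 250;
if (isLevelThreshold(points)) {
  console.log(describeThreshold(points)); // Level unlocked at 250 points
}
```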
[1:10:49] You see why I stopped teaching
[1:10:50] TypeScript: AI just knows
[1:10:51] everything now.
[1:10:54] So it ran the tests, and they passed,
[1:10:57] and it's looking good. So we now end up
[1:10:58] with 284 tests in this repo. Pretty
[1:11:01] good.
[1:11:03] I do find front end really hard to
[1:11:06] test. Here we're essentially just
[1:11:07] testing the service. So we've created a
[1:11:10] gamification service, if we look up here,
[1:11:13] and then we have a test for that
[1:11:14] service. You can see the service and
[1:11:16] the test itself. Now, if I was doing
[1:11:18] code review here, I would
[1:11:20] first review the tests,
[1:11:22] make sure the tests were testing
[1:11:24] reasonable things, and then go and
[1:11:27] review the code itself just to make
[1:11:29] sure that it's not doing anything
[1:11:31] too crazy, right? The essential thing is
[1:11:33] I need to actually look at the
[1:11:35] dashboard. I'm going to log in as a
[1:11:39] student. Oh, if it'll let me. Maybe it
[1:11:41] won't let me. Come on, son. There we go.
[1:11:44] Let's log in as Emma Wilson. Head into
[1:11:47] courses.
[1:11:49] Uh, let's say I've got an introduction
[1:11:50] to TypeScript.
[1:11:52] Continue learning.
[1:11:54] Uh, yes, I completed this lesson.
[1:11:57] Something went wrong. I imagine it's
[1:11:59] because I don't have...
[1:12:02] SQLite error. I don't have the right
[1:12:05] table. So, I need a table, point events.
[1:12:08] Point events is a strange table name.
[1:12:09] I'm not sure quite what it was thinking
[1:12:10] there. Let's suspend. Let's run the
[1:12:14] npm DB migrate or push script, I think.
[1:12:19] Can't remember which one it was, but you
[1:12:22] kind of get the idea, right? I I'm not
[1:12:23] going to subject you to uh watching me
[1:12:25] do QA, because it's so dull. But at
[1:12:28] this point, I would essentially go back
[1:12:30] in. I would, let me open the project
[1:12:33] back up.
[1:12:35] And this is a crucial
[1:12:38] moment. It's so important to
[1:12:42] QA it manually here, because QA... Oh dear.
[1:12:45] Oh dear. What's going wrong? There we
[1:12:46] go. QA is how I
[1:12:51] impose my opinions back onto the codebase, how
[1:12:54] I impose my taste. What you'll often
[1:12:56] find is that there are teams out
[1:12:59] there who are trying to automate
[1:13:00] every part of this
[1:13:02] process. If
[1:13:06] you try to automate the
[1:13:08] creation of the idea, automate the
[1:13:11] QA, automate the research, automate the
[1:13:13] prototype, you end up with apps that
[1:13:16] I feel just lack taste and are bad.
[1:13:22] Maybe they just don't work, or they
[1:13:24] don't even work as intended or there's
[1:13:26] just no AI. You need a human touch when
[1:13:28] you're building this stuff because
[1:13:29] without that you just end up with slop
[1:13:32] and we are not producing slop here.
[1:13:33] We're trying to produce high quality
[1:13:34] stuff and so that's what the QA is for.
[1:13:39] So I'm going to do two things in this
[1:13:42] final section. I'm going to
[1:13:44] first answer
[1:13:46] a question that's probably in your mind
[1:13:48] here, which is: let's say I have a
[1:13:50] codebase that I'm working on and it's a
[1:13:53] bad codebase. It's a codebase that's
[1:13:55] really complicated, that AI just
[1:13:58] never does good work in, and maybe
[1:14:00] most humans that go into that
[1:14:01] codebase don't do good work in either. How
[1:14:04] do I improve that codebase? And the
[1:14:07] second thing is I'll show you my setup
[1:14:08] for parallelization.
[1:14:10] So let's go with bad code first.
[1:14:14] Now where is it? Where's the diagram?
[1:14:17] Here it is.
[1:14:19] In his book A Philosophy of
[1:14:22] Software Design, John Ousterhout talks
[1:14:24] about
[1:14:25] the ideal type of module.
[1:14:29] And let's imagine that you have a
[1:14:30] codebase that looks like this. Each of
[1:14:32] these blocks here is an individual
[1:14:34] file. And these files export things
[1:14:37] from them. You know, they have things
[1:14:39] that you pull from the files that you
[1:14:41] then use in other things. And so you
[1:14:42] might have these weird dependencies
[1:14:44] where this file over here might rely on
[1:14:46] this file or might rely on that file, for
[1:14:48] instance. Now, if these files are small
[1:14:51] and they don't really
[1:14:54] export many things, then John would call these
[1:14:57] shallow modules, essentially.
[1:14:59] They kind of look
[1:15:02] like this. Actually, no, I
[1:15:06] can't make a good diagram of it. They're
[1:15:08] essentially lots and lots of small
[1:15:09] chunks. Now, this is hard for the AI to
[1:15:12] navigate because it doesn't really
[1:15:14] understand the dependencies between
[1:15:15] everything. It can't work out where
[1:15:16] everything is. You know it has to sort
[1:15:18] of manually track through the entire
[1:15:20] graph and go okay this relies on this
[1:15:22] one relies on this one. This one relies
[1:15:24] on this one.
[1:15:26] And it's then also hard to test this as
[1:15:28] well, because where do you draw your test
[1:15:29] boundaries here? Do you test each module
[1:15:32] individually?
[1:15:35] Like, literally draw a test
[1:15:36] boundary (no, don't do that) around this
[1:15:39] one, and then maybe another test boundary
[1:15:42] around the next one, and then the next
[1:15:43] one?
[1:15:46] Or should you do big groups of
[1:15:48] it? Should you say, okay, we're going to
[1:15:50] test all of these related modules
[1:15:51] together and just, you know,
[1:15:53] hope and pray that they work?
[1:15:57] Now, I think that bad
[1:16:01] tests mostly look like that, where the AI
[1:16:04] essentially tries to wrap every
[1:16:06] tiny function in its own test boundary
[1:16:09] and then just tests that they
[1:16:11] individually work. But what that means is
[1:16:13] that when, let's say, this module
[1:16:16] over here calls those two, so it depends
[1:16:19] on both of them, this module might
[1:16:22] misorder the function calls, or there might be
[1:16:24] stuff inside that module
[1:16:27] that's worth testing on its own. And if
[1:16:29] you then wrap this in a test boundary,
[1:16:31] what do you do? Do you mock the other
[1:16:32] two modules? How does that work?
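A small illustration of the shallow-module testing problem: each tiny function passes its own isolated test, but the caller combines them in the wrong order. All names and rules (a 7-day streak doubling points, a daily cap of 100) are hypothetical.

```typescript
// Three shallow "modules", each exporting one tiny function.
function basePoints(lessonsCompleted: number): number {
  return lessonsCompleted * 10;
}
function applyStreakBonus(points: number, streakDays: number): number {
  return streakDays >= 7 ? points * 2 : points;
}
function capDaily(points: number, cap = 100): number {
  return Math.min(points, cap);
}

// The caller: the intended rule is "bonus first, then cap", but it applies
// the cap first, so a 7-day streak can push the total past the cap.
function dailyPoints(lessonsCompleted: number, streakDays: number): number {
  return applyStreakBonus(capDaily(basePoints(lessonsCompleted)), streakDays);
}

// Per-function tests, each in its own boundary: all of them pass...
console.assert(basePoints(3) === 30);
console.assert(applyStreakBonus(30, 7) === 60);
console.assert(capDaily(150) === 100);

// ...but the composed behaviour breaks the cap, which only a test at the
// caller's boundary would catch.
console.log(dailyPoints(12, 7)); // 200, although the daily cap is 100
```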
[1:16:37] So actually figuring out how to build
[1:16:40] a codebase that is easy to test is
[1:16:44] essential here, because if our codebase
[1:16:46] is easy to test, then our
[1:16:48] feedback loops are going to be better
[1:16:50] and the AI is going to do better work in
[1:16:52] our codebase. Does that make sense? So
[1:16:54] what does a good codebase
[1:16:56] look like? Well, not like that. It looks
[1:17:00] like this,
[1:17:02] where you have
[1:17:04] what John Ousterhout calls deep modules:
[1:17:07] modules that expose a small, simple
[1:17:11] interface but have a lot of
[1:17:13] functionality inside them. Now,
[1:17:18] what this means is that these are easy
[1:17:20] to test. Let's say that
[1:17:22] there's a dependency between this one
[1:17:23] and this one. Is my arrow working? Yeah,
[1:17:26] there we go.
[1:17:29] Then
[1:17:30] what you do is you just wrap a big test
[1:17:32] boundary around that one module,
[1:17:34] this one up here. And you're going to
[1:17:36] catch a lot of good stuff,
[1:17:40] because there's lots of functionality
[1:17:42] that you're testing, and the
[1:17:44] caller, the person calling the module, is
[1:17:46] going to have a simple interface to work
[1:17:47] from. So it's not too tricky. Does that
[1:17:50] make sense? Deep modules versus shallow
[1:17:52] modules. This deep version is good. This shallow
[1:17:55] version is bad. And what I find is that,
[1:17:58] unaided,
[1:18:00] if you don't
[1:18:03] watch AI carefully, it's
[1:18:06] going to produce a codebase that looks
[1:18:07] like the shallow one. So you need to be really,
[1:18:09] really careful when you're directing it.
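By contrast, a deep module puts the same kind of rules behind a small interface, so one test boundary around the module checks how they combine. This is a hypothetical TypeScript sketch: `createGamificationService`, its methods and its numbers are illustrative, not the service generated in the demo, and per-day resets are omitted for brevity.

```typescript
// Small public interface: the only thing callers (and tests) see.
interface GamificationService {
  recordLessonCompleted(userId: string, streakDays: number): void;
  getSummary(userId: string): { points: number; level: number };
}

// Deep module: storage, streak bonus, daily cap and levelling all live behind
// the two methods above. Rules and numbers are hypothetical.
function createGamificationService(dailyCap = 100): GamificationService {
  const LEVEL_THRESHOLDS = [0, 100, 250, 1000];
  const pointsToday = new Map<string, number>();
  const totals = new Map<string, number>();

  function levelFor(points: number): number {
    let level = 1;
    LEVEL_THRESHOLDS.forEach((min, i) => {
      if (points >= min) level = i + 1;
    });
    return level;
  }

  return {
    recordLessonCompleted(userId, streakDays) {
      const earned = streakDays >= 7 ? 20 : 10; // bonus first...
      const today = pointsToday.get(userId) ?? 0;
      const awarded = Math.max(0, Math.min(earned, dailyCap - today)); // ...then cap
      pointsToday.set(userId, today + awarded);
      totals.set(userId, (totals.get(userId) ?? 0) + awarded);
    },
    getSummary(userId) {
      const points = totals.get(userId) ?? 0;
      return { points, level: levelFor(points) };
    },
  };
}

// One test boundary around the whole module exercises the rules together.
const service = createGamificationService();
for (let i = 0; i < 12; i++) service.recordLessonCompleted("emma", 7);
console.log(JSON.stringify(service.getSummary("emma"))); // {"points":100,"level":2}
```

Callers never see the maps or the threshold table, so the inside can be reorganised freely without touching the test or the callers.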
[1:18:11] And that's also why, if we look
[1:18:13] inside the PRD,
[1:18:16] where has the PRD gone? It's inside the
[1:18:18] issues. It's inside the gamification
[1:18:20] system. "Not found." Of course it's
[1:18:23] not. Here it is.
[1:18:25] Then I have,
[1:18:27] inside here, the data model and the modules.
[1:18:32] So it's specifically saying, okay, this
[1:18:34] gamification service is a new deep
[1:18:36] module, which we're going to test around.
[1:18:38] It's going to have this particular
[1:18:41] interface, and okay,
[1:18:44] we're modifying the progress service
[1:18:46] too. We're modifying the lesson route,
[1:18:48] modifying the dashboard routes, etc. So
[1:18:50] I'm being really specific about the
[1:18:52] modules that I'm editing, and I'm making
[1:18:54] sure that I keep that module map in my
[1:18:56] mind at all times throughout the
[1:18:58] planning and then throughout the
[1:18:59] implementation. Does that make sense? Very,
[1:19:02] very useful. It's useful for one other
[1:19:04] reason, too. Not only does it make your
[1:19:05] app more testable, but you get to do a
[1:19:08] little mental trick.
[1:19:11] And I'm going to refill my water while
[1:19:13] you wait for what that is.
[1:19:17] Let me
[1:19:20] get a question from you guys. So,
[1:19:21] raise your hand
[1:19:26] if you feel like you're working
[1:19:28] harder than ever before with AI.
[1:19:32] Yeah. Uh, raise your hands if you feel
[1:19:35] like you know your codebase less well
[1:19:38] than you used to.
[1:19:40] Yeah.
[1:19:43] This is a real thing. Because we're
[1:19:45] moving fast, because we're delegating
[1:19:47] more things, we end up losing a sense of
[1:19:50] our codebase. And if we lose the sense
[1:19:52] of our codebase, we're not going to be
[1:19:55] able to improve it. We're
[1:19:56] essentially delegating its shape
[1:19:57] to AI. I don't think that's good. But
[1:20:00] then
[1:20:03] how do we make it so that we can move
[1:20:04] fast while still keeping enough space in
[1:20:06] our brains? I think this is a way
[1:20:09] to do it, because what you're doing here
[1:20:12] is thinking about
[1:20:14] creating big shapes in your codebase:
[1:20:16] big services.
[1:20:19] What I think you should do is design the
[1:20:22] interface for these modules, but then
[1:20:24] delegate the implementation.
[1:20:27] In other words, these modules can become
[1:20:29] gray boxes where you just need to
[1:20:31] know their shape. You need to know
[1:20:33] what they do and roughly how they
[1:20:34] behave, but you can delegate the
[1:20:36] implementation of those modules. I've found
[1:20:38] this is really nice. I don't necessarily
[1:20:40] need to code review everything inside that
[1:20:42] module. I don't necessarily need to know
[1:20:44] everything it's doing. I just
[1:20:46] need to know that it behaves a certain
[1:20:47] way under certain conditions and that it
[1:20:49] does its thing. So it's kind of like:
[1:20:52] okay, I've got a big overview of my
[1:20:54] codebase, I understand the
[1:20:55] shapes inside it and what the
[1:20:57] interfaces all do, but I can delegate
[1:21:00] what's inside. That has been a
[1:21:03] really nice way to retain my sense of
[1:21:05] the codebase while preserving my sanity.
[1:21:08] Make sense?
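One way to picture the gray-box idea in code, as a hedged sketch: you design and own the interface, delegate the implementation, and check behaviour only through the interface. `GamificationService`, its two methods and the "10 points per distinct lesson" rule are illustrative assumptions, not the interface from the PRD shown on screen.

```typescript
// The part you design and keep in your head: the interface (hypothetical).
interface GamificationService {
  recordLessonCompleted(userId: string, lessonId: string): void;
  getPoints(userId: string): number;
}

// The part you can delegate: any implementation that honours the interface.
class InMemoryGamificationService implements GamificationService {
  private completed = new Map<string, Set<string>>();

  recordLessonCompleted(userId: string, lessonId: string): void {
    const lessons = this.completed.get(userId) ?? new Set<string>();
    lessons.add(lessonId); // a Set ignores repeat completions of the same lesson
    this.completed.set(userId, lessons);
  }

  getPoints(userId: string): number {
    return (this.completed.get(userId)?.size ?? 0) * 10;
  }
}

// Behavioural checks written only against the interface,
// so they still apply if the internals are rewritten.
function checkGamification(service: GamificationService): void {
  service.recordLessonCompleted("u1", "lesson-1");
  service.recordLessonCompleted("u1", "lesson-1"); // repeat completion: no extra points
  service.recordLessonCompleted("u1", "lesson-2");
  if (service.getPoints("u1") !== 20) throw new Error("expected 20 points for two distinct lessons");
  if (service.getPoints("u2") !== 0) throw new Error("unknown user should have 0 points");
}

checkGamification(new InMemoryGamificationService());
```

Because `checkGamification` only talks to the interface, an agent could replace `InMemoryGamificationService` with a database-backed class and the same checks would still describe the behaviour you care about.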
[1:21:12] And so you might ask: how do I take a
[1:21:14] codebase that looks like this and
[1:21:18] turn it into a codebase that looks like
[1:21:20] this? How do I deepen the modules? Well,
[1:21:23] hopefully it's in here. Pretty
[1:21:25] sure it is. We have a skill, and that
[1:21:28] skill is called "improve codebase
[1:21:30] architecture."
[1:21:32] Nice and direct.
[1:21:35] Let's run it. What this skill is
[1:21:38] going to do is
[1:21:39] scan our codebase and
[1:21:41] look at what's there. And
[1:21:43] feel free to run this yourself if you're
[1:21:44] running the exercises. It's
[1:21:49] exploring the architecture, exploring
[1:21:52] how to work within this
[1:21:53] codebase, and it's going to attempt to
[1:21:57] find places to deepen the modules.
[1:22:00] Pretty simple. One really cool
[1:22:04] thing it found: part
[1:22:07] of my course video manager app is a
[1:22:09] video editor, a video editor built in
[1:22:11] the browser, which is really hardcore.
[1:22:13] It's a decent bit of engineering. And
[1:22:16] I wanted a way to wrap the
[1:22:18] entire front end all the way to the back
[1:22:21] end in a single big module so that
[1:22:23] I could test that when I press
[1:22:25] something on the front end, it goes
[1:22:26] all the way to the back end. And so,
[1:22:28] essentially by using a kind
[1:22:30] of discriminated union between the two
[1:22:32] types here, I was able to use
[1:22:35] this skill to get one huge,
[1:22:39] great big module that was
[1:22:41] testable from the
[1:22:43] outside: this video editor
[1:22:45] infrastructure. And it meant that AI
[1:22:47] could see the entire flow, act on
[1:22:49] the entire flow, and test
[1:22:51] the entire flow. And honestly, it was night
[1:22:53] and day in terms of AI's ability
[1:22:56] to actually make changes, because AI
[1:22:57] working on a video editor is pretty
[1:22:59] brutal if you don't give it good tests.
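The talk doesn't show the video editor's code, so the following is only a sketch of the general pattern, with hypothetical action names (`addClip`, `removeClip`, `trimClip`) and a hypothetical `VideoEditorBackend`: one discriminated union describes every message the front end can send, and a single back-end entry point handles it exhaustively. A test can replay a sequence of "button presses" and assert on the end result, exercising the whole flow through one interface.

```typescript
// Hypothetical editor actions: one union shared by the front end and the back end.
type EditorAction =
  | { type: "addClip"; clipId: string; durationMs: number }
  | { type: "removeClip"; clipId: string }
  | { type: "trimClip"; clipId: string; newDurationMs: number };

interface Clip {
  id: string;
  durationMs: number;
}

// Stand-in for the back end: the single entry point a UI button press would reach.
class VideoEditorBackend {
  private clips: Clip[] = [];

  dispatch(action: EditorAction): void {
    switch (action.type) {
      case "addClip":
        this.clips.push({ id: action.clipId, durationMs: action.durationMs });
        break;
      case "removeClip":
        this.clips = this.clips.filter((c) => c.id !== action.clipId);
        break;
      case "trimClip": {
        const { clipId, newDurationMs } = action;
        // Trimming can only shorten a clip, never lengthen it.
        this.clips = this.clips.map((c) =>
          c.id === clipId ? { ...c, durationMs: Math.min(c.durationMs, newDurationMs) } : c,
        );
        break;
      }
      default: {
        // Exhaustiveness check: adding a new action type without handling it is a compile error.
        const unhandled: never = action;
        throw new Error(`Unhandled action: ${JSON.stringify(unhandled)}`);
      }
    }
  }

  totalDurationMs(): number {
    return this.clips.reduce((sum, c) => sum + c.durationMs, 0);
  }
}

// A test replays what the UI would send and asserts on the end-to-end result.
const editor = new VideoEditorBackend();
editor.dispatch({ type: "addClip", clipId: "a", durationMs: 5000 });
editor.dispatch({ type: "addClip", clipId: "b", durationMs: 3000 });
editor.dispatch({ type: "trimClip", clipId: "a", newDurationMs: 2000 });
editor.dispatch({ type: "removeClip", clipId: "b" });
console.log(editor.totalDurationMs()); // 2000
```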
[1:23:01] So honestly, if you take one
[1:23:04] thing away from today, just try running
[1:23:06] this skill on your repo and see what
[1:23:08] happens. Let's go to Slido and
[1:23:11] check a couple of questions
[1:23:13] while this is running.
[1:23:15] So let's see. "Have you tried Claude's
[1:23:17] auto mode, with claude enable auto mode?
[1:23:19] That way you can avoid many of the
[1:23:20] obvious permission checks." We'll talk
[1:23:21] about permission checks in a second. "Do
[1:23:24] you keep the markdown plans and issues for
[1:23:27] later reference?"
[1:23:29] Okay, this is a great question. So
[1:23:34] let's say that you have a great idea,
[1:23:38] you turn it into a PRD,
[1:23:40] and you then implement that PRD,
[1:23:43] and the PRD is essentially done. Raise
[1:23:45] your hand if you keep that information
[1:23:48] in the repo, as a
[1:23:49] markdown file. Raise your hand if you
[1:23:51] want to keep that around.
[1:23:53] Cool. Okay. And raise your hand
[1:23:55] if you don't want to keep it around, if
[1:23:57] you want to get rid of it as soon as
[1:23:58] possible. Yeah. I think this is
[1:24:02] a question that doesn't have a clear
[1:24:03] answer. What I'm really scared of
[1:24:08] with any documentation decision is this:
[1:24:11] let's say that we have a PRD for this
[1:24:13] gamification system. We keep it in the
[1:24:14] repo. We go on and on. Let's say
[1:24:17] a month later, we want some edits to the
[1:24:19] gamification system. We go in with
[1:24:22] Claude, it finds this old PRD and
[1:24:24] says, "Yes, I found the original
[1:24:26] documentation for the gamification system." Well,
[1:24:28] it turns out that the actual code has
[1:24:30] changed so much from the original PRD
[1:24:32] that it's almost unrecognizable. The
[1:24:33] names of things have changed. The
[1:24:35] file structure has changed. Even the
[1:24:37] requirements may have changed. We might
[1:24:38] have actually tested it with users. This
[1:24:40] is doc rot, where the documentation for
[1:24:43] something is rotting away in your repo
[1:24:46] and influencing Claude, or Claude
[1:24:49] agents, badly. So I tend not to keep it
[1:24:53] around. I tend to get rid of it. And
[1:24:55] because my setup uses GitHub issues,
[1:24:58] I just mark it as closed. The agent can fetch
[1:25:00] it if it wants to, but it's got a visual
[1:25:01] indicator that it's done. So I tend to
[1:25:03] prefer ditching these.
[1:25:07] "Thoughts on the Beads framework from
[1:25:08] Steve?" I've not tested it, but it
[1:25:11] seems like another way to
[1:25:13] manage kanban boards and issues. Seems
[1:25:15] very good, but I've not tried it.
[1:25:22] Let me just quickly check the
[1:25:24] setup here. Let's take a couple of
[1:25:27] questions from the room. Anybody got any
[1:25:29] questions at this point about anything
[1:25:30] that we've covered so far, especially
[1:25:31] this last bit? Yes.
[1:25:40] like code. How about migrations? Like,
[1:25:43] with migration files, we can also squash
[1:25:45] them.
[1:25:47] >> Like database migrations?
[1:25:49] >> Yeah.
[1:25:51] >> I don't know.
[1:25:53] >> I hope that answers your question. I'm
[1:25:54] so sorry. No, no, I think database
[1:25:56] migrations are a different thing, because
[1:25:57] you have a running record of
[1:25:59] exactly what changed, and it's more
[1:26:01] deterministic. And I think...
[1:26:04] yeah, it's an interesting analogy. I'm
[1:26:06] not sure. Let's talk about it
[1:26:07] afterwards.
[1:26:08] That's a good way of saying I have no
[1:26:10] idea.
[1:26:11] >> Yeah. Yeah.
[1:26:16] >> Sorry, guys. I'm just trying to listen
[1:26:18] to this guy's question.
[1:26:30] >> Yeah. The question here is:
[1:26:33] should I, in the early
[1:26:37] planning stage, be trying to optimize the
[1:26:39] plan? This is something I actually see a
[1:26:41] lot of people doing, and it's a really
[1:26:43] good idea. So when you...
[1:26:49] let's go back to the phases. So let's
[1:26:51] say that you have all of these phases
[1:26:53] here,
[1:26:55] and you get to the point where
[1:26:58] you've figured out everything
[1:26:59] with the LLM. You understand where
[1:27:01] you're going. You've created this
[1:27:02] journey/destination document here.
[1:27:05] Should you then
[1:27:09] try to optimize and optimize and
[1:27:10] optimize that PRD until it's the perfect
[1:27:12] PRD you can possibly imagine? I don't
[1:27:15] think there's a lot of value in that,
[1:27:18] because I think the journey is really
[1:27:20] just a hint of where you want to
[1:27:22] go, and the place where you need to be
[1:27:24] putting the work is QA. You can
[1:27:27] sort of do that AFK, I suppose, but in my
[1:27:29] experience you're not going to get a lot
[1:27:30] of juice out of it. The
[1:27:33] thing that really matters is getting
[1:27:34] alignment with the AI, which you do in
[1:27:37] the grilling session initially.
[1:27:40] Let's have one more question. You got
[1:27:41] any more? Yeah. How, in your
[1:27:44] workflow, do you get it to code the way you
[1:27:46] want it to code? So that by the time you get
[1:27:48] to code review, it's at least familiar and
[1:27:50] uses the libraries you wanted to use.
[1:27:52] >> Yeah. We had this question before,
[1:27:54] actually, which was: how do you
[1:27:56] enforce your coding standards on the
[1:27:59] agent? Essentially, how do you get it to
[1:28:00] code how you want it to code? Now,
[1:28:03] there are essentially two different ways
[1:28:04] of doing it. You've got...
[1:28:09] come on... push,
[1:28:11] and you've got pull.
[1:28:14] What do I mean by push and pull?
[1:28:17] Push is where you push instructions
[1:28:20] to the LLM. So if you put
[1:28:23] something in CLAUDE.md,
[1:28:25] like "talk like a pirate," that instruction
[1:28:28] is always going to be sent to the agent,
[1:28:30] right? So that is a push action. You're
[1:28:32] pushing tokens to it. Pull is where you
[1:28:35] give the agent an opportunity to pull
[1:28:38] more information.
[1:28:40] And
[1:28:42] skills are an example of that. A
[1:28:44] skill is something that can sit in the
[1:28:46] repo, and it has a little description
[1:28:47] header that says, "Okay, agent, you may pull
[1:28:50] this when you want to."
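A minimal sketch of the push/pull distinction from the agent's side of the context window. The `Skill` shape, `buildSystemPrompt` and `loadSkill` are hypothetical illustrations, not how Claude Code actually assembles CLAUDE.md or skills: pushed instructions are always in the prompt, while for skills only the name and description header go in up front and the body is pulled on request.

```typescript
// Hypothetical shapes for illustration; real tools differ in detail.
interface Skill {
  name: string;
  description: string; // always visible: the header the agent uses to decide whether to pull
  body: string; // enters the context only when the agent asks for it
}

// Push: these instructions are sent on every request, relevant or not.
// Pull: only each skill's name and description are sent up front.
function buildSystemPrompt(pushed: string[], skills: Skill[]): string {
  const skillIndex = skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
  return [...pushed, "Skills you may load on demand:", skillIndex].join("\n\n");
}

// The pull itself: what a hypothetical "load skill" tool call might return.
function loadSkill(skills: Skill[], name: string): string {
  const skill = skills.find((s) => s.name === name);
  if (!skill) throw new Error(`Unknown skill: ${name}`);
  return skill.body;
}

const skills: Skill[] = [
  {
    name: "coding-standards",
    description: "House style for TypeScript in this repo",
    body: "Prefer deep modules with small interfaces.",
  },
];
const systemPrompt = buildSystemPrompt(["Talk like a pirate."], skills);
console.log(systemPrompt.includes("Talk like a pirate.")); // true (pushed)
console.log(systemPrompt.includes("Prefer deep modules")); // false (not pulled yet)
```

The trade-off this illustrates: pushed tokens cost context on every request, while pulled content costs a little only when the agent decides it needs it.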
[1:28:53] My current thinking about
[1:28:55] code review and about coding standards
[1:28:57] looks like this. When you have an
[1:29:00] implement...
[1:29:03] What's going on? There we go.
[1:29:04] Implementer.
[1:29:06] I'm going to make this less red in a
[1:29:08] second. For the implementer, you want the coding
[1:29:12] standards to be available via pull. If
[1:29:15] it has a question, you want it to be
[1:29:16] able to answer it. But if you
[1:29:18] then have an automated reviewer
[1:29:21] afterwards, then you want to push.
[1:29:24] You want to push that information to the
[1:29:25] reviewer. You want to say, "These are
[1:29:27] our coding standards. Make sure that
[1:29:29] this code follows them." So if you
[1:29:32] have skills, for instance, then you want
[1:29:34] to push that stuff to the reviewer so
[1:29:36] the reviewer has both the code that's
[1:29:38] written and the coding standards to
[1:29:40] compare it to.
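A hedged sketch of that split, using hypothetical names (`implementerTask`, `reviewerTask`, and a `load_skill` tool): the implementer's prompt only tells it the standards can be pulled, while the reviewer's prompt has the standards pushed in verbatim next to the diff.

```typescript
interface AgentTask {
  prompt: string;
  tools: string[];
}

// Implementer: standards are NOT inlined; it gets a (hypothetical) tool to pull them when unsure.
function implementerTask(issueTitle: string): AgentTask {
  return {
    prompt: `Implement: ${issueTitle}\nIf you are unsure about house style, load the coding-standards skill.`,
    tools: ["load_skill"],
  };
}

// Reviewer: standards are pushed inline next to the diff, so it always has both to compare.
function reviewerTask(codingStandards: string, diff: string): AgentTask {
  return {
    prompt: [
      "Review the diff below against these coding standards and list every violation.",
      "## Coding standards",
      codingStandards,
      "## Diff",
      diff,
    ].join("\n\n"),
    tools: [],
  };
}

const standards = "Use deep modules with small interfaces.";
const review = reviewerTask(standards, "+ export function helperA() {}");
console.log(review.prompt.includes(standards)); // true
```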
[1:29:42] Hopefully that answers your question. I
[1:29:43] can show you an automated version of
[1:29:44] this as well. Actually, yeah, let's
[1:29:47] do that now while it's fresh in my
[1:29:48] mind. I recently spent
[1:29:54] maybe a week or so building this
[1:29:57] thing called Sand Castle. I built Sand
[1:29:59] Castle because I was unhappy with
[1:30:02] the options out there for
[1:30:05] running agents AFK. What it
[1:30:07] is, essentially, is a TypeScript
[1:30:10] library for running these loops. You
[1:30:12] have a run function that creates a
[1:30:16] Git worktree, sandboxes it in a Docker
[1:30:19] container, and then allows you to run a
[1:30:22] prompt inside there. That worktree
[1:30:24] is just a Git branch, so you
[1:30:26] have that code and you can merge it
[1:30:28] later. If I open up...
[1:30:32] There are some really, really nice
[1:30:35] ways of viewing this, and it essentially
[1:30:37] allows you to run these kinds of
[1:30:38] automated loops and to
[1:30:41] parallelize across multiple different
[1:30:43] agents really simply. So I'll go into my
[1:30:46] Sand Castle files, go into main.ts here,
[1:30:48] and let's just walk through this.
[1:30:51] So earlier I showed you
[1:30:54] a version of the Ralph loop.
[1:30:56] This is where we take it from
[1:30:58] sequential to parallel.
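The planner described next chooses issues that can be worked on in parallel, given the blocking relationships on the kanban board. In the demo that planner is an LLM with a prompt; as a deterministic point of comparison, here is a small sketch of how those "phases" could be computed with Kahn-style topological layering. `BoardIssue` and `phases` are hypothetical names.

```typescript
interface BoardIssue {
  id: number;
  blockedBy: number[];
}

// Kahn-style layering: each phase holds the issues whose blockers all sit in earlier phases,
// so everything inside one phase can be handed to agents in parallel.
function phases(issues: BoardIssue[]): number[][] {
  const onBoard = new Set(issues.map((i) => i.id));
  // Blockers that aren't on the board (e.g. already-closed issues) are treated as done.
  const remaining = new Map<number, Set<number>>(
    issues.map((i): [number, Set<number>] => [i.id, new Set(i.blockedBy.filter((b) => onBoard.has(b)))]),
  );
  const result: number[][] = [];
  while (remaining.size > 0) {
    const ready = Array.from(remaining)
      .filter(([, blockers]) => blockers.size === 0)
      .map(([id]) => id);
    if (ready.length === 0) throw new Error("Cycle in blocking relationships");
    result.push(ready);
    for (const id of ready) remaining.delete(id);
    for (const blockers of remaining.values()) {
      for (const id of ready) blockers.delete(id);
    }
  }
  return result;
}

const board: BoardIssue[] = [
  { id: 1, blockedBy: [] },
  { id: 2, blockedBy: [1] },
  { id: 3, blockedBy: [1] },
  { id: 4, blockedBy: [2, 3] },
];
console.log(JSON.stringify(phases(board))); // [[1],[2,3],[4]]
```

Issues 2 and 3 land in the same phase because both are unblocked once issue 1 is done, which is exactly the set a parallel loop could hand to two agents at once.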
[1:31:01] First of all, we have a planner
[1:31:04] with a plan prompt here
[1:31:06] that looks at the backlog and chooses a
[1:31:10] certain number of issues to work on in
[1:31:12] parallel. Remember I showed you that
[1:31:13] kanban board with all the
[1:31:14] blocking relationships? It works out all
[1:31:16] of the phases. So this one will say, okay,
[1:31:19] let's say we have... you can ignore
[1:31:21] all this glue code here. This is
[1:31:23] essentially just a set of issues, GitHub
[1:31:26] issues, each with a title and a branch
[1:31:30] to work on. And then for each
[1:31:34] issue, we create a sandbox,
[1:31:38] and then we run an implementer in that
[1:31:40] sandbox, passing in the issue number,
[1:31:42] issue title and the branch. This is like
[1:31:43] the loop that we ran just before.
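For a feel of what "create a worktree, sandbox it in Docker, run a prompt inside" can boil down to, here is a sketch that only builds the command lines for one issue. This is not Sand Castle's implementation: `sandboxCommands`, the image name and the `run-agent` command are hypothetical placeholders, and `worktreeRoot` is assumed to be an absolute path because `docker run -v` bind mounts expect an absolute host path.

```typescript
interface SandboxCommands {
  worktree: string[];
  container: string[];
}

// Hypothetical: the commands a run() like the one described could issue for one issue.
function sandboxCommands(
  issueNumber: number,
  branch: string,
  worktreeRoot: string, // assumed absolute
  image: string,
  agentCommand: string[],
): SandboxCommands {
  const worktreePath = `${worktreeRoot}/issue-${issueNumber}`;
  return {
    // A new branch checked out in its own directory, so parallel agents never share a working copy.
    worktree: ["git", "worktree", "add", "-b", branch, worktreePath],
    // Mount only that worktree into a throwaway container (--rm) and run the agent inside it.
    container: ["docker", "run", "--rm", "-v", `${worktreePath}:/workspace`, "-w", "/workspace", image, ...agentCommand],
  };
}

const cmds = sandboxCommands(42, "issue-42-streaks", "/tmp/worktrees", "agent-sandbox:latest", ["run-agent", "--issue", "42"]);
console.log(cmds.worktree.join(" ")); // git worktree add -b issue-42-streaks /tmp/worktrees/issue-42
```

Each argv array could then be executed, for example with Node's `child_process.spawnSync(argv[0], argv.slice(1))`, once per issue; afterwards the work lives on an ordinary Git branch that can be reviewed and merged.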
[1:31:46] Then, if it created some commits, we
[1:31:49] review those commits. That's
[1:31:51] essentially the loop. What do we do with
[1:31:54] those commits? We pass them into a
[1:31:58] merger agent,
[1:32:01] which takes in a merge prompt,
[1:32:03] the branches that were created, and
[1:32:04] the issues, and it just merges them in.
[1:32:06] If there are any problems with the merge,
[1:32:08] you know, with the types and tests and
[1:32:09] that kind of thing, it solves them. And
[1:32:11] this has been my flow for quite a
[1:32:13] while now for working on most projects.
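The loop just described (plan, implement each issue in parallel, review only if commits were made, then hand the branches to a merger) can be sketched as follows. This is not Sand Castle's API: the `Agents` interface and its four methods are hypothetical stand-ins for the prompts and sandboxes in `main.ts`, injected so the control flow can be exercised with stubs.

```typescript
interface Issue {
  number: number;
  title: string;
  branch: string;
}

interface ImplementResult {
  issue: Issue;
  commits: string[];
}

// Hypothetical agent calls, injected so the loop itself stays testable.
interface Agents {
  plan(): Promise<Issue[]>; // picks issues that can run in parallel
  implement(issue: Issue): Promise<string[]>; // returns commit SHAs made in that issue's sandbox
  review(issue: Issue, commits: string[]): Promise<void>;
  merge(branches: string[], issues: Issue[]): Promise<void>; // also fixes type/test breakage
}

async function runIteration(agents: Agents): Promise<ImplementResult[]> {
  const issues = await agents.plan();
  // One implementer (and, if it produced commits, one reviewer) per issue, all concurrently.
  const results = await Promise.all(
    issues.map(async (issue) => {
      const commits = await agents.implement(issue);
      if (commits.length > 0) await agents.review(issue, commits);
      return { issue, commits };
    }),
  );
  // Only branches that actually produced commits are handed to the merger.
  const done = results.filter((r) => r.commits.length > 0);
  if (done.length > 0) {
    await agents.merge(
      done.map((r) => r.issue.branch),
      done.map((r) => r.issue),
    );
  }
  return results;
}
```

In a real setup each method would wrap an agent run inside its own sandbox; in a test, stubs can simply return canned commit lists.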
[1:32:16] It works super, super well. And yeah,
[1:32:19] I recommend you check out Sand Castle if
[1:32:20] you want to learn more. And to
[1:32:23] answer your question properly: in
[1:32:26] the reviewer,
[1:32:28] I would push the coding standards; in
[1:32:30] the implementer, I would allow it to pull.
[1:32:33] And I'm actually using Sonnet for
[1:32:35] implementation and Opus for reviewing,
[1:32:39] because I consider reviewing the place where I
[1:32:40] need the smarts. Then,
[1:32:44] any questions? Actually, let me,
[1:32:46] before we do more questions, let's go
[1:32:48] back here. Okay, where are we at? Okay,
[1:32:53] we're zooming all over the place in this
[1:32:55] talk because I'm having to
[1:32:56] run things in parallel. So let's go
[1:32:58] back to "improve codebase
[1:33:00] architecture." It has finally finished
[1:33:01] running, and it's found a bunch of
[1:33:04] architectural improvement candidates. Each
[1:33:06] is essentially a cluster of
[1:33:08] different modules that are all
[1:33:10] related and could probably be tested as
[1:33:12] a unit. Number one is the quiz scoring
[1:33:14] service. There's some reordering-logic
[1:33:17] extraction as well. It gives arguments for
[1:33:20] why they're coupled, and it gives a
[1:33:22] dependency category as well. So: local,
[1:33:23] substitutable, SQLite with an in-memory
[1:33:26] test DB.
[1:33:28] "Quiz scoring service currently has zero
[1:33:30] tests. This is the biggest gap." So this
[1:33:32] is what it looks like when we come back
[1:33:33] from "improve codebase architecture."
[1:33:37] Okay.
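The report's "local, substitutable" category means the dependency runs locally and can be swapped out in tests; for the quiz scoring service it suggests SQLite with an in-memory test DB. Here is a hedged sketch of that shape with hypothetical names (`AnswerKeyStore`, `QuizScoringService`, percentage scoring). To keep it dependency-free, a `Map`-backed store stands in for the in-memory SQLite database; with a SQLite driver you would implement the same interface against a `:memory:` database instead.

```typescript
interface Answer {
  questionId: string;
  choice: string;
}

// The narrow dependency the service needs. Production could back this with SQLite;
// tests can substitute any local implementation.
interface AnswerKeyStore {
  correctChoice(questionId: string): string | undefined;
}

// Test double standing in for an in-memory SQLite database.
class MapAnswerKeyStore implements AnswerKeyStore {
  private readonly key: Map<string, string>;

  constructor(key: Map<string, string>) {
    this.key = key;
  }

  correctChoice(questionId: string): string | undefined {
    return this.key.get(questionId);
  }
}

class QuizScoringService {
  private readonly store: AnswerKeyStore;

  constructor(store: AnswerKeyStore) {
    this.store = store;
  }

  // Percentage of answers matching the stored key, rounded to the nearest integer; 0 for no answers.
  score(answers: Answer[]): number {
    if (answers.length === 0) return 0;
    const correct = answers.filter((a) => this.store.correctChoice(a.questionId) === a.choice).length;
    return Math.round((correct / answers.length) * 100);
  }
}

const service = new QuizScoringService(
  new MapAnswerKeyStore(new Map([["q1", "a"], ["q2", "b"], ["q3", "c"]])),
);
console.log(
  service.score([
    { questionId: "q1", choice: "a" },
    { questionId: "q2", choice: "x" },
    { questionId: "q3", choice: "c" },
  ]),
); // 67
```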
[1:33:39] So we have nominally 17 minutes
[1:33:43] left. I don't know about you, but I'm
[1:33:45] knackered.
[1:33:47] I want to...
[1:33:50] let me sum up for you,
[1:33:52] because I think we're reaching
[1:33:54] the end of our stamina. I'm going to be
[1:33:56] available for the full time if you want
[1:33:57] to come and ask me questions. I
[1:33:59] might do one more check of Slido,
[1:34:00] but let's sum up where we've got
[1:34:02] to.
[1:34:04] So,
[1:34:06] this is essentially the flow,
[1:34:10] where throughout this whole process
[1:34:12] we're bearing in mind the shape of our
[1:34:14] codebase. This is not a spec-to-code
[1:34:17] compiler. This is not an AI that's
[1:34:19] just churning out code. We are
[1:34:21] being very intentional about the kind of
[1:34:23] modules and the shape of the codebase
[1:34:24] that we want. We are making sure that we
[1:34:26] are as aligned as possible by using the
[1:34:29] grilling session, really hammering out
[1:34:31] our idea. We're not over-indexing on
[1:34:34] the PRD. We're not trying to read every
[1:34:35] part of it. We're not even thinking too much
[1:34:37] about it. We're then just turning
[1:34:39] it into a set of parallelizable issues
[1:34:41] which can be worked on by agents in
[1:34:42] parallel. We implement them, and we QA and
[1:34:46] code review the hell out of the result, and then
[1:34:48] keep going back to the implementation.
[1:34:50] One thing I didn't really mention is
[1:34:51] what the QA phase
[1:34:54] is for: creating more issues for that
[1:34:56] kanban board. So even while it's implementing,
[1:34:59] you can be QAing the stuff and
[1:35:00] going back and adding more issues. And the
[1:35:02] kanban board lets you keep adding
[1:35:03] blocking issues more or less
[1:35:06] indefinitely. And then once that's
[1:35:08] all done, once you've got code that
[1:35:09] you're happy with, once you've got work
[1:35:10] that you're happy with, then you can
[1:35:12] share it with your team and get
[1:35:13] a full review. So
[1:35:16] up to this point, this is one
[1:35:17] developer, or maybe a couple of
[1:35:18] developers, managing this, and
[1:35:21] then it's up to you to figure
[1:35:22] out how to merge it back in.
[1:35:27] Of course, all of this can be customized
[1:35:30] by you. This is just something that I
[1:35:32] have found works. I'm not trying to
[1:35:34] sell you on one approach here.
[1:35:37] What I recommend, if you take one thing
[1:35:39] away from this session, is that you
[1:35:41] should head
[1:35:42] to Amazon and just buy a ton of those old
[1:35:44] books, because I found it so
[1:35:47] enlightening reading them. You know,
[1:35:51] pre-AI writing is always really
[1:35:53] fun to read anyway, and
[1:35:56] on every single page I found
[1:35:58] something useful and something
[1:36:00] interesting to read. So thank you so
[1:36:03] much. Thank you for putting up with the
[1:36:04] heat. Hopefully your body
[1:36:05] temperatures will reset soon. Thank
[1:36:08] you very much.