How to Build Effective Claude Code Agents in 2026

Transcribed Jun 19, 2026
Intermediate · 34 min read
For: Software developers, AI enthusiasts, and business professionals looking to effectively use AI coding assistants like Claude Code for automation and development.
16.7K views · 592 likes · 47 comments · 15 dislikes · 3.8% 📈 Moderate

AI Summary

This podcast episode features a conversation between Nate and Cole Medin about how to use Claude Code effectively as a coding agent. Cole emphasizes moving beyond 'vibe coding' to a structured approach where the user acts as a director, focusing on planning, verification, and system evolution. The discussion covers key concepts like the 'dumb zone' of large language models, the importance of context management, and building a harness for reliable, repeatable results.

[0:03]
Be the Director of Your Coding Agents

The main goal is to learn how to be the director of coding agents, creating a system that evolves over time, rather than just using the tool to code.

[0:18]
The Dumb Zone of LLMs

Large language models have a 'dumb zone' where they become overloaded with information. For Opus, this typically starts around 250,000 tokens, leading to obvious mistakes.

[0:31]
Verification Checks Improve Results

Without verification checks, first-pass results might be 65-70% correct. With checks, you can achieve 92% on the first pass.

[0:44]
Real-World Agent Failure

An agent misinterpreted a task and sent an email with a discount code to the entire list, which was not supposed to go out. This highlights the need for strict permissions.

[1:31]
Using Claude Code as a Second Brain

Claude Code can be used as a 'second brain' or 'AIOS' to make a business AI-native, going beyond just coding to automate various business processes.

[9:56]
From Vibe Coding to Directing

The goal is to move from 'vibe coding' (prompting and praying) to a system where you direct the agent for reliable and repeatable results. This involves planning, building, and verifying.

[10:53]
System Evolution is Key

Every time you go through the loop with Claude Code, there is an opportunity to evolve your system. This means improving the way you work with the tool so that next time it's better.

[14:50]
Verification Harness for Self-Checking

A verification harness allows the coding agent to validate its own work. For example, a diagram skill renders a PNG and the agent checks the image for issues like padding or overlap.
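
The self-checking loop described above can be sketched roughly as follows. This is a minimal illustration, not the skill from the video: `build` and `find_issues` are hypothetical callables standing in for "ask the agent to produce the artifact" and "render it and look for problems", and the toy versions at the bottom exist only to make the sketch runnable.

```python
from typing import Callable, List, Tuple


def build_with_verification(
    build: Callable[[List[str]], str],
    find_issues: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Build an artifact, check it, and rebuild with the issues as feedback.

    Stops when the check finds nothing or when max_rounds checks have run.
    Returns the last artifact and whatever issues remain.
    """
    feedback: List[str] = []
    artifact = build(feedback)
    for _ in range(max_rounds):
        issues = find_issues(artifact)
        if not issues:
            return artifact, []
        feedback = issues
        artifact = build(feedback)
    return artifact, find_issues(artifact)


# Toy stand-ins (assumptions for illustration only).
def toy_build(feedback: List[str]) -> str:
    return "diagram:fixed" if feedback else "diagram:overlap"


def toy_check(artifact: str) -> List[str]:
    return ["labels overlap"] if artifact.endswith("overlap") else []


result, remaining = build_with_verification(toy_build, toy_check)
```

The point of the design is that only the final artifact is handed back; intermediate mistakes are handled inside the loop.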

[19:48]
Planning is More Important Than Building

With coding agents, you should spend more time planning than building. The success of the agent is dependent on the quality of the plan, which should include goals, success criteria, and validation strategy.
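
One way to make the three plan ingredients named above concrete is a small structure that refuses to count as complete until each is filled in. The class and field names here are assumptions chosen for illustration; the video does not prescribe a format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    goal: str
    success_criteria: List[str] = field(default_factory=list)
    validation_strategy: str = ""

    def missing_parts(self) -> List[str]:
        """List which of the three plan ingredients are still empty."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.success_criteria:
            missing.append("success_criteria")
        if not self.validation_strategy.strip():
            missing.append("validation_strategy")
        return missing


# A draft with only a goal is not ready to hand to the agent.
draft = Plan(goal="Automate monthly invoice creation")
gaps = draft.missing_parts()
```

A check like this could run before any build step so that an agent session never starts from a goal alone.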

[27:06]
Attention is Scarce: Manage Context

Attention is scarce. Even with a 1 million token context window, the model's performance degrades after 100-200k tokens. You must be careful about what you give it upfront versus what it discovers when needed.
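
The upfront-versus-on-demand split can be illustrated with a simple token budget. This sketch assumes a rough rule of thumb of about four characters per token, and the budget value is arbitrary; neither number comes from the video, and a real tokenizer would give different counts.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def split_context(
    docs: List[Tuple[str, str]], budget_tokens: int
) -> Tuple[List[str], List[str]]:
    """Walk docs in priority order; load what fits, defer the rest.

    Returns (names to load upfront, names left for the agent to fetch later).
    """
    upfront: List[str] = []
    deferred: List[str] = []
    used = 0
    for name, text in docs:
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            upfront.append(name)
            used += cost
        else:
            deferred.append(name)
    return upfront, deferred


docs = [
    ("rules", "a" * 400),     # about 100 tokens
    ("logs", "c" * 40_000),   # about 10,000 tokens
    ("schema", "b" * 2_000),  # about 500 tokens
]
upfront, deferred = split_context(docs, budget_tokens=1_000)
```

Here the large log file is deferred while the two small documents are loaded upfront.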

[33:38]
Harness Engineering for Large Tasks

For production-grade work, you need harness engineering: building workflows that orchestrate multiple coding agent sessions to handle larger tasks while keeping each session out of the dumb zone. The Ralph loop is a basic example.
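
A loop in that spirit might look like the sketch below: each task gets its own session with fresh context, and failed tasks go back on the queue. `run_session` is a hypothetical callable standing in for launching one agent session, and the retry policy is an assumption; this is not presented as the actual Ralph loop implementation.

```python
from typing import Callable, Dict, List


def run_task_loop(
    tasks: List[str],
    run_session: Callable[[str], bool],
    max_sessions: int = 20,
) -> List[str]:
    """Run one fresh session per task; re-queue failures. Returns unfinished tasks."""
    pending = list(tasks)
    sessions = 0
    while pending and sessions < max_sessions:
        task = pending.pop(0)
        sessions += 1
        if not run_session(task):
            pending.append(task)  # retry later in a new session
    return pending


# Toy session (assumption): the migration fails once, then succeeds.
attempts: Dict[str, int] = {}


def fake_session(task: str) -> bool:
    attempts[task] = attempts.get(task, 0) + 1
    return not (task == "migrate database" and attempts[task] == 1)


leftover = run_task_loop(
    ["add login form", "migrate database", "write tests"], fake_session
)
```

Because no session carries the history of the previous ones, the context each session sees stays small regardless of how long the overall job runs.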

[44:56]
Assume the Agent Will Touch Everything

You must assume that anything the agent can read or touch, it will, even if you never ask it to. This mindset is crucial for preventing database deletions and other security issues.

[47:04]
Using Hooks for Security

Claude Code hooks can be used for security by running code before a tool is invoked, checking if the agent is trying to access forbidden folders or run dangerous commands.
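
A pre-tool check of this kind could be sketched as below. It assumes the hook receives the pending tool call as JSON on standard input with `tool_name` and `tool_input` fields, and that exiting with code 2 blocks the call; that matches Claude Code's hook documentation as best understood here, but it should be verified against the current docs. The folder and command lists are hypothetical.

```python
import json
import sys
from typing import Optional

FORBIDDEN_DIRS = ("secrets/", ".ssh/")         # hypothetical forbidden folders
DANGEROUS_SNIPPETS = ("rm -rf", "drop table")  # hypothetical dangerous commands


def violation(event: dict) -> Optional[str]:
    """Return a reason string if the tool call should be blocked, else None."""
    tool_input = event.get("tool_input") or {}
    command = str(tool_input.get("command", ""))
    path = str(tool_input.get("file_path", ""))
    for snippet in DANGEROUS_SNIPPETS:
        if snippet in command.lower():
            return f"blocked command containing '{snippet}'"
    for folder in FORBIDDEN_DIRS:
        if folder in path or folder in command:
            return f"blocked access to '{folder}'"
    return None


def main() -> int:
    """Read the tool call from stdin; exit code 2 is assumed to block it."""
    event = json.load(sys.stdin)
    reason = violation(event)
    if reason:
        print(reason, file=sys.stderr)
        return 2
    return 0

# When saved as a hook script, finish with: sys.exit(main())
```

Simple string matching like this is a first line of defense only; it does not inspect what a script does once it runs.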

[48:07]
Loopholes in Security Checks

Even with security checks, agents can find loopholes. For example, if you block a delete command, the agent can write a script to do the deletion and then run it.
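
The loophole is easy to reproduce with a toy denylist. The check and the script name below are hypothetical; the sketch only shows why matching on command text is not enough.

```python
BLOCKED_WORDS = ("rm ", "rmdir", "del ")  # hypothetical denylist


def naive_check_allows(command: str) -> bool:
    """Return True when no blocked word appears in the command text."""
    return not any(word in command for word in BLOCKED_WORDS)


direct = "rm -rf data/"
# Hypothetical script the agent wrote itself; imagine it calls shutil.rmtree("data/").
indirect = "python cleanup.py"

direct_allowed = naive_check_allows(direct)      # the delete command is caught
indirect_allowed = naive_check_allows(indirect)  # the script slips through
```

Since the second command passes, the safer assumption is the one from the video: limit what the agent can reach in the first place rather than relying on pattern checks.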

[51:36]
System Evolution: Every Bug is a Permanent Upgrade

The most important thing is system evolution. Instead of just fixing an issue, use it as an opportunity to improve the system (e.g., add a new rule to CLAUDE.md) so the problem never happens again.
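
Turning a bug into a standing rule can be as small as appending one line to the project's CLAUDE.md. The helper below is a sketch under that assumption; the function name and the bullet format are illustrative, and the example rule is drawn from the email incident described earlier.

```python
import tempfile
from pathlib import Path


def add_rule(memory_file: Path, rule: str) -> bool:
    """Append the rule as a bullet unless that exact bullet already exists.

    Returns True if the file was changed.
    """
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    bullet = f"- {rule.strip()}"
    if bullet in existing.splitlines():
        return False
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    memory_file.write_text(existing + separator + bullet + "\n", encoding="utf-8")
    return True


# Demo in a temporary directory so no real CLAUDE.md is touched.
rule = "Never send email to the full list without explicit approval."
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "CLAUDE.md"
    first = add_rule(path, rule)
    second = add_rule(path, rule)
    contents = path.read_text(encoding="utf-8")
```

The duplicate check keeps the file from filling up with repeated rules when the same issue is logged twice.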

[57:36]
Adversarial Development with Agent Teams

Using agent teams for adversarial development, where one agent plays devil's advocate against another, can help surface problems and ensure robustness.

[60:59]
Top Three Claude Code Features

Cole's top three features are: 1) Hooks (for security and memory), 2) Sub-agents (for research and context extraction), and 3) Skills (for reusable prompts and workflows).

[65:27]
Act as a Product Manager for Claude Code

Think of yourself as the product manager for Claude Code. You don't need to describe how to build something, but you must shape the vision and give the 'why' behind the task.

To effectively use Claude Code, you must move from 'vibe coding' to a structured approach where you act as a director, focusing on planning, verification, and system evolution. The key is to manage context, build security harnesses, and treat every bug as an opportunity for a permanent upgrade.

Clickbait Check

85% Legit

"The title accurately reflects the content, as the podcast provides a detailed framework for building effective Claude Code agents, focusing on planning, verification, and system evolution."

Study Flashcards (12)

What is the 'dumb zone' in large language models?

Difficulty: easy

The point in the context window where the model becomes overloaded with information and starts making obvious mistakes.

0:18

According to the podcast, what is the typical token count where Opus enters the 'dumb zone'?

Difficulty: medium

Around 250,000 tokens.

0:20

What is the main difference between 'vibe coding' and 'directing' a coding agent?

Difficulty: medium

Vibe coding is prompting and praying, while directing involves a structured approach with planning, building, and verification for reliable results.

9:56

What are the three main steps in the framework for using Claude Code effectively?

Difficulty: easy

Plan with context, build out the thing, and have an approach for verifying.

10:43

What is a 'harness' in the context of AI coding?

Difficulty: hard

The wrapper around the large language model that includes the tools and context it has access to, defining how it works.

17:47

Why is it important to manage context when using a coding agent?

Difficulty: medium

Because attention is scarce; even with a large context window, the model's performance degrades after a certain point (the dumb zone).

27:06

What is the 'Ralph loop'?

Difficulty: hard

A basic example of a harness that strings together multiple coding agent sessions to handle a larger task, avoiding the dumb zone.

34:46

What is the key mindset to prevent security issues with coding agents?

Difficulty: easy

Assume that anything the agent can read or touch, it will, even if you never ask it to.

44:56

How can Claude Code hooks be used for security?

Difficulty: medium

By running code before a tool is invoked to check if the agent is trying to access forbidden folders or run dangerous commands.

47:04

What is 'system evolution' in the context of using Claude Code?

Difficulty: medium

Using every issue or bug as an opportunity to improve the system (e.g., adding a rule to CLAUDE.md) so the problem never happens again.

51:36

What are Cole Medin's top three Claude Code features?

Difficulty: medium

1) Hooks, 2) Sub-agents, 3) Skills.

60:59

What is the recommended way to think of your role when using Claude Code?

Difficulty: easy

As a product manager for Claude Code, shaping the vision and giving the 'why' behind the task.

65:27

💡 Key Takeaways

📊

The Dumb Zone Concept

Introduces a critical limitation of LLMs that many users are unaware of, explaining why performance degrades with long contexts.

0:18
📊

Verification Improves First-Pass Accuracy

Provides a concrete metric (65-70% to 92%) showing the significant impact of verification checks on agent output quality.

0:31
⚖️

System Evolution as a Core Practice

Emphasizes a proactive, iterative approach to improving the AI system over time, turning every bug into a learning opportunity.

10:53
⚖️

Assume the Agent Will Touch Everything

A crucial security mindset that prevents catastrophic failures by acknowledging the agent's potential for unintended actions.

44:56
💡

Act as a Product Manager for Claude Code

Reframes the user's role from a coder to a strategic director, focusing on vision and intent rather than implementation details.

65:27

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

The AI Dumb Zone: Why 1M Tokens Is a Lie

44s

Reveals a critical limitation of AI models that many users overlook, sparking debate and engagement from developers.

Stop Vibe Coding: The Plan-Validate Sandwich

41s

Provides a clear contrast to the popular 'vibe coding' trend, offering a structured approach that appeals to both novices and experts.

Attention Is Scarce: AI Context Management

50s

Delivers actionable advice on managing AI context windows to avoid costly mistakes, highly educational for all AI users.

The Ralph Loop: Multi-Agent Assembly Line

53s

Introduces a cutting-edge concept for orchestrating multiple AI agents, driving curiosity and shares among tech enthusiasts.

[00:00] What would you say by the end of this

[00:01] podcast that everyone will have learned

[00:03] from you?

[00:03] >> The main thing I want to talk about

[00:04] today is how we can be the director of

[00:07] our coding agents. Everyone is hearing

[00:08] nowadays how large language models can

[00:11] support up to 1 million tokens in their

[00:13] context. That's like the Harry Potter

[00:14] book five times over. Large language

[00:16] models have what's called the dumb zone.

[00:18] With Opus right now, it's usually around

[00:20] 250,000 tokens and I feel like it gets

[00:22] into the dumb zone.

[00:24] >> It definitely comes with a false sense

[00:25] of security with people now thinking

[00:27] that they have the million. With coding

[00:28] agents, you spend more time planning

[00:30] than you actually do building.

[00:31] >> Without the verification checks, maybe

[00:32] it's 65 or 70, but now you can get

[00:34] something that is 92 on the first pass.

[00:36] >> If you tell it never to wipe a

[00:38] database, it's still going to do that.

[00:40] If you don't allow it to delete a

[00:42] folder, it can still write a script to

[00:44] do that.

[00:44] >> Recently, something did happen to us.

[00:46] The agent was trying to be proactive and

[00:48] it actually saw something on its task

[00:50] list, but it misinterpreted it and it

[00:51] ended up sending an email to our entire

[00:53] list with a discount code and it was not

[00:56] supposed to go out. If you have the

[00:57] mindset that anything that the agent can

[00:59] read or can touch, you have to assume

[01:02] that it will, even if you never ask it

[01:03] to, that assumption is what's going to

[01:05] save you from having your database

[01:06] deleted.

[01:10] >> All right, Cole, thank you so much for

[01:12] being here today. I'm so excited to dig

[01:13] in.

[01:15] >> I'm excited to be here. Yeah, thanks for

[01:16] bringing me on to your podcast, Nate.

[01:18] I'm looking forward to this.

[01:19] >> Absolutely. Yeah, it's been a long time

[01:20] since we've talked, so I'm excited to

[01:22] hear what you've been up to and to hear

[01:23] kind of like the sauce that you're going

[01:24] to drop on everyone today. So real

[01:26] quick, what would you say by the end of

[01:27] this podcast that everyone will have

[01:29] learned from you?

[01:31] >> Yeah. So the main thing I want to talk

[01:32] about today is how we can really be the

[01:34] director of our coding agents and

[01:37] specifically Claude Code because that's

[01:38] what most people use right now. That's

[01:39] what I use. But really, it's creating

[01:41] that system where you have your way

[01:44] of working with Claude Code that evolves

[01:45] itself over time. And we're going to

[01:47] talk about more than just using it to

[01:49] code. Really, I use my Claude Code as my

[01:52] second brain. I like to call it. I know

[01:54] Nate kind of calls it as AIOS. Everyone

[01:56] has their term for it, but really like

[01:57] using Claude Code as the tool to make

[01:59] your business AI native. We're going to

[02:00] get into all of that and just some

[02:02] high-level strategies that honestly you can

[02:04] start applying today.

[02:05] >> I love that. Yeah, I'm I'm super excited

[02:06] to dig in because, you know, I don't

[02:09] come from a formal software engineering

[02:11] background and I think that I would I

[02:13] would guess that the majority of my

[02:14] audience doesn't either, but obviously

[02:16] with the product being called Claude

[02:18] Code, I think a lot of people that I

[02:19] bring that up to who aren't super deep

[02:21] in the AI space, they obviously think

[02:22] that it's a tool that is for coders and

[02:24] you need to understand code in order to

[02:26] use it. So, um I love that framing. And

[02:28] real quick before we jump in, you know,

[02:31] me and you have we've known each other

[02:32] for quite a bit. I feel like, you know,

[02:33] right when I kind of quit my job and

[02:35] started on the space, you were one of

[02:36] the main channels that I followed and I

[02:38] still follow to stay up to date and to

[02:40] to learn about how to work with AI in

[02:42] the right way. And um we've kind of just

[02:45] been able to see each other grow and and

[02:47] you know, check in. So, I'm really

[02:48] excited to dive in, but I wanted to make

[02:50] sure you got a chance to real quick give

[02:51] everyone a quick intro if they haven't

[02:53] seen your channel before on what you do

[02:55] and um

[02:56] >> yeah, what you're up to. Yeah, sounds

[02:58] good. You know, before I give an intro

[02:59] though, I kind of want to share

[03:01] something a little bit about what you're

[03:02] talking about. Like when we first met,

[03:03] it's funny because I I actually remember

[03:05] I had um about 50,000 subscribers when

[03:07] Nate first reached out to me and he had

[03:10] like 10,000 and now it's a little bit

[03:12] different. I have like 200,000. You're

[03:14] you're almost 800,000 now, right? Like

[03:16] it's pretty crazy. Um it's been really

[03:17] cool to see you grow, how fast you've

[03:19] grown. But yeah, we were both like

[03:21] smaller channels at the time. Um so

[03:23] yeah, it's it's been a long time. Wild

[03:25] journey. Uh yeah. Anyway, as far as what

[03:27] I actually do, so like Nate said, I come

[03:29] from a software engineering background.

[03:30] So, I've been an engineer my entire

[03:32] life. Ever since I was eight years old,

[03:34] actually, I I started with this language

[03:36] called Scratch. It's developed by MIT.

[03:38] So, I was just like building video games

[03:40] as a kid, like Super Mario Bros. and

[03:42] Pokemon, like really cliche stuff. Um,

[03:45] but that that's what got me into the

[03:46] world of coding. And so I took that

[03:48] through high school, college, got my

[03:50] bachelor's in computer science and um

[03:52] then I had just like a software

[03:54] engineering job in a Fortune 500 company

[03:56] and it was great but I always wanted to

[03:59] be an entrepreneur. And so when

[04:01] generative AI started to really become a

[04:03] big thing at the end of 2022 with the

[04:05] release of ChatGPT you know and it took

[04:08] the world by storm that's when I knew

[04:09] like okay this is where I want to go all

[04:11] in cuz there's like a really big

[04:12] opportunity for software engineers

[04:14] specifically to build agentic

[04:16] applications and so I started doing a

[04:17] lot of that like for my company and for

[04:20] friends with their startups pretty much

[04:21] dedicating all day and all night to it

[04:23] for a very long time like over a year

[04:26] and so it got to the point well I know a

[04:28] year might not feel like a long time but

[04:29] in the AI space a year is a long time.

[04:31] So it it got to the point where like

[04:32] okay I got some things to teach people.

[04:34] So that's why I started my YouTube

[04:35] channel.

[04:36] >> So originally it was like really really

[04:39] technical like I was there like writing

[04:40] line by line. I wasn't even using AI

[04:42] coding assistants back then just showing

[04:44] how to build AI agents with like you

[04:46] know LangChain and LangGraph at the

[04:48] time. And um now that's evolved to a lot

[04:51] of different things like I do a lot of

[04:52] like focusing on AI coding assistants

[04:54] which is why we're talking about that

[04:55] today. Um, and yeah, I quit my my

[04:58] full-time job like three months after

[05:00] starting my YouTube channel, which I

[05:01] think is about the same for you, Nate.

[05:03] Yeah. Um, because it's crazy like how fast

[05:05] when you when you do it right and and

[05:07] you're teaching people valuable things

[05:08] like how fast a channel can explode. And

[05:10] so now now what I'm up to is I have my

[05:12] AI community um similar to Nate where

[05:14] I've got course content, weekly

[05:16] workshops that I do. I've also been

[05:18] doing some more enterprise level

[05:19] training. So coming into a team and

[05:21] doing like a 4-hour session, helping

[05:23] them adopt a full system for using AI

[05:26] coding assistants so they can really

[05:27] have as like a standard for the team,

[05:29] you know, get away from vibe coding to

[05:31] really have a structured approach and

[05:32] helping them actually bring that into

[05:34] their existing processes and tech stack

[05:36] and things like that. So that's been

[05:38] pretty awesome. And so like really like

[05:39] that and everything I teach in the

[05:41] community, I'm bringing a lot of that

[05:43] here to what we're going to be chatting

[05:45] about today.

[05:45] >> 100%. Real quick, guys, quick break to

[05:47] tell you about today's sponsor, ClickUp.

[05:49] ClickUp is the software to replace all

[05:51] software, which I think is pretty funny,

[05:53] but very true. If you guys have been

[05:54] following me for a while, you know that

[05:55] I've been using ClickUp for a long, long

[05:58] time. Everything that I do with my team

[05:59] lives in ClickUp. All of our

[06:00] communication, all of our project

[06:01] management, all of our chats, and

[06:03] everything I was doing with my clients

[06:04] back when I was running the agency

[06:05] day-to-day, we were also inviting them

[06:07] to a ClickUp. So, it had replaced Slack

[06:09] for us, and it had also replaced our

[06:10] project management tools. So, if you're

[06:12] already using ClickUp, you have to try

[06:13] this new feature called Brain 2. But if

[06:15] you don't use ClickUp already, then

[06:16] Brain 2 is an amazing reason to try out

[06:18] ClickUp. It's kind of like a

[06:19] supercomputer that can do a ton of cool

[06:21] stuff. And I'll talk about in a sec.

[06:23] They have super agents in here. But you

[06:24] can switch between the different chat

[06:25] models that you probably already use and

[06:27] love. Right here, you can see that I've

[06:28] used Brain myself to look through

[06:30] everything that's going on in our

[06:31] projects and then create me a monthly

[06:33] presentation for the team. So, what that

[06:35] could look like is me asking Brain to

[06:36] create an investor presentation pitch

[06:38] deck for our text-to-speech startup called

[06:40] Glido. And I told it to just use mock

[06:42] data, but make sure that it's

[06:43] professional and engaging. And just like

[06:44] that, we have the deck, which I can open

[06:46] up full screen right here. We've got the

[06:47] voice AI platform that makes every brand

[06:49] sound human. And as I start to navigate

[06:51] through here, you can see that we also

[06:52] have animations in here. So, it's not

[06:54] just a static, you know, slide deck. We

[06:56] get to actually go through and we feel

[06:58] the animations. And think about the fact

[07:00] that this was just a one sentence

[07:01] prompt. If we really started to put more

[07:02] and more data into this thing, it would

[07:04] be really, really solid. And this right

[07:06] here is just one of the many use cases

[07:08] of Brain 2. So, it's not just a chatbot.

[07:10] Like I said, it can do things and you

[07:12] can build your own super agents in here.

[07:13] And what I think is really cool about

[07:14] the super agents is they're 24/7 agents.

[07:16] You can tag them in ClickUp. You know,

[07:18] you can at message them and they'll wake

[07:20] up and respond to you and they can

[07:21] search through everything. Which is why,

[07:22] in my opinion, it's a lot cooler that

[07:24] ClickUp is doing this compared to

[07:25] something like chucking an OpenClaw or

[07:27] Hermes agent into ClickUp because these

[07:30] agents already have full context and can

[07:32] search through everything. So, right

[07:33] now, because you're watching this video,

[07:35] you can claim this super awesome offer

[07:36] that is on screen right now by using the

[07:38] link in the description. Now, let's get

[07:40] back to the video. Yeah. Well, I am just

[07:42] I'm so glad that that we both took the

[07:44] leap because it's, you know, it's not an

[07:46] easy decision, but um your brain just

[07:49] gets it. And so, it's been great to see,

[07:51] you know, the consistency and what

[07:52] you've been up to. But I think that if

[07:55] you think back to, I don't know, 5 10

[07:58] years ago when people were going out to

[08:00] get their, you know, CS degrees and

[08:02] stuff, it's like that was such a safe

[08:03] bet at the time, you know, and I don't

[08:05] think a lot of people

[08:07] >> were predicting how much how quick that

[08:09] was going to flip as far as like, you

[08:11] know, that graphic of what is AI being

[08:12] applied to and right now it's just

[08:14] majority is coding and software

[08:15] engineering and obviously everything's

[08:16] going to catch up. But um it's just

[08:19] great that you were able to, you know,

[08:20] make that pivot and be ahead of the

[08:21] curve and now now here we are. So um

[08:24] being able to us have this conversation,

[08:27] one of us coming from like a

[08:28] non-technical background completely and

[08:29] one of us coming from a technical

[08:30] background is going to be really cool.

[08:32] So yeah, let's just jump right in.

[08:33] >> Yeah, sounds good. Cool. So for for what

[08:36] I have prepared for today, um you'll see

[08:39] like

[08:40] >> you'll see it shine through that I come

[08:42] from a technical background, but but

[08:44] really what it comes down to is like I'm

[08:46] going to bring these concepts into using

[08:48] Claude Code for far more than just coding

[08:50] like I alluded to at the start. And so I

[08:53] think um you know for me like I I really

[08:56] enjoy leaning on my technical expertise

[08:59] because a lot of the ways that you'll

[09:00] use an AI coding assistant for your ops

[09:03] your AIOS your second brain whatever you

[09:05] want to call it um you are going to be

[09:08] borrowing from software engineering

[09:09] principles whether you realize it or

[09:10] not. So a lot of times just as you learn

[09:13] how to use these tools effectively and

[09:14] you're just learning best practices from

[09:16] Nate's YouTube or Anthropic's blog or

[09:18] Boris Cherny or whoever like they're

[09:20] bringing software engineering principles

[09:21] and a lot of like product management

[09:23] manager principles as well. And so yeah,

[09:26] like some of the examples that I have

[09:28] here um that will cover like they're a

[09:30] little technical. Um but that's really

[09:32] just like to illustrate how how I

[09:35] started using this tool and then of

[09:36] course I'll like generalize things a lot

[09:38] as well and um give some specific

[09:41] examples too.

[09:42] >> Um so if you if you want Nate, I can

[09:44] just like dive right into the first part

[09:45] that we have here. Okay, cool. Yeah. So,

[09:47] I got like just quick over I mean we'll

[09:49] go pretty quick through this because I I

[09:51] want to keep this pretty casual and I

[09:53] know you you do as well, Nate, but just

[09:54] like a few different pillars here of how

[09:56] we can go from simply using cloud code

[09:58] to what a lot of people call vibe

[10:00] coding, you know, prompting and praying

[10:02] where you're you're pulling that lever

[10:04] like a slot machine, getting to the

[10:06] point where we're really directing it

[10:07] and having that system for reliable and

[10:09] repeatable results. And um it really can

[10:12] be simpler than you would think, right?

[10:15] Like most of what people do that you

[10:17] really shouldn't do is you throw in a

[10:20] request and you don't do much of the

[10:23] planning up front or the validation

[10:25] after. Like those are the two things

[10:27] that I really want to talk about here.

[10:28] And that applies to uh writing any code

[10:31] or any kind of application. It applies

[10:32] to evolving your system like as you're

[10:34] creating skills and integrations for

[10:37] Claude Code or even just using it to

[10:39] automate things in your business. Um,

[10:41] and so yeah, the approach is you always

[10:43] want to plan with context, build out

[10:46] that thing that you're looking to do,

[10:47] and then have an approach for verifying

[10:49] like as high level as I can possibly

[10:51] keep it. And then the other like kind of

[10:53] golden nugget here is every time you go

[10:56] through this loop with Claude Code, any

[10:58] kind of agentic workflow or thing that

[10:59] you're building, there's always going to

[11:01] be an opportunity at the end to evolve

[11:04] your system. And we'll talk about what

[11:06] that means in a little bit here, but

[11:08] like really that comes down to there's

[11:10] going to be something in the way you

[11:11] work with Claude Code that you can

[11:13] improve so that next time it's going to

[11:15] be better.

[11:16] >> And I'm being high level here on purpose

[11:19] cuz I'll get into some more examples.

[11:21] But a lot of people don't think about

[11:22] doing this, right? They kind of like get

[11:24] to the point where it's like, okay, my

[11:25] application works. Like this website

[11:27] looks good or it's now able to automate

[11:30] creating invoices, like whatever it is.

[11:31] And they're like, all right, we're done.

[11:32] like let's next time I want to create an

[11:34] invoice, I'm just going to go through

[11:35] the same process again. But like really

[11:37] there are going to be those problems

[11:38] that come up over time where you can

[11:40] engineer so that they happen less often,

[11:44] right? That that system evolution is

[11:45] kind of what I like to call it.

[11:46] >> So you're having you're having it learn

[11:48] just like you would an employee, right?

[11:50] >> Absolutely.

[11:50] >> Yeah. Like my my second brain, I

[11:52] literally call it my co-founder, right?

[11:54] So I want it to like learn me better

[11:56] over time and how I like to work, how I

[11:58] want it to work as well.

[11:59] >> Mh. Yeah. And I think this four-step

[12:02] kind of framework or whatever you want

[12:03] to call it, it yes, when you kind of

[12:06] maybe look at it like this, it might

[12:07] feel like it's a technical software

[12:09] engineering thing, but if you just

[12:10] relate that back to the same way you

[12:12] would maybe like let's just say build a

[12:13] treehouse, like you would plan that

[12:15] thing out first. You would draw it out.

[12:16] You would understand how much wood you

[12:18] need and where, you would get the right

[12:19] gear, and then once you've built it,

[12:20] you're not just going to put your kids

[12:21] on it. You're going to like test it.

[12:23] You're going to make sure that thing's

[12:24] not going to fall. So,

[12:25] >> um it's just a great way to think about

[12:27] it. And especially if you think about

[12:29] some of the the pitfalls that these

[12:31] models have with like the sycophancy,

[12:33] essentially just being a yes man.

[12:35] >> If you say, "Hey, you know, I want to do

[12:37] this. Does that look good?" And they're

[12:38] just going to say, "Yeah, it does."

[12:39] without really looking over the plan.

[12:41] And then

[12:42] >> on the verification side,

[12:44] >> you know, sometimes they do tell you

[12:45] something's done, but it's not. So

[12:47] having your own method of doing that as

[12:48] well,

[12:49] >> really important.

[12:50] >> By the way, guys, I know we are diving

[12:52] into a ton of information in this

[12:53] episode. So, what I did is I broke all

[12:55] of this down into a free resource guide

[12:57] that you can access for completely free

[12:58] by joining the free Skool community.

[13:00] The link for that is down in the

[13:01] description. Also, if you want to check

[13:02] out some of the key moments from this

[13:04] episode and all future podcasts on my

[13:06] channel, then go ahead and check out the

[13:08] AI Automation Society YouTube channel

[13:10] where we're going to be posting some of

[13:11] the best moments from the podcast over

[13:13] there. I'll link that YouTube channel in

[13:14] the description of this video as well.

[13:16] Anyways, thanks guys. Let's get back to

[13:17] the podcast.

[13:18] >> Yeah, verification really comes down to

[13:20] prove to me it's actually done and

[13:22] working,

[13:22] >> right? Right. And so like for any kind

[13:24] of coding task that's things like unit

[13:26] tests and linting and like that's where

[13:29] it gets a little bit more technical, but

[13:30] like really you can apply that to

[13:31] anything. Um like I this is an example

[13:33] that I'm going to spoil right now. Um I

[13:36] use Claude Code to generate this entire

[13:38] diagram. Like I have

[13:39] >> I had a feeling you did. Yeah.

[13:40] >> Yeah. Yeah. Yeah. So I have I have a

[13:42] skill. It's my Excalidraw diagram skill.

[13:44] I've covered it on my YouTube channel

[13:46] actually. So I use it to build this

[13:47] whole thing. And um I was going to talk

[13:50] about this example a bit more right here

[13:51] when we really get into like verifying

[13:53] the work. Um but I think it's just such

[13:55] a good like non there's nothing to do

[13:56] with coding here. It's just creating a

[13:58] diagram. But as far as

[13:59] verification goes, I actually have it

[14:02] take the Excalidraw diagram and render a PNG.

[14:06] So there's like an integration that I

[14:07] built into the skill for Claude Code. So

[14:09] it can render it as an image. And as a

[14:11] lot of you know, like Claude Code is able

[14:13] to understand images incredibly well

[14:15] now. for like the last year, it's been

[14:17] so good at um even viewing like a like

[14:19] if I zoom out here, like there's quite a

[14:21] bit of context, but like it can pick out

[14:23] the tiniest piece of text in a larger

[14:25] image like this. And so I have it look

[14:26] at that

[14:28] >> and then figure out like if there's any

[14:30] kind of like padding or spacing issues,

[14:32] like if there's any sort of overlap and

[14:34] and trust me, there was like it had to

[14:35] iterate a couple times to build

[14:37] something this big. Uh but then the the

[14:39] point is like it is able to iterate by

[14:41] itself. So, we don't really care about

[14:43] the initial mess ups that it has. As

[14:45] long as it like does that by itself, we

[14:47] just care about that that last thing it

[14:48] hands back to us when it says it's done.

[14:50] So, if we have this if we have this step

[14:52] when it says it's done, then like it

[14:54] actually is or at least it's closer. I

[14:55] mean, it's still probably not going to

[14:56] be perfect, but you get the idea. Yeah.
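
[Editor's sketch] The render, inspect, and iterate loop described above can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual skill: `build_until_verified`, `toy_build`, and `toy_check` are hypothetical names, and the "diagram" is just a string standing in for a rendered PNG.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VerifyResult:
    attempts: int
    passed: bool
    issues: List[str]


def build_until_verified(
    build: Callable[[List[str]], str],
    check: Callable[[str], List[str]],
    max_attempts: int = 5,
) -> VerifyResult:
    """Rebuild until `check` reports no issues or the attempts run out."""
    issues: List[str] = []
    for attempt in range(1, max_attempts + 1):
        artifact = build(issues)   # previous issues are fed back to the builder
        issues = check(artifact)   # stands in for "render a PNG and look at it"
        if not issues:
            return VerifyResult(attempt, True, [])
    return VerifyResult(max_attempts, False, issues)


# Hypothetical stand-ins: the first draft overlaps, the revision does not.
def toy_build(previous_issues: List[str]) -> str:
    return "diagram" if previous_issues else "diagram with overlap"


def toy_check(artifact: str) -> List[str]:
    return ["overlapping boxes"] if "overlap" in artifact else []


result = build_until_verified(toy_build, toy_check)
print(result)
```

Only the final result is handed back, which matches the point made in the conversation: the intermediate failed attempts do not matter as long as the agent catches them itself.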

[14:58] >> Yeah. 100%. I've done something pretty

[15:00] similar with my video editing pipeline

[15:02] with the motion graphics it adds and

[15:03] sometimes things would be out of bounds.

[15:05] But like you said, the whole idea is

[15:07] it's almost never going to be 100% on

[15:10] that first pass, but without the

[15:11] verification checks, maybe it's 65 or

[15:13] 70, but now you can get something that

[15:15] is 92 on the first pass.

[15:17] >> Right. Exactly. Yeah. Yeah. It's it's

[15:20] good. So I mean verification,

[15:22] validation, whatever you want to call

[15:23] it, like that is one of the biggest

[15:24] things that I'm focusing on right now

[15:27] for any kind of application or

[15:29] automation that I'm creating. I want

[15:31] some kind of harness for the coding

[15:33] agent to be able to validate its own

[15:35] work for code to validate its own work.

[15:37] And for some things like um website

[15:40] design, it's actually pretty easy.

[15:42] There's a lot of tools out there uh

[15:44] maybe you've heard of Playwright or

[15:45] Vercel's agent browser um for it to

[15:48] really just spin up the site, right? It

[15:50] can run the command to start the website

[15:52] and then it can visit it just as a user

[15:54] would take screenshots along the way to

[15:57] prove things to you or even just view

[15:58] the the UI itself. It's pretty easy for

[16:01] other kinds of things that you'll build.

[16:03] Uh, it can be kind of hard to have the

[16:06] agent really verify its own work

[16:07] effectively. One like really simple

[16:09] example kind of silly example. Um I in

[16:12] my spare time like I've always loved

[16:14] like video games as a kid. I mean like I

[16:16] talked about with Scratch. I mean I was

[16:17] building like Pokemon and and uh Mario

[16:20] Bros and stuff. And so like I've

[16:21] actually like been doing a little bit of

[16:23] just trying to I mean I hate to admit it

[16:24] but Vibe Code video games, right? It's

[16:26] just a hobby. I'm not trying to like do

[16:28] something too crazy and it's more just

[16:29] like having it run in the background for

[16:31] fun. But like one of the things I had to

[16:32] think about is like how do I build a

[16:34] harness for the coding agent to be able

[16:36] to actually play the video game. It's a

[16:38] bit trickier because they can't like

[16:40] coding agents they need time to think,

[16:42] right? So if you have a game that's

[16:43] running at 60 frames per second, it's

[16:45] not really going to be able to react to

[16:47] things the way that a human would. So

[16:49] thinking about a system where it can

[16:50] basically like slow down the frame rate.

[16:52] I know it's kind of like a silly

[16:53] example, but it's just like that's one

[16:54] of the biggest things you have to

[16:55] engineer for for anything is like how

[16:58] would the agent actually verify that as

[17:00] a user would because just like looking

[17:02] at the code it creates or the skill it

[17:03] builds for you like that's not enough

[17:05] for it to just do that sort of like

[17:07] review, high-level review, which is good, but

[17:09] like you need a way for it to really

[17:11] like use the application or whatever

[17:13] you're making as you would.
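
[Editor's sketch] One way to picture the "slow down the frame rate" idea is a game that only advances when the agent explicitly asks for the next frame. The class and function names below are hypothetical, and the one-dimensional game is a toy, not anything shown in the episode.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SteppedGame:
    """Toy game that advances exactly one frame per call to step()."""
    player_x: int = 0
    goal_x: int = 3
    frame: int = 0

    def observe(self) -> Dict[str, int]:
        # What the agent gets to look at before choosing its next action.
        return {"frame": self.frame, "player_x": self.player_x, "goal_x": self.goal_x}

    def step(self, action: str) -> Dict[str, int]:
        if action == "right":
            self.player_x += 1
        elif action == "left":
            self.player_x -= 1
        elif action != "wait":
            raise ValueError(f"unknown action: {action}")
        self.frame += 1
        return self.observe()

    def won(self) -> bool:
        return self.player_x == self.goal_x


def simple_policy(state: Dict[str, int]) -> str:
    # Stand-in for the agent's reasoning; it can take as long as it needs.
    if state["player_x"] < state["goal_x"]:
        return "right"
    if state["player_x"] > state["goal_x"]:
        return "left"
    return "wait"


game = SteppedGame()
state = game.observe()
while not game.won() and game.frame < 10:
    state = game.step(simple_policy(state))

print(state, game.won())
```

Because time only moves inside `step()`, the agent is never racing a 60 frames-per-second clock.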

[17:15] >> Yeah, absolutely. And real quick, for

[17:16] anyone that might not have heard the

[17:18] term harness before, what is your kind

[17:20] of quick definition of that?

[17:22] >> Yeah. No, that's good. I know it gets it

[17:24] gets technical,

[17:26] >> right? Yeah. So, um, usually when people

[17:29] talk about harnesses, they're talking

[17:31] about something more like what I was

[17:32] going to talk about a bit here at the

[17:34] end. Um, so what I'm talking about as

[17:37] far as like validation is more like

[17:40] I mean it's it's kind of I I have to

[17:43] think about like how to actually explain

[17:45] what a harness is really. It's it's the

[17:47] wrapper around the large language model,

[17:49] the tools and context that it has access

[17:52] to. So it knows what it's working on and

[17:55] how to work on it effectively. So if we

[17:58] think of like a harness for AI coding,

[18:00] Claude Code is actually a harness, right?

[18:02] like it when you download Claude Code

[18:04] and you run it, it loads a system prompt

[18:06] on top of Claude as a large language

[18:08] model. It gives it the tools so it can

[18:10] run commands and create files on your

[18:13] computer. Um, that's what really makes

[18:15] it a harness. And and then when I was

[18:17] giving the example of like a harness for

[18:19] testing, it's more like uh giving it a

[18:22] system where it's like, okay, these are

[18:23] the commands I can run to start the game

[18:25] and then like slow down the frame rate

[18:27] so that I can interact with it frame by

[18:28] frame and like really stop and analyze

[18:31] and think before I take another action.

[18:33] So it you can think of it kind of like a

[18:35] so I mean maybe I will just jump ahead

[18:37] here. You can think of the harness as

[18:38] the thing that just wraps the model. And

[18:40] then there's also that that component of

[18:42] the harness that you get to build

[18:44] yourself. I call it the AI layer. And so

[18:46] for Claude Code, that's like your

[18:47] CLAUDE.md and your skills and your hooks

[18:50] and any kind of MCP servers that you're

[18:51] bringing in to connect it to your other

[18:53] platforms like your CRM or your task

[18:56] management software, right? That's

[18:57] that's building on top of the harness.

[18:59] So it's kind of like the large language

[19:01] model is the reasoning. It's it's the

[19:02] brain at the center and then you pick

[19:04] the tool like Claude Code or Codex or

[19:07] whatever and then you can sort of like

[19:09] build the context and integrations on

[19:12] top.
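
[Editor's sketch] The "wrapper around the model" definition can be made concrete with a toy loop: the harness supplies a system prompt and tools, calls the model, runs any tool it asks for, and feeds the result back. Everything here is a hypothetical illustration (`run_harness`, `fake_model`, the `echo` tool); it is not how Claude Code is implemented.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any function from context lines to (kind, payload),
# where kind is "tool" or "final". Real models are far richer than this.
Model = Callable[[List[str]], Tuple[str, str]]
Tool = Callable[[str], str]


def run_harness(
    model: Model,
    system_prompt: str,
    tools: Dict[str, Tool],
    task: str,
    max_turns: int = 5,
) -> str:
    context = [f"system: {system_prompt}", f"user: {task}"]
    for _ in range(max_turns):
        kind, payload = model(context)
        if kind == "final":
            return payload
        tool_name, _, arg = payload.partition(" ")
        output = tools[tool_name](arg) if tool_name in tools else "unknown tool"
        context.append(f"tool {tool_name}: {output}")  # tool result becomes context
    return "stopped: turn limit reached"


def fake_model(context: List[str]) -> Tuple[str, str]:
    # Asks for one tool call, then finishes once it sees a tool result.
    if any(line.startswith("tool ") for line in context):
        return ("final", "done: " + context[-1])
    return ("tool", "echo hello")


tools: Dict[str, Tool] = {"echo": lambda arg: arg.upper()}
answer = run_harness(fake_model, "You are a careful assistant.", tools, "say hello")
print(answer)
```

In this picture, the "AI layer" the speaker describes would correspond to what you pass in: the extra instructions, tools, and integrations layered on top of the loop.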

[19:12] >> Absolutely. I love it. Yeah, well said.

[19:14] I think something something fun anyone

[19:16] listening should try real quick is if

[19:18] you go to an AI model and ask it to

[19:19] explain an AI harness or an agent

[19:21] harness. I would be willing to bet it

[19:23] does the whole car analogy where the

[19:25] engine is the AI model and the car is

[19:27] the harness. So let me know if you guys

[19:29] run that and and see if that's what you

[19:31] get.

[19:32] >> Sounds good. I mean we could we could

[19:33] test it right now.

[19:36] >> No, we won't we don't need to do it

[19:37] right now. But yeah, that's that's your

[19:38] homework for today.

[19:39] >> Yeah.

[19:40] >> Yeah. Yeah. Cool. Um Yeah. Yeah. So, I

[19:44] mean, we've talked about like validation

[19:46] a lot. Um, planning is the other thing

[19:48] that I really want to hit on cuz most

[19:50] people don't do enough of it.

[19:52] >> And it takes it takes patience. And this

[19:54] is like one of those um software

[19:56] engineering disciplines that I like to

[19:58] bring into um even when I'm talking to

[20:00] someone who's not writing code or who

[20:01] isn't technical is you have to spend I

[20:04] mean with coding agents you spend more

[20:06] time planning than you actually do

[20:07] building because you you really put a

[20:10] lot of your effort up front into the

[20:12] plan and then you use that to delegate

[20:14] as much of the coding as you possibly

[20:16] can or for a lot of us all of the coding

[20:18] to the AI coding assistant. And so its

[20:21] success is really just dependent on how

[20:22] good is your plan. Usually you have some

[20:24] kind of like a lot of people like using

[20:26] markdown, right? I use markdown a lot.

[20:27] So I'll have like a single markdown

[20:29] document that outlines um you know like

[20:32] the goal. What are we building here?

[20:33] What does success actually look like? And

[20:35] like of course with that comes the

[20:36] validation strategy um that we've

[20:38] already talked about. So how does it

[20:40] know that uh the work is done and

[20:42] working well? And then um not to get

[20:45] like too technical here, but especially

[20:47] more for any kind of like coding task,

[20:49] you're going to have like the

[20:50] integration points, right? Like if

[20:51] you're building on top of an existing

[20:53] automation or application or website,

[20:55] whatever, like what are the parts of the

[20:56] codebase that we actually have to touch?

[20:58] And so if you are more technical, you

[21:01] can sort of evaluate like make sure it's

[21:03] understanding is correct, like,

[21:04] okay, what files are we really going to

[21:06] create and edit here? Not that you need

[21:08] that. Um, and then once you have that

[21:12] plan, then this is kind of what my

[21:14] workflow looks like. And then this is

[21:16] for anything. So you do some kind of

[21:17] like context loading up front, any sorts

[21:19] of like documents that your agent needs

[21:22] related to the task at hand. And then

[21:24] I'll typically have it do some kind of

[21:26] research, usually using sub agents for

[21:28] that. So if I'm building a new

[21:30] application, maybe I'll have one sub

[21:31] agent research what's a good tech stack

[21:33] for this. What's a good like approach if

[21:36] there are people that have built similar

[21:37] applications, right? So like especially

[21:40] if you're not as technical, that can be

[21:41] really useful for it to just gather a

[21:43] lot of information and then propose a

[21:45] plan to you. And so that's when you you

[21:47] create the plan with the coding agent.

[21:49] This is also where usually you want to

[21:51] have the coding agent ask you a lot of

[21:53] questions. Like I know Nate, you just

[21:54] put out a video today on uh Matt Pocock's

[21:56] grill me skill, which is really good.

[21:59] Like you need to make sure that you that

[22:01] the coding agent is not assuming a ton

[22:03] of things about what you want it to do,

[22:05] like the workflow you want to build, the

[22:06] skill you want to build, whatever. And

[22:08] so having it ask you a lot of questions,

[22:10] clarify those things is good. So that

[22:12] way you can be confident that once you

[22:14] have that final plan like this is about

[22:16] this is what we're going to go and do

[22:18] now that both you and the coding agent

[22:20] are aligned on what's actually going to

[22:22] be done and and how you're going to

[22:23] validate it.
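
[Editor's sketch] The plan sections mentioned above (goal, what success looks like, validation strategy, integration points) can be captured in a small template that renders to Markdown. The field names, the `ready_to_build` rule, and the example content are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    goal: str
    success_criteria: List[str]
    validation_strategy: List[str]
    integration_points: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        sections = [
            ("Goal", [self.goal]),
            ("Success criteria", self.success_criteria),
            ("Validation strategy", self.validation_strategy),
            ("Integration points", self.integration_points),
            ("Open questions", self.open_questions),
        ]
        lines: List[str] = []
        for title, items in sections:
            if not items:
                continue  # skip empty sections
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"

    def ready_to_build(self) -> bool:
        # Assumed rule: do not delegate until questions are answered
        # and there is a defined way to prove the work is done.
        return (
            bool(self.success_criteria)
            and bool(self.validation_strategy)
            and not self.open_questions
        )


plan = Plan(
    goal="Add PNG export to the diagram skill",
    success_criteria=["A PNG is produced for every diagram"],
    validation_strategy=["Render the PNG and check for overlapping text"],
    integration_points=["Diagram skill folder"],
    open_questions=["Which image size is required?"],
)
print(plan.to_markdown())
print(plan.ready_to_build())
```

The open-questions list is where the "have the agent ask you a lot of questions" step would land; the plan is not treated as final until that list is empty.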

[22:25] >> Absolutely. Yeah. I love it. When you do

[22:27] that, are you typically using Claude

[22:29] Code plan mode or are you kind of

[22:30] planning but not in plan mode?

[22:33] >> Yeah, usually I don't use plan mode.

[22:35] Okay. It's It's good, but plan mode like

[22:38] puts Claude Code into a bit of a

[22:39] different behavior that I'd rather be

[22:40] able to control more myself.

[22:43] So, my skill for planning is like

[22:46] instructions for how I want it to ask me

[22:48] questions and then just like generally

[22:49] how I want to go about researching and

[22:52] organizing things into a plan.

[22:54] >> Yeah.

[22:54] >> And so, like I want to define the

[22:56] sections. If you don't, then you're just

[22:58] using Claude Code's plan mode. Like it'll

[23:00] build something actually pretty much

[23:02] like this. But I just like having that

[23:05] more um that that higher level of

[23:07] control. I think that's a theme that you

[23:09] get a lot through my content in general

[23:11] is that I I like to have control and

[23:13] customizability cuz in the end that's

[23:15] how you get the best results. It's just

[23:16] it's kind of like that learning curve to

[23:18] get to the point. Um like for example, I

[23:20] I don't use OpenClaw or Hermes. I have

[23:23] my own second brain that literally is

[23:24] just built directly on top of Claude

[23:26] Code. And I'm a big proponent of that

[23:28] even though those other open source

[23:30] tools are very powerful because you're

[23:32] running something that you don't

[23:33] understand and it's harder for you to

[23:35] like really take as your own and it's

[23:37] not like a foundational component that

[23:39] you can create your own system on top

[23:41] of. So you're more like adopting someone

[23:43] else's system. And these tools have done

[23:45] a really really good job making it easy

[23:47] to extend and and really make your own.

[23:50] But like in the end building something

[23:51] from the ground up is always going to

[23:53] give you the most control even though

[23:54] that can be pretty daunting. Yeah, I

[23:57] hear you. Yeah, that's interesting. I

[23:58] mean, it it really does make sense. I

[24:00] always love, you know, that's something

[24:02] I just say a lot, which is a very simple

[24:04] theory is just to be genuinely curious

[24:07] to understand what's going on,

[24:08] especially when I don't understand what

[24:12] these lines of Python code that it that

[24:13] just got written mean, you know, and the

[24:15] whole idea of dark code.

[24:18] >> And I guess what do you think about that

[24:20] whole idea because I know you talk a lot

[24:21] about vibe coding and and preaching

[24:23] understanding things at their core. So

[24:25] when someone is generating automations

[24:28] or code that they don't

[24:31] understand how to read,

[24:33] >> yeah,

[24:33] >> how do they actually feel secure and

[24:35] safe about that?

[24:37] >> Yeah, that's a really good question. So

[24:39] >> pretty loaded, too.

[24:40] >> No, I'm No, that's that's good. I I

[24:42] welcome it. So I I'll answer in two

[24:44] ways. I'll answer first by saying that

[24:47] like maybe not everyone loves to hear

[24:49] this but like if you are using an AI

[24:51] coding assistant to write code cuz

[24:53] you're building your second brain you're

[24:55] creating automations whatever it is I

[24:57] would recommend at least trying to get

[25:01] to the point where you can understand

[25:02] the code and really at first that can be

[25:04] as simple as just asking Claude Code or

[25:07] whatever coding agent to explain what it

[25:09] just wrote because code can look pretty

[25:12] intimidating but when you get over that

[25:15] like initial hump like it kind of reads

[25:18] like English and maybe that's just me

[25:20] being extremely ignorant because I've

[25:21] lived and breathed it since I was eight

[25:23] years old but it starts like as long as

[25:25] you understand the core primitives of

[25:26] like this is a class this is a while

[25:28] loop, this is an if statement, like it

[25:30] starts to read like English you're like

[25:31] okay I understand when this part of the

[25:33] code is going to execute now just asking

[25:36] your coding agent constantly and so um I

[25:39] mean like in Claude Code there's the

[25:40] slash "by the way" feature so like you can

[25:43] always just kind of have a sidecar

[25:44] conversation where it's like, "Hey, help

[25:46] me understand like what the heck is

[25:47] going on right here." And then it

[25:48] doesn't have to to dilute your main

[25:50] context and just kind of like keep

[25:53] throwing context at Claude Code. Like you

[25:55] can have that separate conversation for

[25:56] your own understanding and then go back

[25:58] to the main task at hand without it

[26:00] being affected. So I would recommend

[26:02] that. And then you know if someone is

[26:04] really not inclined to learn how to code

[26:07] like that's just not your goal. You want

[26:09] to use Claude Code to automate things and

[26:11] not have to like engineer applications.

[26:13] I totally get that as well. Really comes

[26:15] down to your validation strategy is

[26:18] what's going to dictate how confident

[26:19] you can really be in what is created.

[26:21] So if you're spending a lot of time in

[26:23] this is why I say like whenever you're

[26:24] building something with Claude Code, the

[26:26] way that you don't vibe code is that you

[26:28] sandwich the delegation of the coding

[26:32] between the planning and the validation

[26:34] process that you're heavily involved

[26:36] with. Right? Like the only reason I'm

[26:39] ever going to say, "All right, Claude,

[26:40] go rip through this." is because I made

[26:42] sure I created a really detailed spec

[26:44] and I've defined like this is how you're

[26:46] going to tell me that you're done and

[26:48] how you can be confident that you

[26:50] actually are.

[26:52] >> I love it. Very well said. Nothing to

[26:54] nothing to add there.

[26:55] >> Cool. All right. Sounds good. Yeah. Um

[26:58] Yeah. And as far as like creating that

[27:00] plan with the coding agent, the most

[27:03] important thing is to manage the context

[27:06] like what your coding agent is going to

[27:08] really be paying attention to at the

[27:11] start of any kind of planning session.

[27:13] So the the thing here is that attention

[27:15] is scarce. And so there's a big

[27:18] misconception right now for a lot of

[27:20] people where they think that like it

[27:22] doesn't really matter how much you throw

[27:23] at a coding agent because everyone is

[27:25] hearing nowadays how like large language

[27:27] models can support up to 1 million

[27:29] tokens in their context when they're

[27:31] like oh that that's like the Harry

[27:33] Potter book five times over I forget the

[27:35] exact but people like always throw like

[27:38] some some analogy where it just like

[27:40] makes it pretty obvious where it's like

[27:41] 1 million tokens is an insane amount of

[27:43] information and it actually is but

[27:46] there's two massive caveats here. The

[27:48] first one is that that context will go

[27:50] way faster than you think because if

[27:52] it's reading through um a bunch of

[27:55] skills that you set up for it or a bunch

[27:56] of code that can be tens or hundreds of

[27:59] thousands of tokens very quickly and

[28:01] then the other thing is uh large

[28:04] language models have what's called the

[28:05] dumb zone. And so you have the the

[28:09] little bit of context up front. Maybe I

[28:10] can just draw like a quick little

[28:12] analogy here. So if like this is Oh,

[28:14] that is a fat marker. Um, hold on. Okay,

[28:18] I I give up already. I'm not going to

[28:20] try that. Okay, so you you have to

[28:22] imagine this with me here, but imagine

[28:23] you have a box that represents the the

[28:26] LLM's context window. You have that

[28:28] initial part at the start of the

[28:31] conversation up to the first, you know,

[28:32] 100 or 200,000 tokens where the large

[28:34] language model feels very sharp or at

[28:36] least it feels like it's at its best.

[28:39] Once the conversation surpasses that

[28:41] first 100 200,000 tokens, obviously it

[28:44] uh depends on the model when you reach

[28:46] the dumb zone, you get to the point

[28:47] where it just feels like it's overloaded

[28:50] with information and it starts missing

[28:51] things and making mistakes that seem so

[28:54] obvious to you or like the kind of thing

[28:56] where you're like, if I had had a fresh

[28:58] context here, like there's no way it

[29:00] would have made that mistake. Like it

[29:01] writes a really bad line of code or it

[29:04] uh doesn't use a skill that you thought

[29:07] it should have known to use. right? Like

[29:09] that kind of thing if it's in the middle

[29:10] of a larger workflow.

[29:12] >> And so that that's why I say attention

[29:14] is scarce. Like don't don't get under

[29:16] that false notion that you don't really

[29:18] have to care about how much you give it.

[29:20] Like if you're trying to have it handle

[29:22] a larger workflow, you still have to you

[29:25] have to be very careful like what you

[29:26] give it up front versus what you allow

[29:28] it to discover when it actually needs.

[29:30] And like that's one of the most

[29:31] important things with skills with Claude

[29:33] is you're giving it procedures and best

[29:35] practices, but it gets to decide like,

[29:37] okay, now I need to rely on this process

[29:40] or this information. So you're not just

[29:42] dumping a bunch of things up front. A

[29:44] lot of people do that. Like even with

[29:46] MCP servers back in the day, they would

[29:48] they would connect their like 20 MCP

[29:51] servers to Claude Code and each one of

[29:52] them was was uh filling the context with

[29:55] like 20,000 tokens up front of

[29:58] information because it has like all the

[30:00] tool calls or the tools that come with

[30:02] the MCP server. And so their large

[30:04] language model would always act super

[30:06] dumb. And so they're like, I'm using the

[30:08] latest Opus. Like why am I getting

[30:10] terrible results? And it's really it

[30:11] comes down to just how much of the

[30:14] context is filled right away. Yeah. Oh

[30:16] my gosh. It drives me nuts. It It truly

[30:18] drives me crazy when you hear people

[30:21] blaming the model when it really is kind

[30:23] of a skills problem. And we see this at,

[30:28] you know, when you look at these studies

[30:30] and surveys too about business adoption

[30:32] >> where it really is these people either

[30:35] have not yet felt the ROI because they

[30:38] can't they don't know enough about how

[30:40] to use it truly,

[30:42] >> right? And also people claiming that

[30:44] they have the skills to, but they're

[30:46] just not doing it. And like the adoption

[30:48] is then another problem. But I mean,

[30:50] obviously I'm not doing heavy heavy

[30:53] coding, building software and and apps.

[30:56] But, you know, we're doing some pretty

[30:57] cool things and I've seen some people do

[30:58] some really awesome things and it's just

[31:00] >> yeah,

[31:00] >> there's a lot of things like you know,

[31:02] if you kind of think about your your

[31:03] diagram that you had, you got the model

[31:05] in the middle, you got the agent

[31:06] harness around that and then obviously

[31:08] a huge layer is what you put in there as

[31:10] well and the way that you manage your

[31:12] stuff. And I think that the 1 million

[31:14] context window specifically for you know

[31:16] like let's just say Opus 4.8 at the

[31:17] moment. Obviously, it's great, but it

[31:19] definitely comes with a false sense of

[31:21] security with people now thinking that

[31:23] they have the million, but when, and I

[31:27] know this might be outdated by next

[31:29] month or two months away, but let's say

[31:31] right now when you're in Claude Code,

[31:32] >> when do you typically

[31:34] >> do your compact or a session handoff and

[31:37] clear and when do you get out of there?

[31:40] Yeah. So, with Opus right now, it's

[31:43] usually around 250,000 tokens, and I

[31:46] feel like it gets into the dumb zone.

[31:47] >> That's my exact number, too.

[31:48] >> Oh, really? Okay. Yeah. Yeah. Good.

[31:50] Cool. So, and that, by the way, is like

[31:51] really subjective. Like, I'm not going

[31:53] to um bet a million dollars on, like,

[31:58] Boris Cherny or someone saying

[31:59] like, "Yeah, it's also 250,000." Like,

[32:01] >> quarter million is just clean, right?

[32:03] >> Yeah. It just it sounds good and it is

[32:05] like pretty accurate. I would say like

[32:06] Opus 4.7 was around like 200,000 and

[32:10] then like Sonnet 4.6 is like honestly

[32:13] probably only like 100 125,000. Um like

[32:16] it as you go to these smaller models

[32:19] like the dumb zone becomes a pretty

[32:21] small amount of context relative to like

[32:23] what it theoretically can handle. You

[32:25] just never want to get to that point. So

[32:27] then with the dumb zone thing, I've also

[32:29] heard stuff about the model being really

[32:31] good at remembering things that are at

[32:33] the front and the very end and the

[32:34] middle is where it loses. So where does

[32:36] that play into the whole dumb zone

[32:37] conversation?

[32:39] >> Yeah. So basically that issue is just

[32:42] amplified the more you get into the dumb

[32:44] zone. Yeah. And um yeah, as far as like

[32:46] I mean we don't have to get into like

[32:47] the super technical details for how the

[32:49] attention mechanism works for LLMs, but

[32:51] yeah, you can think of I mean like the

[32:52] analogy I always like to use is the

[32:54] needle in the haystack problem. Yeah,

[32:55] >> like if you have that like little piece

[32:57] of information that you want the agent

[32:59] to remember in the middle of a massive

[33:01] conversation, it's like trying to find a

[33:03] needle in the haystack. Like you can't

[33:04] expect the model to just because of the

[33:06] way that large language models are

[33:08] engineered. Um you can't expect it to

[33:10] like always be able to pick out that

[33:11] little piece of information.
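
[Editor's sketch] The context numbers quoted in this part of the conversation can be put into a rough budget. The figures are the speakers' own subjective estimates (about 250,000 tokens before the dumb zone for Opus, 20 MCP servers at roughly 20,000 tokens each); the function names are hypothetical.

```python
# Figures quoted in the conversation; both speakers call them subjective.
CONTEXT_WINDOW_TOKENS = 1_000_000
DUMB_ZONE_TOKENS = 250_000


def upfront_tokens(mcp_servers: int, tokens_per_server: int, other_tokens: int = 0) -> int:
    """Tokens consumed before the agent has done any work."""
    return mcp_servers * tokens_per_server + other_tokens


def headroom(used: int, threshold: int = DUMB_ZONE_TOKENS) -> int:
    """Tokens left before the assumed dumb-zone threshold (negative = already past it)."""
    return threshold - used


used = upfront_tokens(mcp_servers=20, tokens_per_server=20_000)
print(used)                          # 400000
print(headroom(used))                # -150000
print(used / CONTEXT_WINDOW_TOKENS)  # 0.4
```

Under these assumptions the setup uses only 40% of the advertised window, yet it is already 150,000 tokens past the point where the speakers say quality drops, which is the "false sense of security" being described.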

[33:13] >> 100%. Yeah.

[33:15] >> Yeah. I wish you could. That would be

[33:17] nice if there wasn't such a thing as a

[33:18] dumb zone. It would make it much more

[33:21] convenient for us to hand it massive

[33:23] tasks and let it just rip through

[33:24] things. But a lot of the reason we have

[33:26] to create a harness and like a lot of

[33:28] the things I'm focusing on right now on

[33:30] my channel and just like generally what

[33:32] I'm building is creating harnesses that

[33:35] build a workflow that can bind multiple

[33:38] coding agent sessions together. And so

[33:40] basically it's like one model does the

[33:42] planning and then my orchestrator will

[33:44] like automatically take that handoff

[33:47] document like the plan and then feed it

[33:49] into another agent for implementation.

[33:51] And then when the implementation is

[33:52] done, it'll create like an execution

[33:54] report and then it'll hand that off to

[33:56] the next agent to validate things and do

[33:58] a code review. And it might sound like

[34:00] like that's a lot of engineering and it

[34:02] is, but it's very necessary right now

[34:04] because if you're trying to do any kind

[34:06] of like real work for like production

[34:08] grade software or building an automation

[34:10] that's like critical for your business,

[34:11] you can't just throw the whole thing at

[34:13] a single Claude Code session unless you

[34:16] can like confidently build it in that um

[34:18] that zone that you have before you get

[34:20] to the dumb zone. And most of the time

[34:21] you just can't do that or at least you

[34:23] can't really trust that's going to be

[34:24] the case because you never know how much

[34:25] it's going to have to iterate on

[34:27] something. And

[34:27] >> so that's why I'm really like I guess

[34:30] you could say bullish right now on um

[34:32] harness engineering which is like

[34:34] building the workflow that uh

[34:37] orchestrates many coding agent sessions

[34:39] to handle a larger task. And like a

[34:42] really basic example of that kind of

[34:44] harness is the Ralph loop. It went like

[34:46] super viral at the start of this year.

[34:48] Um so I feel I feel like even if you

[34:50] haven't heard too much about harness

[34:51] engineering you probably have at least

[34:52] heard of the Ralph loop. And that's like

[34:54] really like the foundation of that kind

[34:56] of harness, right? Like the Ralph loop

[34:58] is stringing together multiple coding

[35:00] agent sessions. I I wish I had uh one of

[35:02] my diagrams up for this right now. I'll

[35:04] just have to explain it verbally but

[35:06] like you know basically you have the

[35:07] first Claude Code session read in your

[35:10] larger spec for like a bigger automation

[35:12] you want to build and then um it'll

[35:15] define like the the task list like first

[35:18] phase is this second phase is this and

[35:20] then it'll have many coding agents

[35:22] handle one phase at a time but it'll

[35:24] like do it all automatically in a loop.

[35:26] That's why it's called a Ralph loop cuz

[35:28] like agent one will do phase one and

[35:29] then it'll write up its little report

[35:31] like its handoff to the second agent

[35:32] that'll continue the work. And like the

[35:35] main reason the Ralph loop matters is

[35:37] because it you can't have one agent

[35:40] handle that larger task without it

[35:42] getting into the dumb zone and like you

[35:44] know halfway through phase two, right?

[35:46] Like you have to break things up.
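
[Editor's sketch] The phase-by-phase loop described above can be sketched as a function that starts a fresh session for each phase and passes along only the spec and the previous handoff report. This follows the verbal description in the episode, not any particular published implementation; `run_phases` and `fake_session` are hypothetical names, and the fake session stands in for launching a real coding agent.

```python
from typing import Callable, List


def run_phases(spec: str, phases: List[str], run_session: Callable[[str], str]) -> List[str]:
    """Run each phase in its own session, handing off only the last report."""
    handoff = "Nothing done yet."
    reports: List[str] = []
    for phase in phases:
        # Each prompt is small and self-contained, so no session inherits
        # the full history of the sessions before it.
        prompt = f"Spec: {spec}\nPrevious handoff: {handoff}\nYour phase: {phase}"
        report = run_session(prompt)
        reports.append(report)
        handoff = report
    return reports


prompts_seen: List[str] = []


def fake_session(prompt: str) -> str:
    # Stand-in for a fresh agent session; records what it was given.
    prompts_seen.append(prompt)
    phase = prompt.rsplit("Your phase: ", 1)[1]
    return f"{phase} complete"


reports = run_phases("quote automation", ["phase 1", "phase 2", "phase 3"], fake_session)
print(reports)
```

The design choice that matters is that the third session sees the report from phase 2 but nothing from phase 1 directly, which keeps every session's context short.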

[35:47] >> Yeah. So it sounds like from like a a

[35:50] high level view the idea or kind of the

[35:53] mindset that you've got like this

[35:55] assembly line and you have an agent

[35:58] doing something. Each agent kind of does

[35:59] one thing really well and hands over

[36:02] their input to the next agent in a way

[36:04] where

[36:05] >> the agent has enough context to

[36:06] understand what has been done and what

[36:08] is left to do and what its current job

[36:10] is.

[36:12] >> Yeah, exactly. Yeah, assembly line is a

[36:14] a really good analogy and um I mean that

[36:17] that applies to a lot more than than

[36:20] just writing code. Um like like one

[36:22] example that comes to mind when I think

[36:24] about like cuz I I know that I've been

[36:26] talking about like coding as an example

[36:28] for a lot of things, but um I I work

[36:30] with a lot of companies that are in sort

[36:32] of like the like B2B side of things. And

[36:36] when you're B2B, like you do a lot of um

[36:40] creating quotes, like estimates, right?

[36:42] like you have um construction company or

[36:45] uh like I've worked with companies in

[36:47] the print industry where like they'll

[36:48] have like a request for like all right

[36:50] make me like 100,000 flyers or whatever

[36:52] and like for those companies one of the

[36:54] biggest opportunities for them to use AI

[36:56] is to use something like Claude to help

[36:59] them take in a request and automatically

[37:02] create an estimate like a quote for how

[37:05] much that uh job's going to cost

[37:07] >> cuz that's like a really really

[37:09] laborious job like more than you would

[37:11] think Like when I when I've talked to

[37:12] these companies, like it's crazy how

[37:14] much work goes into that because they

[37:16] have to like take the request and they

[37:18] have to understand like how much labor

[37:20] goes into you know parts obviously like

[37:21] depending on the industry and then they

[37:23] have to do research on like the latest

[37:24] prices for things and making sure

[37:26] they're getting it from the right

[37:27] vendor. Like there is so much that goes

[37:28] into that and so like that kind of thing

[37:31] um, it's like a really good example

[37:33] like nothing to do with creating code

[37:35] still using something like Claude Code.

[37:37] You can use coding agents for this to go

[37:39] through that larger workflow of like

[37:41] looking at their inventory, looking at

[37:43] prices, comparing vendors, um, all based

[37:46] on what's going to be needed to

[37:47] accomplish that task like that remodel

[37:50] that the 100,000 flyers for whatever

[37:52] that request is from the other company

[37:55] and then creating that estimate and then

[37:57] understanding how the company works like

[37:59] what kind of padding they want on top of

[38:01] um, based on the labor and the cost

[38:03] for the parts or whatever. Like there's

[38:05] a lot that goes into that. And so like

[38:07] that's the kind of thing where like

[38:08] you'd build a workflow where you have

[38:10] one agent that's going to research

[38:12] inventory, one agent that's going to

[38:14] look at prices and and compare prices

[38:16] for parts, and then one agent that's

[38:18] going to draft the PDF, and then maybe

[38:20] another one that's going to make it look

[38:21] good. I mean, I'm kind of stretching the

[38:22] example here, but you get the idea of

[38:24] like you you actually don't have just

[38:25] one agent handle the entire thing for

[38:27] something that big. And you are going to

[38:29] be doing a lot of planning, right? like

[38:31] you're going to plan, you're going to

[38:33] have a validation at the end like what

[38:35] kind of calculations can I do at the end

[38:36] to make sure that like this job uh has

[38:38] the the margin that we want on it for

[38:40] example.
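
A minimal sketch of the assembly line described above, under stated assumptions: the stage functions are hypothetical stand-ins for separate agent runs, the `Handoff` record is an invented name for the handoff document, and every number is made up for illustration. The point it shows is that each stage records what it finished, and the final margin check is plain arithmetic rather than something a model is asked to judge.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """What each stage passes on: the request, what is done, and the data so far."""
    request: str
    completed: list = field(default_factory=list)
    data: dict = field(default_factory=dict)


def research_inventory(h: Handoff) -> Handoff:
    # Stand-in for an agent run; the quantity is hard-coded for illustration.
    h.data["units"] = 100_000
    h.completed.append("inventory")
    return h


def compare_prices(h: Handoff) -> Handoff:
    # Stand-in for an agent run; both prices are hypothetical.
    h.data["unit_cost"] = 0.02
    h.data["labor_cost"] = 500.0
    h.completed.append("pricing")
    return h


def draft_quote(h: Handoff, markup: float) -> Handoff:
    cost = h.data["units"] * h.data["unit_cost"] + h.data["labor_cost"]
    h.data["cost"] = cost
    h.data["price"] = round(cost * (1 + markup), 2)
    h.completed.append("quote")
    return h


def margin_ok(h: Handoff, min_margin: float) -> bool:
    """Deterministic end-of-line check: no model involved."""
    price, cost = h.data["price"], h.data["cost"]
    return (price - cost) / price >= min_margin


def run_pipeline(request: str, markup: float = 0.40, min_margin: float = 0.25) -> Handoff:
    h = Handoff(request=request)
    h = research_inventory(h)
    h = compare_prices(h)
    h = draft_quote(h, markup)
    if not margin_ok(h, min_margin):
        raise ValueError("quote is below the target margin")
    return h
```

Calling `run_pipeline("100,000 flyers")` returns the final handoff record, while a markup that is too thin raises instead of silently producing a quote.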

[38:41] >> Yeah. Yeah. And I think I think back to

[38:45] one of our biggest failures back when I

[38:48] was still kind of in the day-to-day of

[38:49] running an agency was that exact use

[38:52] case was having to look through tons and

[38:55] tons of examples, past quotes, past

[38:57] client work, past proposals, and and

[39:00] needing to generate these quotes with so

[39:02] many different factors that go into it.

[39:04] And that was one of our biggest failures

[39:06] because me personally, I underscoped

[39:08] that build. And we went into it not

[39:10] realizing how much actually is necessary

[39:13] to get to an accurate quote. So that was

[39:15] a great lesson for me to learn not only

[39:18] about the importance of asking enough

[39:20] questions and scoping, but just

[39:22] >> in the way that you split up the work.

[39:25] And I think, you know, obviously Cole

[39:26] mentioned he's he's talked a lot of

[39:28] these examples have been kind of around

[39:29] coding, but I don't really do much

[39:32] coding. I mean, at the end of the day,

[39:33] these automations are code. So yes, it's

[39:34] coding. Yeah. But

[39:36] >> I'm not doing like

[39:37] >> software. I'm not building products, but

[39:39] every one of these theories that we

[39:41] talked about in these mindsets and

[39:42] frameworks has, you know, directly

[39:44] applies to the knowledge work is kind of

[39:45] what I like to call it of of what I do

[39:47] on the day-to-day and what probably most

[39:48] of you guys need to do. That gives you

[39:50] an insane amount of leverage right away

[39:53] in Claude Code. And I think that

[39:55] >> when you think about your job or you

[39:58] think about some of your

[39:59] responsibilities, it's not just one

[40:01] responsibility. it is. You can drill

[40:04] that down into so many little subtasks

[40:05] like Cole just said like one agent does

[40:07] the research, one agent does the PDF

[40:09] generation, all these little strings of

[40:12] subtasks that flow up together to

[40:14] actually make the overall responsibility

[40:16] which might be 10 little tasks that get

[40:19] strung together. So when you can

[40:21] actually break down a process by just

[40:23] writing it down or or you know flowing

[40:26] it out on on a piece of paper,

[40:28] >> it makes things a lot more clear,

[40:30] >> right? Yeah. Yeah. And one thing I want

[40:33] to say here is that a lot of people they

[40:34] want to simplify it down to just using

[40:36] subagents. So like for this larger

[40:39] workflow, what if I just have my main

[40:41] Claude Code dish out a bunch of tasks to

[40:44] subagents? Like that can work for some

[40:46] things. I do love using subagents

[40:48] especially when I'm initially planning

[40:50] any kind of automation or or uh

[40:53] application,

[40:54] but it's hard to really make those

[40:56] communicate well with each other. Like

[40:58] we've talked a lot about handoffs here.

[41:00] A lot of times one agent when it's

[41:02] taking that next step in a workflow it

[41:04] has to understand the work that was done

[41:05] by the previous one, whether

[41:07] that's work you know actually writing

[41:09] code or if it's just doing research or

[41:11] if it's pulling information from your

[41:13] CRM for example like it has to have that

[41:15] kind of handoff document and it's really

[41:17] difficult to um do that well with subagents.

[41:20] Claude has tried their hand at

[41:24] doing something with agent teams so they

[41:26] they that's kind of like the step above

[41:27] subagents where they can really

[41:29] communicate with each other but uh that

[41:31] is like really unrefined. It's a really

[41:32] good idea but it's really unrefined and

[41:34] it's very expensive, like token-heavy.

[41:36] >> Yeah.

[41:36] >> And so yeah like and that's actually

[41:38] what I'm working on. So there's an open-source

[41:40] project that I'm working on

[41:41] called Archon, and that's really the

[41:43] problem it's solving is how can we more

[41:45] like the word I use is deterministic

[41:47] like how can we build the AI model like

[41:51] build Claude Code into a system instead

[41:53] of having Claude Code trying to

[41:55] orchestrate everything because that's

[41:56] when it becomes difficult for

[41:58] communication and everything becomes

[42:00] very token-heavy, right? So like the way

[42:02] that I like to put it is we want to um

[42:05] pick when the AI model works in a

[42:08] workflow instead of having it drive the

[42:10] whole thing.

[42:11] >> Mhm. Yeah. Yeah. How do you make such an

[42:16] autonomous non-deterministic system as

[42:18] deterministic as possible?

[42:20] >> Pretty much. Yep. Yep. As deterministic

[42:21] as possible. I wish I could say make it

[42:23] deterministic, but that is never going

[42:24] to happen. Unfortunately, that is

[42:26] fundamentally impossible.

[42:28] >> Yeah.

[42:28] >> I love it. Yeah. Completely agree with

[42:29] you there.

[42:31] >> Cool. Yeah. Um, so I mean really we

[42:34] we've talked about most of the other

[42:35] things I have in the diagram here. Like

[42:37] we've talked about verification, making

[42:39] sure that it's able to check its own

[42:42] work and um yeah, I mean like the the

[42:45] main thing here is we don't really care

[42:46] about what it does on its first

[42:49] pass. If we build a system where it's

[42:50] able to iterate, that's all we really

[42:52] care about as long as it doesn't take

[42:53] billions of tokens to get to that final

[42:55] stage. But like when I'm whenever I'm

[42:58] using Claude Code for something, I'm

[43:01] never optimizing for speed. I mean, at

[43:04] least like I don't want it to be

[43:05] unrealistically slow, but any kind of

[43:07] task I have for it, I don't really care

[43:09] if it's something that I have to uh have

[43:11] it work through for a half hour or an

[43:13] hour and a half. Like, I'll send off

[43:14] that request and then I'll just go to

[43:16] another Claude Code session for whatever

[43:18] else I have to work on or I'll do

[43:20] something, believe it or not, without an

[43:21] agent for a little bit, like if I have

[43:22] to uh record a video. Um, well, I mean,

[43:26] maybe I'm using an agent in the video,

[43:27] but you get the point. But anyway, like

[43:28] the the point is that I don't really

[43:30] care how long it takes because I just

[43:32] care about getting the best results possible.

[43:35] >> Um, and so yeah, that's why like I I

[43:37] spend a lot of time engineering systems

[43:40] for coding agents to check their own

[43:42] work, whether it's browser automation

[43:43] for a website or the silly example I

[43:46] gave earlier, like a way for it to sort

[43:47] of like play a video game as a human

[43:49] would. And that's like a really

[43:51] fascinating problem for me to solve

[43:52] right now. It's just like that

[43:53] verification layer at the end for a

[43:55] coding agent, which um also extends to

[43:57] things like security as well. And so

[44:00] like that's not something as interesting

[44:01] to talk about right now, but like

[44:02] security is pretty important to me. It's

[44:04] something that um vibe coders get very

[44:06] burned for. I mean, you hear those

[44:07] horror stories like at least once a

[44:09] month

[44:09] >> of uh you know like their Supabase

[44:12] private or secret key getting leaked in

[44:14] their uh JavaScript files and things

[44:16] like that because they're just

[44:17] completely vibe coding. Like I mean

[44:19] that's like the simplest example but

[44:21] yeah like that kind of part of

[44:22] verification is really important as

[44:24] well.

[44:25] >> Yeah. And on that whole element of

[44:28] security and

[44:30] >> what could go wrong when you think about

[44:32] sort of like the permission layer that

[44:34] you're putting around your agents. I see

[44:36] a lot of false sense of security once

[44:39] again where people think that their

[44:42] prompts

[44:44] are a good enough permission layer when

[44:46] really that permission layer needs to be

[44:48] >> scoped keys or you actually can't touch

[44:51] this at all because I think I was

[44:53] talking to my team and

[44:54] >> we kind of got to this conclusion of

[44:56] >> if you have the mindset that

[44:58] >> anything that the agent can read or can

[45:01] touch it will like you have to assume

[45:03] that it will even if you never ask it to

[45:05] That assumption is what's going to save

[45:07] you from having your database deleted.

[45:10] >> Yes. And that's funny you bring up that

[45:12] example specifically because I was just

[45:14] about to say like if you tell it never

[45:16] to to wipe a database, it's still going

[45:18] to do that.

[45:19] >> Mhm.

[45:19] >> Like there was a story that went viral

[45:21] um like a month or two ago about someone

[45:24] like really high up at Meta that had

[45:26] their database. I'm still not convinced

[45:28] that's real. I feel like they might have

[45:29] been I don't know cuz people get so much

[45:31] attention when they have stupid stories

[45:33] like that. But but anyway,

[45:34] >> conspiracy theories with Cole,

[45:36] >> right? Yeah. But like it is it is

[45:39] definitely possible and I do know some

[45:40] stories of that actually happening just

[45:42] to a smaller extent.

[45:43] >> It just feels so weird that or it sounds

[45:45] so stupid that it's like their actual

[45:47] production database was wiped. But I

[45:48] mean even if you have a test database

[45:50] wiped, it can still be a bummer if that

[45:51] slows you down a lot. And so so yeah, it

[45:54] is super important. You never want to

[45:56] assume that just because you tell an

[45:58] agent to not do something, it never

[46:00] will. I mean it's the same thing like if

[46:02] you tell a kid to not do something they

[46:05] just might not listen. I mean even even

[46:07] adults

[46:07] >> there actually recently something did

[46:09] happen to us which is kind of why we

[46:10] started talking about this.

[46:12] >> Okay.

[46:12] >> We had this this incident where the

[46:15] agent had the right intentions. It was

[46:17] trying to be proactive and it actually

[46:18] saw something on its task list but it

[46:21] misinterpreted it

[46:22] >> and it ended up sending an email to our

[46:24] entire list

[46:25] >> with a discount code and it was like not

[46:28] supposed to go out. So, we had to like

[46:31] change the code, update the page, we

[46:33] emailed out an apology. So, if you guys

[46:35] are on the email list and you got that,

[46:36] that's what happened. But it's just

[46:38] like, you know, I wasn't mad at the

[46:40] person who was kind of responsible for

[46:41] the agent. It was just a really good

[46:42] opportunity for us to think about, okay,

[46:44] why did this happen?

[46:45] >> And, you know, she wrote up a case

[46:46] study. We sent it to the whole team and

[46:48] everyone was like, okay, that's a

[46:49] really, really good reminder of how

[46:51] careful you have to be. Because, you

[46:53] know, if you connect to an MCP server

[46:54] and you don't limit the permissions, it

[46:56] has everything, you know. Yeah. Yep. Now

[46:59] that's good. Yeah. The main way that I

[47:01] restrict actions from my coding agent is

[47:04] with hooks.

[47:05] >> So like Claude Code hooks is a really

[47:06] good way because um basically a hook in

[47:09] Claude Code is a little piece of code

[47:12] that you can run whenever a certain

[47:14] event happens in the tool. So whenever

[47:17] you start a session, whenever you end a

[47:19] session, right before Claude Code uses a

[47:21] tool, you can run some kind of code that

[47:24] does a security check. I mean there's a

[47:26] lot of other things but like I love

[47:27] using hooks for security. And so what

[47:29] you can do in Claude is every time it's

[47:31] about to invoke a tool like it wants to

[47:34] write out to a file or make some

[47:36] requests to the web, you can uh check

[47:39] against that command to make sure it's

[47:40] not trying to mess with a folder you

[47:43] don't want it to touch or run some kind

[47:45] of command you don't want it to run. And

[47:46] there's a lot of different ways you can

[47:47] check for that that we don't have to get

[47:48] into right now. Um, but that's like one

[47:50] of my favorite ways to make sure it's

[47:52] like not reading my environment

[47:53] variables or it's not running a a delete

[47:56] command for a database.
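
A hedged sketch of the kind of pre-tool security check described above. It assumes, as I understand the Claude Code hooks documentation, that a `PreToolUse` hook receives the pending tool call as JSON on standard input (with `tool_name` and `tool_input` fields) and that exiting with status 2 blocks the call and passes stderr back to the agent; verify those details against the current docs. The blocked paths and patterns are illustrative choices, not anything from the conversation.

```python
import json
import re
import sys

# Illustrative rules only; tune these to the folders and commands you care about.
BLOCKED_PATH_PARTS = (".env", "secrets/")
BLOCKED_COMMAND_PATTERNS = (
    r"\brm\s+-[a-zA-Z]*r",            # recursive remove
    r"\bdrop\s+(table|database)\b",   # destructive SQL
    r"\bdelete\s+from\b",
    r"\bprintenv\b",                  # dumping environment variables
)


def check_tool_call(payload: dict):
    """Return (allowed, reason) for one pending tool call."""
    tool_input = payload.get("tool_input") or {}
    path = str(tool_input.get("file_path", ""))
    command = str(tool_input.get("command", ""))
    for part in BLOCKED_PATH_PARTS:
        if part in path or part in command:
            return False, f"blocked: touches protected path '{part}'"
    for pattern in BLOCKED_COMMAND_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return False, f"blocked: command matches '{pattern}'"
    return True, "ok"


def run_hook(raw_json: str) -> int:
    """Return the exit status the hook script should use (2 means block)."""
    allowed, reason = check_tool_call(json.loads(raw_json))
    if not allowed:
        print(reason, file=sys.stderr)
        return 2
    return 0

# In a real hook script the last line would be:
#     sys.exit(run_hook(sys.stdin.read()))
```

Keeping the decision in `check_tool_call` makes the rules easy to unit test without running the agent at all.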

[47:58] >> Um, and it it it's really hard to make

[48:00] sure you're you're covering all the

[48:02] loopholes cuz there's a lot of things

[48:04] that coding agents can do.

[48:06] >> Yeah.

[48:07] >> To get around those kind of checks as

[48:08] well. A lot of people have a false

[48:10] sense of security around that as well.

[48:12] So, you kind of have that like first

[48:14] false sense of security where it's like,

[48:15] well, I told it to never delete my

[48:17] database. And then you have the second

[48:18] level where it's like I block all delete

[48:20] SQL statements. But then there's that

[48:22] third level that you have to like make

[48:24] sure you're engineering for um like for

[48:26] example a coding agent. If you if you

[48:29] don't allow it to call the like delete

[48:32] like remove command to delete a folder,

[48:34] it can still write a script to do that.

[48:36] So it just has to do twostep like write

[48:38] the script and then run the script and

[48:40] then it's still able to remove a file or

[48:41] folder on your computer. So it's I mean

[48:44] they're less likely to do that. So, it's

[48:45] still like you're getting there if you

[48:48] at least have like that second

[48:50] false sense of security, but like you

[48:52] got to be really safe. You got to it's

[48:53] it's actually a tough problem to solve.
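
To make the "third level" concrete, here is a small sketch of why a denylist on its own leaks. The command lists and the `cleanup.py` script name are hypothetical. A denylist that blocks `rm` still lets through a command that runs a script the agent just wrote, whereas an allowlist rejects anything not explicitly approved. An allowlist is not airtight either, so treat it as one layer alongside scoped keys and sandboxing.

```python
import shlex

DENYLIST = ("rm", "rmdir")                   # "block the delete command"
ALLOWLIST = ("ls", "cat", "git", "pytest")   # "only these programs may run"


def program_of(command: str) -> str:
    """Return the first word of a shell command, or '' if it is empty."""
    parts = shlex.split(command)
    return parts[0] if parts else ""


def passes_denylist(command: str) -> bool:
    return program_of(command) not in DENYLIST


def passes_allowlist(command: str) -> bool:
    return program_of(command) in ALLOWLIST


# Step 1: the agent writes cleanup.py, which deletes a folder from inside Python.
# Step 2: the agent runs it. The denylist never sees the word "rm".
two_step_command = "python cleanup.py"
```

Here `passes_denylist(two_step_command)` is true while `passes_allowlist(two_step_command)` is false, which is the gap between the second and third levels.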

[48:55] >> Yeah, man. AGI is it's it's scary.

[48:58] >> Yeah.

[48:59] >> But I would love to see and maybe you've

[49:01] already got one out, but I would love to

[49:02] see a a Cole Hooks master class because

[49:05] I actually just recorded one

[49:07] >> and I don't use hooks that much to be

[49:10] honest. Like I really don't. I think my

[49:11] my main hook that I have is just to give

[49:13] me a noise notification when it's done

[49:15] or when it needs me. But yeah, like I

[49:17] have underutilized hooks for sure. And

[49:20] I'm not sure if that's because they're

[49:22] mainly valuable when you're doing heavy

[49:24] coding but

[49:26] >> I would assume that there's a lot of

[49:27] things that I could be doing in my

[49:28] day-to-day where hooks would be really

[49:30] good and I need to definitely look into

[49:33] a little bit more how I can be utilizing

[49:35] them. But anyways,

[49:37] >> yeah. Yeah, I I definitely should do a

[49:39] master class on hooks because there's a

[49:41] lot of ways that I use them. Um, yeah,

[49:44] since we're on the topic, like one of

[49:45] the really interesting ways to use hooks

[49:47] is you can use them to automatically

[49:50] suggest like you can have Claude Code

[49:52] like automatically suggest ways to uh

[49:55] improve your AI layer, like make your

[49:57] rules better, make your skills better.

[50:00] And a lot a lot of tools like Hermes and

[50:02] OpenClaw, they kind of do this. I don't

[50:04] think they like explicitly use hooks,

[50:06] but like OpenClaw for example, every

[50:08] like 10 20 turns, I think it's

[50:10] configurable. It will like kind of

[50:13] compact your conversation and store it

[50:15] as a memory, right? So that you have

[50:17] like the whole like daily log thing with

[50:19] the uh memory.md file. Like all of that

[50:22] comes from what's essentially a Claude

[50:23] Code hook. Like so with the way I use

[50:25] Claude Code with my second brain

[50:27] >> is uh every time I have a memory

[50:29] compaction, which I try to avoid those I

[50:31] don't want to get that far into

[50:33] conversation. um or I end a session, it

[50:36] automatically creates a summary of the

[50:37] conversation, puts that in a daily log,

[50:39] and then I have a process every day.

[50:41] It's basically like Claude Code dreaming

[50:43] where it's going to look at the daily

[50:44] log and then extract any like really

[50:46] important things to store and sort of

[50:48] like promote to my primary memory file.

[50:51] Like here are the decisions that I've

[50:52] made recently or like the things that

[50:53] I'm actively working on and where I'm at

[50:55] with them.
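
A rough sketch of the daily-log and promotion flow described above. It makes two assumptions that are not from the conversation: the session summary already exists as text (in practice a hook would ask the model to produce it), and a simple prefix filter (`DECISION:` or `ACTIVE:`) stands in for the daily "dreaming" pass in which the agent decides what is important. The file layout and function names are hypothetical.

```python
from datetime import date
from pathlib import Path

# Hypothetical convention: only lines with these prefixes get promoted.
PROMOTE_PREFIXES = ("DECISION:", "ACTIVE:")


def append_to_daily_log(log_dir: Path, summary: str, day: date) -> Path:
    """Append one session summary to that day's log file and return its path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{day.isoformat()}.md"
    with log_path.open("a", encoding="utf-8") as f:
        f.write(summary.rstrip() + "\n")
    return log_path


def promote_to_memory(log_path: Path, memory_path: Path) -> list:
    """Copy new important lines from a daily log into the primary memory file."""
    existing = set()
    if memory_path.exists():
        existing = set(memory_path.read_text(encoding="utf-8").splitlines())
    promoted = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith(PROMOTE_PREFIXES) and line not in existing:
            promoted.append(line)
            existing.add(line)
    if promoted:
        with memory_path.open("a", encoding="utf-8") as f:
            f.write("\n".join(promoted) + "\n")
    return promoted
```

Running the promotion twice on the same log adds nothing the second time, so it is safe to schedule daily.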

[50:56] >> So like hooks actually drive the

[50:57] whole thing like this terminal that that

[50:59] this is like the second time it's popped

[51:01] up. Um, that's actually a hook that just

[51:03] fired there. So, I'm just like testing

[51:05] some other things. I forgot to turn it

[51:07] off, which is unfortunate, but actually

[51:08] now it made for good illustration. I had

[51:10] I had a hook run as I'm talking about

[51:12] it. I'm just testing something else

[51:14] right now.

[51:15] >> Yeah. Just just yet another way to make

[51:17] these non-deterministic things as

[51:19] deterministic as possible. So, what do

[51:21] we have next after this verify the work

[51:24] section?

[51:24] >> Yeah. Yeah. So, really, this is the last

[51:26] thing. So, we've already talked about

[51:27] the the harness, but the the last and

[51:30] honestly probably the most important

[51:32] thing is the system evolution that I

[51:33] talked about just a little bit earlier.

[51:36] And and really the the mindset here is

[51:39] what like I this is out of everything

[51:42] that makes it so you're really directing

[51:43] Claude Code instead of just being a user

[51:45] of it, the building the system is the

[51:47] most important thing. So anytime there's

[51:50] an issue that comes up, instead of just

[51:52] fixing the issue and moving on, it's an

[51:54] opportunity for you to with the help of

[51:56] the coding agent, with the help of Claude

[51:58] Code, to figure out like what could we

[51:59] make better so that this doesn't happen

[52:01] again. Like maybe there's a new rule

[52:03] that we can add in our CLAUDE.md or

[52:05] there's a new document that we can give

[52:08] it when we're in our planning process or

[52:10] there's maybe an update to our skill

[52:13] that we could make. And I'm being kind

[52:14] of general on purpose here because

[52:16] there's a million different ways that we

[52:17] can improve our system. And so this is

[52:19] kind of like the example that you gave,

[52:21] Nate, where you had the email go out or

[52:23] at least it went out to way more people

[52:25] than it should have. And so you wrote up

[52:27] that report of like here's what

[52:28] happened. Here's what we can do better.

[52:30] And so it's kind of doing that but like

[52:32] for the agent so that going forward it

[52:35] has that rule so it it doesn't do that

[52:36] thing anymore. Like maybe it uh didn't

[52:38] run all the validation you wanted it to.

[52:41] So now you just like make sure that like

[52:42] that's a part of the rule where it's

[52:43] like this you make sure you don't forget

[52:45] this validation kind of as a silly

[52:47] example but that way every bug becomes a

[52:50] permanent upgrade. So once you have this

[52:52] kind of system in place you actually

[52:54] almost welcome bugs like I want

[52:56] something to go wrong because then I can

[52:57] make sure it never happens again

[52:59] >> right like I almost have I almost feel

[53:01] kind of nervous when everything's going

[53:02] too well cuz then it's like oh shoot I

[53:03] have no way to like make my agent better

[53:05] right now. So it's it can kind of become

[53:07] nice.

[53:07] >> Yeah. Absolutely.

[53:09] >> Yeah. Just get better over time. I've

[53:10] got an interesting question for you. So,

[53:12] >> I completely agree. Every single time

[53:14] that you have a failure, you should look

[53:16] at that as data and an opportunity to

[53:17] improve the system.

[53:19] >> Now, what what about before you get

[53:21] those failures? How do you think about

[53:24] to your best to the best of your ability

[53:27] finding those edge cases or predicting

[53:29] what edge cases might happen and trying

[53:31] to build in guardrails before

[53:33] >> the whole testing part?

[53:35] Well, it can never be perfect, which is

[53:37] why I lean on this so much. But

[53:39] generally, when you're looking out for

[53:40] edge cases,

[53:42] um, I mean, Claude Code is actually

[53:44] pretty good at it.

[53:46] >> It's not going to cover like nearly all

[53:47] the edge cases, but even just asking it

[53:49] like how could this go wrong is a

[53:51] question that sometimes people are

[53:52] honestly like nervous to even ask, but

[53:55] it's a really good question once you're

[53:56] done with the implementation. And like

[53:58] this is a part of like my code review

[54:00] skill that I have built in where it's

[54:02] like ask yourself what could go wrong

[54:04] here and then try to engineer a scenario

[54:07] where you're really testing that. Like

[54:09] if I'm building an automation where I

[54:11] think there might be an edge case where

[54:12] it doesn't handle this kind of input

[54:14] correctly, I'm going to as a part of my

[54:15] agent's code review have it like create

[54:18] that like it'll invoke the application

[54:20] with that input like a web hook or

[54:23] whatever and try to break it and see

[54:24] what happens. And then if it does break

[54:26] then I mean that's obviously going to be

[54:28] like going back here a part of our

[54:30] verification where um it'll then uh

[54:34] address that thing and then do the tests

[54:36] again right like iterate like you find a

[54:37] problem fix it and then also retest

[54:39] right don't forget to retest because

[54:41] maybe your fix didn't actually address

[54:42] the problem.
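
The find, fix, retest loop described above can be sketched as a small driver. This is an illustration under assumptions, not the code review skill itself: `run_checks` and `apply_fix` are hypothetical callables (in practice the first would run the edge-case tests and the second would hand the failures back to the agent), and `max_rounds` is an invented cap so the loop cannot burn tokens forever.

```python
from typing import Callable, List


def iterate_until_green(
    run_checks: Callable[[], List[str]],
    apply_fix: Callable[[List[str]], None],
    max_rounds: int = 3,
) -> bool:
    """Run checks, fix what failed, and retest; return True once nothing fails."""
    for _ in range(max_rounds):
        failures = run_checks()
        if not failures:
            return True
        apply_fix(failures)
    # Always retest after the last fix: the fix may not have worked.
    return not run_checks()
```

The final `run_checks()` call is the "don't forget to retest" step: a fix is only counted once the checks pass again.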

[54:43] >> Mhm. Yeah. Well, I think, you know,

[54:45] something that I've realized after

[54:48] responding to YouTube comments, Q&As

[54:50] in the community, chatting to you, and

[54:52] and just seeing what's going on when

[54:54] people are learning these kinds of tools

[54:55] is that you really at, you know, the

[54:58] simplest way to describe it, you just

[55:00] have to treat it like your best friend

[55:03] who is the smartest person in the world.

[55:06] Meaning, you know, treat it like a

[55:07] mentor. It's not going to laugh at you

[55:09] if you ask it something stupid. You just

[55:11] need to be curious and you need to ask

[55:13] >> ask the questions that you are wondering

[55:15] in your head. And I think when you kind

[55:16] of get over that idea that it can teach

[55:18] you anything and it can for the most

[55:20] part, especially if you ask it right, it

[55:22] can help you figure out the majority of

[55:24] your problems when you have, you know,

[55:26] that that sort of uneasiness because

[55:28] maybe you don't understand what it did.

[55:30] So that's like a huge mindset shift for

[55:33] anyone I've talked to that is like

[55:35] trying to get into it and doesn't

[55:36] understand it. if they, you know, maybe

[55:38] they text me a question or they drop a

[55:39] question. It's just like the response

[55:41] can a lot of times be, "Have you asked

[55:43] Claude Code that?"

[55:44] >> I know. Yeah. I feel bad saying that,

[55:46] but like, yeah, it comes, it comes to

[55:48] that a lot. Um, yeah. Where it's like,

[55:50] no, you should have just usually how I

[55:53] can be more helpful is like telling them

[55:54] what to ask exactly, like give it give

[55:57] it this link, give it that thing, and

[55:58] then here's how I'd ask it. But yeah, a

[56:00] lot of times it does come down to that.

[56:02] And I mean, you can't you can't just ask

[56:04] Claude for everything because of like

[56:06] the sycophancy you mentioned earlier.

[56:08] Sometimes if you're asking it for its

[56:09] opinion like asking a large language

[56:12] model for its opinion is a really

[56:14] slippery slope.

[56:15] >> Yeah.

[56:15] >> But but what you can what we can ask it

[56:18] for is to like understand how something

[56:21] works. Like that's when it can do a

[56:22] really good job. So, like going back to

[56:23] the example earlier, if you're not

[56:24] technical, but you want to try to

[56:26] actually be able to understand the

[56:28] automations and things that it's

[56:29] building for you, like that's a really

[56:31] good thing to ask it because it's not

[56:33] going to like there's no sycophancy

[56:34] there, right? It's not just trying to

[56:35] appease you. It's it's just helping you

[56:37] understand. The way to appease you is to

[56:39] explain the thing, right?

[56:41] >> Um, so like that that's a really good

[56:43] case just trying to understand anything.

[56:45] And then like what we were talking about

[56:46] just here with verification like trying

[56:48] to find edge cases. If there's anything

[56:49] where there's like actual empirical data

[56:51] like there is a way to verify that like

[56:53] this this automation doesn't handle this

[56:55] input well. I mean there's no room for

[56:57] sycophancy there. It's like it

[56:58] either works or it doesn't and there's

[57:00] not really like any kind of gray area or

[57:02] opinion.

[57:03] >> So if you think there might be an edge

[57:04] case or it thinks there might be it can

[57:06] test it and then it's it's black or

[57:08] white.

[57:09] >> Right.

[57:10] >> So I want to hear what you think about

[57:13] this because You briefly mentioned the

[57:16] agent teams earlier and

[57:18] >> I actually find myself using them quite

[57:20] a bit for mainly one specific use case

[57:22] and I want to hear what you think about

[57:24] it. So really the time when I reach for

[57:27] agent teams is when

[57:29] >> I am trying to help you know I'm trying

[57:31] to decide something but I don't want to

[57:33] just ask for Claude Code's opinion like

[57:35] you just said.

[57:36] >> Yeah.

[57:36] >> And so what I'll do a lot is I'll spin

[57:38] up like a debate panel or like a war

[57:40] room.

[57:41] >> Nice. And I will say, you know, like one

[57:43] of you guys is a CEO, one is a beginner,

[57:45] one is a college student, and just like

[57:46] a bunch of different personas, sometimes

[57:48] even like seven. And I will just have

[57:50] them all do independent research, form

[57:52] their own opinions, and then I'll have

[57:53] them debate.

[57:54] >> And then I'll just be able to read the

[57:56] debate, and I'll be able to sometimes

[57:58] I'll say like, "Keep debating until you

[58:00] all come to some sort of consensus."

[58:02] >> But I do that quite a bit. And that

[58:03] doesn't mean whatever the agent team

[58:05] spits out, I do.

[58:07] >> But sometimes it's just really great for

[58:08] me to read through all those opinions.

[58:10] But I want to see do you do you like

[58:12] that? Do you think that's a major flaw?

[58:14] Like what thoughts do you have about

[58:15] that?

[58:16] >> I do actually like that. Uh I've I've

[58:18] never done that before.

[58:19] >> You've never done that? You should

[58:20] definitely No. Yeah. I feel like tonight

[58:21] I literally got to try that. It's really

[58:23] fun.

[58:23] >> Yeah. I like that idea a lot cuz some

[58:25] something that I have done

[58:26] experimentation with that's sort of

[58:28] similar. I call it adversarial

[58:30] development where basically after

[58:33] Claude Code finishes building something,

[58:36] I'll have a separate Claude Code session

[58:38] um, where I prompt it specifically to play

[58:40] the devil's advocate. Like I want you to

[58:42] be mean to the other Claude Code session to

[58:45] like really make sure that it's not just

[58:47] being happy-go-lucky when there are

[58:48] actually some some problems that need to

[58:50] be surfaced

[58:51] >> and like that works really well. So just

[58:54] generally like pitting large language

[58:55] models against each other is a good

[58:57] idea. Um, I wish I had tried that

[59:00] before. So, yeah, I'll I'll give that a

[59:01] shot. I think that that's that's a

[59:02] really good use for agent teams because

[59:04] at that point at that point, it's like

[59:06] you're not relying on it getting the

[59:08] perfect answer. Like they it's very

[59:11] token-heavy and the communication is

[59:12] never really perfect. So, that's why I

[59:14] don't really recommend agent teams when

[59:15] you're trying to do like deep

[59:16] development or like building any kinds

[59:17] of like complex automations. But, when

[59:19] it's more like research and just like

[59:21] forming a consensus, I I think that it

[59:23] does really it would do really well for

[59:24] that. I'll try it out.

[59:26] >> Cool. Yeah. Let me know if you try it

[59:28] out and what you think. But I've never

[59:30] >> really talked about that or made a video

[59:31] because I know people would go do it and

[59:33] then be like, "You just killed my 5 hour

[59:35] limit."

[59:36] >> Right. Yeah. Fortunately.

[59:38] >> So, um,

[59:38] >> how much of your limit does it use when

[59:40] you typically do it?

[59:41] >> I mean, on the 200 buck 200 bucks a

[59:44] month plan anywhere from, you know, 4%

[59:47] to to 10 sometimes. Like, you know,

[59:48] >> it's not too bad.

[59:49] >> It's not too bad. But, you know, if you

[59:51] if you say something like a

[59:53] >> don't stop until everyone agrees and

[59:54] they just keep going, then you could you

[59:56] could run into some some trouble. But
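
One way to avoid the "don't stop until everyone agrees" trap is to cap the number of rounds. The sketch below is purely illustrative: `Agent` is a hypothetical stand-in for whatever actually calls a model, and a real debate would also pass each agent the others' previous answers.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical agent type: takes the question, returns an answer string.
Agent = Callable[[str], str]


def run_until_consensus(
    agents: Sequence[Agent], question: str, max_rounds: int = 3
) -> Tuple[Optional[str], int]:
    """Ask every agent each round; stop on agreement or when the cap is hit."""
    for round_number in range(1, max_rounds + 1):
        answers = [agent(question) for agent in agents]
        if len(set(answers)) == 1:
            return answers[0], round_number
    # No consensus within the budget: return None so a human can decide.
    return None, max_rounds
```

With a cap in place, the worst case costs `max_rounds` times the number of agents in calls, rather than an open-ended run against the usage limit.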

[59:59] >> to close us off here, I have to ask,

[1:00:02] >> I just did a video about my favorite

[1:00:05] features in Claude Code. And I prefaced

[1:00:07] that whole video and basically said this

[1:00:10] is not a list of the best features or,

[1:00:12] you know, the most used or the the most

[1:00:14] useful. These are basically the way that

[1:00:16] I use Claude Code on my day-to-day, the

[1:00:18] ones that I like the most. And I I had

[1:00:19] like a numbered list of 12 at the top.

[1:00:22] But I would love to hear from you to put

[1:00:24] you on the spot. If you had like a top

[1:00:25] three based on because I'm I'm assuming

[1:00:27] we use very differently.

[1:00:28] >> What would you say are like your top

[1:00:30] three favorites?

[1:00:32] >> Yeah. So, uh, Hooks is definitely maybe

[1:00:35] not like my favorite favorite, but

[1:00:37] probably the one that like most people

[1:00:38] wouldn't put in their top three.

[1:00:40] >> And that's because of what I've been

[1:00:41] doing with it for security and then the

[1:00:43] whole um integration with the second

[1:00:45] brain. So, it's able to basically like

[1:00:46] extract summaries and like remember

[1:00:48] things over time.
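
A security hook of the kind mentioned here might look like the following sketch. It assumes the hook behavior as I understand the Claude Code hooks documentation: a pre-tool-use hook receives a JSON payload on stdin with `tool_name` and `tool_input`, and exiting with code 2 blocks the call and returns stderr to the agent. The deny-list and the `check_tool_call` helper are hypothetical, not the speaker's configuration.

```python
import json
import sys

# Hypothetical deny-list; tune it to your own environment.
BLOCKED_SUBSTRINGS = ("rm -rf", "git push --force", ".env")


def check_tool_call(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one tool call payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""  # this sketch only inspects shell commands
    command = payload.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            return 2, f"Blocked: command contains '{needle}'"
    return 0, ""


def main() -> None:
    """Entry point to call when the script is registered as a hook."""
    code, message = check_tool_call(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Substring matching is deliberately crude; it shows where the decision happens, and a real policy would likely parse the command more carefully.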

[1:00:49] >> We definitely need a hooks Cole video.

[1:00:51] >> Yeah, I honestly I should just do that

[1:00:53] next week.

[1:00:53] >> Yeah, we definitely need it.

[1:00:54] >> Yeah. Okay. Yeah, thanks Nate. Yeah, so

[1:00:57] yeah, hooks. Hooks is definitely number

[1:00:58] one.

[1:00:59] >> Okay.

[1:00:59] >> And then uh here because I I mentioned

[1:01:02] some of the things here like I mean

[1:01:03] really when I there's kind of two

[1:01:05] different sorts of Claude Code features.

[1:01:07] you have like the components of the AI

[1:01:09] layer like rules, skills, hooks and then

[1:01:10] you just have like general capabilities

[1:01:12] of the harness like um agent teams and

[1:01:16] slash by the way and and um dispatch

[1:01:19] like dynamic workflows things like that

[1:01:21] right it's either like it's something

[1:01:22] that you use or it's something that you

[1:01:24] build on top of

[1:01:26] um subagents would be probably like

[1:01:28] number two um just because like I said

[1:01:32] there's dangers to using subagents but

[1:01:34] just using them to like sprawl out and

[1:01:37] research a ton of different things. I

[1:01:38] use it for that all of the time and

[1:01:40] especially when I'm working on more

[1:01:41] complex code bases or building out

[1:01:43] larger automations. I'm using subagents

[1:01:46] to basically like extract context from

[1:01:49] certain parts of my system, right? Like

[1:01:52] you're responsible for getting a

[1:01:53] grounding here of like how are we going

[1:01:54] to have to mess with the front end in

[1:01:56] this application? How are we going to

[1:01:57] have to mess with the back end? Um and

[1:01:59] then honestly probably like my number

[1:02:02] one. So I guess like hooks would be two

[1:02:04] and sub agents would be three. Probably

[1:02:06] my my number one is skills. Even though

[1:02:08] that's like super super cliche like

[1:02:11] Yeah. It's got to be skills.

[1:02:13] >> It's just the best. Yeah.

[1:02:14] >> Yeah. Like skills. Skills dictate

[1:02:15] everything. Skill. I have a skill for

[1:02:17] making this diagram. I have a skill for

[1:02:19] scripting my YouTube videos. I have a

[1:02:20] skill for building PowerPoints.

[1:02:22] >> I have a skill for

[1:02:25] >> um I mean you could literally make

[1:02:26] >> so versatile. Yeah.

[1:02:28] >> Yeah. It's just any kind of reusable

[1:02:30] prompt. You just make it as a skill.

[1:02:32] >> Mhm. And Claude Code has done a really

[1:02:34] good job continuing to evolve just like

[1:02:35] the way you can parameterize things like

[1:02:37] they have like um path-scoped skills now

[1:02:40] and you can like set if this one is to

[1:02:41] be invoked only by you or if the agent

[1:02:43] can decide to do it as well. Um and then

[1:02:46] like talking about like verification

[1:02:49] like getting back to here like having

[1:02:51] that browser automation skill so it

[1:02:53] knows how to use a CLI. Um, like that's

[1:02:55] a whole another thing is like the the

[1:02:56] skill plus CLI combination is just

[1:02:58] really really powerful because basically

[1:03:00] any platform or tool you want your

[1:03:02] coding agent to be able to use. It's

[1:03:04] either going to be an MCP server and

[1:03:06] like those are still good but honestly

[1:03:09] what I think is even better like more

[1:03:10] token efficient is having a CLI so it

[1:03:13] has access to your CRM or GitHub or

[1:03:16] whatever through the CLI and then the

[1:03:17] skill it tells it how to use that CLI

[1:03:20] and then more it more specifically like

[1:03:22] how you want it to use that like how do

[1:03:25] you want this CLI to be integrated in

[1:03:27] your workflow. So like that combination

[1:03:29] I'm leaning on that for everything like

[1:03:30] my Archon tool I was talking about

[1:03:32] earlier like it is a CLI that has a

[1:03:35] skill that comes with it. So like if you

[1:03:36] want those more deterministic workflows

[1:03:39] where you get to pick like when do we

[1:03:41] have the LLM? When are we just running

[1:03:42] code then like you build that as a

[1:03:44] workflow and then now Archon with its

[1:03:47] skill and its CLI becomes a tool that my

[1:03:49] second brain can call upon whenever it

[1:03:51] wants to dispatch one of those workflows

[1:03:52] to go handle a GitHub issue or run this

[1:03:55] automation whatever it is. Very cool. I

[1:03:58] love it. Yeah, I love the list.
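
The skill-plus-CLI combination can be sketched as a small generator for a skill file that tells the agent how to use an existing CLI. This assumes skills are Markdown files with `name` and `description` frontmatter stored at `.claude/skills/<name>/SKILL.md`; the skill name, wording, and `render_skill` helper are hypothetical, while `gh issue list` and `gh issue view` are real GitHub CLI commands.

```python
SKILL_TEMPLATE = """\
---
name: {name}
description: {description}
---

# {title}

{body}
"""

# Hypothetical workflow rules: the CLI does the work, the skill says how to use it.
GITHUB_ISSUES_BODY = """\
1. Run `gh issue list --state open` to see what is waiting.
2. Run `gh issue view <number>` to read one issue before touching code.
3. Summarize the issue and proposed fix, then wait for approval."""


def render_skill(name: str, description: str, title: str, body: str) -> str:
    """Fill the template and return the full SKILL.md text."""
    return SKILL_TEMPLATE.format(
        name=name, description=description, title=title, body=body
    )


if __name__ == "__main__":
    print(
        render_skill(
            name="github-issues",
            description="How to triage GitHub issues with the gh CLI",
            title="GitHub issue triage",
            body=GITHUB_ISSUES_BODY,
        )
    )
```

The split mirrors the point made above: the CLI provides the capability, and the skill carries the instructions for how it should fit into your workflow.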

[1:03:59] >> My top three

[1:04:01] >> were skills was number one.

[1:04:03] >> Okay.

[1:04:03] >> Number two, I had status line.

[1:04:06] >> Oh, nice.

[1:04:07] >> I love just a quality of life thing, you

[1:04:08] know, just seeing the model, the effort,

[1:04:11] the window. I love that. And then my

[1:04:13] number three was routines.

[1:04:15] >> I love the uh the cloud routines. I I I

[1:04:18] just think it's so cool that,

[1:04:19] >> you know, I know I know we've got the

[1:04:20] SDK and whatnot, but it's just nice to

[1:04:22] be able to schedule something that

[1:04:24] >> is just my Claude Code going. And

[1:04:26] >> yeah, I think those were my top three

[1:04:27] and I'm sure they'll they'll move

[1:04:28] around. But yeah, I appreciate you

[1:04:30] sharing yours. It was interesting to

[1:04:31] hear. I'm glad that Hooks made the list,

[1:04:32] so I'll definitely be keeping my eye out

[1:04:34] for that video, though.

[1:04:34] >> Sounds good. Yeah. All right. What do

[1:04:36] you use uh routines for?

[1:04:38] >> Um well, I've got one going now that is

[1:04:40] a a trading bot.

[1:04:42] >> I had that originally going with an

[1:04:43] OpenClaw agent, but I switched it over to

[1:04:45] routines just to see how

[1:04:46] >> how it would do there. Um but then other

[1:04:48] things just like it's actually doing

[1:04:50] worse there.

[1:04:52] Yeah, it's doing worse there right now,

[1:04:53] but I don't know if it's I mean the

[1:04:54] market and everything as well, but

[1:04:56] >> I think OpenClaw had just out of the box

[1:05:00] it had better

[1:05:02] memory capabilities for that sort of

[1:05:04] thing. So,

[1:05:04] >> yeah, makes sense.

[1:05:06] >> Um, but then, you know, just your other

[1:05:08] standard stuff like checking in on the

[1:05:11] team and giving me updates throughout

[1:05:12] the week and um end of week reports,

[1:05:14] just very very simple things, but

[1:05:16] >> nice to throw the routines in there. So

[1:05:19] yeah, I really appreciate you walking us

[1:05:21] through all this stuff today. Is there

[1:05:23] anything else that you want to leave

[1:05:25] everyone with?

[1:05:27] >> Uh, that's a good question. Yeah, I I I

[1:05:29] would say that no matter how technical

[1:05:31] you are, really what it comes down to is

[1:05:33] you could think of yourself like the

[1:05:35] product manager for Claude Code. So you

[1:05:38] don't necessarily have to describe how

[1:05:40] to build something, but it's important

[1:05:42] for you to shape the vision, right? Like

[1:05:43] what are we going to build? And then a

[1:05:46] lot of people are calling this intent

[1:05:48] engineering now. I just kind of another

[1:05:49] buzz word of basically like you want you

[1:05:51] want to give like the why, like, Claude

[1:05:53] Code, this is why we're building this

[1:05:54] thing because that really actually it

[1:05:55] ends up shaping the how quite well. So

[1:05:57] like that's a big part of your planning

[1:05:58] process

[1:05:59] >> that's going to take you far and like it

[1:06:02] it it seems kind of silly cuz you really

[1:06:04] start to get into sort of like the

[1:06:05] personification of Claude Code when

[1:06:08] you're you're telling it why you're

[1:06:09] doing things but like it actually makes

[1:06:10] a difference. you kind of have to like

[1:06:12] get over yourself and be like it's kind

[1:06:14] of cringe to treat it like a person, but

[1:06:16] like that actually is how you get the

[1:06:17] best results.

[1:06:18] >> So just just do it. It actually helps a

[1:06:20] lot and good plans and good specs going

[1:06:22] into whatever you're building with

[1:06:23] Claude or automating.

[1:06:25] >> Great tip. Great tip. I actually did

[1:06:27] just yesterday read in

[1:06:29] >> the Claude docs on how to prompt 4.8

[1:06:31] that it said that it said to give it the

[1:06:33] context for why you're doing something

[1:06:35] and it will probably do a better job. So

[1:06:38] >> that's awesome. Cole, where can people

[1:06:40] find you if they want to watch more of

[1:06:42] your stuff or get in touch?

[1:06:44] >> Yeah, so YouTube channel is the main

[1:06:46] place for me to put all my content. So,

[1:06:48] you can just search my name, Cole

[1:06:50] Medine. Uh, it is not spelled as you'd

[1:06:52] think. It's M-E-D-I-N. Sounds like Medine.

[1:06:55] Everyone says it wrong. But yeah, that's

[1:06:57] that's my YouTube channel. And then uh

[1:06:58] also doing a lot of posting on LinkedIn

[1:07:00] as well.

[1:07:02] >> Same name obviously.

[1:07:03] >> There we go. Oh yeah, I think for the

[1:07:05] first multiple months I knew you, I

[1:07:06] thought it was Cole Meen and I was

[1:07:07] saying I was saying Meden all the time,

[1:07:09] but nice.

[1:07:10] >> Good to know everyone cleared up. It's

[1:07:12] Cole Medine.

[1:07:13] >> That's right. Yeah, it's a Swedish last

[1:07:14] name. And uh yeah, Nate, there's there's

[1:07:16] people that have said it way worse than

[1:07:17] you. Like someone called me Melden uh

[1:07:20] live on stage at a chess tournament in

[1:07:22] high school. Like it's it's been worse.

[1:07:24] >> Oh man. Yeah. I don't know. A lot of

[1:07:25] people have hallucinated the L in there.

[1:07:27] I've noticed that. I'm not sure why.

[1:07:29] >> Oh, really?

[1:07:29] >> Yeah. I've had a lot of people spell it

[1:07:31] to me as Meldon or Medlin.

[1:07:34] >> Oh wow. Okay. Cuz I that was actually a

[1:07:35] one-time thing for me. That's

[1:07:36] >> I've gotten that a lot for some reason.

[1:07:37] But

[1:07:38] >> Wow.

[1:07:38] >> Anyways, yeah, thank you so much for

[1:07:40] hopping on, Cole. Um I was here to not

[1:07:44] only chat with you, but I also learned a

[1:07:46] lot as well. So, thank you so much as

[1:07:48] always. It's a pleasure to get to speak

[1:07:50] with you and hopefully we can do it

[1:07:51] again soon.

[1:07:52] >> Yeah, sounds good. I appreciate it. And

[1:07:54] thank you as well, Nate. This was

[1:07:55] awesome.

[1:07:55] >> Absolutely. I love chatting with you.

[1:07:57] >> Awesome. There we go. All right. Take it

[1:07:59] easy Cole.

[1:08:00] >> Yep.

[1:08:00] >> Have a good one.

[1:08:00] >> Thanks so much for watching today's

[1:08:02] episode. I hope that you guys enjoyed.

[1:08:03] Don't forget that I broke all of this

[1:08:05] down into a free resource guide that you

[1:08:06] can access for completely free using the

[1:08:08] link in the description to join our free

[1:08:09] school community. I'll see you guys in

[1:08:11] there. Thanks so much.
