Why software projects fail
High failure rate of software projects is a relatable pain point for developers.
Brandon Rhodes discusses the Clean Architecture concept, originally popularized by Uncle Bob Martin, and how it applies to Python. He argues that programmers have been using subroutines backwards for 62 years by hiding I/O complexity instead of decoupling logic from I/O. The talk demonstrates how to restructure code to keep I/O at the top level and pure functions at the bottom, leading to more testable and maintainable software.
Brandon Rhodes introduces the talk, noting that many software projects still fail, and Clean Architecture offers a way to organize code better.
Uncle Bob Martin's Clean Architecture (2011-2012) organizes code with I/O at the top level and business logic at the center, similar to hexagonal architecture but more popular due to better diagrams.
Programmers have been using subroutines backwards for 62 years, hiding I/O complexity instead of decoupling it. This dates back to D.J. Wheeler's 1952 paper on subroutines.
Wheeler advocated hiding complexity, but this led to the mistake of burying I/O rather than cleanly decoupling it. Hiding is not enough; true decoupling is needed.
Instead of hiding I/O, rescue the logic by pulling data operations into separate pure functions, leaving I/O at the top level. This is the essence of Clean Architecture.
Listing three shows pure functions that take data and return data, while top-level procedures handle I/O. This makes testing easier and code more maintainable.
Dependency injection and mocking (e.g., mock.patch) allow testing without real I/O, but they can be awkward. Pure functions avoid these issues.
Pure functions are easy to test with simple data inputs and outputs, no special setup needed. This reveals coupling between logic pieces that should be separated.
In larger applications, Clean Architecture means designing business logic to survive being split off, with I/O at the top level and inner layers enforcing business rules.
The top-level procedural glue needs only a few integration tests, as most logic is tested via pure functions. Gary Bernhardt's 'functional core, imperative shell' pattern is referenced.
Functional languages naturally lead to pure functions and data transformation, making Clean Architecture easier. Python's list comprehensions, sorted(), generators, and context managers support this style.
Fred Brooks and the McIlroy vs. Knuth showdown illustrate that understanding data structures is more important than understanding code. Data is easier to reason about.
McIlroy's six-line shell script for word frequency is simpler because you can picture the data at each step. This emphasizes stepwise data transformation.
Immutability is valuable for distributed computing because pure functions can run on any core without side effects, enabling parallel processing.
Python features like list comprehensions, sorted(), generators, and context managers make it easier to write functional, decoupled code.
Skyfield (astronomy) and Luca (tax forms) are projects where rescuing logic into pure functions improved testability and maintainability.
Instead of hiding I/O, put it at the top as procedural glue, making pure functions subordinate. This realizes the full power of subroutines as Wheeler envisioned.
Clean Architecture in Python means putting I/O at the top level and extracting pure functions for business logic, leading to more testable and maintainable code. This approach, inspired by Uncle Bob Martin and supported by Python's features, helps avoid the common mistake of hiding I/O instead of decoupling it.
What is the main idea of Clean Architecture according to Brandon Rhodes?
Put I/O at the top level of your program and extract pure functions for business logic, decoupling logic from I/O.
01:01
Who popularized Clean Architecture?
Uncle Bob Martin (Robert C. Martin).
01:01
What was the common mistake in using subroutines for 62 years?
Hiding I/O complexity instead of cleanly decoupling it from logic.
02:00
What did D.J. Wheeler's 1952 paper say about subroutines?
The primary reason to use subroutines is to hide complexity.
05:10
What is the difference between hiding and decoupling I/O?
Hiding puts I/O inside a subroutine but still couples it to the logic; decoupling separates I/O and logic so they can be changed independently.
08:56
How does testing pure functions differ from testing procedures with I/O?
Pure functions can be tested with simple data inputs and outputs without mocking or special setup.
18:36
What is the 'functional core, imperative shell' pattern?
A pattern where pure functions form the core of logic, and a thin imperative shell handles I/O and glues the functions together.
26:01
What did Fred Brooks say about data vs. code?
'Show me your tables and I won't usually need your flowchart; it'll be obvious.'
31:58
What was the key lesson from the McIlroy vs. Knuth showdown?
Simple tools that transform data stepwise (like shell pipes) are easier to understand and maintain than complex monolithic code.
38:30
What Python features support writing pure functions?
List comprehensions, sorted(), generators, and context managers.
41:45
Clean Architecture Origin
Introduces the core concept from Uncle Bob Martin that I/O should be at the top level.
01:01
Hiding vs. Decoupling
Key distinction that hiding I/O is not enough; true decoupling is necessary.
05:10
Rescue Logic, Don't Hide I/O
Practical technique to extract pure functions and leave I/O at the top.
08:56
Data Over Code
Fred Brooks' quote emphasizes the importance of data structures for understanding.
31:58
McIlroy's Shell Script
Demonstrates the power of stepwise data transformation over complex algorithms.
38:30
[00:00] all right everybody welcome glad
[00:03] you've hung in there to the second
[00:04] afternoon of PyOhio I'm Brandon Rhodes
[00:07] when I'm not uh writing python apis or
[00:11] writing python
[00:13] applications I'm wondering why the code
[00:15] in my apis and applications is such a
[00:20] mess the industry as a whole I'm told
[00:22] that at the moment the numbers are hard
[00:24] to find that more software projects even
[00:26] to this day uh more projects fail than
[00:29] succeed worldwide in businesses and
[00:32] institutions and uh as an industry we're
[00:34] still
[00:38] explaining uh we're still trying to
[00:40] learn uh why uh a piece of that puzzle I
[00:44] think is the recent work that's been
[00:46] done in propounding the clean
[00:47] architecture and I'm going to give some
[00:50] uh examples of how I believe that
[00:52] applies to
[00:54] python uh the inspiration for this talk
[00:57] is someone called Uncle Bob
[01:01] Martin who is really big in the
[01:03] Java and the strong object-oriented
[01:07] statically typed languages and he
[01:09] recently in 2011 and 12 was thinking
[01:12] about a new way of organizing his code
[01:15] his um uh applications that he called
[01:18] the clean architecture one of several
[01:20] ideas that came out at around
[01:23] the same time with about the same
[01:26] goal but his became more popular because
[01:29] I think he drew a better picture
[01:32] there was someone else that came out
[01:34] with something called the
[01:37] hexagonal architecture and it just
[01:39] wasn't as pretty it didn't use
[01:41] colors um and so this is what people
[01:45] often talk about if they're going to
[01:47] refer to this idea we'll explore of
[01:49] putting IO at the top level of your
[01:52] program instead of at the bottom the
[01:55] pith the center of the
[01:57] idea this is not how he put
[02:00] it this is my spin on it you're
[02:02] familiar with the idea of a subroutine
[02:04] where your code can be running along and
[02:06] then make a call in Python the two forms
[02:08] of subroutine are the function and
[02:10] the method where you can stop invoke
[02:13] some other code and wait for it to come
[02:15] back with an answer the pith of the idea
[02:18] here is that we programmers have been
[02:20] spontaneously using subroutines
[02:24] backwards for how long have programmers
[02:27] tended to use subroutines completely
[02:30] backwards the wrong way by my count we
[02:32] have been doing it for 62 years and my
[02:36] proof is that I went back and I found
[02:40] the
[02:41] 1952 ACM national meeting paper in
[02:45] Pittsburgh Pennsylvania it was the second
[02:47] meeting of the ACM but the first for
[02:49] which proceedings papers were
[02:53] published and I found the use of
[02:57] subroutines in programs
[03:00] by Mr D.J. Wheeler Dr D.J. Wheeler of
[03:04] Cambridge and Illinois
[03:06] universities um and you might wonder am
[03:10] I really going to pull anything of
[03:11] relevance out of this guy's paper
[03:13] because it was a very different world a
[03:14] typical computer at the time had about a
[03:17] thousand words of RAM could do about a
[03:19] thousand operations per second and
[03:22] required a dozen people to operate could
[03:25] programming a computer with a thousand
[03:28] words of RAM really be anything like
[03:31] writing code in a modern
[03:33] language today well here's just one
[03:36] example of something that you'll find
[03:37] familiar from this paper how complex
[03:40] could programming even be with only 1K
[03:42] of memory in the paper he says the
[03:46] preparation of a library subroutine
[03:49] requires a considerable amount of work
[03:52] however even after it has been coded and
[03:54] tested there still remains the
[03:57] considerable task of writing
[04:00] a
[04:02] description so that people not
[04:05] acquainted with the interior coding can
[04:08] nevertheless use it
[04:10] easily this last task may
[04:19] be the most difficult you had 1,000
[04:25] bytes in which to write your code or I
[04:28] should say 1,000 words in which to write
[04:30] your code and they still didn't want to
[04:34] document so I think the world he was
[04:36] working in as I read this paper seemed
[04:39] very familiar though in some ways very
[04:43] strange he's advocating that instead of
[04:45] just having one huge piece of code in
[04:47] your 1,000 words of memory you split it
[04:50] into routines that call one another like
[04:53] instead of having a single uh function
[04:55] in your python file having several what
[04:57] does he advertise subroutines as being
[05:00] good at why would you organize code this
[05:03] way he says that the primary reason
[05:07] is to hide
[05:10] complexity all complexities should if
[05:15] possible be buried out of sight and this
[05:19] you see is where everything went wrong
[05:21] and he he doomed us for the next several
[05:24] lifetimes of programming
[05:27] because that then leads programmers
[05:30] to a quite natural
[05:32] mistake I/O is always a mess trying to
[05:36] talk to a database trying to parse JSON
[05:39] trying to get things in and out of
[05:41] a file it's a mess it's often very
[05:45] idiosyncratic code that doesn't have a lot to
[05:47] do with the pure essence of what our
[05:50] program is trying to
[05:51] accomplish and the characteristic error
[05:54] that we make is that we bury the I/O
[06:00] rather than cleanly and completely
[06:02] decoupling from it in the time
[06:05] allotted for this talk I'm only going to
[06:08] attempt one code example so
[06:10] if we spend a second on this it
[06:12] will get you set up for the rest
[06:15] of the listings in the talk this is a
[06:18] simple function in Python that uses a
[06:22] now deprecated API on
[06:24] DuckDuckGo in order to look up the definition
[06:27] of a word builds a URL in this case uses
[06:30] the requests library I was a good
[06:33] citizen and then I marked those two
[06:35] lines with that there's the I/O there's
[06:38] that ugly complexity we'd like to make
[06:40] disappear and then having gotten the
[06:42] Json data back it can look and see if a
[06:45] definition was in fact returned for the
[06:50] word the natural thing that we tend to
[06:53] do is say well IO is kind of messy who
[06:57] knows tomorrow whether I might not be
[06:58] using some other library in order to do
[07:02] my uh HTTP who knows whether I might not
[07:05] have a different way to ask for
[07:07] definitions for instance if
[07:10] DuckDuckGo deprecates the
[07:12] API um well I guess that would
[07:14] invalidate all of it so I'll stay with
[07:16] the example of what if the way that I do
[07:18] the io what if the way that I make the
[07:20] HTTP request
[07:21] changes uh we want to get that
[07:24] complexity and bury it and so we make
[07:28] the fundamental mistake of the last 60
[07:30] years we get the I/O pluck it out and
[07:35] feel proud of ourselves for having done
[07:38] exactly what Dr Wheeler said we've
[07:40] hidden it in a
[07:43] subroutine we have hidden the I/O but have
[07:46] we really decoupled it pace Wheeler I
[07:51] assert that hiding is not enough if you
[07:54] want to control the complexity of your
[07:56] programs here's the listing again and I
[08:00] will just ask this if you want to call
[08:03] find
[08:05] definition so that it doesn't really do
[08:08] any I/O because you're testing it or
[08:11] because you've cached a result and just
[08:13] want to hand it the cached result
[08:15] instead of calling your
[08:17] lower-level code how do you do
[08:21] that how do I call find definition
[08:25] without it actually doing I/O and at
[08:28] least as the code is presented here
[08:31] it's not possible I have you see hidden
[08:34] the API you don't see any API if you
[08:37] read find definition but I'm still
[08:40] tightly coupled to it the I/O is an
[08:44] inevitable consequence of calling find
[08:47] definition whether it's visible in its
[08:49] code or not I have hidden but I've not
[08:52] cleanly
[08:53] decoupled what if we did
[08:56] everything the other way around what if
[08:59] when we saw a routine with IO in it
[09:02] that's ugly and idiosyncratic and might
[09:05] change
[09:06] tomorrow What If We rescued the
[09:10] logic instead of hiding the IO this is
[09:14] exactly the same lines of code but in
[09:17] this case I have pulled
[09:19] out the data
[09:22] operations and made them separate and
[09:25] left the io stranded at the top level of
[09:28] the program
[09:30] rather than leaving my logic
[09:32] there and my claim is this that listing
[09:35] three that we just looked at is an
[09:36] architectural success while the others
[09:40] were architectural failures listing
[09:42] three shows in miniature What the clean
[09:46] architecture does for entire
[09:49] applications here's that top function
[09:52] from listing three the coupling between
[09:56] the logic and the io the thing in my
[09:58] program that brings together logic and
[10:01] IO in a way that that they both have to
[10:04] be called at at once is now isolated to
[10:08] a small procedure that mates my logic
[10:12] and my external IO operations together
[10:16] it's very readable because instead of
[10:19] blocks of logic operations I now have
[10:21] names for them build URL pluck
[10:25] definition from this
[10:26] data that document what each section of
[10:29] code was doing the previous the first
[10:31] listing had no documentation for what
[10:34] those series of operations did this
[10:36] should remind you of um a little bit of
[10:39] the uh extreme programming movement from
[10:41] the late '90s early 2000s where remember
[10:45] they said that if you ever see a piece
[10:47] of code with a comment at the top that's
[10:51] a sign that you actually have what wants
[10:53] to be a
[10:54] function um and they would say you know
[10:57] if you're writing high speed C code and
[10:59] you want it to run fast mark it as
[11:01] inline static make it so it gets
[11:04] inlined at compile time but semantically
[11:07] make it something separate XP people
[11:11] actually believed it was a bug this is
[11:13] why it was called Extreme programming
[11:15] they actually believed every comment was
[11:17] a bug because every comment was
[11:20] knowledge that wasn't in your code and
[11:23] if your code isn't explaining everything
[11:26] about what it's doing to them it's bad
[11:28] code so before you could commit in
[11:31] extreme programming all the comments had
[11:33] to disappear as in this case we
[11:35] introduce a new name a new identifier
[11:38] into the code build URL that wasn't
[11:41] there before so that semantic
[11:43] information about what these three
[11:45] lines do becomes a part of our program's
[11:48] actual
[11:49] semantics and so in this
[11:52] maneuver we see as we turn pure logic
[11:56] into functions and thus have to give
[11:58] them names
[11:59] in the same way that XP did we find
[12:03] that we're adding more semantic content
[12:05] to our
[12:07] code so our architecture in listing one
[12:12] was simply a procedure a procedure
[12:14] meaning something that has side effects
[12:16] you call it and some IO has happened
[12:18] when it's all done listing two the
[12:21] natural way of using a subroutine since
[12:24] the 1950s to hide complexity resulted in
[12:28] hiding the IO but the top level code
[12:31] there was was still a procedure all of
[12:33] our logic was stranded in a routine that
[12:36] did IO every time you called it listing
[12:39] three by doing the opposite maneuver
[12:42] left the I/O up in the procedure and
[12:45] resulted in pure functions it resulted
[12:49] in downstream Python functions
[12:52] that don't do IO that don't have side
[12:55] effects they simply take some arguments
[12:57] that are data and return some results
[13:00] that are just data this has incredible
[13:04] ramifications among other things uh for
[13:08] testing how would we have tested listing
[13:10] one or two where the goal is to not have
[13:13] your tests need the network and to talk
[13:16] to DuckDuckGo imagine that you want your
[13:18] tests to run on an airplane or at the
[13:20] airport or you don't have Wi-Fi or
[13:23] something two techniques have been
[13:25] developed over the 2000s
[13:28] pioneered I believe in Java and the other big
[13:31] OO statically typed languages they
[13:34] are dependency injection and the idea of
[13:37] mocking which in Python we can do
[13:40] through monkey patching without even
[13:41] modifying our code through something
[13:43] I'll show in a moment called mock.patch
[13:47] dependency injection was pioneered in
[13:49] 2004 by another of the big oo thinkers
[13:52] named Martin Fowler his idea was to make
[13:55] the io library or function that the
[13:58] routine needs to call
[13:59] itself a parameter and this is really
[14:01] easy in Python functions in Python are
[14:04] first class objects you can pass them um
[14:08] modules are first class objects and can
[14:10] be an argument to a function so instead
[14:14] of having find definition from listing
[14:17] one literally and always use the
[14:21] requests library you could make that a
[14:24] parameter whose default if it's not
[14:26] provided is to use Kenneth Reitz's
[14:29] requests library but which lets you
[14:32] substitute any other kind of
[14:35] module-looking object instead if you want to
[14:38] skip the call out to DuckDuckGo and here
[14:42] is how you might write a test against
[14:44] that function I just showed you you'd
[14:46] make a fake requests library
[14:48] with a get call inside of it just like
[14:51] the real requests library but when it
[14:53] was asked for its JSON data it can just
[14:57] return a constant the test
[14:59] therefore can just set up a fake answer
[15:02] we're not really doing any I/O here we're
[15:04] just going to answer this fake JSON data
[15:07] back when the definition is asked for
[15:11] and we can now call our code find
[15:14] definition and avoid any I/O by having it
[15:18] use our fake little requests library
[15:20] instead of the real one so we get a
[15:22] self-contained test that doesn't actually
[15:25] spam DuckDuckGo with lots of requests
[15:29] and couple us to DuckDuckGo needing to
[15:31] be up and running and not having blocked
[15:33] our IP address yet because we're
[15:36] running so many
[15:38] tests the problems with this are
[15:41] obvious first that fake requests library
[15:44] we wrote well it's not the real requests
[15:47] library so who knows
[15:50] the fact that we called it and got
[15:51] data back doesn't tell us that calling
[15:53] real DuckDuckGo will give us data back it
[15:56] might look simple for one server
[15:59] an IO routine that just needs to make an
[16:02] HTTP request but a procedure that also
[16:04] needs let's say database and file system
[16:07] access is going to need lots of
[16:09] injection what you tend to get if you
[16:11] use dependency injection is high-level
[16:14] functions that need everything in the
[16:16] kitchen sink because if way down beneath
[16:19] them anyone tries to talk to the
[16:21] database it's got to be dependency
[16:23] injected if another procedure needs the
[16:25] web it needs to be dependency injected
[16:29] and this problem has actually spun up
[16:32] to the level of having huge
[16:34] dependency injection
[16:36] frameworks they're called in the larger
[16:39] OO languages because of this problem of
[16:41] if the very bottom guy has got to talk
[16:43] to the web and you ever want to be able
[16:46] to test that code then the top level
[16:48] procedure has somehow got to get the
[16:51] information about what the web is right
[16:54] now is it a test mock or is it the real
[16:56] thing uh when it's called
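
A minimal sketch of the dependency-injection approach described above, not the speaker's actual listing. The endpoint URL, the 'Definition' key, the parameter name `http`, and the two fake classes are assumptions made for illustration. One deliberate difference from the talk: instead of defaulting the parameter to the requests module itself, this sketch defaults to None and imports requests lazily, so the example runs even where requests is not installed.

```python
from urllib.parse import urlencode


def find_definition(word, http=None):
    """Look up a word, with the HTTP library passed in as a parameter."""
    if http is None:
        # Assumption: fall back to the real requests library only when
        # the caller has not injected a substitute.
        import requests as http
    query = urlencode({'q': 'define ' + word, 'format': 'json'})
    response = http.get('http://api.duckduckgo.com/?' + query)  # I/O
    data = response.json()                                      # I/O
    definition = data['Definition']
    if definition == '':
        raise ValueError('that is not a word')
    return definition


class FakeResponse:
    """Hypothetical stand-in for a requests response object."""

    def __init__(self, data):
        self._data = data

    def json(self):
        return self._data


class FakeRequests:
    """Hypothetical stand-in for the requests module: same get() call,
    but it returns a canned answer and records the URLs it was given."""

    def __init__(self, data):
        self._data = data
        self.urls = []

    def get(self, url):
        self.urls.append(url)
        return FakeResponse(self._data)


def test_find_definition():
    fake = FakeRequests({'Definition': 'abc'})
    assert find_definition('word', http=fake) == 'abc'
    assert fake.urls == ['http://api.duckduckgo.com/?q=define+word&format=json']
```

The test never touches the network, but it only proves that the function works against the fake, which is the weakness the talk goes on to point out.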
[17:00] now a dynamic language like python
[17:02] fortunately has ways around dependency
[17:05] injection so we don't wind up with that
[17:06] problem I just described thanks to
[17:09] the mock library an incredible resource
[17:11] created by Michael Foord we have the
[17:14] ability to live patch our I/O libraries
[17:19] to briefly substitute fake versions of
[17:21] their callables that will return the data we
[17:24] want and I believe the mock library
[17:26] is now part of the most recent Python 3
[17:29] it's so important it was added to the
[17:31] standard
[17:32] library in that case we can use the
[17:35] original listing one or the original
[17:38] listing two and we can just ask
[17:44] the patch callable from the mock
[17:46] library to patch requests.get to be our
[17:51] fake version of it instead inside of the
[17:54] with statement inside of this context
[17:56] this block of code during which that
[17:59] patch is active our test gets run no
[18:03] real connection is made to the outside
[18:05] world and we find out if our function
[18:07] works against purported data from
[18:11] DuckDuckGo whether you do dependency
[18:13] injection or whether you call
[18:16] mock.patch I find that the result is kind of
[18:19] awkward and kind of sad as I test I just
[18:23] feel like I'm fighting the structure of
[18:25] my application I feel like I'm trying to
[18:29] make it do something that it would
[18:31] really rather not
[18:32] do so how does testing improve when we
[18:36] factor out our logic as in listing three
[18:41] where we get the logic that simply deals
[18:44] with data structures and Rescue It by
[18:47] putting it beneath the io rather than
[18:51] Above It Well by definition pure
[18:54] functions can be tested using only data
[18:57] arguments go in the top a list or a
[19:00] string or some other data structure is
[19:02] going to come out the bottom so for
[19:05] example if I want to test the build URL
[19:08] I just call it I don't have to set up
[19:10] objects I don't have to build things I
[19:12] just call it with different arguments
[19:14] and instead of going and hunting for
[19:16] side effects I can just look at the
[19:18] return value and see whether it's what I
[19:22] expected no special setup is needed no
[19:25] special preparation I don't have to
[19:27] build a mock and and the test calls I'm
[19:30] making look exactly like the calls that
[19:32] are used in production so I know they
[19:35] have a high probability of telling me
[19:38] whether my code will work in
[19:41] production uh here I'm going to test the
[19:44] second half of the logic pluck
[19:46] definition which needs to pull out the
[19:48] value of the definition key or raise an
[19:51] exception two simple tests and I have
[19:53] 100% test coverage of it again making a
[19:57] pure call
[19:59] that is not in any way adulterated or
[20:02] changed or adjusted from the way this
[20:05] function will be experiencing reality
[20:07] when this code is in production it's
[20:09] seeing exactly the same kind of things
[20:11] come in and go out as it will when I use
[20:14] it for
[20:15] real uh being able by the way to write
[20:19] the tests like that taught me about a
[20:22] symptom of coupling I had never observed
[20:24] a symptom that tells me that I might
[20:26] have locked logic together that
[20:28] could more cleanly be split out you'll note
[20:30] that all I had to do there was write one
[20:33] set of tests for building the URL and a
[20:36] completely different set of tests for
[20:38] whether I could parse the data that came
[20:40] back and I noticed that in a lot of my
[20:43] older projects I had bigger more
[20:47] complicated uh routines where I had
[20:52] where doing the test for a good URL and
[20:54] good data was very easy to call but that
[20:58] I then had to essentially uh start doing
[21:02] different permutations of argument to
[21:05] get each part of my logic to fail
[21:08] separately because it wasn't out
[21:09] separate where I could call it and so
[21:11] having a big series of pieces of logic
[21:14] where I want to make each part fail
[21:16] individually I first have to make a
[21:18] bunch of calls with a bad URL but that
[21:21] don't pass in a second piece of data
[21:22] because I never reached that part of the
[21:24] code and then a series of tests that
[21:26] give a good URL so we revive the first
[21:29] half of the code but then bad data so
[21:32] that that part will fail and I now
[21:34] consider that pattern that I see
[21:38] a symptom a cry for help if you
[21:41] will from my application code telling me
[21:44] that I have coupled two pieces of logic
[21:47] together that are really separate they
[21:49] do different things they're going to
[21:51] fail in different circumstances and then
[21:53] instead of leaving them coupled and then
[21:55] having to fiddle with each
[21:58] variable in turn while leaving the others
[22:00] constant I might be able to rescue these
[22:03] pieces of logic into separate functions
[22:06] this does become I do sometimes leave
[22:10] this pattern in my tests if there's just
[22:13] so much State shared between the first
[22:16] and the second piece of logic that it's
[22:19] just not reasonable to return all 20
[22:21] things so that they can then then be the
[22:23] uh arguments to the second piece of
[22:25] logic this comes up a lot in astronomy
[22:27] where an initial routine might set up a
[22:30] bunch of variables that the conclusion
[22:33] of a logic then needs to succeed or fail
[22:35] on before throwing them away and
[22:37] returning a simple value but if you look
[22:40] at the output to the first piece of
[22:42] logic and find it's rather modest
[22:45] rescuing the two pieces of logic out
[22:47] into separate routines can make um your
[22:50] tests less expensive simpler and easy to
[22:53] think about by the fact that you're not
[22:55] getting big tall sequences of logic and
[22:58] contorting yourself to try to get the
[23:00] third thing that happens to
[23:03] fail all of which is invalidated
[23:06] by the way if you then change the order
[23:08] of your operations because now you need
[23:10] something different to succeed in order
[23:12] to reach the second or third uh error or
[23:16] exception that could happen in your code
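
To make the testing point concrete, here is a hedged sketch of what the two rescued pieces of logic and their tests might look like. The names `build_url` and `pluck_definition` follow the names spoken in the talk, but the bodies, the endpoint URL, and the 'Definition' key are assumptions, not the speaker's listing. Because each function is separate, each can be made to fail on its own without first steering a call past the other.

```python
from urllib.parse import urlencode


def build_url(word):
    """Pure function: word in, URL string out. No I/O."""
    query = urlencode({'q': 'define ' + word, 'format': 'json'})
    return 'http://api.duckduckgo.com/?' + query


def pluck_definition(data):
    """Pure function: parsed JSON in, definition string out. No I/O."""
    definition = data['Definition']
    if definition == '':
        raise ValueError('that is not a word')
    return definition


def test_build_url():
    # Plain arguments in, plain return value out: no setup, no mock.
    assert build_url('word') == (
        'http://api.duckduckgo.com/?q=define+word&format=json')
    assert build_url('more words') == (
        'http://api.duckduckgo.com/?q=define+more+words&format=json')


def test_pluck_definition():
    assert pluck_definition({'Definition': 'abc'}) == 'abc'


def test_pluck_definition_rejects_empty():
    # The failure case is exercised directly, with no valid URL needed
    # to "reach" this part of the logic.
    try:
        pluck_definition({'Definition': ''})
    except ValueError:
        pass
    else:
        raise AssertionError('expected ValueError')
```

Reordering or changing one function does not invalidate the tests of the other, which is the decoupling symptom described in the passage above.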
[23:19] so that is a really really simple
[23:22] example that we've just gone through
[23:26] almost trivially simple I made it only
[23:29] as complicated as I thought it
[23:30] needed to be to get the point across in real life
[23:33] the clean architecture often involves
[23:36] much much bigger pieces of code and the
[23:38] question of how they hook together not
[23:41] nine-line functions and the fact that we
[23:43] can pull one or two pieces out what
[23:46] Uncle Bob Martin does is as he's
[23:49] designing his entire application he's
[23:53] thinking through what parts of my
[23:56] business logic can survive
[23:59] being split off where they take
[24:01] arguments take data structures and
[24:03] return data structures such that the top
[24:07] level glues all of these pieces together
[24:10] so that the io stays up at the top level
[24:15] and the bottom levels are simply objects
[24:17] or functions that don't need to know
[24:19] where the data is coming from where it's
[24:21] going how it's getting
[24:23] persisted but instead simply enforce
[24:27] your business rules
[24:29] do your computation and leave it up to
[24:31] the caller where to put the
[24:34] results he says in one of those blog
[24:37] posts in general the further in you go
[24:39] in his architecture the higher level the
[24:42] software becomes the outer circles are
[24:45] mechanisms the inner circles are
[24:48] policies the important thing is that
[24:52] isolated simple data structures are what
[24:55] is passed across the boundaries
[24:59] when any of the external parts of the
[25:01] system become obsolete like the database
[25:05] or the web framework you replace those
[25:08] obsolete elements with a minimum of
[25:11] fuss because all the innards don't know
[25:14] about the database the innards don't know
[25:16] that the web is there and so if
[25:20] you need to replace the way your data is
[25:23] stored the way data flows in or out you
[25:26] just make adjustments at the outside
[25:27] level and everything else should keep
[25:33] working back to our code to make this
[25:35] concrete we could change how we do the
[25:38] io we could change how we batch up these
[25:41] operations we could change what happens
[25:43] up at the top without having to change
[25:46] either of these functions down inside
[25:50] because they take simple data as input
[25:53] manipulate it and return new data as
[25:56] output
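
The claim that the top level can change while the inner functions stay put can be sketched as follows. This is an illustration under stated assumptions rather than the talk's listing three: the endpoint URL and 'Definition' key are assumed, the standard library's urllib stands in for the requests library so the sketch has no third-party dependency, and `find_definition_cached` is a hypothetical second shell added to show the swap.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(word):
    """Pure: data in, data out."""
    query = urlencode({'q': 'define ' + word, 'format': 'json'})
    return 'http://api.duckduckgo.com/?' + query


def pluck_definition(data):
    """Pure: data in, data out."""
    definition = data['Definition']
    if definition == '':
        raise ValueError('that is not a word')
    return definition


def find_definition(word):
    """Top-level procedure: the only place where I/O happens."""
    url = build_url(word)
    with urlopen(url) as response:   # I/O
        data = json.load(response)   # I/O
    return pluck_definition(data)


def find_definition_cached(word, cache):
    """Hypothetical alternative shell: same pure functions, but the data
    comes from a dict of previously fetched results, not the network."""
    data = cache[build_url(word)]
    return pluck_definition(data)
```

Both procedures reuse `build_url` and `pluck_definition` unchanged; only the few lines that fetch the data differ.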
[25:59] all right you might
[26:01] say I would like to know whether my app
[26:03] works against
[26:05] DuckDuckGo I do want to test my I/O code at
[26:09] least once even if this pattern does let
[26:12] me do most of my testing with pure data
[26:15] how do you test the top level procedural
[26:17] glue and here I'd refer you to Gary
[26:20] Bernhardt's talks at PyCon 2011 through
[26:24] 2013 where he from the Ruby world
[26:27] that's his primary language explored a
[26:30] different form of this same kind of
[26:34] approach and talking about how to make
[26:36] the majority of your tests very fast and
[26:39] only investing in a few tests doing the
[26:43] end-to-end I/O-bound operations that there
[26:46] at the end tell you yes my app actually
[26:49] works and will actually fetch in real
[26:52] information from a database or whatever
[26:54] and work with it his terminology is a
[26:57] little different than Uncle Bob but
[26:58] works in much the same way an imperative
[27:02] Shell at the top level that does IO that
[27:06] wraps and uses your functional core
[27:10] functional core because it takes and
[27:12] returns data can have lots of fast unit
[27:15] tests exercising directly all the ways
[27:18] it could fail all the conditions it has
[27:20] to detect up at the top your imperative
[27:23] shell hopefully only needs a few
[27:26] integration tests in order to verify for
[27:29] you that it works because you're not
[27:30] having to hit the imperative shell with
[27:33] the 20 different ways that a word
[27:36] definition you're looking up could be
[27:37] malformed. You're doing that by testing
[27:40] the functional core you just test the
[27:42] imperative shell to make sure the pieces
[27:44] are then hooked together correctly
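As a rough illustration of that testing split, here is a sketch in which a functional-core function is exercised against many malformed inputs without any I/O. The function `pluck_definition`, the `Definition` key, and the list of bad inputs are hypothetical; this version is deliberately more defensive than a real listing would need to be, so that there are several failure modes to test.

```python
def pluck_definition(data):
    """Functional core: accept decoded data, return a definition or raise."""
    if not isinstance(data, dict):
        raise ValueError('response is not a JSON object')
    definition = data.get('Definition')
    if not isinstance(definition, str) or definition.strip() == '':
        raise ValueError('no definition found')
    return definition.strip()


# Hypothetical malformed responses. Each is plain data, so each test is
# instant and needs no network, mock, or fixture.
MALFORMED = [
    None,
    [],
    {},
    {'Definition': ''},
    {'Definition': '   '},
    {'Definition': None},
    {'Definition': 42},
    {'definition': 'key has the wrong case'},
]


def count_rejections(cases):
    """Return how many of the given inputs the core rejects with ValueError."""
    rejected = 0
    for case in cases:
        try:
            pluck_definition(case)
        except ValueError:
            rejected += 1
    return rejected
```

With the failure cases covered here, an integration test of the imperative shell only has to confirm that one real lookup flows through the pieces in the right order.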
[27:46] here's our top level function from
[27:48] listing three I mean there's not even
[27:50] any if statements here it shouldn't
[27:52] require very many tests to confirm for
[27:54] you that this is doing the steps of your
[27:57] application in the right order this
[28:00] pattern, by the way, is already
[28:03] familiar to a lot of people who do
[28:04] functional programming. Languages like
[28:06] Lisp, Haskell, Clojure, and F# make it
[28:09] quite natural to write most of your code
[28:13] as pure functions, and then you get
[28:16] awkward and put the procedural stuff up
[28:18] at the top. Functional languages
[28:20] naturally lead you to process data
[28:22] structures while avoiding side-effect I/O.
[28:26] You tend to call functions in
[28:28] functional languages for what they
[28:30] return, not for the sequence of things
[28:33] they happen to do while that code is
[28:37] running. This is an example of I/O as a
[28:40] side effect. I'm combining a data
[28:42] processing task (iterating over a Python
[28:45] iterator to get a series of words and
[28:48] uppercasing them) with an I/O task. When
[28:52] you call uppercase words in this example,
[28:54] you're not expecting to see any
[28:56] uppercase words as its return value.
[28:59] You're expecting that when it returns
[29:01] nothing to you, it will have had a side
[29:03] effect in the outside world of producing
[29:06] those as output. If you want to test this,
[29:08] you're going to have to use mock.patch
[29:11] or something else to intercept the
[29:13] standard
[29:15] output. Here is an example of the same
[29:18] code split into a purely logical piece,
[29:23] where it
[29:24] consumes an iterator that gives it
[29:27] words and produces, as an iterator
[29:31] (a generator in this case), a series of
[29:33] uppercase words, separately from the
[29:36] question of any side effects. It can
[29:39] then be quite naturally plugged into a
[29:41] top-level (as Gary Bernhardt calls it)
[29:43] procedural glue routine, which then does
[29:47] the I/O on its
[29:50] behalf. Procedural code tends to be
[29:53] called not because it's going to return
[29:55] anything interesting but because of what
[29:57] it does because of what it tosses out or
[29:59] pulls from the world. It tends to output
[30:02] as it runs. Functional code, on the other
[30:05] hand, tends to be organized in discrete
[30:08] stages that each produce data that then
[30:12] finally gets output at the
[30:16] end. In Gary Bernhardt's talks he talks a
[30:20] lot about the
[30:21] immutability of a lot of these
[30:23] functional programming languages. Imagine
[30:26] Python where you didn't have lists but
[30:28] only tuples, so that once a list was built
[30:30] you couldn't change it anymore. Imagine
[30:33] Python with dictionaries that, once you
[30:34] built them, you could never change,
[30:36] where if you wanted to produce a new
[30:39] dictionary you had to ask for a copy
[30:42] of the old dictionary with, like, one
[30:44] thing changed. A lot of
[30:46] these functional languages have
[30:48] immutable data structures that never
[30:50] change and
[30:52] some programmers who are fans of the
[30:55] functional programming style say
[30:58] that things are much, much easier if they
[31:01] pass a data structure to a function
[31:03] knowing it can't be changed, that they
[31:05] don't have to go search to see if it
[31:06] looks different, that every data
[31:08] structure is immutable. And they claim,
[31:11] some of them like to claim, that the
[31:13] whole point of this programming style is
[31:15] immutable data structures, so that you
[31:18] would feel guilty about having objects
[31:20] with writable attributes or dictionaries
[31:23] that you might update. I'm going to
[31:25] make the argument that it is not
[31:27] immutability that makes functional
[31:29] programming languages so clean, or it's
[31:32] not the only thing. My guess is that the
[31:35] biggest advantage of data in a
[31:37] functional programming style isn't its
[31:40] immutability it is simply the fact that
[31:43] it's data and that data structures you
[31:45] can see them you can reason about them
[31:48] unlike a moving process that you're
[31:49] worried about whether it's spinning off
[31:51] consequences in the right order a data
[31:53] structure is just something you can look
[31:55] at and understand
[31:58] Two examples from computing
[32:00] history that I'll use to back me up on
[32:02] this. The famous Fred Brooks book, The
[32:04] Mythical Man-Month, about successes
[32:07] and failures in managing projects,
[32:10] written in 1975, is very famous. You've
[32:13] probably heard of it before because of
[32:15] the quote (this is back when, if a
[32:18] project was going slowly, they would
[32:19] just keep throwing more developers in):
[32:22] the bearing of a child takes nine months no
[32:27] matter how many women are
[32:30] assigned. There are some processes that
[32:34] do not get faster because you flood the
[32:36] organization with
[32:38] untrained people who don't know
[32:40] what's going on, and he often found
[32:43] (his is the famous aphorism) that
[32:45] projects go more slowly the
[32:48] more people you add, in many
[32:51] cases. He said the following on the
[32:53] question of what's easier to understand,
[32:56] data or code. At the time, code was
[32:59] usually written out as flowcharts and
[33:02] data was usually organized in memory in
[33:04] what they call tables he said show me
[33:08] your flowchart and conceal your tables
[33:12] and I shall continue to be
[33:15] mystified show me your
[33:17] tables and I won't usually need your
[33:21] flowchart it'll be
[33:23] obvious very often if you just show
[33:26] someone the way that you laid out your
[33:28] dictionaries and
[33:30] lists and other data structures they can
[33:33] probably guess how you're going to run
[33:35] through those data structures and get
[33:36] your job done there's something that's
[33:39] much clearer about seeing the data that
[33:41] you wind up producing or the data in an
[33:44] intermediate step than to stare at the
[33:46] steps in your program and try starting
[33:50] there without any bigger picture of what
[33:52] they're creating trying to guess or
[33:55] understand what result is being built or
[33:59] generated. So that's one example of
[34:01] a famous thinker in computer
[34:04] science who I think would back me up
[34:06] that the data is where it's at. But I'll
[34:08] also cite the
[34:10] famous 1986 showdown between McIlroy and
[34:15] Donald Knuth, who largely
[34:19] invented computer science in the '70s
[34:21] as he wrote The Art of Computer
[34:24] Programming. Knuth, a very, very
[34:28] famous programmer (there he
[34:31] is), was asked to write a
[34:34] routine. He practiced something
[34:36] called literate programming, where he had
[34:38] lots and lots of comments and
[34:40] where a computer program could actually
[34:41] be published as a book explaining itself,
[34:44] and he was given by a programming
[34:47] magazine the task given a text file and
[34:51] an integer
[34:53] K can you tell that computer science was
[34:55] invented by mathematicians
[34:58] print the K most common words in the
[35:01] file and the number of their occurrences
[35:04] in decreasing
[35:06] frequency. He produced 10 pages of Pascal
[35:10] code that did this, and
[35:13] McIlroy said (I mean, he
[35:16] admitted this is a formidable solution):
[35:19] Knuth's solution is to tally in an
[35:22] associative data structure,
[35:24] something like our Python dictionary,
[35:26] each word as it is read from the file. The
[35:29] data structure is a trie with 26-way
[35:32] (well, for technical reasons, actually 27-way)
[35:34] fan-out at each letter. To avoid
[35:37] wasting space, all of the sparse 26-element
[35:40] arrays are cleverly interleaved
[35:43] in one common arena, with hashing used to
[35:46] assign
[35:47] homes. 10 pages of
[35:54] Pascal. At the conclusion of his article,
[35:57] after reviewing Knuth's code, pointing out
[35:59] several bugs in it and edge cases
[36:02] that would make it
[36:04] crash, in one of the most famous moments
[36:06] in computer science, McIlroy replaced
[36:09] Donald Knuth with a six-line shell
[36:16] script. The first
[36:19] line finds every run of letters, A
[36:23] through Z or lowercase a through z,
[36:25] in the file and puts them together; it takes
[36:29] everything that's not a letter and turns
[36:30] it into a new
[36:32] line. The second command makes everything
[36:35] lowercase so that we don't count words
[36:37] twice if they're at the beginning and in
[36:38] the middle of a
[36:39] sentence. sort is going to bring
[36:44] all of the instances of the word
[36:46] aardvark together in a row, and then all of
[36:48] the instances of the word Brandon and
[36:50] python and so forth. uniq is going to
[36:53] take those runs of identical words, count
[36:56] them, and tell you six aardvark, five Brandon, and so
[36:59] forth. I suppose I rate my popularity
[37:02] slightly below that of the aardvark.
[37:05] Then it is going to sort that
[37:08] output on the numeric field sitting in
[37:11] front of each word, so that six aardvark
[37:14] goes first and four python goes next
[37:18] and so forth. And then finally we ask
[37:21] sed for the first n lines and then to
[37:23] quit.
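The same six stages can be sketched in Python as a chain of intermediate data structures. This is an illustrative analogue, not McIlroy's script: the function name `top_words` and the choice of `re.findall` and `itertools.groupby` are assumptions, and each comment names the shell stage the line roughly mirrors.

```python
import re
from itertools import groupby


def top_words(text, k):
    """Return the k most common words as (count, word) pairs."""
    words = re.findall(r'[A-Za-z]+', text)      # split into runs of letters (tr)
    lowered = [w.lower() for w in words]        # fold to lowercase (tr)
    ordered = sorted(lowered)                   # identical words adjacent (sort)
    counted = [(len(list(run)), word)           # count each run (uniq -c)
               for word, run in groupby(ordered)]
    ranked = sorted(counted, reverse=True)      # biggest counts first (sort -rn)
    return ranked[:k]                           # keep the first k (sed)


sample = 'The aardvark saw the Python. The aardvark ran.'
print(top_words(sample, 2))
```

Every name on the left (`words`, `lowered`, `ordered`, `counted`, `ranked`) is a plain list that can be printed and inspected, which is the property the talk goes on to emphasize. Ties between equal counts are broken by reverse alphabetical order here, which may differ from what the shell tools do.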
[37:27] McIlroy points out that every one of
[37:30] these tools, back as Unix was being
[37:31] invented, was written first for a
[37:33] particular need but then untangled from
[37:35] the specific application. The person who
[37:38] first needed to put a sorter inside of
[37:40] their program got it written and then
[37:43] stepped back and said, you know, I'll bet
[37:45] someone else might need to sort
[37:47] something someday, and went to the work,
[37:50] which is difficult, of pulling that out
[37:52] so it could work on any text input file
[37:54] with any format. Now, the traditional
[37:57] lesson, the one that McIlroy drew
[37:59] here, was: it's better to use simple, small
[38:03] tools that can be easily linked together.
[38:06] And I would say that if this is the only
[38:07] lesson we can draw from his showdown
[38:10] with Knuth, it's a very good
[38:12] one, because,
[38:15] thanks to the iterator protocol
[38:17] especially, it's really easy in Python to
[38:19] link together a series of generators, to
[38:22] throw in sets and lists and dictionaries
[38:24] at just the right point, to get a lot of
[38:27] really interesting data processing done.
[38:30] But today I want to draw a different
[38:32] lesson that McIlroy did
[38:34] not to me the shell script is
[38:38] simpler not simply because the steps are
[38:41] easy but because I can picture the
[38:46] data. It's because in between each of
[38:49] these pipes, in between each of these
[38:51] commands that data is flowing between, I
[38:54] can close my eyes and know exactly what
[38:57] that data looks like at each
[39:00] step. The shell script is the simpler
[39:02] solution because it operates as the
[39:05] stepwise transformation of data, and
[39:07] what's key here is not simply that the
[39:09] steps are easy to describe. I'll bet that
[39:11] many of you, even those who didn't know the tr
[39:13] command before, could probably tomorrow
[39:15] explain this shell script to someone
[39:17] after a bit of head scratching. It's not
[39:20] just that the steps are easy though they
[39:22] are; it's that I can just close my eyes
[39:25] and picture what the output looks like
[39:28] at the conclusion of each command
[39:31] running. That's very powerful because
[39:33] it lets you visualize very accurately
[39:36] what this is
[39:37] doing, in a way that is not going to
[39:40] happen with 10 pages of dense Pascal
[39:43] that are producing an in-memory hash table,
[39:46] 26- or 27-way fan-out
[39:52] tree. This approach continually surfaces
[39:56] intermediate results that can be checked and
[39:58] examined. If you find this doesn't work,
[40:01] you can just go back and find
[40:03] which step it failed in, in each case as simple
[40:05] plain
[40:07] text. So if I'm right that one of the big
[40:10] wins of a functional programming style
[40:13] is simply that it deals in data which
[40:15] our minds can picture very
[40:18] easily what then is the value of
[40:20] immutability? I think Gary Bernhardt
[40:23] got this right when he said the fun of
[40:25] immutability (I believe this was in the
[40:27] '12 talk) is distributed computing.
[40:32] This is my only slide about this:
[40:34] if all of your routines just
[40:38] take a data structure and return a
[40:41] data structure, it doesn't much matter
[40:43] what core they run on. In a big cluster
[40:46] you can push data out to a bunch of
[40:48] servers, run the data step separately, and
[40:52] then collect the output back, and a task
[40:56] that you've broken down into steps
[40:59] that simply pull in data and return data
[41:01] can then be hooked up to a message queue
[41:03] and fanned out across a very wide
[41:07] data server, so
[41:10] long as it's the return value that's
[41:12] important and not the side effect if I
[41:14] call a routine and its value is that my
[41:17] data structure will now look different
[41:19] it's got to live on the same machine so
[41:20] it's changing the copy of the data
[41:22] structure I've got in
[41:24] memory. But if it's its return value
[41:27] that's important, and not the way it
[41:29] monkeys with the data I already have in
[41:31] memory, it can run anywhere, so long as
[41:33] the result is
[41:35] delivered. Data and transforms are easier
[41:39] to understand and I think they're easier
[41:41] to maintain than coupled
[41:45] procedures. Now, if that's the case, Python
[41:48] has been evolving recently in exactly
[41:51] the right direction, if you think about
[41:52] the kind of innovations that have marked
[41:54] the last decade especially, or a
[41:56] decade and a half, as Python has
[41:58] grown from the language it was in the
[42:00] 1990s. In October 2000 we got the list
[42:04] comprehension. The list comprehension
[42:07] seems like a slight convenience, but it
[42:10] really changes you as a
[42:14] programmer. It takes someone whose job,
[42:19] our job, used to be modifying data
[42:22] structures (make an empty list and then
[42:24] go through and start changing it)
[42:27] and it turned us into people that build
[42:29] new data structures that we often never
[42:31] touch. Often my code today takes in a
[42:34] list and in a series of comprehensions
[42:37] just like that shell script generates a
[42:40] series of intermediate results and a
[42:42] final data structure that it returns
[42:44] without ever reaching back into one of
[42:47] the earlier results and feeling the need
[42:49] to change it. List comprehensions make it
[42:51] really easy to write Python code that's
[42:54] purely functional, where you're
[42:57] using write-once, throwaway data
[42:59] structures for your intermediate results
[43:02] rather than doing constant
[43:04] modification. Python
[43:06] 2.4 saw the introduction of the sorted
[43:10] built-in. Remember how we used to sort?
[43:12] You used to have to build a list, give it
[43:14] a name, call its .sort() method (which
[43:16] returns None, so you couldn't use it in
[43:18] the middle of an expression), and then go
[43:20] back to the list, which has now been
[43:22] modified, which has now changed, to see
[43:24] the result. Thanks to Raymond
[43:28] Hettinger's addition of the sorted built-in,
[43:30] we now just ask for data to be sorted
[43:33] and returned to us in a single step
[43:36] instead of having to build and modify a
[43:37] data structure in
[43:39] several. If you try applying these
[43:42] different patterns to your python code
[43:45] remember that python has several
[43:47] different ways of breaking out a pattern
[43:50] from your code. We do have functions
[43:53] or methods, but remember that if it's
[43:55] iteration that you want to factor out, you
[43:58] can build a generator. If you have some
[44:02] setup and teardown that you want to
[44:04] pull out, you can build a context manager.
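A small sketch of those three ways of factoring code, reusing the uppercase-words idea from earlier in the talk. The names `shout`, `uppercase_words`, `banner`, and `main` are hypothetical, as is the banner text; the point is only that the function and the generator do no I/O, while the top-level routine does all of it.

```python
from contextlib import contextmanager


def shout(word):
    """Function: the middle of the loop, factored out."""
    return word.upper()


def uppercase_words(words):
    """Generator: the loop logic itself, with no I/O inside it."""
    for word in words:
        yield shout(word)


@contextmanager
def banner(out, title):
    """Context manager: setup and teardown around a block of output."""
    out.write('--- ' + title + ' ---\n')
    try:
        yield out
    finally:
        out.write('--- end ---\n')


def main(words, out):
    """Top-level procedural glue: the only place that writes output."""
    with banner(out, 'words') as stream:
        for word in uppercase_words(words):
            stream.write(word + '\n')
```

Calling `main(['hello', 'world'], sys.stdout)` would print the words in uppercase between the two banner lines, while `list(uppercase_words(['hello']))` can be checked in a test with no output capture at all.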
[44:07] Whereas older
[44:09] programming languages, let's say Java,
[44:12] will have one way to break out a
[44:15] subroutine, and then you have to come up
[44:17] with design patterns that fill in for the
[44:19] lack of generators and the lack of
[44:22] context managers with something like the
[44:24] visitor pattern or something like that,
[44:26] Python just has all three of them built
[44:28] in. You can pull out the middle of a loop
[44:31] as a function, you can pull out the loop
[44:33] logic itself as a generator, you can
[44:36] pull out setup and teardown as a
[44:39] context manager. We have a lot of
[44:41] different ways of getting logic and
[44:45] doing that rescue operation where we
[44:47] decouple it from our IO so that it can
[44:50] live separately. Two real-world examples,
[44:53] just current projects of mine: one is
[44:56] Skyfield, an object-based API for
[44:58] astronomy backed by dozens of pure
[45:01] functions that were really easy to test.
[45:04] Functions implement the actual
[45:06] operations. The miserable thing, by
[45:09] the way about a method is that it
[45:11] implicitly depends on the state of the
[45:13] whole object it's often hard to test a
[45:15] method because it's not clear how much
[45:17] of the object in the test needs to be
[45:19] set up and initialized before the method
[45:21] can run, whereas the beautiful thing
[45:25] about a function is you just read the
[45:27] arguments and you know what to
[45:28] provide to it. A well-written
[45:31] function doesn't need extra globals or
[45:34] other persistent state set up and
[45:36] available for it to run, which makes
[45:38] testing and bug fixing a lot easier.
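To make the contrast concrete, here is a hypothetical sketch (the `Invoice` class and `invoice_total` function are invented for illustration and are not taken from Skyfield). The method's signature says nothing about which attributes it reads, while the function's argument list states exactly what a test has to supply.

```python
class Invoice:
    """Hypothetical object with more state than any one method needs."""

    def __init__(self, customer, prices, tax_rate, currency='USD'):
        self.customer = customer
        self.prices = prices
        self.tax_rate = tax_rate
        self.currency = currency

    def total(self):
        # From the outside, total() could depend on any attribute above;
        # a test has to build the whole object to call it. Here the method
        # is reduced to a thin wrapper around the pure function below.
        return invoice_total(self.prices, self.tax_rate)


def invoice_total(prices, tax_rate):
    """Pure function: the argument list states everything it needs."""
    subtotal = sum(prices)
    return round(subtotal * (1 + tax_rate), 2)
```

Testing `invoice_total([10, 5], 0.1)` needs no customer, no currency, and no object setup, which is the explicitness the talk is pointing at.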
[45:43] Remember, in the Zen of Python, which I
[45:45] hope you look at every morning as you
[45:46] get ready to
[45:48] code, second only, in Python's motto,
[45:55] behind "beautiful is better than ugly"
[45:57] is "explicit is better than implicit," and
[46:00] a function, if nothing else, is very
[46:02] explicit about its needs: right there in
[46:05] the argument list it tells you what it
[46:06] needs to succeed. So Skyfield is one
[46:10] example I've done recently that has
[46:12] turned out really well, where I didn't
[46:14] strand my important logic coupled
[46:17] into my objects or I/O but where I spun
[46:20] off everything I could into an easy-to-call
[46:22] function. The other is something I
[46:24] use for filling in tax forms for myself,
[46:27] called luca; it's also on GitHub. The
[46:29] temptation there, and actually the first
[46:31] version of it: as it ran along computing
[46:34] fields, it would immediately then call the
[46:36] low-level PDF operations to, you know,
[46:39] write them onto the 1040 tax form or
[46:43] whatever. I then was able to rescue
[46:47] that deeply compromised
[46:50] and very difficult to maintain code
[46:52] by breaking it into phases I first read
[46:55] in the entire input of the tax form
[46:58] while resisting the temptation to do
[46:59] anything with it I simply read it into a
[47:02] data structure and return that I then
[47:05] have a routine that takes the inputs to
[47:07] a tax form and it's very easy to write
[47:10] these little routines add up the numbers
[47:12] do the rounding multiply the percents
[47:14] and produce all of the output lines in
[47:17] the tax form and I resist the temptation
[47:20] to write that out to the PDF. I hand that
[47:23] data structure then to my PDF
[47:25] writer, whose job is to fill text into
[47:28] fields. It was much easier to write
[47:31] and maintain code that was split so that
[47:34] data structures pass between phases
[47:36] rather than making a slightly shorter
[47:38] program that immediately tried to go
[47:40] have side effects but thereby made
[47:42] itself very difficult to test. So the
[47:44] pith of the idea here is that in the old
[47:47] days, if we wanted to get rid of all of
[47:50] that
[47:51] pesky I/O, we would try to accomplish that by
[47:54] turning it into a subroutine.
[47:57] The new idea I'm propounding is that if
[47:59] you really want to get rid of someone,
[48:02] make them a
[48:04] manager. Put them in charge. Get all of
[48:07] that I/O-laden code and make it feel
[48:10] important by putting it up at the
[48:13] top of your program. Make it the
[48:16] procedural glue, leaving all of the
[48:19] little function subordinates free to do
[48:22] their jobs. Let's return to Wheeler. I
[48:25] have one last quote. In
[48:28] 1952 he gave us the
[48:31] subroutine, and I think because of that
[48:35] initial phase of computing history in
[48:37] which we tried to use it wrongly, we have
[48:39] yet to realize its full power and
[48:41] promise. But I'd like to end with a quote
[48:44] in which he described what he thought
[48:45] would someday happen now that we have
[48:47] subroutines. He said: when a program has
[48:51] been made from a set of subroutines, the
[48:54] breakdown of the code is more complete
[48:57] than it otherwise would be. This allows
[49:00] the coder to concentrate on one section
[49:02] of the program at a time without the
[49:04] overall detailed program continually
[49:07] intruding. Thus the subroutines can be
[49:11] more easily coded and be tested in
[49:14] isolation from the rest of the program.
[49:17] When the entire program has to be tested,
[49:20] it is with the foreknowledge that the
[49:23] incidence of mistakes in the subroutines
[49:27] is
[49:28] zero, or at least one order of
[49:33] magnitude below that of the untested
[49:36] portions of the program. Thank you very
[49:39] much for listening. I'm Brandon Rhodes.
[49:46] [Applause]