---
title: 'The Clean Architecture in Python'
source: 'https://youtube.com/watch?v=DJtef410XaM'
video_id: 'DJtef410XaM'
date: 2026-06-15
duration_sec: 0
---

# The Clean Architecture in Python

> Source: [The Clean Architecture in Python](https://youtube.com/watch?v=DJtef410XaM)

## Summary

Brandon Rhodes discusses the Clean Architecture concept, originally popularized by Uncle Bob Martin, and how it applies to Python. He argues that programmers have been using subroutines backwards for 62 years by hiding I/O complexity instead of decoupling logic from I/O. The talk demonstrates how to restructure code to keep I/O at the top level and pure functions at the bottom, leading to more testable and maintainable software.

### Key Points

- **Introduction to Clean Architecture** [00:00] — Brandon Rhodes introduces the talk, noting that many software projects still fail, and Clean Architecture offers a way to organize code better.
- **Inspiration from Uncle Bob Martin** [01:01] — Uncle Bob Martin's Clean Architecture (2011-2012) organizes code with I/O at the top level and business logic at the center, similar to hexagonal architecture but more popular due to better diagrams.
- **Subroutines Used Backwards** [02:00] — Programmers have been using subroutines backwards for 62 years, hiding I/O complexity instead of decoupling it. This dates back to D.J. Wheeler's 1952 paper on subroutines.
- **Hiding vs. Decoupling** [05:10] — Wheeler advocated hiding complexity, but this led to the mistake of burying I/O rather than cleanly decoupling it. Hiding is not enough; true decoupling is needed.
- **Rescuing Logic Instead of Hiding I/O** [08:56] — Instead of hiding I/O, rescue the logic by pulling data operations into separate pure functions, leaving I/O at the top level. This is the essence of Clean Architecture.
- **Pure Functions vs. Procedures** [12:03] — Listing three shows pure functions that take data and return data, while top-level procedures handle I/O. This makes testing easier and code more maintainable.
- **Testing with Dependency Injection and Mocking** [13:08] — Dependency injection and mocking (e.g., mock.patch) allow testing without real I/O, but they can be awkward. Pure functions avoid these issues.
- **Testing Pure Functions** [18:36] — Pure functions are easy to test with simple data inputs and outputs, no special setup needed. This reveals coupling between logic pieces that should be separated.
- **Real-World Application of Clean Architecture** [23:26] — In larger applications, Clean Architecture means designing business logic to survive being split off, with I/O at the top level and inner layers enforcing business rules.
- **Testing the Top-Level Glue** [26:01] — The top-level procedural glue needs only a few integration tests, as most logic is tested via pure functions. Gary Bernhardt's 'functional core, imperative shell' pattern is referenced.
- **Functional Programming Influence** [28:00] — Functional languages naturally lead to pure functions and data transformation, making Clean Architecture easier. Python's list comprehensions, sorted(), generators, and context managers support this style.
- **Data Over Code** [31:58] — Fred Brooks and the McIlroy vs. Knuth showdown illustrate that understanding data structures is more important than understanding code. Data is easier to reason about.
- **McIlroy's Shell Script Lesson** [38:30] — McIlroy's six-line shell script for word frequency is simpler because you can picture the data at each step. This emphasizes stepwise data transformation.
- **Immutability for Distributed Computing** [40:10] — Immutability is valuable for distributed computing because pure functions can run on any core without side effects, enabling parallel processing.
- **Python's Evolution Supporting Clean Architecture** [41:45] — Python features like list comprehensions, sorted(), generators, and context managers make it easier to write functional, decoupled code.
- **Real-World Examples: Skyfield and Luca** [44:53] — Skyfield (astronomy) and Luca (tax forms) are projects where rescuing logic into pure functions improved testability and maintainability.
- **Conclusion: Make I/O a Manager** [47:44] — Instead of hiding I/O, put it at the top as procedural glue, making pure functions subordinate. This realizes the full power of subroutines as Wheeler envisioned.
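Two of the key points above (mocking versus pure functions, and generators) can be sketched together with the uppercase-words example the talk describes. This is a minimal sketch rather than the speaker's listing: the function names are hypothetical, and `unittest.mock` is the standard-library home of the mock library.

```python
import io
from unittest import mock


def print_uppercase(words):
    """Side-effect version: the logic and the output are mixed together."""
    for word in words:
        print(word.upper())


def uppercase_words(words):
    """Pure logic: consume an iterable of words, yield each one uppercased."""
    for word in words:
        yield word.upper()


def output_words(words):
    """Procedural glue: the only place where output actually happens."""
    for word in uppercase_words(words):
        print(word)


# Testing the side-effect version means intercepting standard output.
with mock.patch("sys.stdout", new_callable=io.StringIO) as fake_stdout:
    print_uppercase(["spam", "eggs"])
assert fake_stdout.getvalue() == "SPAM\nEGGS\n"

# Testing the pure generator needs only data in and data out.
assert list(uppercase_words(["spam", "eggs"])) == ["SPAM", "EGGS"]
```

The generator can be checked with a plain list comparison, while the printing version needs a patch before its result can even be observed.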

### Conclusion

Clean Architecture in Python means putting I/O at the top level and extracting pure functions for business logic, leading to more testable and maintainable code. This approach, inspired by Uncle Bob Martin and supported by Python's features, helps avoid the common mistake of hiding I/O instead of decoupling it.
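The contrast between hiding I/O and rescuing logic might look like the sketch below. It is reconstructed from the talk's spoken description, so the endpoint URL, the `Definition` key, the error message, and the helper name `call_json_api` are all assumptions. `requests` is a third-party library; it is imported inside the functions that perform I/O so that the pure functions can be used without it.

```python
from urllib.parse import urlencode


# --- I/O hidden in a subroutine (the pattern the talk argues against) ---

def call_json_api(url):
    """Hides the I/O, but every caller still triggers it."""
    import requests  # third-party HTTP library
    return requests.get(url).json()


def find_definition_hidden(word):
    """Logic stranded in a procedure: calling it always hits the network."""
    url = "https://api.duckduckgo.com/?" + urlencode(
        {"q": "define " + word, "format": "json"})
    data = call_json_api(url)  # I/O, out of sight but still coupled
    definition = data["Definition"]
    if definition == "":
        raise ValueError("that is not a word")
    return definition


# --- Logic rescued into pure functions, I/O left at the top level ---

def build_url(word):
    """Pure: data in (a word), data out (a URL string)."""
    query = urlencode({"q": "define " + word, "format": "json"})
    return "https://api.duckduckgo.com/?" + query


def pluck_definition(data):
    """Pure: pull the definition out of already-decoded JSON data."""
    definition = data["Definition"]
    if definition == "":
        raise ValueError("that is not a word")
    return definition


def find_definition(word):
    """Top-level glue: the only function here that performs I/O."""
    import requests  # third-party HTTP library
    url = build_url(word)
    data = requests.get(url).json()  # I/O
    return pluck_definition(data)
```

Both versions contain the same operations. The difference is which part ends up callable on its own: the I/O helper in the first, the logic in the second.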

## Transcript

all right everybody welcome, glad
you've hung in there to the second
afternoon of PyOhio I'm Brandon Rhodes
when I'm not uh writing Python APIs or
writing Python
applications I'm wondering why the code
in my APIs and applications is such a
mess the industry as a whole I'm told
that at the moment the numbers are hard
to find that more software projects even
to this day uh more projects fail than
succeed worldwide in businesses and
institutions and uh as an industry we're
still
explaining uh we're still trying to
learn uh why uh a piece of that puzzle I
think is the recent work that's been
done in propounding the clean
architecture and I'm going to give some
uh examples of how I believe that
applies to
python uh the inspiration for this talk
is uh uh uh someone called Uncle Bob
Martin who is uh he's really big in the
Java and the um strong object-oriented
uh statically typed languages and he
recently in 2011 and 12 was thinking
about a new way of organizing his code
his um uh applications that he called
the clean architecture one of several
ideas that came out at about the around
the same time uh with about the same
goal but his became more popular because
I think he drew a better picture um
there was someone else that came out
with something uh like called the um
hexagonal architecture and and it just
wasn't as pretty it didn't use
colors um and so this is what people
often talk about if they're going to
refer to this idea we'll explore of
putting IO at the top level of your
program instead of at the bottom the
pith the center of the
idea uh this is this is not how he put
it this is my spin on it uh you're
familiar with the idea of a subroutine
where your code can be running along and
then make a call in Python the two forms
of subroutine are the function and
the method where you can stop invoke
some other code and wait for it to come
back with an answer the pith of the idea
here is that we programmers have been
spontaneously using subroutines
backwards for how long have programmers
tended to use subroutines completely
backwards the wrong way by my count we
have been doing it for 62 years and my
proof is that I went back and I found
the
1952 ACM national meeting paper in
Pittsburgh Pennsylvania it was the second
meeting of the ACM but the first for
which proceedings papers were
published and I found The Use of
Sub-routines in Programmes
by Mr D. J. Wheeler uh Dr D. J. Wheeler of
Cambridge and Illinois
universities um and you might wonder am
I really going to pull anything of
relevance out of this guy's paper
because it was a very different world a
typical computer at the time had about a
thousand words of RAM could do about a
thousand operations per second and
required a dozen people to operate could
programming a computer with a thousand
words of RAM really be anything like
computer writing uh code in a modern
language today well here's just one
example of something that you'll find
familiar from this paper how complex
could programming even be with only 1K
of memory in the paper he says the
preparation of a library subroutine
requires a considerable amount of work
however even after it has been coded and
tested there still remains the
considerable task of writing
a
description so that people not
acquainted with the interior coding can
nevertheless use it
easily this last task may
be the most difficult you had 1,000
bytes in which to write your code or I
should say 1,000 words in which to write
your code and they still didn't want to
document so I think the world he was
working in as I read this paper seemed
very familiar though in some ways very
strange he's advocating that instead of
just having one huge piece of code in
your 1,000 words of memory you split it
into routines that call one another like
instead of having a single uh function
in your python file having several what
does he advertise subroutines as being
good at why would you organize code this
way he says that you the primary reason
is to hide
complexity all complexities should if
possible be buried out of sight and this
you see is where everything went wrong
and he he doomed us for the next several
lifetimes of programming
because that then leads programmers
to a quite natural
mistake I/O is always a mess trying to
talk to a database trying to parse JSON
uh uh trying to get things in and out of
a file it's a mess it's often very idiosyncratic
code that doesn't have a lot to
do with the pure essence of what our
program is trying to
accomplish and the characteristic error
that we make is that we bury the io
rather than cleanly and completely
decoupling from it uh in the the time
allotted for this talk I'm only going to
attempt one code example so if if you'll
if we spend a second on this it will
will um get you set up for the the rest
of the listings and the talk this is a
simple function in Python that uses a
now deprecated API on DuckDuckGo
in order to look up the definition
of a word builds a URL in this case uses
the requests Library I was a good
citizen uh and then I marked those two
lines with that there's the io there's
that ugly complexity we'd like to make
disappear and then having gotten the
JSON data back it can look and see if a
definition was in fact returned for the
word the natural thing that we tend to
do is say well IO is kind of messy who
knows tomorrow whether I might not be
using some other library in order to do
my uh HTTP who knows whether I might not
have a different way to ask for
definitions for instance if DuckDuckGo
deprecates the
API um well I guess that would
invalidate all of it so I'll stay with
the example of what if the way that I do
the io what if the way that I make the
HTTP request
changes uh we want to get that
complexity and bury it and so we make
the fundamental mistake of the last 60
years we get the io pluck it out and
feel proud of ourselves from having done
exactly what uh Dr Wheeler said we've
hidden it in a subroutine
we have hidden the I/O but have
we really decoupled it pace Wheeler I
assert that hiding is not enough if you
want to control the complexity of your
programs here's the listing again and I
will just ask this if you want to call
find_definition so that it doesn't really do
any I/O because you're testing it or
because you've cached a result and just
want to hand it the cached result
instead of calling your your uh lower
level code how do you do
that how do I call find_definition
without it actually doing I/O and at
least as the code is presented here
it's not possible I have you see hidden
the API you don't see any API if you
read find_definition but I'm still
tightly coupled to it the I/O is an
inevitable consequence of calling
find_definition whether it's visible in its
code or not I have hidden but I've not
cleanly
decoupled what if we did
everything the other way around what if
when we saw a routine with IO in it
that's ugly and idiosyncratic and might
change
tomorrow What If We rescued the
logic instead of hiding the IO this is
exactly the same lines of code but in
this case I have pulled
out the data
operations and made them separate and
left the io stranded at the top level of
the program
rather than leaving my logic
there and my claim is this that listing
three that we just looked at is an
architectural success while the others
were architectural failures listing
three shows in miniature What the clean
architecture does for entire
applications here's that top function
from listing three the coupling between
the logic and the io the thing in my
program that brings together logic and
IO in a way that that they both have to
be called at at once is now isolated to
a small procedure that mates my logic
and my external IO operations together
it's very readable because instead of
blocks of logic operations I now have
names for them build_url pluck_definition
from this
data that document what each section of
code was doing the previous the first
listing had no documentation for what
those series of operations did this
should remind you of um a little bit of
the uh extreme programming movement from
the late '90s early 2000s where remember
they said that if you ever see a piece
of code with a comment at the top that's
a sign that you actually have what wants
to be a
function um and they would say you know
if you're writing high speed C code and
you want it to run fast mark it as
inline static but make it so it gets
inlined at compile time but semantically
make it something separate XP people
actually believed it was a bug this is
why it was called Extreme programming
they actually believed every comment was
a bug because every comment was
knowledge that wasn't in your code and
if your code isn't explaining everything
about what it's doing to them it's bad
code so before you could commit in
extreme programming all the comments had
to disappear as in this case we
introduce a new name a new identifier
into the code build_url that wasn't
there before so that semantic
information about what these three
lines do becomes a part of our program's
actual
semantics and and in this uh so so this
uh maneuver we see is we turn pure logic
into functions and thus have to give
them names
in the same way that XP did uh we find
that we're adding more semantic content
to our
code so our architecture in listing one
was simply a procedure a procedure
meaning something that has side effects
you call it and some IO has happened
when it's all done listing two the
natural way of using a subroutine since
the 1950s to hide complexity resulted in
hiding the I/O but the top level code
there was still a procedure all of
our logic was stranded in a routine that
did IO every time you called it listing
three by doing the opposite maneuver
left the I/O up in the procedure and
resulted in pure functions it resulted
in uh Downstream uh uh python functions
that don't do IO that don't have side
effects they simply take some arguments
that are data and return some results
that are just data this has incredible
ramifications among other things uh for
testing how would we have tested listing
one or two where the goal is to not have
your tests need the network and to talk
to DuckDuckGo imagine that you want your
tests to run on an airplane or at the
airport or you don't have Wi-Fi or
something two techniques have been
developed over the 2000s um uh beginning
I believe in Java and the other big
OO statically typed languages uh they
are dependency injection and the idea of
mocking which in Python we can do
through monkey patching without even
modifying our code through something
I'll show in a moment called mock.patch
dependency injection was pioneered in
2004 by another of the big OO thinkers
named Martin Fowler his idea was to make
the io library or function that the
routine needs to call
itself a parameter and this is really
easy in Python functions in Python are
first class objects you can pass them um
modules are first class objects and can
be an argument to a function so instead
of having find_definition from listing
one um literally and always use the
requests library you could make that a
parameter whose default if it's not
provided is to use Kenneth Reitz's
requests library but which lets you
substitute any other kind of
module-looking object in instead if you want to
skip the call out to DuckDuckGo and here
is how you might write a test against
that function I just showed you you'd
make a request a fake requests library
with a get call inside of it just like
the real requests library but when it
was asked for its JSON data it can just
return a uh constant the the test
therefore can just set up a fake answer
we're not really doing any I/O here we're
just going to answer this fake JSON data
back when the uh definition is asked for
and we can now call our code
find_definition and avoid any I/O by having it
use our fake little requests library
instead of the real one so we get a
self-contained test that doesn't actually
spam DuckDuckGo with lots of um requests
and couple us to DuckDuckGo needing to
be up and running and not having blocked
our IP address yet um because we're
running so many
tests uh the problems with this are
obvious first that fake requests library
we wrote well it's not the real requests
library so who knows whether calling you
know the fact that we called it and got
data back doesn't tell us that calling
real DuckDuckGo will give us data back it
might look simple for one server
an IO routine that just needs to make an
HTTP request but a procedure that also
needs let's say database and file system
access is going to need lots of
injection what you tend to get if you
use dependency injection is high-level
functions that need everything in the
kitchen sink because if way down beneath
them anyone tries to talk to the
database it's got to be dependency
injected if another procedure needs the
web it needs to be dependency injected
and um this problem has actually spun up
to the level of having huge dependency
Frameworks uh dependency injection
Frameworks they're called in the larger
oo languages because of this problem of
if the very bottom guy has got to talk
to the web and you ever want to be able
to test that code then the top level
procedure has somehow got to get the
information about what the web is right
now is it a test mock or is it the real
thing uh when it's called
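The "kitchen sink" effect described above can be sketched in a few lines. Everything here is hypothetical (the function names, the `db` and `web` collaborators, and their methods); the point is only that the top-level function must accept and forward every dependency that anything beneath it needs.

```python
def load_profile(user_id, db):
    """Low-level procedure that needs the database."""
    return db.fetch_user(user_id)


def fetch_avatar(profile, web):
    """Low-level procedure that needs the web."""
    return web.get(profile["avatar_url"])


def render_page(user_id, db, web):
    """Contains no I/O calls itself, yet must carry both collaborators."""
    profile = load_profile(user_id, db)
    avatar = fetch_avatar(profile, web)
    return {"name": profile["name"], "avatar_bytes": len(avatar)}


class FakeDb:
    def fetch_user(self, user_id):
        return {"name": "Ada", "avatar_url": "https://example.invalid/a.png"}


class FakeWeb:
    def get(self, url):
        return b"1234"


# Every test of the top-level function has to supply every fake.
assert render_page(7, FakeDb(), FakeWeb()) == {"name": "Ada", "avatar_bytes": 4}
```

Each new kind of I/O added at the bottom widens the signature of every function above it.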
now a dynamic language like python
fortunately has ways around dependency
injection so we don't wind up with that
problem I just described uh thanks to
the mock library uh incredible resource
created by Michael Foord uh we have the
ability to live patch our I/O libraries
to briefly substitute fake versions of
their callables that will return the data we
want and I believe the mock Library uh
is now part of the most recent Python 3
um it's so important it was added to the
standard
Library um in that case we can use the
original listing one or the original
listing two and we can just ask um our
the patch callable from the mock
library to patch requests.get to be our
fake version of it instead inside of the
with statement inside of this context
this block of code during which that
patch is active uh our test gets run no
real connection is made to the outside
world and we find out if our uh function
works against purported data from
DuckDuckGo whether you do dependency
injection or whether you call
mock.patch I find that the result is kind of
awkward and kind of sad as I test I just
feel like I'm fighting the structure of
my application I feel like I'm trying to
make it do something that it would
really rather not
do so how does testing improve when we
factor out our logic as in listing three
where we get the logic that simply deals
with data structures and rescue it by
putting it beneath the I/O rather than
above it well by definition pure
functions can be tested using only data
arguments go in the top a list or a
string or some other data structure is
going to come out the bottom so for
example if I want to test build_url
I just call it I don't have to set up
objects I don't have to build things I
just call it with different arguments
and instead of going and hunting for
side effects I can just look at the
return value and see whether it's what I
expected no special setup is needed no
special preparation I don't have to
build a mock and and the test calls I'm
making look exactly like the calls that
are used in production so I know they
have a high probability of telling me
whether my code will work in
production uh here I'm going to test the
second half of the logic pluck_definition
which needs to pull out the
value of the definition key or raise an
exception two simple tests and I have
100% test coverage of it again making a
pure call
that is not in any way adulterated or
changed or adjusted from the way this
function will be experiencing reality
when this code is in production it's
seeing exactly the same kind of things
come in and go out as it will when I use
it for
real uh being able by the way to write
the tests like that taught me about a
symptom of coupling I had never observed
a symptom that tells me that I might
have locked logic together that that
could more cleanly split out you'll note
that all I had to do there was write one
set of tests for building the URL and a
completely different set of tests for
whether I could parse the data that came
back and I noticed that in a lot of my
older projects I had bigger more
complicated uh routines where I had
where doing the test for a good URL and
good data was very easy to call but that
I then had to essentially uh start doing
different permutations of arguments to
get each part of my logic to fail
separately because it wasn't out
separate where I could call it and so
having a big series of pieces of logic
where I want to make each part fail
individually I first have to make a
bunch of calls with a bad URL but that
don't pass in a second piece of data
because I never reached that part of the
code and then a series of tests that
give a good URL so we survive the first
half of the code but then bad data so
that that part will fail and I now
consider that uh pattern that I I see uh
a symptom a a a a cry for help if you
will from my application code telling me
that I have coupled two pieces of logic
together that are really separate they
do different things they're going to
fail in different circumstances and then
instead of leaving them coupled and then
having to fiddle with each
variable in turn while leaving the others
constant I might be able to rescue these
pieces of logic into separate functions
this does become I do sometimes leave
this pattern in my tests if there's just
so much State shared between the first
and the second piece of logic that it's
just not reasonable to return all 20
things so that they can then then be the
uh arguments to the second piece of
logic this comes up a lot in astronomy
where an initial routine might set up a
bunch of variables that the conclusion
of a logic then needs to succeed or fail
on before throwing them away and
returning a simple value but if you look
at the output to the first piece of
logic and find it's rather modest
rescuing the two pieces of logic out
into separate routines can make um your
tests less expensive simpler and easy to
think about by the fact that you're not
getting big tall sequences of logic and
contorting yourself to try to get the
third thing that happens to
fail all all all of which is invalidated
by the way if you then change the order
of your operations because now you need
something different to succeed in order
to reach the second or third uh error or
exception that could happen in your code
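The testing symptom described above can be sketched as follows. The functions `check_lookup`, `check_url`, and `failure_message` are hypothetical, as are the specific checks; the sketch only shows how a coupled routine forces the tests to get past the first check before they can reach the second.

```python
def check_lookup(url, data):
    """Coupled: two unrelated checks locked into one routine."""
    if not url.startswith("https://"):
        raise ValueError("bad URL")
    definition = data["Definition"]
    if definition == "":
        raise ValueError("no definition")
    return definition


def failure_message(func, *args):
    """Test helper: return the ValueError message func raises, else None."""
    try:
        func(*args)
    except ValueError as error:
        return str(error)
    return None


# Coupled tests: the bad-URL case never reaches the data, and the
# bad-data case must first supply a good URL it does not care about.
assert failure_message(check_lookup, "ftp://x", None) == "bad URL"
assert failure_message(
    check_lookup, "https://x", {"Definition": ""}) == "no definition"


def check_url(url):
    """First piece of logic, rescued into its own function."""
    if not url.startswith("https://"):
        raise ValueError("bad URL")
    return url


def pluck_definition(data):
    """Second piece of logic, callable without any URL at all."""
    definition = data["Definition"]
    if definition == "":
        raise ValueError("no definition")
    return definition


# Separate tests: each failure is reached directly, in any order.
assert failure_message(check_url, "ftp://x") == "bad URL"
assert failure_message(pluck_definition, {"Definition": ""}) == "no definition"
```

If the order of the two checks inside `check_lookup` were swapped, the first coupled test would fail on the missing data instead of the bad URL; the separate tests would be unaffected.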
so that is a really really simple
example that we've just gone through um
almost trivially simple I made it only
as complicated as I thought it
needed to be to get the point in real life
the clean architecture often involves
much much bigger pieces of code and the
question of how they hook together not
nine line functions and the fact that we
can pull one or two pieces out what um
Uncle Bob Martin does is he as he's
designing his entire application he's
thinking through what parts of my
business logic can survive
being split off where they take
arguments take data structures and
return data structures such that the top
level glues all of these pieces together
so that the io stays up at the top level
and the bottom levels are simply objects
or functions that don't need to know
where the data is coming from where it's
going how it's getting
persisted uh but instead simply enforce
your business rule
do your computation and leave it up to
the caller where to put the
results he says in one of those blog
posts in general the further in you go
in his architecture the higher level the
software becomes the outer circles are
mechanisms the inner circles are
policies the important thing is that
isolated simple data structures are what
is passed across the boundaries
when any of the external parts of the
system become obsolete like the database
or the web framework you replace those
obsolete elements with a minimum of
fuss because all the innards don't know
about the database the innards don't know
that the web is there the um and so if
you need to replace the way your data is
stored the way data flows in or out you
just make adjustments at the outside
level and everything else should keep
working back to our code to make this
concrete we could change how we do the
io we could change how we batch up these
operations we could change what happens
up at the top without having to change
either of these functions down inside
because they take simple data as input
manipulate it and return new data as
output
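As a sketch of that claim, the two pure functions below stay untouched while the top-level glue is written twice, once against the network and once against a saved file. The URL, the `Definition` key, and both glue function names are assumptions, and the standard library's `urllib.request` stands in here for the `requests` library used in the talk.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(word):
    """Pure: data in (a word), data out (a URL string)."""
    query = urlencode({"q": "define " + word, "format": "json"})
    return "https://api.duckduckgo.com/?" + query


def pluck_definition(data):
    """Pure: pull the definition out of already-decoded JSON data."""
    definition = data["Definition"]
    if definition == "":
        raise ValueError("that is not a word")
    return definition


def find_definition_online(word):
    """Glue, variant 1: fetch over HTTP with the standard library."""
    with urlopen(build_url(word)) as response:  # I/O
        data = json.load(response)              # I/O
    return pluck_definition(data)


def find_definition_from_file(path):
    """Glue, variant 2: read a previously saved JSON response."""
    with open(path, encoding="utf-8") as saved:  # I/O
        data = json.load(saved)
    return pluck_definition(data)
```

Swapping one glue function for the other changes where the data comes from without changing a line of `build_url` or `pluck_definition`.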
all right you might
say I would like to know whether my app
works against
DuckDuckGo I do want to test my I/O code at
least once even if this pattern does let
me do most of my testing with pure data
how do you test the top level procedural
glue uh and here I'd refer you to Gary
Bernhardt's talks at PyCon 2011 through
2013 where he uh from the Ruby world
that's his primary language uh explored a
different form of this same kind of
approach and talking about how to make
the majority of your tests very fast and
only investing in a few tests doing the
end-to-end I/O-bound operations that there
at the end tell you yes my app actually
works and will actually fetch in real
information from a database or whatever
and work with it his terminology is a
little different than Uncle Bob but
works in much the same way an imperative
Shell at the top level that does IO that
wraps and uses your functional core
functional core because it takes and
returns data can have lots of fast unit
tests exercising directly all the ways
it could fail all the conditions it has
to detect up at the top your imperative
shell hopefully only needs a few
integration tests in order to verify for
you that it works because you're not
having to hit the imperative shell with
the 20 different ways that a uh a word
definition you're looking up could be
misformed you're doing that by testing
the functional core you just test the
imperative shell to make sure the pieces
are then hooked together correctly
here's our top level function from
listing three I mean there's not even
any if statements here it shouldn't
require very many tests to confirm for
you that this is doing the steps of your
application in the right order this
pattern by the way um is already
familiar to a lot of people who do
functional programming languages like
Lisp Haskell Clojure and F# make it
quite natural to write most of your code
as pure functions um and then you get
awkward and put the procedural stuff up
at the top functional languages
naturally lead you to process data
structures while avoiding side effect IO
you tend to call functions in
functional languages for what they
return not for the sequence of things
they happen to do while that code is
running this is an example of IO as a
side effect I'm mixing a data
processing task iterating over a python
iterator to get a series of words and
upper casing them with an IO task when
you call uppercase words in this example
you're not expecting to see any
uppercase words as its return value
you're expecting that when it returns
nothing to you it will have had a side
effect in the outside world of producing
those as output if you want to test this
you're going to have to use mock.patch
or something else to intercept the
standard
output here is an example of the same
code split into a purely logical piece
where it
consumes uh uh an iterator that gives it
words and produces as an iterator
as a generator in this case a series of
uppercase words separately from the
question of any side effects and it can
then be quite naturally plugged into a
top level as Gary Bernhardt calls it
procedural glue routine which then does
the io on its
behalf procedural code tends to be
called not because it's going to return
anything interesting but because of what
it does because of what it tosses out or
pulls from the world it tends to output
as it runs functional code on the other
hand tends to be organized in discrete
stages that each produce data that then
finally gets output at the
end in Gary Bernhardt's talks he talks a
lot about the
immutability uh a lot of these
functional programming languages imagine
Python where you didn't have lists but
only tuples that once a list was built
you couldn't change it anymore imagine
python with dictionaries that once you
built them you could never change them
where if you wanted to uh produce a new
dictionary you had to ask for the a copy
of the old dictionary with like one
thing changed or something uh a lot of
these functional languages have
immutable data structures that never
change and
some programmers who who are fans of the
functional programming Style say that
that they're much much easier if they
pass a data structure to a function
knowing it can't be changed that they
don't have to go search to see if it
looks different that every data
structure is immutable and they claim
some of them like to claim that the
whole point of this programming style is
immutable data structures so that you
would feel guilty about having objects
with writable attributes or dictionaries
that you might update and I'm going to
make the argument that it is not
immutability that makes the functional
programming language so clean or it's
not the only thing my guess is that the
biggest advantage of data in a
functional programming style isn't its
immutability it is simply the fact that
it's data and that data structures you
can see them you can reason about them
unlike a moving process that you're
worried about whether it's spinning off
consequences in the right order a data
structure is just something you can look
at and understand
um two two examples from Computing
history that I'll use to back me up on
this the famous Fred Brooks book The
Mythical Man-Month about uh successes
and failures in managing uh projects uh
written in 1975 it's very famous you
probably heard of it before because of
the quote um this is back when if a pro
a project was going slowly they would
just keep throwing more developers in
the bearing of a child takes 9 months no
matter how many women are
assigned there are some processes that
do not get faster because you flood the
organization with young with with with
untrained people who who who don't know
what's going on and and he often found
projects he's the famous uh aphorism
that projects go slow more slowly the
more people you add in many
cases he said the following on the
question of what's easier to understand
data or code at the time code was
usually written out as flowcharts and
data was usually organized in memory in
what they call tables he said show me
your flowchart and conceal your tables
and I shall continue to be
mystified show me your
tables and I won't usually need your
flowchart it'll be
obvious very often if you just show
someone the way that you laid out your
dictionaries and
lists and other data structures they can
probably guess how you're going to run
through those data structures and get
your job done there's something that's
much clearer about seeing the data that
you wind up producing or the data in an
intermediate step than to stare at the
steps in your program and try starting
there without any bigger picture of what
they're creating trying to guess or
understand what result is being built or
generated uh so that's one example of of
a uh a famous thinker in computer
science who I think would back me up
that the data is where it's at but I'll
also cite the famous 1986 showdown between McIlroy and Donald Knuth, who largely invented computer science in the '70s as he wrote The Art of Computer Programming. Knuth, a very, very famous programmer (there he is), practiced something called literate programming, where he had lots and lots of comments and where a computer program could actually be published as a book explaining itself. He was given this task by a programming magazine: given a text file and an integer K (can you tell that computer science was invented by mathematicians?), print the K most common words in the file, and the number of their occurrences, in decreasing frequency.

He produced 10 pages of Pascal code that did this. McIlroy admitted this is a formidable solution: "Knuth's solution is to tally in an associative data structure" (something like our Python dictionary) "each word as it is read from the file. The data structure is a trie, with 26-way (for technical reasons actually 27-way) fanout at each letter. To avoid wasting space, all of the sparse 26-element arrays are cleverly interleaved in one common arena, with hashing used to assign homes."

Ten pages of Pascal. At the conclusion of his article, after reviewing Knuth's code and pointing out several bugs in it and edge cases that would make it crash, in one of the most famous moments in computer science, McIlroy replaced Donald Knuth with a six-line shell script.

The first line finds every run of letters, A through Z or lowercase a through z, in the file and keeps them together; it takes everything that's not a letter and turns it into a newline. The second command makes everything lowercase so that we don't count words twice if they're at the beginning and in the middle of a sentence. `sort` is going to bring all of the instances of the word "aardvark" together in a row, and then all of the instances of the words "Brandon" and "Python" and so forth. `uniq` is going to take those runs of identical words and count them: six aardvark, five Brandon, and so forth. I suppose I rate my popularity slightly below that of the aardvark. It is then going to sort that output on the numeric field sitting in front of each word, so that "6 aardvark" goes first and "4 python" goes next and so forth. Finally, we ask `sed` for the first N lines and then quit.
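
The same stepwise transformation can be sketched in Python. This is only an illustrative translation, not McIlroy's script: the function names (`split_words`, `lowercase`, `count_runs`, `most_common`) are hypothetical, and each step is written so that its intermediate result is an ordinary list you could print and inspect.

```python
import re
from itertools import groupby


def split_words(text):
    # Roughly the first `tr` step: keep runs of letters, discard everything else.
    return re.findall(r"[A-Za-z]+", text)


def lowercase(words):
    # Roughly the second `tr` step: fold case so "The" and "the" count together.
    return [word.lower() for word in words]


def count_runs(sorted_words):
    # Roughly `uniq` with counting: collapse each run of identical
    # adjacent words into a (count, word) pair.
    return [(len(list(run)), word) for word, run in groupby(sorted_words)]


def most_common(text, k):
    words = split_words(text)      # ['The', 'aardvark', ...]
    lowered = lowercase(words)     # ['the', 'aardvark', ...]
    ordered = sorted(lowered)      # identical words are now adjacent
    counted = count_runs(ordered)  # [(1, 'aardvark'), (2, 'python'), ...]
    # Sort on the numeric field, largest first (ties keep alphabetical order).
    ranked = sorted(counted, key=lambda pair: pair[0], reverse=True)
    return ranked[:k]              # like asking for the first k lines


sample = "The aardvark saw the python. The Python ran."
top_two = most_common(sample, 2)
```

Every name bound inside `most_common` is a write-once value, so if the final answer looks wrong you can examine `words`, `lowered`, `ordered`, and `counted` in turn to see which step went astray.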

McIlroy points out that every one of these tools, back as Unix was being invented, was written first for a particular need but then untangled from the specific application. The person who first needed to put a sorter inside of their program got it written, and then stepped back and said, "You know, I'll bet someone else might need to sort something someday," and went to the work, which is difficult, of pulling that out so it could work on any text input file with any format.

Now the traditional lesson, the one that McIlroy drew here, was that it's better to use simple, small tools that can be easily linked together. If this is the only lesson we can draw from his showdown with Knuth, it's a very good one. Because of the iterator protocol especially, it's really easy in Python to link together a series of generators, and to throw in sets and lists and dictionaries at just the right point, to get a lot of really interesting data processing done.
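
As a small illustration of linking generators together, here is a minimal sketch. The input format (lines of `name amount`) and the function names are assumptions invented for this example; nothing here comes from the talk's slides.

```python
def non_blank(lines):
    # Generator: strip whitespace and skip empty lines.
    for line in lines:
        line = line.strip()
        if line:
            yield line


def parse_entries(lines):
    # Generator: turn "name amount" text into (name, int) pairs.
    for line in lines:
        name, amount = line.split()
        yield name, int(amount)


def totals_by_name(entries):
    # A dictionary thrown in at the point where values must be accumulated.
    totals = {}
    for name, amount in entries:
        totals[name] = totals.get(name, 0) + amount
    return totals


raw_lines = ["ada 3", "", "guido 5", "ada 4  ", "   "]

# Each stage consumes the one before it, like commands joined by pipes.
totals = totals_by_name(parse_entries(non_blank(raw_lines)))
names = sorted(set(totals))
```

Because every stage only iterates over its input, any stage can be tested alone by handing it a plain list.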

But today I want to draw a different lesson, one that McIlroy did not. To me the shell script is simpler not simply because the steps are easy, but because I can picture the data. In between each of these pipes, in between each of these commands, data is flowing, and I can close my eyes and know exactly what that data looks like at each step. The shell script is the simpler solution because it operates by the stepwise transformation of data.

What's key here is not simply that the steps are easy to describe. I'll bet that many of you, even those who didn't know the `tr` command before, could probably explain this shell script to someone tomorrow after a bit of head scratching. It's not just that the steps are easy, though they are. It's that I can close my eyes and picture what the output looks like at the conclusion of each command. That's very powerful, because it lets you visualize very accurately what the script is doing, in a way that is not going to happen with 10 pages of dense Pascal producing an in-memory, hashed tree with 26- or 27-way fan-out.

This approach continually surfaces intermediate results that can be checked and examined. If you find this doesn't work, you can go back and find which step it failed in, and in each case the intermediate result is simple plain text.

So if I'm right that one of the big wins of a functional programming style is simply that it deals in data, which our minds can picture very
easily, what then is the value of immutability? I think Gary Bernhardt got this right when he said (I believe this was in his 2012 talk) that the fun of immutability is distributed computing. This is my only slide about this. If all of your routines just take a data structure and return a data structure, it doesn't much matter what core they run on in a big cluster. You can push data out to a bunch of servers, run each data step separately, and then collect the output back. A task that you've broken down into steps that simply pull in data and return data can be hooked up to a message queue and fanned out across a very wide set of data servers, so long as it's the return value that's important and not the side effect.

If I call a routine and its value is that my data structure will now look different, it's got to live on the same machine, so that it's changing the copy of the data structure I've got in memory. But if it's the return value that's important, and not the way it monkeys with the data I already have in memory, it can run anywhere so long as the result is delivered.

Data and transforms are easier to understand, and I think they're easier to maintain, than coupled
procedures. Now if that's the case, Python has been evolving in exactly the right direction. Think about the kind of innovations that have marked the last decade and a half, as Python has grown from the language it was in the 1990s.

In October 2000 we got the list comprehension. The list comprehension seems like a slight convenience, but it really changes you as a programmer. Our job used to be modifying data structures: make an empty list, and then go through and start changing it. The list comprehension turned us into people who build new data structures that we often never touch again. Often my code today takes in a list and, in a series of comprehensions, just like that shell script, generates a series of intermediate results and a final data structure that it returns, without ever reaching back into one of the earlier results and feeling the need to change it. List comprehensions make it really easy to write Python code that's purely functional, where you're using write-once, throwaway data structures for your intermediate results rather than doing constant modification.

Python 2.4 saw the introduction of the `sorted` built-in. Remember how we used to sort? You used to have to build a list, give it a name, call its `.sort()` method (which returns `None`, so you couldn't use it in the middle of an expression), and then go back to the list, which has now been modified, to see the result. Thanks to Raymond Hettinger's addition of the `sorted` built-in, we now just ask for data to be sorted and returned to us in a single step, instead of having to build and modify a data structure in several steps.

If you try applying these
different patterns to your Python code, remember that Python has several different ways of breaking a pattern out of your code. We do have functions and methods, but remember that if it's iteration that you want to factor out, you can build a generator, and if you have some setup and teardown that you want to pull out, you can build a context manager. Older programming languages, let's say Java, will have one way to break out a subroutine, and then you have to come up with design patterns, something like the visitor pattern, that fill in for the lack of generators and the lack of context managers. Python just has all three of them built in. You can pull out the middle of a loop as a function, you can pull out the loop logic itself as a generator, and you can pull out setup and teardown as a context manager. We have a lot of different ways of getting at logic and doing that rescue operation where we decouple it from our I/O so that it can
live separately.

Two real-world examples, both current projects of mine. The first is Skyfield, an object-based API for astronomy backed by dozens of pure functions that were really easy to test. Functions implement the actual operations. The miserable thing about a method, by the way, is that it implicitly depends on the state of the whole object. It's often hard to test a method because it's not clear how much of the object needs to be set up and initialized in the test before the method can run. The beautiful thing about a function is that you just read the arguments and you know what to provide to it. A well-written function doesn't need extra globals or other persistent state set up and available for it to run, which makes testing and bug fixing a lot easier.
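
To make the contrast between a method and a function concrete, here is a minimal sketch. The `Invoice` class and `invoice_total` function are hypothetical names made up for this illustration; they are not taken from Skyfield or any other real project.

```python
class Invoice:
    """Hypothetical object-based API whose method delegates to a pure function."""

    def __init__(self, customer, prices, tax_rate, notes=""):
        self.customer = customer
        self.prices = prices
        self.tax_rate = tax_rate
        self.notes = notes

    def total(self):
        # From the call site, invoice.total() does not reveal which
        # attributes matter; the function call below spells it out.
        return invoice_total(self.prices, self.tax_rate)


def invoice_total(prices, tax_rate):
    """Pure function: everything it needs is in the argument list."""
    subtotal = sum(prices)
    return round(subtotal * (1 + tax_rate), 2)


# Testing the function needs no object setup at all.
direct = invoice_total([12.0, 8.0], 0.25)

# Testing through the method means constructing the whole object first.
via_object = Invoice("example customer", [12.0, 8.0], 0.25).total()
```

The object still gives callers a convenient interface, while the logic lives in a function whose argument list states exactly what a test has to supply.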

Remember, in the Zen of Python, which I hope you look at every morning as you get ready to code, second only to "Beautiful is better than ugly" is "Explicit is better than implicit." A function, if nothing else, is very explicit about its needs: right there in the argument list, it tells you what it needs to succeed. So Skyfield is one example I've done recently that has turned out really well, where I didn't strand my important logic coupled into my objects or I/O, but spun off everything I could into easy-to-call functions.

The other is something I use for filling in tax forms for myself, called Luca. It's also on GitHub. The temptation there, and actually what the first version did, was that as it ran along computing fields it would immediately call the low-level PDF operations to write them onto the 1040 tax form or whatever. I was able to rescue that deeply compromised and very difficult to maintain code by breaking it into phases. I first read in the entire input of the tax form while resisting the temptation to do anything with it; I simply read it into a data structure and return that. I then have a routine that takes the inputs to a tax form (and it's very easy to write these little routines: add up the numbers, do the rounding, multiply the percents) and produces all of the output lines of the tax form. And I resist the temptation to write that out to the PDF. I hand that data structure to my PDF writer, whose job is to fill text into fields. It was much easier to write and maintain code that was split so that data structures pass between phases, rather than making a slightly shorter program that immediately tried to go have side effects and thereby made itself very difficult to test.

So the pith of the idea here is that in the old days, if we wanted to get rid of all of that pesky I/O, we would try to accomplish that by turning it into a subroutine.

The new idea I'm propounding is that if you really want to get rid of someone, make them a manager. Put them in charge. Take all of that I/O-laden code and make it feel important by putting it up at the top of your program. Make it the procedural glue, leaving all of the little function subordinates free to do their jobs.

Let's return to Wheeler; I have one last quote. In 1952 he gave us the subroutine, and I think that because of that initial phase of computing history in which we tried to use it wrongly, we have yet to realize its full power and promise. I'd like to end with a quote in which he described what he thought would someday happen now that we have subroutines. He said: "When a program has been made from a set of subroutines, the breakdown of the code is more complete than it otherwise would be. This allows the coder to concentrate on one section of the program at a time without the overall detailed program continually intruding. Thus the subroutines can be more easily coded and be tested in isolation from the rest of the program. When the entire program has to be tested, it is with the foreknowledge that the incidence of mistakes in the subroutines is zero, or at least one order of magnitude below that of the untested portions of the program."

Thank you very much for listening. I'm Brandon Rhodes.
[Applause]
