[0:00] All right, everybody, welcome. Glad
[0:03] you've hung in there to the second
[0:04] afternoon of PyOhio. I'm Brandon Rhodes.
[0:07] When I'm not writing Python APIs or
[0:11] writing Python
[0:13] applications, I'm wondering why the code
[0:15] in my APIs and applications is such a
[0:20] mess. Across the industry as a whole, I'm told
[0:22] (though at the moment the numbers are hard
[0:24] to find) that more software projects, even
[0:26] to this day, fail than
[0:29] succeed worldwide in businesses and
[0:32] institutions, and as an industry we're
[0:34] still
[0:38] trying to
[0:40] learn why. A piece of that puzzle, I
[0:44] think, is the recent work that's been
[0:46] done in propounding the clean
[0:47] architecture, and I'm going to give some
[0:50] examples of how I believe that
[0:52] applies to
[0:54] Python. The inspiration for this talk
[0:57] is someone called Uncle Bob
[1:01] Martin, who is really big in
[1:03] Java and the strongly object-oriented,
[1:07] statically typed languages. Recently,
[1:09] in 2011 and 2012, he was thinking
[1:12] about a new way of organizing his code,
[1:15] his applications, that he called
[1:18] the clean architecture. It was one of several
[1:20] ideas that came out at around
[1:23] the same time with about the same
[1:26] goal, but his became more popular because,
[1:29] I think, he drew a better picture.
[1:32] There was someone else who came out
[1:34] with something called the
[1:37] hexagonal architecture, and it just
[1:39] wasn't as pretty. It didn't use
[1:41] colors. And so this is what people
[1:45] often talk about if they're going to
[1:47] refer to this idea we'll explore, of
[1:49] putting I/O at the top level of your
[1:52] program instead of at the bottom. The
[1:55] pith, the center of the
[1:57] idea, and this is not how he put
[2:00] it, this is my spin on it: you're
[2:02] familiar with the idea of a subroutine,
[2:04] where your code can be running along and
[2:06] then make a call. In Python the two forms
[2:08] of subroutine are the function and
[2:10] the method, where you can stop, invoke
[2:13] some other code, and wait for it to come
[2:15] back with an answer. The pith of the idea
[2:18] here is that we programmers have been
[2:20] spontaneously using subroutines
[2:24] backwards. For how long have programmers
[2:27] tended to use subroutines completely
[2:30] backwards, the wrong way? By my count we
[2:32] have been doing it for 62 years, and my
[2:36] proof is that I went back and I found
[2:40] the
[2:41] 1952 ACM national meeting papers from
[2:45] Pittsburgh, Pennsylvania. It was the second
[2:47] meeting of the ACM but the first for
[2:49] which proceedings papers were
[2:53] published, and I found "The Use of
[2:57] Sub-routines in Programmes"
[3:00] by Dr. D. J. Wheeler of
[3:04] Cambridge and Illinois
[3:06] universities. And you might wonder, am
[3:10] I really going to pull anything of
[3:11] relevance out of this guy's paper?
[3:13] Because it was a very different world. A
[3:14] typical computer at the time had about a
[3:17] thousand words of RAM, could do about a
[3:19] thousand operations per second, and
[3:22] required a dozen people to operate. Could
[3:25] programming a computer with a thousand
[3:28] words of RAM really be anything like
[3:31] writing code in a modern
[3:33] language today? Well, here's just one
[3:36] example of something that you'll find
[3:37] familiar from this paper. How complex
[3:40] could programming even be with only 1K
[3:42] of memory? In the paper he says: the
[3:46] preparation of a library subroutine
[3:49] requires a considerable amount of work.
[3:52] However, even after it has been coded and
[3:54] tested, there still remains the
[3:57] considerable task of writing
[4:00] a
[4:02] description so that people not
[4:05] acquainted with the interior coding can
[4:08] nevertheless use it
[4:10] easily. This last task may
[4:19] be the most difficult. You had 1,000
[4:25] bytes in which to write your code, or I
[4:28] should say 1,000 words in which to write
[4:30] your code, and they still didn't want to
[4:34] document. So I think the world he was
[4:36] working in, as I read this paper, seemed
[4:39] very familiar, though in some ways very
[4:43] strange. He's advocating that instead of
[4:45] just having one huge piece of code in
[4:47] your 1,000 words of memory, you split it
[4:50] into routines that call one another, like,
[4:53] instead of having a single function
[4:55] in your Python file, having several. What
[4:57] does he advertise subroutines as being
[5:00] good at? Why would you organize code this
[5:03] way? He says that the primary reason
[5:07] is to hide
[5:10] complexity: all complexities should, if
[5:15] possible, be buried out of sight. And this,
[5:19] you see, is where everything went wrong,
[5:21] and he doomed us for the next several
[5:24] lifetimes of programming,
[5:27] because that then leads programmers
[5:30] to a quite natural
[5:32] mistake. I/O is always a mess. Trying to
[5:36] talk to a database, trying to parse JSON,
[5:39] trying to get things in and out of
[5:41] a file: it's a mess. It's often very
[5:45] idiosyncratic code that doesn't have a lot to
[5:47] do with the pure essence of what our
[5:50] program is trying to
[5:51] accomplish. And the characteristic error
[5:54] that we make is that we bury the I/O
[6:00] rather than cleanly and completely
[6:02] decoupling from it. In the time
[6:05] allotted for this talk I'm only going to
[6:08] attempt one code example, so
[6:10] if we spend a second on this, it
[6:12] will get you set up for the rest
[6:15] of the listings in the talk. This is a
[6:18] simple function in Python that uses a
[6:22] now-deprecated API on
[6:24] DuckDuckGo in order to look up the definition
[6:27] of a word. It builds a URL, in this case uses
[6:30] the requests library (I was a good
[6:33] citizen, and I marked those two
[6:35] lines: there's the I/O, there's
[6:38] that ugly complexity we'd like to make
[6:40] disappear), and then, having gotten the
[6:42] JSON data back, it can look and see if a
[6:45] definition was in fact returned for the
[6:50] word. The natural thing that we tend to
[6:53] do is say: well, I/O is kind of messy. Who
[6:57] knows whether tomorrow I might not be
[6:58] using some other library in order to do
[7:02] my HTTP? Who knows whether I might not
[7:05] have a different way to ask for
[7:07] definitions, for instance if
[7:10] DuckDuckGo deprecates the
[7:12] API? Well, I guess that would
[7:14] invalidate all of it, so I'll stay with
[7:16] the example of: what if the way that I do
[7:18] the I/O, what if the way that I make the
[7:20] HTTP request,
[7:21] changes? We want to get that
[7:24] complexity and bury it. And so we make
[7:28] the fundamental mistake of the last 60
[7:30] years: we get the I/O, pluck it out, and
[7:35] feel proud of ourselves for having done
[7:38] exactly what Dr. Wheeler said. We've
[7:40] hidden it in a
[7:43] subroutine. We have hidden the I/O, but have
[7:46] we really decoupled it? Pace Wheeler, I
[7:51] assert that hiding is not enough if you
[7:54] want to control the complexity of your
[7:56] programs. Here's the listing again, and I
[8:00] will just ask this: if you want to call
[8:03] find_definition
[8:05] so that it doesn't really do
[8:08] any I/O, because you're testing it or
[8:11] because you've cached a result and just
[8:13] want to hand it the cached result
[8:15] instead of calling your lower-level
[8:17] code, how do you do
[8:21] that? How do I call find_definition
[8:25] without it actually doing I/O? And at
[8:28] least as the code is presented here,
[8:31] it's not possible. I have, you see, hidden
[8:34] the API. You don't see any API if you
[8:37] read find_definition, but I'm still
[8:40] tightly coupled to it. The I/O is an
[8:44] inevitable consequence of calling
[8:47] find_definition, whether it's visible in its
[8:49] code or not. I have hidden, but I've not
[8:52] cleanly
[8:53] decoupled. What if we did
[8:56] everything the other way around? What if,
[8:59] when we saw a routine with I/O in it
[9:02] that's ugly and idiosyncratic and might
[9:05] change
[9:06] tomorrow, we rescued the
[9:10] logic instead of hiding the I/O? This is
[9:14] exactly the same lines of code, but in
[9:17] this case I have pulled
[9:19] out the data
[9:22] operations and made them separate and
[9:25] left the I/O stranded at the top level of
[9:28] the
[9:30] program rather than leaving my logic
[9:32] there. And my claim is this: that listing
[9:35] three, which we just looked at, is an
[9:36] architectural success, while the others
[9:40] were architectural failures. Listing
[9:42] three shows in miniature what the clean
[9:46] architecture does for entire
[9:49] applications. Here's that top function
[9:52] from listing three. The coupling between
[9:56] the logic and the I/O, the thing in my
[9:58] program that brings together logic and
[10:01] I/O in a way that they both have to
[10:04] be called at once, is now isolated to
[10:08] a small procedure that mates my logic
[10:12] and my external I/O operations together.
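A minimal sketch of what listing three might look like, reconstructed from the description in this talk rather than copied from the slides. The URL, the query format, the 'Definition' key, and the ValueError are assumptions for illustration, and the sketch uses urllib from the standard library where the talk's listing uses the requests library, so that it runs without third-party packages.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def find_definition(word):
    """Top-level procedure: the only place where logic and I/O meet."""
    url = build_url(word)
    with urlopen(url) as response:   # I/O
        data = json.load(response)   # I/O
    return pluck_definition(data)


def build_url(word):
    """Pure function: data in, data out, no network access."""
    query = urlencode({'q': 'define ' + word, 'format': 'json'})
    return 'http://api.duckduckgo.com/?' + query   # URL is an assumption


def pluck_definition(data):
    """Pure function: pull the definition out of already-parsed JSON data."""
    definition = data['Definition']   # key name is an assumption
    if definition == '':
        raise ValueError('that is not a word')
    return definition
```

Only find_definition touches the network; the two functions beneath it can be called with plain strings and dictionaries.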
[10:16] It's very readable, because instead of
[10:19] blocks of logic operations I now have
[10:21] names for them (build_url,
[10:25] pluck_definition from this
[10:26] data) that document what each section of
[10:29] code was doing. The first
[10:31] listing had no documentation for what
[10:34] that series of operations did. This
[10:36] should remind you a little bit of
[10:39] the extreme programming movement from
[10:41] the late '90s and early 2000s, where, remember,
[10:45] they said that if you ever see a piece
[10:47] of code with a comment at the top, that's
[10:51] a sign that you actually have what wants
[10:53] to be a
[10:54] function. And they would say, you know,
[10:57] if you're writing high-speed C code and
[10:59] you want it to run fast, mark it as
[11:01] inline static so it gets
[11:04] inlined at compile time, but semantically
[11:07] make it something separate. XP people
[11:11] actually believed it was a bug. This is
[11:13] why it was called extreme programming:
[11:15] they actually believed every comment was
[11:17] a bug, because every comment was
[11:20] knowledge that wasn't in your code, and
[11:23] if your code isn't explaining everything
[11:26] about what it's doing, to them it's bad
[11:28] code. So before you could commit in
[11:31] extreme programming, all the comments had
[11:33] to disappear. As in this case: we
[11:35] introduce a new name, a new identifier,
[11:38] into the code, build_url, that wasn't
[11:41] there before, so that semantic
[11:43] information about what these three
[11:45] lines do becomes a part of our program's
[11:48] actual
[11:49] semantics. And so in this
[11:52] maneuver, where we turn pure logic
[11:56] into functions and thus have to give
[11:58] them names
[11:59] in the same way that XP did, we find
[12:03] that we're adding more semantic content
[12:05] to our
[12:07] code. So our architecture in listing one
[12:12] was simply a procedure, a procedure
[12:14] meaning something that has side effects:
[12:16] you call it, and some I/O has happened
[12:18] when it's all done. Listing two, the
[12:21] natural way of using a subroutine since
[12:24] the 1950s to hide complexity, resulted in
[12:28] hiding the I/O, but the top-level code
[12:31] there was still a procedure. All of
[12:33] our logic was stranded in a routine that
[12:36] did I/O every time you called it. Listing
[12:39] three, by doing the opposite maneuver,
[12:42] left the I/O up in the procedure and
[12:45] resulted in pure functions. It resulted
[12:49] in downstream Python functions
[12:52] that don't do I/O, that don't have side
[12:55] effects. They simply take some arguments
[12:57] that are data and return some results
[13:00] that are just data. This has incredible
[13:04] ramifications, among other things for
[13:08] testing. How would we have tested listing
[13:10] one or two, where the goal is to not have
[13:13] your tests need the network and talk
[13:16] to DuckDuckGo? Imagine that you want your
[13:18] tests to run on an airplane or at the
[13:20] airport, or you don't have Wi-Fi or
[13:23] something. Two techniques have been
[13:25] developed over the 2000s, being
[13:28] pioneered, I believe, in Java and the other big
[13:31] OO statically typed languages. They
[13:34] are dependency injection and the idea of
[13:37] mocking, which in Python we can do
[13:40] through monkey patching, without even
[13:41] modifying our code, through something
[13:43] I'll show in a moment called mock.patch.
[13:47] Dependency injection was pioneered in
[13:49] 2004 by another of the big OO thinkers,
[13:52] named Martin Fowler. His idea was to make
[13:55] the I/O library or function that the
[13:58] routine needs to call
[13:59] itself a parameter. And this is really
[14:01] easy in Python. Functions in Python are
[14:04] first-class objects; you can pass them.
[14:08] Modules are first-class objects and can
[14:10] be an argument to a function. So instead
[14:14] of having find_definition from listing
[14:17] one literally and always use the
[14:21] requests library, you could make that a
[14:24] parameter whose default, if it's not
[14:26] provided, is to use Kenneth Reitz's
[14:29] requests library, but which lets you
[14:32] substitute any other kind of
[14:35] module-looking object instead if you want to
[14:38] skip the call out to DuckDuckGo. And here
[14:42] is how you might write a test against
[14:44] that function I just showed you. You'd
[14:46] make a fake requests library
[14:48] with a get call inside of it, just like
[14:51] the real requests library, but when it
[14:53] is asked for its JSON data it can just
[14:57] return a constant. The test
[14:59] therefore can just set up a fake answer.
[15:02] We're not really doing any I/O here; we're
[15:04] just going to answer this fake JSON data
[15:07] back when the definition is asked for.
[15:11] And we can now call our code,
[15:14] find_definition, and avoid any I/O by having it
[15:18] use our fake little requests library
[15:20] instead of the real one. So we get a
[15:22] self-contained test that doesn't actually
[15:25] spam DuckDuckGo with lots of requests
[15:29] and couple us to DuckDuckGo needing to
[15:31] be up and running and not having blocked
[15:33] our IP address yet because we're
[15:36] running so many
[15:38] tests. The problems with this are
[15:41] obvious. First, that fake requests library
[15:44] we wrote, well, it's not the real requests
[15:47] library, so
[15:50] the fact that we called it and got
[15:51] data back doesn't tell us that calling
[15:53] real DuckDuckGo will give us data back. It
[15:56] might look simple for one
[15:59] I/O routine that just needs to make an
[16:02] HTTP request, but a procedure that also
[16:04] needs, let's say, database and file system
[16:07] access is going to need lots of
[16:09] injection. What you tend to get if you
[16:11] use dependency injection is high-level
[16:14] functions that need everything and the
[16:16] kitchen sink, because if way down beneath
[16:19] them anyone tries to talk to the
[16:21] database, it's got to be dependency
[16:23] injected. If another procedure needs the
[16:25] web, it needs to be dependency injected.
[16:29] And this problem has actually spun up
[16:32] to the level of having huge
[16:34] dependency injection
[16:36] frameworks, as they're called, in the larger
[16:39] OO languages, because of this problem:
[16:41] if the very bottom guy has got to talk
[16:43] to the web and you ever want to be able
[16:46] to test that code, then the top-level
[16:48] procedure has somehow got to get the
[16:51] information about what the web is right
[16:54] now, whether it's a test mock or the real
[16:56] thing, when it's called.
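The dependency-injection approach and its test can be sketched as follows. This is a hedged reconstruction, not the listing from the talk: the parameter name `http`, the two fake classes, the URL, and the 'Definition' key are assumptions, and the real requests library is imported only when nothing is injected so that the test itself needs no third-party package.

```python
from urllib.parse import urlencode


def find_definition(word, http=None):
    """Look up a word, using an injectable requests-like object for I/O."""
    if http is None:
        import requests          # real library, only when nothing is injected
        http = requests
    url = 'http://api.duckduckgo.com/?' + urlencode(
        {'q': 'define ' + word, 'format': 'json'})
    response = http.get(url)     # I/O (or a fake standing in for it)
    data = response.json()
    definition = data['Definition']   # key name is an assumption
    if definition == '':
        raise ValueError('that is not a word')
    return definition


class FakeResponse:
    """Hypothetical stand-in for a requests response object."""

    def __init__(self, data):
        self._data = data

    def json(self):
        return self._data


class FakeRequests:
    """Hypothetical stand-in for the requests module; records requested URLs."""

    def __init__(self, data):
        self.data = data
        self.urls = []

    def get(self, url):
        self.urls.append(url)
        return FakeResponse(self.data)


def test_find_definition():
    fake = FakeRequests({'Definition': 'abc'})
    assert find_definition('testword', http=fake) == 'abc'
    assert fake.urls == [
        'http://api.duckduckgo.com/?q=define+testword&format=json']
```

Even in this small case the fake has to imitate two layers of the real library (get and json), which hints at how quickly the injected surface grows once databases and file systems are involved.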
[17:00] Now, a dynamic language like Python
[17:02] fortunately has ways around dependency
[17:05] injection, so we don't wind up with that
[17:06] problem I just described. Thanks to
[17:09] the mock library, an incredible resource
[17:11] created by Michael Foord, we have the
[17:14] ability to live-patch our I/O libraries,
[17:19] to briefly substitute fake versions of
[17:21] their callables that will return the data we
[17:24] want. And I believe the mock library
[17:26] is now part of the most recent Python 3;
[17:29] it's so important it was added to the
[17:31] standard
[17:32] library. In that case we can use the
[17:35] original listing one or the original
[17:38] listing two, and we can just ask
[17:44] the patch callable from the mock
[17:46] library to patch requests.get to be our
[17:51] fake version of it instead. Inside of the
[17:54] with statement, inside of this context,
[17:56] this block of code during which that
[17:59] patch is active, our test gets run, no
[18:03] real connection is made to the outside
[18:05] world, and we find out if our function
[18:07] works against purported data from
[18:11] DuckDuckGo. Whether you do dependency
[18:13] injection or whether you call
[18:16] mock.patch, I find that the result is kind of
[18:19] awkward and kind of sad. As I test, I just
[18:23] feel like I'm fighting the structure of
[18:25] my application. I feel like I'm trying to
[18:29] make it do something that it would
[18:31] really rather not
[18:32] do. So how does testing improve when we
[18:36] factor out our logic as in listing three,
[18:41] where we get the logic that simply deals
[18:44] with data structures and rescue it by
[18:47] putting it beneath the I/O rather than
[18:51] above it? Well, by definition, pure
[18:54] functions can be tested using only data.
[18:57] Arguments go in the top; a list or a
[19:00] string or some other data structure is
[19:02] going to come out the bottom. So, for
[19:05] example, if I want to test build_url,
[19:08] I just call it. I don't have to set up
[19:10] objects, I don't have to build things; I
[19:12] just call it with different arguments,
[19:14] and instead of going and hunting for
[19:16] side effects I can just look at the
[19:18] return value and see whether it's what I
[19:22] expected. No special setup is needed, no
[19:25] special preparation. I don't have to
[19:27] build a mock, and the test calls I'm
[19:30] making look exactly like the calls that
[19:32] are used in production, so I know they
[19:35] have a high probability of telling me
[19:38] whether my code will work in
[19:41] production. Here I'm going to test the
[19:44] second half of the logic,
[19:46] pluck_definition, which needs to pull out the
[19:48] value of the definition key or raise an
[19:51] exception. Two simple tests and I have
[19:53] 100% test coverage of it, again making a
[19:57] pure call
[19:59] that is not in any way adulterated or
[20:02] changed or adjusted from the way this
[20:05] function will be experiencing reality
[20:07] when this code is in production. It's
[20:09] seeing exactly the same kind of things
[20:11] come in and go out as it will when I use
[20:14] it for
[20:15] real. Being able, by the way, to write
[20:19] the tests like that taught me about a
[20:22] symptom of coupling I had never observed,
[20:24] a symptom that tells me that I might
[20:26] have locked logic together that
[20:28] could be more cleanly split out. You'll note
[20:30] that all I had to do there was write one
[20:33] set of tests for building the URL and a
[20:36] completely different set of tests for
[20:38] whether I could parse the data that came
[20:40] back. And I noticed that in a lot of my
[20:43] older projects I had bigger, more
[20:47] complicated routines
[20:52] where doing the test for a good URL and
[20:54] good data was very easy to call, but
[20:58] I then had to essentially start doing
[21:02] different permutations of arguments to
[21:05] get each part of my logic to fail
[21:08] separately, because it wasn't out
[21:09] separate where I could call it. And so,
[21:11] having a big series of pieces of logic
[21:14] where I want to make each part fail
[21:16] individually, I first have to make a
[21:18] bunch of calls with a bad URL that
[21:21] don't pass in a second piece of data,
[21:22] because I never reach that part of the
[21:24] code, and then a series of tests that
[21:26] give a good URL, so we survive the first
[21:29] half of the code, but then bad data, so
[21:32] that that part will fail. And I now
[21:34] consider that pattern
[21:38] a symptom, a cry for help, if you
[21:41] will, from my application code, telling me
[21:44] that I have coupled two pieces of logic
[21:47] together that are really separate. They
[21:49] do different things; they're going to
[21:51] fail in different circumstances. And
[21:53] instead of leaving them coupled and then
[21:55] having to fiddle with each
[21:58] variable in turn while leaving the others
[22:00] constant, I might be able to rescue these
[22:03] pieces of logic into separate functions.
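Once the two pieces of logic are separate functions, each can be made to fail on its own, without first steering past the other. A minimal sketch, assuming the same hypothetical build_url and pluck_definition described for listing three (the URL format, the 'Definition' key, and the ValueError are assumptions, and the test names are invented for illustration):

```python
import unittest
from urllib.parse import urlencode


def build_url(word):
    query = urlencode({'q': 'define ' + word, 'format': 'json'})
    return 'http://api.duckduckgo.com/?' + query   # URL is an assumption


def pluck_definition(data):
    definition = data['Definition']   # key name is an assumption
    if definition == '':
        raise ValueError('that is not a word')
    return definition


class BuildUrlTests(unittest.TestCase):
    """Tests that concern URL construction only; no response data needed."""

    def test_plain_word(self):
        self.assertEqual(
            build_url('word'),
            'http://api.duckduckgo.com/?q=define+word&format=json')

    def test_word_needing_escaping(self):
        self.assertEqual(
            build_url('hot dog'),
            'http://api.duckduckgo.com/?q=define+hot+dog&format=json')


class PluckDefinitionTests(unittest.TestCase):
    """Tests that concern the returned data only; no URL is involved."""

    def test_definition_found(self):
        self.assertEqual(pluck_definition({'Definition': 'abc'}), 'abc')

    def test_definition_missing(self):
        with self.assertRaises(ValueError):
            pluck_definition({'Definition': ''})
```

Neither test class has to supply a "good" value for the other half of the logic just to reach the part it cares about.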
[22:06] I do sometimes leave
[22:10] this pattern in my tests if there's just
[22:13] so much state shared between the first
[22:16] and the second piece of logic that it's
[22:19] just not reasonable to return all 20
[22:21] things so that they can then be the
[22:23] arguments to the second piece of
[22:25] logic. This comes up a lot in astronomy,
[22:27] where an initial routine might set up a
[22:30] bunch of variables that the conclusion
[22:33] of the logic then needs to succeed or fail
[22:35] on before throwing them away and
[22:37] returning a simple value. But if you look
[22:40] at the output of the first piece of
[22:42] logic and find it's rather modest,
[22:45] rescuing the two pieces of logic out
[22:47] into separate routines can make your
[22:50] tests less expensive, simpler, and easier to
[22:53] think about, by the fact that you're not
[22:55] getting big tall sequences of logic and
[22:58] contorting yourself to try to get the
[23:00] third thing that happens to
[23:03] fail. All of which is invalidated,
[23:06] by the way, if you then change the order
[23:08] of your operations, because now you need
[23:10] something different to succeed in order
[23:12] to reach the second or third error or
[23:16] exception that could happen in your code.
[23:19] So that is a really, really simple
[23:22] example that we've just gone through,
[23:26] almost trivially simple. I made it only
[23:29] as complicated as I thought it
[23:30] needed to be to get the point across. In real life
[23:33] the clean architecture often involves
[23:36] much, much bigger pieces of code and the
[23:38] question of how they hook together, not
[23:41] nine-line functions and the fact that we
[23:43] can pull one or two pieces out. What
[23:46] Uncle Bob Martin does is, as he's
[23:49] designing his entire application, he's
[23:53] thinking through what parts of my
[23:56] business logic can survive
[23:59] being split off, where they take
[24:01] arguments, take data structures and
[24:03] return data structures, such that the top
[24:07] level glues all of these pieces together,
[24:10] so that the I/O stays up at the top level
[24:15] and the bottom levels are simply objects
[24:17] or functions that don't need to know
[24:19] where the data is coming from, where it's
[24:21] going, how it's getting
[24:23] persisted, but instead simply enforce
[24:27] your business rules,
[24:29] do your computation, and leave it up to
[24:31] the caller where to put the
[24:34] results. He says in one of those blog
[24:37] posts: in general, the further in you go
[24:39] in his architecture, the higher level the
[24:42] software becomes. The outer circles are
[24:45] mechanisms; the inner circles are
[24:48] policies. The important thing is that
[24:52] isolated, simple data structures are what
[24:55] is passed across the boundaries.
[24:59] When any of the external parts of the
[25:01] system become obsolete, like the database
[25:05] or the web framework, you replace those
[25:08] obsolete elements with a minimum of
[25:11] fuss, because all the innards don't know
[25:14] about the database, the innards don't know
[25:16] that the web is there. And so if
[25:20] you need to replace the way your data is
[25:23] stored, the way data flows in or out, you
[25:26] just make adjustments at the outside
[25:27] level and everything else should keep
[25:33] working. Back to our code, to make this
[25:35] concrete: we could change how we do the
[25:38] I/O, we could change how we batch up these
[25:41] operations, we could change what happens
[25:43] up at the top, without having to change
[25:46] either of these functions down inside,
[25:50] because they take simple data as input,
[25:53] manipulate it, and return new data as
[25:56] output.
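As a hedged illustration of changing what happens at the top without touching the functions inside, here is a sketch of a different top-level procedure that batches lookups and consults a cache. The names find_definitions, fetch_json, and cache are hypothetical, as are the URL format and the 'Definition' key; the two pure functions are the same ones assumed for listing three. The fetching callable is passed in only to keep the sketch runnable offline.

```python
from urllib.parse import urlencode


def build_url(word):
    query = urlencode({'q': 'define ' + word, 'format': 'json'})
    return 'http://api.duckduckgo.com/?' + query   # URL is an assumption


def pluck_definition(data):
    definition = data['Definition']   # key name is an assumption
    if definition == '':
        raise ValueError('that is not a word')
    return definition


def find_definitions(words, fetch_json, cache):
    """A different shell: batched, cached lookups around the same pure core.

    fetch_json is whatever callable performs the HTTP request and returns
    parsed JSON; cache is any dict-like object keyed by URL.
    """
    definitions = {}
    for word in words:
        url = build_url(word)
        if url not in cache:
            cache[url] = fetch_json(url)   # I/O happens only on a cache miss
        definitions[word] = pluck_definition(cache[url])
    return definitions
```

Neither build_url nor pluck_definition changed; only the glue around them did.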
[25:59] All right, you might
[26:01] say, I would like to know whether my app
[26:03] works against
[26:05] DuckDuckGo. I do want to test my I/O code at
[26:09] least once, even if this pattern does let
[26:12] me do most of my testing with pure data.
[26:15] How do you test the top-level procedural
[26:17] glue? Here I'd refer you to Gary
[26:20] Bernhardt's talks at PyCon 2011 through
[26:24] 2013, where he, from the Ruby world
[26:27] (that's his primary language), explored a
[26:30] different form of this same kind of
[26:34] approach, talking about how to make
[26:36] the majority of your tests very fast and
[26:39] only investing in a few tests doing the
[26:43] end-to-end, I/O-bound operations that, there
[26:46] at the end, tell you: yes, my app actually
[26:49] works and will actually fetch real
[26:52] information from a database or whatever
[26:54] and work with it. His terminology is a
[26:57] little different from Uncle Bob's but
[26:58] works in much the same way: an imperative
[27:02] shell at the top level that does I/O, that
[27:06] wraps and uses your functional core.
[27:10] The functional core, because it takes and
[27:12] returns data, can have lots of fast unit
[27:15] tests exercising directly all the ways
[27:18] it could fail, all the conditions it has
[27:20] to detect. Up at the top, your imperative
[27:23] shell hopefully only needs a few
[27:26] integration tests in order to verify for
[27:29] you that it works, because you're not
[27:30] having to hit the imperative shell with
[27:33] the 20 different ways that a word
[27:36] definition you're looking up could be
[27:37] malformed. You're doing that by testing
[27:40] the functional core. You just test the
[27:42] imperative shell to make sure the pieces
[27:44] are then hooked together correctly.
[27:46] Here's our top-level function from
[27:48] listing three. I mean, there aren't even
[27:50] any if statements here. It shouldn't
[27:52] require very many tests to confirm for
[27:54] you that this is doing the steps of your
[27:57] application in the right order. This
[28:00] pattern, by the way, is already
[28:03] familiar to a lot of people who do
[28:04] functional programming. Languages like
[28:06] Lisp, Haskell, Clojure, and F# make it
[28:09] quite natural to write most of your code
[28:13] as pure functions, and then you get
[28:16] awkward and put the procedural stuff up
[28:18] at the top. Functional languages
[28:20] naturally lead you to process data
[28:22] structures while avoiding side-effect I/O.
[28:26] You tend to call functions in
[28:28] functional languages for what they
[28:30] return, not for the sequence of things
[28:33] they happen to do while that code is
[28:37] running. This is an example of I/O as a
[28:40] side effect. I'm coupling a data
[28:42] processing task, iterating over a Python
[28:45] iterator to get a series of words and
[28:48] uppercasing them, with an I/O task. When
[28:52] you call uppercase_words in this example,
[28:54] you're not expecting to see any
[28:56] uppercase words as its return value.
[28:59] You're expecting that when it returns
[29:01] nothing to you, it will have had a side
[29:03] effect in the outside world of producing
[29:06] those as output. If you want to test this,
[29:08] you're going to have to use mock.patch
[29:11] or something else to intercept the
[29:13] standard
[29:15] output. Here is an example of the same
[29:18] code split into a purely logical piece,
[29:23] where it
[29:24] consumes an iterator that gives it
[29:27] words and produces, as an iterator,
[29:31] as a generator in this case, a series of
[29:33] uppercase words, separately from the
[29:36] question of any side effects. And it can
[29:39] then be quite naturally plugged into a
[29:41] top-level, as Gary Bernhardt calls it,
[29:43] procedural glue routine, which then does
[29:47] the I/O on its
[29:50] behalf. Procedural code tends to be
[29:53] called not because it's going to return
[29:55] anything interesting but because of what
[29:57] it does, because of what it tosses out or
[29:59] pulls from the world. It tends to output
[30:02] as it runs. Functional code, on the other
[30:05] hand, tends to be organized in discrete
[30:08] stages that each produce data that then
[30:12] finally gets output at the
[30:16] end. In Gary Bernhardt's talks he talks a
[30:20] lot about
[30:21] immutability in a lot of these
[30:23] functional programming languages. Imagine
[30:26] Python where you didn't have lists but
[30:28] only tuples, so that once a list was built
[30:30] you couldn't change it anymore. Imagine
[30:33] Python with dictionaries that, once you
[30:34] built them, you could never change,
[30:36] where if you wanted to produce a new
[30:39] dictionary you had to ask for a copy
[30:42] of the old dictionary with, like, one
[30:44] thing changed or something. A lot of
[30:46] these functional languages have
[30:48] immutable data structures that never
[30:50] change, and
[30:52] some programmers who are fans of the
[30:55] functional programming style say that
[30:58] it's much, much easier if they
[31:01] pass a data structure to a function
[31:03] knowing it can't be changed, that they
[31:05] don't have to go search to see if it
[31:06] looks different, because every data
[31:08] structure is immutable. And
[31:11] some of them like to claim that the
[31:13] whole point of this programming style is
[31:15] immutable data structures, so that you
[31:18] would feel guilty about having objects
[31:20] with writable attributes or dictionaries
[31:23] that you might update. And I'm going to
[31:25] make the argument that it is not
[31:27] immutability that makes the functional
[31:29] programming languages so clean, or it's
[31:32] not the only thing. My guess is that the
[31:35] biggest advantage of data in a
[31:37] functional programming style isn't its
[31:40] immutability. It is simply the fact that
[31:43] it's data, and that with data structures you
[31:45] can see them, you can reason about them.
[31:48] Unlike a moving process, where you're
[31:49] worried about whether it's spinning off
[31:51] consequences in the right order, a data
[31:53] structure is just something you can look
[31:55] at and understand.
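To make that contrast concrete, here is a minimal sketch of the uppercase-words example described earlier, in both forms. The names uppercase_words_procedural and main are hypothetical, and the exact listings from the talk may differ; the point is only that the pure version hands back data you can look at, while the procedural version leaves nothing behind but its output.

```python
def uppercase_words_procedural(words):
    """I/O as a side effect: returns nothing, prints as it runs."""
    for word in words:
        print(word.upper())          # I/O buried inside the loop


def uppercase_words(words):
    """Pure logic: consumes an iterable of words, yields uppercase words."""
    for word in words:
        yield word.upper()


def main(words):
    """Procedural glue: does the I/O on behalf of the pure generator."""
    for word in uppercase_words(words):
        print(word)                  # I/O stays at the top level


# The result of the pure version is plain data that can be inspected directly.
assert list(uppercase_words(['clean', 'architecture'])) == ['CLEAN', 'ARCHITECTURE']
```

Testing the procedural version would mean intercepting standard output; testing the generator only needs a list comparison.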
[31:58] Two examples from computing
[32:00] history that I'll use to back me up on
[32:02] this. The famous Fred Brooks book The
[32:04] Mythical Man-Month, about successes
[32:07] and failures in managing projects,
[32:10] written in 1975, is very famous. You've
[32:13] probably heard of it before because of
[32:15] the quote (this is back when, if
[32:18] a project was going slowly, they would
[32:19] just keep throwing more developers in):
[32:22] the bearing of a child takes nine months, no
[32:27] matter how many women are
[32:30] assigned. There are some processes that
[32:34] do not get faster because you flood the
[32:36] organization with
[32:38] untrained people who don't know
[32:40] what's going on. And
[32:43] his is the famous aphorism
[32:45] that projects go more slowly the
[32:48] more people you add, in many
[32:51] cases he said the following on the
[32:53] question of what's easier to understand
[32:56] data or code at the time code was
[32:59] usually written out as flowcharts and
[33:02] data was usually organized in memory in
[33:04] what they call tables he said show me
[33:08] your flowchart and conceal your tables
[33:12] and I shall continue to be
[33:15] mystified show me your
[33:17] tables and I won't usually need your
[33:21] flowchart it'll be
[33:23] obvious very often if you just show
[33:26] someone the way that you laid out your
[33:28] dictionaries and
[33:30] lists and other data structures they can
[33:33] probably guess how you're going to run
[33:35] through those data structures and get
[33:36] your job done there's something that's
[33:39] much clearer about seeing the data that
[33:41] you wind up producing or the data in an
[33:44] intermediate step than to stare at the
[33:46] steps in your program and try starting
[33:50] there without any bigger picture of what
[33:52] they're creating trying to guess or
[33:55] understand what result is being built or
[33:59] generated uh so that's one example of of
[34:01] a uh a famous thinker in computer
[34:04] science who I think would back me up
[34:06] that the data is where it's at but I'll
[34:08] also cite the
[34:10] 1986 famous showdown between McIlroy and
[34:15] Donald Knuth who largely
[34:19] invented computer science in the 70s
[34:21] as he wrote The Art of Computer
[34:24] Programming Knuth very very very
[34:28] famous programmer there he
[34:31] is was asked to write a
[34:34] routine he practiced something
[34:36] called literate programming where he had
[34:38] lots and lots of comments and and where
[34:40] where a computer program could actually
[34:41] be published as a book explaining itself
[34:44] and he was given by a uh programming
[34:47] magazine the task given a text file and
[34:51] an integer
[34:53] K can you tell that computer science was
[34:55] invented by mathematicians
[34:58] print the K most common words in the
[35:01] file and the number of their occurrences
[35:04] in decreasing
[35:06] frequency he produced 10 pages of Pascal
[35:10] code that did this and
[35:13] McIlroy said I mean he
[35:16] admitted this is a formidable solution
[35:19] Knuth's solution is to tally in an
[35:22] associative data structure
[35:24] something like our python dictionary
[35:26] each word as it is read from the file the
[35:29] data structure is a trie with 26 way
[35:32] well for technical reasons actually 27
[35:34] way fan out at each letter to avoid
[35:37] wasting space all of the sparse 26
[35:40] element arrays are cleverly interleaved
[35:43] in one common arena with hashing used to
[35:46] assign
[35:47] homes 10 pages of
[35:54] Pascal at the conclusion of his article
[35:57] after viewing Knuth's code pointing out
[35:59] several bugs in it and uh edge cases
[36:02] that would make it
[36:04] crash in one of the most famous moments
[36:06] in computer science McIlroy replaced
[36:09] Donald Knuth with a six line shell
[36:16] script the first
[36:19] line finds every run of letters capital A
[36:23] through Z or lowercase a through z
[36:25] in the file and keeps them together takes
[36:29] everything that's not a letter and turns
[36:30] it into a new
[36:32] line Second command makes everything
[36:35] lower case so that we don't count Words
[36:37] twice if they're at the beginning and in
[36:38] the middle of a
[36:39] sentence sort is going to bring you know
[36:44] all of the instances of the word
[36:46] aardvark together in a row and then all of
[36:48] the instances of the word Brandon and
[36:50] python and so forth uniq is going to
[36:53] get those runs of identical words count
[36:56] them until you have six aardvark five Brandon and so
[36:59] forth I suppose I rate my popularity
[37:02] slightly below that of the aardvark
[37:05] then it is then going to sort that
[37:08] output on the numeric field sitting in
[37:11] front of each word so that six aardvark
[37:14] goes first and four python goes next
[37:18] and so forth and then finally we ask
[37:21] sed for the first n lines and then
[37:23] quit
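The six steps just described can be mirrored in Python as a rough sketch. This is not McIlroy's script, only an analogue of it: the function name `top_words` and the variable names are made up for illustration, and ties between equally common words may come out in a different order than the shell tools would give. Each line produces a new, plain data structure that could be printed and inspected on its own.

```python
import re
from itertools import groupby


def top_words(text, k):
    """Return the k most common words in text as (count, word) pairs."""
    # Step 1: keep every run of letters, dropping everything else.
    words = re.findall(r"[A-Za-z]+", text)
    # Step 2: lower-case so "The" and "the" count as the same word.
    lowered = [word.lower() for word in words]
    # Step 3: sort so identical words sit next to each other.
    ordered = sorted(lowered)
    # Step 4: collapse each run of identical words into (count, word).
    counted = [(len(list(run)), word) for word, run in groupby(ordered)]
    # Step 5: sort on the count, largest first.
    ranked = sorted(counted, reverse=True)
    # Step 6: keep only the first k entries.
    return ranked[:k]


sample = "Aardvark aardvark python Brandon aardvark python"
print(top_words(sample, 2))
```

None of the intermediate lists is modified after it is built, so any step can be checked by printing its result.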
[37:27] McIlroy points out that every one of
[37:30] these tools back as Unix was being
[37:31] invented was written first for a
[37:33] particular need but then untangled from
[37:35] the specific application the person who
[37:38] first needed to put a sorter inside of
[37:40] their program got it written and then
[37:43] stepped back and said you know I'll bet
[37:45] someone else might need to sort
[37:47] something someday and went to the work
[37:50] which is difficult of pulling that out
[37:52] so it could work on any text input file
[37:54] with any format now the traditional
[37:57] lesson then the one that McIlroy drew
[37:59] here was it's better to use simple small
[38:03] tools that can be easily linked together
[38:06] and I would say that if this is the only
[38:07] lesson we can draw from his showdown
[38:10] with Knuth it's a very good
[38:12] one for python
[38:15] because of the iterator protocol
[38:17] especially it's really easy in Python to
[38:19] link together a series of generators to
[38:22] throw in sets and lists and dictionaries
[38:24] at just the right point to get a lot of
[38:27] really interesting data processing done
[38:30] but today I want to draw a different
[38:32] lesson that McIlroy did
[38:34] not to me the shell script is
[38:38] simpler not simply because the steps are
[38:41] easy but because I can picture the
[38:46] data it's because in between each of
[38:49] these pipes in between each of these
[38:51] commands that data is Flowing between I
[38:54] can close my eyes and know exactly what
[38:57] that data looks like at each
[39:00] step the shell script is the simpler
[39:02] solution because it operates the
[39:05] stepwise transformation of data and
[39:07] what's key here is not simply that the
[39:09] steps are easy to describe I'll bet that
[39:11] many of you even who didn't know the tr
[39:13] command before could probably tomorrow
[39:15] explain this shell script to someone
[39:17] after a bit of head scratching it's not
[39:20] just that the steps are easy though they
[39:22] are it's that I can just close my eyes
[39:25] and picture what the output looks like
[39:28] at the conclusion of each command
[39:31] running and that's very powerful because
[39:33] it lets you visualize very accurately
[39:36] what this is
[39:37] doing in a way that is not going to
[39:40] happen with 10 pages of dense Pascal
[39:43] that are producing an in-memory hash table
[39:46] 26 or 27 way fan out binary
[39:52] tree this approach continually surfaces
[39:56] intermediate results that can be checked
[39:58] examined if you find this doesn't work
[40:01] you can just go back and find
[40:03] which step it failed in each case as simple
[40:05] plain
[40:07] text so if I'm right that one of the big
[40:10] wins of a functional programming style
[40:13] is simply that it deals in data which
[40:15] our minds can picture very
[40:18] easily what then is the value of
[40:20] immutability and I think Gary Bernhardt
[40:23] got this right when he said the fun of
[40:25] immutability I believe this was in the
[40:27] 2012 talk is distributed computing
[40:32] this is my only slide about this
[40:34] is that if all of your routines just
[40:38] take a data structure and return a
[40:41] data structure it doesn't much matter
[40:43] what core they run on in a big cluster
[40:46] you can push data out to a bunch of
[40:48] servers run the data step separately and
[40:52] then collect the output back and a task
[40:56] that you've broken down into steps
[40:59] that simply pull in data and return data
[41:01] can then be hooked up to a message queue
[41:03] and fanned out across a very wide
[41:07] data server so
[41:10] long as it's the return value that's
[41:12] important and not the side effect if I
[41:14] call a routine and its value is that my
[41:17] data structure will now look different
[41:19] it's got to live on the same machine so
[41:20] it's changing the copy of the data
[41:22] structure I've got in
[41:24] memory but if it's its return value
[41:27] that's important and not the way it
[41:29] monkeys with the data I already have in
[41:31] memory it can run anywhere so long as
[41:33] the result is
[41:35] delivered data and transforms are easier
[41:39] to understand and I think they're easier
[41:41] to maintain than coupled
[41:45] procedures now if that's the case python
[41:48] has been evolving recently in exactly
[41:51] the right direction if you think about
[41:52] the kind of innovations that have marked
[41:54] the last decade or really the last
[41:56] decade and a half as python has
[41:58] grown from the language it was in the
[42:00] 1990s in October 2000 we got the list
[42:04] comprehension the list comprehension
[42:07] seems like a slight convenience but it
[42:10] really changes you as a
[42:14] programmer it takes someone whose job
[42:19] our job used to be modifying data
[42:22] structures make an empty list and then
[42:24] go through and start changing it
[42:27] and it turned us into people that build
[42:29] new data structures that we often never
[42:31] touch often my code today takes in a
[42:34] list and in a series of comprehensions
[42:37] just like that shell script generates a
[42:40] series of intermediate results and a
[42:42] final data structure that it returns
[42:44] without ever reaching back into one of
[42:47] the earlier results and feeling the need
[42:49] to change it list comprehensions make it
[42:51] really easy to write python code that's
[42:54] purely functional where you're
[42:57] using write-once throwaway data
[42:59] structures for your intermediate results
[43:02] rather than doing constant
[43:04] modification python
[43:06] 2.4 saw the introduction of the sorted
[43:10] built-in remember how we used to sort
[43:12] you used to have to build a list give it
[43:14] a name call its .sort() method which
[43:16] returns none so you couldn't use it in
[43:18] the middle of an expression and then go
[43:20] back to the list which has now been
[43:22] modified which has now changed to see
[43:24] the result thanks to Raymond
[43:28] Hettinger's addition of the sorted built
[43:30] in we now just ask for data to be sorted
[43:33] and returned to us in a single step
[43:36] instead of having to build and modify a
[43:37] data structure in
[43:39] several and um if you try applying these
[43:42] different patterns to your python code
[43:45] remember that python has several
[43:47] different ways of breaking out a pattern
[43:50] from uh your code we do have functions
[43:53] or methods but remember that if it's
[43:55] iteration that want to factor out you
[43:58] can build a generator if you have some
[44:02] set up and tear down that you want to
[44:04] pull out you can build a context manager
[44:07] there's actually whereas older
[44:09] programming languages uh let's say Java
[44:12] will have one way to break out a sub
[44:15] routine and then you have to come up
[44:17] with design patterns that fill in the
[44:19] lack of generators and the lack of
[44:22] context managers with something like the
[44:24] visitor pattern or something like that
[44:26] python just has all three of them built
[44:28] in you can pull out the middle of a loop
[44:31] as a function you can pull out the loop
[44:33] logic itself as a generator you can pull
[44:36] out setup and tear down as a
[44:39] context manager we have a lot of
[44:41] different ways of getting logic and
[44:45] doing that rescue operation where we
[44:47] decouple it from our IO so that it can
[44:50] live separately two real world examples
[44:53] just current projects of mine are
[44:56] Skyfield an object-based API for
[44:58] astronomy backed by dozens of pure
[45:01] functions that were really easy to test
[45:04] functions implement the actual
[45:06] operations um and the miserable thing by
[45:09] the way about a method is that it
[45:11] implicitly depends on the state of the
[45:13] whole object it's often hard to test a
[45:15] method because it's not clear how much
[45:17] of the object in the test needs to be
[45:19] set up and initialized before the method
[45:21] can run um whereas the beautiful thing
[45:25] about a function is you just read the
[45:27] arguments and you know now what to
[45:28] provide to it a well-written
[45:31] function doesn't need extra globals or
[45:34] other persistent State set up and
[45:36] available for it to run which makes
[45:38] testing and bug fixing a lot easier
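As a small sketch of that contrast: the code below is hypothetical and is not Skyfield's actual API. The names `horizon_dip_degrees`, `Observer`, and `EARTH_RADIUS_M` are invented for illustration, and the formula is the simple geometric one that ignores atmospheric refraction. The point is only that the function can be tested by reading its argument list, while the method cannot run until a whole object has been constructed.

```python
import math

# Mean Earth radius in meters; an assumed constant for this sketch.
EARTH_RADIUS_M = 6_371_000.0


def horizon_dip_degrees(elevation_m, radius_m=EARTH_RADIUS_M):
    """Angle the horizon drops below horizontal at a given elevation.

    Pure geometry: cos(dip) = R / (R + h). Everything the function
    needs is named in its argument list.
    """
    return math.degrees(math.acos(radius_m / (radius_m + elevation_m)))


class Observer:
    """Hypothetical object-based wrapper around the pure function."""

    def __init__(self, name, latitude_deg, longitude_deg, elevation_m):
        self.name = name
        self.latitude_deg = latitude_deg
        self.longitude_deg = longitude_deg
        self.elevation_m = elevation_m

    def horizon_dip_degrees(self):
        # The method forwards the single attribute the math needs,
        # but a test still has to build the whole object first.
        return horizon_dip_degrees(self.elevation_m)


# Testing the function needs one number.
print(horizon_dip_degrees(250.0))

# Testing the method needs a fully initialized object.
observer = Observer("rooftop", 40.0, -83.0, 250.0)
print(observer.horizon_dip_degrees())
```

Keeping the arithmetic in the function and letting the method delegate to it means the object-based API stays available without the logic being stranded inside it.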
[45:43] remember in the Zen of python which I
[45:45] hope you look at every morning as you
[45:46] get ready to
[45:48] code second only in Python's motto
[45:55] behind beautiful is better than ugly
[45:57] is explicit is better than implicit and
[46:00] a function if nothing else is very
[46:02] explicit about its needs right there in
[46:05] the argument list tells you what it
[46:06] needs to succeed so skyfield is one
[46:10] example I've done recently that has
[46:12] turned out really well when I didn't
[46:14] strand my important logic up coupled
[46:17] into my objects or IO but where I spun
[46:20] off everything I could in an easy to
[46:22] call function the other is something I
[46:24] use for filling in tax forms for myself
[46:24] called Luca it's also on GitHub the
[46:29] Temptation there and actually the first
[46:31] version of it as it ran along Computing
[46:34] Fields would immediately then call the
[46:36] low-level PDF operations to you know
[46:39] write them onto the 1040 tax form or
[46:39] whatever I then was able to rescue
[46:43] that deeply deeply compromised
[46:47] and very difficult to maintain code
[46:52] by breaking it into phases I first read
[46:55] in the entire input of the tax form
[46:58] while resisting the temptation to do
[46:59] anything with it I simply read it into a
[47:02] data structure and return that I then
[47:05] have a routine that takes the inputs to
[47:07] a tax form and it's very easy to write
[47:10] these little routines add up the numbers
[47:12] do the rounding multiply the percents
[47:14] and produce all of the output lines in
[47:17] the tax form and I resist the temptation
[47:20] to write that out to the PDF I hand that
[47:20] data structure then to my PDF
[47:23] writer whose job is to fill text into
[47:25] fields it was much easier to write
[47:31] and maintain code that was split so that
[47:34] data structures pass between phases
[47:36] rather than making a slightly shorter
[47:38] program that immediately tried to go
[47:40] have side effects but thereby made
[47:42] itself very difficult to test so the
[47:44] pith of the idea here is that in the old
[47:47] days if we wanted to get rid of all of
[47:50] that
[47:51] pesky I/O we would try to accomplish that by
[47:54] turning it into a sub routine
[47:57] the new idea I'm propounding is that if
[47:59] you really want to get rid of someone
[48:02] make them a
[48:04] manager put them in charge get all of
[48:07] that IO Laden code and make it feel
[48:10] important by putting it up at the
[48:13] top of your program make it the
[48:16] procedural glue leaving all of the
[48:19] little function subordinates free to do
[48:22] their jobs let's return to Wheeler I
[48:25] have one last quote in
[48:28] 1952 he gave us the sub
[48:31] routine and I think because of that
[48:35] initial phase of computing history in
[48:37] which we tried to use it wrongly we have
[48:39] yet to realize its full power and
[48:41] promise but I'd like to end with a quote
[48:44] in which he described what he thought
[48:45] would someday happen now that we have
[48:47] sub routines he said when a program has
[48:51] been made from a set of sub routines the
[48:54] breakdown of the code is more complete
[48:57] than it otherwise would be this allows
[49:00] the coder to concentrate on one section
[49:02] of the program at a time without the
[49:04] overall detailed program continually
[49:07] intruding thus the sub routines can be
[49:11] more easily coded and be tested in
[49:14] isolation from the rest of the program
[49:17] when the entire program has to be tested
[49:17] it is with the foreknowledge that the
[49:23] incidence of mistakes in the sub routine
[49:27] is
[49:28] zero or at least one order of
[49:33] magnitude below that of the untested
[49:36] portions of the program thank you very
[49:36] much for listening I'm Brandon Rhodes
[49:46] [Applause]