[0:00] All right, everybody, welcome. Glad you've hung in there to the second afternoon of PyOhio. I'm Brandon Rhodes. When I'm not writing Python APIs or writing Python applications, I'm wondering why the code in my APIs and applications is such a mess.

[0:20] The industry as a whole, I'm told (at the moment the numbers are hard to find), still sees more software projects fail than succeed, worldwide, in businesses and institutions, and as an industry we're still trying to learn why. A piece of that puzzle, I think, is the recent work that's been done in propounding the Clean Architecture, and I'm going to give some examples of how I believe that applies to Python.

[0:54] The inspiration for this talk is someone called Uncle Bob Martin, who is really big in Java and the strongly object-oriented, statically typed languages. Recently, in 2011 and 2012, he was thinking about a new way of organizing his applications that he called the Clean Architecture. It was one of several ideas that came out at around the same time with about the same goal, but his became more popular because, I think, he drew a better picture. Someone else came out with something called the hexagonal architecture, and it just wasn't as pretty; it didn't use colors. And so this is what people often talk about if they're going to refer to this idea we'll explore, of putting I/O at the top level of your program instead of at the bottom.

[1:55] The pith, the center of the idea (this is not how he put it; this is my spin on it): you're familiar with the idea of a subroutine, where your code can be running along and then make a call. In Python the two forms of subroutine are the function and the method, where you can stop, invoke some other code, and wait for it to come back with an answer. The pith of the idea here is that we programmers have been spontaneously using subroutines backwards.

[2:24] For how long have programmers tended to use subroutines completely backwards, the wrong way? By my count we have been doing it for 62 years, and my proof is that I went back and found the 1952 ACM national meeting in Pittsburgh, Pennsylvania. It was the second meeting of the ACM, but the first for which proceedings papers were published. And I found "The Use of Sub-routines in Programmes" by Dr. D. J. Wheeler of Cambridge and Illinois universities.

[3:06] You might wonder, am I really going to pull anything of relevance out of this guy's paper? Because it was a very different world. A typical computer at the time had about a thousand words of RAM, could do about a thousand operations per second, and required a dozen people to operate. Could programming a computer with a thousand words of RAM really be anything like writing code in a modern language today? Well, here's just one example of something that you'll find familiar from this paper. How complex could programming even be with only 1K of memory? In the paper he says that the preparation of a library subroutine requires a considerable amount of work; however, even after it has been coded and tested, there still remains the considerable task of writing a description so that people not acquainted with the interior coding can nevertheless use it easily. This last
task may be the most difficult.

[4:19] You had 1,000 bytes in which to write your code, or I should say 1,000 words in which to write your code, and they still didn't want to document. So the world he was working in, as I read this paper, seemed very familiar, though in some ways very strange.

[4:43] He's advocating that instead of just having one huge piece of code in your 1,000 words of memory, you split it into routines that call one another, like having several functions in your Python file instead of a single one. What does he advertise subroutines as being good at? Why would you organize code this way? He says that the primary reason is to hide complexity: all complexities should, if possible, be buried out of sight. And this, you see, is where everything went wrong, and he doomed us for the next several lifetimes of programming, because that then leads programmers to a quite natural mistake.

[5:32] I/O is always a mess. Trying to talk to a database, trying to parse JSON, trying to get things in and out of a file: it's a mess. It's often very idiosyncratic code that doesn't have a lot to do with the pure essence of what our program is trying to accomplish. And the characteristic error that we make is that we bury the I/O rather than cleanly and completely decoupling from it.

[6:02] In the time allotted for this talk I'm only going to attempt one code example, so if we spend a second on this, it will get you set up for the rest of the listings in the talk. This is a simple function in Python that uses a now-deprecated API on DuckDuckGo in order to look up the definition of a word. It builds a URL, and in this case uses the requests
library (I was a good citizen), and then I marked those two lines: there's the I/O, there's that ugly complexity we'd like to make disappear. And then, having gotten the JSON data back, it can look and see if a definition was in fact returned for the word.

[6:50] The natural thing that we tend to do is say, well, I/O is kind of messy. Who knows whether tomorrow I might be using some other library to do my HTTP? Who knows whether I might have a different way to ask for definitions, for instance if DuckDuckGo deprecates the API? Well, I guess that would invalidate all of it, so I'll stay with the example of: what if the way that I make the HTTP request changes? We want to get that complexity and bury it. And so we make the fundamental mistake of the last 60 years: we get the I/O, pluck it out, and feel proud of ourselves for having done exactly what Dr. Wheeler said. We've hidden it in a subroutine.

[7:43] We have hidden the I/O, but have we really decoupled it? Pace Wheeler, I assert that hiding is not enough if you want to control the complexity of your programs. Here's the listing again, and I will just ask this: if you want to call find_definition so that it doesn't really do any I/O, because you're testing it, or because you've cached a result and just want to hand it the cached result instead of calling your lower-level code, how do you do that? How do I call find_definition without it actually doing I/O? At least as the code is presented here, it's not possible. I have, you see, hidden the API; you don't see any API if you read find_definition. But I'm still tightly coupled to it. The I/O is an
inevitable consequence of calling find_definition, whether it's visible in its code or not. I have hidden, but I've not cleanly decoupled.

[8:53] What if we did everything the other way around? What if, when we saw a routine with I/O in it that's ugly and idiosyncratic and might change tomorrow, we rescued the logic instead of hiding the I/O? This is exactly the same lines of code, but in this case I have pulled out the data operations and made them separate, and left the I/O stranded at the top level of the program rather than leaving my logic there. And my claim is this: that listing three, which we just looked at, is an architectural success, while the others were architectural failures. Listing three shows in miniature what the Clean Architecture does for entire applications.

[9:49] Here's that top function from listing three. The coupling between the logic and the I/O, the thing in my program that brings together logic and I/O in a way that they both have to be called at once, is now isolated to a small procedure that mates my logic and my external I/O operations together. It's very readable, because instead of blocks of logic operations I now have names for them (build_url, pluck_definition from this data) that document what each section of code was doing. The first listing had no documentation for what those series of operations did.

[10:36] This should remind you a little bit of the Extreme Programming movement from the late '90s and early 2000s, where, remember, they said that if you ever see a piece of code with a comment at the top, that's a sign that you actually have what wants to be a function. And they would say, you know,
if you're writing high-speed C code and you want it to run fast, mark it as inline static so it gets inlined at compile time, but semantically make it something separate. XP people actually believed (this is why it was called Extreme Programming) that every comment was a bug, because every comment was knowledge that wasn't in your code, and if your code isn't explaining everything about what it's doing, to them it's bad code. So before you could commit in Extreme Programming, all the comments had to disappear.

[11:33] As in this case: we introduce a new name, a new identifier, into the code, build_url, that wasn't there before, so that semantic information about what these three lines do becomes a part of our program's actual semantics. So in this maneuver, we turn pure logic into functions and thus have to give them names. In the same way that XP did, we find that we're adding more semantic content to our code.

[12:07] So our architecture in listing one was simply a procedure, a procedure meaning something that has side effects: you call it, and some I/O has happened when it's all done. Listing two, the natural way of using a subroutine since the 1950s to hide complexity, resulted in hiding the I/O, but the top-level code there was still a procedure. All of our logic was stranded in a routine that did I/O every time you called it. Listing three, by doing the opposite maneuver, left the I/O up in the procedure and resulted in pure functions. It resulted in downstream Python functions that don't do I/O, that don't have side effects. They simply take some arguments
that are data and return some results that are just data. This has incredible ramifications, among other things, for testing.

[13:08] How would we have tested listing one or two, where the goal is to not have your tests need the network and talk to DuckDuckGo? Imagine that you want your tests to run on an airplane, or at the airport, or you don't have Wi-Fi or something. Two techniques have been developed over the 2000s, being pioneered, I believe, in Java and the other big OO statically typed languages. They are dependency injection, and the idea of mocking, which in Python we can do through monkey patching, without even modifying our code, through something I'll show in a moment called mock.patch.

[13:47] Dependency injection was pioneered in 2004 by another of the big OO thinkers, named Martin Fowler. His idea was to make the I/O library or function that the routine needs to call itself a parameter. And this is really easy in Python. Functions in Python are first-class objects; you can pass them. Modules are first-class objects and can be an argument to a function. So instead of having find_definition from listing one literally and always use the requests library, you could make that a parameter whose default, if it's not provided, is to use Kenneth Reitz's requests library, but which lets you substitute any other kind of module-looking object instead if you want to skip the call out to DuckDuckGo.

[14:42] And here is how you might write a test against that function I just showed you. You'd make a fake requests library with a get call inside of it, just like the real requests library, but when it was asked for its JSON data it can just return
a constant. The test therefore can just set up a fake answer. We're not really doing any I/O here; we're just going to answer this fake JSON data back when the definition is asked for. And we can now call our code, find_definition, and avoid any I/O by having it use our fake little requests library instead of the real one. So we get a self-contained test that doesn't actually spam DuckDuckGo with lots of requests and couple us to DuckDuckGo needing to be up and running and not having blocked our IP address yet because we're running so many tests.

[15:38] The problems with this are obvious. First, that fake requests library we wrote, well, it's not the real requests library. The fact that we called it and got data back doesn't tell us that calling the real DuckDuckGo will give us data back. It might look simple for one server, for an I/O routine that just needs to make an HTTP request, but a procedure that also needs, let's say, database and file system access is going to need lots of injection. What you tend to get if you use dependency injection is high-level functions that need everything and the kitchen sink, because if way down beneath them anyone tries to talk to the database, it's got to be dependency injected; if another procedure needs the web, it needs to be dependency injected.

[16:29] This problem has actually spun up to the level of having huge dependency injection frameworks, as they're called, in the larger OO languages, because of this problem: if the very bottom guy has got to talk to the web, and you ever want to be able to test that code, then the top-level procedure has somehow got to
get the information about what the web is right now (is it a test mock or is it the real thing?) when it's called.

[17:00] Now, a dynamic language like Python fortunately has ways around dependency injection, so we don't wind up with that problem I just described. Thanks to the mock library, an incredible resource created by Michael Foord, we have the ability to live-patch our I/O libraries, to briefly substitute fake versions of their callables that will return the data we want. And I believe the mock library is now part of the most recent Python 3; it's so important it was added to the standard library.

[17:32] In that case we can use the original listing one or the original listing two, and we can just ask the patch callable from the mock library to patch requests.get to be our fake version of it instead. Inside of the with statement, inside of this context, this block of code during which that patch is active, our test gets run, no real connection is made to the outside world, and we find out if our function works against purported data from DuckDuckGo.

[18:11] Whether you do dependency injection or whether you call mock.patch, I find that the result is kind of awkward and kind of sad. As I test, I just feel like I'm fighting the structure of my application. I feel like I'm trying to make it do something that it would really rather not do.

[18:32] So how does testing improve when we factor out our logic as in listing three, where we get the logic that simply deals with data structures and rescue it by putting it beneath the I/O rather than above it? Well, by definition, pure functions can be tested using only data. Arguments go in the top; a list or a string or some other data structure is going to come out the bottom. So, for example, if I want to test build_url, I just call it. I don't have to set up objects, I don't have to build things. I just call it with different arguments, and instead of going and hunting for side effects, I can just look at the return value and see whether it's what I expected. No special setup is needed, no special preparation. I don't have to build a mock. And the test calls I'm making look exactly like the calls that are used in production, so I know they have a high probability of telling me whether my code will work in production.

[19:41] Here I'm going to test the second half of the logic, pluck_definition, which needs to pull out the value of the definition key or raise an exception. Two simple tests and I have 100% test coverage of it, again making a pure call that is not in any way adulterated or changed or adjusted from the way this function will be experiencing reality when this code is in production. It's seeing exactly the same kind of things come in and go out as it will when I use it for real.

[20:15] Being able, by the way, to
write the tests like that taught me about a symptom of coupling I had never observed, a symptom that tells me that I might have locked logic together that I could more cleanly split out. You'll note that all I had to do there was write one set of tests for building the URL and a completely different set of tests for whether I could parse the data that came back. And I noticed that in a lot of my older projects I had bigger, more complicated routines where doing the test for a good URL and good data was very easy to call, but where I then had to start doing different permutations of arguments to get each part of my logic to fail separately, because it wasn't out separate where I could call it.

[21:09] And so, having a big series of pieces of logic where I want to make each part fail individually, I first have to make a bunch of calls with a bad URL, but that don't pass in a second piece of data because I never reach that part of the code, and then a series of tests that give a good URL, so we survive the first half of the code, but then bad data so that that part will fail. And I now consider that pattern a symptom, a cry for help, if you will, from my application code, telling me that I have coupled two pieces of logic together that are really separate. They do different things, they're going to fail in different circumstances, and instead of leaving them coupled and then having to fiddle with each variable in turn while leaving the others constant, I might be able to rescue these pieces of logic into separate functions.

[22:03] I do sometimes leave this pattern in my tests if there's just
so much state shared between the first and the second piece of logic that it's just not reasonable to return all 20 things so that they can then be the arguments to the second piece of logic. This comes up a lot in astronomy, where an initial routine might set up a bunch of variables that the conclusion of the logic then needs to succeed or fail on, before throwing them away and returning a simple value. But if you look at the output of the first piece of logic and find it's rather modest, rescuing the two pieces of logic out into separate routines can make your tests less expensive, simpler, and easier to think about, by the fact that you're not getting big tall sequences of logic and contorting yourself to try to get the third thing that happens to fail. All of which is invalidated, by the way, if you then change the order of your operations, because now you need something different to succeed in order to reach the second or third error or exception that could happen in your code.

[23:19] So that is a really, really simple example that we've just gone through, almost trivially simple. I made it only as complicated as I thought you needed to get the point. In real life the Clean Architecture often involves much, much bigger pieces of code and the question of how they hook together, not nine-line functions and the fact that we can pull one or two pieces out. What Uncle Bob Martin does is, as he's designing his entire application, he's thinking through: what parts of my business logic can survive being split off, where they take data structures as arguments and return data structures, such that the top level glues all
of these pieces together, so that the I/O stays up at the top level and the bottom levels are simply objects or functions that don't need to know where the data is coming from, where it's going, or how it's getting persisted, but instead simply enforce your business rule, do your computation, and leave it up to the caller where to put the results.

[24:34] He says in one of those blog posts: in general, the further in you go in his architecture, the higher level the software becomes. The outer circles are mechanisms. The inner circles are policies. The important thing is that isolated, simple data structures are what is passed across the boundaries. When any of the external parts of the system become obsolete, like the database or the web framework, you replace those obsolete elements with a minimum of fuss, because all the innards don't know about the database; the innards don't know that the web is there. And so if you need to replace the way your data is stored, or the way data flows in or out, you just make adjustments at the outside level and everything else should keep working.

[25:33] Back to our code, to make this concrete. We could change how we do the I/O, we could change how we batch up these operations, we could change what happens up at the top, without having to change either of these functions down inside, because they take simple data as input, manipulate it, and return new data as output.

[25:59] All right, you might say, I would like to know whether my app works against DuckDuckGo. I do want to test my I/O code at least once, even if this pattern does let me do most of my testing with pure data. How do you test the top-level procedural glue? And here I'd
refer you to Gary Bernhardt's talks at PyCon 2011 through 2013, where he, coming from the Ruby world (that's his primary language), explored a different form of this same kind of approach, talking about how to make the majority of your tests very fast while only investing in a few tests doing the end-to-end I/O-bound operations that, there at the end, tell you: yes, my app actually works, and will actually fetch real information from a database or whatever and work with it. His terminology is a little different from Uncle Bob's but works in much the same way: an imperative shell at the top level that does I/O, which wraps and uses your functional core.

[27:10] The functional core, because it takes and returns data, can have lots of fast unit tests exercising directly all the ways it could fail, all the conditions it has to detect. Up at the top, your imperative shell hopefully only needs a few integration tests in order to verify for you that it works, because you're not having to hit the imperative shell with the 20 different ways that a word definition you're looking up could be malformed. You're doing that by testing the functional core. You just test the imperative shell to make sure the pieces are hooked together correctly. Here's our top-level function from listing three. I mean, there aren't even any if statements here. It shouldn't require very many tests to confirm for you that this is doing the steps of your application in the right order.

[28:00] This pattern, by the way, is already familiar to a lot of people who do functional programming. Languages like Lisp, Haskell, Clojure, and F# make it quite natural to write most of your code as pure functions, and then
you get awkward and put the procedural stuff up at the top. Functional languages naturally lead you to process data structures while avoiding side-effect I/O. You tend to call functions in functional languages for what they return, not for the sequence of things they happen to do while that code is running.

[28:37] This is an example of I/O as a side effect. I'm combining a data processing task, iterating over a Python iterator to get a series of words and uppercasing them, with an I/O task. When you call uppercase_words in this example, you're not expecting to see any uppercase words as its return value. You're expecting that when it returns nothing to you, it will have had a side effect in the outside world of producing those as output. If you want to test this, you're going to have to use mock.patch or something else to intercept the standard output.

[29:15] Here is an example of the same code split into a purely logical piece, where it consumes an iterator that gives it words and produces, as an iterator (as a generator in this case), a series of uppercase words, separately from the question of any side effects. And it can then be quite naturally plugged into a top-level, as Gary Bernhardt calls it, procedural glue routine, which then does the I/O on its behalf.

[29:50] Procedural code tends to be called not because it's going to return anything interesting but because of what it does, because of what it tosses out to or pulls from the world. It tends to output as it runs. Functional code, on the other hand, tends to be organized in discrete stages that each produce data, which then finally gets output at the end.

[30:16] In Gary Bernhardt's talks he talks a
lot about immutability. Think of a lot of these functional programming languages: imagine Python where you didn't have lists but only tuples, so that once a list was built you couldn't change it anymore. Imagine Python with dictionaries that, once you built them, you could never change, where if you wanted to produce a new dictionary you had to ask for a copy of the old dictionary with, like, one thing changed or something. A lot of these functional languages have immutable data structures that never change.

[30:50] And some programmers who are fans of the functional programming style say that things are much, much easier if they pass a data structure to a function knowing it can't be changed, that they don't have to go search to see if it looks different, that every data structure is immutable. And some of them like to claim that the whole point of this programming style is immutable data structures, so that you would feel guilty about having objects with writable attributes or dictionaries that you might update.

[31:23] And I'm going to make the argument that it is not immutability that makes the functional programming languages so clean, or it's not the only thing. My guess is that the biggest advantage of data in a functional programming style isn't its immutability. It is simply the fact that it's data, and that data structures, you can see them, you can reason about them. Unlike a moving process, where you're worried about whether it's spinning off consequences in the right order, a data structure is just something you can look at and understand.

[31:58] Two examples from computing history that I'll use to back me up on
this. The famous Fred Brooks book The Mythical Man-Month, about successes and failures in managing projects, was written in 1975. It's very famous; you've probably heard of it before because of the quote (this is back when, if a project was going slowly, they would just keep throwing more developers in): the bearing of a child takes nine months, no matter how many women are assigned. There are some processes that do not get faster because you flood the organization with untrained people who don't know what's going on. His is the famous aphorism that, in many cases, projects go more slowly the more people you add.

[32:51] He said the following on the question of what's easier to understand, data or code. At the time, code was usually written out as flowcharts, and data was usually organized in memory in what they called tables. He said: "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious."

[33:23] Very often, if you just show someone the way that you laid out your dictionaries and lists and other data structures, they can probably guess how you're going to run through those data structures and get your job done. There's something that's much clearer about seeing the data that you wind up producing, or the data in an intermediate step, than staring at the steps in your program and trying, starting there without any bigger picture of what they're creating, to guess or understand what result is being built or generated.

[33:59] So that's one example of a famous thinker in
computer [34:04] science who I think would back me up [34:06] that the data is where it's at. But I'll [34:08] also cite the [34:10] famous 1986 showdown between McIlroy and [34:15] Donald Knuth, who largely [34:19] invented computer science in the 70s [34:21] as he wrote The Art of Computer [34:24] Programming. Knuth, a very, very [34:28] famous programmer (there he [34:31] is), was asked to write a [34:34] routine. He practiced something [34:36] called literate programming, where he had [34:38] lots and lots of comments and where [34:40] a computer program could actually [34:41] be published as a book explaining itself. [34:44] He was given by a programming [34:47] magazine the task: given a text file and [34:51] an integer [34:53] K (can you tell that computer science was [34:55] invented by mathematicians?), [34:58] print the K most common words in the [35:01] file, and the number of their occurrences, [35:04] in decreasing [35:06] frequency. He produced 10 pages of Pascal [35:10] code that did this, and [35:13] McIlroy admitted [35:16] this is a formidable solution: [35:19] "Knuth's solution is to tally in an [35:22] associative data structure" [35:24] (something like our Python dictionary) [35:26] "each word as it is read from the file. The [35:29] data structure is a trie with 26-way [35:32] (for technical reasons, actually 27-way) [35:34] fan-out at each letter. To avoid [35:37] wasting space, all of the sparse 26-element [35:40] arrays are cleverly interleaved [35:43] in one common arena, with hashing used to [35:46] assign [35:47] homes." Ten pages of [35:54] Pascal. At the conclusion of his article, [35:57] after viewing Knuth's code, pointing out [35:59] several bugs in it and edge cases [36:02] that would make it [36:04] crash, in one of the most famous moments [36:06] in computer science McIlroy replaced [36:09] Donald Knuth with a six-line shell [36:16] script. The first [36:19] line finds every run of
letters, capital A [36:23] through Z or lowercase a through z, [36:25] in the file and puts them together; it gets [36:29] everything that's not a letter and turns [36:30] it into a new [36:32] line. The second command makes everything [36:35] lowercase, so that we don't count words [36:37] twice if they're at the beginning and in [36:38] the middle of a [36:39] sentence. sort is going to bring [36:44] all of the instances of the word [36:46] aardvark together in a row, and then all of [36:48] the instances of the word Brandon, and [36:50] Python, and so forth. uniq is going to [36:53] take those runs of identical words and count [36:56] them: six aardvark, five Brandon, and so [36:59] forth. I suppose I rate my popularity [37:02] slightly below that of the aardvark. [37:05] It is then going to sort that [37:08] output on the numeric field sitting in [37:11] front of each word, so that six aardvark [37:14] goes first and four Python goes next, [37:18] and so forth. And then finally we ask [37:21] sed for the first N lines and then [37:23] quit. [37:27] McIlroy points out that every one of [37:30] these tools, back as Unix was being [37:31] invented, was written first for a [37:33] particular need but then untangled from [37:35] the specific application. The person who [37:38] first needed to put a sorter inside of [37:40] their program got it written and then [37:43] stepped back and said, "You know, I'll bet [37:45] someone else might need to sort [37:47] something someday," and went to the work, [37:50] which is difficult, of pulling that out [37:52] so it could work on any text input file [37:54] with any format. Now, the traditional [37:57] lesson, the one that McIlroy drew [37:59] here, was that it's better to use simple, small [38:03] tools that can be easily linked together. [38:06] And I would say that if this is the only [38:07] lesson we can draw from his showdown [38:10] with Knuth, it's a very good [38:12] one: [38:15] because of the
iterator protocol [38:17] especially, it's really easy in Python to [38:19] link together a series of generators, to [38:22] throw in sets and lists and dictionaries [38:24] at just the right point, to get a lot of [38:27] really interesting data processing done. [38:30] But today I want to draw a different [38:32] lesson that McIlroy did [38:34] not. To me the shell script is [38:38] simpler not simply because the steps are [38:41] easy but because I can picture the [38:46] data. It's because in between each of [38:49] these pipes, in between each of these [38:51] commands that data is flowing between, I [38:54] can close my eyes and know exactly what [38:57] that data looks like at each [39:00] step. The shell script is the simpler [39:02] solution because it operates as a [39:05] stepwise transformation of data. And [39:07] what's key here is not simply that the [39:09] steps are easy to describe. I'll bet that [39:11] many of you, even those who didn't know the tr [39:13] command before, could probably tomorrow [39:15] explain this shell script to someone [39:17] after a bit of head scratching. It's not [39:20] just that the steps are easy, though they [39:22] are; it's that I can just close my eyes [39:25] and picture what the output looks like [39:28] at the conclusion of each command [39:31] running. And that's very powerful, because [39:33] it lets you visualize very accurately [39:36] what this is [39:37] doing, in a way that is not going to [39:40] happen with 10 pages of dense Pascal [39:43] that are producing an in-memory, hashed, [39:46] 26- or 27-way fan-out [39:52] tree. This approach continually surfaces [39:56] intermediate results that can be checked [39:58] and examined. If you find this doesn't work, [40:01] you can just go back and find which [40:03] step it failed in, in each case as simple [40:05] plain [40:07] text. So if I'm right that one of the big [40:10] wins of a functional programming style [40:13] is simply that it deals in data, which [40:15] our minds
can picture very [40:18] easily, what then is the value of [40:20] immutability? I think Gary Bernhardt [40:23] got this right when he said the fun of [40:25] immutability (I believe this was in the [40:27] 2012 talk) is distributed computing. [40:32] This is my only slide about this. [40:34] If all of your routines just [40:38] take a data structure and return a [40:41] data structure, it doesn't much matter [40:43] what core they run on. In a big cluster [40:46] you can push data out to a bunch of [40:48] servers, run the data step separately, and [40:52] then collect the output back. And a task [40:56] that you've broken down into steps [40:59] that simply pull in data and return data [41:01] can then be hooked up to a message queue [41:03] and fanned out across a very wide set of [41:07] data servers, so [41:10] long as it's the return value that's [41:12] important and not the side effect. If I [41:14] call a routine and its value is that my [41:17] data structure will now look different, [41:19] it's got to live on the same machine so [41:20] it's changing the copy of the data [41:22] structure I've got in [41:24] memory. But if it's its return value [41:27] that's important, and not the way it [41:29] monkeys with the data I already have in [41:31] memory, it can run anywhere, so long as [41:33] the result is [41:35] delivered. Data and transforms are easier [41:39] to understand, and I think they're easier [41:41] to maintain, than coupled [41:45] procedures. Now if that's the case, Python [41:48] has been evolving recently in exactly [41:51] the right direction, if you think about [41:52] the kind of innovations that have marked [41:54] the last decade, or [41:56] decade and a half, as Python has [41:58] grown from the language it was in the [42:00] 1990s. In October 2000 we got the list [42:04] comprehension. The list comprehension [42:07] seems like a slight convenience, but it [42:10] really changes you as a [42:14] programmer.
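As a rough illustration only, and not code shown in the talk, the six-step shell pipeline described above can be mirrored in Python as a chain of write-once intermediate values, with comprehensions and `sorted` standing in for the shell tools. The function name `top_words` and the sample sentence are assumptions made for this sketch; each comment names the shell step the line is meant to echo.

```python
import itertools
import re


def top_words(text, k):
    """Return the k most common words in text as (count, word) pairs."""
    # Step 1 (first tr): keep runs of letters; anything else separates words.
    words = re.findall(r"[A-Za-z]+", text)
    # Step 2 (second tr): fold everything to lower case.
    lowered = [word.lower() for word in words]
    # Step 3 (sort): bring identical words together in a row.
    ordered = sorted(lowered)
    # Step 4 (uniq): collapse each run of identical words into a count.
    counts = [(len(list(run)), word)
              for word, run in itertools.groupby(ordered)]
    # Step 5 (numeric sort): biggest counts first; ties stay alphabetical
    # because sorted() is stable.
    ranked = sorted(counts, key=lambda pair: pair[0], reverse=True)
    # Step 6 (sed): keep the first k lines and quit.
    return ranked[:k]


sample = "The aardvark saw the python. The Python saw Brandon."
print(top_words(sample, 2))
```

No intermediate list is modified after it is built, so any step can be printed on its own to see exactly what the data looks like at that point.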
It takes someone whose job, [42:19] our job, used to be modifying data [42:22] structures (make an empty list and then [42:24] go through and start changing it), [42:27] and it turns us into people that build [42:29] new data structures that we often never [42:31] touch again. Often my code today takes in a [42:34] list and, in a series of comprehensions, [42:37] just like that shell script, generates a [42:40] series of intermediate results and a [42:42] final data structure that it returns, [42:44] without ever reaching back into one of [42:47] the earlier results and feeling the need [42:49] to change it. List comprehensions make it [42:51] really easy to write Python code that's [42:54] purely functional, where you're [42:57] using write-once, throwaway data [42:59] structures for your intermediate results [43:02] rather than doing constant [43:04] modification. Python [43:06] 2.4 saw the introduction of the sorted [43:10] built-in. Remember how we used to sort? [43:12] You used to have to build a list, give it [43:14] a name, call its
sort method, which [43:16] returns None so you couldn't use it in [43:18] the middle of an expression, and then go [43:20] back to the list, which has now been [43:22] modified, which has now changed, to see [43:24] the result. Thanks to Raymond [43:28] Hettinger's addition of the sorted built-in, [43:30] we now just ask for data to be sorted [43:33] and returned to us in a single step, [43:36] instead of having to build and modify a [43:37] data structure in [43:39] several. And if you try applying these [43:42] different patterns to your Python code, [43:45] remember that Python has several [43:47] different ways of breaking out a pattern [43:50] from your code. We do have functions [43:53] or methods, but remember that if it's [43:55] iteration that you want to factor out, you [43:58] can build a generator. If you have some [44:02] setup and teardown that you want to [44:04] pull out, you can build a context manager. [44:07] Whereas older [44:09] programming languages, let's say Java, [44:12] will have one way to break out a [44:15] subroutine, and then you have to come up [44:17] with design patterns that fill in the [44:19] lack of generators and the lack of [44:22] context managers with something like the [44:24] visitor pattern or something like that, [44:26] Python just has all three of them built [44:28] in. You can pull out the middle of a loop [44:31] as a function; you can pull out the loop [44:33] logic itself as a generator; you can [44:36] pull out setup and teardown as a [44:39] context manager. We have a lot of [44:41] different ways of getting logic out and [44:45] doing that rescue operation where we [44:47] decouple it from our I/O so that it can [44:50] live separately. Two real-world examples, [44:53] just current projects of mine: one is [44:56] Skyfield, an object-based API for [44:58] astronomy backed by dozens of pure [45:01] functions that were really easy to test; [45:04] functions implement the actual [45:06] operations.
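The three ways of breaking out a pattern mentioned above (a function for the middle of a loop, a generator for the loop logic, a context manager for setup and teardown) might look like the following minimal sketch. The names `is_blank`, `nonblank`, and `recording` are hypothetical and chosen only for illustration; none of them come from the talk or from Skyfield.

```python
from contextlib import contextmanager


def is_blank(line):
    """The middle of a loop, pulled out as a plain function."""
    return not line.strip()


def nonblank(lines):
    """The loop logic itself, pulled out as a generator."""
    for line in lines:
        if not is_blank(line):
            yield line.rstrip("\n")


@contextmanager
def recording(log, label):
    """Setup and teardown, pulled out as a context manager."""
    log.append("begin " + label)      # setup
    try:
        yield log
    finally:
        log.append("end " + label)    # teardown runs even on error


log = []
with recording(log, "scan"):
    kept = list(nonblank(["alpha\n", "\n", "  \n", "beta\n"]))

print(kept)
print(log)
```

Each piece can be exercised on its own: `is_blank` with a single string, `nonblank` with any list of lines, and `recording` with an ordinary list standing in for a log.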
And the miserable thing, by [45:09] the way, about a method is that it [45:11] implicitly depends on the state of the [45:13] whole object. It's often hard to test a [45:15] method because it's not clear how much [45:17] of the object in the test needs to be [45:19] set up and initialized before the method [45:21] can run, whereas the beautiful thing [45:25] about a function is you just read the [45:27] arguments and you know what to [45:28] provide to it. A well-written [45:31] function doesn't need extra globals or [45:34] other persistent state set up and [45:36] available for it to run, which makes [45:38] testing and bug fixing a lot easier. [45:43] Remember, in the Zen of Python, which I [45:45] hope you look at every morning as you [45:46] get ready to [45:48] code, second only in Python's motto [45:55] behind "Beautiful is better than ugly" [45:57] is "Explicit is better than implicit," and [46:00] a function, if nothing else, is very [46:02] explicit about its needs: right there in [46:05] the argument list it tells you what it [46:06] needs to succeed. So Skyfield is one [46:10] example I've done recently that has [46:12] turned out really well, where I didn't [46:14] strand my important logic up, coupled [46:17] into my objects or I/O, but where I spun [46:20] off everything I could into an easy-to-call [46:22] function. The other is something I [46:24] use for filling in tax forms for myself, [46:27] called Luca; it's also on GitHub. The [46:29] temptation there, and actually the first [46:31] version of it, as it ran along computing [46:34] fields, would immediately then call the [46:36] low-level PDF operations to, you know, [46:39] write them onto the 1040 tax form or [46:43] whatever. I then was able to rescue [46:47] that deeply compromised [46:50] and very difficult to maintain code [46:52] by breaking it into phases. I first read [46:55] in the entire input of the tax form [46:58] while resisting the temptation to do [46:59] anything with it; I
simply read it into a [47:02] data structure and return that. I then [47:05] have a routine that takes the inputs to [47:07] a tax form (and it's very easy to write [47:10] these little routines: add up the numbers, [47:12] do the rounding, multiply the percents) [47:14] and produces all of the output lines in [47:17] the tax form. And I resist the temptation [47:20] to write that out to the PDF. I hand that [47:23] data structure then to my PDF [47:25] writer, whose job is to fill text into [47:28] fields. It was much easier to write [47:31] and maintain code that was split so that [47:34] data structures pass between phases, [47:36] rather than making a slightly shorter [47:38] program that immediately tried to go [47:40] have side effects but thereby made [47:42] itself very difficult to test. So the [47:44] pith of the idea here is that in the old [47:47] days, if we wanted to get rid of all of [47:50] that [47:51] pesky I/O, we would try to accomplish that by [47:54] turning it into a subroutine. [47:57] The new idea I'm propounding is that if [47:59] you really want to get rid of someone, [48:02] make them a [48:04] manager. Put them in charge: get all of [48:07] that I/O-laden code and make it feel [48:10] important by putting it up at the [48:13] top of your program. Make it the [48:16] procedural glue, leaving all of the [48:19] little function subordinates free to do [48:22] their jobs. Let's return to Wheeler; I [48:25] have one last quote. In [48:28] 1952 he gave us the [48:31] subroutine, and I think because of that [48:35] initial phase of computing history in [48:37] which we tried to use it wrongly, we have [48:39] yet to realize its full power and [48:41] promise. But I'd like to end with a quote [48:44] in which he described what he thought [48:45] would someday happen now that we have [48:47] subroutines. He said: when a program has [48:51] been made from a set of subroutines, the [48:54] breakdown of the code is more complete [48:57] than it otherwise would be. This
allows [49:00] the coder to concentrate on one section [49:02] of the program at a time without the [49:04] overall detailed program continually [49:07] intruding. Thus the subroutines can be [49:11] more easily coded and tested in [49:14] isolation from the rest of the program. [49:17] When the entire program has to be tested, [49:20] it is with the foreknowledge that the [49:23] incidence of mistakes in the subroutine [49:27] is [49:28] zero, or at least one order of [49:33] magnitude below that of the untested [49:36] portions of the program. Thank you very [49:39] much for listening. I'm Brandon Rhodes. [49:46] [Applause]
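For readers who want the talk's closing idea in code, the phase split described for the tax-form tool (read everything in, compute with pure functions, then write) could be sketched roughly as follows. This is not Luca's actual code: the function names, the `name = value` input format, and the field names are all assumptions invented for the example, and plain text streams stand in for the PDF.

```python
import io


def read_inputs(stream):
    """Phase 1 (I/O): parse 'name = value' lines into a plain dict."""
    inputs = {}
    for line in stream:
        if "=" in line:
            name, value = line.split("=", 1)
            inputs[name.strip()] = float(value)
    return inputs


def compute_form(inputs):
    """Phase 2 (pure): add up, multiply, round; no reading or writing."""
    income = inputs["wages"] + inputs["interest"]
    tax = round(income * inputs["rate"])
    return {"total_income": income, "tax": tax}


def write_form(fields, stream):
    """Phase 3 (I/O): fill the computed fields into the output."""
    for name in sorted(fields):
        stream.write(f"{name}: {fields[name]}\n")


def main(source, sink):
    """The I/O-laden 'manager' at the top, gluing the phases together."""
    write_form(compute_form(read_inputs(source)), sink)


source = io.StringIO("wages = 1000\ninterest = 50\nrate = 0.1\n")
sink = io.StringIO()
main(source, sink)
print(sink.getvalue())
```

Because `compute_form` takes a dict and returns a dict, it can be tested with a literal argument and no files at all, which is the property the talk argues for.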