[0:00] So far in this series we've looked at [0:02] how deep learning was a paradigm shift [0:04] in AI, where intelligence is understood [0:06] to be the ability to learn instead of [0:08] following human instructions. The problem [0:11] is that most people see neural networks as [0:13] some kind of magic. It's not clear why [0:16] they work, and so in this video we'll [0:19] explore the guts of a neural network. [0:23] Recall that a neural network receives an [0:25] input, which we could call a perception. [0:27] This could represent an image, sound, text, [0:30] anything we want a network to perceive. A [0:33] perception boils down to a list of [0:35] measurements that are provided as input [0:37] as a list or vector of values. For [0:41] example, if it was an image, each value [0:43] would represent a single pixel's value in [0:46] that image. And these values are [0:49] represented as electrical pulses, which [0:51] are sent to the first layer of neurons, [0:53] which we call the input layer. Based [0:55] on these values, some of the neurons in [0:58] this layer will fire in a predictable [1:00] manner and send off a pulse to the next [1:02] layer of neurons, and this process [1:04] repeats, creating a wave of electrical [1:06] activity that passes through all the [1:08] layers in the network. At the final [1:11] output layer, certain neurons turn on or [1:13] off. The output can describe the degree [1:16] of belief that the input is or isn't [1:19] some concept it's been trained to [1:20] recognize, based on the activation level [1:23] of the output neurons. A key question is: [1:26] how does a neural network connect [1:29] perceptions to concepts? [1:31] Put another way, if we freeze a neural [1:34] network as it's processing the picture [1:36] of a dog, what is it doing inside right [1:39] before it knows it's a dog? Let's start [1:43] with the simplest possible neural [1:45] network: a single neuron with just one [1:47] input and one output. You can think of a [1:50] neuron as a switch. The
input is [1:52] represented as a number, which is the [1:54] value of something being measured, such [1:57] as the temperature outside. If the input [2:00] value is above a certain threshold, what [2:03] we call the activation threshold, it [2:05] flips the output on; otherwise the output [2:08] remains off. [2:09] To better understand the guts of a [2:11] neural network, we'll need a mathematical [2:13] model of this simple switching action. [2:16] Imagine the input value, or temperature, [2:18] is a point on a line. The position of [2:22] this point depends on the input value. We [2:25] can think of this as our perception [2:27] space. It's one-dimensional because we [2:29] only have one input. A neuron can be [2:33] viewed as dividing the perception space [2:36] into active or inactive regions. If the [2:39] value is above the neuron's activation [2:42] threshold, it fires down the output. And [2:45] so when we train a neuron, we are moving [2:48] this dividing line around by changing [2:50] the weight of the incoming connection, [2:52] which determines how much current is [2:54] needed to trigger the neuron, similar to [2:57] a thermostat, where the input is the [2:59] temperature and the activation point is [3:02] where we'd like the air conditioning to [3:04] turn on. So let's consider a neuron with [3:06] two inputs. [3:07] Perhaps the input is a temperature and [3:10] pressure reading of the environment. Now [3:13] our model will have two variables, which [3:15] can each define a position along a [3:17] dimension, and so together they can be [3:19] thought of as a two-dimensional [3:21] perception space, or plane, where every [3:24] input to the neuron is a point in this [3:27] 2D space. In this case the neuron can be [3:30] represented as a straight line which [3:32] partitions the space into active and [3:34] non-active regions. Any input on this [3:37] side will trigger the neuron, and any [3:39] input on this side will not. And this [3:42] pattern continues: if we add more inputs,
[3:44] we just move up a dimension. With three inputs, each [3:46] measurement or input can be represented [3:49] as a point in three-dimensional perception [3:52] space, and the neuron can be represented [3:55] as a plane which partitions this space [3:57] into active and non-active regions. [4:00] Perhaps we want the neuron to act as a [4:02] storm detector, for example. And so no [4:06] matter how many inputs it has, a neuron [4:09] is like a partition, or linear separation, [4:11] of a set of data points in perception [4:14] space. In higher dimensions we just call [4:17] it a hyperplane. [4:19] This is how perceptions, which are values [4:21] measuring the environment, can turn into [4:24] concepts, where a concept is a region in [4:27] perception space. That's how the neuron [4:30] knows how to feel a storm: if a [4:32] measurement is in the right region. But [4:35] of course reality is not always so [4:37] simple, because we can't always draw a [4:38] straight line through our problems. For [4:41] example, imagine a situation where we [4:43] have two kinds of input measurements, [4:45] temperature and humidity, and our input [4:48] measurements arrange themselves in a [4:50] perception space like this. The circles [4:53] represent measurements of winter days [4:55] and the X's represent summer days. Well, [4:58] we can't draw a single line to separate [5:01] these points, but if we have two lines we [5:04] can, separating the data into four [5:06] distinct regions. Each region is defined [5:09] by the state of the neurons being on or [5:11] off. If neuron one is on and neuron two [5:14] is off, we know it's in this blue region. [5:17] And that is why we need to use multiple [5:19] neurons, so we have the ability to carve [5:21] up the perception space into many more [5:23] regions. This is what the learning [5:26] process does: by changing the weights of [5:29] the connections, we move these partitions [5:31] around to carve out regions around [5:34] conceptually similar input points. So [5:38] let's pause and
summarize. A perception [5:41] is a list of measurements that are [5:42] input into a network. These vectors [5:45] can represent a coordinate or point in [5:48] perception space. The number of [5:50] dimensions in this space is equal to the [5:53] number of different input values, and [5:55] neurons act as partitions in this space. [5:58] A group of neurons together defines a [6:02] specific region in this space, and these [6:05] regions can carve out inputs which are [6:07] part of the same concept. But so far [6:10] we've been looking at simple toy [6:12] problems, and when we move to the real [6:14] world things get a little bit more [6:16] interesting. For example, the first big [6:19] commercial application of neural [6:20] networks was vision, specifically making [6:24] a machine which can understand human [6:26] handwriting, so that the post office [6:29] can read human letters [6:30] automatically. This is a hard problem [6:34] because everyone writes numbers slightly [6:36] differently, so the machine must find the [6:38] general pattern of each number. In this [6:42] example the input to our network is an [6:43] image containing 784 individual pixels, [6:47] and so we have 784 input dimensions, each [6:51] of which measures the brightness of one [6:53] pixel. Using our spatial view, we can [6:56] think of the image of each written digit [6:58] we input to the network as a point in [7:01] perception space. And if we take many [7:04] real examples and plot them in the [7:07] perception space, we get this. The points [7:10] are not nicely clustered into regions [7:12] but scattered all over, and so to carve [7:15] up this space into regions is going to [7:17] be very difficult. The messy distribution [7:20] of inputs in perception space is why [7:22] shallow networks with only one middle [7:25] layer struggled to divide categories up [7:28] cleanly. The way out of this problem, [7:30] though, is to follow the way of nature: [7:32] organic brains use layers of neuron
activations to process their inputs. The [7:38] importance of depth, or many layers, is [7:41] the least understood aspect of neural [7:43] networks, so let's pause and consider a [7:46] simple analogy to understand why [7:48] multi-layered networks are better at [7:50] partitioning the perception space than a [7:52] single-layer network. Imagine this is our [7:55] perception space and we have two kinds [7:58] of input data. Each neuron we add [8:01] in the first layer acts like a fold in [8:04] this space. With two neurons we can make [8:07] two folds like this, and we could keep [8:09] going, folding and unfolding the paper to [8:12] carve out regions to separate these [8:14] points. This will take six separate folds. [8:17] This allows us to then group regions [8:20] containing the same type of points using [8:23] a final neuron which activates if any of [8:25] those regions are active. But now [8:29] consider what happens if we layer our [8:31] folds, that is, we don't unfold after each [8:33] fold. So let's do the first fold again, [8:36] then the second, then the third fold [8:40] across that layer, like this. [8:43] That ends up carving the space in the [8:45] exact same way using three folds instead [8:48] of six. And if we were to continue this [8:51] process with a fourth fold, that results [8:54] in 16 regions, and five folds result in [8:59] 32 regions. This recursive power of [9:02] folding shows how we can get [9:05] exponentially more partitions using the [9:07] same number of neurons if we layer them. [9:11] Practically, this means that neurons deep [9:13] in a network are not simple linear [9:16] partitions but are instead activated by [9:19] a complex pattern of linear partitions. [9:22] And so let's look at how this works [9:24] using a real-world example. Researchers [9:27] took a neural network which was trained [9:29] on real images, such as ImageNet, and [9:31] then probed individual neurons to find [9:35] out what activated them, or what turned [9:38] them on. If
we probe neurons in the first [9:41] layer of the network, we find they are [9:44] detecting these patterns; they are [9:46] looking for edges and points. Then [9:49] they move to the next layer deeper into [9:51] the network and probe those neurons to [9:54] see what activates them. What they found [9:56] was the next layers are activated by [9:58] different kinds of textures, and deeper [10:02] into the network these textures get more [10:04] specific. And as you move deeper into the [10:07] network, [10:07] the textures get more complex, and the [10:11] deepest layers contain individual [10:13] neurons that are activated by entire [10:16] objects such as dogs, wheels, houses, or [10:20] trees. These complex activation patterns [10:24] are possible due to the layered [10:26] structure of the network. And so if we [10:28] cut open a neural network, we'll find the [10:31] deep layers contain representations of a [10:33] perception based on what level of [10:36] different things or patterns they [10:38] contain, which is defined by how active [10:41] those specific neurons are. For example, [10:44] an image of a dog would light up the [10:47] dog-like patterns in the network. [10:50] Finally, let's flip back to the spatial [10:52] perspective to see this power of layers [10:55] in action. [10:56] Recall that our input, such as an image, [10:59] can be represented by a point in [11:01] perception space, and each of the [11:04] following layer activations can be [11:06] thought of as moving that point to a new [11:09] location. It finally settles into a [11:12] final space at the end of the network we [11:14] could call concept space. And critically, [11:18] the job of these transformations is to [11:20] pull apart dissimilar points in concept [11:24] space and push together similar ones. To [11:28] see this in action, let's return to our [11:29] real example where we plot the points of [11:32] various perceptions of different human [11:35] written digits. Notice at first the [11:38] points
are scattered all over, but as [11:41] these perceptions move through the [11:44] network, gradually these points are [11:46] separated into tighter and tighter [11:49] clusters. And so each layer acts as a [11:52] transformation that gradually [11:54] disentangles these points. This allows [11:57] the final layer of neurons to easily [11:59] partition the data into separable [12:01] regions which represent the concepts, in [12:04] this case numbers, we are looking to [12:07] classify. And so the magic, or true power, [12:10] of a neural network is entirely in this [12:13] layered processing, because it allows the [12:15] final layer of neurons to carve up [12:18] concept space, where the points are [12:20] nicely clustered, instead of perception [12:23] space, which is hard or impossible to [12:25] partition. That's how a neural network [12:28] knows something is a number 3 or 6: based [12:33] on the proximity to the 3 or 6 cluster. [12:36] This partly describes why we have [12:38] different mental feelings or intuitions [12:40] when we see different objects. It's [12:43] because different groups of neurons are [12:45] being activated deep in our mind, [12:47] depending on what cluster the perception belongs to. [12:50] And these clusters can also be thought [12:53] of as connected regions, or manifolds, and [12:57] so perceptions representing the written [12:59] digit 6 would fall on one manifold and [13:02] the digit 3 would fall on another. And so [13:06] manifolds are spatial regions which [13:09] represent [13:09] the patterns essential to a concept. [13:12] These patterns are defined by the [13:14] connection patterns or strength across [13:17] the neuron layers. But so far we have [13:21] only modeled how intuition works, that is, [13:24] a rapid interpretation of some impulse, [13:27] such as when you recognize someone's [13:29] voice by a single syllable. This is what [13:32] a single pass through a neural network [13:34] simulates. The other big challenge is [13:37] problems which require
reasoning: [13:41] interactive problems such as having a [13:43] conversation or playing a game. These [13:46] problems are sequential in nature and [13:49] require things like a form of working [13:51] memory. This leads us to the cutting edge [13:54] of neural network research: [13:55] how will a neural network learn to [13:58] reason?
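
The neuron-as-partition idea described in this video can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `neuron` and `region`, the thermostat threshold of 25, and the two dividing lines in temperature and humidity space are all hypothetical values chosen for the example, not something given in the video.

```python
def neuron(weights, bias, inputs):
    """Return 1 (on) if the weighted sum of the inputs plus the bias
    is above zero, otherwise 0 (off). The bias plays the role of the
    (negated) activation threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def region(neurons, inputs):
    """Identify a region of perception space by the on/off state
    of each neuron in the group."""
    return tuple(neuron(w, b, inputs) for w, b in neurons)


# Hypothetical one-input neuron: a thermostat that turns on above 25.
thermostat = ([1.0], -25.0)

# Hypothetical pair of neurons: two lines in (temperature, humidity) space,
# one at temperature = 20 and one at humidity = 50.
two_lines = [([1.0, 0.0], -20.0), ([0.0, 1.0], -50.0)]

print(neuron(*thermostat, [30.0]))      # 1: above the threshold
print(neuron(*thermostat, [18.0]))      # 0: below the threshold
print(region(two_lines, [25.0, 40.0]))  # (1, 0): neuron one on, neuron two off
```

With two neurons, the four possible on/off combinations correspond to the four regions that two lines carve out of the plane; training would amount to adjusting the weights and biases so those lines move.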