[0:00] so far in this series we've looked at
[0:02] how deep learning was a paradigm shift
[0:04] in AI where intelligence is understood
[0:06] to be the ability to learn instead of
[0:08] following human instructions the problem
[0:11] is most people see neural networks as
[0:13] some kind of magic it's not clear why
[0:16] they work and so in this video we'll
[0:19] explore the guts of a neural network
[0:23] recall that a neural network receives an
[0:25] input which we could call a perception
[0:27] this could represent an image sound text
[0:30] anything we want a network to perceive a
[0:33] perception boils down to a list of
[0:35] measurements that are provided as input
[0:37] as a list or vector of values for
[0:41] example if it was an image each value
[0:43] would represent a single pixel's value in
[0:46] that image and these values are
[0:49] represented as electrical pulses which
[0:51] are sent to the first layer of neurons
[0:53] which we call the input layer and based
[0:55] on these values some of the neurons in
[0:58] this layer will fire in a predictable
[1:00] manner and send off a pulse to the next
[1:02] layer of neurons and this process
[1:04] repeats creating a wave of electrical
[1:06] activity that passes through all the
[1:08] layers in the network at the final
[1:11] output layer certain neurons turn on or
[1:13] off the output can describe the degree
[1:16] of belief that the input is or isn't
[1:19] some concept it's been trained to
[1:20] recognize based on the activation level
[1:23] of the output neurons a key question is
[1:26] how does a neural network connect
[1:29] perceptions to concepts
[1:31] put another way if we freeze a neural
[1:34] network as it's processing the picture
[1:36] of a dog what is it doing inside right
[1:39] before it knows it's a dog let's start
[1:43] with the simplest possible neural
[1:45] network a single neuron with just one
[1:47] input and one output you can think of a
[1:50] neuron as a switch the input is
[1:52] represented as a number which is the
[1:54] value of something being measured such
[1:57] as the temperature outside if the input
[2:00] value is above a certain threshold what
[2:03] we call the activation threshold it
[2:05] flips the output on otherwise the output
[2:08] remains off
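The switching action described here can be sketched in a few lines of Python. This is only an illustration: the function name `threshold_neuron` and the threshold of 25 degrees are assumptions made for the example and do not come from the video.

```python
def threshold_neuron(value, threshold):
    """Return 1 (on) if the input is above the activation threshold, else 0 (off)."""
    return 1 if value > threshold else 0


# Hypothetical activation threshold for an outside-temperature reading.
ACTIVATION_THRESHOLD = 25.0

if __name__ == "__main__":
    for temperature in (18.0, 25.0, 31.5):
        state = threshold_neuron(temperature, ACTIVATION_THRESHOLD)
        print(temperature, "on" if state else "off")
```

A reading exactly at the threshold leaves the output off in this sketch, because the text says the value must be above the threshold to flip the switch.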
[2:09] and to better understand the guts of a
[2:11] neural network we'll need a mathematical
[2:13] model of this simple switching action
[2:16] imagine the input value or temperature
[2:18] is a point on a line the position of
[2:22] this point depends on the input value we
[2:25] can think of this as our perception
[2:27] space it's one dimensional because we
[2:29] only have one input a neuron can be
[2:33] viewed as dividing the perception space
[2:36] into active or inactive regions if the
[2:39] value is above the neuron's activation
[2:42] threshold it fires down the output and
[2:45] so when we train a neuron we are moving
[2:48] this dividing line around by changing
[2:50] the weight of the incoming connection
[2:52] which determines how much current is
[2:54] needed to trigger the neuron similar to
[2:57] a thermostat where the input is the
[2:59] temperature and the activation point is
[3:02] where we like the air conditioning to
[3:04] turn on so let's consider a neuron with
[3:06] two inputs
[3:07] perhaps the input is a temperature and
[3:10] pressure reading of the environment now
[3:13] our model will have two variables which
[3:15] can each define a position along a
[3:17] dimension and so together they can be
[3:19] thought of as a two dimensional
[3:21] perception space or plane where every
[3:24] input to the neuron is a point in this
[3:27] 2d space in this case the neuron can be
[3:30] represented as a straight line which
[3:32] partitions the space into active and
[3:34] non-active regions any input on this
[3:37] side will trigger the neuron and any
[3:39] input on this side will not and this
[3:42] pattern continues if we add more inputs
[3:44] we just move up a dimension each
[3:46] measurement or input can be represented
[3:49] as a point in 3-dimensional perception
[3:52] space and the neuron can be represented
[3:55] as a plane which partitions this space
[3:57] into active and non-active regions
[4:00] perhaps we want the neuron to act as a
[4:02] storm detector for example and so no
[4:06] matter how many inputs it has a neuron
[4:09] is like a partition or linear separation
[4:11] of a set of data points in perception
[4:14] space in higher dimensions we just call
[4:17] it a hyperplane
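One common way to write a neuron as a linear partition is a weighted sum of the inputs compared against zero. The sketch below assumes that form. The bias term, the weight values, and the names `neuron_fires`, `STORM_WEIGHTS` and `STORM_BIAS` are hypothetical, chosen only to illustrate a three-input storm detector.

```python
def neuron_fires(inputs, weights, bias):
    """Return True if the input point lies on the active side of the hyperplane.

    The hyperplane is the set of points where the weighted sum plus bias is zero.
    """
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation > 0


# Hypothetical storm detector over three normalized measurements:
# (temperature, pressure, humidity). Values are made up for illustration.
STORM_WEIGHTS = (0.5, -1.0, 0.8)
STORM_BIAS = -0.2

if __name__ == "__main__":
    humid_low_pressure = (0.9, 0.1, 0.9)
    dry_high_pressure = (0.2, 0.9, 0.1)
    print(neuron_fires(humid_low_pressure, STORM_WEIGHTS, STORM_BIAS))
    print(neuron_fires(dry_high_pressure, STORM_WEIGHTS, STORM_BIAS))
```

The same function works for any number of inputs, which mirrors the point that adding inputs only raises the dimension of the space being divided.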
[4:19] this is how perceptions which are values
[4:21] measuring the environment can turn into
[4:24] concepts where a concept is a region in
[4:27] perception space that's how the neuron
[4:30] knows how to feel a storm if a
[4:32] measurement is in the right region but
[4:35] of course reality is not always so
[4:37] simple because we can't always draw a
[4:38] straight line through our problems for
[4:41] example imagine a situation where we
[4:43] have two kinds of input measurements
[4:45] temperature and humidity and our input
[4:48] measurements arrange themselves in a
[4:50] perception space like this the circles
[4:53] represent measurements of winter days
[4:55] and the x's represent summer days well
[4:58] we can't draw a single line to separate
[5:01] these points but if we have two lines we
[5:04] can separating the data into four
[5:06] distinct regions each region is defined
[5:09] by the state of the neurons being on or
[5:11] off if neuron one is on and neuron two
[5:14] is off we know it's in this blue region
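The idea that two lines give four regions, each named by which neurons are on or off, can be sketched as follows. The two lines used here (one splitting on temperature, one on humidity, both at 0.5) are hypothetical and are not the lines drawn in the video.

```python
def fires(point, weights, bias):
    """Return True if the point is on the active side of the neuron's line."""
    return sum(w * x for w, x in zip(weights, point)) + bias > 0


# Two hypothetical neurons over (temperature, humidity), both normalized to 0..1.
NEURON_1 = ((1.0, 0.0), -0.5)  # on when temperature is above 0.5
NEURON_2 = ((0.0, 1.0), -0.5)  # on when humidity is above 0.5


def region(point):
    """Identify a region by the on/off state of each neuron."""
    return (fires(point, *NEURON_1), fires(point, *NEURON_2))


if __name__ == "__main__":
    for point in [(0.2, 0.2), (0.8, 0.2), (0.2, 0.8), (0.8, 0.8)]:
        print(point, region(point))
```

Each of the four sample points falls in a different region, so a later neuron could group regions that hold the same kind of point.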
[5:17] and that is why we need to use multiple
[5:19] neurons so we have the ability to carve
[5:21] up the perception space into many more
[5:23] regions this is what the learning
[5:26] process does by changing the weights of
[5:29] the connections we move these partitions
[5:31] around to carve out regions around
[5:34] conceptually similar input points so
[5:38] let's pause and summarize a perception
[5:41] is a list of measurements that are
[5:42] inputted into a network these vectors
[5:45] can represent a coordinate or point in
[5:48] perception space the number of
[5:50] dimensions in this space is equal to the
[5:53] number of different input values and
[5:55] neurons act as partitions in this space
[5:58] and a group of neurons together define a
[6:02] specific region in this space and these
[6:05] regions can carve out inputs which are
[6:07] part of the same concept but so far
[6:10] we've been looking at simple toy
[6:12] problems and when we move to the real
[6:14] world things get a little bit more
[6:16] interesting for example the first big
[6:19] commercial application of neural
[6:20] networks was vision specifically making
[6:24] a machine which can understand human
[6:26] handwriting so that the post office
[6:29] can read human letters
[6:30] automatically this is a hard problem
[6:34] because everyone writes numbers slightly
[6:36] differently so the machine must find the
[6:38] general pattern of each number in this
[6:42] example the input to our network is an
[6:43] image containing 784 individual pixels
[6:47] and so we have 784 input dimensions each
[6:51] of which measures the brightness of one
[6:53] pixel using our spatial view we can
[6:56] think of the image of each written digit
[6:58] we input to the network as a point in
[7:01] perception space and if we take many
[7:04] real examples and plot them in the
[7:07] perception space we get this the points
[7:10] are not nicely clustered into regions
[7:12] but scattered all over and so to carve
[7:15] up this space into regions is going to
[7:17] be very difficult the messy distribution
[7:20] of inputs in perception space is why
[7:22] shallow networks with only one middle
[7:25] layer struggled to divide categories up
[7:28] cleanly the way out of this problem
[7:30] though is to follow the way of nature
[7:32] organic brains use layers of neuron
[7:35] activations to process their inputs the
[7:38] importance of depth or many layers is
[7:41] the least understood aspect of neural
[7:43] networks so let's pause and consider a
[7:46] simple analogy to understand why
[7:48] multi-layered networks are better at
[7:50] partitioning the perception space than a
[7:52] single layer network imagine this is our
[7:55] perception space and we have two kinds
[7:58] of input data types each neuron we add
[8:01] in the first layer acts like a fold in
[8:04] this space with two neurons we can make
[8:07] two folds like this and we could keep
[8:09] going folding and unfolding the paper to
[8:12] carve out regions to separate these
[8:14] points this will take six separate folds
[8:17] this allows us to then group regions
[8:20] containing the same type of points using
[8:23] a final neuron which activates if any of
[8:25] those regions are active but now
[8:29] consider what happens if we layer our
[8:31] folds that is we don't unfold after each
[8:33] fold so let's do the first fold again
[8:36] then the second then the third fold
[8:40] across that layer like this
[8:43] that ends up carving the space in the
[8:45] exact same way using three folds instead
[8:48] of six and if we were to continue this
[8:51] process with a fourth fold that results
[8:54] in 16 regions and five folds results in
[8:59] 32 regions this recursive power of
[9:02] folding shows how we can get
[9:05] exponentially more partitions using the
[9:07] same number of neurons if we layer them
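The doubling from layered folds can be checked numerically in a simplified one-dimensional setting. The sketch below is an assumption-laden stand-in for the paper analogy: it models one fold as a tent map on the interval from 0 to 1, stacks several folds without unfolding, and counts the straight pieces that result. The names `fold`, `layered_folds` and `count_linear_pieces` are hypothetical.

```python
def fold(x):
    """One fold of the interval [0, 1] onto itself (a tent map)."""
    return 1.0 - abs(2.0 * x - 1.0)


def layered_folds(x, depth):
    """Apply `depth` folds in sequence without unfolding in between."""
    for _ in range(depth):
        x = fold(x)
    return x


def count_linear_pieces(depth, samples=4096):
    """Count the rising and falling straight pieces of the layered fold.

    The grid uses a power-of-two number of samples so every crease lands
    exactly on a grid point (valid here for depths up to 12).
    """
    ys = [layered_folds(i / samples, depth) for i in range(samples + 1)]
    rising = [b > a for a, b in zip(ys, ys[1:])]
    turns = sum(1 for a, b in zip(rising, rising[1:]) if a != b)
    return turns + 1


if __name__ == "__main__":
    for depth in range(1, 6):
        print(depth, count_linear_pieces(depth))
```

Under these assumptions each extra layer doubles the number of pieces, which is the same growth as the 16 and 32 regions quoted for four and five folds.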
[9:11] practically this means that neurons deep
[9:13] in a network are not simple linear
[9:16] partitions but are instead activated by
[9:19] a complex pattern of linear partitions
[9:22] and so let's look at how this works
[9:24] using a real world example researchers
[9:27] took a neural network which was trained
[9:29] on real images such as ImageNet and
[9:31] then probed individual neurons to find
[9:35] out what activated them or what turned
[9:38] them on if we probe neurons in the first
[9:41] layer of the network we find they are
[9:44] detecting these patterns which are
[9:46] looking for edges and points then
[9:49] they move to the next layer deeper into
[9:51] the network and probe those neurons to
[9:54] see what activates them what they found
[9:56] was the next layers are activated by
[9:58] different kinds of textures and deeper
[10:02] into the network these textures get more
[10:04] specific and as you move deeper into the
[10:07] network
[10:07] the textures get more complex and the
[10:11] deepest layers contain individual
[10:13] neurons that are activated by entire
[10:16] objects such as dogs wheels houses or
[10:20] trees these complex activation patterns
[10:24] are possible due to the layered
[10:26] structure of the network and so if we
[10:28] cut open a neural network we'll find the
[10:31] deep layers contain representations of a
[10:33] perception based on what level of
[10:36] different things or patterns they
[10:38] contain which is defined by how active
[10:41] those specific neurons are for example
[10:44] an image of a dog would light up the
[10:47] doglike patterns in the network
[10:50] finally let's flip back to the spatial
[10:52] perspective to see this power of layers
[10:55] in action
[10:56] recall that our input such as an image
[10:59] can be represented by a point in
[11:01] perception space and each of the
[11:04] following layer activations can be
[11:06] thought of as moving that point to a new
[11:09] location and it finally settles into a
[11:12] final space at the end of the network we
[11:14] could call concept space and critically
[11:18] the job of these transformations is to
[11:20] pull apart dissimilar points in concept
[11:24] space and push together similar ones to
[11:28] see this in action let's return to our
[11:29] real example where we plot the points of
[11:32] various perceptions of different human
[11:35] written digits notice at first the
[11:38] points are scattered all over but as
[11:41] these perceptions move through the
[11:44] network gradually these points are
[11:46] separated into tighter and tighter
[11:49] clusters and so each layer acts as a
[11:52] transformation that gradually
[11:54] disentangles these points this allows
[11:57] the final layer of neurons to easily
[11:59] partition the data into separable
[12:01] regions which represent the concepts in
[12:04] this case numbers we are looking to
[12:07] classify and so the magic or true power
[12:10] of a neural network is entirely in this
[12:13] layered processing because it allows the
[12:15] final layer of neurons to carve up
[12:18] concept space where the points are
[12:20] nicely clustered instead of perception
[12:23] space which is hard or impossible to
[12:25] partition that's how a neural network
[12:28] knows something is a number 3 or 6 based
[12:33] on the proximity to the 3 or 6 cluster
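Classifying by proximity to a cluster can be sketched as picking the nearest cluster center. This is a simplified stand-in, not how a trained network's final layer is actually implemented, and the two-dimensional concept space, the center coordinates, and the names `nearest_cluster` and `CENTROIDS` are all made up for illustration.

```python
import math


def nearest_cluster(point, centroids):
    """Return the label whose cluster center is closest to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Hypothetical cluster centers in a two-dimensional concept space.
CENTROIDS = {"3": (1.0, 4.0), "6": (5.0, 1.0)}

if __name__ == "__main__":
    print(nearest_cluster((1.5, 3.0), CENTROIDS))
    print(nearest_cluster((4.2, 1.5), CENTROIDS))
```

The sketch only works well because the clusters are assumed to be tight and far apart, which is the condition the earlier layers are described as producing.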
[12:36] this partly describes why we have
[12:38] different mental feelings or intuitions
[12:40] when we see different objects it's
[12:43] because different groups of neurons are
[12:45] being activated deep in our mind
[12:47] depending on what cluster it belongs to
[12:50] and these clusters can also be thought
[12:53] of as connected regions or manifolds and
[12:57] so perceptions representing the written
[12:59] digit 6 would fall on one manifold and
[13:02] the digit 3 would fall on another and so
[13:06] manifolds are spatial regions which
[13:09] represent
[13:09] the patterns essential to a concept
[13:12] these patterns are defined by the
[13:14] connection patterns or strength across
[13:17] the neuron layers but so far we have
[13:21] only modeled how intuition works that is
[13:24] a rapid interpretation of some impulse
[13:27] such as when you recognize someone's
[13:29] voice by a single syllable this is what
[13:32] a single pass through a neural network
[13:34] simulates the other big challenge is
[13:37] problems which require reasoning
[13:41] interactive problems such as having a
[13:43] conversation or playing a game these
[13:46] problems are sequential in nature and
[13:49] require things like a form of working
[13:51] memory this leads us to the cutting edge
[13:54] of neural network research
[13:55] how will a neural network learn to
[13:58] reason