The foundational building block of neural networks is the perceptron - a binary classifier loosely modeled after a biological neuron.
The perceptron was invented in 1943 by Warren McCulloch and Walter Pitts, and first implemented in 1957 by Frank Rosenblatt.
Biological Neuron
Biological neurons typically look like this:
[Diagram of a biological neuron, from https://www.simplypsychology.org/neuron.html]
This is one neuron. Not shown in this picture are the other ~86 billion neurons in your brain, connected to this one in an intricate web.
On the left side of this one neuron are its dendrites. Each dendrite is connected to another neuron.
On the right side of this neuron is the axon terminal. Other, downstream neurons connect to this neuron via its axon terminal.
Information flows through the web of interconnected neurons in the form of electrical and chemical signals.
The basic model goes like this:
- This neuron receives information from other neurons via its dendrites.
- This neuron processes the information and decides whether to emit a signal.
- If the neuron emits a signal, the signal flows through the axon to the axon terminal, where this neuron's output signal becomes an input signal to downstream neurons.
Most neurons fire an all-or-nothing response. That is, they receive a bunch of input signals, and they either output a signal (1) or they output nothing (0).
Neurons are diverse and complex
The diagram above depicts a multipolar neuron - the most common type in the human brain. However, other types of neurons exist such as unipolar, bipolar, and anaxonic. There are even different types of multipolar neurons (e.g. Golgi type I and Golgi type II).
Also, some neurons emit a graded response as opposed to an all-or-nothing response, particularly neurons in the retina.
Perceptron Model
Inspired by biological neurons, the model of a perceptron is as follows
- Start with an input vector: $\vec{x}$
- Take a weighted sum of its components: $\vec{w} \cdot \vec{x}$
- Add a bias (AKA offset) to the weighted sum: $\vec{w} \cdot \vec{x} + b$
- If the sum is greater than or equal to 0, output 1; otherwise output 0
More formally,
$$\hat{y}_i = \begin{cases} 1, & \vec{w} \cdot \vec{x}_i + b \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

Binary Classifier
The perceptron is a binary classifier because it outputs a binary response (0 or 1).
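In code, the decision rule is a one-liner. Here's a minimal sketch (the function name and example numbers are made up for illustration, not taken from the text):

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Perceptron decision rule: output 1 if w·x + b >= 0, else 0."""
    return 1 if np.dot(w, x) + b >= 0 else 0

# Illustrative values (assumptions for this sketch)
x = np.array([2.0, -1.0])   # input vector
w = np.array([0.5, 0.3])    # weights
b = -0.4                    # bias

print(perceptron_predict(x, w, b))  # 0.5*2.0 + 0.3*(-1.0) - 0.4 = 0.3 >= 0 -> prints 1
```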
Example
Suppose you want to build a model for 2x2 grayscale images to classify platforms.
A platform is just two dark cells on the bottom and two light cells on the top (kind of what an actual platform would look like from far away).
You collect the following training data.
is_platform is the target. x1, x2, x3, and x4 are the features.
Four features means our perceptron model needs four weights.
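Concretely, each 2x2 image gets flattened into a vector of four pixel values. A minimal sketch (the pixel values are made up, and the row-major ordering with $x_1, x_2$ as the top row is an assumption for illustration):

```python
import numpy as np

# A made-up 2x2 grayscale image (values are assumptions for illustration).
image = np.array([[240, 250],
                  [ 10,   5]])

# Flatten it row by row into the feature vector (x1, x2, x3, x4).
x = image.flatten()
print(x)  # [240 250  10   5]
```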
Now suppose the Perceptron Fairy visits our dreams to tell us the best weights and bias for the model are
$$\begin{aligned} \vec{w} &= \begin{bmatrix} 0.8 & 0.8 & -1.0 & -1.0 \end{bmatrix} \\ b &= -250 \end{aligned}$$

This gives us our updated perceptron model.
Calculating each image's weighted sum plus bias and then thresholding, this model classifies the training data with 90% accuracy. Not perfect, but not bad.
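To see it in action, here's a minimal sketch that runs two made-up images through the model (the pixel values are assumptions for illustration, not rows from the training set, using a 0-255 grayscale where 0 is dark and 255 is light):

```python
import numpy as np

# The fairy's weights and bias
w = np.array([0.8, 0.8, -1.0, -1.0])
b = -250

# Two made-up 2x2 images, flattened to (x1, x2, x3, x4)
platform     = np.array([240, 250, 10, 5])    # light top, dark bottom
not_platform = np.array([30, 20, 220, 235])   # dark top, light bottom

for x in (platform, not_platform):
    z = np.dot(w, x) + b
    print(z, 1 if z >= 0 else 0)
# platform:     0.8*240 + 0.8*250 - 10 - 5 - 250   =  127.0 -> 1
# not_platform: 0.8*30  + 0.8*20  - 220 - 235 - 250 = -665.0 -> 0
```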
So, why does it work?
Intuition
In our example, we can express the weighted sum as follows
$$z = 0.8x_1 + 0.8x_2 - x_3 - x_4 - 250$$

Now think about pixel $x_1$. As its value increases (goes from dark to light), the weighted sum increases. In other words, lightening pixel $x_1$ adds evidence that the image is a platform. Similarly,
- lightening pixel $x_2$ increases evidence that the image is a platform
- lightening pixel $x_3$ decreases evidence that the image is a platform
- lightening pixel $x_4$ decreases evidence that the image is a platform
Furthermore, a certain amount of evidence is required to classify an image as a platform. How much evidence is required is determined by the bias: since $b = -250$, the positive evidence must total at least 250.
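For example, assuming pixel values run from 0 (dark) to 255 (light), a perfectly crisp platform with $x_1 = x_2 = 255$ and $x_3 = x_4 = 0$ gives

$$z = 0.8(255) + 0.8(255) - 0 - 0 - 250 = 158 \ge 0$$

so it's classified as a platform, while an all-dark image gives $z = -250 < 0$.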
Can you imagine a scenario where this behavior would be undesirable?
Drawback
Instead of classifying platforms, suppose we wanted to classify stairs.
In this case, should lightening pixel $x_1$ increase or decrease the evidence that the image depicts stairs?
It depends.
Forgot which pixel is $x_1$?
If $x_2$ is dark, then lightening $x_1$ should increase the evidence that the image depicts stairs. But if $x_2$ is light, then lightening $x_1$ should decrease the evidence that the image depicts stairs. Perceptrons can't handle this type of non-linear logic.
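The interaction described here is XOR-like: the label flips depending on whether $x_1$ and $x_2$ agree. A quick sketch (the binary toy data and grid search are assumptions for illustration, not the stairs dataset) shows that no choice of weights and bias classifies all four cases correctly:

```python
import itertools

# Toy XOR-like data (an assumption for illustration): label is 1 exactly
# when binary pixels x1 and x2 disagree -- the kind of interaction a
# single perceptron would need to capture for the stairs example.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(w1, w2, b):
    correct = 0
    for (x1, x2), y in data:
        y_hat = 1 if w1 * x1 + w2 * x2 + b >= 0 else 0
        correct += (y_hat == y)
    return correct / len(data)

# Brute-force a coarse grid of weights and biases.
grid = [i / 2 for i in range(-10, 11)]
best = max(accuracy(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print(best)  # 0.75 -- no single linear threshold gets all four cases right
```

The grid search is only a crude check; the underlying fact is that an XOR-like pattern isn't linearly separable, so no single perceptron can capture it.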
Choosing the weights and bias
Given some training data, how do we find the best weights and bias for a perceptron? There's a clever algorithm for this called the Pocket Algorithm, but we won't cover it here. Soon we'll be using gradient descent, which makes the Pocket Algorithm somewhat obsolete.