Ben Gorman

Challenge

By hand, implement a multilayer perceptron that classifies stairs like these 👇

Stairs

Assume the pixel values range between 0 (black) and 1 (white).

Data

You shouldn't need this, but here's some data in case you want to see it.

import numpy as np
 
# Eight 2x2 grayscale images; pixel values range from 0 (black) to 1 (white)
imgs = np.array([
  [[0.52, 0.32],
   [0.44, 0.24]],
 
  [[0.91, 0.04],
   [0.11, 0.08]],
 
  [[0.07, 0.97],
   [0.16, 0.13]],
 
  [[0.19, 0.51],
   [0.9 , 0.24]],
 
  [[0.77, 0.68],
   [0.94, 0.96]],
 
  [[0.05, 0.87],
   [0.02, 0.09]],
 
  [[0.47, 0.41],
   [0.99, 0.15]],
 
  [[0.42, 0.09],
   [0.89, 0.83]]
])
 
# Labels: True if the corresponding image is a stairs pattern
stairs = np.array([False,  True,  True, False, False,  True, False, False])
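
If you do want to poke at it, each entry is a 2x2 grayscale image paired with a boolean label. For example (nothing here is needed for the challenge):

print(imgs.shape)   # (8, 2, 2): eight 2x2 images
print(imgs[1])      # a single image: light top-left pixel, dark everywhere else
print(stairs[1])    # True: that image is a stairs pattern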

Hint 1

You can solve this task by wiring up three perceptrons, like this

Multilayer Perceptron

Hint 2

Make every weight a 1 or -1.

Hint 3

  1. Design a perceptron that classifies "left stairs"
  2. Design a perceptron that classifies "right stairs"
  3. Feed the output of those two perceptrons into a third perceptron that classifies stairs.

Solution

There are multiple ways to do this. Here's one of them.

  1. Design a perceptron that classifies "left stairs".

    We know the model needs four inputs and a bias, so off the bat we can set up something like this.

     
        b ---------\
                    \
            w1       \
    x1 -----------\   \
                   \   \
           w2      --------
    x2 ---------- |  _|Β―  |
                   -------- 
           w3     /   /
    x3 ----------/   /
                    /
           w4      /
    x4 ---------- /
     

    Left stairs should have a dark x1, x3, and x4, and a light x2, so it makes sense to assign negative weights to x1, x3, and x4 and a positive weight to x2. For simplicity, we choose the following

     
        b ---------\
                    \
         w1 = -1     \
    x1 -----------\   \
                   \   \
         w2 = 1    --------
    x2 ---------- |  _|Β―  |
                   -------- 
         w3 = -1  /   /
    x3 ----------/   /
                    /
         w4 = -1   /
    x4 ---------- /
     

    Now we must choose the bias. In a perfect left stairs image, x1, x3, and x4 will be 0 and x2 will be 1, so the weighted sum will be 1. As the image becomes less perfect, the weighted sum decreases. b = -0.8 seems like a reasonable choice: the perceptron then fires only when the weighted sum is at least 0.8, which forces it to be close to 1 while allowing for a bit of noise.

     
        b = -0.8 --\
                    \
         w1 = -1     \
    x1 -----------\   \
                   \   \
         w2 = 1    --------
    x2 ---------- |  _|Β―  |
                   -------- 
         w3 = -1  /   /
    x3 ----------/   /
                    /
         w4 = -1   /
    x4 ---------- /
     
  2. Design a perceptron that classifies "right stairs".

    Following the same process, we can quickly develop a perceptron like this one.

     
        b = -0.8 --\
                    \
         w1 = 1      \
    x1 -----------\   \
                   \   \
         w2 = -1   --------
    x2 ---------- |  _|Β―  |
                   -------- 
         w3 = -1  /   /
    x3 ----------/   /
                    /
         w4 = -1   /
    x4 ---------- /
     
  3. At this point we have a model that classifies left stairs and another that classifies right stairs. Let's feed them into a downstream perceptron

          b1 ---\          b3 ---\
    x1 <         ------           \
                |  P1  | -------\  \
    x2 <         ------    w1   -------
                               |  _|Β―  | --->
    x3 <         ------         -------
                |  P2  | -------/
    x4 <         ------   w2
          b2 ---/

    Each perceptron (P1 and P2 above) outputs a 0 or a 1. If either model outputs a 1, we want the final model to predict stairs. In other words, we want the final perceptron to fire whenever P1 + P2 is greater than or equal to 1. We can achieve this by setting both weights to 1 and the bias term to -0.5, so it fires exactly when P1 + P2 >= 0.5. Like this 👇

        b = -0.5 --\
                    \
    P1 -----------\  \
        w = 1     --------
                 |  _|Β―  | --->
                  -------- 
    P2 ----------/ 
        w = 1

    Notice how this perceptron mimics an OR operator.
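
As a quick sanity check, here's that OR-like behaviour written out in plain Python. The step helper below is my own shorthand for the step activation drawn in the diagrams, assuming it fires when its input is at least 0 (consistent with the bias discussion above).

def step(z):
    # Step activation: output 1 when the input is >= 0, else 0
    return 1 if z >= 0 else 0

for p1 in (0, 1):
    for p2 in (0, 1):
        print(p1, p2, "->", step(1*p1 + 1*p2 - 0.5))

# Prints:
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 1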

Our final model looks like this

MLP
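
To tie it all together, here's a minimal NumPy sketch of this final model. The helper names (step, left_stairs, right_stairs, predict_stairs) are my own, it assumes the imgs and stairs arrays from the Data section are in scope, and it assumes each image is laid out as [[x1, x2], [x3, x4]] with a perceptron that fires when its weighted sum plus bias is at least 0.

import numpy as np

def step(z):
    # Step activation: fire (output 1) when the input is >= 0
    return 1 if z >= 0 else 0

def left_stairs(x1, x2, x3, x4):
    # Hidden perceptron P1: fires when x2 is light and x1, x3, x4 are dark
    return step(-1*x1 + 1*x2 - 1*x3 - 1*x4 - 0.8)

def right_stairs(x1, x2, x3, x4):
    # Hidden perceptron P2: fires when x1 is light and x2, x3, x4 are dark
    return step(1*x1 - 1*x2 - 1*x3 - 1*x4 - 0.8)

def predict_stairs(img):
    # img is a 2x2 array assumed to be laid out as [[x1, x2], [x3, x4]]
    (x1, x2), (x3, x4) = img
    p1 = left_stairs(x1, x2, x3, x4)
    p2 = right_stairs(x1, x2, x3, x4)
    # Output perceptron: OR-like combination of P1 and P2
    return step(1*p1 + 1*p2 - 0.5)

preds = np.array([bool(predict_stairs(img)) for img in imgs])
print(preds)    # model predictions
print(stairs)   # true labels

One caveat: with the noisy sample data above, a hidden-layer bias of -0.8 turns out to be a little too strict. The second image, for instance, has a right-stairs weighted sum of 0.91 - 0.04 - 0.11 - 0.08 = 0.68, which falls just short of the 0.8 cutoff, so that image (and the other two stairs images) gets missed. A slightly more forgiving hidden-layer bias such as -0.5 separates all eight sample images correctly; the wiring is what matters here, not the exact bias value.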

Food for thought

This example highlights how a multilayer perceptron can emulate conditional logic. This makes it a powerful prediction model.

Once you're convinced that an MLP can be a good predictor, the question becomes: how do we scale it?

Imagine connecting thousands of perceptrons across dozens of layers. Given some training dataset, how could we possibly find the optimal set of weights and biases?