Intuition for perceptron weight update rule

16,372

Solution 1

The perceptron's output is the hard limit of the dot product between the instance and the weight. Let's see how this changes after the update. Since

w(t + 1) = w(t) + y(t)x(t),

then

x(t) ⋅ w(t + 1) = x(t) ⋅ w(t) + x(t) ⋅ (y(t) x(t)) = x(t) ⋅ w(t) + y(t) [x(t) ⋅ x(t))].

Note that:

By the algorithm's specification, the update is only applied if x(t) was misclassified.
By the definition of the dot product, x(t) ⋅ x(t) ≥ 0.

How does this move the boundary relative to x(t)?

If x(t) was correctly classified, then the algorithm does not apply the update rule, so nothing changes.
If x(t) was incorrectly classified as negative, then y(t) = 1. It follows that the new dot product increased by x(t) ⋅ x(t) (which is positive). The boundary moved in the right direction as far as x(t) is concerned, therefore.
Conversely, if x(t) was incorrectly classified as positive, then y(t) = -1. It follows that the new dot product decreased by x(t) ⋅ x(t) (which is positive). The boundary moved in the right direction as far as x(t) is concerned, therefore.

Solution 2

A better derivation of the perceptron update rule is documented here and here. The derivation is using gradient descent.

Basic premise of gradient descent algorithm is find the error of classification and make your parameters so that error is minimized.

PS: I was trying very hard to get the intuition on why would someone multiply x and y to derive the update for w. Because w is the slope for a single dimension (y = wx+c) and slope w = (y/x) and not y * x.

16,372

Author by

joshreesjones

Updated on June 07, 2022

Comments

joshreesjones about 2 years
I am having trouble understanding the weight update rule for perceptrons:

w(t + 1) = w(t) + y(t)x(t).

Assume we have a linearly separable data set.
- w is a set of weights [w0, w1, w2, ...] where w0 is a bias.
- x is a set of input parameters [x0, x1, x2, ...] where x0 is fixed at 1 to accommodate the bias.
At iteration t, where t = 0, 1, 2, ...,
- w(t) is the set of weights at iteration t.
- x(t) is a misclassified training example.
- y(t) is the target output of x(t) (either -1 or 1).
Why does this update rule move the boundary in the right direction?
joshreesjones over 8 years

Can you expand a little on the last sentence? What happens when x(t) is classified and when x(t) is misclassified?