Chapter 3 Simple Supervised Learning

The Threshold Logic Unit (TLU) (McCulloch & Pitts, 1943) is the simplest model of an artificial neuron.
The TLU has a feedforward structure, which is only one of several available network structures. A feedforward network is used to place an input pattern into one of several classes according to the resulting pattern of outputs.
The requirements of the McCulloch-Pitts neuron:
1. The activation is binary (1 = fires, 0 = does not fire).
2. The neurons are connected by directed, weighted paths.
3. A connection path is excitatory if the weight on the path is positive; otherwise it is inhibitory. All excitatory connections into a particular neuron have the same weight.
The requirements of the McCulloch-Pitts neuron (continued):
4. Each neuron has a fixed threshold such that if the net input is greater than or equal to the threshold, the neuron fires.
5. The threshold is set so that inhibition is absolute; that is, any nonzero inhibitory input will prevent the neuron from firing.
6. It takes one time step for a signal to pass over one connection link.
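Requirement 5 can be made concrete. If a unit has n excitatory inputs of weight w and inhibitory inputs of weight -p, the largest net input it can receive while at least one inhibitory line is active is nw - p, so any threshold θ > nw - p makes inhibition absolute. The sketch below checks this by brute force over all binary inputs; the function names and the numbers n = 2, m = 1, w = p = 1 are illustrative assumptions, and the unit fires when the net input reaches θ.

```python
from itertools import product


def mp_fires(excitatory, inhibitory, w, p, theta):
    """McCulloch-Pitts unit: binary inputs, every excitatory link has weight w > 0,
    every inhibitory link has weight -p (p > 0); returns 1 if net >= theta."""
    net = w * sum(excitatory) - p * sum(inhibitory)
    return 1 if net >= theta else 0


def inhibition_is_absolute(n, m, w, p, theta):
    """True if no input pattern with an active inhibitory line makes the unit fire."""
    for exc in product((0, 1), repeat=n):
        for inh in product((0, 1), repeat=m):
            if any(inh) and mp_fires(exc, inh, w, p, theta):
                return False
    return True


# Hypothetical unit: 2 excitatory inputs (w = 1), 1 inhibitory input (p = 1).
print(inhibition_is_absolute(2, 1, 1, 1, theta=2))  # 2 > n*w - p = 1 -> True
print(inhibition_is_absolute(2, 1, 1, 1, theta=1))  # 1 + 1 - 1 = 1 still fires -> False
```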
Algorithm
The weights for the neuron are set, together with the threshold for the neuron's activation function, so that the neuron performs a simple logic function. These simple neurons are used as building blocks that can model any function that can be represented as a logic function. Rather than a training algorithm, analysis is used to determine the values of the weights and threshold.
Simple networks for logic functions
The binary forms of the AND, OR, and AND NOT functions are given below for reference. In each case the threshold on the Y unit is set to 2.
[Figure: input units X1 and X2 connected to output unit Y by weights w1 and w2]
AND function
The AND function gives the following four training input, target output pairs:
X1  X2  |  Y
 1   1  |  1
 1   0  |  0
 0   1  |  0
 0   0  |  0
What values can w1 and w2 be set to?
OR function
The OR function gives the following four training input, target output pairs:
X1  X2  |  Y
 1   1  |  1
 1   0  |  1
 0   1  |  1
 0   0  |  0
What values can w1 and w2 be set to?
AND NOT function
The AND NOT function (x1 AND NOT x2) gives the following four training input, target output pairs:
X1  X2  |  Y
 1   1  |  0
 1   0  |  1
 0   1  |  0
 0   0  |  0
What value can w1 be set to, and what value can w2 be set to?
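The slides leave the weights as an exercise. The sketch below checks one possible answer for each function with the threshold fixed at 2: w1 = w2 = 1 for AND, w1 = w2 = 2 for OR, and w1 = 2, w2 = -1 for AND NOT. These are illustrative choices (other weights also work), and the helper names are hypothetical.

```python
from itertools import product

THETA = 2  # threshold on the Y unit, as stated in the text


def mp_neuron(x1, x2, w1, w2, theta=THETA):
    """Binary McCulloch-Pitts unit: fires (1) when the net input reaches theta."""
    return 1 if x1 * w1 + x2 * w2 >= theta else 0


# One possible weight choice per logic function.
WEIGHTS = {"AND": (1, 1), "OR": (2, 2), "AND NOT": (2, -1)}

# Reference truth tables for binary inputs.
TARGETS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "AND NOT": lambda a, b: a and not b,
}

for name, (w1, w2) in WEIGHTS.items():
    ok = all(mp_neuron(x1, x2, w1, w2) == int(TARGETS[name](x1, x2))
             for x1, x2 in product((0, 1), repeat=2))
    print(f"{name}: w1={w1}, w2={w2}, matches truth table: {ok}")
```

Note that AND NOT needs a negative (inhibitory) weight on x2: with x1 = x2 = 1 the net input is 2 - 1 = 1, which stays below the threshold.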
XOR function
x1 XOR x2 = (x1 AND NOT x2) OR (x2 AND NOT x1)
How can a network be built to model the XOR function?
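The decomposition above suggests a two-layer answer: two hidden units compute x1 AND NOT x2 and x2 AND NOT x1, and an output unit ORs them. The sketch below assumes threshold 2 everywhere and reuses the AND NOT weights (2, -1) and OR weights (2, 2) from the single-unit examples; these weights are illustrative, not the only valid choice. Under requirement 6, the output appears two time steps after the inputs are presented.

```python
from itertools import product


def mp_neuron(inputs, weights, theta=2):
    """Binary McCulloch-Pitts unit: fires (1) when the net input reaches theta."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0


def xor_net(x1, x2):
    # Hidden layer (time step 1): z1 = x1 AND NOT x2, z2 = x2 AND NOT x1.
    z1 = mp_neuron((x1, x2), (2, -1))
    z2 = mp_neuron((x1, x2), (-1, 2))
    # Output layer (time step 2): y = z1 OR z2.
    return mp_neuron((z1, z2), (2, 2))


for x1, x2 in product((0, 1), repeat=2):
    print(x1, x2, xor_net(x1, x2))
```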
2.1 Pattern Classification
For the neural-network approach, we assume that there is a set of training patterns for which the correct classification is known. In the simplest case, the output unit represents membership in the class with a response of 1; a response of -1 (or 0) indicates that the pattern is not a member of the class.
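Simple Pattern Classification
The activation (net input) is
$$y_{in} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
The output is a bipolar value:
$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} \ge \theta \\ -1 & \text{if } y_{in} < \theta \end{cases}$$
where θ is the threshold.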
AND TLU: threshold θ = 3; find w1 and w2.
x1 | x2 | activation | output (-1 or 1)
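Which weights work for the AND TLU with θ = 3? The sketch below assumes binary inputs {0, 1} and the bipolar output from the previous slide, then brute-forces a small grid of half-integer weights. Under those assumptions a pair is valid exactly when w1 + w2 ≥ 3 while w1 < 3 and w2 < 3, so one input alone stays below threshold. The grid, the function names, and the choice w1 = w2 = 2 used to fill in the table are all illustrative.

```python
from itertools import product

THETA = 3  # threshold given on the slide


def tlu(x, w, theta=THETA):
    """Bipolar-output TLU: 1 if the activation sum(w_i * x_i) reaches theta, else -1."""
    activation = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if activation >= theta else -1


# Assumption: binary inputs in {0, 1}; the AND target is 1 only for (1, 1).
PATTERNS = [((x1, x2), 1 if (x1 and x2) else -1)
            for x1, x2 in product((0, 1), repeat=2)]


def computes_and(w):
    """True if the TLU with weights w reproduces every AND target."""
    return all(tlu(x, w) == t for x, t in PATTERNS)


# Brute-force a small grid of half-integer weights (-1.0, -0.5, ..., 4.0).
GRID = [k / 2 for k in range(-2, 9)]
valid = [(w1, w2) for w1 in GRID for w2 in GRID if computes_and((w1, w2))]
print("some valid (w1, w2):", valid[:5])

# Fill in the slide's table (x1, x2, activation, output) for w1 = w2 = 2.
for x, _ in PATTERNS:
    activation = sum(wi * xi for wi, xi in zip((2, 2), x))
    print(x[0], x[1], activation, tlu(x, (2, 2)))
```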
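2.2 The linear separation of classes
Critical condition for classification: the activation equals the threshold. For the 2-D case:
$$w_1 x_1 + w_2 x_2 = \theta \quad\Longrightarrow\quad x_2 = a x_1 + b, \qquad a = -\frac{w_1}{w_2}, \quad b = \frac{\theta}{w_2}$$
For binary inputs with $w_1 = w_2 = 1$ and threshold $\theta = 1.5$: $a = -1$, $b = 1.5$.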
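The slope and intercept can be computed directly from the weights and threshold. This sketch reproduces a = -1, b = 1.5 for w1 = w2 = 1, θ = 1.5, then checks which of the four binary inputs lie above the line; for w2 > 0, lying above the line is the same as having activation greater than θ. The function name `decision_line` is hypothetical, and the 0/1 output column is an assumption matching the binary-input case.

```python
def decision_line(w1, w2, theta):
    """Slope a and intercept b of the boundary w1*x1 + w2*x2 = theta (needs w2 != 0)."""
    return -w1 / w2, theta / w2


w1, w2, theta = 1, 1, 1.5
a, b = decision_line(w1, w2, theta)
print(a, b)  # -1.0 1.5

# With w2 > 0, "above the line" is exactly "activation > theta".
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    activation = w1 * x1 + w2 * x2
    side = "above" if x2 > a * x1 + b else "below"
    output = 1 if activation >= theta else 0
    print((x1, x2), activation, side, output)
```

Only (1, 1) lands above the line, so this TLU computes AND.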
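2.3 Biases and Thresholds
$$net = b + \sum_{i=1}^{n} w_i x_i$$
A bias acts as a weight on a connection from a unit whose activation is always 1. Increasing the bias increases the net input to the unit. The output is
$$f(net) = \begin{cases} 1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}$$
A bias b with threshold 0 is therefore equivalent to a threshold θ = -b applied to the weighted sum alone.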
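The equivalence between a threshold and a bias follows from rearranging the inequality: Σ wi xi ≥ θ holds exactly when b + Σ wi xi ≥ 0 with b = -θ. The sketch below checks this exhaustively over small integer weights and thresholds with bipolar inputs; the function names and value ranges are illustrative assumptions.

```python
from itertools import product


def step_threshold(x, w, theta):
    """Bipolar step on the weighted sum with an explicit threshold theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1


def step_bias(x, w, b):
    """Bipolar step with a bias: net = b + sum(w_i x_i), fire when net >= 0."""
    net = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net >= 0 else -1


# Exhaustive check: a threshold theta behaves exactly like a bias b = -theta.
for w1, w2, theta in product(range(-2, 3), repeat=3):
    for x in product((-1, 1), repeat=2):
        assert step_threshold(x, (w1, w2), theta) == step_bias(x, (w1, w2), -theta)

# Increasing the bias raises the net input, so the output can only go from -1 to 1.
print(step_bias((1, -1), (1, 1), b=-0.5), step_bias((1, -1), (1, 1), b=0.5))
```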
Single Layer with a Binary Step Function
Consider a network with 2 inputs and 1 output node (2 classes). The net input of the network is a linear function of the weights and the inputs:
$$net = \mathbf{W} \cdot \mathbf{X} = x_1 w_1 + x_2 w_2, \qquad y = f(net)$$
The equation $x_1 w_1 + x_2 w_2 = 0$ defines a straight line through the input space:
$$x_2 = -\frac{w_1}{w_2} x_1$$
This is a line through the origin with slope $-w_1/w_2$.
Bias (threshold)
What if the line dividing the two classes does not go through the origin?
2.6 Training TLUs
Training methods: there are three kinds of methods for training single-layer networks that do pattern classification.
1. Hebb net: the earliest and simplest learning rule.
2. Perceptron: guaranteed to find the right weights if they exist.
3. ADALINE (uses the delta rule): can easily be generalized to multilayer nets (nonlinear problems).
Hebb Algorithm
Step 0. Initialize all weights: $w_i = 0$ ($i = 1, \dots, n$); the bias also starts at $b = 0$.
Step 1. For each input training vector and target output pair s : t, do Steps 2-4.
Step 2. Set activations for input units: $x_i = s_i$ ($i = 1, \dots, n$).
Step 3. Set activation for output unit: $y = t$.
Step 4. Adjust the weights: $w_i(\text{new}) = w_i(\text{old}) + x_i y$.
        Adjust the bias: $b(\text{new}) = b(\text{old}) + y$.
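A minimal sketch of the Hebb algorithm above, assuming the AND function in bipolar form (inputs and targets in {-1, 1}). The bipolar encoding matters: with binary 0/1 targets, every pattern with t = 0 leaves the weights unchanged. One pass over the four pairs gives w = [2, 2], b = -2. The helper names are hypothetical.

```python
def hebb_train(samples):
    """Hebb rule, one pass: w_i += x_i * y and b += y with y = t, starting from zero."""
    n = len(samples[0][0])
    w = [0] * n          # Step 0
    b = 0
    for s, t in samples:  # Step 1
        x = list(s)       # Step 2: x_i = s_i
        y = t             # Step 3: y = t
        for i in range(n):  # Step 4: weights
            w[i] += x[i] * y
        b += y            # Step 4: bias
    return w, b


def predict(x, w, b):
    """Bipolar step on net = b + sum(w_i x_i)."""
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Bipolar AND (inputs and targets in {-1, 1})
AND_BIPOLAR = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = hebb_train(AND_BIPOLAR)
print(w, b)                                         # [2, 2] -2
print([predict(s, w, b) for s, _ in AND_BIPOLAR])   # [1, -1, -1, -1]
```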
2.7 Perceptron
Rosenblatt introduced the perceptron in 1962. A perceptron consists of a TLU whose inputs come from a set of preprocessing association units.
Perceptron Training
During training, the unit adjusts its weight vector and threshold to obtain values that separate the classes appropriately.
Adjusting the weights: Case 1
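Perceptron Algorithm
Step 0. Initialize all weights and the bias: $w_i = 0$ ($i = 1, \dots, n$), $b = 0$.
        Set the learning rate $\alpha$ ($0 < \alpha \le 1$).
Step 1. While the stopping condition is false, do Steps 2-6.
Step 2. For each training pair s:t, do Steps 3-5.
Step 3. Set activations of input units: $x_i = s_i$ ($i = 1, \dots, n$).
Step 4. Compute the response of the output unit:
$$y_{in} = b + \sum_i x_i w_i$$
$$y = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$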
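Perceptron Algorithm (continued)
Step 5. Update the weights and bias if an error occurred for this pattern, where $\alpha$ is the learning rate:
        if $y \ne t$: $w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$, $b(\text{new}) = b(\text{old}) + \alpha t$
        else: $w_i(\text{new}) = w_i(\text{old})$, $b(\text{new}) = b(\text{old})$
Step 6. Test the stopping condition: if no weights changed in Step 2, stop; otherwise, continue.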
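A minimal sketch of Steps 0-6, assuming α = 1, θ = 0.2, and the bipolar AND data. The `max_epochs` cap is an added safety assumption so the loop always ends even on data that is not linearly separable. On this data the perceptron makes three corrections in the first epoch and stops after an error-free second epoch with w = [1, 1], b = -1.

```python
def perceptron_output(y_in, theta):
    """Step 4 activation: 1 above theta, -1 below -theta, 0 in the undecided band."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.2, max_epochs=100):
    """Perceptron training (Steps 0-6); max_epochs is an added safety cap."""
    n = len(samples[0][0])
    w = [0.0] * n                          # Step 0: weights and bias start at zero
    b = 0.0
    for _ in range(max_epochs):            # Step 1
        changed = False
        for x, t in samples:               # Steps 2-3: x_i = s_i
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4
            y = perceptron_output(y_in, theta)
            if y != t:                     # Step 5: update only on error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                    # Step 6: stop after an error-free pass
            break
    return w, b


AND_BIPOLAR = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_perceptron(AND_BIPOLAR)
print(w, b)  # [1.0, 1.0] -1.0
```

Unlike the Hebb rule, the perceptron only changes weights when a pattern is misclassified, which is why Step 6 can stop as soon as a full pass produces no changes.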
2.9 ADALINE
ADALINE (Adaptive Linear Neuron) uses the delta rule for training. An ADALINE is a special case in which there is only one output unit. The architecture of an ADALINE is a single neuron that receives input from several units.
[Figure: input units x1 ... xn connected to output unit Y by weights w1 ... wn, with bias b]
The equation for adjusting the weights is called a training rule or learning rule, and its step-size parameter is called the learning rate.
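Training Algorithm
Step 0. Initialize weights (small random values). Set the learning rate $\alpha$.
Step 1. While the stopping condition is false, do Steps 2-6.
Step 2. For each bipolar training pair s:t, do Steps 3-5.
Step 3. Set activations of input units, $i = 1, \dots, n$: $x_i = s_i$.
Step 4. Compute the net input to the output unit: $y_{in} = b + \sum_i x_i w_i$.
Step 5. Update the bias and weights, $i = 1, \dots, n$:
        $b(\text{new}) = b(\text{old}) + \alpha (t - y_{in})$
        $w_i(\text{new}) = w_i(\text{old}) + \alpha (t - y_{in}) x_i$
Step 6. Test for the stopping condition (for example, stop when the largest weight change in Step 2 is smaller than a specified tolerance).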
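A minimal sketch of the ADALINE algorithm above on bipolar AND. Assumptions: α = 0.05, initial weights drawn uniformly from [-0.5, 0.5] with a fixed seed, a tolerance-based Step 6, and a `max_epochs` cap. Because the update uses the linear output y_in rather than a thresholded one, the weights head toward the least-squares solution (for this data w1 = w2 = 0.5, b = -0.5); with a fixed α and pattern-by-pattern updates they hover close to it rather than settling exactly, so in practice the epoch cap ends training here. A bipolar step is applied only afterwards, to classify.

```python
import random


def train_adaline(samples, alpha=0.05, tol=1e-3, max_epochs=200, seed=0):
    """ADALINE / delta-rule training (Steps 0-6) for a single output unit."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # Step 0: small random weights
    b = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                     # Step 1
        largest_change = 0.0
        for x, t in samples:                        # Steps 2-3
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4
            err = t - y_in
            b += alpha * err                        # Step 5: bias
            largest_change = max(largest_change, abs(alpha * err))
            for i in range(n):                      # Step 5: weights
                delta = alpha * err * x[i]
                w[i] += delta
                largest_change = max(largest_change, abs(delta))
        if largest_change < tol:                    # Step 6
            break
    return w, b


def adaline_predict(x, w, b):
    """Bipolar step applied to the trained linear output."""
    return 1 if b + sum(xi * wi for xi, wi in zip(x, w)) >= 0 else -1


AND_BIPOLAR = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_adaline(AND_BIPOLAR)
print([round(v, 2) for v in w], round(b, 2))
print([adaline_predict(x, w, b) for x, _ in AND_BIPOLAR])
```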
2.10 Delta rule: minimizing an error
2.10.1 Hebb's learning law
In the linear associator network, the output vector y' is derived from the input vector x by means of the formula
$$\mathbf{y}' = W \mathbf{x}$$
where W = (w_ij) is the m × n weight matrix.
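The formula y' = Wx can be tried out together with a common form of Hebbian storage for the linear associator, W = Σ y xᵀ (each w_ij accumulates y_i x_j). This sketch assumes that outer-product form and uses two orthonormal input vectors, a case where recall is exact; the vectors and helper names are illustrative. With non-orthogonal inputs the stored pairs interfere, which is part of the motivation for the error-minimizing delta rule.

```python
def matvec(W, x):
    """y' = W x for an m x n matrix W (list of m rows) and an n-vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def hebb_outer_product(pairs, m, n):
    """Hebbian storage: W = sum over (x, y) pairs of y x^T, i.e. w_ij += y_i * x_j."""
    W = [[0] * n for _ in range(m)]
    for x, y in pairs:
        for i in range(m):
            for j in range(n):
                W[i][j] += y[i] * x[j]
    return W


# Two orthonormal 2-D inputs, each associated with a 3-D output.
pairs = [((1, 0), (1, -1, 1)), ((0, 1), (-1, 1, 1))]
W = hebb_outer_product(pairs, m=3, n=2)
print(W)                   # [[1, -1], [-1, 1], [1, 1]]
print(matvec(W, (1, 0)))   # [1, -1, 1]
```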
Delta Rule: Training by Gradient Descent Revisited
The delta rule is also known as the Widrow-Hoff rule or the Least Mean Squares (LMS) rule. To train the network, we adjust the weights in the network so as to decrease the cost (this is where we require differentiability). This is called gradient descent.
2.10.3 Gradient descent on error
The early ADALINE (ADAptive LInear NEuron) model of Widrow and Hoff is discussed here as a simple type of processing element. The Widrow-Hoff learning law, applied to minimizing the error, is the delta rule.
The delta rule computes the error produced by each training pattern and treats that error as a function of the weights, so that training becomes gradient descent on the error. In the equation, the error $E_p$ is a function of the weights for a single input pattern p, so the total error E over the training set is
$$E = \sum_p E_p$$
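The per-pattern error formula is not shown on the slide, so this sketch assumes the usual squared-error form $E_p = \frac{1}{2}(t_p - y_{in,p})^2$ with $y_{in} = b + \sum_i w_i x_i$. Under that assumption the gradient is $\partial E / \partial w_i = -\sum_p (t_p - y_{in,p})\, x_{i,p}$, and stepping against it gives the $\alpha (t - y_{in}) x_i$ update used by ADALINE. The code checks the analytic gradient against central finite differences and confirms it vanishes at the least-squares weights for bipolar AND (w1 = w2 = 0.5, b = -0.5). All function names are hypothetical.

```python
def total_error(w, b, samples):
    """E = sum_p E_p, assuming E_p = 0.5 * (t_p - y_in_p) ** 2."""
    E = 0.0
    for x, t in samples:
        y_in = b + sum(wi * xi for wi, xi in zip(w, x))
        E += 0.5 * (t - y_in) ** 2
    return E


def gradient(w, b, samples):
    """Analytic gradient: dE/dw_i = -sum_p (t_p - y_in_p) x_ip, dE/db = -sum_p (t_p - y_in_p)."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, t in samples:
        err = t - (b + sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            gw[i] -= err * xi
        gb -= err
    return gw, gb


def numeric_gradient(w, b, samples, h=1e-6):
    """Central differences, used only to check the analytic gradient."""
    gw = []
    for i in range(len(w)):
        wp = list(w)
        wp[i] += h
        wm = list(w)
        wm[i] -= h
        gw.append((total_error(wp, b, samples) - total_error(wm, b, samples)) / (2 * h))
    gb = (total_error(w, b + h, samples) - total_error(w, b - h, samples)) / (2 * h)
    return gw, gb


samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(gradient([0.3, -0.2], 0.1, samples))
print(numeric_gradient([0.3, -0.2], 0.1, samples))
print(gradient([0.5, 0.5], -0.5, samples))  # zero gradient at the minimum
```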
The learning algorithm terminates once we are at, or sufficiently near to, the minimum of the error function, where dE/dw = 0. We then say that the algorithm has converged.
An important consideration is the learning rate µ, which determines how much we change the weights w at each step. If µ is too small, the algorithm will take a long time to converge.
Conversely, if µ is too large, we may end up bouncing around the error surface out of control: the algorithm diverges. This usually ends with an overflow error in the computer's floating-point arithmetic.
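The effect of µ is easy to see on a one-weight error surface. This sketch assumes the hypothetical quadratic E(w) = (w - 3)², whose minimum is at w = 3, and applies w ← w - µ dE/dw with dE/dw = 2(w - 3). Each step multiplies the distance to the minimum by |1 - 2µ|, so µ = 0.01 is still far from 3 after 50 steps, µ = 0.4 converges quickly, and µ = 1.1 makes the distance grow by a factor of 1.2 per step. In Python, repeated float multiplication that overflows yields `inf` rather than raising an exception, so a diverging run shows up as huge or `inf` values instead of an error.

```python
def gradient_descent(mu, w0=0.0, steps=50):
    """Minimise E(w) = (w - 3)**2 with w <- w - mu * dE/dw, where dE/dw = 2 * (w - 3)."""
    w = w0
    for _ in range(steps):
        w -= mu * 2 * (w - 3)
    return w


for mu in (0.01, 0.4, 1.1):  # too small, reasonable, too large
    print(mu, gradient_descent(mu))
```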