Warning: You should complete lab 1 before attempting lab 2, and you should have finished lab 2 before dealing with coursework 1.

PART 1: DIFFERENTIATION.

Definition: Consider a function f(x), a point x0 on the x axis and any other point x close to x0. The first derivative of the function f at the point x0 is defined as:

f'(x0) = lim_{x -> x0} [f(x) - f(x0)] / (x - x0)

where the symbol lim_{x -> x0} means that the point x is taken closer and closer to x0. In practice we never use this definition to calculate the derivative of a function. Instead we learn the derivatives of some basic functions and the derivative rules, and combine them to calculate the derivatives of more complicated functions. The derivatives of the basic functions have, of course, been calculated using the definition of the derivative. Although you might never use this definition yourself, you will probably find it useful to keep in mind what the derivative is all about.

First of all, the derivative of a function is another function, which gives us important information about the original function. For example, if your function is increasing around a certain point x0, then for x slightly greater than x0 we have f(x) > f(x0), so the numerator is positive. The denominator x - x0 is also positive, which means that the derivative at x0 is a positive number. This number tells us how much the function changes when we make a small change in x around x0.

When we refer to the derivative of a one-variable function f(x) in general we use the symbols df/dx, df(x)/dx or f'(x), but when we discuss the value of the derivative at a particular point x0 we use the symbol df(x0)/dx or f'(x0). Note that x0 can be any number at which the original function is defined. Another symbol often used to describe the difference between two values of a variable is the Greek capital letter delta (Δ, the equivalent of D): for example Δx = x - x0.

Examples: Assume that you have to deal with the very simple function f(x) = x. Then

f'(x0) = lim_{x -> x0} (x - x0) / (x - x0) = 1
The first derivative at any point x0 is 1, which means the function increases at the same rate everywhere. Another example is the first derivative of f(x) = x^2. In this case f'(x) = 2x. The derivative at x = 2 is f'(2) = 2*2 = 4, which means that near the point x = 2 the function f(x) = x^2 is increasing. The derivative df/dx = 4 is equal to the slope of the tangent line to the function at x = 2. If we calculate the derivative at x = -2 we get f'(-2) = 2*(-2) = -4, which shows that the function is decreasing around x = -2.
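As an illustration (not part of the original exercises), the limit definition can be checked numerically in Matlab by taking x closer and closer to x0; the function handle, the point x0 and the step sizes below are my own choices:

```matlab
% Numerical check of the limit definition of the derivative (illustrative sketch).
% The function f, the point x0 and the step sizes h are chosen for demonstration only.
f  = @(x) x.^2;          % the example function f(x) = x^2
x0 = 2;                  % point where we want the derivative
for h = [1 0.1 0.01 0.001]
    x = x0 + h;                          % a point close to x0
    approx = (f(x) - f(x0)) / (x - x0);  % [f(x) - f(x0)] / (x - x0)
    fprintf('h = %6.3f   approximation = %8.5f\n', h, approx);
end
% The approximations approach f'(2) = 4 as x gets closer to x0.
```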
First derivative of some basic functions.
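Some of the most commonly used results (standard calculus facts, listed here for reference) are:

f(x) = c (a constant):  f'(x) = 0
f(x) = x^n:             f'(x) = n*x^(n-1)
f(x) = e^x:             f'(x) = e^x
f(x) = ln(x):           f'(x) = 1/x
f(x) = sin(x):          f'(x) = cos(x)
f(x) = cos(x):          f'(x) = -sin(x)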
Derivative rules.
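The standard rules (again listed for reference), for functions f(x), g(x) and a constant c, are:

(c*f(x))'       = c*f'(x)
(f(x) + g(x))'  = f'(x) + g'(x)
(f(x)*g(x))'    = f'(x)*g(x) + f(x)*g'(x)
(f(x)/g(x))'    = [f'(x)*g(x) - f(x)*g'(x)] / g(x)^2
(f(g(x)))'      = f'(g(x)) * g'(x)        (the chain rule, used again in Part 2)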
PART 2: PARTIAL DIFFERENTIATION & DELTA RULE.

Partial Differentiation.

In many cases we have to deal with functions that have more than one variable. An example is the function f(x,y) = x^2 + 5y + 6. In these cases, similarly to the one-variable case, we can investigate the effect of one of the variables while we keep all the others at a fixed value. The same rules apply as before, but we call the process partial differentiation and the resulting function a partial derivative. We also use a slightly different symbol, the "curly d" ∂ instead of d. The partial derivative of the function f(x,y) with respect to x (treating y as a constant value, e.g. y = 1) is:

∂f/∂x = 2x

since the derivative of a constant (here 5y + 6) is 0. The partial derivative of the function f(x,y) with respect to y (treating x as a constant value) is:

∂f/∂y = 5
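As a quick check (my own example, not part of the handout), these partial derivatives can be approximated numerically by changing one variable at a time; the evaluation point and step size below are arbitrary:

```matlab
% Numerical check of the partial derivatives of f(x,y) = x^2 + 5*y + 6 (illustrative sketch).
% The evaluation point (x0,y0) and the step h are chosen for demonstration only.
f  = @(x,y) x.^2 + 5*y + 6;
x0 = 3;  y0 = 1;  h = 1e-6;
dfdx = (f(x0+h, y0) - f(x0, y0)) / h;   % should be close to 2*x0 = 6
dfdy = (f(x0, y0+h) - f(x0, y0)) / h;   % should be close to 5
fprintf('df/dx ~ %.4f   df/dy ~ %.4f\n', dfdx, dfdy);
```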
Linear delta rule.

Whatever we have said so far about differentiation can be applied to neural networks. Consider a single neuron unit that receives an input x, multiplies it by a weight w and produces an output y. We happen to know, given the input x, what the desired output t (the target) should be, but we do not know the appropriate weight. What we want is a general method to calculate the weight value, in other words to train the network. This method should be general enough to work with more complicated structures, but here we deal with a very simple one. First we consider the linear case, where the output of the network y equals the netinput a of the node, so y = a = w*x. What we are looking for is a way to update the value of the weight so as to make the error of the output as small as possible. One expression for the error of such a network is the following:

E = (y - t)^2

This is quite a convenient expression because it has some useful properties. 1) The bigger the difference between our output and the target, the bigger the error E. 2) Because of the squaring, it makes no difference whether our output is, for example, 10 units bigger or 10 units smaller than the target: both mistakes contribute equally to the error. 3) It is easy to handle mathematically. The error E depends on the output y, which depends on the weight since y = a = w*x; therefore the error is a function of the weight. We want to modify the weight in a way that decreases the error. From what we discussed before, differentiation is the appropriate tool: it tells us how a function changes when we make a small change to one of its variables. So we have to differentiate the error with respect to the weight. This calculation will eventually lead us to the following linear delta rule:

Δw = w_new - w_old = -2α(y - t)x

where α is a small positive constant, the learning rate.
For more than one input to our single unit, e.g. x1, x2, ..., xn with corresponding weights w1, w2, ..., wn (so that y = a = w1*x1 + w2*x2 + ... + wn*xn), we use one equation for each weight. Delta rule for the first weight:

Δw1 = -2α(y - t)x1

Delta rule for the second weight:

Δw2 = -2α(y - t)x2

etc. We prefer to write these expressions in a general way:

Δwi = -2α(y - t)xi

where i is an index taking the values 1, 2, 3, ... depending on the number of inputs. If you are interested in how the linear delta rule has been derived, read the following section; otherwise go to exercise 6.

The calculation is much easier if we use the chain rule:

dE/dw = (dE/dy)*(dy/dw)    (1)
dE/dy = 2(y - t)           (2)
dy/dw = x                  (3)

The combination of these three equations (1, 2, 3) leads to:

dE/dw = 2(y - t)x          (4)

If you recall the definition of differentiation, for small changes of the weight we can write equation 4 in the following form:

(E - E0) / (w - w0) ≈ 2(y - t)x    (5)

Assume that the weight of our network has the value w0 and the error due to this weight is E0. We wish to change the weight w0 to a new value w, which will be close to w0 but will result in an error E smaller than the error E0. We would like to ensure that E < E0, which means E - E0 < 0. Rewriting the previous equation as E - E0 = 2(y - t)*x*(w - w0), we want to pick a value for (w - w0) that makes the right-hand side negative, regardless of the values of y, t and x. Take:

w - w0 = -2α(y - t)x    (6)

where α is a small positive number. Then E - E0 = -4α[(y - t)x]^2,
which will be negative in any case, since α is positive and any number squared is non-negative. Choosing the new w according to equation (6), we can therefore be sure that the error will decrease, provided we decide on a small value for α.

In exercise 12 of the previous lab you designed a simple neural network that implements the logical AND function. However, you used predefined values for the weights. Now you can redesign the same network with a training process: given the inputs and the targets, your network should compute appropriate values for the weights. A rough description of the process follows (a sketch of such a loop in Matlab is given after the list):

i) Give an initial value to the weights. Usually we randomize the weight values using a random number generator; you may use the Matlab function rand.

ii) Calculate the output of the network using the first of the input sets (commonly called the training data), calculate the error and update the weights according to the linear learning rule. Then go to the second input set and repeat the same process. When you have gone through every input set, you have completed an epoch. At the end of the epoch, calculate the average error over all input sets.

iii) Repeat ii) for a specific number of epochs. Remember to reset the average error to zero at the start of each epoch, since it is meant to measure the average error of one particular epoch.
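The following Matlab sketch shows one possible way to organise such a training loop for the AND data. It is only an illustration of the steps above, not the required solution: the learning rate alpha, the number of epochs, the extra bias input and the variable names are all my own choices.

```matlab
% Illustrative sketch: training a single linear unit on the AND data
% with the linear delta rule. alpha, nEpochs, the bias input and all
% variable names are assumptions made for this example.
X = [0 0; 0 1; 1 0; 1 1];      % the four input sets (training data)
t = [0; 0; 0; 1];              % targets for logical AND
X = [X ones(4,1)];             % extra constant input of 1 acting as a bias (assumption)

w       = rand(1, 3);          % i) random initial weights (one per input, plus bias)
alpha   = 0.1;                 % learning rate (chosen arbitrarily)
nEpochs = 100;

for epoch = 1:nEpochs
    avgError = 0;                          % reset the average error each epoch
    for p = 1:size(X, 1)                   % ii) go through each input set in turn
        y = w * X(p, :)';                  % linear unit: output = netinput = sum of w_i*x_i
        E = (y - t(p))^2;                  % error for this input set
        w = w - 2 * alpha * (y - t(p)) * X(p, :);   % linear delta rule for every weight
        avgError = avgError + E;
    end
    avgError = avgError / size(X, 1);      % average error over the epoch
end
avgError                                    % display the final average error
```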
Non-linear delta rule.

Quite often though, as we mentioned in lab 1, the activation (netinput) is not equal to the network output; instead it is passed through a function whose output is the network output. In many cases this function is the sigmoid function, y = 1/(1 + e^(-a)). In such cases we can develop a non-linear learning rule, similar to the linear case.
The delta rule that results when the activation of the neural unit is passed through a sigmoid function is:

Δwi = -2α(y - t) y(1 - y) xi

where y(1 - y) is the derivative of the sigmoid with respect to the netinput a. Note that we may omit the 2 from both learning rules by choosing a suitable value for the learning rate α.
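For comparison with the linear sketch above, here is how a single update step could look for a sigmoid unit. Again this is only an illustration: the logistic sigmoid is assumed, and the input, target, weights and learning rate below are example values of my own.

```matlab
% Illustrative single update step for a sigmoid unit (logistic sigmoid assumed).
% The input x, target t, weights w and learning rate alpha are example values.
x     = [1 0 1];                   % one input set (last element acting as a bias input)
t     = 0;                         % target for this input set
w     = rand(1, 3);                % current weights
alpha = 0.1;                       % learning rate (chosen arbitrarily)

a = w * x';                        % netinput
y = 1 / (1 + exp(-a));             % sigmoid output
w = w - 2 * alpha * (y - t) * y * (1 - y) * x;   % non-linear delta rule for every weight
```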
© April 2001 h.vassilakis