Machine Learning - Lecture 14 Knowledge test
Chris Thornton
Question
In the normal arrangement for a multi-layer perceptron
(MLP), each hidden unit takes input from every input unit,
and each output unit takes input from every hidden unit.
Let's say an MLP with this architecture has three input
units, three hidden units and two output units. Assuming
unit bias is implemented as a weight, how many numbers do
we need to specify the weights of the network? Is it (a)
5, (b) 8, (c) 9, (d) 15, (e) 18, (f) 20, (g) 24, (h) 28
or (i) 32?
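A quick way to check a count of this kind is to enumerate the weights layer by layer, treating each non-input unit's bias as one extra incoming weight (a minimal sketch; the function name and the layer-size list are illustrative, with the sizes taken from the question):

```python
def mlp_weight_count(layer_sizes):
    """Number of weights in a fully connected MLP, counting
    each non-input unit's bias as one additional weight."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        # each unit in the next layer has fan_in weights plus a bias weight
        total += (fan_in + 1) * fan_out
    return total

# three input units, three hidden units, two output units
mlp_weight_count([3, 3, 2])
```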
Question
While training an MLP, it is noticed that training error
is swinging wildly between low and high values. Which of
the following would be plausible remedies? (a) decrease
the learning rate, (b) increase the learning rate, (c)
decrease momentum, (d) increase momentum, (e) decrease
learning rate but increase momentum, (f) increase learning
rate but decrease momentum.
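The two parameters in question appear in the standard momentum update rule, which can help in reasoning about the oscillation; a minimal sketch (the function and parameter names, and the toy objective, are illustrative rather than the lecture's notation):

```python
def momentum_step(w, velocity, grad, lr=0.1, momentum=0.9):
    """One gradient-descent step with momentum.
    lr scales the current gradient; momentum reuses the previous step."""
    velocity = momentum * velocity - lr * grad(w)
    return w + velocity, velocity

# minimising f(w) = w^2, whose gradient is 2w
w, v = 5.0, 0.0
for _ in range(50):
    w, v = momentum_step(w, v, lambda w: 2 * w, lr=0.1, momentum=0.5)
```

Experimenting with larger values of `lr` on this toy objective reproduces the swinging behaviour described above.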
Question
When using a reference point to define a linear boundary,
we normally have to set a threshold value. The boundary
then separates off all datapoints whose inner product with
the reference point is above the threshold. In the MLP
method, multiple reference points are combined to produce
a curving boundary. But no thresholds are involved. What
then defines the difference between one side of the
boundary and the other?
Question
Cross-validation applied to a fully trained-up MLP reveals
generalization error to be considerably worse than training
error. It seems over-fitting has occurred. The decision
is made to re-run the learning using a stronger bias.
Which of the following changes might produce the desired
effect? (a) increase the learning rate, (b) increase the
number of hidden units, (c) decrease the number of hidden
units, (d) increase the activation of the bias unit, (e)
decrease the activation of the bias unit, (f) increase the
number of training examples?
Question
Let's say an MLP is restricted so that unit activation is
always thresholded at the 0.5 level. This means unit
activation is always either precisely 0 or precisely 1.
Does this have the effect of (a) strengthening the bias of
the learning method, (b) weakening the bias of the method,
(c) leaving the bias unchanged?
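The restriction described can be written as a hard cut on the usual sigmoid activation (a sketch under that assumption; the function names are illustrative):

```python
import math

def sigmoid(x):
    """Standard logistic activation, ranging smoothly over (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def thresholded_activation(x):
    """Activation forced to be exactly 0 or 1, cut at the 0.5 level."""
    return 1.0 if sigmoid(x) >= 0.5 else 0.0
```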
Question
A certain MLP is made up of a number of input, hidden and
output units. We would like to implement unit bias. Which of
the units in the network can benefit from having a
bias value? How is this bias value most easily implemented?
Question
How can the problem of over-fitting arise in the case of
k-means clustering?
Question
How does the problem of over-fitting relate to the
problem of lookup tables?
Question
What is left open in the stopping condition for
delta-rule error-correction? How could we formulate a
specific condition for a particular domain?
Question
Is it possible to achieve delta-rule error-correction
through subtraction rather than addition of error values?
How would this be done?
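For experimenting with such variations, the basic delta-rule update for a single linear unit can be sketched as follows (the error-as-(target minus output) convention and the names here are assumptions, not the lecture's notation):

```python
def delta_rule_step(weights, inputs, target, lr=0.1):
    """One delta-rule update for a single linear unit.
    The error (target - output), scaled by the learning rate and the
    input, is added to each weight; writing the error the other way
    round, as (output - target), and subtracting it is equivalent."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]
```

In practice the update is repeated until the error falls below some chosen level, which is exactly the threshold the previous question asks about.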