This teach file provides an introduction to vectors as geometrical objects.
In FCS3, the term vector was used to mean, in effect, a matrix with a single column. In other words, a vector was a collection of numbers in a definite order. A vector x was said to have elements x[1], x[2], etc., up to x[N], and the general element was denoted by x[i]. That teach file set out the conventions for adding vectors together, multiplying a vector by a scalar, and multiplying a vector by a matrix. One significant application of the last of these operations was given: matrix-vector multiplication is another way of writing down what a single-layer linear neural network does. Vectors were used to represent the inputs and outputs of the network by single symbols.
Vectors have, however, many other uses. One particularly important class of uses is in geometry, when vectors are used to represent spatial relationships and operations. Applications of this can be found in many areas, but analysis of the perceptual and motor systems of autonomous agents benefits particularly strongly from vector geometry.
Here, we look at the idea of vectors as geometrical objects in general, before outlining two significant applications in perception and motor control in the next teach file. As with FCS1 and FCS2, those already familiar with the main ideas of vectors might skip straight away to FCS5.
The idea of representing a point in space using coordinates is probably a familiar one. The coordinates of a point can be assembled into a single computational structure, which is then a vector. Using the matrix notation of FCS2, the correspondence between some points in the 2-D plane and their vector representation works like this:
    [Diagram: the 2-D plane, with the horizontal axis running from -4 to 4. Points are marked with "+" at (1,2), (-4,1), (3,-1), (-3,-2) and (2,-4), each with its column vector written beside it; the point (1,2) is labelled a and the point (2,-4) is labelled b.]
The points themselves are marked with the "+" symbol, with the corresponding vectors written using matrix notation beside them. I have also given two of them names, a and b.
It is straightforward to extend this idea to 3 dimensions.
I have not given names to the axes in the diagram. The convention generally used is that the first component of the vector represents the coordinate along the horizontal axis, which in turn is often called the x-axis. The second component represents the coordinate along the vertical or y-axis. With these names for axes, the components of the vector a might be called a[x] and a[y] instead of a[1] and a[2] as we have been doing so far. If we wish to retain numerical subscripts, it might be more sensible to refer to the axes as the 1-axis and the 2-axis.
In fact, both conventions are in use. For geometrical work in 2 and 3 dimensions, x, y and z subscripts are common, and the axes are labelled with these letters. However, for more abstract uses of vectors (such as to represent quantities in a neural network), numerical subscripts tend to be used, and if pictures are needed the axes might be designated by numbers too. Here, I am going to stick to numerical subscripts because it makes generalisation to abstract uses easier, fits in with the notation of the earlier teach file, and corresponds to what happens in practice when you represent vectors by data structures in a computer program.
Sometimes a vector is drawn using an arrow. For position vectors like a and b above, the arrow would be drawn starting from the origin (the intersection of the axes), with its tip at the point in question. This is simply an alternative convention for indicating a position. A vector does not intrinsically have a "start" and a "finish".
This use of vectors to represent positions in space is a paradigm for all their other applications. However, vectors are used to represent many other kinds of things - both physical quantities such as velocities, which are measured in ordinary space-time, and more abstract things such as the state of a network, which are measured in a higher dimensional space.
The fundamental mathematical concept of a vector is actually more general and abstract than this. In practice, however, the crucial ideas are that position in space is exactly the kind of thing that vectors can represent, and that a vector can in turn be represented by a column of numbers.
From FCS3, section 5, you should be able to see that for the two named vectors in the diagram,

            - -     -  -     -  -
    a + b = |1|  +  | 2|  =  | 3|
            |2|     |-4|     |-2|
            - -     -  -     -  -

because adding the vectors means, by definition, adding their components individually.
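If you want to try this kind of calculation in a program, here is a minimal sketch in Python, representing a vector simply as a list of numbers (the name vec_add and the list representation are just illustrative choices, not FCS conventions):

    def vec_add(x, y):
        # Add two vectors of the same dimension, component by component.
        return [xi + yi for xi, yi in zip(x, y)]

    a = [1, 2]
    b = [2, -4]
    print(vec_add(a, b))    # prints [3, -2]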
There is a simple geometrical interpretation of the operation. First draw the diagram above with arrows or lines from the origin to each of the points:
    [Diagram: the same axes, with a straight line drawn from the origin to the point a at (1,2) and another from the origin to the point b at (2,-4).]
Then move a copy of one of the lines so that it starts from the end of the other line, without changing its length or orientation. Here we move a copy of b so that it starts from the end of a:
    [Diagram: as before, but with a shifted copy of the line from the origin to b drawn starting from the tip of a; the far end of the copy is marked c, at (3,-2), with the note c = a + b.]
The new position, marked c, obtained by this graphical operation, is represented by the arithmetical operation of adding the vectors by components. The same result would have been obtained if a copy of a's arrow had been tacked onto the end of b's. This result is general: the formal operation of adding vectors by components corresponds to the geometrical operation of joining their arrows end to end.
A classical application of this is in navigation. Adding a plane's velocity through the air to the wind velocity gives the plane's velocity over the ground. Velocities are not position vectors; but they do add like position vectors and so the calculation, done either numerically on the components or graphically, gives the right answer. A more esoteric application is in the superposition of wave functions in quantum mechanics; later we will see an application to robot navigation.
Subtracting one vector from another is done by components, like addition. You should be able to figure out its geometrical meaning; bear in mind that in the diagram above a = c - b.
In FCS3 I said that multiplying a vector by a scalar (an ordinary number) meant multiplying each element by that number. Multiplying a vector by a positive scalar k corresponds in geometrical terms to making the arrow k times as long but giving it the same direction. If you think about 2a = a + a, and draw a diagram like that above, this should become reasonably obvious.
Multiplying a vector by -1 makes the arrow point in the opposite direction (or puts the point it represents on the opposite side of the origin).
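Both of these scalings are easy to try out numerically. A minimal Python sketch, again using plain lists (vec_scale is just an illustrative name):

    def vec_scale(k, x):
        # Multiply every component of x by the scalar k.
        return [k * xi for xi in x]

    a = [1, 2]
    print(vec_scale(2, a))     # [2, 4]: twice as long, same direction
    print(vec_scale(-1, a))    # [-1, -2]: same length, opposite direction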
How far is the point indicated by the vector a from the origin? This is easy to answer from the diagram in which a was first defined:
    [Diagram: the line from the origin to a, shown as the hypotenuse of a right-angled triangle with a horizontal side of length 1 and a vertical side of length 2.]
Applying Pythagoras' theorem to the triangle gives the length of the hypotenuse (the sloping side) as sqrt(1^2 + 2^2), or sqrt(5) (using ^2 to mean taking the square, and sqrt to mean taking the square root). This is the distance required. In terms of the components of a, the formula for the distance is
    sqrt( a[1]^2 + a[2]^2 )

using the same square-bracket subscripts and ^ for exponents as before.
If a is thought of as representing the line from the origin to the point, then this formula gives the length of that line, and so it is called the length of the vector. The formula generalises to many dimensions. If y is an N-dimensional vector, then its length is given by
    sqrt( SIGMA from i = 1 to N of y[i]^2 )
This is more precisely called the Euclidean norm of the vector. It is sometimes written as |y| or ||y||; occasionally also y (in light type) is used to stand for the norm of the vector y (in bold type).
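If you want to compute the norm in a program, here is a minimal Python sketch along these lines (norm is just an illustrative name; only the standard math module is used):

    import math

    def norm(y):
        # Euclidean norm: the square root of the sum of the squared components.
        return math.sqrt(sum(yi * yi for yi in y))

    a = [1, 2]
    print(norm(a))    # 2.2360..., i.e. sqrt(5)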
How far apart are the points marked a and b in the diagram above? Again, one can draw a right-angled triangle and apply Pythagoras' theorem. In this case, the picture is:
    [Diagram: the points a at (1,2) and b at (2,-4), with the two perpendicular sides of a right-angled triangle drawn between them: a vertical side of length 6 and a horizontal side of length 1.]
where I have drawn in two sides of the triangle - the hypotenuse is the line from point a to point b. The distance between them is thus sqrt(1^2 + 6^2) = sqrt(37). The general formula for 2-D vectors is
    sqrt( (b[1] - a[1])^2 + (b[2] - a[2])^2 )
Note that the components are subtracted first, then the differences are squared.
This is a particular case for the distance between two points represented by vectors. The general formula for the distance between x and y is just ||x - y||. Putting the definition of the norm together with the procedure for subtraction should allow this to make sense. Now, x and y can be in a space with any number of dimensions.
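A possible Python sketch of this distance calculation, again treating vectors as plain lists (distance is an illustrative name):

    import math

    def distance(x, y):
        # Euclidean distance: the norm of the difference vector x - y.
        diff = [xi - yi for xi, yi in zip(x, y)]
        return math.sqrt(sum(d * d for d in diff))

    a = [1, 2]
    b = [2, -4]
    print(distance(a, b))    # 6.0827..., i.e. sqrt(37)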
The Euclidean norm has a nice geometrical meaning in 2-D and 3-D, but you have already met it in a different context. Look at the error function for a neural network in section 4.1 of FCS2. The error E is the square of the Euclidean distance between the output vector y and the target vector t.
In FCS3 we looked at what matrix-vector multiplication meant in terms of computations on the components, and we noted the relationship of this to the operation of a linear neural network. There is also a geometrical way to look at this, though it does not yield a single simple picture. The general idea is that the multiplication leads to a geometrical transformation of one vector into another, and the nature of the transformation can be related to properties of the matrix.
I will not give a comprehensive treatment of this, but will give examples of two important special cases that illustrate the idea.
First, consider multiplying a by a matrix that only has non-zero elements on its top-left to bottom-right diagonal (called a diagonal matrix). Since a has only two components, and we want the result to be another vector like a, the matrix has to have 2 rows and 2 columns. It looks like this:
    -    -  -    -     -        -
    |p  0|  |a[1]|     |p * a[1]|
    |0  q|  |a[2]|  =  |q * a[2]|
    -    -  -    -     -        -
(You should be able to verify this equation by applying the matrix-vector multiplication rule from FCS3.) Here p and q are ordinary numbers.
What does this do to a? For a start, putting p = q is the same as multiplying a by a scalar, so this just stretches a out by a factor p. If p and q are different, then, roughly speaking, a is stretched out along the 1-axis by a factor p and along the 2-axis by a factor q. If it is not clear what this means, try plotting the result for the vector a (with components (1, 2)) and various different values for p and q. The numbers p and q are sometimes called expansion factors, though if they are less than 1 they cause contraction rather than expansion.
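If you want to experiment with different values of p and q, here is a minimal Python sketch of 2-by-2 matrix-vector multiplication (matvec2 and the list-of-rows representation are just illustrative choices):

    def matvec2(m, v):
        # Multiply a 2x2 matrix (a list of two rows) by a 2-component vector.
        return [m[0][0] * v[0] + m[0][1] * v[1],
                m[1][0] * v[0] + m[1][1] * v[1]]

    a = [1, 2]
    p, q = 3, 0.5
    diag = [[p, 0],
            [0, q]]
    print(matvec2(diag, a))    # [3, 1.0]: stretched along the 1-axis, squashed along the 2-axis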
Suppose a shape is represented by a number of points on its periphery, and each of these is represented by a vector. Multiplying each vector by the same matrix will produce a new shape. For a diagonal matrix, the shape may be expanded or contracted, and it might be squeezed up or stretched out more along one axis or another. Thus diagonal matrices can make transformations with effects like this:
    [Diagram: a rectangle transformed into a selection of others - wider, taller, or smaller - illustrating changes of scale and of aspect ratio.]
If p = q the shape is simply expanded or contracted, but otherwise its aspect ratio (the ratio of its width to its height) changes.
Above, I wrote the components of a as (1,2). Strictly, I should have written them in a column with square brackets, but it gets tedious laying those out. I will sometimes therefore write the components in the text line in round brackets; this should not cause any confusion, and in any case the convention is often used in books.
Now consider multiplying the vector a by the following specific matrix:
    -          -
    |0.7   -0.7|
    |0.7    0.7|
    -          -
For a with components (1, 2), you should be able to verify that the multiplication gives a vector with components (-0.7, 2.1). Multiplying this matrix by the other vectors in the initial diagram gives the following transformations (remember, these ought to be written as column vectors, but I'm being lazy):
    (-4,  1)  ->  (-3.5, -2.1)
    (-3, -2)  ->  (-0.7, -3.5)
    ( 3, -1)  ->  ( 2.8,  1.4)
    ( 2, -4)  ->  ( 4.2, -1.4)
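These numbers are easy to check with a few lines of Python (the variable names are just illustrative; the printed values agree with the list above up to rounding):

    m = [[0.7, -0.7],
         [0.7,  0.7]]
    points = [[-4, 1], [-3, -2], [3, -1], [2, -4]]
    for v in points:
        image = [m[0][0] * v[0] + m[0][1] * v[1],
                 m[1][0] * v[0] + m[1][1] * v[1]]
        print(v, "->", image)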
Plotting these transformations on the original diagram has this effect, marking the original points by lower case letters and the new points by the corresponding upper case letters:
    [Diagram: the original points, labelled a to e in lower case, together with their transformed images, labelled A to E in upper case; each image lies at about the same distance from the origin as its original but roughly 45 degrees anticlockwise from it.]
If you compare each original point with its image, you will see that they have all been rotated about the origin by approximately 45 degrees. This suggests that a matrix can effectively act to rotate vectors. If the set of vectors defined a shape amongst them, then this shape would get rotated by the matrix multiplication.
In fact the matrix I have used to demonstrate this is only approximately a pure rotation matrix; if you plotted out the results really accurately you'd detect a little contraction as well. To construct a pure rotation matrix for 2 dimensions, you choose an angle you want to rotate through (call it theta), then get the matrix element values using
    -                       -
    |cos(theta)  -sin(theta)|
    |sin(theta)   cos(theta)|
    -                       -
This produces an anticlockwise rotation of the vectors if theta is positive. If theta = 45 degrees, then cos(theta) and sin(theta) are both approximately 0.7, which is why the example above gave the results that it did. A matrix like this produces no expansion or contraction or change in aspect ratio. Such matrices are useful in many areas; one currently very important use is in computer graphics, where objects represented by sets of position vectors must often be rotated for display from different viewpoints.
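A possible Python sketch for constructing such a matrix (rotation_matrix is an illustrative name; note that math.cos and math.sin expect the angle in radians):

    import math

    def rotation_matrix(theta):
        # 2-D anticlockwise rotation through theta radians.
        return [[math.cos(theta), -math.sin(theta)],
                [math.sin(theta),  math.cos(theta)]]

    m = rotation_matrix(math.radians(45))
    print(m)    # every entry is about +/-0.707, close to the 0.7 used above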
Rotation matrices generalise to 3-D and higher dimensions.
In general, a transformation produced by a matrix multiplication can be broken down into a rotation through some angle, followed by multiplication by a diagonal matrix, followed by another rotation. This turns out to be the geometrical interpretation of the singular value decomposition mentioned at the end of FCS3. And if a matrix cannot be inverted, it means that it transforms more than one input vector into the same output vector, so that there is no way of going backwards unambiguously.
The multiplication of a vector by a matrix is a linear transformation. What this means is that if you transform two vectors separately, and add the results together, you get the same answer as if you add the two vectors together first, and then transform the sum. In symbols:
M * (x + y) = M * x + M * y
The mathematics of transformations that have this property is fundamentally far simpler than that of transformations that do not. Conversely, using non-linear transformations can yield richer behaviour (e.g. in the context of neural networks) than linear transforms can.
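A quick numerical check of the linearity property, using the same sort of 2-by-2 multiplication sketched earlier (the particular matrix and vectors here are arbitrary examples):

    def matvec2(m, v):
        # 2x2 matrix times 2-component vector, as before.
        return [m[0][0] * v[0] + m[0][1] * v[1],
                m[1][0] * v[0] + m[1][1] * v[1]]

    m = [[2, 1],
         [0, 3]]
    x = [1, 2]
    y = [4, -1]
    lhs = matvec2(m, [x[0] + y[0], x[1] + y[1]])                   # M * (x + y)
    rhs = [u + v for u, v in zip(matvec2(m, x), matvec2(m, y))]    # M*x + M*y
    print(lhs, rhs)    # both are [11, 3]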
There is another way of looking at the effect of a matrix multiplication. Instead of thinking of it as moving a position vector around in a coordinate system, you can think of the vector as being fixed and the axes as changing. The matrix multiplication is a way of expressing what the vector is in a different coordinate system. This interpretation can be very useful; whether it is appropriate depends on the application, but you need to be clear about which interpretation you are using at any time.
The effects of the matrices described above on the coordinate system are the opposite of their effects on the vectors. For instance, the diagonal matrix shrinks the 1-axis by a factor of p and the 2-axis by a factor of q, whilst the rotation matrix turns the axes clockwise through an angle theta. The numerical calculations are of course identical whichever interpretation is in use.
Suppose we take two vectors and multiply the transpose of one by the other. Taking the transpose means swapping rows and columns, so an ordinary column vector becomes a row vector. It looks like this:
                    -    -
    [ a[1]  a[2] ]  |b[1]|
                    |b[2]|
                    -    -
which, if you apply the normal multiplication rule and write out what you get, comes to
a[1] * b[1] + a[2] * b[2]
The generalisation of this to N-dimensional vectors x and y is
    SIGMA from i = 1 to N of x[i] * y[i]
This is called the dot product of the vectors. It is an example of a kind of relationship between vectors called an inner product.
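In a program the dot product is a one-liner; a minimal Python sketch (dot is just an illustrative name):

    def dot(x, y):
        # Dot product: the sum of the products of corresponding components.
        return sum(xi * yi for xi, yi in zip(x, y))

    a = [1, 2]
    b = [2, -4]
    print(dot(a, b))    # 1*2 + 2*(-4) = -6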
You have met the dot product already in a different guise. A single linear unit in a neural network forms the dot product of its weights vector and its input vector (section 2.1 of FCS2).
Does this have a geometrical significance? It does, but first we need to explain what is meant by projecting one vector onto another. Suppose we have two position vectors, say a and b, and we draw the arrows from the origin to the points they represent. Then we draw a line from the tip of the a arrow, at right angles to the b arrow, and mark where this perpendicular meets the b arrow. This point is the projection of a onto b. In a picture:
    [Diagram: arrows from the origin to a and from the origin to b, with a perpendicular dropped from the tip of the a arrow onto the b arrow; the foot of the perpendicular, on the b arrow, is the projection of a onto b.]
The dot product of a and b is the length of b times the length of the projection of a onto b. If the intersection occurs on the opposite side of the origin to the point b (i.e. a is moved to the left in the diagram above, so that b's arrow has to be extended backwards through the origin to get an intersection), then the dot product is negative. The relationship is symmetrical - you can swap a and b and get the same result.
If a and b point in the same direction, then their dot product is just the product of their lengths. If they point in exactly opposite directions, the dot product is minus the product of their lengths. If they are at right angles, the dot product is zero.
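Rearranging the statement above, the signed length of the projection of a onto b is the dot product divided by the length of b. A possible Python sketch (projection_length is an illustrative name):

    import math

    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    def projection_length(a, b):
        # Signed length of the projection of a onto b: dot(a, b) divided by ||b||.
        # Negative when the foot of the perpendicular lies on the far side of the origin.
        return dot(a, b) / math.sqrt(dot(b, b))

    a = [1, 2]
    b = [2, -4]
    print(projection_length(a, b))    # about -1.34 for these two vectors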
Another formula you might see for the dot product is
a . b = ||a|| * ||b|| * cos(angle between a and b)
Textbooks give the proof that this is the same as the definition above in terms of the components. For high-dimensional vectors, this formula is used to define what is meant by the angle between two vectors.
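For example, here is a possible Python sketch that uses this formula to recover the angle (angle_between is an illustrative name; math.acos returns radians):

    import math

    def angle_between(x, y):
        # Angle implied by x.y = ||x|| ||y|| cos(angle), returned in radians.
        d = sum(xi * yi for xi, yi in zip(x, y))
        nx = math.sqrt(sum(xi * xi for xi in x))
        ny = math.sqrt(sum(yi * yi for yi in y))
        return math.acos(d / (nx * ny))

    a = [1, 2]
    b = [2, -4]
    print(math.degrees(angle_between(a, b)))    # about 126.9 degrees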
The dot product is biggest for two vectors of fixed length if the angle between the two vectors is zero - that is, one of the vectors is just a scalar constant times the other. For example, if a linear unit in a neural network has only a fixed amount of weight to distribute (in the sense that the sum of the squares of its weights is fixed), it can optimise its response to a given input by making the weights match the inputs - if the weights are proportional to the given inputs, then the unit will be giving as big an output as possible.
The dot product gives another way of thinking about linear transformations. Each row of a matrix can be treated as the transpose of a column vector. Then when we multiply that matrix by a vector, what we do is to form a set of dot products. The first component of the output vector is the dot product of the input vector with the first row of the matrix, and so on.
The rows of the matrix are sometimes called basis vectors (though the matrix must be invertible for this to make proper sense). The elements of the new vector are the projections of the old vector onto each of the basis vectors in turn, multiplied by the lengths of the basis vectors. The coordinate transformation produced by the matrix can thus be seen as a set of projection operations onto a set of basis vectors.
In fact, this allows us to give abstract definitions of vectors that do not depend on how they are represented. The ordinary components of the vectors, that we have used so far to represent them, are actually just the dot products of the vectors with the standard basis vectors, which have components (in 3-D) of (1,0,0), (0,1,0) and (0,0,1).
This idea will not be pursued further here, but it provides important tools for analysing some systems. To go on in this direction requires an analysis of the abstract ideas of vector spaces - Horn & Johnson (see FCS3) gives information.
Copyright University of Sussex 1997. All rights reserved.