Hopfield [Hopfield, 1982] popularized the idea that any physical dynamical system could be constrained to serve as a neural network, with fixed points of the system acting as ``memories'' that could be recalled associatively, in a content-addressable manner. However, the learning algorithms available for the attractor-based neural networks on which Hopfield focused are not as powerful as those for feed-forward networks, to which Hopfield's observation also applies. For this reason, and because the feed-forward back-propagation network is the one most likely to be familiar to a general audience, the example here will be based on a feed-forward network architecture. The principles and insights, however, generalize to other kinds of neural network.

Similarly, many different quantum situations could be used to implement a learning algorithm; I will stick to the situation of a barrier with slits in front of a photo-sensitive plate, because of its familiarity. It is very likely that such a situation would prove impractical as a means of obtaining some of the computational advantages (especially those of speed and size) of quantum computing; but as such advantages are relatively obvious, and are not the advantages on which I will be concentrating, this impracticality need not concern us much.

The details of feed-forward networks involve units, activation values, layers, and weights. But what is important at first is that:

- feed-forward networks realize parameterized non-linear functions from an input space to an output space;
- networks modify these parameters in response to interaction with
their environments (usually via a training set) so that the function
realized by each network better approximates some intended function.
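To make the first point concrete, here is a minimal sketch of a two-layer feed-forward network as a parameterized non-linear function from an input space to an output space. It is written in Python with NumPy; the layer sizes, sigmoid activation, and random weights are illustrative assumptions, not details from the text:

```python
import numpy as np

def sigmoid(x):
    """Soft threshold; differentiable, as back-propagation requires."""
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, W1, W2):
    """A two-layer network: a parameterized non-linear function
    from a 3-dimensional input space to a 2-dimensional output space.
    W1 and W2 are the parameters ("weights") that learning adjusts."""
    hidden = sigmoid(W1 @ x)
    return sigmoid(W2 @ hidden)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
y = feed_forward(rng.normal(size=3), W1, W2)
print(y.shape)  # (2,)
```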

**Figure 1:** A feed-forward quantum neurocomputer. See text for key.

A quantum implementation of this kind of network is illustrated in
figure 1. A particle beam *G* is set up facing a barrier
*B*, behind which is a photo-sensitive plate *P*. The barrier has
several slits in it, which are responsible for the famous interference
patterns on the plate. Some of the slits are designated to be
*input slits*, and the rest are the *weight slits*. The
interference pattern on the plate *P* that results from the beam *G*
being aimed at the barrier *B* with a particular input slit
configuration is the output of the system for that input. Thus, use
of the system to perform some computation will require two mappings:
one (*I*) from inputs (e.g., character strings, images, queries, etc.)
to input slit configurations, and one (*O*) from interference patterns
to outputs (e.g., classifications, stored data, etc.).

Assume that one already has a mapping from inputs to *n*-vectors, and
a mapping from *m*-vectors to outputs. One could map these input
*n*-vectors to input slit configurations by, for example, dividing the
barrier into *n* sections, each having an input slit, with the
relative position of the *i*th slit within the *i*th section of the
barrier indicating the value of the *i*th coordinate of the input
vector (this is the type of arrangement depicted in figure
1). For binary input vectors, this mapping could be even
simpler: there is a slit in the *i*th section of the barrier if and
only if the *i*th coordinate of the input vector is ``1''.
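Both versions of the mapping *I* can be sketched as follows (a Python sketch; the unit-width sections and the assumption that continuous coordinates lie in [0, 1] are illustrative choices, not details from the text):

```python
def binary_slit_configuration(bits):
    """Binary case: section i of the barrier has an open slit
    if and only if the i-th coordinate of the input vector is 1."""
    return [b == 1 for b in bits]

def continuous_slit_positions(x):
    """Continuous case: the relative position of the i-th slit within
    the i-th (unit-width) section encodes the i-th coordinate of x,
    here assumed to lie in [0, 1]."""
    return [i + xi for i, xi in enumerate(x)]

print(binary_slit_configuration([1, 0, 1]))  # [True, False, True]
print(continuous_slit_positions([0.2, 0.9]))
```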

Less straightforward is the mapping *O* from interference patterns to
outputs. Since interference patterns are high-dimensional, some
dimension-reducing mapping is required. Perhaps the plate could be
divided into *m* sections, and the average intensity within the *j*th
section could serve as the *j*th coordinate of the output vector. For
the case of binary output vectors, a soft (sigmoid) thresholding of
this value would do (the thresholding must be soft so as to allow
differentiation for back-propagation learning; see below).
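A sketch of such a mapping *O*, assuming the interference pattern arrives as a one-dimensional array of intensities along the plate; the section count, threshold, and sigmoid gain are illustrative parameters of mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_vector(pattern, m, threshold=0.5, gain=10.0):
    """Map a high-dimensional interference pattern (a 1-D array of
    intensities along the plate) to an m-vector: average the intensity
    within m equal sections, then soft-threshold each average.  The
    thresholding is soft (sigmoid) so the mapping stays differentiable."""
    sections = np.array_split(np.asarray(pattern, dtype=float), m)
    means = np.array([s.mean() for s in sections])
    return sigmoid(gain * (means - threshold))

# A stand-in "pattern" of intensities, not a real interference calculation.
pattern = np.abs(np.sin(np.linspace(0, 4 * np.pi, 400)))
print(output_vector(pattern, m=4))
```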

The *error* of the system, *E*, is defined to be some function of
the desired (*t*) and actual (*o*) output vectors, typically
*E* = |*t* - *o*|². If *F* is the function that yields
an interference pattern given an input slit configuration *I*(*x*)
and a weight slit configuration *w*, then
*E* = |*t* - *O*(*F*(*I*(*x*), *w*))|².
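Assuming the typical quadratic form for the error (the squared distance between the desired and actual output vectors), a minimal numeric sketch:

```python
import numpy as np

def error(t, o):
    """Quadratic error: squared distance between the desired output
    vector t and the actual output vector o."""
    t, o = np.asarray(t, dtype=float), np.asarray(o, dtype=float)
    return float(np.sum((t - o) ** 2))

# (1.0 - 0.8)^2 + (0.0 - 0.3)^2 = 0.04 + 0.09 = 0.13
print(round(error([1.0, 0.0], [0.8, 0.3]), 2))  # 0.13
```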

Given some such setup, the system could learn an associative mapping
*f* in the following way:

- a number of samples are collected as input/output pairs for training;
- the system's weight slits are randomized;
- for each training sample (*x*, *t*), the following occurs:
  - the input slits are initialized according to *I*(*x*); the plate is cleared or replaced;
  - the beam is activated until an interference pattern of sufficient resolution is produced on the plate, and an output *o* is produced from that pattern according to *O*;
  - for each weight slit *w_j*, the partial derivative of *E* with respect to *w_j* is estimated: the slit is displaced by a small amount Δ, the beam is run again, and the quotient (*E*(*w* + Δ*e_j*) - *E*(*w*))/Δ is calculated;
  - this estimate is used to calculate the change to be made to the weight slit in such a way that gradient descent in error is achieved: the change in *w_j* is proportional to the negative of the partial of *E* with respect to *w_j*.
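The training procedure can be sketched in simulation. Everything physical here is a stand-in of mine: `f_sim` plays the role of the beam, barrier, and plate (it is not a quantum simulation), the output mapping is the identity, and the quadratic error, step size, perturbation size, and epoch count are illustrative assumptions:

```python
import numpy as np

def train(f_sim, O, samples, w, delta=1e-3, lr=0.1, epochs=50):
    """Gradient descent on the weight slits via finite differences:
    perturb each weight w[j] by delta, re-run the (simulated) beam,
    and estimate dE/dw[j] as (E(w + delta*e_j) - E(w)) / delta."""
    def E(x, t, w):
        o = O(f_sim(x, w))                  # pattern -> output vector
        return float(np.sum((t - o) ** 2))  # quadratic error
    for _ in range(epochs):
        for x, t in samples:
            base = E(x, t, w)
            grad = np.empty_like(w)
            for j in range(len(w)):
                w_pert = w.copy()
                w_pert[j] += delta
                grad[j] = (E(x, t, w_pert) - base) / delta
            w -= lr * grad                  # step opposite the gradient
    return w

# Toy stand-in: the "pattern" is just the input scaled by the weights.
f_sim = lambda x, w: x * w
samples = [(np.array([1.0, 2.0]), np.array([0.5, 1.0]))]
w = train(f_sim, lambda p: p, samples, np.ones(2))
print(np.round(w, 2))  # close to [0.5, 0.5]
```

Note that the gradient is only estimated, not computed analytically; the finite-difference quotient introduces a small bias of order Δ, which is why the learned weights come out near, rather than exactly at, the minimum.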

After several passes through the training set, this procedure should lead the system to settle into a weight configuration that yields (locally) minimal error on the training set. In many cases, this will also result in good performance on other samples drawn from the same source as the training set (i.e., generalization).

This is enough to establish a correspondence between neural nets and a quantum system. The correspondence can then be used to suggest how the many variations on connectionist learning (e.g., recurrency, competitive learning, etc.) also could be implemented in a quantum system.

Wed Nov 20 01:10:59 GMT 1996