Approaches to Cognitive Science
Lecture 3: Approaches to Vision
References
Basic Reading
Green & others (course textbook), Chapter 4
Taking it further
Sharples, M. et al., Computers and Thought (MIT Press, 1989), Chapter 9
Bruce, V., Green, P. & Georgeson, M., Visual Perception: Physiology, Psychology and Ecology (Psychology Press, 1996)
Seminal classics
Gibson, J.J., The Senses Considered as Perceptual Systems (Allen & Unwin, 1968; Waveland Press, 1983)
Marr, D., Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (Freeman, 1982)
Vision and the study of mind
Vision mediates most of our interactions with
- our physical environment
- think of walking along a clifftop, or playing a ball game
- our social environment
- think of going into a club or café
- our `information environment'
- think of looking at a diagram in a book, or on a web page
Visual representations are
- beneficial, even essential, for some kinds of communication
- why do knitting patterns and recipes have pictures?
- why do textbooks have diagrams when they could use words?
- why do writers use visual metaphors?
- central to reasoning processes
- can you work out how many different combinations of coins make 7p without using a mental picture of some kind?
- central to memory
- what was your first day at school like?
Visual processing
- uses a large part of the brain
- is extraordinarily effective
- is cognitively impenetrable: its operation is not open to introspection, and is not altered by what we know or expect
- is extremely hard to analyse, understand or imitate
Questions about vision
What do we see?
- images on our retinas?
- light?
- what we expect?
- our mental models?
- a distorted version of reality?
- 3-D surfaces and objects?
- possibilities for action?
How is visual processing organised?
- `bottom-up'?
- images are analysed in a pipeline of processes to identify 3-D objects, their shape and position, with the least possible commitment to what they might represent
- `top-down'?
- we start from hypotheses about what we are looking at; these are checked against the data coming from our visual organs in order to verify, discard or modify them
Are our visual processes modular?
- separate modules for, e.g., stereoscopic vision, colour and motion, with well-defined connections passing information between them?
- a complex network, not readily separable into modules, with multiple connections between elements and multiple functions for each element?
Are our visual processes specialised or general?
- can we see anything, within the physical limits of our visual systems?
- we adapt to strange environments - underwater, space - and can see the structure of novel objects, e.g. sculptures
- are there processes tuned to special things in our environment?
- faces are special
- looming is special
How can we investigate vision?
- experiments in the laboratory
- probing the black box with carefully constructed stimuli
- visual illusions and the limits of the system
- investigations on animals
- anatomy, physiology, behaviour
- observations of real behaviour
- the system doing what it evolved to do
- computational modelling
- do our theories really work?
Can we make machines that see?
- philosophical questions - could a robot `really' see?
- technical questions - what algorithms? what architecture?
- financial questions - who will pay?
- and will it tell us anything about how we see?
How do you cross the road?
You are at the side of a busy, two-way road, with no pedestrian crossing. You need to get across, and you're in a hurry. What does your visual system do?
Maybe some of the following ...
The static environment
Where is the kerb? Where are the parked cars, the road markings, junctions? Which direction takes me straight across the road?
To help answer these questions, your visual system might have to
- segment the image into meaningful regions, perhaps using boundaries of brightness, colour, texture (see the sketch after this list)
- recognise objects, perhaps using some kind of template
- find the position and orientation of surfaces relative to you - hence e.g. the direction perpendicular to the kerb line, whether the road surface is level
- scan the scene using head and eye movements, and integrate the information obtained from different fixations
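As a concrete illustration of the first of these steps, here is a minimal sketch of brightness-based segmentation. It assumes NumPy and SciPy are available; the synthetic image, the 0.5 threshold and the region shapes are invented for this example and are not part of the lecture material.

    # A minimal sketch of brightness-based segmentation (illustration only).
    import numpy as np
    from scipy import ndimage

    # Synthetic grey-level "scene": a dark road surface with two brighter patches
    # standing in for a kerb and a road marking.
    image = np.zeros((100, 100))
    image[:, 0:10] = 0.9        # bright strip down one side: the kerb
    image[45:55, 30:90] = 0.7   # bright patch: a road marking

    # Segment by brightness: pixels above a threshold form candidate regions.
    bright = image > 0.5

    # Group connected bright pixels into labelled regions; 'labels' gives each
    # pixel a region number, 'n_regions' counts the regions found.
    labels, n_regions = ndimage.label(bright)
    print(n_regions)            # prints 2: the kerb strip and the marking

A real system would of course combine brightness with colour and texture cues, and cope with noise, shadows and occlusion; the point here is only the general shape of the computation.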
The dynamic environment
Should I cross now, or wait? Your visual system needs to predict whether you have time to safely get to the other side of the road before a car comes.
Your visual system might
- estimate speeds, directions of motion and positions of moving cars
but it might just
- estimate the time to collision of approaching vehicles
Looming and time to collision - a study in visual information pickup
The image of an approaching object expands. The pattern of motion across the image, describing how each point of the image moves over time, is called the optic flow field.
You can show that, for an object on a collision course, the rate of image expansion directly specifies the time to collision. There is no need to know the size, distance or speed of the approaching object!
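A minimal sketch of the argument, assuming an object of constant physical size approaching at constant speed (the symbols S, Z, v and tau are introduced here for illustration; they are not the lecture's notation):

    % Object of size S at distance Z(t), approaching at constant speed v = -dZ/dt.
    % Small-angle approximation for its image (angular) size:
    \theta(t) \approx \frac{S}{Z(t)}
    % Differentiating with respect to time:
    \dot{\theta}(t) \approx \frac{S\,v}{Z(t)^{2}}
    % Dividing image size by its rate of expansion cancels the unknown size S:
    \frac{\theta(t)}{\dot{\theta}(t)} \approx \frac{Z(t)}{v} = \tau ,
    \quad \text{the time remaining before collision.}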
There is thus a computational theory for at least part of the visual control of road crossing.
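As a toy numerical illustration of the same idea, the sketch below estimates time to collision from two successive measurements of an object's image width; the function name, frame interval and widths are all invented for this example.

    # Toy estimate of time to collision from image expansion alone (illustration only).
    def time_to_collision(width_now, width_before, dt):
        # tau ~ (image size) / (rate of image expansion); the units of size cancel,
        # so pixel measurements are fine and no real size, distance or speed is needed.
        expansion_rate = (width_now - width_before) / dt
        return width_now / expansion_rate

    # An oncoming car's image grows from 40 to 44 pixels over 0.1 s:
    print(time_to_collision(44.0, 40.0, 0.1))   # about 1.1 seconds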
There is also some experimental evidence that the visual system makes use of this kind of information:
- infants show defensive responses to looming patterns
- gannets fold their wings at just the right moment when plummet-diving
- people can accurately time punches at falling balls
But what of the intentions of the drivers? Is this part of visual perception?
J.J. Gibson and "nouvelle AI"
Gibson's work on human perception, especially visual perception, had many strands. Some important aspects:
- optic flow and texture gradients were developed as specific examples of powerful sources of visual information which had been largely neglected
- invariants - perceptual properties of objects which are independent of viewing position - are seen as key to reliable visual information pickup
- visual perception picks up affordances - properties of the relationship between an object and the observer, specifying potential interactions - not abstractions such as shape or identity
- his theory of direct perception rejects a `hypothesis testing' approach to vision in favour of a view in which information specified directly in the optic array is used to control action
In recent years, Artificial Intelligence's approach to vision has converged to some extent with Gibson's outlook, moving away from the construction of elaborate 3-D representations to underpin complex reasoning and towards a more action-centred approach, in which vision forms part of a perceptuo-motor cycle and feedback from actions plays a major role.
However, practical image-understanding technologies still exploit many of the tools developed in computer vision over the past four decades.
Conclusion
Understanding vision remains one of the central challenges in cognitive science.
Some processes are understood, but the overall architecture and functioning of the human visual system remain largely uncertain.
The challenge can be met only with a wide variety of approaches: computational, psychological and philosophical; experimental and theoretical.
Maintained by:
David Young