Approaches to Cognitive Science
Lecture 3: Approaches to Vision
References
Basic Reading
Green & others (course textbook), Chapter 4
Taking it further
Sharples, M. et al., Computers and Thought (MIT Press, 1989), Chapter 9
Bruce, V., Green, P. & Georgeson, M., Visual Perception: Physiology, Psychology and Ecology (Psychology Press, 1996)
Seminal classics
Gibson, J.J., The Senses Considered as Perceptual Systems (Allen & Unwin, 1968; Waveland Press, 1983)
Marr, D., Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (Freeman, 1982)
Vision and the study of mind
Vision mediates most of our interactions with
- our physical environment
- think of walking along a clifftop, or playing a ball game
- our social environment
- think of going into a club or café
- our `information environment'
- think of looking at a diagram in a book, or on a web page
Visual representations are
- beneficial, even essential, for some kinds of communication
- why do knitting patterns and recipes have pictures?
- why do textbooks have diagrams when they could use words?
- why do writers use visual metaphors?
- central to reasoning processes
- can you work out how many different combinations of coins make 7p without using a mental picture of some kind?
- central to memory
- what was your first day at school like?
Visual processing
- uses a large part of the brain
- is extraordinarily effective
- is cognitively impenetrable: its operation is not open to introspection, and is not altered by what we know or expect
- is extremely hard to analyse, understand or imitate
Questions about vision
What do we see?
- images on our retinas?
- light?
- what we expect?
- our mental models?
- a distorted version of reality?
- 3-D surfaces and objects?
- possibilities for action?
How is visual processing organised?
- `bottom-up'?
- images are analysed in a pipeline of processes to identify 3-D objects, their shape and position, with the least possible commitment to what they might represent
- `top-down'?
- we start from hypotheses about what we are looking at; these are checked against the data coming from our visual organs in order to verify, discard or modify them
Are our visual processes modular?
- separate modules for, e.g., stereoscopic vision, colour and motion, with well-defined connections passing information between them?
- a complex network, not readily separable into modules, with multiple connections between elements and multiple functions for each element?
Are our visual processes specialised or general?
- can we see anything, within the physical limits of our visual systems?
- we adapt to strange environments - underwater, space - and can see the structure of novel objects, e.g. sculptures
- are there processes tuned to special things in our environment?
- faces are special
- looming is special
How can we investigate vision?
- experiments in the laboratory
- probing the black box with carefully constructed stimuli
- visual illusions and the limits of the system
- investigations on animals
- anatomy, physiology, behaviour
- observations of real behaviour
- the system doing what it evolved to do
- computational modelling
- do our theories really work?
Can we make machines that see?
- philosophical questions - could a robot `really' see?
- technical questions - what algorithms? what architecture?
- financial questions - who will pay?
- and will it tell us anything about how we see?
How do you cross the road?
You are at the side of a busy, two-way road, with no pedestrian crossing. You need to get across, and you're in a hurry. What does your visual system do?
Maybe some of the following ...
The static environment
Where is the kerb? Where are the parked cars, the road markings, junctions? Which direction takes me straight across the road?
To help answer these questions, your visual system might have to
- segment the image into meaningful regions, perhaps using boundaries of brightness, colour, texture (see the sketch after this list)
- recognise objects, perhaps using some kind of template
- find the position and orientation of surfaces relative to you - hence e.g. the direction perpendicular to the kerb line, whether the road surface is level
- scan the scene using head and eye movements, and integrate the information obtained from different fixations
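As a concrete illustration of the first of these steps, here is a minimal sketch of brightness-based segmentation. It assumes NumPy and SciPy are available; the synthetic image, the 0.5 threshold and the region shapes are invented for this example and are not part of the lecture material.

    # A minimal sketch of brightness-based segmentation (illustration only).
    import numpy as np
    from scipy import ndimage

    # Synthetic grey-level "scene": a dark road surface with two brighter patches
    # standing in for a kerb and a road marking.
    image = np.zeros((100, 100))
    image[:, 0:10] = 0.9        # bright strip down one side: the kerb
    image[45:55, 30:90] = 0.7   # bright patch: a road marking

    # Segment by brightness: pixels above a threshold form candidate regions.
    bright = image > 0.5

    # Group connected bright pixels into labelled regions; 'labels' gives each
    # pixel a region number, 'n_regions' counts the regions found.
    labels, n_regions = ndimage.label(bright)
    print(n_regions)            # prints 2: the kerb strip and the marking

A real system would of course combine brightness with colour and texture cues, and cope with noise, shadows and occlusion; the point here is only the general shape of the computation.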
The dynamic environment
Should I cross now, or wait? Your visual system needs to predict whether you have time to safely get to the other side of the road before a car comes.
Your visual system might
- estimate speeds, directions of motion and positions of moving cars
but it might just
- estimate the time to collision of approaching vehicles
Looming and time to collision - a study in visual information pickup
The image of an approaching object expands. The pattern of motion across the image, describing how each point of the image moves over time, is called the optic flow field.
You can show that, for an object on a collision course, the rate of image expansion directly specifies the time to collision. There is no need to know the size, distance or speed of the approaching object!
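A minimal sketch of the argument, assuming an object of constant physical size approaching at constant speed (the symbols S, Z, v and tau are introduced here for illustration; they are not the lecture's notation):

    % Object of size S at distance Z(t), approaching at constant speed v = -dZ/dt.
    % Small-angle approximation for its image (angular) size:
    \theta(t) \approx \frac{S}{Z(t)}
    % Differentiating with respect to time:
    \dot{\theta}(t) \approx \frac{S\,v}{Z(t)^{2}}
    % Dividing image size by its rate of expansion cancels the unknown size S:
    \frac{\theta(t)}{\dot{\theta}(t)} \approx \frac{Z(t)}{v} = \tau ,
    \quad \text{the time remaining before collision.}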
There is thus a computational theory for at least part of the visual control of road crossing.
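As a toy numerical illustration of the same idea, the sketch below estimates time to collision from two successive measurements of an object's image width; the function name, frame interval and widths are all invented for this example.

    # Toy estimate of time to collision from image expansion alone (illustration only).
    def time_to_collision(width_now, width_before, dt):
        # tau ~ (image size) / (rate of image expansion); the units of size cancel,
        # so pixel measurements are fine and no real size, distance or speed is needed.
        expansion_rate = (width_now - width_before) / dt
        return width_now / expansion_rate

    # An oncoming car's image grows from 40 to 44 pixels over 0.1 s:
    print(time_to_collision(44.0, 40.0, 0.1))   # about 1.1 seconds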
There is also some experimental evidence that the visual system makes use of this kind of information:
- infants show defensive responses to looming patterns
- gannets fold their wings at just the right moment when plummet-diving
- people can accurately time punches at falling balls
But what of the intentions of the drivers? Is this part of visual perception?
J.J. Gibson and "nouvelle AI"
Gibson's work on human perception, especially visual perception, had many strands. Some important aspects:
- optic flow and texture gradients were developed as specific examples of powerful sources of visual information which had been largely neglected
- invariants - perceptual properties of objects which are independent of viewing position - are seen as key to reliable visual information pickup
- visual perception picks up affordances - properties of the relationship between an object and the observer, specifying potential interactions - not abstractions such as shape or identity
- his theory of direct perception rejects a `hypothesis testing' approach to vision in favour of a view in which information specified directly in the optic array is used to control action
In recent years, Artificial Intelligence's approach to vision has converged to some extent with Gibson's outlook, moving away from the construction of elaborate 3-D representations to underpin complex reasoning and towards a more action-centred approach, in which vision forms part of a perceptuo-motor cycle and feedback from actions plays a major role.
However, practical image-understanding technologies still exploit many of the tools developed in computer vision over the past four decades.
Conclusion
Understanding vision remains one of the central challenges in cognitive science.
Some processes are understood, but the overall architecture and functioning of the human visual system remain largely uncertain.
The challenge can be met only with a wide variety of approaches: computational, psychological and philosophical; experimental and theoretical.
Maintained by:
David Young