The distinction between short-term and long-term memory processes is
currently debated because a series of effects, such as recency (higher recall for
the last items in a list), that were previously associated with STM have also been
found in long-term memory tasks. I will describe a Hebbian account of
short- and long-term memory (STM/LTM) in which the two types of memory correspond
to activation-based versus synaptic (weight-based) processes. A computational model will be
presented, where the activation based processes drive the formation of synaptic
changes (episodic traces) between item representations and a changing context.
The theory makes predictions about the shape of the serial position function in
immediate free recall and about dissociations between short- and long-term
recency, which are supported by a series of experiments. The results may also help to
better understand specific patterns of neuropsychological memory deficits
(amnesia/frontal-lobe lesions).
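As a rough illustration of the synaptic component described above, the sketch below binds each item to a slowly drifting context vector with a Hebbian (activation times context) update, then cues memory with the end-of-list context. It is a minimal sketch and not the model presented in the talk: the drift rule, the `rate` parameter, the vector size and the fixed item activation of 1.0 are all assumptions made for illustration.

```python
import math
import random

def drift(context, rate, rng):
    """Mix the context with Gaussian noise and renormalise to unit length."""
    mixed = [(1.0 - rate) * c + rate * rng.gauss(0.0, 1.0) for c in context]
    norm = math.sqrt(sum(v * v for v in mixed))
    return [v / norm for v in mixed]

def encode_list(n_items, dim=50, rate=0.3, seed=0):
    """Return one item-to-context trace per item and the end-of-list context."""
    rng = random.Random(seed)
    context = [1.0 / math.sqrt(dim)] * dim
    traces = []
    for _ in range(n_items):
        context = drift(context, rate, rng)
        activation = 1.0  # assumption: the item is fully active while encoded
        traces.append([activation * c for c in context])  # Hebbian product
    return traces, context

def cue_strength(trace, context):
    """Match between a stored trace and the context used as a retrieval cue."""
    return sum(w * c for w, c in zip(trace, context))

traces, end_context = encode_list(10)
strengths = [cue_strength(t, end_context) for t in traces]
```

Under these assumptions the last item matches the test context most strongly, which is one simple way a weight-based trace can produce recency.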
In recent work, we have focused on the role of active memory in
free recall tasks (Davelaar & Usher, 2002; Usher & Cohen, 1999). We
have shown that a model based on units with self-recurrent excitation and
global inhibition explains how a short-term buffer could be implemented in the
brain. This limited-capacity buffer is referred to as an active memory
component, as it constitutes the activated part of long-term semantic memory
and it provides a contribution to episodic memory additional to the one from
contextual learning implemented as a synaptic based system. In this talk, the
relation between the buffer parameters and the following aspects will be
discussed. First, it will be shown how the model parameters relate to neural
processes that are deficient in patients with brain damage. To this end,
deviant serial position functions documented for patients with frontal,
medial-temporal or striatal brain damage will be simulated. Second, it will be
shown how the model behavior is affected by the neuromodulation of its inputs,
accounting for a variety of working-memory processes, such as updating
versus maintenance of information as well as ignoring irrelevant
information. Third, a comparison between the activation-based memory buffer and
the mathematical buffers used in some computational models of free recall
memory (e.g. SAM; Raaijmakers & Shiffrin, 1980) will be carried out. These
three aspects (relation to neuropsychology, working memory, mathematical
buffers) shape the domain in which our implementation of active memory can be
applied and thereby can provide a means to bridge the separate literatures on
short-term/working memory.
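The buffer described above (units with self-recurrent excitation and global inhibition) can be sketched with a few lines of Euler integration. This is a hedged toy version, not the published model: the output function F(x) = x / (1 + x), the parameter values `alpha` and `beta`, and the step size are assumptions chosen so that a unit stays active after its input is removed.

```python
def squash(x):
    """Assumed output function: F(x) = x / (1 + x) for x > 0, else 0."""
    return x / (1.0 + x) if x > 0.0 else 0.0

def step(x, inputs, alpha=2.0, beta=0.5, dt=0.1):
    """One Euler step with self-excitation (alpha) and global inhibition (beta)."""
    out = [squash(v) for v in x]
    total = sum(out)
    return [
        v + dt * (-v + alpha * o - beta * (total - o) + i)
        for v, o, i in zip(x, out, inputs)
    ]

def run(x, inputs, n_steps):
    """Apply the same input pattern for n_steps Euler steps."""
    for _ in range(n_steps):
        x = step(x, inputs)
    return x

n_units = 5
x = [0.0] * n_units
x = run(x, [1.0, 0.0, 0.0, 0.0, 0.0], 50)  # present item 0
x = run(x, [0.0] * n_units, 200)           # remove the input
active = [i for i, v in enumerate(x) if v > 0.0]
```

With these values the presented unit settles at a self-sustained activation after input offset while the others are held below zero. Presenting several items in sequence with the same `run` function is the natural way to explore how inhibition limits the number of units that remain active.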
Although retroactive interference (RI) is a well-known phenomenon in
humans, the differential effect of the structure of the learning material on RI
has only seldom been addressed. Mirman and Spivey (2001) investigated this effect
and reported on behavioural results that show more RI for the subjects exposed
to structured items than for those exposed to unstructured items. These authors
claim that two complementary memory systems working in a radically different
way (localized vs distributed) are required to account for the behavioural
results they reported. However, in the behavioural experiment of Mirman and Spivey
(2001) the proactive interference (PI) level was not controlled. With the same
paradigm we investigated the influence of the nature of the to-be-learned
material on RI. Subjects learned meaningless A-B associations, took a
forced-choice recognition test, then learned meaningless C-D associations and
took a forced-choice recognition test on all the learned associations. Subjects
learned sequentially either two lists of 'structured' or two lists of
'unstructured' associations. Unstructured associations are arbitrary;
structured associations are built using two simple rules per list. If the PI level
is left uncontrolled, no influence of the nature of the to-be-learned material
on RI is found. Controlling for PI level, more RI was found for subjects
exposed to unstructured items than for those exposed to structured items, a
result opposed to that of Mirman and Spivey (2001). Two control experiments
confirmed that the subjects in the 'Structured' condition could generalize to
novel associations, and that they learned the items at the exemplar level as
the subjects from the 'Unstructured' condition did. A first simulation using a
classical three-layer backpropagation network produced a pattern of RI results
that mirrored qualitatively the structure effect we found in humans. However,
the amount of RI was high. Moreover, the network did not exhibit PI but a
proactive advantage: training on List2 associations gave better results after
training on List1. In a second simulation the memory self-refreshing neural
network model of Ans and Rousset (1997, 2000) was used. As expected, the amount of
RI was smaller and the structure effect on RI was still present. Moreover, PI
was observed, and also a structure effect on PI. Furthermore, as for the
behavioural data, the structure effect on RI and the structure effect on PI
were negatively correlated.
Gluck and Bower's (1988) configural cue model is an associative
network that represents stimuli using independent nodes for each feature and
feature combination within the stimulus. This form of powerset representation
was first used to simulate performance on non-linear discrimination tasks by
Wagner and Rescorla (1972). While conceptually influential in associative
learning and categorisation research, the configural cue model of stimulus
representation is highly limited with respect to its ability to simulate a
variety of experimental findings. One of its main limitations is the lack of
any clear method for incorporating secondary learning processes such as
selective attention. A new approach to the configural cue model is proposed in
which node activation is dependent on the average characteristics of a
sequential component or dimensional sampling process. This process may be
described in terms of a Markov process. The associability of, and contribution from,
singlet nodes are controlled by component sampling probabilities, while the same
characteristics for configural nodes are governed by the probability of
sequentially sampling their member components.
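The powerset representation and the sequential sampling idea above can be sketched as follows. The function names, the example features and the first-order transition table are illustrative assumptions; the abstract does not specify how the sampling probabilities are parameterised.

```python
from itertools import combinations

def configural_nodes(features):
    """One node per non-empty subset of the stimulus features (powerset coding)."""
    feats = sorted(features)
    return [
        frozenset(combo)
        for size in range(1, len(feats) + 1)
        for combo in combinations(feats, size)
    ]

def sequence_probability(order, start, transition):
    """Probability of sampling the components in `order` under a first-order
    Markov chain with start probabilities `start` and transitions `transition`."""
    p = start[order[0]]
    for prev, nxt in zip(order, order[1:]):
        p *= transition[prev][nxt]
    return p

nodes = configural_nodes({"colour", "shape", "size"})

# Hypothetical two-dimension example: a singlet node depends on one sampling
# probability, a configural node on a sequence of samples.
start = {"colour": 0.5, "shape": 0.5}
transition = {
    "colour": {"colour": 0.2, "shape": 0.8},
    "shape": {"colour": 0.6, "shape": 0.4},
}
p_colour_then_shape = sequence_probability(["colour", "shape"], start, transition)
```

Three features give seven nodes (three singlets, three pairs, one triple), which shows how quickly the representation grows with stimulus dimensionality.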
Learning algorithms based on error
reduction may be applied to alter the matrix of transition probabilities
governing the behaviour of the sampling process on each trial. This allows the
model to qualitatively simulate learning effects that seem to be based on limited-capacity
dimensional attention. Because distribution of capacity across dimensions by
the model is based on learnt relationships between instantiated dimensions, the
approach also allows the model to be used to simulate attention learning and
associative learning with stimuli that vary in terms of their dimensionality.
This represents a potential advance over many models used in category learning
research where dominant models are either only applicable to stimuli that do
not vary in terms of their dimensionality (such as ALCOVE (Kruschke, 1992)), or
make use of component-cue stimulus representations that are incapable of
learning non-linear discriminations (such as EXIT (Kruschke, 2001)).
The search for computational principles that underlie the
functionality of different cortical areas is a fundamental task in science. In
the case of sensory areas, one approach to this issue is to examine how the
statistical properties of natural stimuli (which in the case of vision include
natural images and image sequences) are related to the operations that the
neurons seem to perform. For simple cells, the most prominent computational
theories linking neural properties and stimulus statistics are temporal
coherence, and independent component analysis (ICA) / sparse coding (in the
case of visual data, ICA and sparse coding are closely related). For these
theories, the case of spatial linear cell models has been studied in a number
of recent publications, but the case of spatiotemporal models has received
fairly little attention.
Here we first provide a short
introduction to these theories, and to the results obtained with spatial
models. We then examine the spatiotemporal case by applying the theories to
natural image sequence data, and by analyzing the obtained results
quantitatively. We compare the properties of the spatiotemporal linear cell
models obtained with the methods against each other, and against parameters
measured from real visual systems.
Information about time-to-collision is critical for animals.
Making interceptions, navigating visually and avoiding collisions are crucial for
adaptation in a competitive environment. There have been many proposals of how
time-to-collision is computed. The tau function offers an easy way to compute
time-to-collision independently of object size or speed by dividing the visual
angle of the approaching object by its rate of change (Lee, 1976). The evidence
for this function is not clear in all conditions and tasks. The rho function is the
absolute rate of expansion of the object, and so it is affected by both
object size and speed. The eta function has been proposed more recently and was
developed from neurophysiological data in locusts (Hatsopoulos, Gabbiani &
Laurent, 1995). Eta peaks before the collision and is sensitive to size and
speed, making it a good predictor of collision. The results of different tasks are
not conclusive for any of the functions, but there is some evidence that
several sources of information are implicated in the computation of
time-to-collision (Sun & Frost, 1998). We propose a framework to formalize the integration
of these sources of information with the advantages of a parallel distributed
processing model. For this task we created an Elman network with an input
layer that represents a unidimensional retina. The retinal size of approaching
objects was presented in the input layer, with the size and speed of the objects
varied orthogonally. The network learned to predict time-to-collision correctly for
all sizes and speeds. When presented with objects of new sizes and speeds, the
network showed correct generalization of the prediction response. Among the
preceding functions, rho explains the most variance, being quite similar in
its behaviour to the behaviour of the network.
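The three optical variables discussed above can be written down for an object of a given size approaching at constant speed. Tau and rho follow the definitions in the text. The exponential form used for eta (rate of expansion multiplied by exp(-alpha * angle)) and the values of `alpha` and `gain` are assumptions made here for illustration, not details taken from the abstract.

```python
import math

def visual_angle(size, distance):
    """Angle subtended by an object of the given size at the given distance."""
    return 2.0 * math.atan(size / (2.0 * distance))

def angle_rate(size, distance, speed):
    """Rate of change of the visual angle for a constant approach speed."""
    return size * speed / (distance ** 2 + (size / 2.0) ** 2)

def tau(size, distance, speed):
    """Visual angle divided by its rate of change."""
    return visual_angle(size, distance) / angle_rate(size, distance, speed)

def rho(size, distance, speed):
    """Absolute rate of expansion."""
    return angle_rate(size, distance, speed)

def eta(size, distance, speed, alpha=3.0, gain=1.0):
    """Assumed form: gain * rate of expansion * exp(-alpha * visual angle)."""
    theta = visual_angle(size, distance)
    return gain * angle_rate(size, distance, speed) * math.exp(-alpha * theta)

# An object 10 units away approaching at 2 units/s collides in 5 s.
tau_small = tau(0.1, 10.0, 2.0)
tau_large = tau(0.2, 10.0, 2.0)
```

For small visual angles tau approximates distance divided by speed regardless of object size, whereas rho grows with size, and eta, under the assumed form, reaches its maximum at an intermediate distance rather than at contact.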
* Lee, D. N. (1976). A theory of visual control of braking based
on information about time-to-collision. Perception, 5, 437-459.
* Hatsopoulos, N., Gabbiani, F. & Laurent, G. (1995). Elementary computation
of object approach by wide field visual neuron. Science, 270, 1000-1003.
* Sun, H. & Frost, B. J. (1998). Computation of different optical variables
of looming objects in pigeon nucleus rotundus neurons. Nature Neuroscience,
1(4), 296-303.
We have modeled various aspects of face perception using
connectionist networks. We have shown how developmentally reasonable
constraints can lead to a "face expert" network, how expertise with
one domain can lead to faster learning of expertise in another domain (e.g.,
why the Fusiform Gyrus might get recruited for Greeble processing if it is
already a face expert), and how disparate theories of facial expression recognition
can be resolved in a single model. In the latter domain, we have shown how a
single model can accommodate categorical perception theories as well as
"dimensional" theories of facial expression perception, and how a
single model can explain the apparent independence between identity and
expression processing without positing separate representations for these. We
review these results in this presentation.
For all the work that has been done on human face recognition, we
still don't know whether we store lots of exemplars of a given face, or some
kind of average. Burton and Jenkins (2003) have demonstrated that a PCA-based
face recognition system works much better if faces are averaged prior to being
coded by the system. We hypothesise that this is because averaging removes
extraneous variations such as lighting. I shall present some simulation work
that demonstrates why this should be and discuss implications for models of
both human and artificial face recognition.
This paper will examine the basis for, and utility of, the concept of a
psychological face-space in which a given facial appearance can be represented
as a location in an abstract vector space. The origins of the concept will be
discussed and we will present a specific and powerful incarnation of the idea
which we term appearance space. Practical applications of this model to
automated caricature generation and construction of facial composites will be
demonstrated.
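One common way to generate a caricature in a vector face-space is to move a face further from the mean face along the line that joins them. The sketch below illustrates that idea only; the two-dimensional toy vectors and the `exaggeration` parameter are assumptions and are not drawn from the appearance-space model itself.

```python
def mean_face(faces):
    """Component-wise mean of a list of face vectors."""
    n = len(faces)
    return [sum(column) / n for column in zip(*faces)]

def caricature(face, mean, exaggeration):
    """Scale the face's offset from the mean: 1.0 returns the face unchanged,
    values above 1.0 exaggerate it, 0.0 returns the mean face."""
    return [m + exaggeration * (f - m) for f, m in zip(face, mean)]

faces = [[1.0, 2.0], [3.0, 6.0]]
mean = mean_face(faces)
exaggerated = caricature(faces[0], mean, 2.0)
```

Because every face is just a location, the same operation with an exaggeration below 1.0 yields an "anti-caricature" that is closer to the average.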
Constant interaction with a dynamic environment - from riding a
bicycle to segmenting speech - makes sensitivity to the sequential structure of
the world a crucial dimension of information processing. In sequence learning
(SL), participants are asked to perform a choice reaction task. Unbeknownst to
them, the material contains sequential structure, so that the location of each
stimulus depends on the context set by previous stimuli. Results indicate that
participants react faster to predictable stimuli than to random stimuli, thus
suggesting that they prepare their responses based on implicit knowledge of the
sequential contingencies contained in the material ([7], see [1] for a review).
Although detailed models of SL based
on Elman's Simple Recurrent Network (SRN) [4] exist [2], such models are unable
to account (1) for data suggesting that participants learn not only about
sequences of stimuli, but also about sequences of responses (e.g. [5]), and (2)
for empirical data indicating that SL is enhanced when responses have specific
effects (e.g. tone-onset), even when these effects are irrelevant. The latter specifically
suggests that voluntary action is initiated by anticipation of the sensory
changes that will result from it.
To address these issues, we
explored how forward models, which originated in control theory [6], [8], can
account for basic SL data. In such models, two distinct components interact
continually. The action network (AN) receives goals and stimulus inputs and
produces appropriate actions. These actions then serve, together with the
stimulus, as input to the second network, the forward network (FN). The FN is
trained to produce the next stimulus, that is, the expected sensory
consequences of the actions produced by the AN. In this framework, learning how
to produce appropriate actions depends on previous learning about what
consequences each action entails. As a first step towards the goal of
developing a theory of SL rooted in these ideas, we simulated SL data obtained
in [3]. In our adaptation, both AN and FN are SRNs, and processing is cascaded
to make it possible for the model to capture the time course of processing
within a single trial. The output of the FN at time t (representing the
network's expectation about the identity of the next stimulus) is used as input
to the action network at t+1, together with the next stimulus. This makes it
possible for the network's responses to be shaped by its expectations.
Simulation results are presented and discussed in light of current SL theory.
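The wiring between the two networks described above can be sketched with two small, untrained Elman networks. This is only an illustration of the information flow (the forward network's expectation at time t becomes part of the action network's input at t+1). Layer sizes, random weights and the omission of training and of cascaded within-trial processing are all simplifying assumptions.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Random weight matrix with small uniform values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def layer(weights, vector):
    """Logistic layer: sigmoid of the weighted sum for each row of weights."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, vector))))
        for row in weights
    ]

class SRN:
    """Elman network: the hidden layer sees the input plus its previous state."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w_hidden = make_matrix(n_hidden, n_in + n_hidden, rng)
        self.w_out = make_matrix(n_out, n_hidden, rng)
        self.context = [0.0] * n_hidden

    def step(self, inputs):
        self.context = layer(self.w_hidden, inputs + self.context)
        return layer(self.w_out, self.context)

def run_trials(stimuli, n_locations, n_hidden=8, seed=0):
    """Feed a stimulus sequence through the coupled action and forward networks."""
    rng = random.Random(seed)
    action_net = SRN(2 * n_locations, n_hidden, n_locations, rng)
    forward_net = SRN(2 * n_locations, n_hidden, n_locations, rng)
    expectation = [0.0] * n_locations
    history = []
    for location in stimuli:
        stimulus = [1.0 if i == location else 0.0 for i in range(n_locations)]
        action = action_net.step(stimulus + expectation)
        # The forward network predicts the next stimulus from stimulus + action;
        # its output is only consumed by the action network on the next trial.
        expectation = forward_net.step(stimulus + action)
        history.append((action, expectation))
    return history

history = run_trials([0, 2, 1, 3], n_locations=4)
```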
[1]. Cleeremans, A., Destrebecqz, A., & Boyer, M. (1998). Implicit learning: News from the
front. Trends in Cognitive Sciences, 2, 406-416.
[2]. Cleeremans, A., & McClelland, J.L. (1991). Learning the structure of event sequences.
Journal of Experimental Psychology: General, 120, 235-253.
[3]. Destrebecqz, A., & Cleeremans, A. (2001). Can sequence learning be implicit? New evidence
with the Process Dissociation Procedure. Psychonomic Bulletin & Review,
8(2), 343-350.
[4]. Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
[5]. Hoffmann, J., Sebald, A., & Stöcker, C. (2001). Irrelevant response effects improve serial
learning in serial reaction time tasks. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 27, 470-482.
[6]. Jordan, M.I., & Rumelhart, D.E. (1992). Forward models:
Supervised learning with a distal teacher. Cognitive Science, 16(3), 307-354.
[7]. Nissen, M.J., & Bullemer, P. (1987). Attentional
requirements of learning: Evidence from performance measures. Cognitive
Psychology, 19, 1-32.
[8]. Wolpert, D.M., & Kawato, M. (1998). Multiple paired
forward and inverse models for motor control. Neural Networks, 11, 1317-1329.
In order to draw (or copy) a character, a process of linearising
takes place. In this process the complete static form of the character is
broken down into a temporal sequence of strokes for production. According to
Thomassen, Meulenbroek and Tibosh (1991), individuals develop their own
production rule base, which is reflected as tendencies or strategies for
graphic production. Often the sequence in which strokes are produced is
influenced by the direction in which script will be read (Alston and Taylor,
1987). In the case of writing that is produced from left to right, the sequence
of strokes usually commences at the leftmost point of the character, and
progresses through neighbouring strokes. However, in some cases characters are
produced starting with the topmost point or first vertical line of the
character. Occasionally, these principles of production come into conflict
resulting in a variable sequence of production for some characters. For
example, a letter 'T' can be produced starting with either the horizontal
crossbar or the vertical stroke (Richardson, 2000).
This work uses a connectionist
modelling approach to investigate the emergence of production-based behaviours
in the sequential production of characters (Richardson 2002). The aim was to
discover whether connectionist networks are capable of simulating a complex
task such as character production sequence, without the use of explicitly
imposed heuristics. Networks were trained using back-propagation through time
(Rumelhart, Hinton and Williams 1986). During training they were presented with
a static visual depiction of a character upon an input array and were required
to produce the correct sequence of strokes with which it was drawn. Following
this, the networks were tested using a batch of previously unseen characters.
Interestingly, the networks produced all characters using sequence production
behaviours similar to those seen in humans, specifically generating variable
production sequences for characters (such as 'T') with more than one viable
method of production. These results demonstrate that
not only are connectionist networks capable of emulating the
production-sequence behaviour of humans, but also that rule-like tendencies
can emerge naturally on the basis of learning experience.
1. Alston, J. and Taylor, J. (1987) The Sequence and Structure of
Handwriting Skills. Handwriting: theory, research and practice. New York:
Nichols.
2. Richardson, F.M., Davey, N., Peters, L., Done, D.J., Anthony,
S.H. (2002) Connectionist Models Investigating Representations Formed in the
Sequential Generation of Characters. Proceedings of the 10th European Symposium
on Artificial Neural Networks. D-side publications, Belgium, 83-88.
3. Richardson, F.M. (2000). Stroke properties of capital letters
of the alphabet. Technical Report. University of Hertfordshire.
4. Rumelhart, D.E., Hinton, G.E., Williams, R.J. (1986). Learning
internal representations by error propagation. In Parallel Distributed
Processing, vol. 1, chap 8. Cambridge: MIT Press.
5. Thomassen, A., Meulenbroek, R., and Tibosh, H. (1991).
Latencies and kinematics reflect graphic production rules. Human Movement
Science, 10, 271-289.
Contemporary robots are predominantly single-task systems,
operating in specially-designed environments, in which they perform
pre-programmed sequences of actions. While such systems have proved appropriate
and useful on the factory floor, many future applications for robotics will
require control systems with much greater flexibility. An effective strategy
for designing flexible control systems is to reverse-engineer the biological
control systems implemented in animals.
The 'Psikharpax' project is a
multi-partner program which aims at synthesizing an 'artificial rat', whose
control systems are modelled on neural circuits of a specific model animal, the
laboratory rat. A control architecture for action selection, inspired by the
neural circuits of the basal ganglia (a group of subcortical nuclei of the
vertebrate brain), has already been implemented in a robot which was required to
select efficiently between a set of actions in order to 'survive' in an
environment where it could find places for 'ingestive' and 'digestive'
behaviours. The results showed that the model was able to generate adaptive
switching which ensured survival. Nevertheless, the lack of a navigation system
occasionally prevented the robot from reaching essential environmental
resources.
The present paper concerns the
connection of this architecture with a navigation system. This connection is also
inspired by recent hypotheses concerning the basal ganglia. A particular
nucleus, the striatum, which was formerly modelled as a whole, is now
segregated into dorsal and ventral parts, the latter corresponding to the
nucleus accumbens (Nacc) that is assumed to integrate spatial, motivational and
sensorimotor information. The Nacc selects locomotor actions generated by
various navigation strategies and modulated by the motivations. The dorsal
striatum is in charge of non-spatial task selection and of coordination
with the Nacc selection. Implemented in a simulated robot performing the same
task as before, the whole architecture improves the robot's survival in a large
environment, using its abilities to build a cognitive map and to plan
paths. Furthermore, the robot is also able to occasionally neglect the information
recorded in its cognitive map in order to behave opportunistically, i.e. to
reach an unexpected but visible resource, rather than to reach a memorized but
remote one. These results are discussed from the perspective of the pros and
cons of such a biomimetic approach.
Three- to 4-month-old infants presented with cat or dog images
form a category representation for Cat that excludes dogs and a category
representation for Dog that includes cats (Quinn, Eimas, & Rosenkrantz,
1993). We have accounted for this asymmetry by positing an inclusion
relationship in the distribution of features present in the cat and dog images
(Mareschal, French, & Quinn, 2000). Using a combination of computational
modeling and experimental testing of infants, we show that the asymmetry can be
reversed or removed by selecting and manipulating stimulus images that reverse
or remove the inclusion relationship. The findings suggest that categorization of
cat and dog images by young infants is a bottom-up driven process based on
learning occurring within the experimental task. We will also discuss how this
work can be fitted into a more neurobiologically plausible framework using
Gabor-filtered input, and various possible future directions for this research.
It has been found that infants at 3-4 months of age show an
asymmetry in their construction of categories of cats and dogs (Quinn, Eimas
& Rosenkrantz, 1993). 3-4 month old infants familiarized with pictures of
cats respond with interest when subsequently presented with a picture of a dog,
indicating that they have formed a category of cats that excludes dogs. By
contrast, when familiarized with pictures of dogs, the infants do not show
increased interest to subsequent cat pictures, indicating that their category
of dogs includes novel cats. It has previously been argued that this asymmetry
is based on a featural analysis of the stimuli: the characteristic features of
dogs vary broadly and in many cases cover the more narrow range of variation in
the same features of cats. This hypothesis has been modelled with an auto-encoder
neural network (Mareschal, Quinn & French, 2002). However, this model had
two weaknesses: first, it relied on a training regime in which all
familiarization stimuli were presented together before adaptation occurred.
This was different to the infant experiments, where pictures were presented
sequentially. Second, the model was unable to account for developmental
change, namely, the disappearance of the asymmetry in older infants.
Here we present a new model of
categorization that overcomes these problems. The model is an auto-encoder
neural network with Gaussian hidden units. The model implements the
'Representational Acuity Hypothesis' which states that objects are represented
on cortical maps in terms of their salient features, and during development
these representations pass from low to high acuity based on a decrease in
neural receptive field sizes. Decrease of receptive field size is modelled with
a decrease in the width of the Gaussian hidden unit activation function. This
model has previously been shown to account for the developmental trajectory in
other perceptual categorization tasks (Westermann & Mareschal, in press).
In contrast to a featural account of the categorization asymmetry, the model
represents objects holistically, albeit with progressively higher degrees of
acuity. With large receptive fields, representations of dogs cover those of
cats, but not vice versa. During development, receptive field sizes shrink and
become better tuned to individual objects, so that representations no longer
overlap and the asymmetry disappears. In this way, the model can account for
the categorization asymmetry in young infants and its disappearance in older
infants. The model thus presents a biologically plausible account of
behavioural change in infancy.
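The effect of shrinking receptive fields can be illustrated with Gaussian units on a single toy feature dimension. This sketch is not the auto-encoder itself: it assumes one Gaussian unit centred on each familiarised exemplar, hypothetical feature values in which dogs span a wider range than cats, and a simple "familiarity" score equal to the strongest unit response.

```python
import math

def gaussian_activation(x, centre, width):
    """Response of a Gaussian unit with the given receptive-field width."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))

def familiarity(x, centres, width):
    """Strongest response among units centred on the familiarised exemplars."""
    return max(gaussian_activation(x, c, width) for c in centres)

# Hypothetical one-dimensional feature values: dogs vary broadly, cats narrowly.
dogs = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
cats = [4.0, 5.0, 6.0]
novel_cat, novel_dog = 4.5, 9.0

wide, narrow = 2.0, 0.15  # large versus small receptive fields
cat_after_dogs_wide = familiarity(novel_cat, dogs, wide)
dog_after_cats_wide = familiarity(novel_dog, cats, wide)
cat_after_dogs_narrow = familiarity(novel_cat, dogs, narrow)
dog_after_cats_narrow = familiarity(novel_dog, cats, narrow)
```

With wide fields the novel cat looks familiar after dogs but the novel dog does not look familiar after cats. With narrow fields both test items look novel, so the asymmetry disappears.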
We present a computational model of the emergence of gaze-following
behavior in infant-caregiver interactions. We regard gaze following
as a skill that infants acquire because they learn that monitoring their
caregiver's direction of gaze allows them to predict where interesting
objects/events in their environment are (Moore, 1996). In particular, we
propose a specific "basic set" of mechanisms that are sufficient for
gaze following to emerge (Fasel et al., 2002). This basic set comprises
perceptual and motivational biases and habituation mechanisms driving the
infant to look at and shift attention between "interesting" visual
stimuli, a generic learning mechanism that learns behavioral strategies to
satisfy these preferences, and a structured environment providing correlations
between where caregivers look and where interesting stimuli are. We formalize
these ideas in a simple model based on temporal difference learning. We analyze
the model and demonstrate that a) the proposed basic set of mechanisms is
indeed sufficient for gaze following to emerge and b) alterations of parameters
of some of the basic set mechanisms motivated by findings on developmental
disorders lead to impairments in the learning of gaze following that are
typical of these disorders.
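A stripped-down version of the learning problem can be written as a two-step temporal difference task. This is a hedged sketch rather than the authors' model: it keeps only the structured environment and a generic TD (Q-learning) update, and it omits the perceptual biases and habituation mechanisms. The probabilities `p_gaze_valid` and `p_chance`, the learning parameters and the uniform exploration policy are assumptions.

```python
import random

LOOK_AT_CAREGIVER, LOOK_ELSEWHERE, FOLLOW_GAZE = 0, 1, 2

def reward(probability, rng):
    """1.0 if an interesting stimulus is found at the fixated location."""
    return 1.0 if rng.random() < probability else 0.0

def train(n_episodes=5000, p_gaze_valid=0.8, p_chance=0.25,
          alpha=0.05, gamma=0.9, seed=0):
    """Learn action values with one-step TD updates and uniform exploration."""
    rng = random.Random(seed)
    q_start = {LOOK_AT_CAREGIVER: 0.0, LOOK_ELSEWHERE: 0.0}
    q_caregiver = {FOLLOW_GAZE: 0.0, LOOK_ELSEWHERE: 0.0}
    for _ in range(n_episodes):
        first = rng.choice(list(q_start))
        if first == LOOK_ELSEWHERE:
            target = reward(p_chance, rng)
        else:
            second = rng.choice(list(q_caregiver))
            p = p_gaze_valid if second == FOLLOW_GAZE else p_chance
            q_caregiver[second] += alpha * (reward(p, rng) - q_caregiver[second])
            # No immediate reward for looking at the caregiver; its value is
            # bootstrapped from the best follow-up action.
            target = gamma * max(q_caregiver.values())
        q_start[first] += alpha * (target - q_start[first])
    return q_start, q_caregiver

q_start, q_caregiver = train()
```

When the caregiver's gaze predicts interesting locations better than chance, looking at the caregiver and then following gaze acquires the highest value. Lowering `p_gaze_valid` towards `p_chance` is one way to explore conditions under which gaze following fails to emerge.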
Autism is a complex and pervasive developmental disorder affecting
social interaction, communication and behavioural flexibility. Cognition,
perception and emotion have all been implicated in theories of autism. There
are many theoretical accounts for this condition including executive disorders,
failures in interpersonal emotional connectedness, poor theories of mind, weak
coherence in low-level cognitive processes and extreme patterns of a
"male-brain" cognitive style.
Recently, several connectionist
approaches have attempted to characterise some of the aspects of autism. They
can all be linked to accounts claiming that autism is caused by weak central
coherence. Cohen (1994) has
invoked large hidden layers in feed-forward networks as a mechanism for
over-fitting and thus a lack of generalisation in autism. Gustafsson (1997) has
suggested that excessive lateral inhibition causing poor and restricted
topographical maps in a Kohonen network could be a cause of over-specific
categorisation. McClelland (2000) has implicated an excessive reliance on
conjunctive encoding as a cause of over-specificity. O'Loughlin and Thagard
(2000) employ symbolic networks to show how a disruption in constraint
satisfaction can cause seemingly autistic behaviour in a linguistic task and a
task requiring a "theory of mind".
Most of these models only cover
very specific aspects of autism and fail to model how low-level deficits
develop into a number of linked high-level problems. However, connectionist
modelling is likely to be very useful for understanding autism (and other
developmental disorders) because such models can be observed as they develop, in a way
that no verbal description allows.
This paper will review these models
and other relevant approaches to development and ask how mutually incompatible
they are. Possible syntheses and new ways forward will be suggested including
combinations of learning rules and the implications of new work that has
implicated deficits in binding in autism. If autism is caused by poor binding
mechanisms, we can ask whether it is possible to model this purely representationally
or whether mechanisms such as temporal synchrony must be invoked.
Our study is aimed at detecting the factors influencing perceptual
feature creation. By teaching several new perceptual categories in a controlled
setting, we demonstrate the emergence of a new intermediate internal
representation. We focus on contrasting the role of two basic factors that
govern feature creation. The first is the feature-set's discriminative value
measured by the mutual information the feature-set delivers on the required
perceptual categories. The second factor is the feature-set's degree of
parsimony. In our experimental process we use a simple model problem that
requires learning categories based on a conjunction of four binary input
elements. If the feature's discriminative information is the sole dominant
factor in the feature creation process, the more informative quadruple features
should emerge. On the other hand, if feature-set parsimony plays a major role in
the feature creation process, the internal representation of the learned
categories should be based on pair-features, conjunctions of two input
elements. Several methods of exploring the structure of internal features are
developed using a multi-layer perceptron (MLP) artificial neural network.
These methods were applied empirically in two experiments, both
demonstrating the emergence of an intermediate internal representation that is
based on pair-features. As a result, we conclude that in addition to the
feature-set's discriminative information value, the feature-set's parsimony is
a major factor influencing the hierarchical process of feature creation. Our
results suggest that feature parsimony is maintained not only in order to
optimize the perceptual system's current resource management but also to aid
future category learning.
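The discriminative value of a feature, measured as mutual information with the category, can be computed directly for a toy version of the task. The sketch assumes a single category defined as the conjunction of all four binary inputs, with all 16 input patterns equally likely. The actual category structure used in the experiments is not given in the abstract.

```python
import math
from collections import Counter
from itertools import product

def mutual_information(pairs):
    """I(X; Y) in bits for a list of equiprobable (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )

inputs = list(product([0, 1], repeat=4))
category = [int(all(bits)) for bits in inputs]        # assumed: AND of all four
quadruple = [int(all(bits)) for bits in inputs]       # four-way conjunction
pair = [int(bits[0] and bits[1]) for bits in inputs]  # two-way conjunction

mi_quadruple = mutual_information(list(zip(quadruple, category)))
mi_pair = mutual_information(list(zip(pair, category)))
```

Under these assumptions the quadruple feature carries all of the category information and a single pair-feature carries less, so a preference for pair-features cannot be explained by discriminative value alone.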
Results from neurophysiological studies (Gollisch & Herz, in
press) suggest that the energy spectrum (i.e., the square of the amplitude
spectrum) can be used to simulate, in a physiologically appropriate manner, the
spectral integration of sensory neurons. We have attempted to show the
effectiveness of energy-spectrum descriptors for neural network simulations of a
high level cognitive task. We used a neurobiologically plausible simulation of
the complex cells in the striate cortex as a perceptual model. This was done
with a bank of Gabor wavelets applied in the Fourier domain in order to
simulate mammalian visual processes (Jones & Palmer, 1987; Jones, Stepnoski
& Palmer, 1987). The energy-value outputs of this perceptual model were
presented to two different kinds of neural network classifiers and their respective
performances were recorded for a recognition/classification task. The stimuli
used correspond to 6 categories of 12 natural scene images: Beach, City,
Forest, Mountain, Indoor and Village. In the first simulation, a
backpropagation autoencoder was reliably able to distinguish a specific
stimulus from other exemplars of the same or different categories. Using a
method proposed by Mareschal & French (1997), we also compared the
classification performance of the autoencoder for unseen exemplars from the
training category versus stimuli from new categories. Categorization is
performed in a natural (i.e., self-supervised) manner. In a second simulation,
we tested the energy vectors with a standard
backpropagation hetero-associator. Both results show a reliable ability of the
two types of neural networks to categorize and generalize on new exemplars
based on the information provided by the energy spectrum of the natural scene
images.
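As an illustration of the descriptor used above, the energy spectrum of a signal is the squared magnitude of its Fourier coefficients. The following is a minimal one-dimensional sketch using a naive discrete Fourier transform. It does not reproduce the Gabor wavelet bank of the study, and the function name `energy_spectrum` and the cosine test signal are assumptions made for the example.

```python
import cmath
import math


def energy_spectrum(signal):
    """Return the energy spectrum |X[k]|^2 of a real 1-D signal (naive DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        # DFT coefficient for frequency bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        # Energy is the squared amplitude of the coefficient.
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Hypothetical input: one cosine cycle sampled at 8 points.
samples = [math.cos(2 * math.pi * t / 8) for t in range(8)]
energies = energy_spectrum(samples)
```

By Parseval's relation, the energies summed over all bins and divided by the number of samples equal the sum of the squared signal values, which gives a quick sanity check on such a descriptor.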
In an attempt to demonstrate underlying brain mechanisms,
psychophysical data, from an experiment designed to further understand object
feature binding, are modelled using an attractor network. The experiment used a
post-cue rapid serial visual processing paradigm in each condition: spatial,
temporal and spatio-temporal. In the spatial condition, objects were presented
simultaneously at fixed, unique positions, adjacent to or distant from the
target; in the temporal condition, objects were presented sequentially at the
focal point; and in the spatio-temporal condition, objects were presented
sequentially at fixed, unique positions. The stimulus onset asynchrony was set
for each condition and each observer during practice trials so as to yield an
error rate of about 50%. Results indicate that when
objects are distinguishable by their spatial location, these spatial properties
are used preferentially to temporal properties. When both spatial and temporal
properties are available, spatial properties facilitate binding, and temporal
properties hinder it. Our expectation is that the neural model, in line with
physiological evidence of enhanced cortical activation, will exhibit regions of
enhanced activity when the post-cue is input and the target will be retrieved.
By testing the experimental hypotheses on the neural model, we expect to gain a
better understanding of how and where object feature binding takes place.
Attention is arguably a necessary condition for consciousness. The
problem as to what more is needed will be addressed in the talk by using an
engineering control approach to attention, following much work on such an
approach in motor control. After a brief account of the control nature of
attention, an engineering control model for attention will be developed for
both sensory and motor modalities. Simulations of various paradigms will be
described [JGT & M Rogers (2002) Neural Networks 15:309-326; N Fragopanagos
& JGT (2003) Simulating Sensory Motor Attention, KES'03]. The model will
then be extended to the COrollary Discharge of the Movement of Attention
(CODAM) model [JGT (2002) Trends in Cognitive Sciences 6:206-210; JGT (2002) J
Consc Studies 9(4):3-22; JGT (2003) KES'03]. The application of CODAM to a
variety of areas (simulations of psychological paradigms, such as the
Attentional Blink, schizophrenia, neglect, other deficits) will be covered
briefly in conclusion.
We have recently published SAIM (Selective Attention for
Identification Model; Heinke & Humphreys, 2003, Psych. Review), which uses
competitive processes and interactions between top-down and bottom-up processes
to achieve translation-invariant object recognition. We have shown that SAIM
can account for large amounts of empirical data on normal and impaired human
attention. In this paper we report on an extension of SAIM with a simple feature
extraction stage. We show that the new version of SAIM can capture important
aspects of findings in visual search experiments, e.g. variations of search
slopes and search asymmetries, depending on the contents of search displays. These
simulations represent further support for the hypothesis that parallel
competitive processes could be a source for major findings in visual search
experiments. Additionally we present experimental data examining the influence
of priming on reaction times in visual search. These results confirm
predictions which originate from the interaction between top-down and bottom-up
pathways in SAIM.
A biologically plausible oscillatory neural model of several
cognitive functions is developed. The system architecture includes a hierarchy
of interactive modules which are associated with different stages of visual
information processing:
· Representation Module (RM): information encoding, feature extraction and
  representation, feature binding.
· Central Executive (CE): attention focus formation, consecutive selection of
  objects in the visual field into the attention focus.
· Memory Module: memorisation of objects, novelty detection.
Each module consists of interactive phase oscillators with
synchronising and desynchronising interactions. The dynamical functioning of
the system is based on the principle of synchronisation and resonant increase
of activity. A phase-frequency coding scheme is used in the primary layer of
information representation: the grey level of the corresponding pixel is
represented by the natural frequency of the oscillator. There are other layers
of the RM to represent local (in the neighbourhood of the pixel) and invariant
features of objects. The mechanism of feature coding and feature binding is
based on the principle of neural activity synchronisation.
Attention is realised in the system
in the form of synchronisation of the CE with some of the RM oscillators. Those
oscillators that work synchronously with the CE are taken to be included in
the attention focus, and their amplitudes increase drastically, forming a
resonant ensemble of oscillators. These resonant oscillators demonstrate the
regime of partial synchronisation (some of the RM oscillators work
synchronously with the CE, but not all oscillators are synchronised). The
inclusion of the CE in the system design is important because it allows one to
organise the global interaction in the system without connections of
"all-to-all" type, thus avoiding an exponential explosion of the
number of lateral connections. At each moment of time the activity of an ensemble
corresponding to a selected object is significantly higher than the activity in
other regions of the network. We demonstrate that the system can perform both
attentional grouping and consecutive selection of objects. The functioning of
the attention system is based on a phase-locking mechanism and adaptation of
the natural frequency of the CE.
The memory mechanism is based on a sparse spatial representation of objects by
groups of synchronous oscillators and on adaptation of their natural
frequencies. Novelty detection is realised via different reaction times to new
and familiar stimuli: a tonic (long) response to a new stimulus and a phasic
(short) response to a familiar object.
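The partial synchronisation between the CE and the RM oscillators can be illustrated with a small Kuramoto-type simulation. This is only a sketch under assumed dynamics: the sine coupling, the star-shaped connectivity, the parameter values and the name `simulate_star` are illustrative choices, not the equations of the model, and both the adaptation of the CE's natural frequency and the amplitude dynamics are left out.

```python
import math


def simulate_star(omegas, omega_ce, coupling, dt=0.01, steps=20000):
    """Euler-integrate RM phase oscillators coupled only to a central one.

    Returns the mean frequency of the CE and of each RM oscillator,
    measured over the second half of the run.
    """
    n = len(omegas)
    theta = [0.0] * n
    theta_ce = 0.0
    half = steps // 2
    start = None
    for step in range(steps):
        if step == half:
            # Remember the phases at the start of the measurement window.
            start = (theta_ce, list(theta))
        # Each RM oscillator is pulled towards the CE phase.
        d_theta = [w + coupling * math.sin(theta_ce - th)
                   for w, th in zip(omegas, theta)]
        # The CE is pulled by the mean influence of the RM oscillators.
        d_ce = omega_ce + coupling / n * sum(
            math.sin(th - theta_ce) for th in theta)
        theta = [th + dt * d for th, d in zip(theta, d_theta)]
        theta_ce += dt * d_ce
    elapsed = (steps - half) * dt
    ce_freq = (theta_ce - start[0]) / elapsed
    rm_freqs = [(th - th0) / elapsed for th, th0 in zip(theta, start[1])]
    return ce_freq, rm_freqs


# Hypothetical natural frequencies: three close to the CE, two far away.
omegas = [5.0, 5.2, 4.9, 9.0, 1.0]
ce_freq, rm_freqs = simulate_star(omegas, omega_ce=5.0, coupling=1.0)
# Oscillators whose mean frequency matches the CE count as "in focus".
in_focus = [abs(f - ce_freq) < 0.05 for f in rm_freqs]
```

Because every RM oscillator is connected only to the central unit, the number of connections grows linearly with the number of oscillators, which is the design point made in the abstract about avoiding all-to-all coupling.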
A set of experiments that employ rapid serial visual presentation
(RSVP) have clarified the time course of selective attention by identifying what
has come to be called the attentional blink (AB). Many theoretical explanations
of the phenomenon and associated informal models have been advanced. However,
until recently, no connectionist model had been developed. In response we have
constructed two prototype neural network models of the blink.
In the typical AB experiment,
letters are presented using RSVP at around ten items a second. One letter (T1)
is presented in a distinct colour. It is the (first) target whose identity must
be reported. A second target (T2) follows after some number of intervening
items. For example, the person may have to report whether the letter X was
among list items that followed T1. Detection of T2 is impaired with a
characteristic serial position curve.
Within the context of theoretical
accounts of the blink, Chun and Potter's 2-stage model [1] has been our most
direct inspiration. Their first stage implements a fast identification process,
traces of which decay rapidly and are subject to erasure. It is only through
the second stage that items can be consolidated into working memory.
Both our models are based upon this
two-stage account (although a symbolic computational model [2] has also
influenced us). The first model employs an attentional control system. This
contains two neural pathways, one that feeds activation into the first
recognition stage and the second that feeds activation into the second stage.
Critically, however, the two pathways compete through lateral inhibition, and
thus only one of them can be active at any instant. It is this attentional
control mechanism that generates the blink.
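The competition between the two pathways can be pictured with two leaky rate units that inhibit each other. The sketch below is an assumption-laden toy, not the authors' network: the piecewise-linear activation, the inhibition strength and the input values are all hypothetical, chosen only to show that mutual inhibition stronger than the self-decay leaves a single pathway active.

```python
def clip01(x):
    """Piecewise-linear activation bounded between 0 and 1."""
    return max(0.0, min(1.0, x))


def compete(input_1, input_2, inhibition=2.0, dt=0.1, steps=500):
    """Two leaky units that inhibit each other; returns their final activity."""
    x1 = x2 = 0.0
    for _ in range(steps):
        # Each unit is driven by its input minus inhibition from the other.
        dx1 = -x1 + clip01(input_1 - inhibition * x2)
        dx2 = -x2 + clip01(input_2 - inhibition * x1)
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2


# Hypothetical inputs: pathway 1 receives the stronger drive.
pathway_1, pathway_2 = compete(input_1=1.0, input_2=0.6)
```

With these values the more strongly driven unit saturates while the other is silenced, and swapping the inputs swaps the winner.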
The second model is more explicit
in its explanation of the stage 2 process of consolidation into working memory.
Specifically, it assumes that in order to consolidate T1 items into working
memory, a resonating circuit has to be set up between working memory cells and
recognition cells. It is this resonating activity that closes the door on T2
processing.
Although very simple, both
models successfully reproduce the AB serial position curve, showing a clear
decline in performance for serial positions 2, 3, 4 and 5. In addition, the
networks show the characteristic property of lag-1 sparing, i.e. if T2 immediately
follows T1, there is no performance cost. Our second model also reproduces a
number of the other AB properties, e.g. masking effects.
[1] Chun, M. M. and M. C. Potter (1995). "A Two-Stage Model for Multiple Target
Detection in Rapid Serial Visual Presentation." JEP: HPP 21(1): 109-127.
[2] Barnard, P.J. and H. Bowman (2003) "Rendering information processing models
of cognition and affect computationally explicit: Distributed executive control
and the deployment of attention." Cog Sci Quart, 3(3):32, (in press).
We present a connectionist model of the development of analogical
reasoning capable of modelling relational analogies between domains of two or more
objects. In the model, relational analogy completion arises as a by-product of
pattern completion in a dynamic memory system. The current model is an
adaptation of Leech, Mareschal and Cooper (2003), extending it to draw more
complex and abstract analogies. Units are connected by two types of modifiable
connections: fast connections which transmit the current activation of the
units and slow connections which implement a delay transmitting an earlier
activation state of the network. The fast connections drive the network into
attractor states corresponding to objects. The slow connections implement
transformations between states by pushing the network out of its stable state
and into another attractor basin. The fast and slow connections work together to
move the network from one attractor state to another in an ordered way.
Relations are assumed to be transformations between different states of
objects. Analogy is achieved by an earlier example of a relation priming the
network so that subsequent presentation of an object produces the appropriate
analogical response. Since the network can learn transformations between more
than two objects, it can draw analogies involving more than two
objects. We investigate the developmental plausibility of the network by
comparing its performance with that of 3- and 4-year-olds on transitive
analogies reported by Goswami (1995).
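The interplay of fast and slow connections can be sketched with a small sequence-generating attractor network. This is an illustration under explicit assumptions, not the model itself: the weights here are fixed Hebbian outer products rather than learned connections, the three orthogonal patterns stand in for object states, and the names `run`, `tau` and `slow_gain` are hypothetical.

```python
def outer_sum(pairs, n):
    """Sum of outer products target * source^T, scaled by 1/n."""
    return [[sum(t[i] * s[j] for t, s in pairs) / n for j in range(n)]
            for i in range(n)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def run(patterns, tau=3, slow_gain=2.0, steps=12):
    """Return the states visited after starting in the first pattern."""
    n = len(patterns[0])
    # Fast weights: each pattern is an attractor (auto-association).
    fast = outer_sum([(p, p) for p in patterns], n)
    # Slow weights: each pattern points to its successor (a transformation).
    slow = outer_sum(list(zip(patterns[1:], patterns[:-1])), n)
    states = [list(patterns[0])] * (tau + 1)
    for _ in range(steps):
        now = matvec(fast, states[-1])
        # The slow connections see the state from tau steps earlier.
        delayed = matvec(slow, states[-1 - tau])
        states.append([1 if a + slow_gain * b >= 0 else -1
                       for a, b in zip(now, delayed)])
    return states[tau + 1:]


# Hypothetical object states: three mutually orthogonal +/-1 patterns.
patterns = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
trajectory = run(patterns)
```

In this toy setting the fast weights hold the network in the current pattern, while the delayed slow weights push it into the next one, so the patterns are visited in order and the last one, which has no successor, remains stable.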
This paper provides a connectionist account of the processes
underlying the multiple inference model of person impression formation proposed
by Reeder, Kumar, Hesson-McInnis and Trafimow (2002). First, in a replication
and extension of one of their main studies, I found evidence for discounting of
trait inferences when facilitating situational forces were present, consistent
with earlier causality-based theories, while at the same time I replicated the
lack of discounting in moral inferences as documented and predicted by Reeder
et al. (2002). Second, to provide an account of how these different and
sometimes contradictory inferences are formed and integrated into a coherent
person impression, I extended existing recurrent network models of person
perception by assuming that perceivers also take into account the actor's
motives. Together with the behavior information,
the connectionist network automatically integrates these inferred motives,
resulting in a pattern of inferences that closely reproduces the observed
data. It is concluded that perceivers
apparently have a much richer knowledge on which they base their inferences
than assumed so far in earlier theories, and that a connectionist approach
appears a plausible candidate to account for this complex integration process.
Computer simulations of the nervous system play an increasingly
prominent role in understanding the way neurons process information. Spiking
neural networks received special attention after experimental evidence
suggested that biological neurons use the timing of the spikes to encode
information and compute. They represent a powerful tool for investigating how
cognitive functions emerge from the properties of basic components that
interact and function cooperatively.
When modelling the dynamics of
large-scale spike-processing networks, time and memory efficiency are crucial
criteria in the design of the simulation. Previous work on the efficient
simulation of such networks has indicated that high performance simulators for
rate-coding networks are not appropriate. The event-driven nature of spiking
neural networks requires special attention when designing efficient simulation
environments. Significant efforts have been made in the last decade to maximise
the computational efficiency of these simulators.
This paper critically reviews
current work on data structures and algorithms, appropriate for efficient
event-driven simulation of spiking neural networks on single processor systems.
These techniques make feasible the simulation of highly active large networks.
However, they place limits on the networks that they can process. We describe
two additional algorithms which we have developed in an attempt to overcome
these limitations. Moreover, simulation studies show that these algorithms
deliver significant performance improvements.
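To make the idea of event-driven simulation concrete, here is a minimal sketch built on a priority queue of spike-arrival events. It is a generic textbook scheme, not one of the two algorithms proposed in this paper. The leaky integrate-and-fire neuron with instantaneous synaptic jumps, the parameter values and the function name `simulate` are assumptions made for the example.

```python
import heapq
import math


def simulate(weights, delays, initial_spikes, tau=10.0, threshold=1.0,
             t_max=100.0):
    """Event-driven simulation of leaky integrate-and-fire neurons.

    weights[i][j] and delays[i][j] describe the synapse from neuron i to j.
    initial_spikes is a list of (time, target, weight) input events.
    Returns a list of (time, neuron) output spikes in time order.
    """
    n = len(weights)
    potential = [0.0] * n
    last_update = [0.0] * n
    queue = list(initial_spikes)
    heapq.heapify(queue)
    spikes = []
    while queue:
        t, j, w = heapq.heappop(queue)
        if t > t_max:
            break
        # Decay the membrane potential lazily, only when an event arrives.
        potential[j] *= math.exp(-(t - last_update[j]) / tau)
        last_update[j] = t
        potential[j] += w
        if potential[j] >= threshold:
            potential[j] = 0.0
            spikes.append((t, j))
            # Schedule the delayed arrival of this spike at every target.
            for k in range(n):
                if weights[j][k] != 0.0:
                    heapq.heappush(
                        queue, (t + delays[j][k], k, weights[j][k]))
    return spikes


# Hypothetical two-neuron chain: neuron 0 excites neuron 1 after 2 time units.
weights = [[0.0, 1.2], [0.0, 0.0]]
delays = [[0.0, 2.0], [0.0, 0.0]]
inputs = [(1.0, 0, 0.6), (2.0, 0, 0.6)]
spikes = simulate(weights, delays, inputs)
```

No computation is spent on neurons between events: each state is brought up to date only when a spike reaches it, which is what distinguishes this style from fixed time-step simulation of rate-coding networks.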
We explore two effects reported in visual word recognition: the
neighbourhood effect and the transposed-letters effect. In the first, the
recognition of a word from a larger lexical neighbourhood is facilitated. In
the second, minimally different words like "bolt" and
"blot" seem to interact during the recognition of one of them. We
show that both of these effects can be captured by the same, anatomically-based
approach to visual word recognition.
Native English speakers include irregular plurals in English
noun-noun compounds (e.g. mice chaser) more frequently than regular plurals
(e.g. *rats chaser) (Gordon, 1985). This dissociation in inflectional
morphology has been argued to stem from an internal and innate morphological
constraint as it is thought that the input to which English speaking children
are exposed is insufficient to signal that regular plurals are prohibited in compounds
but irregulars might be allowed (Marcus, Brinkmann, Clahsen, Wiese &
Pinker, 1995). In addition, this
dissociation in English compounds has been invoked to support the idea that
regular and irregular morphology are mediated by separate cognitive systems
(Pinker, 1999). The evidence of four recurrent connectionist models provides
support for an alternative view that the input the language learner is exposed
to constrains the types of English compounds that are produced. Model 1
demonstrates that there is a discernible relationship between the [-s] morpheme
and word finality in child directed speech. Thus to include the regular plural
[-s] morpheme internal to words such as compounds contravenes an obvious
pattern in the input. Models 2, 3 and 4 investigate the hypothesis that the
regular plural morpheme is omitted from the middle of compound words because
the pattern noun morpheme [-s]-noun is used to denote possession not plurality
in English. Having made a first-order distinction between the functions of
various words (nouns, verbs, determiners and adjectives), Model 2, using a
localist coding scheme to represent sentences made up of 38 words (cf. Elman,
1990), learnt a second-order distinction: that nouns could appear after some
[-s] morphemes but not others (even though the two [-s] morphemes were encoded
in the same way in the input). With the addition of the absolute minimum of
semantics (whether the subject of the sentence was a singular or a plural
thing), the model learnt to further differentiate between the plural and the
possessive [-s] morpheme (Model 3) (Hayes, Murphy, Davey, Smith and Peters,
2002). In Model 4, a large training set of natural child directed speech was
employed and the syntactic category of each word (rather than the word) was
input. The actual frequency of each syntactic category in real child directed
speech was represented. Under these realistic input conditions there was a
suggestion that the network was able to recognise that the noun morpheme [-s]
pattern occurred in different patterns when it was plural than when it was
possessive. Specifically, the network showed some indication of being able to
discern that nouns follow possessives but not regular plurals. Thus it is
argued that the input the language learner is exposed to does constrain the types
of English compounds that are produced without the need for internal
morphological constraints.
* Gordon, P. (1985). Level-ordering in lexical development.
Cognition, 21, 73-93.
* Hayes, J.A., Murphy, V.A., Davey, N., Smith, P.M., & Peters,
L. (2002). The /s/ morpheme and the compounding phenomenon in English. In W.
D. Gray & C.D. Schunn (Eds.), Proceedings of the 24th Annual Conference of the
Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.
* Marcus, G. F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S.
(1995). German inflection: The exception that proves the rule. Cognitive
Psychology, 29, 189-256.
* Pinker, S. (1999). Words and Rules. London: Weidenfeld & Nicolson.
Large-scale distributed connectionist models of language
processing require distributed representations of orthographic and
phonetic/phonological word forms that capture the similarity
structures among all words in a language. Ideally, such representations should
themselves be learned from the input, without resorting to hand-crafted schemas
such as slot-based templates (e.g., CCCVVCCC, ...) that impose a great amount
of structure on the networks in advance.
Additionally, templates ignore the fundamentally sequential nature of human
language, introducing arbitrary restrictions on word length and morphological
complexity. Finally, template-based approaches require artificial manipulations
of the input by means of alignment and introduction of `gaps'. It has been
argued that such manipulations of the original input sequences implicitly
assume some sort of symbolic preprocessing, thereby undermining the sub-symbolic
processing assumptions. Elman (1990; 1993) showed how a Simple Recurrent
Network (SRN), trained on next-letter or next-phoneme prediction on a
sufficiently large sample of language, developed in its hidden layer detailed
representations of the phonotactics and orthotactics of a particular language.
SRNs overcome most of the problems described above; they provide for a natural
way to represent strings of virtually unlimited length without having to resort
to predefined `possible' structures. In the present study we show how the
weighted accumulation over time of the activation values of neural networks
trained on next-letter or next-phoneme prediction renders detailed
representations of the orthographic and phonetic form of all words (and also
of pseudo-words) in a language. These representations render detailed measures
of word similarity that provide a continuous alternative to discrete-valued
techniques for estimating phonetic and orthographic overlap between words, with
the added value that the similarity space takes into consideration the
properties of a particular language. Using this technique, Accumulation of
Expectations, we present examples of building detailed orthographic and
phonetic vectors for the full lexicons of English and Dutch. We show that these
vectors can be successfully used in connectionist models of lexical processing
with unrestricted vocabulary sizes, including classical tasks in connectionist
modelling such as past-tense formation and word spelling. We conclude by
showing that the similarity spaces defined by such representations accurately
predict human responses in behavioural experiments.
In recent years a number of models of speech segmentation, both
connectionist and non-connectionist, have been developed. Whilst the
non-connectionist models include unsupervised models, the connectionist models
have employed Simple Recurrent Networks (SRNs) trained with back-propagation
and thus requiring an explicit error signal. However, these models have used
networks trained on some other task and then exploited the behaviour of the
network to predict word boundaries making them only indirectly supervised.
In this talk, a connectionist
unsupervised model of speech segmentation, based on Kohonen's Self-Organising
Map, will be presented. The SOM is chosen because it does not need an explicit
error signal and is a biologically plausible architecture (e.g. both the
operation and training of the SOM correspond to processes known to occur in the
brain -- the training of SRNs with back-propagation does not), and thus whether
a SOM can become sensitive to the patterns in speech is of interest. During
training it is presented with phonotactic transcriptions of child-directed
speech taken from the Korman corpus of the CHILDES database. At the end of
training, the units which
are activated at the end of utterances are noted and then whenever these are
activated during testing, a word boundary is predicted. To date, this model has
achieved reasonable, though not state-of-the-art, results, with F-scores (an
aggregate of precision and recall) of 68.15 for finding word boundaries and
30.20 for finding words, when using purely phonetic input.
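For readers unfamiliar with the measure, the sketch below shows how an F-score for boundary detection is commonly computed. It assumes the usual definition (the harmonic mean of precision and recall, reported in percent), which the abstract does not spell out. The function name `boundary_scores` and the boundary positions are made up for illustration and are unrelated to the reported results.

```python
def boundary_scores(predicted, actual):
    """Precision, recall and F-score (in percent) for boundary positions."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    # Harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_score


# Hypothetical example: 3 of 4 predicted boundaries are correct,
# and 3 of the 5 true boundaries are found.
precision, recall, f_score = boundary_scores(
    predicted=[3, 7, 10, 15], actual=[3, 7, 12, 15, 20])
```

Scoring whole words is stricter than scoring boundaries, since a word counts as found only if both of its boundaries are correct with none in between, which is consistent with the word score being the lower of the two figures.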
A direct comparison of this model
with another connectionist model developed by Christiansen et al., based on the
SRN, is currently being worked on and will also be presented in the talk.
Christiansen et al.'s model is trained to read in utterances phoneme by phoneme
and predict the next phoneme or whether an utterance boundary occurs. The
behaviour of the output unit is then used to predict word boundaries, on the
basis that its activation is higher than for word internal positions. The
comparison will consider both the performance and the plausibility of the two
models and how they might be made more plausible and their performance
improved.
Finally, the question of why current connectionist models do not
perform as well as the best non-connectionist models (e.g. Brent's INCDROP
model) will be discussed, and it is intended that some results will be
presented based on an attempt to account for the performance gap.
Many aspects of human and animal behaviour require individuals to
learn quickly how to classify the patterns they encounter. For example, which
foods are good or safe to eat, which other animals should be feared, which
environments should be avoided, and so on. One might imagine that evolution by
natural selection would result in neural systems emerging that are very good at
learning things like this. Explicit simulations of the evolution of simple
developmental neural systems, that are required to classify various types of
sensory information, show that such rational behaviour can indeed emerge quite
easily. However, the same simulations also reveal that there are situations in
which evolution lets the species down, and populations emerge that perform
rather irrationally. These populations are effectively trapped in a local
maximum of evolutionary fitness, and are unable to escape into the true maximum
that corresponds to optimal behaviour. One can speculate on which aspects of
human behaviour this might correspond to. I shall present the results from a
selection of my simulations that begin to explore the issues involved.
We present here the theoretical framework of ongoing research. Our
aim is to realize an adaptive learning system based on neo-Darwinian evolution
of neural units. We proceed in two complementary directions. On the one hand, we
try to automate the costly tuning phase of the configuration and
learning parameters of neural networks (NNs). On the other hand, we use meiosis
cellular growth as a natural computation technique to bypass the palimpsest effects
observed when adding new knowledge to existing knowledge. The main idea is to build
an event guided growing competitive NN that develops while it learns to tune
other models' parameters. NNs are very complex processes, which develop their
structure in time. From real-time process control and analysis, we can conclude
that, when faced with overly complex processes, the usual learning algorithms reach
their limits. It thus becomes necessary to resort to more powerful methods. We
make the hypothesis that, among the existing NN algorithms, there is a
sufficient set of primitives from which a holistic constructive programming
scheme can emerge, by means of evolutionary techniques. We have chosen the
Self-Organizing Map (SOM) model because, as attested by a large number of
research publications, the model itself is currently evolving towards adaptive
variants. Moreover, it offers a visual interface of remarkable quality with the
data space, in a familiar and accessible two-dimensional, topologically
ordered representation. Last but not least, the number of configuration and
learning parameters it requires tuning is rather large.
Adaptive topologies are of crucial
interest for automatically determining the size of NNs. Oversized nets lead to
prohibitive training times, while undersized nets cannot capture the structure
of the data space. For back-propagation architectures, efficient growing and
pruning methods make it possible to find a near-optimal number of hidden units;
a review is given in [1]. For the SOM, things become more difficult because of
the topological
ordering constraints in the mapping it performs.
The remainder of this paper is
organized in three parts. We first investigate the main properties of the SOM
algorithm and its evolutionary growing variants. Within that framework, in the second
part, we derive a self-observing heuristic and a minimal set of properties
necessary to obtain the emergence of Darwinian evolution among elementary
constituents that are constrained only locally by a few deterministic rules. In the last
part, we extract a few neural primitives in order to implement a minimal system
capable of evolving toward self-observation, and set out the algorithm.
Human ability in spatial exploration plays an important role in
how humans understand space. This involves an implicitly developed
representation of the spatial layout usually in the form of so-called cognitive
maps. Despite its significance, the process of building such representations is
difficult to study directly.
Nonetheless, an overt indicator of the quality of spatial knowledge
acquisition is a user's performance on specific spatial navigation tasks. This
work proposes a hybrid connectionist-symbolic model for investigating and
extracting procedural and strategic rules governing human exploratory
behaviour. The experiments were carried out within a Virtual Environment (VE),
which due to its tractable characteristics allowed the moment-to-moment
recording of users' positions and headings. Simulations involved the use of a
simple recurrent network (SRN, Elman, 1990), and the application of the C5.0
method for automatic rule extraction (Quinlan, 1993). An SRN implements a form
of short-term memory, which makes it suitable for application to symbolic tasks
that have a sequential nature, such as language. The C5.0 method is based on
the ID3 algorithm developed by Quinlan (1993), which induces concepts from
examples. It is particularly interesting due to its representation of learned
knowledge and its heuristics for selecting candidate concepts. The SRN
successfully learned to predict the user's next position and orientation. The
prediction accuracy of greater than 67% suggested that the neural network
succeeded in acquiring the underlying regularities characterising user
trajectory patterns. The next step consisted in extracting the rules that
characterised human heuristics for exploring an unfamiliar environment. The
analysis of the representation in the SRN hidden layer suggested that distinct
groups of hidden units become specialised for place and direction. The
distributed representations acquired by SRN were investigated by analysing the
individual pattern error in the network prediction. According to their values,
the errors were classified as good predictions and poor predictions and were
input into the C5.0 method. This algorithm provided a set of rules which can be
expressed in symbolic form. The benefit of these results is both
theoretical and practical. The extracted rules can be used to confirm findings
in the area of environmental psychology. They can also be harnessed for
building adaptive VEs, designed to assist users with poor navigational skills
to improve their exploratory behaviour.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Quinlan, R. (1993). C5.0 Programs for Machine Learning. Morgan
Kaufmann.
Animate goal-directedness is characterised by highly directed,
persistent, and plastic action where paths are continually shifted towards the
goal.
This paper aims to provide both a potential
basis for improving heuristic search techniques and also new tools for future
psychological research.
Firstly it is argued that the use
of highly directed plastic persistence should provide a type of heuristic
search that contrasts favourably with existing search techniques such as
gradient descent and simulated annealing.
Study of this animate organisation
is also useful for more specific applications such as designing neural
controllers in robots to learn first time, i.e. with a single run of heuristic
search. Such an ability may be vital for a truly autonomous robot that may
otherwise get stuck in a hostile environment, and yet cannot re-initialise
itself physically elsewhere to try again. Likewise, re-initialisation of its
control may lead to disjoint and ineffective action.
The paper describes the testing of
a theory of how smooth shifts may be made, to see if it can elucidate
principles underlying smooth shifts in human movement. Specifically, an
experimental framework suggested in [1] is used to create an experiment for
measuring potentially constant shifts in goal-directed action. The experiment
is designed to be analogous to those of Renaissance scientists seeking to
establish constants in inanimate motion. It involves analysing hand movements
using a mouse that are similar to those used to make manual gearshifts in a car
and that involve a degree of free action.
A simple novel form of differential
calculus specially designed for measuring shifts is used to analyse the
movement and characterise the smooth shaping and deviation occurring in the
shifts. The shift shape is then discussed and put forward as being present in a
wide range of animate motion.
[1] Wale, A.P. and Weir, M.K., "Measurement and Design of
Goal-directed Behaviour", Proceedings of the 7th Neural Computation and
Psychology Workshop, World Scientific, 2002.
Saccade control belongs to the broad field of motor control and
sensorimotor coordination. For cognitive modeling in this area, adaptive
internal models (like inverse models) are of central interest. Inverse models
generate motor commands which transform the current into the desired sensory
state. For the learning of inverse models, two main problems arise: the missing
teacher signal, and the necessity to explore sensorimotor spaces. Several
solutions have been proposed, all of them limited in some respect. In the
present work, an alternative learning mechanism is developed for the example of
saccade control, implemented on a stereo vision robot camera head.
A saccade controller can be seen as
an inverse model for a constant sensory goal state in which a chosen target
region appears in the foveae of both eyes. In our model, the saccade controller
produces three motor parameters as output: pan and tilt of the camera head, and
software vergence. As input, it receives the current motor state, and the
coordinates of a selected foveation target in both the left and right camera
image.
The saccade controller is
implemented by a multi-layer perceptron. Training patterns are collected by
generating random saccades. The missing teacher problem is solved in the
following way: Whenever such a random saccade results in a shift of the target
position towards the center in both the left and right image, it is included in
the training set. After pattern collection, this set contains for the most part
movements resulting in over- or undershoots. In order to learn from these
imperfect examples, we exploit the averaging properties of multi-layer
perceptrons in our approach. Because the training set contains both over- and
undershoots, the trained controller network, which averages over them, performs
close to the optimum saccade.
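The averaging argument can be illustrated with a minimal Python sketch. It is a
toy and not the authors' implementation: it assumes a one-dimensional target
offset in a single image and uniformly drawn random saccades, and it swaps the
multi-layer perceptron for a one-parameter least-squares fit, which shares the
property of estimating the mean accepted movement for a given input. The names
collect_patterns and fit_gain are hypothetical.

```python
import random


def collect_patterns(n_trials, max_move=2.0, seed=0):
    """Keep random saccades that bring the target closer to the image centre."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n_trials):
        error = rng.uniform(-1.0, 1.0)           # assumed 1-D target offset
        move = rng.uniform(-max_move, max_move)  # random saccade
        # Acceptance rule from the text, reduced to one image and one axis:
        # the target must end up nearer the centre than it started.
        if abs(error - move) < abs(error):
            patterns.append((error, move))
    return patterns


def fit_gain(patterns):
    """Least-squares fit of move = gain * error (stand-in for the MLP)."""
    numerator = sum(e * m for e, m in patterns)
    denominator = sum(e * e for e, m in patterns)
    return numerator / denominator


patterns = collect_patterns(20000)
gain = fit_gain(patterns)
print(f"accepted patterns: {len(patterns)}, fitted gain: {gain:.3f}")
```

Under these assumptions the accepted movements for a given offset lie
symmetrically around the ideal movement, so the fitted gain comes out close to
1 even though almost every individual pattern over- or undershoots.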
As a solution to the exploration
problem, we propose a staged learning procedure. A new training set is created,
this time consisting of samples generated by random variation of the output of
the already existing controller. If a random variation results in better
foveation than the original controller output, this movement is included in the
new training set. With the new set, a new controller can be trained, and this
one can again be used for pattern generation. In this way, one can
incrementally improve the controller's performance without the need to search from
scratch in sensorimotor space for the rare learning examples with very good
foveation quality.
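The staged procedure can be sketched in a similar toy setting. This is an
illustration under assumptions, not the published method: the controller is
reduced to a single hypothetical gain acting on a one-dimensional target
offset, variations are uniform noise of a fixed size added to its output, and
each stage refits the gain by least squares on the variations that foveate
better than the unvaried output.

```python
import random


def staged_training(gain, n_stages=8, n_trials=5000, noise=0.2, seed=1):
    """Refine a one-parameter controller (move = gain * error) in stages."""
    rng = random.Random(seed)
    history = [gain]
    for _ in range(n_stages):
        patterns = []
        for _ in range(n_trials):
            error = rng.uniform(-1.0, 1.0)            # assumed 1-D target offset
            base = gain * error                       # current controller output
            move = base + rng.uniform(-noise, noise)  # random variation of it
            # Keep the variation only if it foveates better than the
            # unvaried controller output.
            if abs(error - move) < abs(error - base):
                patterns.append((error, move))
        if patterns:
            # Train the next controller on the new set (least-squares gain).
            gain = (sum(e * m for e, m in patterns)
                    / sum(e * e for e, m in patterns))
        history.append(gain)
    return history


history = staged_training(gain=0.5)
residuals = [abs(1.0 - g) for g in history]
print(" -> ".join(f"{g:.3f}" for g in history))
```

Starting from a gain of 0.5, each stage moves the gain towards the ideal value
of 1 in this toy, without ever sampling movements far from the current
controller's output.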