Pronomes in Behavior Nets
Technical Report # 95-01, MIT Media Lab, Learning and Common Sense
Section
Bradley Rhodes
MIT Media Lab, Room E15-305D
20 Ames St.
Cambridge, MA 02139
phone: 617-253-9601
fax: 617-253-6215
rhodes@media.mit.edu
1/30/95
Abstract
The problem of creating autonomous agents in a dynamic and
unpredictable environment has led many to propose alternatives to the
classical approaches of AI. Unlike traditional agent archetectures,
these so-called ``nouvelle'' architectures tend to focus much more on
good-enough behavior in unpredictable and noisy environments than on
the ability to solve sophisticated tasks. This research is an attempt
to start closing the gap between traditional and nouvelle styles of
architecture by extending the behavior network algorithm, which was
designed specifically for noisy and unpredictable environments, to
handle more complex and sophisticated problem spaces. Specifically,
this paper examines the problems with using behavior nets in domains
with large numbers of features, objects, or locations which require
differentiation, and discusses extensions which have been implemented
to alleviate the problems. The advantages and limitations of these
extensions are discussed.
Areas: AI architectures, artificial life, real-time systems.
Introduction
The physical world is an inherently messy place, which makes the creation
of autonomous agents that act in the real world quite difficult. An agent
operating in the physical world must contend with a dynamic and
not-completely predictable environment, where information about the
environment is often incomplete or inaccurate. The real world also
presents opportunities and tasks that lead to multiple competing and
conflicting goals, and multiple possible methods for achieving the
same goal. While some have attempted to modify more traditional AI
techniques to handle these complex domains [Hayes-Roth, 1993b],
others have proposed so-called ``nouvelle'' architectures which deemphasize
central representation and problem-space search in favor of control which
is more directly responsive to the physical environment
[Brooks, 1986, Schoppers, Maes, 1989, Nilsson].
The result is two different emphases on the same problem. Traditional
agent architectures designed to operate in the physical world tend to
emphasize the fact that they can handle sophisticated planning tasks, even
in somewhat complex noisy domains. Nouvelle architectures, on the other
hand, tend to emphasize the fact that they can handle complex noisy
domains, even when given somewhat sophisticated planning tasks. There is
still a sizable gap between these two emphases, which becomes clear when
one tries to perform the tasks used to demonstrate one architecture in the
framework of another.
The research described in this paper was inspired by the attempt to
implement more sophisticated planning tasks in the behavior network
architecture [Maes, 1989, Maes, 1991]. In a behavior network, actions
are selected by a spreading activation mechanism through a network where
each possible action (or behavior) is a node called a competence
module. The architecture has been shown to be fast, robust, and
flexible in a dynamic and unpredictable environment [Maes, 1989, Maes,
1991]. Furthermore, it can handle multiple competing or conflicting
goals gracefully.
Because there are no classical variables or variable-passing between
competence modules, the effect of performing an action can be predicted at
network-compile time, which reduces the amount of search necessary at
run-time. However, while the lack of variables gives the algorithm several
advantages, in domains where a single action (for example, ``move to'') can
have a great number of distinct functions or consequences, it can also
require a prohibitive number of competence modules. This paper will first
provide a brief overview of the behavior net algorithm described in
[Maes, 1989], and then examine the types of tasks where the
above-mentioned problem occurs. An extension to behavior nets is proposed
to eliminate the problem, and the results of experiments performed on a
working implementation are discussed along with future directions of this
work.
Behavior Networks
In a Maes' behavior networks [Maes, 1989, Maes, 1991], an autonomous agent is
divided into four basic units: a sensor-processor, a set of behaviors, a
control unit, and effectors. The sensor-processor converts raw sensory
data into high-level state predicates such as wall-is-to-left or
block-is-in-grasper. These predicates are assumed to be updated
continuously, or at least at small enough intervals to be almost
continuous. The behaviors in a system are the actions the agent can make.
Specifically, they are domain-dependent programs that perform relatively
simple low-level actions on the environment (e.g. move-forward).
These behavior primitives are treated as black boxes and can be implemented
in hardware, hand coded, or even implemented as another behavior network.
The control unit is the part that, given the state of the current
sensor-predicates, decides in which of the behaviors to engage. Finally,
effectors carry out the chosen behavior. Several researchers have adopted
this four-part division in attacking physical-world problems
[Agre, Hayes-Roth, 1993a, Maes, 1989, Nilsson]. Behavior nets are concerned
primarily with the control unit and to some extent with the set of
behaviors made available to an agent. The sensor-processor and the
effectors are assumed, but are not dealt with in the algorithm.
Behavior networks model the selection of a particular behavior as an
emergent property of a parallel process. Each possible behavior is
represented as a node in a behavior network. Along with the program that
executes that behavior, a behavior node also contains a list of any
preconditions that must be true for the behavior to be executable, an
add list of predicates, which are expected to become true by execution of
the behavior, and a delete list of predicates, which are expected to
become false. For example, the behavior put ball in box might have
two preconditions: box-open and ball-in-grasper. Its add list
would contain ball-in-box, and its delete list would contain
ball-in-grasper.
These behavior nodes are linked together at compile time with three
types of links that represent causal relations among the behaviors.
- There is a predecessor link from a behavior to every
behavior that can make a precondition true. For example, put ball in
box might have two predecessor links, one to open box, and one to
get ball.
- There is a conflictor link from a behavior to every
behavior that would make a precondition false. For example, there would be
a conflictor link from put ball in box to drop ball.
- There is a successor link going between all behaviors
connected by a predecessor link, but going in the opposite direction,
for example from get ball back to put ball in box.
This network is used to spread activation energy by a specific process
that makes activation energy accumulate in the most relevant behaviors. At
run-time, activation energy is fed into this behavior network from two
sources. First, human provided goals (or motivations) provide
(or remove) activation to all behavior nodes which can directly satisfy (or
conflict with) that goal. For example, the goal ball-in-box
would give activation to put ball in box. Second, every
true sensor-predicate spreads activation to those behaviors with that
predicate in their precondition list.
This energy is then spread internally through the network as follows:
- Every executable behavior spreads activation energy to its
successors
- Every non-executable module spreads activation energy to those
predecessors that will make a false precondition become true
- Every behavior (executable or not) decreases the activation level
of its conflictors
Every time-cycle there is also a decay function to keep the average
activation level in the network constant. When an executable behavior
achieves an activation energy beyond a certain threshold, it is then
enacted.
Indexical-functional aspects and their limitations
Behavior nets do not incorporate classical variables and variable-passing,
and indeed [Maes, 1989] points out that many of the algorithm's advantages
would disappear if they were introduced. Instead of classical variables,
Maes suggests instead that indexical-functional aspects be used
[Agre]. The main idea is that instead of specifying a module as
goto location (x,y) one would specify the location in terms of the real
world, such as go to the nearest doorway. In this way, the location is
completely specified by the real-world itself, and is indexed by the
function of the location (in this case, a way to get into the next room,
which is really what you wanted a doorway for anyway).
The use of indexical-functional aspects reduces the amount of objects in
the problem space from the number of physical objects to the number of
functions of those objects which are relevant to the agent. When there are
very few relevant functions in the task domain, this can be a huge
reduction: so long as you don't care what object you want to pick up, or
always want to pick up the nearest one, then get nearest object works
fine. However, say you wanted to create a tool-using robot which could
work with any of 50 odd tools. In such a situation, you could not simply
specify get the nearest tool because pound a nail would want
only to give energy to a module which got a hammer, drive a screw
would only want a screwdriver, etc.. However, it would still be nice to
simply have a single ``get'' routine, since they all presumably use the
same basic grasping motions.
The problem gets worse when competence modules are highly interdependent.
For example, what if all those get routines needed first to have the tool
in reach to grasp it. This would then require a separate get in reach
of hammer, get in reach of screwdriver, etc, module for each tool.
Similarly, if going to a tool required that its location be know first,
there might have to be a separate find out where hammer is, etc.
Only at this point might we be able to end all the chains with the single
module, explore and find out where everything is. Thus, until a
point in an action chain is reached where a general action will achieve
multiple goals, the number of modules required is the number of actions
times the number of objects upon which the actions are being
performed.
Pronomes with indexical-functional aspects
In the above situation we're stuck because we could only index things
through the physical world, and the physical world didn't have any easy
index for ``the thing I want right now.'' However, say we cheat a little
bit, and consider the physical world to not only be the toolbox, the tools,
and our agent's body, but to also include a little thought balloon above
our agent's head. In this thought balloon we place a single slot labeled
``tool,'' and it is bound to whatever tool our agent is attending to at the
moment. Now, our get routine can be implemented as get attended tool
and a single routine can be used get any tool. Similarly, we would have
get in reach of attended tool and find out where attended tool
is. Now pound nail would have two preconditions,
hammer-attended-as-tool and attended-tool-in-hand. When using
multiple pronomes, one can ``pass off'' aspects of one pronome to another.
For example, the single module attend-to-location-of-attended-tool
would fill the location pronome with the location of the currently
attended tool. Thus, there need be only one module associated with each
object in our real-world, and these ``grounding modules'' would be
attend to hammer, attend to screwdriver, etc..
The above solution has many analogies from the literature. It is probably
most similar to what are called ``pronomes'' in Minsky's The Society
of Mind theory [Minsky, 1986], which he describes as temporary handles
for accessing fragments of mental states. It is also similar to the manner
in which multiple processes communicate through shared memory in a
blackboard architecture [Hayes-Roth, 1985]. Finally, this technique is
similar to the old ``hollow log'' approach to variable-passing in the early
days of assembler coding. Before programmers started passing variables to
subroutines by placing them on the stack, they placed them in a known
hard-coded memory location where that subroutine could later look.
Ordered links
To make pronomes work, we need to add a few features to the standard
behavior networks. Under the classical behavior net algorithm, all
preconditions are given energy in parallel and handled in whatever order
they happen to fire. For example, if a module put ball in box has
two preconditions, ball-in-grasper and in-reach-of-box, the
agent may very well go to the box first (thus achieving half its
preconditions), and then go towards the ball. After leaving the box to go
towards the ball, depending on the details of the network the agent may
return to the box, thus getting into a loop. These loops only occur when
the order in which preconditions are satisfied is important, but this
happens to always be the case when using pronomes in the above proposed
method. For example, it would not do to have get attended tool fire
first, and only then have attend to hammer as tool fire.
To fix this, I've broken the role usually filled by the precondition-list
into three parts. The precondition-list still determines whether a module
is executable, but now predecessor links are created based on an
``ordered-list'' and and ``unordered-list'' of predicates. In this
version, predecessor links are created from module A to module B for every
predicate in module A's ordered and unordered lists which match a predicate
in module B's add list. However, each link which is based on an ordered
list have an extra predicate associated with it called the
``relevant-when'' list, which contains all previous predicates in the
ordered list. When any of the predicates in the relevant-when list is
false, the algorithm acts as if that link simply doesn't exist. Successor
links are created such that for every predecessor link there is a successor
link pointing in the reverse direction with the same relevant-when list.
Predecessor and successor links based on unordered lists work just the same
as in the standard algorithm.
In short, this means that if pound nail has an ordered list of
hammer-attended-as-tool and then attended-tool-in-hand, it will
first give energy to attend-to-hammer-as-tool and only after the
hammer is attended will it give energy to get attended tool. If for
some reason hammer-attended-as-tool becomes false while giving energy
to get attended tool, pound nail would once again only give
energy to attend to hammer as tool. It further means that until
hammer-attended-as-tool is satisfied, any executable module that satisfies
attended-tool-in-hand will still not send energy down its successor
link to pound nail. Ordered lists are similar in nature to
Teleo-Reactive trees, as described in [Nilsson], except that ordered
lists specify only the order in which predicates should be made true
regardless of what actions will be fired to do so, while TR trees specify
the order in which specific actions should be taken.
Relevance of modules
The add list for get attended tool will contain only
attended-tool-in-hand, as that is what we expect when it fires. What
should the add list be for a module which specialized in only getting one
type of tool, for example buy hammer at hammers-R-us. We don't want
it to have hammer-in-hand, because all the other modules have
attended-tool-in-hand in their precondition list, and wouldn't know to
give energy to buy hammer at hammers-R-us even if a hammer were
attended. But we also don't want to set its add list to
attended-tool-in-hand, because this would get energy even when a hammer is
not the desired tool.
For this situation, a relevant-when field has been added to all competence
modules as well. This field simply contains a list of predicates, all of
which must be true in order for the module to receive activation, give
activation, or fire. In essence, if the module's relevant-when clause is
not true, the algorithm ignores that module for everything except the decay
function. In the above example, buy hammer at hammers-R-us would
have attended-tool-in-hand as its add list, but would also have
hammer-attended-as-tool as its relevant-when list. In this way, if a
hammer is ever attended the module is available as an alternative to the
generic module get attended tool, otherwise it is ignored.
It should be noted that the extensions described above create a strict
superset of the classical behavior nets algorithm. By not using pronomes
as indexical-functional aspects, by clearing all relevant-when clauses and
ordered-lists, and by setting all unordered-lists to be the same as
predecessor lists, the new and old algorithm are identical.
Discussion
It was stated above that the classical behavior nets algorithm requires a
number of modules on the order of the number of actions times the number of
differentiated objects being acted upon. When using pronomes, this number
is reduced to being linear with the number of actions and number of
objects. This reduction is good in its own right, but it may become vital
in a system where new information is being learned about the world all the
time. When using pronomes, if the agent learns of a new object in the
world it need only create a single new competence module to attend to that
object, and it becomes possible to get that object, drop that object, look
for it, put it in things, etc. Under such a system, it would seem possible
to make an agent which would remember what it's seen and be able to go back
to it later. It would also seem possible to make an agent which utilizes
temporary attention-getting competence modules to reference objects
relevant to the current situation but otherwise unimportant. For example,
say we want to create a librarian agent who retrieves requested books.
Even under the old architecture one could make a module such as get
book that was requested, but that only allows for a single requested book
at a time. Much better would be if each request created, on the fly, a
competence module to attend to that book and that requester, as well as a
goal specifying that those two be attended to and then that
attended-person-have-attended-object become true. In this implementation,
different people, different books, and different locations can all be given
different priorities, allowing for more control over how the different
goals and subgoals should be weighted.
Limitations of the current algorithm, and future work
The extensions described have been implemented and tested, and while under
normal conditions it works as expected there are some difficulties and
limitations.
The biggest limitation is that the algorithm is not very robust in the face
of actions failing to achieve what their add-list claims they will. Such
failures might occur because of a fault in the actuators being fired or
might be due to the fact that the add-list is based on heuristics, and
sometimes heuristics just don't work. In such situations, the system would
often have too much inertia, where it would keep firing the failed action
forever. If the parameters in the system were set to lower the amount of
inertia, often the system would waver between two alternative solutions to
a goal but never stick to a single plan. The difficulty seems to come
because there is no memory of what of a competence modules have fired and
whether they were successful or not. When the number of actions required
to achieve a goal are relatively small, as they are in the domains for
which the algorithm was originally designed, this does not cause a problem
because if a module fires the system will usually naturally try a different
one next. However, when there are five or even ten steps between a goal
and its eventual satisfaction there is time for the inertia to build either
build back up for the failed module (in the case of too much inertia) or to
build up activation for an alternative solution to a task (in the case of
too little inertia). A solution to this problem would seem to require a
method by which failed modules are weakened for a period of time, but which
slowly regain their strength as time goes on.
On a related note, there is another difficulty in working with the much
more complicated behavior nets being used here. Say we have the competence
module pound nail who's ordered list is hammer-attended-as-tool
and attended-object-in-hand. In this case, first the hammer would be
attended and then all modules which could get the hammer gain energy. But
what if there were no modules which could get a hammer, or if all those
modules which could have gotten a hammer were found to be faulty? In that
case, we would like our agent to give up on pounding the nail and go on to
satisfy other goals, but instead pound nail becomes a ``dog in the
manger'' module which can never fire, but still insists upon attending to
the hammer as a tool and not letting any other modules use the tool
pronome. This problem can actually come up in the classical behavior
networks algorithm as well when there are multiple preconditions for a
module, but again it comes up less often because the networks are simpler
in the more usual domains. A solution to both the problem with
dog-in-the-manger modules and with faulty modules has been implemented and
is currently being tested.
An interesting possible future enhancement would be to have the state of
the world give activation to all attention pronomes when things relating to
them come up. For example, right now if pound nail is attempting to
attend to the hammer as its tool to use, even if the agent wanders by a
hammer it won't think to pick it up, because get attended tool
doesn't know that the hammer is what it might want to look for yet. Were the
state of the world to give more energy to attention-getting modules for
which the object or location is close-by, they would in turn give more to
their successors and the agent would be more likely to be opportunistic.
Such a scheme would be reminiscent of Minsky's K-lines, though much more
simplistic in design [Minsky, 1986].
Conclusion
Most agent architectures for real-world control still tend either to come
from a traditional, symbolic processing approach or from a nouvelle
approach, and the architecture described above is no exception. However,
it is hoped that the extensions proposed for the behavior nets algorithm
can be a small step towards bridging the gap between the functionality of
traditional and nouvelle architectures.
Acknowledgments
The author wishes to thank Barbara Hayes-Roth, Pattie Maes, and Marvin
Minsky for their suggestions and inspiration.
Bibliography
Agre, P. & Chapman, D. (1987), Pengi:
An Implementation of a Theory of Activity. Proceedings of the Sixth
National Conference on Artificial Intelligence, AAAI-87. Los Altos, CA:
Morgan Kaufmann.
Brooks, R. (1986), A Robust Layered Control System for a Mobile Robot.
IEEE Journal of Robotics and Automation, March.
Hayes-Roth, B. (1985). A
blackboard architecture for control. Artificial Intelligence, 1985,
26, 251-321.
Hayes-Roth, B. (1993). A
Satisficing Cycle for Real-Time Reasoning in Intelligent Agents.
Expert Systems with Applications. Submitted to KSL February 1993. KSL
93-17
Hayes-Roth, B. (1993). An
Architecture for Adaptive Intelligent Systems. AIJ Special Issue on
Agents and Interactivity, 1993.
Maes, P. (1989), How To Do the Right
Thing. Connection Science Journal, 1, no.3, 291-323.
Maes, P. (1991), A Bottom-up mechanism for
Behavior Selection in an Artificial Creature, Proceedings of the first
International Conference on Simulation of Adaptive Behavior, Meyer J.A.
and Wilson S. (eds). MIT Press.
Minsky, M. (1986), The Society
of Mind. New York; Simon & Schuster.
Nilsson, N. (1994),
Teleo-Reactive Programs for Agent Control. Journal of Artificial
Intelligence Research 1 (1994), 139-158.
Schoppers, M.J. (1987). Universal
Plans for Reactive Robots in Unpredictable Domains. In Proceedings of
IJCAI-87, San Francisco, CA: Morgan Kaufmann.
Sussman, G. (1975) A computer Model of
Skill Acquisition. New York: Elsevier.