Pronomes in Behavior Nets

Technical Report # 95-01, MIT Media Lab, Learning and Common Sense Section

Bradley Rhodes
MIT Media Lab, Room E15-305D
20 Ames St.
Cambridge, MA  02139
phone: 617-253-9601
fax: 617-253-6215
rhodes@media.mit.edu
1/30/95

Abstract

The problem of creating autonomous agents in a dynamic and unpredictable environment has led many to propose alternatives to the classical approaches of AI. Unlike traditional agent architectures, these so-called ``nouvelle'' architectures tend to focus much more on good-enough behavior in unpredictable and noisy environments than on the ability to solve sophisticated tasks. This research is an attempt to start closing the gap between traditional and nouvelle styles of architecture by extending the behavior network algorithm, which was designed specifically for noisy and unpredictable environments, to handle more complex and sophisticated problem spaces. Specifically, this paper examines the problems with using behavior nets in domains with large numbers of features, objects, or locations that require differentiation, describes extensions which have been implemented to alleviate these problems, and discusses the advantages and limitations of those extensions.

Areas: AI architectures, artificial life, real-time systems.


Introduction

The physical world is an inherently messy place, which makes the creation of autonomous agents that act in the real world quite difficult. An agent operating in the physical world must contend with a dynamic and not completely predictable environment, where information about the environment is often incomplete or inaccurate. The real world also presents opportunities and tasks that lead to multiple competing and conflicting goals, and multiple possible methods for achieving the same goal. While some have attempted to modify more traditional AI techniques to handle these complex domains [Hayes-Roth, 1993b], others have proposed so-called ``nouvelle'' architectures which deemphasize central representation and problem-space search in favor of control which is more directly responsive to the physical environment [Brooks, 1986; Schoppers, 1987; Maes, 1989; Nilsson, 1994].

The result is two different emphases on the same problem. Traditional agent architectures designed to operate in the physical world tend to emphasize the fact that they can handle sophisticated planning tasks, even in somewhat complex noisy domains. Nouvelle architectures, on the other hand, tend to emphasize the fact that they can handle complex noisy domains, even when given somewhat sophisticated planning tasks. There is still a sizable gap between these two emphases, which becomes clear when one tries to perform the tasks used to demonstrate one architecture in the framework of another.

The research described in this paper was inspired by the attempt to implement more sophisticated planning tasks in the behavior network architecture [Maes, 1989; Maes, 1991]. In a behavior network, actions are selected by a spreading activation mechanism through a network where each possible action (or behavior) is a node called a competence module. The architecture has been shown to be fast, robust, and flexible in a dynamic and unpredictable environment [Maes, 1989; Maes, 1991]. Furthermore, it can handle multiple competing or conflicting goals gracefully.

Because there are no classical variables or variable-passing between competence modules, the effect of performing an action can be predicted at network-compile time, which reduces the amount of search necessary at run-time. However, while the lack of variables gives the algorithm several advantages, it can also require a prohibitive number of competence modules in domains where a single action (for example, ``move to'') can have a great number of distinct functions or consequences. This paper will first provide a brief overview of the behavior net algorithm described in [Maes, 1989], and then examine the types of tasks where the above-mentioned problem occurs. An extension to behavior nets is proposed to eliminate the problem, and the results of experiments performed on a working implementation are discussed along with future directions of this work.

Behavior Networks

In Maes' behavior networks [Maes, 1989; Maes, 1991], an autonomous agent is divided into four basic units: a sensor-processor, a set of behaviors, a control unit, and effectors. The sensor-processor converts raw sensory data into high-level state predicates such as wall-is-to-left or block-is-in-grasper. These predicates are assumed to be updated continuously, or at least at small enough intervals to be almost continuous. The behaviors in a system are the actions the agent can take. Specifically, they are domain-dependent programs that perform relatively simple low-level actions on the environment (e.g. move-forward). These behavior primitives are treated as black boxes and can be implemented in hardware, hand coded, or even implemented as another behavior network. The control unit is the part that, given the state of the current sensor-predicates, decides which of the behaviors to engage in. Finally, effectors carry out the chosen behavior. Several researchers have adopted this four-part division in attacking physical-world problems [Agre & Chapman, 1987; Hayes-Roth, 1993a; Maes, 1989; Nilsson, 1994]. Behavior nets are concerned primarily with the control unit and to some extent with the set of behaviors made available to an agent. The sensor-processor and the effectors are assumed, but are not dealt with in the algorithm.

Behavior networks model the selection of a particular behavior as an emergent property of a parallel process. Each possible behavior is represented as a node in a behavior network. Along with the program that executes that behavior, a behavior node also contains a list of any preconditions that must be true for the behavior to be executable, an add list of predicates, which are expected to become true by execution of the behavior, and a delete list of predicates, which are expected to become false. For example, the behavior put ball in box might have two preconditions: box-open and ball-in-grasper. Its add list would contain ball-in-box, and its delete list would contain ball-in-grasper.
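As an illustration, a behavior node might be rendered in Python roughly as follows. This is a minimal sketch; the field names and the set-based representation of predicates are mine, not taken from Maes' implementation:

    from dataclasses import dataclass

    @dataclass
    class CompetenceModule:
        # One behavior node: the executable action plus the three predicate
        # lists described above. Activation is accumulated at run-time.
        name: str
        preconditions: set   # must all be true for the module to be executable
        add_list: set        # predicates expected to become true on execution
        delete_list: set     # predicates expected to become false on execution
        activation: float = 0.0

        def executable(self, world_state: set) -> bool:
            # A module is executable when every precondition currently holds.
            return self.preconditions <= world_state

    # The put-ball-in-box example from the text:
    put_ball_in_box = CompetenceModule(
        name="put ball in box",
        preconditions={"box-open", "ball-in-grasper"},
        add_list={"ball-in-box"},
        delete_list={"ball-in-grasper"},
    )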

These behavior nodes are linked together at compile time with three types of links that represent causal relations among the behaviors: a module has a successor link to every module whose preconditions it can help make true with its add list, a predecessor link to every module whose add list can make one of its own preconditions true, and a conflicter link to every module whose delete list could undo one of its preconditions.

This network is used to spread activation energy by a specific process that makes activation energy accumulate in the most relevant behaviors. At run-time, activation energy is fed into the behavior network from two sources. First, human-provided goals (or motivations) give activation to every behavior node that can directly satisfy a goal, and remove activation from every node that conflicts with one. For example, the goal ball-in-box would give activation to put ball in box. Second, every true sensor-predicate spreads activation to those behaviors with that predicate in their precondition list.

This energy is then spread internally through the network as follows: a module that is executable spreads activation forward through its successor links, strengthening the modules it can help make executable; a module that is not executable spreads activation backward through its predecessor links, strengthening the modules that can make its unsatisfied preconditions true; and every module takes activation away from its conflicters.

Every time-cycle there is also a decay function to keep the average activation level in the network constant. When an executable behavior accumulates activation beyond a certain threshold, it fires.
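Put together, one time-cycle of the selection process might look roughly like the sketch below, which builds on the CompetenceModule sketch above. The spreading fractions, firing threshold, and target mean are illustrative placeholders, and the update order is simplified relative to the algorithm in [Maes, 1989]:

    def successors(m, modules):
        # Successor link: m's add list can satisfy a precondition of the other.
        return [o for o in modules if o is not m and m.add_list & o.preconditions]

    def predecessors(m, modules):
        # Predecessor link: the other's add list can satisfy a precondition of m.
        return [o for o in modules if o is not m and o.add_list & m.preconditions]

    def conflicters(m, modules):
        # Conflicter link: the other's delete list can undo a precondition of m.
        return [o for o in modules if o is not m and o.delete_list & m.preconditions]

    THRESHOLD = 10.0   # illustrative firing threshold (assumed value)
    MEAN = 5.0         # target mean activation, held constant by decay (assumed)

    def selection_cycle(modules, world_state, goals):
        # External input: goals feed modules that directly achieve them;
        # true predicates feed modules with matching preconditions.
        for m in modules:
            m.activation += sum(1.0 for g in goals if g in m.add_list)
            m.activation += sum(1.0 for p in m.preconditions if p in world_state)

        # Internal spreading, from a snapshot so updates made during this
        # cycle do not compound within the same cycle.
        snapshot = {id(m): m.activation for m in modules}
        for m in modules:
            if m.executable(world_state):
                for s in successors(m, modules):
                    s.activation += 0.5 * snapshot[id(m)]
            else:
                for p in predecessors(m, modules):
                    p.activation += 0.5 * snapshot[id(m)]
            for c in conflicters(m, modules):
                c.activation = max(0.0, c.activation - 0.2 * snapshot[id(m)])

        # Decay: rescale so the network's mean activation stays at MEAN.
        total = sum(m.activation for m in modules) or 1.0
        for m in modules:
            m.activation *= MEAN * len(modules) / total

        # Fire the strongest executable module over threshold, if any.
        ready = [m for m in modules
                 if m.executable(world_state) and m.activation >= THRESHOLD]
        return max(ready, key=lambda m: m.activation, default=None)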

Indexical-functional aspects and their limitations

Behavior nets do not incorporate classical variables and variable-passing, and indeed [Maes, 1989] points out that many of the algorithm's advantages would disappear if they were introduced. Instead of classical variables, Maes suggests that indexical-functional aspects be used [Agre & Chapman, 1987]. The main idea is that instead of specifying a module as goto location (x,y), one would specify the location in terms of the real world, such as go to the nearest doorway. In this way, the location is completely specified by the real world itself, and is indexed by the function of the location (in this case, a way to get into the next room, which is really what you wanted a doorway for anyway).

The use of indexical-functional aspects reduces the number of objects in the problem space from the number of physical objects to the number of functions of those objects which are relevant to the agent. When there are very few relevant functions in the task domain, this can be a huge reduction: so long as you don't care which object you pick up, or always want to pick up the nearest one, then get nearest object works fine. However, say you wanted to create a tool-using robot which could work with any of 50-odd tools. In such a situation, you could not simply specify get the nearest tool, because pound a nail would want to give energy only to a module which got a hammer, drive a screw would want only a screwdriver, etc. However, it would still be nice to have a single ``get'' routine, since all the tools presumably require the same basic grasping motions.

The problem gets worse when competence modules are highly interdependent. For example, what if all those get routines first needed to have the tool in reach before grasping it? This would then require a separate get in reach of hammer, get in reach of screwdriver, etc., module for each tool. Similarly, if going to a tool required that its location be known first, there might have to be a separate find out where hammer is for each tool as well. Only at this point might we be able to end all the chains with the single module explore and find out where everything is. Thus, until a point in an action chain is reached where a general action will achieve multiple goals, the number of modules required is the number of actions times the number of objects upon which the actions are being performed.

Pronomes with indexical-functional aspects

In the above situation we're stuck because we can only index things through the physical world, and the physical world has no easy index for ``the thing I want right now.'' However, say we cheat a little bit, and consider the physical world to include not only the toolbox, the tools, and our agent's body, but also a little thought balloon above our agent's head. In this thought balloon we place a single slot labeled ``tool,'' which is bound to whatever tool our agent is attending to at the moment. Now our get routine can be implemented as get attended tool, and a single routine can be used to get any tool. Similarly, we would have get in reach of attended tool and find out where attended tool is. Now pound nail would have two preconditions, hammer-attended-as-tool and attended-tool-in-hand. When using multiple pronomes, one can ``pass off'' aspects of one pronome to another. For example, the single module attend-to-location-of-attended-tool would fill the location pronome with the location of the currently attended tool. Thus, there need be only one module associated with each object in our world, and these ``grounding modules'' would be attend to hammer, attend to screwdriver, etc.
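In code, the thought balloon amounts to a small piece of agent-internal state that grounding modules write and generic modules read. The following is a hypothetical sketch: the pronome dictionary and the stand-in action bodies are mine, and in the actual system these would be ordinary competence modules in the net rather than plain functions:

    # The thought balloon: one slot per pronome.
    pronomes = {"tool": None, "location": None}

    def attend_to_hammer_as_tool():
        # Grounding module: binds a concrete object to the tool pronome.
        pronomes["tool"] = "hammer"

    def get_attended_tool(world_state: set):
        # Generic module: a single grasping routine serves every tool, because
        # it acts on whatever the tool pronome is currently bound to.
        tool = pronomes["tool"]
        world_state.add(tool + "-in-hand")        # stands in for the real grasp
        world_state.add("attended-tool-in-hand")  # the predicate other modules use

    def attend_to_location_of_attended_tool(tool_locations: dict):
        # "Passing off" one pronome to another: fill the location pronome
        # from the currently attended tool.
        pronomes["location"] = tool_locations[pronomes["tool"]]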

The above solution has many analogies in the literature. It is probably most similar to what Minsky calls ``pronomes'' in his Society of Mind theory [Minsky, 1986], which he describes as temporary handles for accessing fragments of mental states. It is also similar to the manner in which multiple processes communicate through shared memory in a blackboard architecture [Hayes-Roth, 1985]. Finally, this technique is similar to the old ``hollow log'' approach to variable-passing from the early days of assembler coding: before programmers started passing variables to subroutines by placing them on the stack, they placed them in a known, hard-coded memory location where the subroutine could later look.

Ordered links

To make pronomes work, we need to add a few features to the standard behavior networks. Under the classical behavior net algorithm, all preconditions are given energy in parallel and satisfied in whatever order the relevant modules happen to fire. For example, if a module put ball in box has two preconditions, ball-in-grasper and in-reach-of-box, the agent may very well go to the box first (thus achieving half its preconditions) and then head towards the ball. Depending on the details of the network, the agent may then return to the box before grasping the ball, thus getting into a loop. These loops occur only when the order in which preconditions are satisfied is important, but that is always the case when pronomes are used in the manner proposed above. For example, it would not do to have get attended tool fire first, and only then have attend to hammer as tool fire.

To fix this, I've broken the role usually filled by the precondition-list into three parts. The precondition-list still determines whether a module is executable, but now predecessor links are created based on an ``ordered-list'' and an ``unordered-list'' of predicates. In this version, predecessor links are created from module A to module B for every predicate in module A's ordered and unordered lists which matches a predicate in module B's add list. However, each link which is based on an ordered list has an extra list of predicates associated with it, called the ``relevant-when'' list, which contains all previous predicates in the ordered list. When any of the predicates in the relevant-when list is false, the algorithm acts as if that link simply doesn't exist. Successor links are created such that for every predecessor link there is a successor link pointing in the reverse direction with the same relevant-when list. Predecessor and successor links based on unordered lists work just the same as in the standard algorithm.
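Concretely, link creation from an ordered list might look like the following sketch, assuming the CompetenceModule sketch above extended with an ordered_list field. The class and function names are mine, and representing links as explicit objects is an assumption about one reasonable implementation:

    from dataclasses import dataclass

    @dataclass
    class PredecessorLink:
        source: str          # module whose ordered list created the link
        target: str          # module whose add list matches the predicate
        predicate: str       # the predicate the link is based on
        relevant_when: list  # all predicates earlier in the ordered list

        def active(self, world_state: set) -> bool:
            # The link is treated as nonexistent while any earlier
            # ordered-list predicate is still false.
            return all(p in world_state for p in self.relevant_when)

    def links_from_ordered_list(module, modules):
        # Create a predecessor link for each ordered predicate that some other
        # module's add list can satisfy; each link carries the predicates that
        # precede it in the ordering as its relevant-when list.
        links = []
        for i, pred in enumerate(module.ordered_list):
            for other in modules:
                if pred in other.add_list:
                    links.append(PredecessorLink(module.name, other.name,
                                                 pred, module.ordered_list[:i]))
        return links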

In short, this means that if pound nail has an ordered list of hammer-attended-as-tool followed by attended-tool-in-hand, it will first give energy to attend to hammer as tool, and only after the hammer is attended will it give energy to get attended tool. If for some reason hammer-attended-as-tool becomes false while energy is being given to get attended tool, pound nail would once again give energy only to attend to hammer as tool. It further means that until hammer-attended-as-tool is satisfied, any executable module that satisfies attended-tool-in-hand will still not send energy down its successor link to pound nail. Ordered lists are similar in nature to Teleo-Reactive trees, as described in [Nilsson, 1994], except that ordered lists specify only the order in which predicates should be made true, regardless of what actions will be fired to do so, while TR trees specify the order in which specific actions should be taken.

Relevance of modules

The add list for get attended tool will contain only attended-tool-in-hand, as that is what we expect when it fires. But what should the add list be for a module which specializes in getting only one type of tool, for example buy hammer at hammers-R-us? We don't want it to have hammer-in-hand, because all the other modules have attended-tool-in-hand in their precondition lists, and wouldn't know to give energy to buy hammer at hammers-R-us even if a hammer were attended. But we also don't want to set its add list to attended-tool-in-hand, because then the module would receive energy even when a hammer is not the desired tool. For this situation, a relevant-when field has been added to all competence modules as well. This field simply contains a list of predicates, all of which must be true in order for the module to receive activation, give activation, or fire. In essence, if the module's relevant-when clause is not true, the algorithm ignores that module for everything except the decay function. In the above example, buy hammer at hammers-R-us would have attended-tool-in-hand as its add list, but hammer-attended-as-tool as its relevant-when list. In this way, if a hammer is ever attended the module is available as an alternative to the generic module get attended tool; otherwise it is ignored. It should be noted that the extensions described above create a strict superset of the classical behavior nets algorithm: by not using pronomes as indexical-functional aspects, by clearing all relevant-when clauses and ordered-lists, and by setting all unordered-lists to be the same as the precondition lists, the new and old algorithms are identical.
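The module-level gate can then be rendered as a single check, applied before a module receives activation, gives activation, or fires. Again this assumes the CompetenceModule sketch above, extended with a relevant_when field:

    def is_relevant(module, world_state: set) -> bool:
        # If the relevant-when clause is false, the algorithm ignores the
        # module for everything except the decay function.
        return all(p in world_state for p in module.relevant_when)

    # The specialist module from the text: it competes with the generic
    # get attended tool only while a hammer is the attended tool.
    # buy hammer at hammers-R-us:
    #   add list:      {"attended-tool-in-hand"}
    #   relevant-when: {"hammer-attended-as-tool"}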

Discussion

It was stated above that the classical behavior nets algorithm requires a number of modules on the order of the number of actions times the number of differentiated objects being acted upon. When using pronomes, this number is reduced to roughly the number of actions plus the number of objects: for the 50-tool example, one attention module per tool plus a single generic module per action, rather than one module per action per tool. This reduction is good in its own right, but it may become vital in a system where new information is being learned about the world all the time. When using pronomes, if the agent learns of a new object in the world it need only create a single new competence module to attend to that object, and it becomes possible to get that object, drop it, look for it, put it in things, etc. Under such a system, it would seem possible to make an agent which remembers what it has seen and can go back to it later. It would also seem possible to make an agent which utilizes temporary attention-getting competence modules to reference objects relevant to the current situation but otherwise unimportant. For example, say we want to create a librarian agent who retrieves requested books. Even under the old architecture one could make a module such as get book that was requested, but that only allows for a single requested book at a time. Much better would be if each request created, on the fly, a competence module to attend to that book and that requester, as well as a goal specifying that those two be attended to and then that attended-person-have-attended-object become true. In this implementation, different people, different books, and different locations can all be given different priorities, allowing for more control over how the different goals and subgoals should be weighted.
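For the librarian example, request handling might create the attention modules and the goal on the fly, roughly as below. This is a hypothetical sketch: the network object with its add_module and add_goal calls, and the ordered goal representation, are assumed interfaces, not part of the implementation described here:

    def handle_request(requester: str, book: str, network):
        # One temporary attention module per requested book...
        network.add_module(CompetenceModule(
            name="attend to " + book,
            preconditions=set(),
            add_list={book + "-attended-as-object"},
            delete_list=set(),
        ))
        # ...and one per requester.
        network.add_module(CompetenceModule(
            name="attend to " + requester,
            preconditions=set(),
            add_list={requester + "-attended-as-person"},
            delete_list=set(),
        ))
        # The goal orders the attention bindings before delivery; its weight
        # lets different requests carry different priorities.
        network.add_goal(
            ordered=[book + "-attended-as-object",
                     requester + "-attended-as-person",
                     "attended-person-have-attended-object"],
            weight=1.0,
        )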

Limitations of the current algorithm, and future work

The extensions described have been implemented and tested, and while under normal conditions they work as expected, there are some difficulties and limitations. The biggest limitation is that the algorithm is not very robust in the face of actions failing to achieve what their add lists claim they will. Such failures might occur because of a fault in the actuators being fired, or because the add list is based on heuristics, and sometimes heuristics just don't work. In such situations, the system would often have too much inertia, and would keep firing the failed action forever. If the parameters in the system were set to lower the amount of inertia, the system would often waver between two alternative solutions to a goal but never stick to a single plan. The difficulty seems to arise because there is no memory of which competence modules have fired and whether they were successful. When the number of actions required to achieve a goal is relatively small, as it is in the domains for which the algorithm was originally designed, this does not cause a problem, because after a module fires the system will usually naturally try a different one next. However, when there are five or even ten steps between a goal and its eventual satisfaction, there is time for activation either to build back up for the failed module (in the case of too much inertia) or to build up for an alternative solution to the task (in the case of too little inertia). A solution to this problem would seem to require a method by which failed modules are weakened for a period of time but slowly regain their strength as time goes on; a sketch of one such mechanism appears at the end of this section.

On a related note, there is another difficulty in working with the much more complicated behavior nets being used here. Say we have the competence module pound nail, whose ordered list is hammer-attended-as-tool followed by attended-tool-in-hand. In this case, first the hammer would be attended, and then all modules which could get the hammer would gain energy. But what if there were no modules which could get a hammer, or if all the modules which could have gotten one were found to be faulty? In that case, we would like our agent to give up on pounding the nail and go on to satisfy other goals. Instead, pound nail becomes a ``dog in the manger'' module which can never fire, but still insists upon attending to the hammer as a tool and not letting any other modules use the tool pronome. This problem can actually come up in the classical behavior networks algorithm as well, when there are multiple preconditions for a module, but again it comes up less often because the networks are simpler in the more usual domains. A solution to both the dog-in-the-manger problem and the faulty-module problem has been implemented and is currently being tested.

An interesting possible future enhancement would be to have the state of the world give activation to all attention pronomes when things relating to them come up. For example, right now if pound nail is attempting to attend to the hammer as its tool, then even if the agent wanders by a hammer it won't think to pick it up, because get attended tool doesn't yet know that the hammer is what it might want to look for. Were the state of the world to give more energy to attention-getting modules whose object or location is close by, they would in turn give more to their successors, and the agent would be more likely to be opportunistic. Such a scheme would be reminiscent of Minsky's K-lines, though much more simplistic in design [Minsky, 1986].
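One way to realize the weakening-and-recovery mechanism proposed above for failed modules is sketched below. The exponential recovery curve, the time constant, and the failure_times field are all illustrative assumptions, not the solution currently under test:

    import math

    RECOVERY_TIME = 50.0  # time-cycles for a failure to largely wear off (assumed)

    def record_outcome(module, world_state: set, now: float):
        # After a module fires, check whether its add list actually came true;
        # if not, remember when it failed.
        if not module.add_list <= world_state:
            module.failure_times.append(now)

    def effective_activation(module, now: float) -> float:
        # Each recorded failure suppresses the module, with the suppression
        # fading exponentially as time passes since that failure.
        weight = 1.0
        for t_fail in module.failure_times:
            weight *= 1.0 - math.exp(-(now - t_fail) / RECOVERY_TIME)
        return module.activation * weight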

Conclusion

Most agent architectures for real-world control still tend either to come from a traditional, symbolic processing approach or from a nouvelle approach, and the architecture described above is no exception. However, it is hoped that the extensions proposed for the behavior nets algorithm can be a small step towards bridging the gap between the functionality of traditional and nouvelle architectures.

Acknowledgments

The author wishes to thank Barbara Hayes-Roth, Pattie Maes, and Marvin Minsky for their suggestions and inspiration.

Bibliography

Agre, P. & Chapman, D. (1987), Pengi: An Implementation of a Theory of Activity. Proceedings of the Sixth National Conference on Artificial Intelligence, AAAI-87. Los Altos, CA: Morgan Kaufmann.

Brooks, R. (1986), A Robust Layered Control System for a Mobile Robot. IEEE Journal of Robotics and Automation, March.

Hayes-Roth, B. (1985). A blackboard architecture for control. Artificial Intelligence, 26, 251-321.

Hayes-Roth, B. (1993a). A Satisficing Cycle for Real-Time Reasoning in Intelligent Agents. Expert Systems with Applications. KSL Report 93-17, February 1993.

Hayes-Roth, B. (1993b). An Architecture for Adaptive Intelligent Systems. AIJ Special Issue on Agents and Interactivity.

Maes, P. (1989), How To Do the Right Thing. Connection Science Journal, 1, no.3, 291-323.

Maes, P. (1991), A Bottom-Up Mechanism for Behavior Selection in an Artificial Creature. In Meyer, J.A. & Wilson, S. (eds), Proceedings of the First International Conference on Simulation of Adaptive Behavior. MIT Press.

Minsky, M. (1986), The Society of Mind. New York: Simon & Schuster.

Nilsson, N. (1994), Teleo-Reactive Programs for Agent Control. Journal of Artificial Intelligence Research, 1, 139-158.

Schoppers, M.J. (1987). Universal Plans for Reactive Robots in Unpredictable Domains. In Proceedings of IJCAI-87, San Francisco, CA: Morgan Kaufmann.

Sussman, G. (1975), A Computer Model of Skill Acquisition. New York: Elsevier.