Yet Another Visitor Pattern
©The C++ Report
In this column I am going to present a variation of the Visitor Pattern  (tentatively called the Indirect Visitor). Along the way I will show some real world uses of RTTI (Run Time Type Identification), as well as a number of elements from the STL (Standard Template Library).
First I have to get out my pocket soap-box and make a comment or two on the "patterns movement". If the number of books and magazine articles is any indication, the patterns movement has really become popular within the software community. Since I happen to like patterns, I think this is a "Good Thing". One of the primary problems of the software industry is that it suffers from a very high failure rate. The success/failure ratio of software projects, especially large scale software projects (precisely the kind C++ was developed to deal with) is pitiful. Learning and applying patterns is one way to improve the chances of success on large software projects.
Unfortunately, I see some disturbing signs down in the trenches. I have come across several instances of patterns being applied incorrectly or inappropriately. There is already a certain mystique about "patterns" that make those less knowledgeable about them reluctant to modify code that appears to contain a pattern (i.e. the comments say it does). If the pattern isnt the right one for the problem, or it wasnt applied correctly, this can lead to maintenance problems down the road.
Those of you who are using patterns effectively already know this. The reason I elaborate on it now is that I have a suspicion that the number of effective pattern users is a whole lot smaller than the number of software developers who claim to "know" patterns. Learning to use patterns effectively takes practice; and since most patterns tend to represent large scale solutions, this kind of practice is not something you are going to get on a day-to-day basis in the typical software shop. This means that really learning to use patterns takes a certain commitment to studying them on your own.
So I want to raise a small red flag. On one side is a caution to those who are truly interested in patterns: we owe it to our fellow developers to actually know what we are doing before we foist our "designs" onto other people. This is always a good idea, but with patterns it is just a little more important because of that mystique I mentioned above. On the other side of the flag is a warning to all those developers whose knowledge of patterns ranges from non-existent up through knowing the names of some of them: there is nothing sacred about patterns and those who use them are not infallible. You owe it to yourself, your company, and your career to learn at least enough about patterns so that you can competently use, maintain, or enhance a design that is based on one or more of the common patterns.
Not everybody needs to be a pattern guru - someone able to instantly recognize the applicability of a Decorator or a Visitor or whatever to a given problem. All it takes is one person to recognize the problem and give it a name. On the other hand, once someone has recommended a pattern be used, everybody on the development team should have enough expertise to be able to evaluate the validity of the recommendation. Otherwise we run the risk of getting stuck with half-baked solutions that either don't solve the problem, or can actually make it worse.
Enough flag waving. On to some pattern munging. One of the more popular patterns is the Visitor (in fact Dr. Martin informs me that Visitor is the most studied pattern). A number of articles have appeared in this magazine[2,3] about how to enhance the standard Visitor pattern as it was described in GOF. Personally, I suspect that one of the reasons Visitor is so popular is because of a dirty little secret of the object oriented software community.
All the good text books and the best instructors will tell you that inheritance is about tailoring behavior and not about data. Furthermore, they will also tell you that you don't want to use inheritance as a way of creating heterogeneous collections, especially in C++ (use templates to create homogenous collections). The "dirty secret" that I referred to is the fact that some of the largest inheritance hierarchies that I have ever encountered were data hierarchies. By that, I mean the different concrete classes in the inheritance tree represented different data types. Furthermore, the reason these hierarchies existed was precisely to allow the different data types to be stored in a collection.
These hierarchies come into existence when an application (or domain) needs to deal with different data types in a uniform way. The domain I am most familiar with is communications protocols. At one level a message is a collection of data fields. At a lower level, each of those fields may represent a different type of data. When we model this situation in software, we get an abstract base class (DataField) with a number of concrete subclasses representing different data types, i.e. a data hierarchy. Usually, at least one of the classes in the data hierarchy represents some composition of other data elements from the hierarchy (the Composite pattern). This allows arbitrarily complex collections (i.e. messages) to be formed.
While a data hierarchy usually comes about because a certain application or domain needs it, all data hierarchies share similar characteristics. A truly general purpose data hierarchy can probably be used for many different types of applications. In my current work, we have a general purpose data hierarchy. It currently consists of 64 concrete classes, organized into three different sub-trees. The reason this hierarchy has gotten so big is that it is useful, and that it is easy to extend. After some experience with such hierarchies on a couple of different projects, I have concluded that to be truly general purpose, a data hierarchy must provide three types of functionality.
1. Data accessors and mutators. Obviously, the concrete classes must provide means to set and retrieve the specific data they contain. These functions can not be virtual since they will take different data types as parameters and return different data types. Nevertheless, a well designed data hierarchy will have a uniform convention for naming these functions (set and get are popular).
2. An RTTI (Run Time Type Identification) mechanism. This is required because at some point in an application, it will be necessary to actually get at the data contained in an object. Since the accessors and mutators are not virtual, this will mean down-casting a base class reference to the specific type of the object. RTTI comes for free in Standard C++ (provided you have at least one virtual function, and the base class destructor had better be virtual). If you are still using a version of C++ that does not support RTTI, you will have to provide your own (see my column ).
3. Support for the Visitor pattern. Since a data hierarchy is not about operations per se, a basic data hierarchy will provide no operations at all. Different applications will need different operations performed on the data. The Visitor pattern allows users to easily add operations to an existing hierarchy.
The Visitor pattern represents a special case of the more general problem of double dispatching. For a good overview of double dispatching, why it is something of a problem, and some of the ways you can implement it, I have to recommend Scott Meyers second book , in particular Item 31 - "Making functions virtual with respect to more than one object". I suggest you get out your copy of MEC++ and review Item 31 before reading further. As Scott notes on page 230, implementing double dispatching is not exactly easy. Besides being both informative and entertaining, if you understand the solutions Scott presents, it will save me the trouble of having to explain a lot of things later. In many ways the Indirect Visitor is an outgrowth of my attempt to build upon one of Scott's approaches to double dispatch. (If you dont have a copy of MEC++ go out and get one -- I'll wait).
With Scotts permission, I will use his GameObject hierarchy for my example. It keeps me from having to explain my own.
Unfortunately, the typical Visitor pattern has a few shortcomings. The best known of these is the circular dependency that exists between the Visitor class and all the Node classes. This makes extending the Node hierarchy difficult (painful becomes the operative word when the Node hierarchy gets large). The Indirect Visitor is my attempt to address this problem, amongst others.
The Indirect Visitor uncouples the Visitor from the details of the Node hierarchy by not using virtual functions in the Visitor. Instead, Indirect Visitor uses a replacement for the virtual function mechanism that provides the same functionality, but allows the possible set of functions to be determined at runtime instead of at compile time. Note the phrase "possible set of functions", is not the same as "the function"; virtual functions represent a runtime selection of a specific function (i.e. print()), but the set of possible functions is normally determined at compile time and is precisely all the member functions marked "virtual" in the Visitor base class.
The fundamentals of the Indirect Visitor are shown in listings 1 through 3. Let me review them and then discuss how this solves some of my problems. Listing 1 is the definition of the Visitor base class. For now just note the following: (a) the forward declaration of the GameObject no actual details of GameObject are needed by the Visitor at this point; (b) the nested Impl class declaration - where all the details are hidden; (c) the NotFoundError exception class declaration; (d) the typedefs of the prototype for all the visit functions, and for the table (based upon vector) that will hold them; (e) the operator() function which does the visiting; (f) the virtual "visit" function; and (g) the static function which provides access to the underlying visit function table. Ill come back to these in a moment.
Listing 2 is the GameObject base class declaration, and listing 3 contains the definitions of the key parts shown in listing 2. There are two separate parts of interest in listing 2. First is the pure virtual function look_up. This corresponds to the accept function in the standard visitor pattern. Look_up is invoked by the actual visitor object, and passed that visitor's visit function table. Instead of being passed a visitor reference and invoking the virtual visit function on its argument, look_up just indexes into the visit function table. Note that our indexing into the visit function table is almost the same thing as the compiler indexing into the visitor's virtual function table, but with two important differences: (i) we build the visit function table at runtime instead of at compile time, and (b) the GameObject doesn't have a reference to a visitor to use as the this pointer for the function call. The latter means that look_up has to return the function pointer back to the visitor for it to be invoked.
The first key point of Indirect Visitor is how look_up works when it does not immediately find a visit function. One of the other problems with the standard visitor pattern is how to support inherited behavior. Using Scotts example, suppose you have MilitaryShip and ComercialShip both derived from SpaceShip. Suppose, further that for some visitors you want to have specific functionality for MilitaryShips and ComercialShips, but for other visitors you would like to just provide functionality for SpaceShips and have it apply to both subtypes. The look_up function provides this support.
The comment in listing 3 describes the exact form of every derived look_up function. First it attempts to find the visit function corresponding to the actual class type (more on this below). In most cases this will succeed and the resulting function pointer will be returned, to be executed by the visitor. If there is no function in the table for this specific class, then look_up passes the buck up the hierarchy to its parent class. At some point a function should be found.
For this to work properly, every class in the hierarchy must provide an implementation of look_up. In SpaceShip, look_up would still be a pure virtual function, but it must have an implementation. The only exception to the form of the implementation is in GameObjects version of look_up (shown in listing 3). Since GameObject has no parent, the failure to find a visit function here means there is no functionality provided for a given type of object. At this point you have a couple of choices.
You could just return the null function pointer and let the visitor deal with it. On the other hand, I have chosen to assume (and require) that every type in the hierarchy have a visit function somewhere. This might be a general purpose "do-nothing" function at the GameObject level, or it might be something else. In any case, if we get to GameObject::look_up and it does not find a visit function, then it throws an exception. You will note the argument passed to the NotFound exception object is the type name of the object for which we could not find a visit function. While the static type of the this pointer in GameObject::look_up is "GameObject", the dynamic type is still the original object whose look_up function started the hierarchy tree climb. The call typeid(*this) will return the type_info of the actual derived class.
The other key point about look_up is how it indexes the Visitor::FunctionTable. First a little history. Since look_up is a virtual function, when it executes it knows what type of object it represents. I needed to turn this information into something that could be used as an index into a simulated virtual function table. Initially I based FunctionTable on the STL map<> class, with a key provided by the Standard C++ RTTI mechanism (this is basically the same as one of Scott's solutions to the overall double dispatching problem, but with a different map<> key type). FunctionTable was:
typedef map<const type_info*, VisitFunction>
It could be initialized like so:
_tbl[&typeid(SpaceShip)] = &visit_space_ship;
_tbl[&typeid(Asteroid)] = &visit_asteroid;
Finally, the look_up function in the various GameObject classes looked like:
FunctionTable::const_iterator it =
table.find(&typeid(this class' type));
return (it != table.end()) ? *it :
where "this class type" was the actual name of the class (e.g. MilitaryShip).
A couple of notes: the map<> key is a pointer to a const type_info object. The typeid operator returns a reference to a const type_info object. Since type_info objects can not be created or copied you can not have a map<> indexed by type_info objects directly, but you can have pointers to them. In MEC++ Scott used strings for the map key, obtained by calling type_info::name(). I decided to use the type_info object reference directly. This leads to an interesting question: what should be the comparison function used to order the keys in the map?
As you can see, I choose to use the default, which in this case is less<const type_info*>. The interesting aspect of this question is that the Standard definition of type_info includes a before function. This function is required to provide an "implementation dependent" ordering amongst the type_info objects. One obvious use for such an ordering is to allow type_info objects to be placed in sets<> or used as keys of a map<>. As I noted above, however, you can not actually put type_info objects in a set<> or map<>, you can only hold pointers to the objects. While I initially planned on using type_info::before(), the more I thought about it, the less desirable it seemed. The final straw was that I have no idea what the overhead of the before function is -- it might be doing a string compare on the class name -- whereas I have a pretty good idea of the overhead of less<const type_info*>. So I stuck with the simple choice.*
Having built this map<> based version of Indirect Visitor, I was pretty happy with it. I knew that the FunctionTable would have O(log-n) lookup time, but I figured the lookup times would be swamped by the actual work done in the visit functions, and the added flexibility that this pattern offered over the standard visitor seemed overwhelming. When I presented this version of visitor to my colleagues, I was surprised that it got a rather luke-warm reception. The flexibility was appreciated, but the overhead of the map<> class was not. In spite of the fact that adding a new node to our existing data hierarchy while using the standard visitor would basically require that all of our software be recompiled (which at that time took about half a day), the general feeling seemed to be that this didn't happen often enough to justify adding the extra runtime overhead. I must add that all of this was totally devoid of any real data on how much overhead the map<> lookup would actually have added to a real application.
Nevertheless, I found my new visitor pattern without a home. I spent some time thinking about ways to speed up the lookup. One obvious way (which I suggested) was to use a hashed map. As always when considering a hash table, the questions are: how good is the hash function at evenly distributing the keys, and how much overhead is the hash function itself. Again, the consensus was that no matter what the overhead, it wasn't worth it. I realized that what I needed was something with overhead equivalent to a virtual function call; in other words I needed a fixed index into an array. In order to do this, I needed to give every class in the data hierarchy a unique index value. This led to UID and the rest of GameObject shown in listing 2 and 3.
1. Erich Gamma, et. all, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.
2. Revisiting Visitor
3. Another Visitor pattern.
4. Jack Reeves, The (B)Leading Edge: Faking (and Exploring) RTTI, The C++ Report, Vol. x, No. y, Oct 1997.
5. Scott Meyers, More Effective C++: 35 New Ways to Improve Your Programs and Designs, Addison-Wesley, 1996.