“The (B)Leading Edge”
Low Overhead Class Design

Jack Reeves
©The C++ Report

INTRODUCTION

I recently attended the international meeting of the Embedded C++ organization (held in San Jose in conjunction with the Embedded Systems Conference). If you have not heard of this organization, it is a consortium of (mostly) Japanese companies that have set out to create and adopt a subset of Standard C++ to be used in developing high-end embedded applications. For more information, check out their home page at "www.caravan.net/ec2plus". I have been particularly interested in this effort ever since I first heard about it. In part this is because I have a long-standing interest in embedded software development, even when I cannot claim to be doing any. More to the point of this column, however, I am interested in the Embedded C++ effort for a number of sociological implications that I see it having for the overall C++ community.

First there is the relationship of Embedded C++ to Standard C++. On the one hand, there seems to be a general consensus amongst those "in the know", including many of the long-time members of the C++ committee, that (d)Standard C++ and its library^@ are too large. On the other hand, there is no consensus whatsoever about which features or libraries should be left out (I want to go on record as NOT being one who thinks (d)Standard C++ is too big, but that is another column). Embedded C++ is the first real attempt (that I am aware of) to come up with viable criteria for subsetting Standard C++. While there is some disagreement over the criteria, there is almost universal support for the effort as a whole. One thing that seems key is the Embedded C++ technical committee's commitment to being a pure subset of Standard C++ (i.e. no extensions allowed). It seems that people with a real stake in creating portable, maintainable software are willing to accept Standard C++ as is, even if they don't want to actually use it.

The second factor is the emphasis on efficiency. The Embedded C++ committee identified a spectrum of hardware and application domains that they felt represented the embedded software arena. At one end are the very small systems where assembly language is still the only real option; next come those systems where C is the programming language of choice; above that are larger systems, which Embedded C++ is intended to support; and finally, at the large end, are those systems where full-blown Standard C++ makes sense. Naturally, the distinctions are not clear-cut and a lot of overlap is possible. In particular, I think much of the appeal of Embedded C++ is that it represents a version of C++ that could be used anywhere C makes sense. In the other direction, I suspect there are a lot of people who would be perfectly happy using Embedded C++ instead of Standard C++ for all their application development. In this sense, it seems that the Embedded C++ effort is returning C++ to its roots.

One of the keys to C++'s popularity from the beginning has been its claim to provide C-like efficiency. This remains one of the major reasons developers and software organizations choose C++. In spite of this, there are many hard-core C developers who question the validity of this claim. Given the undeniable fact that developing large-scale C++ software can be fraught with pitfalls, one of the contentions that often comes up when a C++ project gets into trouble is that it would have been better and more efficient to have done it in C. Now, with Java claiming to generate code as efficient as C++, as well as claiming to be a more productive object-oriented language, the C++ community needs to start paying real attention to some of the efficiency issues that were previously ignored because the alternatives were clearly much worse.

A lot of these issues fall into the area of "quality of implementation": in other words, how good your compiler and/or linker is at doing C++-specific optimizations, and how efficient your standard library implementation is. As a mental exercise, consider the following question: you have a program written in ANSI C; if you port that program to C++, basically by just recompiling it, how much extra runtime and memory overhead would you expect the C++ version to cost you?

A reasonable answer is "none at all." After all, the C code contains no constructors or destructors to be implicitly invoked, no conversion routines, no virtual functions, no templates, etc. In particular, even those primary sources of unwanted overhead, RTTI and exceptions, are not present in a pure C program. Without virtual functions, there will be no RTTI blocks generated; since there are no destructors, there is no reason for a compiler to generate the tables needed to support stack unwinding during an exception; and finally, since there are no exceptions being thrown from any place within the code, there is no reason for the final executable to contain the code to support the exception runtime system.

All of this seems "reasonable", but is it "likely"? That I do not know. What I do know is that even if you begin with a pure C design, C++ programmers soon start to tinker with it. First you replace malloc/free with new/delete. This introduces the possibility of exceptions. Then you start building better abstractions with constructors and destructors, and eventually virtual functions. Finally, you end up using templates and RTTI, and while the functionality may be similar to the old C program, the code is nothing like it was before. Consider the following two versions of the classic hello world program.

First a C version:

#include <stdio.h>

int main(int argc, char* argv[])
{
    const char* p = "Hello world!";
    fprintf(stdout, "%s\n", p);
    return 0;
}

Now a C++ version:

#include <iostream>
#include <string>
using namespace std;

int main(int argc, char* argv[])
{
    const string s = "Hello world!";
    cout << s << endl;
    return 0;
}

Just for grins, I turned both of these into functions, stuck them into a program that ran each a million times (sending the output to /dev/null), and generated some statistics. The figures are:

C version 8.83s

C++ version 55.67s

While the actual figures are implementation dependent, system dependent, time-of-day dependent, and probably latitude/longitude dependent, they do illustrate my point: the typical C++ program is usually not anywhere close to the performance of a similar C program. The fact that the C++ version is more type-safe, more flexible, and may have been easier to debug and get running is often overlooked, especially when those benefits do not seem to be materializing. Now, I am not for a moment planning to give up string and iostreams, or any of the other general purpose C++ libraries that I have come to depend upon. Nor am I going to forgo virtual functions, constructors, destructors, multiple inheritance, operator and function overloading, references, RTTI, or even exceptions. Still, it doesn't hurt to pay attention to what some of these things cost. The main purpose of this column is to introduce a lexicon for describing the overhead characteristics of certain C++ classes. More broadly, this column is about performance issues in C++.

Plain Old Data

Let us start with an overview of certain parts of the (draft)Standard itself. In ARM C++ there were the built-in data types and there were user defined data types. In (d)Standard C++ things are a little more complicated. In the (d)Standard the distinction is between POD types and non-POD^* types. The POD types include the built-in types and user defined POD-class types. The definition of a POD-struct in the (d)Standard (paraphrased slightly) is "a user defined struct or class that has no user-declared constructors, no private or protected non-static data members, no base classes, no virtual functions, no non-static data members of type pointer to member, non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator, and no user-defined destructor." A POD-union is defined similarly. A POD-class is either a POD-struct or a POD-union.

When you get past the double negatives and the recursion, you find that a POD-struct is basically a data structure that could be defined in C. It is true that a POD-class can have static data members, as well as static and non-static member functions, but they do not affect the fundamental properties of the class. The point here is that many of the requirements specified in the (d)Standard apply only to non-POD-class types. For example, if you write

X x;

the compiler is required to invoke the default constructor to initialize 'x' only if class X is a non-POD class. Without attempting to go through the (d)Standard in detail, the intent is fairly clear: the compiler is free to treat POD data types differently from non-POD types. By implication (often made explicit in the notes and footnotes of the (d)Standard), the compiler is expected to treat POD types pretty much just like a C compiler would treat them. Thus, if I write

podPoint pt;

where podPoint is a POD Point class (Listing 1), I expect the compiler to set aside memory for the object, but to do no initialization (the value of 'pt' will be indeterminate). Likewise for

podPoint ptarray[1000];

Whenever these objects go out of scope, I expect the "destruction" to be equally simple. Finally, as mentioned above, I do not expect the compiler to worry about such objects when it comes to generating code to do a stack unwind in case of an exception.
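Listing 1 is not reproduced in this excerpt, so what follows is only a sketch of what a POD Point class might look like; the member names `x` and `y` are assumptions. The checks use C++11 `<type_traits>`, which postdate this column. C++11 expresses the POD property of a simple struct like this one as "trivial" plus "standard-layout", and a type that passes both is one the compiler may handle exactly as a C compiler would.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for Listing 1: a Point that a C compiler would accept.
struct podPoint {
    double x;
    double y;
};

// C++11 treats a class like this as POD when it is trivial and standard-layout.
static_assert(std::is_trivial<podPoint>::value, "podPoint should be trivial");
static_assert(std::is_standard_layout<podPoint>::value,
              "podPoint should be standard-layout");

inline void definePoints() {
    podPoint pt;               // storage only: no constructor call
    podPoint ptarray[1000];    // storage only: no per-element loop
    pt.x = 1.0;                // values are indeterminate until assigned
    pt.y = 2.0;
    ptarray[100] = pt;         // a plain copy, just as in C
    assert(ptarray[100].x == 1.0 && ptarray[100].y == 2.0);
}
```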

Initializing POD objects

If I want to initialize a POD-class object, the obvious way to do it is with an aggregate initializer:

podPoint pt = {0.0, 0.0};

The same result as above can be obtained by simply writing

podPoint pt = {};

This brings up another subtle difference between ARM C++ and (d)Standard C++. In ARM C++, if the number of elements in an aggregate initializer was less than the number required to fully initialize the object, the compiler would supply zeros for the omitted elements. Thus "{}" above is equivalent to "{0,0}" in ARM C++, and the implicit conversion from int to floating point means that the two statements above have the same effect.
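A quick check of the zero-fill rule, again using a hypothetical podPoint with two `double` members. In current C++ the omitted elements of an aggregate initializer are still zero for a type like this, so each of the asserts below should hold on a conforming compiler.

```cpp
#include <cassert>

// Hypothetical POD Point with two double members.
struct podPoint {
    double x;
    double y;
};

inline void aggregateDefaults() {
    podPoint a = {0.0, 0.0};   // every element given
    podPoint b = {};           // every element omitted: zero-filled
    podPoint c = {0, 0};       // int constants converted to double
    podPoint d = {3.5};        // y omitted: zero-filled
    assert(a.x == b.x && a.y == b.y);
    assert(b.x == c.x && b.y == c.y);
    assert(d.x == 3.5 && d.y == 0.0);
}
```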

In (d)Standard C++, the definition of an aggregate has been relaxed somewhat from ARM C++. You can now have an aggregate that is a non-POD-class. An object of the following type

struct Error {
    int errNo;
    string errMsg;
};

can be initialized with an aggregate initializer:

Error err = { 42, "Disk full" };^#

As a result of this, the committee also generalized what the compiler supplies if you omit elements of an aggregate initializer. Now, instead of zero, the compiler supplies the equivalent of 'X()' where 'X' is the type of the element to be initialized. Therefore, under (d)Standard C++

Error err = {};

is equivalent to

Error err = { int(), string() };

The ability to write 'X()' where X might be a built-in type was added to C++ syntax when templates came along. Most current compilers treat 'X()' as a no-op if 'X' is a built-in type. This is clearly not what people expect to happen if they omit elements from an aggregate initializer, so under (d)Standard C++, an explicit initializer of the form 'X()' for a POD type results in the element being zero initialized. This means that

Error err = {};

has the same effect as

Error err = { 0, string() };

which is probably what we expected. Since podPoint is itself a POD-class, we can write

podPoint pt = podPoint();

and get the same result as

podPoint pt = {};

Note that this is not true for class Error. Since Error is a non-POD-class, the form

Error err = Error();

does not zero-initialize the elements of the class; instead, it invokes the default constructor for class Error. This is obviously the correct thing to do since the string element 'errMsg' requires something more than just being zero-initialized. In this case, the default constructor is implicitly defined by the compiler. The resulting constructor is equivalent to a user defined version of the form

Error() {};

This is NOT the same thing as

Error() : errNo(), errMsg() {}

Instead the language semantics make it equivalent to

Error() : errMsg() {}

The empty initializer list in the implicitly generated default constructor does not initialize any POD data types. Any elements of non-POD-class type are initialized by their default constructors, as usual.
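The Error cases can be tried directly. The sketch below adds two hypothetical helper types, `ErrorNoZero` and `ErrorZero`, that spell out the two constructor forms shown above. One difference from the draft this column describes: since C++03, writing `Error()` for a class with no user-declared constructor value-initializes it, which zero-fills `errNo` as well. Only the explicit `errNo()` initializer makes a plain definition such as `ErrorZero z;` start at zero.

```cpp
#include <cassert>
#include <string>

struct podPoint { double x; double y; };   // hypothetical POD Point

struct Error {                   // the non-POD aggregate from the text
    int errNo;
    std::string errMsg;
};

struct ErrorNoZero {             // behaves like "Error() : errMsg() {}"
    ErrorNoZero() : errMsg() {}  // errNo is left uninitialized
    int errNo;
    std::string errMsg;
};

struct ErrorZero {               // behaves like "Error() : errNo(), errMsg() {}"
    ErrorZero() : errNo(), errMsg() {}   // errNo() zero-initializes the int
    int errNo;
    std::string errMsg;
};

inline void initializationForms() {
    Error full = { 42, "Disk full" };
    Error none = {};                     // like { int(), std::string() }
    assert(full.errNo == 42 && full.errMsg == "Disk full");
    assert(none.errNo == 0 && none.errMsg.empty());

    podPoint pt = podPoint();            // zero-initialized, same as = {}
    assert(pt.x == 0.0 && pt.y == 0.0);

    ErrorZero z;                         // 0 even for a plain definition
    assert(z.errNo == 0);

    ErrorNoZero n;                       // errNo indeterminate: assign first
    n.errNo = 7;
    assert(n.errNo == 7 && n.errMsg.empty());

    // Since C++03, Error() value-initializes a class without a user-declared
    // constructor, so errNo is 0 here too; the 1997 draft did not promise that.
    Error e = Error();
    assert(e.errNo == 0 && e.errMsg.empty());
}
```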

All of this is probably of little consequence in most C++ code.

Aggregate initializers are the preferred way to initialize POD-structs. If you use aggregate initializers wherever you did before, you will get the same results that you did before. The key difference for most programmers will simply be the ability to use aggregate initializers in a few situations where they were not permitted before. I have gone into it here because it does have some important implications further down, in particular for light weight classes.

(Re)Introducing the Infamous Four

From the above it should be fairly obvious that if you stick to using just POD data types in your C++ program, you should get performance that rivals a C program. On the other hand, you might as well use C. It is probably safe to say that most user defined data types in a typical C++ program are not POD-classes. In fact, many well-meaning textbooks and C++ coding style guides go so far as to insist that if you define a class type you should:

a. Make all data members private.

b. Provide initializing constructors. These should carefully initialize all base classes and all members in the constructor initializer list (being careful to specify them in the order they appear in the class definition).

c. Provide a default constructor.

d. Provide a copy constructor.

e. Provide an assignment operator (it should carefully check for self-assignment and then make sure it assigns all data members).

f. Provide a virtual destructor.

A lot of good C++ code has been written that follows these rules. Unfortunately, it is probably also safe to say that a lot of otherwise good C++ code has compared unfavorably with equivalent C code as a result of blindly following these rules.

I assume that most readers of this column are aware of the special class members that the compiler will write for you under certain circumstances. The four most important of these (hereafter referred to as the Infamous Four, or I4) are:

• The default constructor

• The copy constructor

• The copy assignment operator

• The destructor

Further, I assume that most readers know the semantics of the compiler generated versions, and under what circumstances the compiler generated versions are not sufficient (if not, see [3] or [4]). What I am interested in here is under what circumstances the compiler generated versions aren't even generated.

For each of the I4, the (d)Standard indicates that there can be trivial and non-trivial versions. The definitions of trivial for each of these follow:

-- A default constructor of a class is trivial if it is implicitly declared and if (a) the class has no virtual functions, and no virtual base classes, and (b) each direct base class has a trivial default constructor, and (c) for all non-static data members of the class that are of class type (or array thereof), each such class has a trivial default constructor.

-- A copy constructor for a class is trivial if it is implicitly declared and (a) the class has no virtual functions and no virtual base classes, and (b) each direct base class has a trivial copy constructor, and (c) for all non-static data members of the class that are of class type (or array thereof), each such class has a trivial copy constructor.

-- A copy assignment operator of a class is trivial if it is implicitly declared, and (a) the class has no virtual functions and no virtual base classes, and (b) each direct base class has a trivial copy assignment operator, and (c) for all non-static data members of the class that are of class type (or array thereof), each such class has a trivial copy assignment operator.

-- A destructor of a class is trivial if it is implicitly declared, and (a) each direct base has a trivial destructor, and (b) for all non-static data members of the class that are of class type (or array thereof), each such class has a trivial destructor.

This is what the (d)Standard says. As with the distinction between POD-class types and non-POD-class types, certain requirements in the (d)Standard are called out only for non-trivial versions of the Infamous Four. Since no requirements are imposed on compilers regarding trivial versions of the I4, the obvious implication is that the compiler is free to do something intelligent. For a trivial default constructor or destructor, the obvious intelligent thing to do is -- nothing. For a trivial copy constructor or copy assignment operator, the obvious intelligent thing to do is to substitute bit-wise copy semantics for the otherwise required member-wise copy semantics. The rest of this column is based on the assumption that this is what a good compiler will do.
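Current C++ exposes each of these triviality properties as a `<type_traits>` query, and `std::is_trivially_copyable` marks the types for which a byte-for-byte copy is valid. The sketch below uses them to check the I4 for a hypothetical podPoint and for a hypothetical `Labelled` class with a `std::string` member, then shows the bit-wise copy a compiler may substitute; `I4` and `bitwiseCopy` are illustrative names. The traits follow today's rules, which refine the 1997 wording in places (for example, a constructor declared `= default` can still be trivial).

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

struct podPoint { double x; double y; };   // hypothetical POD Point

struct Labelled {                          // has a member of non-trivial class type
    podPoint where;
    std::string label;
};

// One C++11 trait per member of the Infamous Four.
template <class T>
struct I4 {
    static constexpr bool defaultCtor = std::is_trivially_default_constructible<T>::value;
    static constexpr bool copyCtor    = std::is_trivially_copy_constructible<T>::value;
    static constexpr bool copyAssign  = std::is_trivially_copy_assignable<T>::value;
    static constexpr bool destructor  = std::is_trivially_destructible<T>::value;
};

// All four are trivial for the POD Point...
static_assert(I4<podPoint>::defaultCtor && I4<podPoint>::copyCtor &&
              I4<podPoint>::copyAssign && I4<podPoint>::destructor, "");
// ...and all four become non-trivial once a std::string member is added.
static_assert(!I4<Labelled>::defaultCtor && !I4<Labelled>::copyCtor &&
              !I4<Labelled>::copyAssign && !I4<Labelled>::destructor, "");

// What a compiler may substitute for a trivial copy: copy the bytes. The
// static_assert stops the helper from being used where that would be wrong.
template <class T>
void bitwiseCopy(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "bit-wise copy needs a trivially copyable type");
    std::memcpy(&dst, &src, sizeof(T));
}

inline void copyDemo() {
    podPoint a = {1.0, 2.0};
    podPoint b;
    bitwiseCopy(b, a);                     // same effect as b = a
    assert(b.x == 1.0 && b.y == 2.0);
}
```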

A lexicon for class types.

What I propose is to distinguish four different categories of user defined class types based upon whether or not they have trivial versions of the I4 functions. The class types are:

• Feather weight classes

• Light weight classes

• Middle weight classes

• Normal or typical weight classes

In overview, the definitions are:

-- A feather weight class is one that has a trivial default constructor, a trivial copy constructor, a trivial copy assignment operator, and a trivial destructor.

-- A light weight class is one that has a trivial copy constructor, a trivial copy assignment operator, and a trivial destructor.

-- A middle weight class is one that has a trivial destructor.

-- A normal weight class (also known as a typical class) has non-trivial versions of each of the I4.

Several things should be obvious from these definitions. First, each class category is a subset of the following category. Thus, if a class is a light weight class, it also qualifies as a middle weight class, and a normal weight class. Furthermore, each definition is recursive in terms of itself. Thus, a light weight class has base classes and members that are all light weight classes themselves. Finally, it should also be obvious that a POD-class type or a built-in type can be substituted in place of a feather weight class, with one caveat, which I will mention below.
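The lexicon maps directly onto the same C++11 traits. The following is a hypothetical classifier, not part of any library, that reports the lightest category a type qualifies for; `Weight`, `weightOf`, and the `lwPoint` class (an assumed stand-in for the light weight Point discussed below, with a user-written empty default constructor) are all illustrative names.

```cpp
#include <string>
#include <type_traits>

// The four categories of the lexicon, lightest first.
enum class Weight { Feather, Light, Middle, Normal };

// Hypothetical classifier: the lightest category T qualifies for. Every
// category requires a trivial destructor, so that test comes first.
template <class T>
constexpr Weight weightOf() {
    return !std::is_trivially_destructible<T>::value ? Weight::Normal
         : !(std::is_trivially_copy_constructible<T>::value &&
             std::is_trivially_copy_assignable<T>::value) ? Weight::Middle
         : !std::is_trivially_default_constructible<T>::value ? Weight::Light
         : Weight::Feather;
}

struct podPoint { double x; double y; };      // hypothetical POD Point

class lwPoint {                               // hypothetical light weight Point
public:
    lwPoint() {}                              // user-written, so non-trivial
    lwPoint(double x, double y) : _x(x), _y(y) {}
    double x() const { return _x; }
    double y() const { return _y; }
private:
    double _x;
    double _y;
};

static_assert(weightOf<int>() == Weight::Feather, "built-in types are feather weight");
static_assert(weightOf<podPoint>() == Weight::Feather, "POD classes are feather weight");
static_assert(weightOf<lwPoint>() == Weight::Light, "empty user ctor makes it light");
static_assert(weightOf<std::string>() == Weight::Normal, "std::string is normal weight");
```

Under C++11 and later, declaring the default constructor as `lwPoint() = default;` instead of `lwPoint() {}` keeps it trivial, so by this test the class would become feather weight while keeping its initializing constructor.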

Normal weight classes are what most of us create whenever we define a class in C++. It is definitely what you get if you follow the requirements of most C++ style guides. Unfortunately, there can be a lot of overhead associated with a normal weight class. In a lot of cases, this overhead is unnecessary. This lexicon is about the alternatives.

Feather weight classes

A feather weight class is a class that has trivial versions of all of the I4 members. While by definition a feather weight class is not a POD class, you can nevertheless think of the POD-class types as a subset of the feather weight class types, with the following exception: if you explicitly invoke the default constructor of a POD-class type, you will zero-initialize all members of the object, whereas an explicit default constructor invocation of a feather weight class object does nothing.

It is fairly easy to turn a POD-class type into a non-POD feather weight class -- just provide a base class, or have one of the members turn into a feather weight class type. A more common transformation is to have one or more private or protected data members. Listing 2 shows a version of the Point class that qualifies as a feather weight class.

This is actually a useful abstraction, though it is not one you might expect. Assuming our compiler is on the ball, both of

fwPoint pt;

fwPoint ptarray[1000];

should act just as they would for podPoint. Likewise, any assignments such as

ptarray[100] = pt;

should use bit-wise copy semantics. The one area where there are differences between a feather weight class and a POD class is initialization.

Typically, we can not use an aggregate initializer with a feather weight class object. Instead we can do initialization such as the following:

fwPoint pt = fwPoint().moveTo(1.0, 1.0);

The (d)Standard makes it clear that the compiler is free to eliminate the temporary and the copy constructor invocation implicit in the above and simply treat 'pt' as the target of the moveTo() call. For a typical normal weight class, we would assume that any decent compiler would automatically perform this optimization. In the case of a feather weight class, the compiler might not bother to optimize away the bit-wise copy that results from the trivial copy constructor. This is the first of many QOI^$ issues that I will be touching on. If your compiler doesn't optimize away the copy, then you might prefer

fwPoint pt;

pt.moveTo(1.0, 1.0);

For myself, I have adopted a guideline that says that no attempt should be made to initialize feather weight objects unless the feather weight class is also an aggregate.
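Listing 2 is not included here; the class below is an assumed reconstruction of a feather weight Point, with a hypothetical `moveTo()` that returns a reference so that it can be chained into an initialization. Two notes on current rules: because all of its data members share the same access, C++11 would classify this particular class as POD (trivial and standard-layout), even though the draft's definition excludes it; and `fwPoint()` now value-initializes, zero-filling the members before `moveTo()` runs. Neither changes the point being made: the class has no user-declared I4 members, and it is still not an aggregate.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical feather weight Point: private data, but no user-declared
// constructors, copy operations, or destructor, so all of the I4 stay trivial.
class fwPoint {
public:
    double x() const { return _x; }
    double y() const { return _y; }
    // Returns *this so the call can be chained into an initialization.
    fwPoint& moveTo(double newX, double newY) { _x = newX; _y = newY; return *this; }
private:
    double _x;
    double _y;
};

static_assert(std::is_trivially_default_constructible<fwPoint>::value, "");
static_assert(std::is_trivially_copyable<fwPoint>::value, "");

inline void featherDemo() {
    fwPoint a = fwPoint().moveTo(1.0, 1.0);  // temporary, moveTo, then a bit-wise copy
    fwPoint b;                               // no initialization at all...
    b.moveTo(1.0, 1.0);                      // ...until this call
    assert(a.x() == b.x() && a.y() == b.y());
}
```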

A feather weight class can be an aggregate, and can thus be initialized by an aggregate initializer, if (a) it has no base classes, and (b) it has no private or protected non-static members. Stated another way, a feather weight class is an aggregate if it is otherwise a POD-class type, but one or more of its members is of feather weight class type instead of being of POD-class type.

Feather weight classes occur more often than you might think. They occur fairly regularly in certain parts of the STL. Most of the functors defined by the STL qualify as feather weight classes even though they are empty. Most empty classes would seem to meet the requirements of a POD-class type, but the STL function objects usually have a base class. If there is a base class, then the resulting class is, at best, a feather weight class. Nevertheless, there is usually no reason to burden such an empty class (unless it is an ABC) with user defined versions of the I4.

Light weight classes

If truth be told, I do not have a lot of use for feather weight classes. I think initializing constructors rank right up there with subroutines as one of the most important developments in the history of programming. Therefore, most of my classes, even ones that might otherwise qualify as aggregates, usually end up with a constructor or two. If the class is otherwise a feather weight class, the presence of a user defined constructor converts it into a light weight class.

A light weight class is a class that has a trivial copy constructor, a trivial copy assignment operator, and a trivial destructor. The class will have one or more user defined constructors. This will prevent the compiler from implicitly declaring a default constructor, so a light weight class will typically also have a user defined default constructor. Listing 3 presents a light weight version of the Point class.

Since initialization is what light weight classes are all about, that is where we will concentrate our attention. First, because a light weight class has an initializing constructor, the compiler will require that we provide a default constructor if we are to write code such as:

lwPoint pt;

lwPoint ptarray[1000];

It is very tempting, and quite common, to create a default constructor for lwPoint by simply providing default arguments for one of the other constructors. For example, we could have defined lwPoint's initializing constructor as

lwPoint(double x = 0.0, double y = 0.0)
    : _x(x), _y(y) {}

and kill two birds with one stone. But what does this do to our users?

This is a primary illustration of the difference between C and C++ programming styles. The compiler is required to invoke the default constructor to initialize lwPoint objects. So

lwPoint pt;

sets 'pt' to the origin. The language also requires that the default constructor be used to initialize every member of the 'ptarray' object. Suppose that the origin is not the desired initial value for the point. In the case of the single object, we can override the default initialization with an explicit initialization as in

lwPoint pt = lwPoint(1.0, 1.0);

Doing the same thing for the array would require that we specify an aggregate initializer with 1000 elements. Let's be real. Instead, we will write a loop after the declaration which will step through the elements of the array and initialize each one to the desired value. This means that the compiler will first step through the array and initialize every point to (0.0,0.0) and then we will go back through and reinitialize every point. Compared to using a POD Point class (or a fwPoint), the initialization will take at least twice as long. Maybe for a given application this extra overhead will be lost in the noise level, but maybe not. One thing is for sure, it is stuff like this that causes hard core C programmers to snicker under their breath when C++ programmers talk about writing efficient code.

We cannot change the language semantics, but we can at least make the compiler's job easier -- maybe. The first thing we do is to re-establish the fact that object construction and initialization are two different things. Construction is the responsibility of the class designer. Establishing an initial value for an object is the responsibility of the user. In C++, we are so used to using constructors to do initialization that we forget that the default constructor's sole reason for existence is to do object construction in those cases where the user does NOT supply an initial value. In the case of lwPoint, there is no need for the default constructor to do anything, so I have written it to do nothing.
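Listing 3 is not reproduced here; this is a hypothetical light weight Point with the empty default constructor just described. The `lightDemo()` loop shows the intended usage pattern: each array element receives its real value exactly once, from the loop, because the constructor itself stores nothing.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical light weight Point in the spirit of Listing 3.
class lwPoint {
public:
    lwPoint() {}                                   // construction only: sets no value
    lwPoint(double x, double y) : _x(x), _y(y) {}  // initialization, when the user asks
    double x() const { return _x; }
    double y() const { return _y; }
private:
    double _x;
    double _y;
};

// Copy, assignment, and destruction stay trivial, so lwPoint is light weight.
static_assert(std::is_trivially_copyable<lwPoint>::value, "");
static_assert(std::is_trivially_destructible<lwPoint>::value, "");
static_assert(!std::is_trivially_default_constructible<lwPoint>::value, "");

inline void lightDemo() {
    lwPoint pts[1000];                  // each element runs the empty constructor
    for (int i = 0; i < 1000; ++i)      // the one pass that stores real values
        pts[i] = lwPoint(1.0, 1.0);
    assert(pts[0].x() == 1.0 && pts[999].y() == 1.0);
}
```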

In fact, at this point I will make two assertions: (1) every light weight class can have an empty default constructor, and (2) most should. The first assertion follows from the fact that a light weight class has both a trivial copy constructor and a trivial copy assignment operator. Since both these functions exhibit the same bit-wise copy semantics, assignment can be substituted for copy construction. Therefore, any class that qualifies as a light weight class can be left uninitialized by its default constructor with the assurance that when the user assigns an initial value (we always initialize our variables before we use them, don't we?), everything will be taken care of.

The second statement is not quite as strong as the first. Listing 3 provides the beginning of a light weight string class, defined as a template. In this case, while it is legal to have an empty default constructor, the semantics of the class require that a valid object always has _length <= SIZE. This, plus the fact that most users expect a string (or any container) to be initialized to empty, caused me to yield to convention and provide some initialization in the default constructor. Still, in doing so I was conscious of what I might be losing in terms of performance.
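A minimal sketch of such a class, assuming a fixed character buffer plus a length member; the name `lwString`, the `append()` member, and its truncating behavior are illustrative, not taken from the listing. Only `_length` is set by the default constructor, which is enough to establish the invariant `_length <= SIZE`; the buffer is left untouched, and the class stays light weight.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical light weight string: a fixed buffer plus a length.
template <std::size_t SIZE>
class lwString {
public:
    lwString() : _length(0) {}   // establishes _length <= SIZE; buffer untouched
    std::size_t length() const { return _length; }
    char operator[](std::size_t i) const { return _data[i]; }
    // Appends as much of s as fits; returns false if s had to be truncated.
    bool append(const char* s) {
        while (*s != '\0' && _length < SIZE)
            _data[_length++] = *s++;
        return *s == '\0';
    }
private:
    char _data[SIZE];
    std::size_t _length;
};

static_assert(std::is_trivially_copyable<lwString<16> >::value, "");
static_assert(std::is_trivially_destructible<lwString<16> >::value, "");

inline void stringDemo() {
    lwString<8> s;
    assert(s.length() == 0);
    bool whole = s.append("abc");
    assert(whole && s.length() == 3 && s[2] == 'c');
    whole = s.append("defghij");        // only five more characters fit
    assert(!whole && s.length() == 8 && s[7] == 'h');
}
```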

Just exactly what does an empty default constructor buy us? In the case of a single object definition

lwPoint pt;

the inline default constructor will be substituted in place to do the object initialization. Since the constructor is empty, it will have no effect and the light weight class is equivalent to the POD class. The case of the array initialization is more complicated. Given

lwPoint ptarray[1000];

we might assume that the compiler will just generate a loop which invokes the default constructor for each element of the array. If this happens, the optimizer should come along and figure out that the loop body is empty and eliminate the entire thing. Unfortunately, things are more complicated than they at first appear. The language specification not only requires that the default constructor be invoked to construct every element of the array, but it also requires that if any one of those constructor invocations throws an exception, then every element of the array before the one under construction will have its destructor invoked before the exception is allowed to propagate.

Because of this extra complexity, many implementations delegate array initialization to a separate function. This function is passed the address of the constructor, the address of the destructor, the number of elements in the array, etc. With an empty constructor, the loop will only consist of the function call and return overhead, plus whatever overhead is typically involved in a try/catch block. For a large array, this is still a non-trivial amount of overhead to do nothing.

We might reasonably expect that the compiler will do some optimizations. The first thing we can hope is that the compiler will recognize that the destructor is trivial. A trivial destructor means that the stack unwind in case of an exception is trivial. At the very least, we can hope that the compiler will use a different initialization function when the destructor is trivial -- one that doesn't have the try/catch block overhead to deal with a constructor exception. Ideally, we can hope that the compiler will also recognize that it has an inline constructor along with the trivial destructor, and will go ahead and inline the array initialization. This will give the optimizer a chance to recognize and remove the empty loop.

Assuming all of this recognition and optimization takes place, we could be left with an array whose initialization imposes no more overhead than an array of POD class type. While most of these optimizations seem reasonable for this special case, they may not be reasonable in general. For example, in my own coding I seldom bother to inline a function that contains a loop. I figure the function call overhead will be swamped by the loop. It is not unreasonable that a compiler will make the same choice and refuse to inline the array initialization. Likewise, it might seem to me a short step from recognizing that a default constructor is trivial to recognizing that it is non-trivial but inlined and empty. Nevertheless, it is a step, and I really have no idea how difficult it might be. Needless to say, these are all QOI issues beyond the scope of the (d)Standard. If your application domain needs the performance, you might want to run a few tests to see how much optimization your implementation performs.
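To make the array case concrete, here is a hypothetical `constructArray()` modeled on the kind of helper described above; it is not any particular compiler's routine (the Itanium C++ ABI, for instance, specifies a runtime function named `__cxa_vec_ctor` for this job, though compilers may also emit the loop inline). The C++17 `if constexpr` branch is the optimization one hopes for: with a trivial destructor there is nothing to unwind, so the try/catch disappears and an empty inline constructor leaves an empty loop. `Probe` is an illustrative test type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical helper for "T array[n];": default-construct n objects in raw
// storage and, if one constructor throws, destroy the finished ones in
// reverse order before rethrowing.
template <class T>
void constructArray(T* p, std::size_t n) {
    if constexpr (std::is_trivially_destructible<T>::value) {
        // Trivial destructor: nothing to unwind, so no try/catch is needed.
        for (std::size_t i = 0; i < n; ++i)
            ::new (static_cast<void*>(p + i)) T;
    } else {
        std::size_t built = 0;
        try {
            for (; built < n; ++built)
                ::new (static_cast<void*>(p + built)) T;
        } catch (...) {
            while (built > 0)
                p[--built].~T();
            throw;
        }
    }
}

// Test type: the third construction throws; 'live' counts existing objects.
struct Probe {
    static inline int live = 0;
    Probe() { if (live == 2) throw 42; ++live; }
    ~Probe() { --live; }
};

inline void rollbackDemo() {
    std::allocator<Probe> alloc;
    Probe* p = alloc.allocate(5);
    bool threw = false;
    try {
        constructArray(p, 5);
    } catch (int) {
        threw = true;
    }
    assert(threw && Probe::live == 0);   // the two finished Probes were destroyed
    alloc.deallocate(p, 5);
}
```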

In general, you can probably expect that arrays of light weight classes (or heavier) will have an initialization overhead that doesn't exist for feather weight or POD classes. In the absence of adequate optimization, what we desire is something similar to an array that lets us specify the initial value to be used during initialization. In fact, someone has come up with a reasonable facsimile of such an array: it is called vector.

If we use a vector of lwPoints, we have a few different choices. We can write

vector<lwPoint> vpt(1000);

and the vector will initialize itself using the default constructor. This is the same as an ordinary array. We can, however, specify the initial value for a vector:

vector<lwPoint> vpt(1000, lwPoint(1.0, 1.0));

Finally, we can just request a chunk of memory be reserved and fill it in later.

vector<lwPoint> vpt;

vpt.reserve(1000);
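The third form deserves a caution: `reserve()` only allocates capacity, and `size()` remains zero, so the elements are added with `push_back()` rather than by indexing. A short sketch, using a hypothetical lwPoint and an illustrative `makeDiagonal()` function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class lwPoint {                               // hypothetical light weight Point
public:
    lwPoint() {}
    lwPoint(double x, double y) : _x(x), _y(y) {}
    double x() const { return _x; }
    double y() const { return _y; }
private:
    double _x;
    double _y;
};

// Reserve once, then construct each point exactly once with its real value.
inline std::vector<lwPoint> makeDiagonal(std::size_t n) {
    std::vector<lwPoint> vpt;
    vpt.reserve(n);                      // capacity only: vpt.size() is still 0
    assert(vpt.empty() && vpt.capacity() >= n);
    for (std::size_t i = 0; i < n; ++i)
        vpt.push_back(lwPoint(double(i), double(i)));
    return vpt;
}
```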

I said that vector is a reasonable approximation of what we desire. Unfortunately, it is not a particularly good approximation. The vector<> template is intended to be instantiable with any type. Therefore, a correct vector<> implementation has to be written to deal with the needs of any normal weight class. This means the vector constructor has to have the same general purpose code to cope with exceptions that the standard array initialization function has (or rather, it should have; until recently there was nothing in the (d)Standard that required vectors, or any other standard containers, to cope with exceptions. A recent addition to the (d)Standard has attempted to correct this oversight). What we really want is a class that is specialized to take advantage of the characteristics of light weight classes, i.e. the bit-wise copy semantics and the trivial destructor semantics.

It turns out that there is a container in the (d)Standard library that is specialized for POD classes. It is called 'basic_string'. While it may seem incongruous to create a string of points, it should work just fine. Actually, since basic_string is defined to work only with POD data types, instantiating basic_string with a light weight class technically yields undefined behavior. While I can not recommend this as a portable technique, my own experience says that a light weight class has enough in common with a POD data type to work just fine in a string. Alternatively, you might consider defining your own version of vector<> specifically tailored for light weight classes.
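If you do decide to write your own, the following is one minimal sketch of what "tailored for light weight classes" might mean. The name lwVector and its interface are hypothetical. It relies on C++11 type traits to reject element types that are not bit-wise copyable or that have a non-trivial destructor. Those restrictions let it relocate elements with memcpy. They also mean it never has to run element destructors or recover from an exception thrown part way through copying elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical growable array restricted to light weight element types.
template <class T>
class lwVector {
    static_assert(std::is_trivially_copyable<T>::value, "T must be bit-wise copyable");
    static_assert(std::is_trivially_destructible<T>::value, "T must have a trivial destructor");

    T* _data = nullptr;
    std::size_t _size = 0;
    std::size_t _cap = 0;

public:
    lwVector() = default;
    lwVector(const lwVector&) = delete;             // copying omitted to keep the sketch short
    lwVector& operator=(const lwVector&) = delete;
    ~lwVector() { ::operator delete(_data); }        // no element destructors to run

    std::size_t size() const { return _size; }
    std::size_t capacity() const { return _cap; }

    void reserve(std::size_t n) {
        if (n <= _cap) return;
        // Allocation may throw, but nothing has been modified yet.
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        if (_size != 0)
            std::memcpy(p, _data, _size * sizeof(T));  // bit-wise relocation, cannot throw
        ::operator delete(_data);
        _data = p;
        _cap = n;
    }

    void push_back(const T& value) {
        if (_size == _cap) reserve(_cap != 0 ? 2 * _cap : 8);
        ::new (static_cast<void*>(_data + _size)) T(value);  // trivial copy
        ++_size;
    }

    T& operator[](std::size_t i) { assert(i < _size); return _data[i]; }
    const T& operator[](std::size_t i) const { assert(i < _size); return _data[i]; }
};
```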

Middle weight classes

In my lexicon, a middle weight class is one whose destructor is trivial but whose other I4 functions are not. In general, by the time you have a class that needs user defined constructors and user defined copy semantics, you probably also need a user defined destructor. There is one special case that I mention primarily for completeness. If you review the definition of what it means for the destructor to be trivial, you will not find any mention of virtual functions. This is in contrast to the trivial versions of the other three of the I4. So, one simple way to get a middle weight class is to add a virtual function to a POD, feather weight, or light weight class.
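The following sketch shows the "add a virtual function" route. It assumes C++11 <type_traits>, and vPoint is a hypothetical name. The virtual function rules out a trivial default constructor, copy constructor, or copy assignment. It says nothing about the destructor, which therefore stays trivial.

```cpp
#include <type_traits>

// A POD-like Point plus one virtual function, and no user-declared destructor.
struct vPoint {
    double x;
    double y;
    virtual double norm2() const { return x * x + y * y; }
};

// The language rules make these non-trivial for any class with virtual
// functions (the constructors must set up the vtable pointer).
static_assert(!std::is_trivially_default_constructible<vPoint>::value, "");
static_assert(!std::is_trivially_copy_constructible<vPoint>::value, "");
static_assert(!std::is_trivially_copy_assignable<vPoint>::value, "");

// ...but the destructor is still trivial: a middle weight class.
static_assert(std::is_trivially_destructible<vPoint>::value, "");
```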

If the idea of a class that has virtual functions but does not have a virtual destructor doesn't send chills down your spine, you definitely need more C++ education. Nevertheless, I mention it because there is one area where even a middle weight class can have a significant performance advantage over a normal weight class -- stack unwinding. I touched on this above, but the full story is that whenever a C++ function creates a local object, the compiler is obligated to make sure that object will be destroyed if an exception propagates out of the function. Different compilers have adopted different schemes for ensuring that the stack unwind process occurs correctly. Some build tables at compile time, others build tables at run time. Each has its advantages and disadvantages. All impose some type of overhead on a C++ application, even if no exceptions ever get thrown.
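The unwinding obligation is easy to observe with a normal weight class, one that has a user defined destructor. In this sketch, Tracer, fails(), and destroyed_during_unwind() are hypothetical names. The destructor of the local object runs while the exception propagates, before the handler is entered. The sketch cannot show the cost of the bookkeeping that makes this possible. That bookkeeping is what a trivial destructor might let the compiler skip.

```cpp
#include <cassert>
#include <stdexcept>

int g_destroyed = 0;  // destructor calls observed so far

struct Tracer {
    ~Tracer() { ++g_destroyed; }  // user defined, hence non-trivial
};

void fails() {
    Tracer t;                             // must be destroyed if an exception leaves fails()
    throw std::runtime_error("failure");
}

// Returns the number of destructor calls made while the exception
// propagated out of fails().
int destroyed_during_unwind() {
    g_destroyed = 0;
    try {
        fails();
    } catch (const std::runtime_error&) {
        assert(g_destroyed == 1);  // t was destroyed before the handler was entered
    }
    return g_destroyed;
}
```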

There are all kinds of QOI issues surrounding exceptions, but it seems reasonable (that word again) that a compiler should realize that if a class has a trivial destructor, it doesn't have to worry about destroying objects of that type during a stack unwind. Personally, I think that if you are worried about this type of overhead, you are better off sticking to light weight classes, and/or just forgoing virtual functions altogether. Nevertheless, like I said, I mention it for the sake of completeness.

A more realistic version of a middle weight class is hinted at in Listing 5. This is one of the rare occasions when it actually makes sense to have user defined copy semantics while destruction is still trivial. In this case, the programmer (me) wanted to avoid the code bloat from having a bunch of lwString templates (Listing 4) instantiated with different sizes. So I decided to provide all functionality via a single base class that all the templates would derive from. The base class performs all operations using its three data members, one of which is a pointer to the actual data area in the derived class. Obviously, I could not allow bit-wise copy semantics here (note that _size and _data are declared as const data members). Still, there is nothing for the destructor to do, and by eliminating it I (hopefully) keep mwString objects out of the stack unwind tables. Note that the actual data copy operation in mwStringBase::operator=() is kept as close to bit-wise semantics as I could make it.

[Aside: I can't resist noting that one of the most useful versions of the mwString template is the specialization for SIZE = 0.

template<> class mwString<0>;

This also derives from mwStringBase, but since mwString<0> doesn't have a data area of its own, it has to be constructed from an existing character array, which it uses to construct mwStringBase. As such, it can be used like so

const mwString<0> s = "Hello World!";

to wrap a standard C++ string class interface around an existing C style string without having to actually construct a string object -- which would make a copy of the data. End aside]
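The following is one possible self-contained completion of Listing 5 together with the SIZE = 0 specialization. Several details are assumptions, not the column's code:

* public inheritance, so the base interface is usable;
* the length(), data(), and at() accessors;
* an assign() helper standing in for the base operator=();
* the const char* constructors;
* std::size_t as the template parameter type;
* the name _buf for the derived data area.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

class mwStringBase {
protected:
    char* const _data;        // points at storage that lives elsewhere
    const std::size_t _size;  // capacity of that storage
    std::size_t _len;         // number of characters in use

    mwStringBase(char* data, std::size_t size, std::size_t len = 0)
        : _data(data), _size(size), _len(len) {}

    // Plays the role of Listing 5's operator=(): copy only the characters.
    void assign(const mwStringBase& other) {
        if (this == &other) return;
        if (other._len > _size) throw std::length_error("mwString: too long");
        _len = other._len;
        std::memcpy(_data, other._data, _len);
    }

public:
    std::size_t length() const { return _len; }
    const char* data() const { return _data; }  // not NUL-terminated
    char at(std::size_t i) const { assert(i < _len); return _data[i]; }
};

template <std::size_t SIZE>
class mwString : public mwStringBase {
    char _buf[SIZE];  // the actual data area
public:
    mwString() : mwStringBase(_buf, SIZE) {}
    mwString(const char* s) : mwStringBase(_buf, SIZE) {
        const std::size_t n = std::strlen(s);
        if (n > SIZE) throw std::length_error("mwString: too long");
        std::memcpy(_buf, s, n);
        _len = n;
    }
    // A copy must point _data at its own buffer, so bit-wise copy is out.
    mwString(const mwString& other) : mwStringBase(_buf, SIZE) { assign(other); }
    mwString& operator=(const mwString& other) { assign(other); return *this; }
    // No destructor: there is nothing to release.
};

// SIZE = 0: no data area of its own; it borrows an existing C string.
template <>
class mwString<0> : public mwStringBase {
public:
    // The characters are not copied and must outlive this object.
    mwString(const char* s)
        : mwStringBase(const_cast<char*>(s), std::strlen(s), std::strlen(s)) {}
    mwString& operator=(const mwString&) = delete;  // never write through borrowed storage
};

static_assert(std::is_trivially_destructible<mwString<16>>::value,
              "user defined copy semantics, trivial destruction");
```

The const_cast in mwString<0> is safe only because nothing in that specialization ever writes through the borrowed pointer.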

Wrapping up

After all this, what can we conclude?

A. Beware of C++ coding style guides (and introductory texts) that recommend always supplying user defined versions of the I4. There are certainly classes that need these, and every good C++ programmer should understand the circumstances where they are required. Nevertheless, blindly supplying these functions for every class you create can seriously cripple the efficiency of otherwise useful abstractions.

Whether you should provide user defined versions of the I4 for classes that are clearly normal weight classes, when the implicitly generated versions are adequate, is a much more subjective question. I tend to be very ambivalent on this topic. Once upon a time, I was fairly strongly in favor of not writing any code that the compiler was perfectly capable of generating automatically. Obviously, if you declare any other constructor and still want a default constructor, you have to provide it yourself. In many cases, however, the copy constructor, copy assignment operator, and destructor just end up being hand written versions of what the compiler would have generated anyway. Besides being less code to write, it seemed to me that allowing the compiler to generate them was likely to make maintenance easier. I have since changed my mind.

These days I lean toward "explicit is better". I am not fanatical about it, but in general it now seems better to make things visible and not depend upon the compiler's implicit versions. While this may open the potential for more errors initially, it actually seems to make maintenance easier in the long run. Besides, this becomes one way to delineate normal classes from the lighter versions. If a class doesn't have a user defined version of one or more of the I4, there should be a comment saying that the function is being deliberately omitted, so it can be recognized as trivial (remember, user defined versions of the I4 are never considered trivial, no matter how trivial they look).

B. Likewise, beware of coding style guides that recommend always giving every class a virtual destructor. Besides the obvious fact that any user defined destructor is non-trivial, making it virtual automatically ensures that none of the other I4 functions are trivial either.

C. Remember that good old C style structs (a.k.a. POD-class types) still have their place. If your application needs to create and destroy a lot of objects, and do it efficiently, then you need to consider POD-class types. In particular, if you need to create large arrays of objects, the POD data types make a lot of sense. Don't forget that a POD-class can have member functions, so you can have many of the advantages of an abstract data type without the overhead.

D. If you want (and can afford) a bit more abstraction (and most of us can afford it), consider using feather weight, or light weight classes. A light weight class, in particular, has most of the benefits of a normal class in terms of its ability to provide a complete abstraction, but given a decent compiler its usage can still be as efficient as a POD class.

E. Don't overspecify your default constructor. As noted, a light weight class can always have an empty default constructor. This also means that you should almost always provide a separate default constructor, not a version depending upon default arguments for another constructor. As a general rule, I would apply this guideline to all classes. Remember, the default constructor is there to construct the object under those circumstances where no initial value is available. Do as little as possible in a default constructor.

A secondary issue is whether the default constructor should be inlined or not. Opinions differ on whether constructors in general should be specified as inline. For normal weight classes I tend to prefer to not inline initializing constructors, but I also tend to inline default constructors. For light weight and middle weight classes, all constructors should always be inlined. This allows whatever optimization the compiler might be able to perform to take place.

F. As a general rule you probably want to avoid using large arrays of anything heavier than feather weight objects. If your compiler is good enough, you might get away with light weight objects, but in general, if you have to create large collections of normal weight objects, consider using a vector instead of an array. If you need a large array of light weight objects, consider using the basic_string template instead of vector.

G. If you are creating templates, don't overspecify your class definition. While I haven't specifically discussed templates, many of these guidelines are of particular concern to template writers. Consider the following

typedef pair<double, double> Point;

What kind of Point have I created? Since pair<> has at least one constructor, the best I can have is a light weight class. Ideally, what I want is to actually have a light weight class, and not something heavier. This means the definition of pair<> needs to be as minimal as possible. In particular, the default constructor for pair<> needs to be empty, and there should not be any of the other I4 functions declared. Unfortunately, there are still compilers that will not correctly handle this, but hopefully that will be changing soon.
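Several of these guidelines can be checked mechanically. The sketch below assumes C++11, whose <type_traits> and "= default" both postdate this column, and all of its class names are hypothetical. In particular, lwPair stands in for a minimal pair<> and is not the library's. The Standard specifies that std::pair's default constructor initializes first and second as T1() and T2(), so that constructor is not empty.

```cpp
#include <type_traits>

// B. A virtual destructor is never trivial, and any virtual function makes
//    the default constructor and the copy operations non-trivial too.
//    (Even "virtual ~WithVirtualDtor() = default;" would not help.)
struct WithVirtualDtor {
    double x, y;
    virtual ~WithVirtualDtor() {}
};
static_assert(!std::is_trivially_destructible<WithVirtualDtor>::value, "B");
static_assert(!std::is_trivially_copy_constructible<WithVirtualDtor>::value, "B");
static_assert(!std::is_trivially_copy_assignable<WithVirtualDtor>::value, "B");

// C. Non-virtual member functions do not stop a struct from being POD
//    (C++11 spells POD as "trivial and standard-layout").
struct podPoint {
    double x, y;
    double norm2() const { return x * x + y * y; }
};
static_assert(std::is_trivial<podPoint>::value, "C");
static_assert(std::is_standard_layout<podPoint>::value, "C");

// E. Default arguments make the default constructor do real work; an empty
//    one does nothing but is still non-trivial; "= default" stays trivial.
struct ArgsPoint {
    double x, y;
    ArgsPoint(double x0 = 0.0, double y0 = 0.0) : x(x0), y(y0) {}
};
struct EmptyDefaultPoint {
    double x, y;
    EmptyDefaultPoint() {}
    EmptyDefaultPoint(double x0, double y0) : x(x0), y(y0) {}
};
// A. The I4 written out explicitly, yet still trivial, because a function
//    defaulted on its first declaration is not user-provided.
struct DefaultedPoint {
    double x, y;
    DefaultedPoint() = default;
    DefaultedPoint(double x0, double y0) : x(x0), y(y0) {}
    DefaultedPoint(const DefaultedPoint&) = default;
    DefaultedPoint& operator=(const DefaultedPoint&) = default;
    ~DefaultedPoint() = default;
};
static_assert(!std::is_trivially_default_constructible<ArgsPoint>::value, "E");
static_assert(!std::is_trivially_default_constructible<EmptyDefaultPoint>::value, "E");
static_assert(std::is_trivially_default_constructible<DefaultedPoint>::value, "E");
static_assert(std::is_trivially_copyable<DefaultedPoint>::value, "A");

// G. A minimal pair template: empty default constructor and no other I4
//    declared, so a pair of doubles stays light weight.
template <class T1, class T2>
struct lwPair {
    T1 first;
    T2 second;
    lwPair() {}
    lwPair(const T1& a, const T2& b) : first(a), second(b) {}
};
typedef lwPair<double, double> Point;
static_assert(std::is_trivially_copyable<Point>::value, "G");
static_assert(std::is_trivially_destructible<Point>::value, "G");
```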

All of this is not intended to in any way discourage the use of ordinary normal classes. Fundamentally, a clean design is the most important characteristic of any program. Starting with a clean design, you can usually address just about any other problem in an application, including its performance. Normal weight classes are a key aspect to C++'s data abstraction and object oriented programming paradigms, and as such are fundamental components of most C++ designs. On the other hand, you should not ignore efficiency (as they say), and so I encourage you to be on the lookout for those data types that can efficiently be represented with something less than a full blown normal weight class. This way we use the best of C++'s data abstraction and object oriented programming styles where it makes sense, while retaining the efficiency of C where we can.

References

1. "Working Paper for Draft Proposed International Standard for Information Systems -- Programming Language C++", December 1996.

2. Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual, Addison-Wesley, 1990.

3. Scott Meyers, Effective C++, Second Edition, Addison-Wesley, 1997.

4. Marshall Cline and Greg Lomow, C++ FAQs, Addison-Wesley, 1995.

Sidebar

Default Initialization

Suppose I were to ask "What is the difference between the following two statements?"
X a; // 1

X b = X(); // 2

In ARM C++ the answer is fairly easy -- basically there is no difference. If X is a built-in type, the explicit constructor call syntax on line 2 is just a syntactic convenience added to the language for the sake of templates -- it does nothing. If X is a user defined type, then on line 1 the compiler implicitly invokes the default constructor, which is exactly what line 2 does explicitly. In (d)Standard C++, this changes somewhat.

In (d)Standard C++, the distinction now depends upon whether or not X is a "non-POD class type" (see the column text for the definition of a POD class type). If X is a non-POD class type, then the behavior is the same as ARM C++ for user-defined types. If X is a POD type, however, things are different. A POD class type cannot have a user-declared constructor (by definition), so line 2 has no user-written default constructor to call. Instead, it does a "default initialization." A default initialization is the same as the initialization of a global static -- it does a zero initialization. Line 1, by contrast, leaves a local POD object uninitialized.

This new convention is potentially useful in those situations where aggregate initializers are not allowed (e.g. constructor initializer lists), but as a general rule, I would use the aggregate initializer form if possible. Since a POD class type is by definition also an aggregate, it usually is possible. The reason I recommend this is to ease maintenance. It is very easy for a POD class type to be turned into a non-POD class type during maintenance. If you have been using aggregate initializers, then either your code will continue to compile and work (it is possible to have an aggregate that is not a POD), or it will no longer compile. On the other hand, if you were depending upon the explicit default initialization syntax, then your code will still compile, but it will have silently changed meaning. When X becomes a non-POD class type, "X()" becomes an explicit default constructor invocation. If the maintainer didn't add a default constructor, then the compiler will synthesize an empty one. This will switch

X a = X();

from being a statement that zero initializes all data members of object 'a', to a statement that leaves all data members of 'a' un-initialized.

Caveat emptor.
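The following sketch shows the two statements, plus the recommended aggregate initializer, using Listing 1's podPoint in the role of X. It assumes a C++11 or later compiler, and the helper sidebar_example() is hypothetical. Reading a's members before they are assigned would be undefined behavior, so the sketch only copies into a.

```cpp
#include <cassert>

struct podPoint {  // Listing 1
    double x;
    double y;
};

podPoint sidebar_example() {
    podPoint a;                // 1: no initialization; a.x and a.y are indeterminate
    podPoint b = podPoint();   // 2: default (zero) initialization
    assert(b.x == 0.0 && b.y == 0.0);

    // The preferred aggregate initializer: explicit about the values, and it
    // stops compiling if podPoint ever stops being an aggregate.
    podPoint c = {0.0, 0.0};
    assert(c.x == b.x && c.y == b.y);

    a = b;                     // give a a value before it is used
    return a;
}
```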

Listing 1.

POD-class version of Point

struct podPoint {
    double x;
    double y;
};

Listing 2.

Feather-weight version of Point

class fwPoint {
private:
    double _x;
    double _y;
public:
    void moveTo(double x, double y)
        { _x = x; _y = y; }
    double x() const { return _x; }
    double y() const { return _y; }
};

Listing 3.

Light-weight version of Point

class lwPoint {
    double _x;
    double _y;
public:
    lwPoint() {}
    lwPoint(double x, double y)
        : _x(x), _y(y) {}
    void moveTo(double x, double y)
        { _x = x; _y = y; }
    double x() const { return _x; }
    double y() const { return _y; }
};

Listing 4

Light-weight String

template <int SIZE>
class lwString {
    unsigned long _len;
    char _data[SIZE];
public:
    lwString() : _len(0) {}
    // ...
};

Listing 5

Middle-weight String

#include <cstddef>    // std::size_t
#include <cstring>    // std::memcpy
#include <stdexcept>  // std::length_error

class mwStringBase {
protected:
    char* const _data;        // points at the data area in the derived class
    const std::size_t _size;  // capacity of that data area
    std::size_t _len;         // characters currently in use

    mwStringBase(char* data, std::size_t size)
        : _data(data), _size(size), _len(0) {}

    // NOTE: no copy constructor
    void operator=(const mwStringBase& other);
    // NOTE: void return value

public:
    // the rest of the string interface
};

template <int SIZE>
class mwString : protected mwStringBase {
    char _data[SIZE];
public:
    mwString()
        : mwStringBase(_data, SIZE) {}
    mwString(const mwString<SIZE>& other)
        : mwStringBase(_data, SIZE)
        { mwStringBase::operator=(other); }
    mwString& operator=(const mwString<SIZE>& other)
        { mwStringBase::operator=(other); return *this; }
    // other constructors
};

void
mwStringBase::operator=(const mwStringBase& other)
{
    if (this == &other) return;
    if (other._len < _size) {
        _len = other._len;
        std::memcpy(_data, other._data, _len);
    } else {
        throw std::length_error("mwStringBase::operator=");
    }
}