The Two Stupidest Features!
C++ has the reputation of being a poorly designed extension to C. That certainly feels right, especially if you've ever programmed in a language that was holistically designed and whose feature sets work well together and seem "complete". Python is a good example of such a language.
C++ is actually not that badly designed. Its features generally make sense given the limitations of the underlying C language and the requirements of static typing and compilation. And it adds a good amount of useful features to C. It's type safe, which may not feel like a feature but is quite helpful. It's not "memory safe" but it is much more memory safe than C in part because of its type safety and in part because of features embedded in its template libraries. It has better name scoping than C. It has function and operator overloading. And methods. And not one but two polymorphism mechanisms--templates and subclasses--which are individually quite functional even if they don't compose exactly the way you want them to. And closures! And exceptions! These are useful features that people use in real programs.
Unfortunately, it also has (at least) two very stupid "features" which if I could snap my fingers and make go away I would. These features add no functionality and serve only to make code more confusing and make programmers feel cool for being able to write confusing code. Well, guess what? Writing unreadable code is not cool. And to make matters worse, these "features" mutually reinforce to produce even more terrible results. Two terrible tastes that taste even more terrible together. Like eggs and avocados. Gross.
Anyways, what are these terrible "features"? One of them has been in C++ from the beginning and it is the reference, i.e., &. The other is newer and it is the auto keyword. What is so terrible about these?
C has pointers, i.e., variables that contain memory addresses. The explicit use of pointers is one of the strengths of the language, believe it or not. In C, pointers are explicit and accessing a variable indirectly through a pointer looks different from accessing it directly.
int a = 5, b = 7;
int *p = &b; // p now holds the address of b. This is the original use of &, it's the "address of" operator.
int c = b; // c is reading b directly
int d = *p; // d is reading b indirectly via p, which points to b.
Direct vs. indirect access also looks different for fields of structs and classes.
struct xyz { int x, y, z; };
struct xyz xyz1 = { 5, 6, 7 };
struct xyz *p = &xyz1; // p now holds the address of xyz1
int a = xyz1.x; // direct access of xyz1 x field
int b = p->y; // access of y field indirectly through a pointer to xyz1
C++ added the notion of a reference, which is just syntactic sugar for a pointer that makes it look like direct access.
int a = 5, b = 7;
int &p = b; // p now holds the address of b.
int c = b; // c is reading b directly
int d = p; // d is reading b indirectly via p, which references (i.e., points to) b.
And same for members.
struct xyz { int x, y, z; };
struct xyz xyz1 = { 5, 6, 7 };
struct xyz &p = xyz1; // p now holds the address of xyz1
int a = xyz1.x; // direct access of xyz1 x field
int b = p.y; // access of y field indirectly through p, which references (i.e., points to) xyz1
Essentially what references do is make it impossible for you to tell at the point of use whether you are accessing an object directly or indirectly via a pointer. Now you may think "that's a good thing, I shouldn't have to know that". Except that good programmers want to know these things. Accessing values indirectly is more expensive, so you should know when you're doing it and only do it when you really need to. Making things that are fundamentally different look the same is not helpful. This is not the same as polymorphism and virtual functions that make two types look the same. This is obfuscating the basic mechanics of the language and its performance characteristics.
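To make the point-of-use problem concrete, here is a minimal sketch. The names resetByReference, resetByPointer and demoCallSites are made up for this illustration and do not come from EnergyPlus.

```cpp
// Hypothetical helpers, named only for this illustration.
void resetByReference(int &value) { value = 0; } // writes through a hidden pointer
void resetByPointer(int *value) { *value = 0; }  // writes through an explicit pointer

int demoCallSites()
{
    int a = 5;
    int b = 7;
    resetByReference(a); // reads like pass-by-value; nothing here says 'a' may change
    resetByPointer(&b);  // the '&' at the call site makes the indirection visible
    return a + b;        // both variables were overwritten with 0
}
```

Both calls do the same thing, but only the pointer version tells the reader at the call site that the argument can be modified.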
But wait, there's more. Another "feature" of references is that they cannot have the value nullptr. Reference variables cannot be uninitialized and must be initialized to reference (i.e., point to) valid locations. You may be thinking "Fantastic, no more segfault crashes!". Not so fast. Yes, segfault crashes (what you get when you dereference a nullptr; they are called segfaults for historical reasons that have to do with how early computers implemented virtual memory) are not good, but they are also easy errors to find and fix. You know what pointer errors are difficult to find and fix? Dangling pointers, i.e., pointers to objects that have gone out of scope or have been freed and potentially reallocated to other objects, so that they point to a different logical object than the one you think they are pointing to, or to currently unallocated memory. Those are nasty, and references do nothing to help with them. The only thing that helps with dangling pointers is "garbage collection", i.e., automated implicit memory reclamation performed by the runtime system. Many newer languages have garbage collection (this is why you never have to deallocate anything in Python) but C and C++ do not. C++ allows programmers to implement garbage collection at the application level using constructs like shared_ptr and unique_ptr. However, the whole beauty of garbage collection is that the runtime system implements it for you and you don't have to worry about it. Once the programmer has to start worrying about it, it's no longer garbage collection, it's now "another thing for the programmer to worry about."
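As a rough illustration of what "the programmer has to worry about it" looks like, here is a small sketch using std::shared_ptr and std::weak_ptr from the standard library. The Node type and the function name are assumptions made up for this example.

```cpp
#include <memory>

struct Node { int value; }; // hypothetical type for this sketch

// Returns true if the observer correctly reports that its target is gone
// once the last owner has been destroyed.
bool observerSeesExpiry()
{
    std::weak_ptr<Node> observer;
    {
        std::shared_ptr<Node> owner = std::make_shared<Node>(Node{42});
        observer = owner;          // non-owning handle to the same Node
        if (observer.expired()) {  // Node is still alive inside this scope
            return false;
        }
    } // owner goes out of scope here and the Node is freed
    return observer.expired();     // the weak_ptr knows it would dangle
}
```

A raw pointer or a reference in the observer's place would simply dangle. The smart pointers detect the problem, but only because the programmer chose them and wired them up explicitly.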
And one more thing. nullptr is actually a valid and useful value for a pointer. It can mean "not available" or "end of list". You cannot optionally pass an empty reference to a function. If you want to do that, you need to use some type of optional wrapper, adding overhead. Or you could just pass by pointer and then pass nullptr if you want to pass nothing. Call me crazy.
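Both meanings of nullptr can be sketched in a few lines. The function and type names here (scaledValue, ListNode, listLength) are hypothetical and exist only for this illustration.

```cpp
// nullptr as "not available": the scale factor is optional.
double scaledValue(double value, double const *scale)
{
    if (scale == nullptr) {
        return value; // caller passed nothing, so leave the value alone
    }
    return value * (*scale);
}

// nullptr as "end of list".
struct ListNode
{
    int value;
    ListNode *next;
};

int listLength(ListNode const *head)
{
    int count = 0;
    for (ListNode const *node = head; node != nullptr; node = node->next) {
        ++count;
    }
    return count;
}
```

Calling scaledValue(3.0, nullptr) returns the value unscaled, and listLength(nullptr) is simply the length of an empty list.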
TL;DR references combine all of the downsides of pointers with none of the upsides. What a feature!
C and C++ are both "statically typed" languages. What does that mean? It means that the type of every object is known at compile time, along with its size and internal layout (before you say "what about void *?": the type of void * is pointer, and all pointers have the same size and no internal layout). In turn this means that C and C++ can cut down on memory management overhead by allocating objects in bulk, i.e., the entire object is allocated at once because its size and internal layout can be calculated from the size and layout of its members, and by using constant relative addresses for local and static variables that rely on a pre-computed layout for stack-frames and the static data segment. Contrast this with "dynamically typed" languages like Python. In Python every object is essentially a map data structure in which each field is a separately allocated key-value pair whose key is the field name. This is what makes it so that you don't have to declare the types of Python variables. The type of the variable is the type of the value that is currently assigned to it. a = 1? Great, a is an integer. a = "EnergyPlus"? Fine, now a is a string. How is it possible for a variable to hold an integer one second and a string the next? Well, the variable itself is just a key ("a") in a map where the value points to whatever was just assigned to it. Sounds slow, right? Well, it is. Also sounds kind of terrifying, right? Well, yes. Dynamic typing is terrifying. Beautiful but terrifying. Like sharks.
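The static side of this contrast can be checked by the compiler itself. The sketch below reuses the xyz struct from earlier; the helper functions offsetOfY and offsetOfZ are made up for the example. Exact sizes and offsets depend on the platform, so only the guarantees that hold everywhere are asserted.

```cpp
#include <cstddef>

struct xyz { int x, y, z; };

// Size and layout are compile-time constants, so they can be checked
// before the program ever runs.
static_assert(offsetof(xyz, x) == 0, "the first member sits at the start of the object");
static_assert(sizeof(xyz) >= 3 * sizeof(int), "the object has room for all three members");

// Hypothetical helpers that expose the pre-computed member offsets.
std::size_t offsetOfY() { return offsetof(xyz, y); }
std::size_t offsetOfZ() { return offsetof(xyz, z); }
```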
Which brings us to auto. What does auto mean? It doesn't mean "this variable's type is decided at runtime by whatever is assigned to it" like Python. C++ doesn't have dynamic typing. It means "I'm too cool and/or lazy to write down the type. Compiler, you figure it out from context." Now, it may sound strange that the compiler can figure out variable types from context, but this ability is equivalent to static type checking. Figuring out the type of a variable from scratch is not too different from figuring out whether the type that was assigned to it is appropriate or not given its use.
So, what's the problem then? The problem is that while compilers are good at figuring out types from context, you know who isn't always so good? Other programmers.
Here's an echo of the same sentiment from a "professional".
What happens when you multiply the root of evil by the root of evil? You get evil. With references, we have pointers that in context look just like direct accesses. And with auto, we have lazy programmers that tell the compiler to figure out the types of variables from context. What could possibly go wrong?
struct xyz { int x, y, z; };
auto xyz1 = xyz{ 5, 6, 7 };
auto p = xyz1;
auto a = p.x;
p.y = 4;
In this example, the compiler can figure out from context that xyz1 is of type struct xyz. It can also figure out that a is of type int. But it has no way of knowing whether you meant p to be of type struct xyz or struct xyz &, and plain auto never deduces a reference, so p is of type struct xyz, i.e., a copy. This matters for the assignment p.y = 4;. As written, p is of type struct xyz and xyz1.y is unchanged, but had p been declared as auto &, it would be of type struct xyz & and xyz1.y would be changed to 4. Fun times all around!
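The difference can be pinned down with static_assert. This is a small sketch built on the xyz struct above; the two function names are invented for the example.

```cpp
#include <type_traits>

struct xyz { int x, y, z; };

// Plain auto: p is a copy, so the original is untouched.
int yAfterWriteThroughAuto()
{
    xyz xyz1 = {5, 6, 7};
    xyz &alias = xyz1;
    auto p = alias; // even when initialized from a reference, auto deduces xyz
    static_assert(std::is_same<decltype(p), xyz>::value, "plain auto drops the reference");
    p.y = 4;        // modifies the copy only
    return xyz1.y;  // still 6
}

// auto &: p refers to the original, so the write goes through.
int yAfterWriteThroughAutoRef()
{
    xyz xyz1 = {5, 6, 7};
    auto &p = xyz1;
    static_assert(std::is_same<decltype(p), xyz &>::value, "auto & keeps the reference");
    p.y = 4;        // modifies xyz1 itself
    return xyz1.y;  // now 4
}
```

The two functions differ by a single character, and nothing in the line p.y = 4; tells you which one you are reading.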
There is a huge number of references in EnergyPlus, and we will not get rid of them overnight. I believe we should start moving away from them, especially for output parameters to functions, but even that will take time. There is also a significant number of uses of auto in EnergyPlus, although not nearly as many as references. And this we can and will do something about. This doesn't mean that we need to scrub all uses of auto, but there are degrees of evil with auto and we do need to scrub the more dangerous uses.
I suggest the following "rules". There are three acceptable uses of auto:
- auto for iterators. How do you know what is an iterator and what isn't? Well, in the C++ standard template library the iteration functions (e.g., .begin(), .end()), the modifier functions (e.g., .insert(), .emplace()) and .find() return iterators (or, for some containers, a pair holding an iterator). Other functions (e.g., .at() and operator[]) return object references and should be declared as auto & or auto const &. The way to retrieve the object reference from the iterator depends on the container: in the examples below it is .value(), while iterators of standard containers use *it (or it->second for maps).
auto it = instances.begin(); // iterator, so use auto
auto &instance = it.value(); // object reference, so auto &
auto it2 = instances.find("x");
auto &instance2 = it2.value();
auto &instance3 = instances.at("y"); // object reference again, so auto &
- auto for lambdas.
auto f = [&state, this, FirstHVACIteration](Real64 partLoadRatio) { };
- auto &, auto const &, auto * and auto const * for local shortcut variables to objects. These cannot be just auto, which means you are making a local copy of an object, a trap several of you have already fallen into.
auto const &thisCurve = state.dataCurve->Curves(curveNum);
Incidentally, you should use assignment rather than construction (i.e., auto const &thisCurve(state.dataCurve->Curves(curveNum));) for this type of initialization. There is no difference between an in-place construction and an assignment for any scalars including pointers. And parentheses are used in many other ways in C++ (not to mention in the ObjexxFCL library), which makes searching for code patterns more difficult. We should be able to write a script that searches for auto and makes sure that it is used in one of these allowed ways.
This is another sanctioned use of auto & for object references.
for (auto &surf : state.dataSurface->Surfaces) {}
That's it. Scalars cannot be auto or auto &. See this page about local shortcut rules.
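For readers more used to standard containers, here is a sketch of the same rules applied to std::map and std::vector. The Curve struct and the function name are stand-ins invented for this example, not the EnergyPlus types, and standard map iterators expose the element through it->second rather than .value().

```cpp
#include <map>
#include <string>
#include <vector>

struct Curve { double coeff; }; // hypothetical stand-in type

double sumWithShortcuts()
{
    std::map<std::string, Curve> curves = {{"a", Curve{1.0}}, {"b", Curve{2.0}}};
    std::vector<Curve> extra = {Curve{3.0}, Curve{4.0}};

    auto it = curves.find("b");         // iterator, so plain auto
    auto &found = it->second;           // object reference, so auto &
    found.coeff = 20.0;                 // modifies the element stored in the map

    auto const &first = curves.at("a"); // read-only shortcut, so auto const &

    double total = first.coeff + curves.at("b").coeff; // scalar, so explicit type
    for (auto const &curve : extra) {   // range-based for over objects
        total += curve.coeff;
    }
    return total; // 1 + 20 + 3 + 4
}
```

Had found been declared as plain auto, the assignment to found.coeff would have changed a local copy and the total would have come out as 10 instead of 28.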
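Strings should also not be auto because many of them should really be std::string_view, and auto never deduces that on its own: it yields std::string (a copy) when initialized from a std::string and const char * when initialized from a string literal.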
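A short sketch of what auto actually deduces for strings, assuming a C++17 compiler for std::string_view. The functions makeName and nameLengths are hypothetical.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

std::string makeName() { return "EnergyPlus"; } // hypothetical helper

std::size_t nameLengths()
{
    auto literal = "EnergyPlus"; // deduced as const char *, not a string class
    static_assert(std::is_same<decltype(literal), const char *>::value, "literal decays to a pointer");

    auto copy = makeName();      // deduced as std::string, which owns its own buffer
    static_assert(std::is_same<decltype(copy), std::string>::value, "auto yields std::string here");

    std::string_view view = literal; // a view has to be requested explicitly
    return view.size() + copy.size();
}
```

Neither use of auto produces a std::string_view, so if a view is what you want, you have to write the type out.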
And of course, you don't have to use auto in these semi-tolerable cases. If you want to use the explicit type, please do!