w00t
30/04/08
C++0x - A first look at moving objects
For quite some time, the C++ language was in maintenance mode, which means that until 2003, nothing really happened. There was TR1 that gave us some goodies, like smart pointers and regexes, but no changes to the language itself.
Since 2003 however, people are allowed to throw reasonable or crazy proposals alike at the working group, and a surprising number are going to be accepted. There's a whole lot of cool stuff, but let's start with one of the more technical ones: move semantics.
Move semantics solve a problem that's been with the language from the start, and which is often used by C people to argue that C++ is expensive: every innocent-looking line might hide a whole bunch of temporaries that are being copied over and over again. Consider this simple code:
string s = a + b;
where a and b are already set up strings. This invokes operator+, which returns a brand new, temporary string, which costs you a constructor call. This string is then copied(-constructed) into s, again a non-zero cost.
The compiler can optimize this away through the so-called RVO (Return Value Optimization), and many times it will, but just as often it won't. The function needs to be rather crystal clear for the compiler to figure RVO out. So many programmers play it safe and write the code like this:
string s = a;
s += b;
Similarly, people will rather write
void f (vector &v);
instead of
vector f ();
which is all nice and fine, except that cluttering up readability for the sake of performance is so 1970s.
So how will C++0x solve this? Through references to non-const rvalues, where an rvalue usually (but h4x0r beware) just means "a temporary". In particular, T& is a regular reference to some type T, and T&& is a reference to a temporary of that type. For example:
// f takes a non-const rvalue string
void f (string && s);
// call f with a temporary
f (a + b);
How does this help ? The marker && tells your code that s is an object that will be thrown away when f returns. So you can do whatever you want with it. For example, you can plunder it; suppose string looks like this:
struct string
{
    char * data;
    ~string() { delete [] data; }
    ...
};
then f could do
void f (string && s)
{
    char * stolen_ptr = s.data;
    s.data = 0;
    // .. store stolen_ptr somewhere
}
and that would be perfectly fine as far as the compiler is concerned. f has stolen the data in s, but f still left s in a destructable state; the data pointer was nicely set to null so that ~string, which will still be called on the temporary, does not crash.
You can see how this makes for a much faster constructor that is optimized for temporaries; simply imagine the silly function f being a "real" constructor:
struct string
{
    char * data;
    string (string && s)
    {
        data = s.data;
        s.data = 0;
    }
};
Now it just so happens that if operator+ is written like this:
string operator+ (const string &a, const string &b)
{
    string result = a;
    result += b;
    return result;
}
then s = a + b will move result into s; s did not new[] a fresh chunk of memory, it did not copy the data over, and result's destructor did not delete[] its memory. It simply handed it to s. That's as fast as it can get, and it doesn't depend on the compiler being smart enough.
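To make that concrete, here is a minimal sketch of the whole thing in one piece. Str is a made-up stand-in for the article's string (it is not std::string), and deep_copies is a counter added purely for illustration, so you can check that building s costs exactly one real copy.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the article's string; not std::string.
struct Str
{
    char * data;
    static int deep_copies;   // illustration only: counts copy-constructor calls

    explicit Str (const char * s) : data (new char[std::strlen (s) + 1])
    {
        std::strcpy (data, s);
    }

    // Copy constructor: allocates and copies, the expensive path.
    Str (const Str & o) : data (new char[std::strlen (o.data) + 1])
    {
        std::strcpy (data, o.data);
        ++deep_copies;
    }

    // Move constructor: steals the buffer and leaves o destructible.
    Str (Str && o) : data (o.data)
    {
        o.data = nullptr;
    }

    ~Str () { delete [] data; }

    Str & operator+= (const Str & o)
    {
        std::size_t n = std::strlen (data);
        char * joined = new char[n + std::strlen (o.data) + 1];
        std::strcpy (joined, data);
        std::strcpy (joined + n, o.data);
        delete [] data;
        data = joined;
        return *this;
    }
};

int Str::deep_copies = 0;

Str operator+ (const Str & a, const Str & b)
{
    Str result = a;   // the one deep copy we cannot avoid
    result += b;
    return result;    // moved (or elided) into the caller's object
}
```

With Str a ("foo"), b ("bar"); the line Str s = a + b; bumps deep_copies only once, for result = a. Whether the compiler elides the return or falls back to the move constructor, no second buffer is allocated.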
So far this has been a very basic and simple introduction, to hopefully give an idea what move semantics are all about, and in particular to help you not freak out when you run into a && in the middle of a (special) member or function declaration sometime in the next two years. Below are a few more "technical" notes on this move business.
Another special member ?!
If you're a library kind of guy, you want your classes to follow all the rules and so you're probably / hopefully writing a constructor, a copy constructor, an assignment operator, equality tests, and a specialized swap. So now here comes move and we have to provide yet another method just for the sake of getting data from a to b ??...
Well, yes and no. The nice part is that because of this move trick, your class specific swap can probably be deleted. This is because all of a sudden, the default "dumb" swap can be made optimal:
template<typename T>
void swap (T && a, T && b)
{
    T t (a); // move a to t
    a = b; // move b to a
    b = t; // move t, ie a, to b
}
That's about as fast as it can get if T supports moving, so its move constructor replaces any class-specific swap specialization.
Note that the last two lines are moving data into a and b! && basically tells you "this variable's data may safely be moved around". It doesn't require you to move data only out of the object...
Help from the language
Unfortunately, the swap code as written above does not actually work :) The problem is that C++0x has a rule about && which is :
When you give a T&& object a name, you turn it into a T& (a regular reference).
Therefore, the moment we refer to a in our function definition, a is no longer a T&&! There is a simple reason for that; what would this code have to do...
void f (T&& s)
{
    T a = s;
    T b = s;
}
... move s twice ??... That can't possibly work. So s is just a T&, simply because of the name, and both lines take a copy.
So how do you "undo" the loss of rvalue-ness ? Why, with a cast of course:
template<typename T>
void swap (T && a, T && b)
{
    T t (static_cast<T&&>(a));
    a = static_cast<T&&>(b);
    b = static_cast<T&&>(t);
}
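If you want to see the naming rule and the cast at work, overload resolution makes a handy probe. This is only a sketch: category, named and recast are hypothetical helper names, and std::string is used simply because it is a convenient movable type.

```cpp
#include <string>

// Hypothetical probes: the overload that gets picked reveals how the
// argument expression is seen by the compiler.
inline const char * category (const std::string &) { return "lvalue"; }
inline const char * category (std::string &&)      { return "rvalue"; }

// Inside this function s has a name, so the expression `s` is an lvalue.
inline const char * named (std::string && s)
{
    return category (s);
}

// The explicit cast restores rvalue-ness.
inline const char * recast (std::string && s)
{
    return category (static_cast<std::string &&> (s));
}
```

Called with a temporary, named (std::string ("x")) picks the const & overload, while recast (std::string ("x")) picks the && one.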
Fortunately, to undo the "damage" that the above language rule introduces (but for a good reason, as seen), another rule was added, which basically says that
An unnamed T&& return value from a function preserves its "&&"ness
.. or more accurately, an unnamed rvalue reference is treated as an rvalue.
What that means in practice is that we can write a quick helper function, that still has the funky syntax, but just one last time:
template<typename T>
T && move (T && t)
{
    return static_cast<T&&>(t);
}
and now everybody else can stop worrying about all that and just write their move code like so:
template<typename T>
void swap (T && a, T && b)
{
    T t (move (a));
    a = move (b);
    b = move (t);
}
... which also just happens to very nicely describe what exactly is going on. All hail C++0x !.....or not ?...
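For reference, here is a sketch of the same swap against the standard library as it was eventually published, where the helper is std::move from <utility> and swap takes plain T& parameters instead of T&&. Tracked and swap_by_move are hypothetical names, and the counters exist only so you can check that no copy sneaks in.

```cpp
#include <utility>

// Hypothetical type that counts how its value is transferred.
struct Tracked
{
    int value;
    static int copies;
    static int moves;

    explicit Tracked (int v) : value (v) {}
    Tracked (const Tracked & o) : value (o.value) { ++copies; }
    Tracked (Tracked && o) : value (o.value) { o.value = 0; ++moves; }
    Tracked & operator= (const Tracked & o)
    {
        value = o.value; ++copies; return *this;
    }
    Tracked & operator= (Tracked && o)
    {
        value = o.value; o.value = 0; ++moves; return *this;
    }
};

int Tracked::copies = 0;
int Tracked::moves = 0;

// Generic swap written purely in terms of moves.
template<typename T>
void swap_by_move (T & a, T & b)
{
    T t (std::move (a));   // move a to t
    a = std::move (b);     // move b to a
    b = std::move (t);     // move t, ie the old a, to b
}
```

Swapping two Tracked objects this way should leave copies at 0 and moves at 3: one move construction and two move assignments.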
More complications
The story so far is fairly straightforward; once you get used to the idea and the syntax, moving is not all that complicated, and can have real benefits real fast. Unfortunately, as often with C++, the simple idea gets mangled up when you start exploring less obvious cases. For example, what if we wrote move(c), where c is a named const int; what would be the deduced template type T ? It can't be plain int, because then move would take an int && and try to move from a constant, and that doesn't make sense. Instead T is deduced as const int &, but if you add all that up you get const int & &&. What happens then ? Well, rules are introduced that say how to collapse that back to "just one type", and not a mixture of two types. As a simple guideline,
T&& is lost when you mix it with T&.
That is, mix T&& with T&, and it becomes just T&. How about mixing T&& with const? If one person says it's movable, but the other claims it's const, the compiler had better play it safe and treat it like const. So the guideline stands: at the first sign of trouble, forget about that moving business.
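The collapsing guideline can be checked at compile time. A minimal sketch, assuming a compiler with alias templates and <type_traits>; AddRvalueRef, AddLvalueRef and deduced_as_lvalue_ref are hypothetical names made up for this illustration.

```cpp
#include <type_traits>

// Aliases through which reference-to-reference combinations can be formed.
template<typename T> using AddRvalueRef = T &&;
template<typename T> using AddLvalueRef = T &;

// Mixing & with && collapses to &.
static_assert (std::is_same<AddRvalueRef<int &>, int &>::value, "& && -> &");
static_assert (std::is_same<AddLvalueRef<int &&>, int &>::value, "&& & -> &");
// Only && with && stays &&.
static_assert (std::is_same<AddRvalueRef<int &&>, int &&>::value, "&& && -> &&");
// const is never dropped along the way.
static_assert (std::is_same<AddRvalueRef<const int &>, const int &>::value,
               "const int & && -> const int &");

// Reports whether T in a deduced T&& parameter came out as an lvalue reference.
template<typename T>
bool deduced_as_lvalue_ref (T &&)
{
    return std::is_lvalue_reference<T>::value;
}
```

Passing a named variable, const or not, to deduced_as_lvalue_ref returns true, which is exactly the situation where the T&& in the signature quietly turns into a T&.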
A related problem is type conversion, for example during overload resolution. What happens if you swap two objects with the above code, but there is no move constructor in the class T ? Then the compiler binds the T&& to the const T& of the ordinary copy constructor and assignment operator, and old-fashioned swap-by-copying follows.
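That fallback is easy to observe. In this sketch CopyOnly and transfer are hypothetical names; the class declares only copy operations, so asking for a move with std::move (the standard spelling of the helper) quietly ends up in the copy assignment operator.

```cpp
#include <utility>

// Hypothetical old-style class: copy operations only, no move members.
struct CopyOnly
{
    int value;
    static int copies;

    explicit CopyOnly (int v) : value (v) {}
    CopyOnly (const CopyOnly & o) : value (o.value) { ++copies; }
    CopyOnly & operator= (const CopyOnly & o)
    {
        value = o.value; ++copies; return *this;
    }
};

int CopyOnly::copies = 0;

// Requests a move; overload resolution decides what actually happens.
template<typename T>
void transfer (T & from, T & to)
{
    to = std::move (from);   // rvalue binds to const T&, so this copies
}
```

After transfer (a, b), copies has gone up by one and a still holds its old value, since nothing was actually moved.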
This all may sound abstract and rather annoying for everyday programming; but in reality it means that the code normally does what you'd expect it to do; unless you go looking for corner cases, or start mucking with template and/or type-traits, you should be fine.
More goodness
It's not all just complications though; another aspect where this will help a lot is in the STL containers. These will be updated so that they all use move semantics internally wherever possible. So, if your class foo supports moving, you can finally just write vector<foo> without having to worry about the performance cost of a realloc. If the realloc happens, all the data is simply moved over to the new location. There's no more need to use vector<foo*> and worry about when to delete that data, just for being efficient. Of course, moving pointers might still be faster than moving objects... if you ignore the better cache locality of having all your objects nicely in a row with zero heap fragmentation (in the by-value scenario).
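Here is a sketch of what that looks like with std::vector as it was eventually standardized. Foo, its counters and make_foos are hypothetical. One detail goes beyond the text above: in the final standard, vector only moves elements during reallocation if the move constructor is declared noexcept (or the type cannot be copied at all), so the sketch declares it that way.

```cpp
#include <vector>

// Hypothetical movable class with counters for illustration.
struct Foo
{
    int id;
    static int copies;
    static int moves;

    explicit Foo (int i) : id (i) {}
    Foo (const Foo & o) : id (o.id) { ++copies; }
    // noexcept lets vector pick the move during reallocation.
    Foo (Foo && o) noexcept : id (o.id) { ++moves; }
};

int Foo::copies = 0;
int Foo::moves = 0;

inline std::vector<Foo> make_foos (int n)
{
    std::vector<Foo> v;           // no reserve, so growth reallocates
    for (int i = 0; i < n; ++i)
        v.push_back (Foo (i));    // the temporary is moved in
    return v;                     // the vector itself is moved or elided
}
```

Building a hundred elements this way should leave Foo::copies at 0, even though the vector reallocates several times along the way.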
This is by the way also why it's good to know about this feature even if you couldn't care less about library-style development. If you one day find contained-by-value everywhere, no, people did not collectively go nuts all of a sudden.
... and even more fun!
Another interesting aspect of && is that it enables perfect forwarding. This is the problem of how to write a function, like a factory, that's just a wrapper around another call, in such a way that you cause no problems at all. For all sorts of reasons you cannot do this right now without basically copy pasting like crazy, or accepting that things that used to work without a wrapper, are now broken. With C++0x however, we can write this:
template<typename T>
void f (T && t)
{
    // cast back to T&&: an lvalue argument stays an lvalue, a temporary stays movable
    g (static_cast<T&&>(t));
}
and thanks to the earlier rules about reference collapsing and all that, this will forward the f call to g in the most optimal way possible.
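In the standard library as eventually published, the forwarding cast has its own name, std::forward<T>, in <utility>. A small sketch of a wrapper using it follows; the two overloads of g are hypothetical and only there to show what arrives at the other end.

```cpp
#include <string>
#include <utility>

// Hypothetical target with two overloads, so we can see what arrives.
inline const char * g (const std::string &) { return "lvalue"; }
inline const char * g (std::string &&)      { return "rvalue"; }

// Wrapper: a deduced T&& accepts anything, std::forward<T> passes it on
// with the same lvalue/rvalue-ness the caller used.
template<typename T>
const char * f (T && t)
{
    return g (std::forward<T> (t));
}
```

Calling f with a named string reaches the const & overload, and calling it with a temporary reaches the && overload, just as if g had been called directly.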
Now if you thought this post was a bit long, well, sorry about that, but it's still just one aspect of all the goodies we'll get in C++0x... Multithreading! Concepts, aliasing, variadic parameters and more template fixes! Unicode! Better enums, auto_ptr that works with containers, better for loops, type deduction... woohoow!
...but that will have to wait for another day and another post :)
Comments
That clearly fills a gap but damn, that sounds awfully complicated. That being said, the scope and life span of temporaries was already hell. Let's just KISS and either switch language or use the good old "out values by param reference" idiom.
Is C++0x the next APL?
Well, keep in mind that there are a lot of details in here that everyday programming should not have to worry about. The rules are designed so that Stuff Just Works. For example, move is probably a part of STL, so if you vaguely know what && and move are for, you're set.
Also.. my explanation might suck
It might just be me, but I think that C++ is not forgiving to people who don't understand how things work behind the scenes. Don't get me wrong, I really love C++, but if you don't want to mess with low level details you might consider another language. (engine code = C++, application code = some higher level language)
I agree that too much code is written in C++ which would be better off with a higher level language.
On the other hand, consider that many languages which are more specific, always seem to bloat up with time by adopting more general programming constructs.
And then you could argue that if the engine team does their job right, then C++ plus engine _is_ the higher level language you need
But it's a balancing act, sure.
Yes indeed!...
??
Ok, gone on!