Tuesday, March 22, 2011

Pre vs Post Increment & Performance Impact

While browsing across some open-source projects, I have seen code snippets of type:

for (i = 0; i < n; ++ i )

The pre-increment "++ i " confused me as to why one should use it, as post increment is most commonly used. Googling told me that pre-increment is faster than post increment as the value of i need not be stored to a temporary register before the increment operation. This sounded logical to me and I believed it and used pre-increment in all my loops. I didn't bother to measure it though.

Few days back, Kernel developer Tejun Heo complained in twitter about the usage of pre-increment in loops that it doesn't give any benefits and affects readability. This time I got around to doing some measurements with the following code:

 int main()  
      int i, j ,k;  
 #if 0  
      for (i = 0; i < 10000; i++)  
      for (j = 0; j < 10000; j++)  
      for (k = 0; k < 40; k++);  
      for (i = 0; i < 10000; ++i)  
      for (j = 0; j < 10000; ++j)  
      for (k = 0; k < 40; ++k);  

I compiled the above code with 0 and 1 for the #if macro and found that the resultant binaries were bitwise same. So, the preVspost changes contributed nothing for performance as the resultant binaries were exactly identical in either case. So, it's irrelevant if you use post or pre increment for the "for" loop index variable.

I realized that it does not matter if the operation is pre or post increment or assignment, if the value of the expression is not stored at all. So I modified the code to store the return value of the pre/post increment operations and decided to measure the performance. So the code became:

 int main()  
      int i, j ,k, t;  
 #if 1  
      for (i = 0; i < 10000; t = i++)  
      for (j = 0; j < 10000; t = j++)  
      for (k = 0; k < 40; t = k++);  
      for (i = 0; i < 10000; t = ++i)  
      for (j = 0; j < 10000; t = ++j)  
      for (k = 0; k < 40; t = ++k);  

This time I strongly believed that there will be a difference in performance and compiled with the values of 0 and 1 for the #if macro. I did a test run of about 10 iterations for each of post and pre increments. The result was however surprising. Post-increment consistently outperformed pre-increment and finished faster always. I was startled about this observation.

Since so many performance centric people have used pre-increment instead of post-increment in the past, I believed there must be some reason for people using pre-increment. I asked this in a SUSE mailing list for an explanation of the above observation. There are people in SUSE who are as long-experienced with computers as my age. They provided me the answer for this. Earlier, gcc used to generate sub-optimal code for post-increment and that is why pre-increment was used then. Compilers have come a long way and programmers don't need to think in weird way to outsmart the compilers anymore.

C++ and Operator-overloading

With C++, one can overload the increment operator for our own classes. Even though, post outperformed pre marginally, for basic datatypes in C, with the higher-order C++ Classes, it is a different story. I recommend you read this article if you want to know more.

Iterators and modern programming languages

Modern programming languages provide iterators for easier and more effective looping, like foreach in C#. It is always best to use those iterators instead of a traditional for loop and an index variable, to iterate through a collection.

For the sake of curiosity, I tried comparing the post vs pre increment performance in a C# code (just the same above snippet modified to have a public static void Main() etc). I expected that the post will outperform the pre operations and I was right. However, the difference between post and pre increment with the mono C# compiler is substantially high. Post outperformed pre consistently by as high as 2-3 seconds, for the above 3-level nested loop.

Theoretical Micro-Benchmarking & Confirmation of a Tautology

The for loop I used in my example is to highlight the performance difference of pre and post increments. However such micro-benchmarks are theoretical and cannot be extrapolated to match real world scenarios. They are far from it.

It is safe to assume that your application will not become faster by tweaking post or pre increments. So, you can continue to write for loops with post increment which are easier to read and don't appear strange on a contributor/maintenance-engineer's eyes. Most of the slowness in applications are attributed to poor algorithms, bad I/O patterns and doing unnecessary things. Focusing on increment/decrement operators will not give any performance benefits and if you want speed up your applications, begin at Profiling, not at optimizing. As one distinguished engineer once told me, "Optimization without profiling is a waste of life".

Tools Used
  • openSUSE 11.4
  • GCC Version 4.5.1
  • Mono 2.8
  • Thinkpad T400 running Intel Core2 Dual CPU

Understand how to use objdump to look at the assembly instructions baked in the binary and try to figure out why post operations are faster by understanding the assembly code.

Please let me know if you have anything to add for the discussion. 


glandium said...

First, the only case this makes any difference is with no optimization (-O0). With basic optimization (-O1), gcc is smart enough to realize the loops do nothing different from t = 10000 and t = 40, and at -O2 or above, it's smart enough to see the whole function does nothing.

Then, at -O0, the only difference at the assembly level is the add occurring before or after copying the value.
movl -8(%rbp), %eax
movl %eax, -16(%rbp)
addl $1, -8(%rbp)
addl $1, -8(%rbp)
movl -8(%rbp), %eax
movl %eax, -16(%rbp)

My guess is that modern processors are able to addl $1, -8(%rbp) and movl -8(%rbp), %eax more or less at the same time.

Anonymous said...

Just a note, your examples aren't doing the same thing (and one is doing considerably fewer iterations). 't = ++i' is not equivalent to 't = i++'. There are situations where pre-increment is useful, though optimisation isn't one of them, I agree.

Anonymous said...

Did they not teach you about premature optimisation at college? It's called computer SCIENCE for a reason. You're not supposed to do things based on some vague superstition. Use evidence!

Sankar said...

@anonymous Did they not give you a name ;-) ? I am ready to understand the SCIENCE and willing to hear from you, if you can add some more helpful text than just advice. please see: http://goo.gl/Ko4bh and http://goo.gl/f41sp

Sankar said...

@chris: exactly my point thanks :-)

@glandium: Something like what you did was what I wanted to do actually. Thanks.

ZeD said...

if the problem is "ease of reading", no-one should use i++ nor ++i (and all the "j = ++i" / "j = i++" / "i = i++" headache) but to switch to i += 1

1) it's *immensely* easier to read
2) it's not an expression, you can't do "j = i+= 1"
3) if the step it's not 1, you should not change anything but the step.

Anonymous said...

The reason why doing t = ++i could be slower, when using a good compiler, than t = i++, is because it prevents instruction reordering by introducing a strong data dependency between t and i via an assignment.

t = i++ just needs to remember that t is an alias to i, and then it can do the increment before next time i is used again.

I'm not speaking specifically of this program, but in general.

Anyway, as you pointed out, using prefix operations in loops is a good practice because of C++ operator overloading. Fortunately the STL does that correctly for you if you are using the right constructs.

If you want, you could try to benchmark a big loop with pre/post increment of an iterator. The post-increment version of the operator overloading returns a new temporary object, so it has to build it and copy it; thus, it should be slower.

Patrick Niedzielski said...

@ZeD I disagree that ++i or i++ are hard to read. They are no harder to read than i += 1; they are just different. To people not familiar with C/C++, certainly, they look esoteric, but to those with more than a passing familiarity, they are quite readable.

They may not be readable to you, but to most others with experience to in C/C++, they read just fine.

Ole Laursen said...

I used to use post-increment. I don't think it's more readable. It's just what the books I learned from happened to do.

Then I read somewhere that post-increment has a potential overhead when using the STL in C++. Same explanation as you are giving, just perhaps more likely to occur because there are some objects involved. So I switched to pre-increment.

Now when not using C++, there's no reason at all for preferring the pre-increment, but at this point I've just gotten used to it. So there. :)

Sebastian said...

Post-increment isn't "easier to read" at all, if you're used to pre-increment. And since there are scenarios where the impact can be severe, in C++, why not get used to the style which is less likely to introduce baffling performance issues?

Anonymous said...

I wouldn't argue to death for it, but I think one should use prefix where one can, and postfix where one must. They have different semantics, and are in general not interchangeable. If you only want to increment a variable (and perhaps make a copy of the result) then prefix should be used. If you really need to increment a variable and copy its original value, then postfix should be used, but only then.

By this reasoning, I use prefix most of the time, and especially in for-loops. Not for a supposed optimization, but because it is the correct way to do it according to the semantics of the operations.

Anonymous said...

If the argument for pre-increment is that it frees a register, then you'd need to profile code that actually has some contention on registers.

But in the post-increment case I think it is pretty clear to the compiler at any optimization level that you aren't using the old value, and I would trust it to do whatever is right for the architecture.

Anonymous said...

This is really just a matter of opinion. I personally find ++i easier and it's easy to read left-to-right as "increment i". I always read i++ as "i plus plus" which just doesn't settle quite as easily on the brain. Again, that's 100% a matter of opinion, not a fact in any way. :)

Plus, if you _are_ working in C++, it's just a good habit to be in to use pre-increment, in the event that you end up using some fat iterator object that the compiler isn't able to optimize out properly when you use post-increment. That _is_ a fact.

Anonymous said...

The benchmark is useless: what you do in the loop is what will end up mattering, the loop exists to facilitate re-use of this block.

And in that context you might try:

for(int i=0; i< 5;) {
printf("Got: %d\n",i++);


for(int i=0; i< 5;) {
printf("Got: %d\n",++i);

The post-increment is useful for 0-based indices, the pre-increment works better with 1-based indices.

Smonson said...

Hi Sankar,

Objdump is a wonderful piece of software. Just use objdump -S to see the intermixed disassembly and C source. You have to give -g to gcc for it to work.


pcsnow said...

What about the affect of the cache write
and invalidation of the cache?

Doesn't the hardware have to wait for
the new values to be read from the hardware ?

Unknown said...

The reason for pre increment beeing faster is that some CPU architectures have an assmbler instruction that can only do preincrement.

PowerPC is one I know of. so invesigating this on a modern x86 is not going to show any difference. The rule is still that pre inc is more likely to run faster on more targets than post inc.

Anonymous said...

With a reasonable compiler and optimization level there will never be any difference between pre- and post incrementing the loop counter.

This happens because it can be easily seen that the alternatives are functionally equivalent, so the compiler is free to choose the fastest one itself. There is no need and no point in bothering to choose one alternative over the other.

Alan Smithee said...

Can you mail me?

Anonymous said...

A few and also fascination minimize to around Three"We get consented to produce the origins of your Western european Financial Deposit," Sarkozy claimed in the EFSF's authentic nike air jordan
completely new powerBut Win Lean, worldwide go regarding rising promotes strategy in Darkish Brothers Harriman with The big apple, said: "This is actually merely kicking your nike air jordan shoes can later onIt will be permitted for the first time to give states preventive lines of credit before they may be be indifferent to connected with air jordan shoes credit areas, along with loan governing bodies dollars for you to recapitalize banking institutions -- the two movements which usually Belgium impeded earlier this year5-5Thursday's summit Cheap Nike Air Max is unlikely in order to level a rapid or even finish quality in the Language of ancient greece situation, nonetheless, as Merkel herself known the 2009 Cheap Air Max 7 daysThe actual pound in addition to Western european stocks and shares rallied sharply about news with the promising option2 percentageGreek Pm George Papandreou said the Cheap Air Max Shoes offer would certainly cover the nation's backing requires until finally 2020 and earn its debt lasting, nevertheless experts wondered if the lessening will be sufficient to Air Max Shoes prevent the restructuring from the method period

Anonymous said...

If Tejun Heo complained about the usage of pre-increment in loops and any benefits then its true. He is an expert.Extended Warranties for Cars
Extended Warranties

Anonymous said...

Your speed comparisions of pre and post increment for C are all invalid. In any kind of optimized code the compiler will optimize out the differences unless you use a result affected by operator choice.

Try t += inc instead of t = inc, making sure to read t at some point after the loop.

for (i = 0; i < 10000; t += i++)
for (j = 0; j < 10000; t += j++)
for (k = 0; k < 40; t += k++)
; //using t in loop also good

The two operators do differ in the binary and in many cases post increment outperforms pre increment as you suggest.

viagra online said...

are you sure that for (i = 0; i < 10000; t = ++i) is the proper line for this code?

Unknown said...
This comment has been removed by the author.
elina White said...

I really enjoy simply reading all of your weblogs. Simply wanted to inform you that you have people like me who appreciate your work. Definitely a great post. Hats off to you! The information that you have provided is very helpful. Used Car Warranty