Efficiency geek 2: copying data in C/C++, optimisation

Having benchmarked different ways to zero an array, I turned to the related question of copying lumps of floating-point data from one place to another, which can be done in a similarly wide range of ways. I've benchmarked it in the same way as in my first note, using the analogous approach for each method (except method 9, which has no analogue here):

Method	Mac PPC	Linux Intel
1, sc3	21 %	69 %
2, for, array	40 %	75 %
3, for, post	38 %	51 %
4, for, pre	38 %	75 %
5, do-while	39 %	75 %
6, duff's, post	40 %	56 %
7, duff's, pre	40 %	75 %
8, memcpy	13 %	39 %
10, unrolled-for	39 %	47 %

(This shows results for copying aligned blocks of data. I also ran the test with unaligned blocks, and there were no differences worth reporting.)
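
For anyone who hasn't read the first note, here's a rough sketch of what the simpler loop styles in the table might look like when copying floats. The function names are made up for illustration, and this is a reconstruction under the assumption of plain float buffers, not the exact code that was benchmarked.

```c
#include <stddef.h>

/* Hypothetical names: sketches of the loop styles, not the benchmarked code. */

/* "for, array" style: plain for loop with array indexing. */
void copy_for_array(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* "for, post" style: post-incremented pointers. */
void copy_for_post(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        *dst++ = *src++;
}

/* "for, pre" style: pre-incremented pointers. The first element is copied
   separately so the pointers never point before the start of the buffers. */
void copy_for_pre(float *dst, const float *src, size_t n)
{
    if (n == 0)
        return;
    *dst = *src;
    for (size_t i = 1; i < n; ++i)
        *++dst = *++src;
}

/* "do-while" style: count down to zero; needs a guard for empty input. */
void copy_do_while(float *dst, const float *src, size_t n)
{
    if (n == 0)
        return;
    do {
        *dst++ = *src++;
    } while (--n);
}
```

All four do the same job; the only difference is how the loop and the pointer arithmetic are spelled, which is exactly what the benchmark is probing.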

For PPC Mac it's a very consistent story: all of the loopy methods take essentially the same amount of effort (38 to 40 %). JMC's crafty use of doubles (method 1, sc3) is a clever optimisation here at 21 %, but (as in the zeroing test) there's a definite outright winner, and it's simpler: memcpy, at 13 %.

For Intel Linux there's more variation in the results. For some reason postincremented pointers do better than their alternatives (51 % and 56 %, against 75 % for the preincremented and array-indexed versions), and the unrolling in method 10 helps noticeably (47 %). But again, memcpy is the outright winner at 39 %.
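
To make the unrolled variants concrete, here's a sketch of a postincrement Duff's device and a simple unrolled for loop. The function names and the unroll factors (8 and 4) are assumptions for illustration; the benchmarked code may well have used different ones.

```c
#include <stddef.h>

/* Hypothetical sketch of a Duff's device copy, post-increment style,
   unrolled 8 times (the factor is an assumption). */
void copy_duffs_post(float *dst, const float *src, size_t n)
{
    if (n == 0)
        return;
    size_t passes = (n + 7) / 8;   /* number of trips through the do-while */
    switch (n % 8) {               /* jump into the middle for the remainder */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}

/* Hypothetical sketch of an unrolled for loop: four copies per iteration,
   then a plain loop to mop up whatever is left over. */
void copy_unrolled(float *dst, const float *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; ++i)
        dst[i] = src[i];
}
```

Both cut down the number of loop-condition checks per element copied, which is the usual argument for unrolling by hand.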

So it looks like the recommendation is a direct parallel of the first test: memcpy() please, in this kind of circumstance. YMMV.
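
For completeness, a minimal sketch of the memcpy version for a block of floats. The wrapper name is hypothetical; the thing to remember is that memcpy takes its length in bytes, not elements, and that the source and destination must not overlap (memmove is the standard call for overlapping blocks).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copy n floats from src to dst.
   The two blocks must not overlap. */
void copy_memcpy(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);   /* length is in bytes */
}
```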

Sun 19 October 2008 | IT | Permalink
