In what cases should I use memcpy over standard operators in C++?


Solution 1

Efficiency should not be your concern.
Write clean, maintainable code.

It bothers me that so many answers claim that memcpy() is inefficient. It is designed to be the most efficient way of copying blocks of memory (for C programs).

So I wrote the following as a test:

#include <algorithm>
#include <cstring>

extern float a[3];
extern float b[3];
extern void base();

int main()
{
    base();

#if defined(M1)
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
#elif defined(M2)
    memcpy(a, b, 3*sizeof(float));    
#elif defined(M3)
    std::copy(&a[0], &a[3], &b[0]);
#endif

    base();
}
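For context, base() and the arrays live in a separate translation unit so the compiler cannot optimize the copies away across the calls. A minimal base.cpp stub satisfying the externs might look like this (my reconstruction; the original stub is not shown):

```cpp
// base.cpp -- hypothetical stub for the test above.
// Keeping these definitions in a separate translation unit stops the
// compiler from seeing through base() and deleting the copy entirely.
float a[3];
float b[3] = {1.0f, 2.0f, 3.0f};

void base()
{
    // Acts as an optimization barrier; touches the data so the copy
    // performed in main() stays observable across the call.
    a[0] = b[0];
}
```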

Then I compared the code each variant produces:

g++ -O3 -S xr.cpp -o s0.s
g++ -O3 -S xr.cpp -o s1.s -DM1
g++ -O3 -S xr.cpp -o s2.s -DM2
g++ -O3 -S xr.cpp -o s3.s -DM3

echo "=======" >  D
diff s0.s s1.s >> D
echo "=======" >> D
diff s0.s s2.s >> D
echo "=======" >> D
diff s0.s s3.s >> D

This resulted in (comments added by hand):

=======   // Copy by hand
10a11,18
>   movq    _a@GOTPCREL(%rip), %rcx
>   movq    _b@GOTPCREL(%rip), %rdx
>   movl    (%rdx), %eax
>   movl    %eax, (%rcx)
>   movl    4(%rdx), %eax
>   movl    %eax, 4(%rcx)
>   movl    8(%rdx), %eax
>   movl    %eax, 8(%rcx)

=======    // memcpy()
10a11,16
>   movq    _a@GOTPCREL(%rip), %rcx
>   movq    _b@GOTPCREL(%rip), %rdx
>   movq    (%rdx), %rax
>   movq    %rax, (%rcx)
>   movl    8(%rdx), %eax
>   movl    %eax, 8(%rcx)

=======    // std::copy()
10a11,14
>   movq    _a@GOTPCREL(%rip), %rsi
>   movl    $12, %edx
>   movq    _b@GOTPCREL(%rip), %rdi
>   call    _memmove

Added: timing results for running the above inside a loop of 1,000,000,000 iterations.

   g++ -c -O3 -DM1 X.cpp
   g++ -O3 X.o base.o -o m1
   g++ -c -O3 -DM2 X.cpp
   g++ -O3 X.o base.o -o m2
   g++ -c -O3 -DM3 X.cpp
   g++ -O3 X.o base.o -o m3
   time ./m1

   real 0m2.486s
   user 0m2.478s
   sys  0m0.005s
   time ./m2

   real 0m1.859s
   user 0m1.853s
   sys  0m0.004s
   time ./m3

   real 0m1.858s
   user 0m1.851s
   sys  0m0.006s

Solution 2

You can use memcpy only if the objects you're copying have no explicit constructors, and neither do their members (so-called POD, "Plain Old Data"). So it is OK to call memcpy for float, but it is wrong for, e.g., std::string.

But part of the work has already been done for you: std::copy from <algorithm> is specialized for built-in types (and possibly for every other POD type, depending on the STL implementation). So writing std::copy(a, a + 3, b) is as fast (after compiler optimization) as memcpy, but less error-prone.

Solution 3

Compilers specifically optimize memcpy calls; at least clang and gcc do. So you should prefer it wherever you can.

Solution 4

Use std::copy(). As the header file for g++ notes:

This inline function will boil down to a call to @c memmove whenever possible.

Probably, Visual Studio's is not much different. Go with the normal way, and optimize once you're aware of a bottleneck. In the case of a simple copy, the compiler is probably already optimizing for you.

Solution 5

Don't go for premature micro-optimisations such as using memcpy like this. Using assignment is clearer and less error-prone and any decent compiler will generate suitably efficient code. If, and only if, you have profiled the code and found the assignments to be a significant bottleneck then you can consider some kind of micro-optimisation, but in general you should always write clear, robust code in the first instance.

Author: Pythagoras of Samos

Independent game developer whose reason to live is to witness his own online universe. Now working full-time on Hypersomnia project. Visit my developer diary: http://hypersomnia.xyz

Updated on July 28, 2022

Comments

  • Pythagoras of Samos
    Pythagoras of Samos almost 2 years

    When can I get better performance using memcpy or how do I benefit from using it? For example:

    float a[3]; float b[3];
    

    is this code:

    memcpy(a, b, 3*sizeof(float));
    

    faster than this one?

    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
    
  • Nawaz
    Nawaz over 13 years
    @Simone : the first para makes sense to me. Now I need to verify it, because I'm not sure. :-)
  • Nawaz
    Nawaz over 13 years
    @ismail: compilers may optimize memcpy, but still it is less likely to be faster than the second approach. Please read Simone's post.
  • Karl Knechtel
    Karl Knechtel over 13 years
    std::copy is properly found in <algorithm>; <algorithm.h> is strictly for backwards-compatibility.
  • Martin York
    Martin York over 13 years
    I don't think memcpy copies byte by byte. It is specifically designed to copy large chunks of memory very efficiently.
  • Martin York
    Martin York over 13 years
    @Nawaz: I disagree. The memcpy() is likely to be faster given architecture support. Anyway this is redundant as std::copy (as described by @crazylammer) is probably the best solution.
  • Simone
    Simone over 13 years
    Source please? Only thing that POSIX mandates is this. BTW, see if this implementation is that fast.
  • Jakob Borg
    Jakob Borg over 13 years
    +1. And, since you didn't write down the obvious conclusion from this, the memcpy call looks like it's generating the most efficient code.
  • visual_learner
    visual_learner over 13 years
    @Simone - libc writers have spent a lot of time making sure their memcpy implementations are efficient, and compiler writers have spent just as much time making their compilers look for cases when assignments could be made faster by memcpy and vice versa. Your argument of "it can be as bad as you want it to" as well as your out-of-the-blue implementation is a red herring. Look at how GCC or other compilers/libcs implement it. That'll probably be fast enough for you.
  • visual_learner
    visual_learner over 13 years
    How is assigning N (where N > 2) different array items one-by-one clearer than a single memcpy? memcpy(a, b, sizeof a) is clearer because, if the size of a and b change, you don't need to add/remove assignments.
  • josesuero
    josesuero over 13 years
    The usual rule of thumb applies: "Assume library writers aren't brain-damaged". Why would they write a memcpy that was only able to copy a byte at a time?
  • Simone
    Simone over 13 years
    Because memcpy is required to be able to copy a single byte. Of course it may check if the size to copy is a multiple of 4 or 8, but with assignments you may omit the check and have faster code.
  • Paul R
    Paul R over 13 years
    @Chris Lutz: you have to think about the robustness of the code throughout its lifetime, e.g. what happens if at some point someone changes the declaration of a so that it becomes a pointer instead of an array? Assignment wouldn't break in this case, but the memcpy would.
  • visual_learner
    visual_learner over 13 years
    memcpy wouldn't break (the sizeof a trick would break, but only some people use that). Neither would std::copy, which is demonstrably superior to both in almost every respect.
  • visual_learner
    visual_learner over 13 years
    @Simone - On most modern platforms, the compiler will optimize that check. Most compilers will optimize each individual call to memcpy when they can.
  • Paul R
    Paul R over 13 years
    @Chris: well I would rather see a for loop than individual assignments, and of course careful use of memcpy is not off-limits for C code (I would prefer not to see it in C++ code though). But if you work on code that has a long life-cycle or if you care about such things as portability, porting to other languages or compilers, use of code analysis tools, auto-vectorization, etc, then simplicity and clarity are always more important than brevity and low level hacks.
  • Simone
    Simone over 13 years
    @Chris as you can see, that's not true in every case.
  • Konrad Rudolph
    Konrad Rudolph over 13 years
    Huh. Why isn’t the call to _memmove inlined?
  • Nawaz
    Nawaz over 13 years
    @Simone: one thing: using a while loop to do the assignments and doing the assignments manually with constant offsets are NOT the same speed-wise. The latter is usually faster.
  • Simone
    Simone over 13 years
    Yes, but you can't assign manually if you have a dynamically allocated array.
  • Yttrill
    Yttrill over 13 years
    BTW: @Martin: it is not reasonable to say "efficiency should not be your concern, write nice code". People use C++ as opposed to a decent language precisely because they demand performance. It matters.
  • Martin York
    Martin York almost 9 years
    @Yttrill: And I have never seen a micro-optimization by a human that was not already being done better by the compiler. On the other hand, writing nice readable code implies you are thinking more at the algorithm level, where the human can beat the compiler at optimization because the compiler does not know the intent.
  • akaltar
    akaltar almost 9 years
    @LokiAstari You think that 20 assignment operators are easier to understand than a single memcpy on a struct? It might be more readable with memcpy too. (And in C/C++ you are probably used to it anyways.)
  • Martin York
    Martin York almost 9 years
    @akaltar: I think std::copy is easier to read. It is also the most efficient (equal to memcpy). See the supplied assembly and, since you obviously did not time it yourself, the timing results.
  • Martin York
    Martin York almost 9 years
    @akaltar: Also, you miss the point. You CAN'T use memcpy on structures in C++ because they have constructors (there is a small subclass of structures that you can use memcpy on, but that is not the general case). Which is also why std::copy is better, as it will use the most efficient and valid technique.
  • akaltar
    akaltar almost 9 years
    @LokiAstari memcpy is usually used on data-only structs, as usually those are used in high-performance code, but I understand your point, std::copy is indeed superior. As to reading.. I think they come out equal.
  • user703016
    user703016 almost 9 years
    Addendum: instead of C-style arrays, using std::array<float, 3>, which does have an assignment operator, combines the best of both worlds: readability and efficiency. And has the extra added quality of not decaying to a pointer, among others. Besides, as of the time of writing, both GCC 5.2 and Clang 3.7 generate identical code in all cases, so performance is no longer relevant and readability should be favored.
  • user239558
    user239558 about 8 years
    The C++ memmove (m3) speed looks dubious. There's no way a call to memmove would have no overhead compared to the optimized and inlined memcpy case.
  • Martin York
    Martin York about 8 years
    @user239558: You are surprised that std::copy produces the fastest code? Given the ability of the compiler to analyze and plant the best code it seems redundantly obvious that std::copy would be the fastest technique (or at least no slower than a technique you can do manually).
  • user239558
    user239558 about 8 years
    @LokiAstari the assembly was quoted in the answer above. There is no way a non-inlined call to memmove, which in addition to the above needs to check for pointer overlap, could ever be as fast as the inlined memcpy. It's bogus.
  • Martin York
    Martin York about 8 years
    @user239558: Well it's a good job the compiler as a machine makes better choices based on maths than humans do with "opinions". Not only is the code provided but also the timing results. Since this is a science why don't you try and repeat the experiment!
  • user239558
    user239558 about 8 years
    @LokiAstari I'm just pointing out that the timings are bogus. Anyone who has programmed in assembly or has read intel instruction manuals will know this. It is common to make mistakes in benchmarking, and I'm simply pointing out an obvious one. I'm not wasting any more time on this, sorry.
  • Martin York
    Martin York about 8 years
    @user239558: Words are easy. Rather than just pontificating with no proof, just try it.