In what cases should I use memcpy over standard operators in C++?
Solution 1
Efficiency should not be your concern.
Write clean, maintainable code.
It bothers me that so many answers indicate that memcpy() is inefficient. It is designed to be the most efficient way of copying blocks of memory (for C programs).
So I wrote the following as a test:
#include <algorithm>
#include <cstring>

extern float a[3];
extern float b[3];
extern void base();

int main()
{
    base();
#if defined(M1)
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
#elif defined(M2)
    memcpy(a, b, 3 * sizeof(float));
#elif defined(M3)
    std::copy(&a[0], &a[3], &b[0]); // note: copies a into b, the reverse direction of M1/M2
#endif
    base();
}
Then, to compare the code each variant produces:
g++ -O3 -S xr.cpp -o s0.s
g++ -O3 -S xr.cpp -o s1.s -DM1
g++ -O3 -S xr.cpp -o s2.s -DM2
g++ -O3 -S xr.cpp -o s3.s -DM3
echo "=======" > D
diff s0.s s1.s >> D
echo "=======" >> D
diff s0.s s2.s >> D
echo "=======" >> D
diff s0.s s3.s >> D
This resulted in: (comments added by hand)
======= // Copy by hand
10a11,18
> movq _a@GOTPCREL(%rip), %rcx
> movq _b@GOTPCREL(%rip), %rdx
> movl (%rdx), %eax
> movl %eax, (%rcx)
> movl 4(%rdx), %eax
> movl %eax, 4(%rcx)
> movl 8(%rdx), %eax
> movl %eax, 8(%rcx)
======= // memcpy()
10a11,16
> movq _a@GOTPCREL(%rip), %rcx
> movq _b@GOTPCREL(%rip), %rdx
> movq (%rdx), %rax
> movq %rax, (%rcx)
> movl 8(%rdx), %eax
> movl %eax, 8(%rcx)
======= // std::copy()
10a11,14
> movq _a@GOTPCREL(%rip), %rsi
> movl $12, %edx
> movq _b@GOTPCREL(%rip), %rdi
> call _memmove
Added: timing results for running the above inside a loop of 1,000,000,000 iterations.
g++ -c -O3 -DM1 X.cpp
g++ -O3 X.o base.o -o m1
g++ -c -O3 -DM2 X.cpp
g++ -O3 X.o base.o -o m2
g++ -c -O3 -DM3 X.cpp
g++ -O3 X.o base.o -o m3
time ./m1
real 0m2.486s
user 0m2.478s
sys 0m0.005s
time ./m2
real 0m1.859s
user 0m1.853s
sys 0m0.004s
time ./m3
real 0m1.858s
user 0m1.851s
sys 0m0.006s
Solution 2
You can use memcpy only if the objects you're copying have no explicit constructors, and neither do their members (so-called POD, "Plain Old Data"). So it is OK to call memcpy for float, but it is wrong for, e.g., std::string.
But part of the work has already been done for you: std::copy from <algorithm> is specialized for built-in types (and possibly for every other POD type - depends on the STL implementation). So writing std::copy(a, a + 3, b) is as fast (after compiler optimization) as memcpy, but is less error-prone.
Solution 3
Compilers specifically optimize memcpy calls; at least clang and gcc do. So you should prefer it wherever you can.
Solution 4
Use std::copy(). As the header file for g++ notes:
This inline function will boil down to a call to @c memmove whenever possible.
Probably, Visual Studio's is not much different. Go with the normal way, and optimize once you're aware of a bottleneck. In the case of a simple copy, the compiler is probably already optimizing for you.
Solution 5
Don't go for premature micro-optimisations such as using memcpy like this. Using assignment is clearer and less error-prone and any decent compiler will generate suitably efficient code. If, and only if, you have profiled the code and found the assignments to be a significant bottleneck then you can consider some kind of micro-optimisation, but in general you should always write clear, robust code in the first instance.
Pythagoras of Samos
Updated on July 28, 2022
Comments
-
Pythagoras of Samos almost 2 years
When can I get better performance using memcpy, or how do I benefit from using it? For example, given float a[3]; float b[3];, is memcpy(a, b, 3*sizeof(float)); faster than a[0] = b[0]; a[1] = b[1]; a[2] = b[2];?
-
Nawaz over 13 years@Simone: the first para makes sense to me. Now I need to verify it, because I'm not sure. :-)
-
Nawaz over 13 years@ismail: compilers may optimize memcpy, but still it is less likely to be faster than the second approach. Please read Simone's post.
-
Karl Knechtel over 13 years std::copy is properly found in <algorithm>; <algorithm.h> is strictly for backwards-compatibility.
-
Martin York over 13 yearsI don't think memcpy copies byte by byte. It is specifically designed to copy large chunks of memory very efficiently.
-
Martin York over 13 years@Nawaz: I disagree. memcpy() is likely to be faster given architecture support. Anyway, this is redundant as std::copy (as described by @crazylammer) is probably the best solution.
-
Simone over 13 yearsSource please? Only thing that POSIX mandates is this. BTW, see if this implementation is that fast.
-
Jakob Borg over 13 years+1. And, since you didn't write down the obvious conclusion from this, the memcpy call looks like it's generating the most efficient code.
-
visual_learner over 13 years@Simone - libc writers have spent a lot of time making sure their memcpy implementations are efficient, and compiler writers have spent just as much time making their compilers look for cases when assignments could be made faster by memcpy and vice versa. Your argument of "it can be as bad as you want it to" as well as your out-of-the-blue implementation is a red herring. Look at how GCC or other compilers/libcs implement it. That'll probably be fast enough for you.
-
visual_learner over 13 yearsHow is assigning N (where N > 2) different array items one-by-one clearer than a single memcpy? memcpy(a, b, sizeof a) is clearer because, if the sizes of a and b change, you don't need to add/remove assignments.
-
josesuero over 13 yearsThe usual rule of thumb applies: "Assume library writers aren't brain-damaged". Why would they write a memcpy that was only able to copy a byte at a time?
-
Simone over 13 yearsBecause memcpy is required to be able to copy a single byte. Of course it may check if the size to copy is a multiple of 4 or 8, but with assignments you may omit the check and have faster code.
Paul R over 13 years@Chris Lutz: you have to think about the robustness of the code throughout its lifetime, e.g. what happens if at some point someone changes the declaration of a so that it becomes a pointer instead of an array? Assignment wouldn't break in this case, but the memcpy would.
-
visual_learner over 13 years memcpy wouldn't break (the sizeof a trick would break, but only some people use that). Neither would std::copy, which is demonstrably superior to both in almost every respect.
-
visual_learner over 13 years@Simone - On most modern platforms, the compiler will optimize that check. Most compilers will optimize each individual call to memcpy when they can.
-
Paul R over 13 years@Chris: well I would rather see a for loop than individual assignments, and of course careful use of memcpy is not off-limits for C code (I would prefer not to see it in C++ code though). But if you work on code that has a long life-cycle or if you care about such things as portability, porting to other languages or compilers, use of code analysis tools, auto-vectorization, etc, then simplicity and clarity are always more important than brevity and low level hacks.
-
Simone over 13 years@Chris as you can see, that's not true in every case.
-
Konrad Rudolph over 13 yearsHuh. Why isn't the call to _memmove inlined?
-
Nawaz over 13 years@Simone: one thing: using a while loop to do the assignments and doing the assignments manually with constant offsets are NOT the same speed-wise. The latter is usually faster.
Simone over 13 yearsYes, but you can't assign manually if you have a dynamically allocated array.
-
Yttrill over 13 yearsBTW: @Martin: it is not reasonable to say "efficiency should not be your concern, write nice code". People use C++ as opposed to a decent language precisely because they demand performance. It matters.
-
Martin York almost 9 years@Yttrill: And I have never seen a micro-optimization by a human that was not already being done better by the compiler. On the other hand, writing nice readable code implies you are thinking more at the algorithm level, where the human can beat the compiler at optimization because the compiler does not know the intent.
-
akaltar almost 9 years@LokiAstari You think that 20 assignment operators are easier to understand than a single memcpy on a struct? It might be more readable with memcpy too. (And in C/C++ you are probably used to it anyway.)
-
Martin York almost 9 years@akaltar: I think std::copy is easier to read. It is also the most efficient (equal to memcpy). See the supplied assembly and, since you obviously did not time it yourself, the timing results.
-
Martin York almost 9 years@akaltar: Also you miss the point. You CAN'T use memcpy on structures in C++ because they have constructors (there is a small subclass of structures that you can use memcpy on, but that is not the general case). Which is also why std::copy is better, as it will use the most efficient and valid technique.
-
akaltar almost 9 years@LokiAstari memcpy is usually used on data-only structs, as usually those are used in high-performance code, but I understand your point, std::copy is indeed superior. As to reading.. I think they come out equal.
-
user703016 almost 9 yearsAddendum: instead of C-style arrays, using std::array<float, 3>, which does have an assignment operator, combines the best of both worlds: readability and efficiency. And it has the extra added quality of not decaying to a pointer, among others. Besides, as of the time of writing, both GCC 5.2 and Clang 3.7 generate identical code in all cases, so performance is no longer relevant and readability should be favored.
user239558 about 8 yearsThe C++ memmove (m3) speed looks dubious. There's no way a call to memmove would have no overhead compared to the optimized and inlined memcpy case.
Martin York about 8 years@user239558: You are surprised that std::copy produces the fastest code? Given the ability of the compiler to analyze and plant the best code, it seems redundantly obvious that std::copy would be the fastest technique (or at least no slower than a technique you can do manually).
user239558 about 8 years@LokiAstari the assembly was quoted in the answer above. There is no way a non-inlined call to memmove, which in addition to the above needs to check for pointer overlap, could ever be as fast as the inlined memcpy. It's bogus.
Martin York about 8 years@user239558: Well, it's a good job the compiler, as a machine, makes better choices based on maths than humans do with "opinions". Not only is the code provided but also the timing results. Since this is a science, why don't you try and repeat the experiment!
-
user239558 about 8 years@LokiAstari I'm just pointing out that the timings are bogus. Anyone who has programmed in assembly or has read intel instruction manuals will know this. It is common to make mistakes in benchmarking, and I'm simply pointing out an obvious one. I'm not wasting any more time on this, sorry.
-
Martin York about 8 years@user239558: Words are easy. Rather than just pontificating with no proof, just try it.