How does the internal implementation of memcpy work?

Solution 1

It depends. In principle, you cannot physically copy anything larger than the largest usable register in a single cycle, but that is not really how machines work these days. In practice, you care less about what the CPU is doing and more about the characteristics of DRAM. The machine's memory hierarchy plays a crucial role in making the copy as fast as possible (e.g., are you loading whole cache lines? How large is a DRAM row relative to the copy?). An implementation might also use vector instructions to move more bytes per instruction. Without reference to a specific implementation, memcpy is effectively a byte-for-byte copy through a one-element buffer (a register).
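
As a rough sketch of the vector-instruction approach, assuming an x86 target with SSE2 and a GCC- or Clang-style compiler: the name vector_copy is made up for illustration, and real library implementations are considerably more elaborate (alignment handling, size-based dispatch, and so on).

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics (assumes an x86 compiler) */
#endif

/* Hypothetical helper, not a real library function. */
void *vector_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

#if defined(__SSE2__)
    /* Move 16 bytes per iteration through a 128-bit register.
       The unaligned load/store forms accept any address. */
    while (n >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    /* Remaining tail (or everything, if SSE2 is unavailable):
       one byte at a time. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Like memcpy itself, this assumes the two regions do not overlap.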

Here's a fun article that describes one person's adventure into optimizing memcpy. The main take-home point is that it is always going to be targeted to a specific architecture and environment based on the instructions you can execute inexpensively.

Solution 2

The implementation of memcpy is highly specific to the system in which it is implemented. Implementations are often hardware-assisted.

Memory-to-memory mov instructions are not that uncommon - they have been around since at least PDP-11 times, when you could write something like this:

    MOV FROM, R2
    MOV TO,   R3
    MOV R2,   R4
    ADD LEN,  R4
CP: MOV (R2)+, (R3)+ ; "(Rx)+" means "*Rx++" in C
    CMP R2, R4
    BNE CP

The commented line is roughly equivalent to C's

*to++ = *from++;

Some contemporary CPUs have instructions that implement memcpy directly: you load designated registers with the source address, the destination address, and the length, invoke a memory copy instruction (on x86, for example, rep movsb), and let the CPU do the rest.
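
To make that concrete, here is a hedged sketch using x86's rep movsb through GCC-style inline assembly. The function name rep_movsb_copy is hypothetical, the inline-asm syntax assumes GCC or Clang, and on other targets the sketch simply falls back to a byte loop.

```c
#include <stddef.h>

/* Hypothetical helper, not a real library function. */
void *rep_movsb_copy(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "D" = (R/E)DI holds the destination, "S" = (R/E)SI the source,
       "c" = (R/E)CX the byte count. rep movsb copies CX bytes and
       advances both pointers. Assumes the direction flag is clear,
       as the usual calling conventions guarantee. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Fallback for other targets: plain byte loop. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

Whether this beats a vectorized loop depends on the CPU generation and the copy size, so treat it as an illustration of the mechanism rather than a tuning recommendation.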

Solution 3

A trivial implementation of memcpy, treating s1 and s2 as byte pointers to the source and destination respectively, is:

 while (n--) *s2++ = *s1++;
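
Spelled out as a complete function, the loop might look like the sketch below. The name byte_copy is made up for illustration, and the parameter order (destination first) is borrowed from the standard memcpy. Since a void * cannot be dereferenced, the arguments are first converted to unsigned char pointers.

```c
#include <stddef.h>

/* Hypothetical byte-at-a-time copy with memcpy's signature. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;         /* void * cannot be dereferenced, */
    const unsigned char *s = src;   /* so work through byte pointers  */

    while (n--)
        *d++ = *s++;
    return dst;                     /* memcpy returns the destination */
}
```

As with memcpy, the regions are assumed not to overlap.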

glibc, however, usually uses clever implementations written in assembly. Calls to memcpy are also often inlined by the compiler.

On x86, the code checks (using GCC builtin functions) whether the size argument is a compile-time constant that is a multiple of 2 or of 4. If so, it uses a loop of movl instructions (4 bytes each); otherwise it calls the general case.

The general case uses a fast block copy written in assembly, built on the rep prefix and the movsl instruction.
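
The idea of moving 4 bytes at a time and finishing with single bytes can be sketched in C as follows. This is not glibc's code: the names word32 and word_copy are hypothetical, the may_alias attribute is a GCC/Clang extension, and the word loop is only taken when both pointers happen to be 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: lets a uint32_t access alias any object,
   the way char accesses are allowed to. */
typedef uint32_t __attribute__((__may_alias__)) word32;

/* Hypothetical helper, not a real library function. */
void *word_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Use 4-byte moves only when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        while (n >= 4) {
            *(word32 *)d = *(const word32 *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    /* Tail bytes, or the whole buffer when the pointers are unaligned. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

With aligned buffers, a 50-byte copy therefore takes 12 word moves followed by 2 byte moves.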


PersonWithName
Author by

PersonWithName

:-)

Updated on July 07, 2020

Comments

  • PersonWithName
    PersonWithName almost 4 years

    How does the standard C function 'memcpy' work? It has to copy a (large) chunk of RAM to another area of RAM. Since I know you cannot move straight from RAM to RAM in assembly (with the mov instruction), I am guessing it uses a CPU register as intermediate storage when copying?

    But how does it copy? By blocks (and how would it copy by blocks?), by individual bytes (char), or by the largest data type available (copying in long doubles, which are 12 bytes on my system)?

    EDIT: OK, apparently you can move data from RAM to RAM directly. I am not an assembly expert, and all I have learnt about assembly is from this document (X86 assembly guide), which says in its section on the mov instruction that you cannot move from RAM to RAM. Apparently this isn't true.

    • Oliver Charlesworth
      Oliver Charlesworth almost 11 years
      This is platform-specific. Please specify a platform.
    • PersonWithName
      PersonWithName almost 11 years
      I use Linux, Mac, and Windows (32-bit, 64-bit, and 32-bit respectively), but I asked this question while using Linux.
  • Jim Balter
    Jim Balter almost 11 years
    "they have been around since at least PDP-11 times" -- far longer.
  • Sergey Kalinichenko
    Sergey Kalinichenko almost 11 years
    @JimBalter This does not surprise me at all :)
  • Rockstar5645
    Rockstar5645 almost 5 years
    but s2 and s1 are void pointers, and I thought you couldn't dereference void pointers.
  • Rockstar5645
    Rockstar5645 almost 5 years
    to and from are void pointers, and I thought you can't dereference void pointers. Would you first typecast them to (unsigned char*)?
  • Sergey Kalinichenko
    Sergey Kalinichenko almost 5 years
    @Rockstar5645 Assembly has no concept of type, so it's happy to dereference whatever address you pass as a void*. Of course if you are writing an implementation in C, you'd have to typecast these pointers to something that you can dereference, such as unsigned char*.
  • joepol
    joepol about 3 years
    @ouah - why use movl only on sizes that are multiples of 4 rather than always trying to use movl? If you have to copy a total of 50 bytes, can't you copy using 12 movl and 2 mov?
  • joepol
    joepol about 3 years
    @Rockstar5645 - you must cast first. I believe ouah was referring to this: gcc memcpy implementation
  • mohammadsdtmnd
    mohammadsdtmnd over 2 years
    But for specific cases, like when i = 1, 2, or 4, then what?