What is meant by "memory is 8 bytes aligned"?
Solution 1
An object that is "8 bytes aligned" is stored at a memory address that is a multiple of 8.
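To make the definition concrete, here is a minimal sketch in C of how you could test an address yourself. The helper name is_aligned is made up for illustration and is not from any library; converting a pointer to uintptr_t and taking the remainder works on mainstream platforms, but the exact integer value of a pointer is implementation-defined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the address held in ptr
   is a multiple of alignment (alignment must be non-zero). */
bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

Calling is_aligned(p, 8) then tells you whether p is "8 bytes aligned" in the sense described above.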
Many CPUs will only load some data types from aligned locations; on other CPUs such access is just faster. There are also several other possible reasons for using memory alignment - without seeing the code it's hard to say why.
Aligned access is faster because the external bus to memory is not a single byte wide - it is typically 4 or 8 bytes wide (or even wider). This means that the CPU doesn't fetch a single byte at a time - it fetches 4 or 8 bytes starting at the requested address. As a consequence of this, the 2 or 3 least significant bits of the memory address are not actually sent by the CPU - the external memory can only be read or written at addresses that are a multiple of the bus width. If you requested a byte at address "9", the CPU would actually ask the memory for the block of bytes beginning at address 8, and load the second one into your register (discarding the others).
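The address arithmetic described above can be sketched in a few lines of C. This is only a model: BUS_WIDTH, block_base and block_offset are names invented for the example, and the 8-byte width is an assumption taken from the text.

```c
#include <stdint.h>

/* Assumed bus width in bytes for this sketch; must be a power of two. */
#define BUS_WIDTH 8u

/* Address of the aligned block that contains addr: the low bits are
   simply cleared, which is why the CPU never needs to send them. */
uint64_t block_base(uint64_t addr)
{
    return addr & ~(uint64_t)(BUS_WIDTH - 1);
}

/* Position of the requested byte inside that block (0 = first byte). */
unsigned block_offset(uint64_t addr)
{
    return (unsigned)(addr & (BUS_WIDTH - 1));
}
```

For address 9 this gives a block starting at address 8 and an offset of 1, i.e. the second byte of the block.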
This implies that a misaligned access can require two reads from memory: If you ask for 8 bytes beginning at address 9, the CPU must fetch the 8 bytes beginning at address 8 as well as the 8 bytes beginning at address 16, then mask out the bytes you wanted. On the other hand, if you ask for the 8 bytes beginning at address 8, then only a single fetch is needed. Some CPUs will not even perform such a misaligned load - they will simply raise an exception (or even silently load the wrong data!).
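As a rough illustration of the "two reads" point, the following C sketch counts how many bus-width blocks an access touches. blocks_touched is a hypothetical helper written for this answer, and it models only the simple bus picture used here, not any particular CPU.

```c
#include <stdint.h>

/* Hypothetical helper: number of aligned blocks of bus_width bytes
   covered by an access of size bytes starting at addr.
   Assumes size >= 1 and bus_width >= 1. */
unsigned blocks_touched(uint64_t addr, uint64_t size, uint64_t bus_width)
{
    uint64_t first = addr / bus_width;               /* block holding the first byte */
    uint64_t last  = (addr + size - 1) / bus_width;  /* block holding the last byte */
    return (unsigned)(last - first + 1);
}
```

With an 8-byte bus, an 8-byte access at address 9 touches two blocks, while the same access at address 8 touches one.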
Solution 2
Memory alignment is important for performance in different ways. It has a hardware-related reason. Since the 80s there has been a difference in access time between the CPU and the memory: the speed of the processor grows faster than the speed of the memory, and this difference gets bigger and bigger over time. To give an example: on the Apple II the CPU ran at 1.023 MHz and the memory at twice that frequency (1 cycle for the CPU, 1 cycle for the video). A modern PC runs the CPU at about 3 GHz, with memory at barely 400 MHz.
One solution to the problem of memory that keeps getting slower relative to the CPU is to access it over ever wider buses: instead of accessing 1 byte at a time, the CPU reads a 64-bit wide word from the memory. This means that even if you read 1 byte from memory, the bus will deliver a whole 64-bit (8-byte) word. The memory holds these 8-byte units at addresses 0, 8, 16, 24, 32, 40 etc., that is, at multiples of 8.
If you access, for example, an 8-byte word at address 4, the hardware has to read the word at address 0 and keep only the bytes at addresses 4 to 7, then read the word at address 8 and keep only the bytes at addresses 8 to 11, combine the two halves and give the result to the register. As you can see, this is a quite complicated (and thus slow) operation. This is the first reason one likes aligned memory access. I will give another reason in 2 hours.
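The read-two-words-and-combine step can be emulated in C as a small sketch. The function load64_from_words is invented for this illustration; it models memory as an array of aligned 64-bit words with little-endian byte numbering inside each word (byte k of a word is bits 8k to 8k+7), which is an assumption of the example and not a statement about real hardware.

```c
#include <stdint.h>

/* Hypothetical model: returns the 8-byte value that starts at byte
   address addr in a memory made of aligned 64-bit words.
   For an unaligned addr, words[addr / 8 + 1] must exist. */
uint64_t load64_from_words(const uint64_t *words, uint64_t addr)
{
    uint64_t index = addr / 8;                  /* which aligned word */
    unsigned shift = (unsigned)(addr % 8) * 8;  /* bit offset inside it */

    if (shift == 0)
        return words[index];                    /* aligned: a single fetch */

    /* Unaligned: keep the upper bytes of the first word and the lower
       bytes of the second word, then merge them. */
    uint64_t low  = words[index] >> shift;
    uint64_t high = words[index + 1] << (64 - shift);
    return low | high;
}
```

The aligned case returns after one array read, while the unaligned case needs two reads, two shifts and an OR, which mirrors the extra work described above.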
Solution 3
"X bytes aligned" means that the base address of your data must be a multiple of X. This can be required by special hardware such as a DMA engine, or used for faster access by the CPU, etc.
This is the case for the Cell processor, where data must be 16-byte aligned in order to be copied to/from the co-processor.
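If you need such a buffer on a hosted C11 system, one possible sketch uses the standard aligned_alloc function. The wrapper alloc_16_aligned is a name made up for this example, and this is not the Cell SDK's own allocation API; note also that some toolchains (MSVC, for instance) do not provide aligned_alloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns a block of at least size bytes whose
   address is a multiple of 16, or NULL on failure. Release it with free().
   Assumes size > 0. */
void *alloc_16_aligned(size_t size)
{
    /* C11 aligned_alloc expects the size to be a multiple of the
       alignment, so round the request up to the next multiple of 16. */
    size_t rounded = (size + 15) & ~(size_t)15;
    return aligned_alloc(16, rounded);
}
```

A buffer obtained this way starts on a 16-byte boundary, which is the kind of guarantee a DMA engine or co-processor transfer may require.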
Solution 4
If the memory data is 8-byte aligned, it means: sizeof(the_data) % 8 == 0.
Generally in C, if a structure is supposed to be 8-byte aligned, its size must be a multiple of 8; if it is not, padding has to be added manually or by the compiler. Some compilers provide directives to align a structure to n bytes: for VC it is #pragma pack(8), and for gcc it is __attribute__((aligned(8))).
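As a portable alternative to the compiler-specific directives, here is a small sketch using the standard C11 alignas and alignof from <stdalign.h>. The struct record and its members are invented for the example. Strictly speaking, alignment is a property of the address; the size being a multiple of 8 follows from it, because the compiler pads the structure so that array elements stay aligned.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure: alignas(8) on the first member raises the
   alignment of the whole struct to 8 bytes. */
struct record {
    alignas(8) char tag;  /* placed at offset 0 */
    uint32_t value;       /* the compiler inserts padding before it */
};

/* Checked at compile time: the alignment is 8, and the size is padded
   to a multiple of 8 as a consequence. */
_Static_assert(alignof(struct record) == 8, "struct should be 8-byte aligned");
_Static_assert(sizeof(struct record) % 8 == 0, "size should be a multiple of 8");
```

If either condition did not hold, the translation unit would fail to compile, so the checks cost nothing at run time.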
Comments
-
Renjith G almost 4 years
While going through one project, I have seen that the memory data is "8 bytes aligned". Can anyone please explain what this means?
-
Renjith G almost 14 years
Meaning, if the first position is 0x0000 then the second position would be 0x0008? What are the advantages of this 8-byte alignment?
-
Renjith G almost 14 years
OK, but how does execution become faster when data is X-byte aligned? Is it due to easier calculation of the memory address, or something else? Also, is there any alignment for functions? /Kanu__
-
Phong almost 14 years
Well, it depends on your architecture. For example, if you have a 32-bit architecture and your memory can only be accessed 4 bytes at a time, at addresses that are multiples of 4 (4-byte aligned), it would be more efficient to fit your 4-byte data (e.g. an integer) into one such unit. That would let you access it in one memory read instead of the two needed if it is not aligned. (NOTE: This case is hypothetical.)
-
Phong almost 14 years
Generally your compiler does all the optimization, so you don't have to manage it. In some VERY specific cases, you may need to specify it yourself (e.g. the Cell processor, or your project's hardware). If you are working on a traditional architecture, you really don't need to do it.
-
Renjith G almost 14 years
Also is there any alignment for functions?
-
Renjith G almost 14 years
Thanks. Good one. I am waiting for your second reason.
-
Jarosław Bielawski almost 14 years
Sorry, forgot that. There isn't a second reason. At the moment I wrote that, I was thinking about arrays and the sizes of array elements, which is not strictly about alignment. But sizes that are powers of 2 have the advantage that element addresses are easily computed. Certain CPUs even have addressing modes that do the multiplication by 2, 4 or 8 directly without penalty (x86 and 68020 for example). But as said, it has not much to do with alignment.
-
Renjith G almost 14 years
Exactly. Thanks for the info. /renjith_g
-
Royi almost 8 years
So let's say one is working with SSE (128-bit) on single-precision floating-point data, yet the data length is 38. The process multiplies the data by a constant. What should the developer do to handle this?
-
RaGa__M over 4 years
"If you requested a byte at address "9"": do we need to care about alignment at byte level?
-
Peter Cordes almost 4 years
CPUs with cache fetch memory in whole (aligned) cache-line chunks so the external bus only matters for uncached MMIO accesses. Alignment means data can never be split across any wider power-of-2 boundary. But some non-x86 ISAs require natural alignment (aligned to its size) - that means cache-access hardware can be simpler because it doesn't have to shift an unaligned word from the cache.