Why doesn't left bit-shift, "<<", for 32-bit integers work as expected when used more than 32 times?

c++ bit-shift

58,093

Solution 1

This is caused due to a combination of an undefined behaviour in C and the fact that code generated for IA-32 processors has a 5 bit mask applied on the shift count. This means that on IA-32 processors, the range of a shift count is 0-31 only. ¹

From The C programming language ²

The result is undefined if the right operand is negative, or greater than or equal to the number of bits in the left expression’s type.

From IA-32 Intel Architecture Software Developer’s Manual ³

The 8086 does not mask the shift count. However, all other IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

¹ http://codeyarns.com/2004/12/20/c-shift-operator-mayhem/

² A7.8 Shift Operators, Appendix A. Reference Manual, The C Programming Language

³ SAL/SAR/SHL/SHR – Shift, Chapter 4. Instruction Set Reference, IA-32 Intel Architecture Software Developer’s Manual

Solution 2

In C++, shift is only well-defined if you shift a value less steps than the size of the type. If int is 32 bits, then only 0 to, and including, 31 steps is well-defined.

So, why is this?

If you take a look at the underlying hardware that performs the shift, if it only has to look at the lower five bits of a value (in the 32 bit case), it can be implemented using less logical gates than if it has to inspect every bit of the value.

Answer to question in comment

C and C++ are designed to run as fast as possible, on any available hardware. Today, the generated code is simply a ''shift'' instruction, regardless how the underlying hardware handles values outside the specified range. If the languages would have specified how shift should behave, the generated could would have to check that the shift count is in range before performing the shift. Typically, this would yield three instructions (compare, branch, shift). (Admittedly, in this case it would not be necessary as the shift count is known.)

Solution 3

It's undefined behaviour according to the C++ standard:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Solution 4

The answers of Lindydancer and 6502 explain why (on some machines) it happens to be a 1 that is being printed (although the behavior of the operation is undefined). I am adding the details in case they aren't obvious.

I am assuming that (like me) you are running the program on an Intel processor. GCC generates these assembly instructions for the shift operation:

movl $32, %ecx
sall %cl, %eax

On the topic of sall and other shift operations, page 624 in the Instruction Set Reference Manual says:

The 8086 does not mask the shift count. However, all other Intel Architecture processors (starting with the Intel 286 processor) do mask the shift count to five bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

Since the lower 5 bits of 32 are zero, then 1 << 32 is equivalent to 1 << 0, which is 1.

Experimenting with larger numbers, we would predict that

cout << (a << 32) << " " << (a << 33) << " " << (a << 34) << "\n";

would print 1 2 4, and indeed that is what is happening on my machine.

Solution 5

It doesn't work as expected because you're expecting too much.

In the case of x86 the hardware doesn't care about shift operations where the counter is bigger than the size of the register (see for example SHL instruction description on x86 reference documentation for an explanation).

The C++ standard didn't want to impose an extra cost by telling what to do in these cases because generated code would have been forced to add extra checks and logic for every parametric shift.

With this freedom implementers of compilers can generate just one assembly instruction without any test or branch.

A more "useful" and "logical" approach would have been for example to have (x << y) equivalent to (x >> -y) and also the handling of high counters with a logical and consistent behavior.

However this would have required a much slower handling for bit shifting so the choice was to do what the hardware does, leaving to the programmers the need to write their own functions for side cases.

Given that different hardware does different things in these cases what the standard says is basically "Whatever happens when you do strange things just don't blame C++, it's your fault" translated in legalese.

View more solutions

58,093

Author by

AnkitSablok

Mathematics and Computer Science student @ University at Buffalo, State University of New York.

Updated on July 05, 2022

Comments

AnkitSablok almost 2 years
When I write the following program and use the GNU C++ compiler, the output is 1 which I think is due to the rotation operation performed by the compiler.
```
#include <iostream>

int main()
{
    int a = 1;
    std::cout << (a << 32) << std::endl;

    return 0;
}
```
But logically, as it's said that the bits are lost if they overflow the bit width, the output should be 0. What is happening?

The code is on ideone, http://ideone.com/VPTwj.
Flexo over 12 years

that's some mighty quick standard searching!
David Heffernan over 12 years

@awoodland Yes, Google is seriously impressive!
AnkitSablok over 12 years

your answer doesnt justify the query sir i am interested in knowing y arent the bits lost and the answer should be 0 as per me but it does some sort of rotation and puts the last bit in the 1st position and u can confirm this with other numbers as well
Flexo over 12 years

Oh I see. I'd fired up a PDF reader and got about as far as the table of contents.
Khaled Alshaya over 12 years

@AnkitSablok Undefined behavior means that any behavior can happen ;) check: stackoverflow.com/questions/2397984/…
Flexo over 12 years

@Ankit - but it does exactly answer it. That's the beauty of undefined behaviour, anything goes!
Beta over 12 years

But... why is this undefined behavior? Why couldn't the low-level code perform the subtraction, determine that it need consider only the lowest zero bits, and return 0 without inspecting the value at all?
David Heffernan over 12 years

@Ankit The answer should not be 0. The answer is undefined. The answer could be anything. This is the essence of undefined behaviour.
V.S. over 12 years

@AnkitSablok - it DOES justify the behaviour. The point is that when behaviour is undefined the compiler is allowed to do anything it pleases. At best guess the compiler has simply decided to do nothing (which is perfectly valid according to the standard)
Flexo over 12 years

@Beta - it's undefined so as not to encumber implementations with the burden of emulating in software what was "the norm" in hardware at the point in time when the standard was conceived.
Flexo over 12 years

It doesn't mean the answer could be 0 or 1, it means that absolutely anything could happen
Matthieu M. over 12 years

@Beta: interesting question ! Most of those undefined operations map directly to processor instructions. The behavior is generally undefined because doing otherwise would require checks prior to the operations that would not be implemented efficiently on most architectures. Typical safety/speed tradeof.
6502 over 12 years

IMO it's not the compiler in this case, it's the hardware that doesn't handle shifts bigger than registers. The standard simply didn't want to specify something that would require fatter and slower code for all parametric shifts on some CPUs.
MSalters over 12 years

@Beta: "Don't pay for what you don't need" principle. You can do the check, if your code needs it. But if the check was automatic, you could not not do it.
fluffy over 12 years

A bit more precisely, at least in the case of older versions of gcc, a<<b would be implemented as equivalent to a<<(b&31). I actually relied on this behavior once upon a time in a game, where a>>33 was faster and generated equivalent code to a>>1 (due to some implementation details of the 80486 CPU). Of course in retrospect I realize how foolish it was to do that, but I was young and inexperienced, and it DID give me a significant speed increase in an often-run inner loop.
6502 over 12 years

@fluffy: It's not gcc, it's the processor. Given that the standard doesn't require any specific behavior what the guys from gcc did was just to use the assembler instruction for the shift not caring about what to do in case the counter was too big or negative. If you check the manual I pointed to you will see that indeed the x86 family uses the low 5 bits of the counter, ignoring the others... in their words "The count is masked to five bits, which limits the count range to 0 to 31". The standard wanted to leave this exact freedom, so that shift can become just one asm instruction.
fluffy over 12 years

Yes, but in this case it's a matter of how the code is generated by the compiler. It was a case of shr eax vs. shr eax,1, and the fact that gcc had a design facet in which shr eax,33 would AND the immediate operand with 31 to ensure that it fit into the 5-bit parameter slot in the instruction. Immediate-mode shr has the shift amount encoded directly in the instruction, at least on x86 and most other CISC architectures. i.e. it's a matter of how the compiler is doing the masking TO SUPPORT THE CPU. It's a fine distinction but important.
6502 over 12 years

@fluffy: The reason for which the standard doesn't mandate a behavior when counter is bigger than the bit size of the value is to avoid a performance penality in case of a parametric shift. In case of a constant shift (e.g. 33) a compiler could just simplify to a constant 0 without any performance penality. That gcc decided to keep uniform behavior to the parametric case is a reasonable choice but not the only one (I hope at least a warning was emitted tho). If you need control on the asm instruction generated just use asm... using a non-specified behavior is looking for trouble.
fluffy over 12 years

At the time, gcc didn't emit a warning, but this was in the gcc 2.x days. Nowadays it does emit a warning, which is why I'd even remembered I was doing this terrible thing since I was dusting off old code for nostalgia's sake. :) And yes, I am fully aware that the behavior was unspecified and that to rely on it is a bad idea. I was just adding more to your answer about one of the particular ways it can behave that is different than the way you were stating. I should have been more precise that I was using immediate-mode >> rather than register-mode >> though.
Shahbaz over 12 years

@Beta, Imagine the instruction for shift in a certain computer takes 5 bits. No matter what you do, you just can't shift a variable by 32! (Something like 01011 (shift) 010 (to register 2) 110 (from register 6) ????? (shift amount) (This is an imaginary but very possible instruction))
dbrank0 over 10 years

Nice example of specifications being "optimized" for existing CPUs, but made counter intuitive in process. Damn.
Cephalopod almost 10 years

Could you please include the essence of that blog post in your answer (cf meta.stackexchange.com/a/8259/244745)?
nodakai over 8 years

That's not the best part to quote. The preceding paragraph states "The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand" so n << 32 is an UB regardless of the signedness of the left operand
Luaan over 8 years

@MSalters Pascal had a nice alternative to pretty much the same problem - keep everything as defined as possible, but give an option to opt-out (using preprocessor directives). It's much more multi-platform friendly.
Toby Speight almost 8 years

This doesn't explain why there's a difference between a shift of 32 places, and two shifts that add up to 32 places.