Is left and right shifting negative integers defined behavior?

19,921

Solution 1

You're not reading that sentence correctly. The standard defines it if: the left operand has a signed type and a non-negative value and the result is representable (and previously in the same paragraph defines it for unsigned types). In all other cases (notice the use of the semicolon in that sentence), i.e, if any of these conditions isn't verified, the behaviour is undefined.

Solution 2

When the C standards were codified, different platforms would do different things when left-shifting negative integers. On some of them, the behavior might trigger implementation-specific traps whose behavior could be outside a program's control, and which could include random code execution. Nonetheless, it was possible that programs written for such platforms might make use of such behavior (a program could e.g. specify that a user would have to do something to configure a system's trap handlers before running it, but the program could then exploit the behavior of the suitably-configured trap handlers).

The authors of the C standard did not want to say that compilers for machines where left-shifting of negative numbers would trap must be modified to prevent such trapping (since programs might potentially be relying upon it), but if left-shifting a negative number is allowed to trigger a trap which could cause any arbitrary behavior (including random code execution) that means that left-shifting a negative number is allowed to do anything whatsoever. Hence Undefined Behavior.

In practice, until about 5 years ago, 99+% of compilers written for a machine that used two's-complement math (meaning 99+% of machines made since 1990) would consistently yield the following behaviors for x<<y and x>>y, to the extent that code reliance upon such behavior was considered no more non-portable than code which assumed char was 8 bits. The C standard didn't mandate such behavior, but any compiler author wanting to be compatible with a wide base of existing code would follow it.

  • if y is a signed type, x << y and x >> y are evaluated as though y was cast to unsigned.
  • if x is type int, x<<y is equivalent to (int)((unsigned)x << y).
  • if x is type int and positive, x>>y equivalent to (unsigned)x >> y. If x is of type int and negative, x>>y is equivalent to `~(~((unsigned)x) >> y).
  • If x is of type long, similar rules apply, but with unsigned long rather than unsigned.
  • if x is an N-bit type and y is greater than N-1, then x >> y and x << y may arbitrarily yield zero, or may act as though the right-hand operand was y % N; they may require extra time proportional to y [note that on a 32-bit machine, if y is negative, that could potentially be a long time, though I only know of one machine which would in practice run more than 256 extra steps]. Compilers were not necessarily consistent in their choice, but would always return one of the indicated values with no other side-effects.

Unfortunately for some reason I can't quite fathom, compiler writers have decided that rather than allowing programmers to indicate what assumptions compilers should use for dead-code removal, compilers should assume that it is impossible to execute any shift whose behavior isn't mandated by the C standard. Thus, given code like the following:

uint32_t shiftleft(uint32_t v, uint8_t n)
{
  if (n >= 32)
    v=0;
  return v<<n;
}

a compiler may determine that because the code would engage in Undefined Behavior when n is 32 or larger, the compiler may assume that the if will never return true, and may thus omit the code. Consequently, unless or until someone comes up with a standard for C which restores the classic behaviors and allows programmers to designate what assumptions merit dead code removal, such constructs cannot be recommended for any code that might be fed to a hyper-modern compiler.

Solution 3

otherwise, the behavior is undefined.

This includes the

if E1 has a signed type and non-negative value

Share:
19,921

Related videos on Youtube

user103214
Author by

user103214

Updated on June 05, 2022

Comments

  • user103214
    user103214 almost 2 years

    I know, right shifting a negative signed type depends on the implementation, but what if I perform a left shift? For example:

    int i = -1;
    i << 1;
    

    Is this well-defined?

    I think the standard doesn't say about negative value with signed type

    if E1 has a signed type and non-negative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

    It only clarifies that if the result isn't representable in signed type then the behavior is undefined.

    • Steve Jessop
      Steve Jessop over 12 years
      What part isn't clear? If E1 has a non-negative value and ... otherwise the behavior is undefined. E1 has negative value, therefore the behavior is undefined. It's true that sometimes the standardese could do with some extra brackets to make it entirely clear what an "otherwise" refers to, but here it means "in any situation not already described". It helps when interpreting these things to remember that in any situation where the standard doesn't describe the behavior, then behavior is undefined. So in fact that "otherwise" is formally redundant.
    • user103214
      user103214 over 12 years
      @SteveJessop: I think the otherwise means if the value is not representable then the behavior is undefined. I'm not sure if this includes E1 with negative value.
    • Xeo
      Xeo over 12 years
      @Norman: It does, see R.Marthinho's answer, there is a semicolon, a clear seperator between the condition and the final result.
    • Jan Hudec
      Jan Hudec over 12 years
      Which standard version is that? The wording in C++03 standard ISO/IEC 14882:2003 is different. It says that it must be a bit shift and than only says what is the corresponding numeric value for unsigned type, but does not mention unsigned type at all. Which implies that it's implementation-defined, because the bit pattern is.
    • Steve Jessop
      Steve Jessop over 12 years
      @Jan: the wording is from C++11, which took it from C99. You're right that C89 and C++03 both define the left shift operator as "a bit pattern left-shifted E2 positions", without giving any further definition of what it means to "left-shift" a signed bit pattern. I think the general interpretation of that old text was that if the sign bit gets involved at all, then that's an overflow (UB), rather than that the result must be what you'd expect based on the implementation-defined integer representation. But evidently the old text was considered inadequate, or it wouldn't have been changed.
    • Jan Hudec
      Jan Hudec over 12 years
      @SteveJessop: Thanks; that needed to be specified. It's somewhat strange, that C++11 (well, n3242; I don't have final) says that right shift of negative values is implementation-defined, but right shift of negative values is undefined.
    • Steve Jessop
      Steve Jessop over 12 years
      @Jan: I agree, that is strange. I suppose there must be some dusty old hardware somewhere on which the left shift traps due to overflow, whereas the right shift does something reliable. Or at least a rumour reached the committee's ears that there is. "Dusty old hardware" sometimes turns out to mean, "the machine that runs the bank transaction that pays my salary", so I'm not entirely against the C++ standard taking it into account.
    • supercat
      supercat about 9 years
      @SteveJessop: Hardware that traps on left-shift could have been handled via "Must either yield the bit pattern... or raise a trap whose existence is implementation-defined", if the intention weren't to allow compilers to make assumptions about code behavior without having to worry about whether such assumptions match reality.
  • user103214
    user103214 over 12 years
    What if I had a non-negative and I right shift it too many times? int i = 1; i >> 24; is this valid? just confused.
  • R. Martinho Fernandes
    R. Martinho Fernandes over 12 years
    @Norman What do you mean too many times? More bits than an int has?
  • user103214
    user103214 over 12 years
    @R.MartinhoFernandes if sizeof(i) > 24, lets say it has 32 bits. If I do i >> 24 then it would be redundant because i has value 1.
  • R. Martinho Fernandes
    R. Martinho Fernandes over 12 years
    Well, it is equivalent to integral division by 2^24. 1/(2^24) is zero.
  • user103214
    user103214 over 12 years
    @R.MartinhoFernandes: would it be valid? for example: I could just write i >> 1 and it results zero.
  • R. Martinho Fernandes
    R. Martinho Fernandes over 12 years
    @Norma: yes, it's valid. It matches the conditions described in paragraph 3.
  • John Zwinck
    John Zwinck over 7 years
    Thanks for this wonderful answer--even if it's not the accepted one it's very useful.
  • jww
    jww almost 7 years
    Another thing to keep in mind is, not all barrel-shifters are the same. I seem to recall ARM behave a little differently than IA32 and other platforms, and its the reason there's not an implicit mod operation using only the lower bits of the shift amount.
  • supercat
    supercat almost 7 years
    @jww: Platforms which include a "shift-by-N" instruction often mod-reduce the shift value, but in many cases the mod-reduction is by an amount larger than the word size. Further, different chips in a family might behave differently. Almost all processors, however, would use one of the two behaviors listed.
  • gengkev
    gengkev almost 6 years
    This is a useful answer, but I don't believe that the explanation for the final example of shiftleft is correct. If n >= 32 then this code doesn't engage in undefined behavior, as it will simply return. For the if-statement to be optimized out, the undefined behavior must happen before the if-statement.
  • supercat
    supercat almost 6 years
    @gengkev: As written, the code does not return zero; it sets n equal to zero and then shifts the value zero left by n bits. On many platforms, an "if" with no "else" is more efficient than an "if"/"else" construct. Unless hardware would do something unacceptable with large values of n (e.g. running a microcode loop for n cycles with interrupts disabled might be bad if n were a large 32-bit value) letting the shift execute may be more efficient than branching around it. If a compiler tries to get "clever" however, it may not be possible to make it generate what should be optimal code.
  • supercat
    supercat almost 5 years
    @jww: The ARM platforms do a mod-256 on the shift amount. If the Standard were to specify that implementations may treat x<<y for y>=bitsize as an unspecified choice among x<<(y-bitsize), x<<(y-1)<<1, or whatever hardware happens to do, that should make it practical on all platforms and yet still guarantee that constructs like (x<<y)|(x>>(32-y))` would work on all common platforms without having to add additional complexity to accommodate the y==0 case.