What is the purpose of the h and hh modifiers for printf?
Solution 1
One possible reason: for symmetry with the use of those modifiers in the formatted input functions? I know it wouldn't be strictly necessary, but maybe there was value seen for that?
Although they don't mention the importance of symmetry for the "h" and "hh" modifiers in the C99 Rationale document, the committee does mention it as a consideration for why the "%p" conversion specifier is supported for fscanf() (even though that wasn't new for C99 - "%p" support is in C90):
Input pointer conversion with %p was added to C89, although it is obviously risky, for symmetry with fprintf.
In the section on fprintf(), the C99 rationale document does discuss that "hh" was added, but merely refers the reader to the fscanf() section:
The %hh and %ll length modifiers were added in C99 (see §7.19.6.2).
I know it's a tenuous thread, but I'm speculating anyway, so I figured I'd give whatever argument there might be.
Also, for completeness, the "h" modifier was in the original C89 standard - presumably it would be there even if it wasn't strictly necessary because of widespread existing use, even if there might not have been a technical requirement to use the modifier.
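To make the symmetry argument concrete, here is a minimal sketch of the same modifier used on both the input and the output side. The helper names parse_small and format_small are hypothetical, chosen only for illustration. On the input side the hh is required, because sscanf stores through the pointer and must know the size of the object; on the output side it is redundant when the argument really is a signed char.

```c
#include <stdio.h>

/* Hypothetical helper: parse a small decimal integer into a signed char.
 * For the scanf family the hh modifier is required here, because the
 * function writes through the pointer and must know the object size. */
int parse_small(const char *text, signed char *out)
{
    return sscanf(text, "%hhd", out) == 1;
}

/* Hypothetical helper: format the value back with the matching modifier.
 * The argument is promoted to int on the way in; since it started out as
 * a signed char, converting it back changes nothing. */
int format_small(char *buf, size_t size, signed char value)
{
    return snprintf(buf, size, "%hhd", value);
}
```

Using the same "%hhd" string in both directions is the kind of symmetry this answer speculates about.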
Solution 2
In %...x mode, all values are interpreted as unsigned. Negative numbers are therefore printed as their unsigned conversions. In 2's complement arithmetic, which most processors use, there is no difference in bit patterns between a signed negative number and its positive unsigned equivalent, which is defined by modular arithmetic (adding the maximum value for the field plus one to the negative number, according to the C99 standard). Lots of software, especially the debugging code most likely to use %x, makes the silent assumption that the bit representation of a signed negative value and its unsigned cast is the same, which is only true on a 2's complement machine.
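The "maximum value plus one" rule can be written out by hand. The sketch below is only an illustration (the function name to_unsigned_by_hand is hypothetical); it computes the result from the value alone, without relying on any particular bit representation, and should agree with a plain cast to unsigned int.

```c
#include <limits.h>

/* Converts an int to unsigned int using the value-based rule:
 * a negative value has (UINT_MAX + 1) added to it. */
unsigned int to_unsigned_by_hand(int value)
{
    if (value >= 0)
        return (unsigned int)value;
    /* -(value + 1) lies in 0..INT_MAX, so no signed overflow occurs. */
    unsigned int magnitude_minus_one = (unsigned int)(-(value + 1));
    /* value == -(m + 1), so value + UINT_MAX + 1 == UINT_MAX - m. */
    return UINT_MAX - magnitude_minus_one;
}
```

For example, to_unsigned_by_hand(-1) yields UINT_MAX, the same value as (unsigned int)-1.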
The mechanics of this cast are such that hexadecimal representations of a value always imply, possibly inaccurately, that a number has been rendered in 2's complement, as long as it didn't hit an edge condition where the different integer representations have different ranges. This even holds true for arithmetic representations where the value 0 is not represented with the binary pattern of all 0s.
A negative short displayed as an unsigned long in hexadecimal will therefore, on any machine, be padded with f, due to implicit sign extension in the promotion, which printf will print. The value is the same, but it is truly visually misleading as to the size of the field, implying a significant amount of range that simply isn't present.
%hx truncates the displayed representation to avoid this padding, exactly as you concluded from your real-world use case.
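The padding effect can be reproduced with a small sketch. The helper names are hypothetical, and the explicit casts are my own addition so that each argument has the type its conversion specification expects. On a platform with a 32-bit int and a 16-bit short (an assumption), the first helper formats -1 as ffffffff and the second as ffff.

```c
#include <stdio.h>

/* Formats a short as hex at full unsigned int width. The cast converts the
 * value modularly, so a negative short comes out with leading f digits. */
int hex_as_unsigned_int(char *buf, size_t size, short value)
{
    return snprintf(buf, size, "%x", (unsigned int)value);
}

/* Formats the same short at unsigned short width. The cast performs the
 * narrowing in the caller, so %hx receives the type it is specified for. */
int hex_as_unsigned_short(char *buf, size_t size, short value)
{
    return snprintf(buf, size, "%hx", (unsigned short)value);
}
```

For non-negative values the two helpers produce the same digits; only negative inputs show the difference in width.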
The behavior of printf is undefined when passed an int outside the range of short that should be printed as a short, but the easiest implementation by far simply discards the high bits by a raw downcast, so while the spec doesn't require any specific behavior, pretty much any sane implementation is going to just perform the truncation. There are generally better ways to do that, though.
If printf isn't padding values or displaying unsigned representations of signed values, %h isn't very useful.
Solution 3
The only use I can think of is for passing an unsigned short or unsigned char and using the %x conversion specifier. You cannot simply use a bare %x - the value may be promoted to int rather than unsigned int, and then you have undefined behaviour.
Your alternatives are either to explicitly cast the argument to unsigned; or to use %hx / %hhx with a bare argument.
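The two alternatives look like this side by side. This is a sketch with hypothetical helper names; both are intended to produce the same text for any unsigned char value.

```c
#include <stdio.h>

/* Alternative 1: cast to unsigned so the argument matches a plain %x. */
int byte_hex_cast(char *buf, size_t size, unsigned char byte)
{
    return snprintf(buf, size, "%x", (unsigned)byte);
}

/* Alternative 2: pass the byte bare and let %hhx name its original type. */
int byte_hex_hh(char *buf, size_t size, unsigned char byte)
{
    return snprintf(buf, size, "%hhx", byte);
}
```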
Solution 4
Another place it's handy is the snprintf size check. gcc 7 added a size check for snprintf calls, so this will fail:
char arr[4];
char x = 'r';
snprintf(arr, sizeof(arr), "%d", x);
So it forces you to use a bigger char array when formatting a char with %d.
Here is a commit that shows those fixes: instead of increasing the char array size, they changed %d to %h. This also gives a more accurate description of the argument.
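A sketch of the narrowed form, under assumptions of my own: the value is an unsigned char and the specifier is %hhu, so the longest possible output is "255" plus the terminator. The helper name format_byte is hypothetical, and the exact specifier used in the commit is not reproduced here.

```c
#include <stdio.h>

/* Hypothetical helper: format a byte-sized value into a 4-char buffer.
 * With %hhu the output is at most three digits, so 4 bytes always
 * suffice; with %d the full range of int has to be considered. */
int format_byte(char out[4], unsigned char value)
{
    int n = snprintf(out, 4, "%hhu", value);
    /* snprintf returns the length it wanted to write; >= 4 means truncation. */
    return (n >= 0 && n < 4) ? 0 : -1;
}
```

Checking the return value of snprintf, as done here, guards against truncation at run time regardless of what the compiler can prove statically.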
Solution 5
I found it useful to avoid casting when formatting unsigned chars to hex:
sprintf_s(tmpBuf, 3, "%2.2hhx", *(CEKey + i));
It's a minor coding convenience, and looks cleaner than multiple casts (IMO).
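For readers without sprintf_s, the same idea can be sketched with standard snprintf. This swaps in snprintf and the format "%02hhx" in place of the call above; the helper name hex_encode and its parameters are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write len bytes of key as lowercase hex into out.
 * out must hold at least 2 * len + 1 chars. Returns 0 on success, -1 if
 * the buffer is too small or formatting fails. */
int hex_encode(char *out, size_t out_size, const unsigned char *key, size_t len)
{
    if (out_size < 2 * len + 1)
        return -1;
    out[0] = '\0'; /* keeps the result a valid string when len is 0 */
    for (size_t i = 0; i < len; i++) {
        /* %02hhx: two digits, zero padded, argument is an unsigned char. */
        int n = snprintf(out + 2 * i, 3, "%02hhx", key[i]);
        if (n != 2)
            return -1;
    }
    return 0;
}
```

Because key points to unsigned char, no cast is needed at the call site, which is the convenience this answer describes.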
R.. GitHub STOP HELPING ICE
Updated on July 05, 2022
Comments
-
R.. GitHub STOP HELPING ICE almost 2 years
Aside from %hn and %hhn (where the h or hh specifies the size of the pointed-to object), what is the point of the h and hh modifiers for printf format specifiers?
Due to default promotions which are required by the standard to be applied for variadic functions, it is impossible to pass arguments of type char or short (or any signed/unsigned variants thereof) to printf.
According to 7.19.6.1(7), the h modifier:
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
If the argument was actually of type short or unsigned short, then promotion to int followed by a conversion back to short or unsigned short will yield the same value as promotion to int without any conversion back. Thus, for arguments of type short or unsigned short, %d, %u, etc. should give identical results to %hd, %hu, etc. (and likewise for char types and hh).
As far as I can tell, the only situation where the h or hh modifier could possibly be useful is when the argument passed is an int outside the range of short or unsigned short, e.g.
printf("%hu", 0x10000);
but my understanding is that passing the wrong type like this results in undefined behavior anyway, so that you could not expect it to print 0.
One real world case I've seen is code like this:
char c = 0xf0;
printf("%hhx", c);
where the author expects it to print f0 despite the implementation having a plain char type that's signed (in which case, printf("%x", c) would print fffffff0 or similar). But is this expectation warranted?
(Note: What's going on is that the original type was char, which gets promoted to int and converted back to unsigned char instead of char, thus changing the value that gets printed. But does the standard specify this behavior, or is it an implementation detail that broken software might be relying on?)
-
R.. GitHub STOP HELPING ICE over 13 years
Where do you get the thing about negative numbers being printed in their bit forms? As far as I can tell, passing a negative value for any unsigned format specifier (%x, %u, or %o) results in undefined behavior. Also, as far as I can tell, a conformant implementation can simply ignore the presence of any h or hh modifier except with %n.
-
Adam Norberg over 13 years
Casts between (unsigned) and (signed) anything, within the same width, are guaranteed to make no actual changes to the bit pattern of the data, just the interpretation of that bit pattern. (Casts that change the width are zero-extended or sign-extended as appropriate.) %x is defined to work on unsigned values, so they are first cast from signed to unsigned, which changes no data but does change the interpretation; in effect, using %x with a negative number shows you its bit pattern. And %x is an integer type, and the h modifier works over integer types, so I think it's supported.
-
R.. GitHub STOP HELPING ICE over 13 years
Your information is blatantly incorrect. C defines conversions (implicit or cast) in terms of values, not bit patterns. Conversions to unsigned types are defined by the standard in a way that's equivalent to modular arithmetic. Conversions to signed types are implementation-defined except when the value fits in the destination type without modification.
-
Adam Norberg over 13 years
With regard to h on %x, quoting from linux.die.net/man/3/printf, in reference to length modifiers: "Here, 'integer conversion' stands for d, i, o, u, x, or X conversion." So %x and %X are, at least in Linux, officially included in the scope of what the h modifier can be formally attached to.
-
R.. GitHub STOP HELPING ICE over 13 years
Do you agree with my tentative assessment that a conformant implementation can ignore the h and hh modifiers?
-
R.. GitHub STOP HELPING ICE over 13 years
Of course %hx is valid. This is specified by the standard. But %hx requires an unsigned short argument, which gets promoted to a positive int, which (by the requirements of the standard) has the same representation as the corresponding unsigned int value. Thus, as far as I can tell, %x should work just as well.
-
Adam Norberg over 13 years
Actually, C is defined to perform negative-to-positive signed-to-unsigned conversions by adding UINT_MAX. You're quite correct that this just happens to do absolutely nothing to the number's bit pattern in a 2's complement computer. (Casts to a smaller unsigned type are implementation specific, but not to a same-size-or-larger type.) So my advice stands on, and only on, machines that use 2's complement for their integer arithmetic. Adjust your code if you're targeting one that doesn't.
-
R.. GitHub STOP HELPING ICE over 13 years
My question is about the C language, not about whatever implementation. And "by adding UINT_MAX" is wrong. You forgot the +1, among other details. Once you fix it, it becomes equivalent to modular arithmetic.
-
Adam Norberg over 13 years
What's important is when the conversion happens. The %hx has no meaning to the compiler; all the compiler cares about is the upconversion from short to signed int in the variadic parameter. So it may do a sign-extend that you don't want. Of course, this only applies if you passed a signed short and then tried to use it as though it were unsigned. Given how much abuse printf has been put to through the years, this is not an implausible case. %hx shouldn't do anything when used strictly legally, but it seems safe to say that strict legality is unlikely.
-
Michael Burr over 13 years
I'm not sure - I'm not certain that this would result in undefined behavior: printf("%hu", (unsigned int) 0x10000);. I can imagine arguments both ways - I'd prefer that it was well-defined, but could see that the wording "Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument" throws this into undefined territory, though the immediately following "(the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing)" throws it back.
-
Adam Norberg over 13 years
You're correct I dropped the +1; I'll fix it when I roll the update up to the answer. Anyway, I think we've pretty clearly worked out that the practical use of %hx is limited to when printf is illegally used (to represent a signed argument as unsigned, which is usually taken to be safe but is only safe on a 2's complement machine; the result is quite a lot of broken code in common libraries on non-2's complement machines), which makes it inherently implementation-specific. Rationally, there's no particular use for it when the conversion already happened in a very narrowly legal range.
-
R.. GitHub STOP HELPING ICE over 13 years
If unsigned short or unsigned char gets promoted to int, it's still positive, so C requires the representation to match the representation for unsigned. As far as I know, signedness mismatch is valid in variadic function arguments and arguments to functions without prototypes as long as the value is positive as a signed value. Certainly %x is intended to work with int arguments as long as they're positive...
-
caf over 13 years
@R.: For general variadic functions, you're right - but for the specific case of the printf family, the standard gives unsigned int as the type of the argument to %x, and later says "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined." - which I don't believe allows you to pass an int.
-
R.. GitHub STOP HELPING ICE over 13 years
Interesting. I suspect this is unintentional though. Perhaps I should look through the standard and see if there are any examples like printf("%x", 1); (which would need to be 1U instead of 1 by your reasoning).
-
R.. GitHub STOP HELPING ICE over 13 years
The point of my question was that unless you're passing the wrong types in ways that seem to result in undefined behavior anyway, the masking/conversion would be a no-op (in terms of value).
-
Steve Jessop over 13 years
Based on that text, I think it would be reasonable for an implementation to "convert to short int or unsigned short int" using optimized code that assumes that the value it's converting is indeed the result of promotion as the standard says it is. Said optimized code could conceivably do something nonsensical with an out-of-range value, so there is at least a plausible claim to be made by the implementation that it should be undefined behavior, and that the code has breached a requirement of the standard.
-
supercat about 9 years
@R..: I see nothing that would forbid an implementation from ignoring them. Even if they did nothing, however, including them in the spec would mean that a program which performed printf("%hx",1u); would have defined behavior; by contrast, without text specifying that "h" was a legal modifier such a program would be UB, would it not?
-
R.. GitHub STOP HELPING ICE over 6 years
Interesting. This looks like a workaround for a gcc bug though. For level 1 of the -Wformat-overflow warning, gcc documents that it considers "Numeric arguments that are known to be bounded to a subrange of their type" which is always the case for promoted chars. But level 2 doesn't describe this behavior...? gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
-
rafi wiener over 6 years
I work with libvma and we pushed this commit to compile with gcc7. I'm not sure what overflow level we used (I guess the default one).
-
M.M almost 4 years
What is the type of CEKey in this answer? The behaviour is undefined if it wasn't unsigned char *; or if it was, the hh is redundant.
-
12431234123412341234123 over 3 years
It is not UB, as long as the value is in the range of both int and unsigned int, because these values can be used interchangeably. They specifically mention function calls. See footnote 31 in the C99 standard or footnote 41 in C11, in 6.2.5 Types.
-
caf over 3 years
@12431234123412341234123: This is what the previous comments are discussing. That is correct for variadic function calls in general, but for the specific case of the printf functions there is specific overriding language (in C11 7.21.6.1 p9). To be sure this is quite a pedantic point, and as R. says above may not be intentional.