Confusion in unsigned keyword in C/C++

Question

Consider the following two C programs. In the first program, printing the unsigned variable with the %d specifier gives -12, although I expected 4294967284; that value is printed only with the %u specifier. In the second program, the output is 144, where I expected -112. Something about the unsigned keyword is not clear to me. Any help, friends?

#include <stdio.h>

int main(void)
{
    unsigned int i = -12;
    printf(" i = %d\n",i);
    printf(" i = %u\n",i);
    return 0;
}

I got the above program from this link: Assigning negative numbers to an unsigned int?

#include <stdio.h>

int main(void)
{
    unsigned char a=200, b=200, c;
    c = a+b;
    printf("result=%d\n",c);
    return 0;
}

score 12 · Accepted Answer · edited Jul 27 '14 at 09:06

Each printf format specifier requires an argument of some particular type. "%d" requires an argument of type int; "%u" requires an argument of type unsigned int. It is entirely your responsibility to pass arguments of the correct type.

unsigned int i = -12;

-12 is of type int. The initialization implicitly converts that value from int to unsigned int. The converted value (which is positive and very large) is stored in i. If int and unsigned int are 32 bits, the stored value will be 4294967284 (2³²-12).

printf(" i = %d\n",i);

i is of type unsigned int, but "%d" requires an int argument. The behavior is not defined by the C standard. Typically the value stored in i will be interpreted as if it had been stored in an int object. On most systems, the output will be i = -12 -- but you shouldn't depend on that.

printf(" i = %u\n",i);

This will correctly print the value of i (assuming the undefined behavior of the previous statement didn't mess things up).

For ordinary functions, assuming you call them correctly, arguments will often be implicitly converted to the declared type of the parameter, if such a conversion is available. For a variadic function like printf, which can take a varying number and type(s) of arguments, no such conversion can be done, because the compiler doesn't know what type is expected. Instead, arguments undergo the default argument promotions. An argument of a type narrower than int is promoted to int if int can hold all values of the type, or to unsigned int otherwise. An argument of type float is promoted to double (which is why "%f" works for both float and double arguments).

The rules are such that an argument of a narrow unsigned type will often (but not always) be promoted to (signed) int.

unsigned char a=200, b=200, c;

Assuming 8-bit bytes, a and b are set to 200.

c = a+b;

The sum 400 is too big to fit in an unsigned char. For unsigned arithmetic and conversion, out-of-range results are reduced to the range of the type. c is set to 144 (400 - 256).

printf("result=%d\n",c);

The value of c is promoted to int; even though the argument is of an unsigned type, int is big enough to hold all possible values of the type. The output is result=144.

Thanks for the correct answer and elaborate explanation. – user3401108 Jul 27 '14 at 10:57

score 6 · Answer 2 · answered Jul 27 '14 at 08:54

In the first program the behaviour is undefined. It's your responsibility to make sure that the format specifier matches the data type of the argument. The compiler emits code that assumes you got it right; at runtime it does not have to do any checks (and often, cannot do any checks even if it wanted to).

(For example, the library implementation of the printf function does not know what arguments you gave it; it only sees some bytes, and it has to assume those are the bytes for the type that you specified using %d.)

You appear to be trying to infer what unsigned means based on the output of a program with undefined behaviour. That won't work. Stick to well-defined programs (and preferably just read the definition of unsigned).

In a comment you say:

could give me any reference of unsigned keyword. Still concept is not getting cleared to me. Unsigned definition in C/C++ standard.

In the C99 standard read section 6.2.5, from part 6 onwards.

The definition of unsigned int is an integer type that can hold values from 0 up to a positive number UINT_MAX (which should be one less than a power of two), which must be at least 65535, and typically is 4294967295.

When you write unsigned int i = -12;, the compiler sees that -12 is outside of the range of permitted values for unsigned int, and it performs a conversion. The definition of that conversion is to add or subtract UINT_MAX+1 until the value is in range.

The second part of your question is unrelated to all this. There is no unsigned int in that program, only unsigned char.

In that program, 200 + 200 gives 400. As mentioned above, since this is out of range the compiler converts it by subtracting UCHAR_MAX+1 (i.e. 256) until it is in range. 400 - 256 = 144.

re "the compiler converts it by subtracting", that is effectively correct in this particular case, but still misleading. generally the compiler is not involved in enforcing modulo arithmetic. on some now extinct architectures (I do hope the Unisys is really extinct now) it would have to do so, but its job is limited to making sure it happens, usually by doing nothing, as required by the standard. — Cheers and hth. - Alf, Jul 27 '14 at 11:12
@Alf that's an excessively pedantic objection IMHO . (especially at the level of technicality that this question calls for) — M.M, Jul 27 '14 at 11:13
I hope you will come to see that an understanding of how things work is very important and work-saving for practitioners, and therefore important for us to communicate to beginners. — Cheers and hth. - Alf, Jul 27 '14 at 11:16

score -4 · Answer 3 · answered Jul 27 '14 at 08:12

The %d and %u specifiers of printf have capability of (or, are responsible for) typecasting the input integer into int and unsigned int, respectively.

In fact printf (in general, any variadic functions) and arithmetic operators can accept only three types of arguments (except for the format string): 4-byte int, 8-byte long long and double (warning: very inaccurate description!) Any integral arguments whose size is less than int are extended into int. Any float arguments are extended into double. These rules improve uniformity of the input parameters of printf and arithmetic operators.

Regarding your 2nd example: the following steps take place

1. The + operator requires (unsigned) char operands to be extended into (unsigned) int values (which are 4-bytes integers in your case, I assume.)
2. The resulting sum is 400 of a 4-bytes unsigned int.
3. Only the least significant 1 byte of the above sum can fit into unsigned char c, so c has the value of 400 % 256 == 144.
4. printf requires all the smaller integral arguments to be expanded into int, thus what printf receives is 400 of a 4-bytes int.
5. The %d specifier prints the above argument as "400".

Google for "default argument promotion" for more details.

nodakai

“The %d and %u specifiers of printf have capability of (or, are responsible for) typecasting the input […]”. IMO, that's quite misleading. The specifiers don't trigger any conversion, the compiler doesn't even look at them (yes, good compilers do so to generate warnings for wrong ones, but that's not my point) nor does it even have to know about the format string (it could be unknown at compile time). The only conversions taking place are the default argument promotions (just as you said after your first paragraph), independent of the format specifiers. – mafso Jul 27 '14 at 08:29
@mafso %d and %u are there to do their job (precisely speaking, for `printf` to do its job...) How do you describe their job? – nodakai Jul 27 '14 at 08:42
@nodakai their job is to retrieve data from the stack (or whatever), and interpret it as the representation of an `int`. If the caller did not supply an `int` then the behaviour is undefined (even if the representation they did supply turns out to be a valid representation for `int`). The printf function cannot perform any conversion because it has no idea what the data type is of the arguments that were supplied, it can only (at best) see the representation in memory of the arguments. – M.M Jul 27 '14 at 08:43
They describe the type of the input. They aren't responsible for anything, _you_ as the caller of `printf` are responsible for giving types according to them. According to the standard, the first example is UB (the types don't correspond to the format string). – mafso Jul 27 '14 at 08:44
"*warning: very inaccurate description!*" -- I agree. -1 – Keith Thompson Jul 27 '14 at 08:51
To those who downvoted my answer: I would be grateful if you folks could give an accurate and concise explanation (unlike mine) of what happens under the hood of `printf`. At least I *tried* to distinguish the notions of C `signed int`, C `unsigned int` and the 4-byte integer on the stack as understood by the CPU (from the CPU's point of view, data has no signedness; only opcodes do), though I'm not sure I was successful in doing so, given my (non)fluency in English. – nodakai Jul 27 '14 at 09:00
Moving onto the list of steps: IDK what you mean by "(`unsigned`) `int`". The `unsigned char` is promoted to `int`. (Not `unsigned int` -- unless we are on those rare beasts where `sizeof(int) == 1`). The resulting sum is the signed int `400`. This promotion is called the *usual arithmetic conversions*. The *default argument promotion* refers to step 4. – M.M Jul 27 '14 at 09:01
@nodakai: Why do you assume that `int` is 4 bytes? In `a + b`, the operands are promoted to `int`, not to `unsigned int`; the promotion rules are value-preserving, not signedness-preserving. `printf` does not "typecast" (I presume you mean "convert") its arguments based on its format string; it *assumes* that the (possibly promoted) arguments are already of the correct type. – Keith Thompson Jul 27 '14 at 09:05
@MattMcNabb @KeithThompson point taken, thanks so much! I can never correctly remember all the promotion rules of C when I'm not in front of my dev PC. But I don't think that totally invalidates my explanation above. Btw I assumed `int` is 4 bytes because it is almost always so on machines on which `printf` can run (I mean, if the OP is going to work with esoteric embedded CPUs, he will not be able to access `printf`, after all). – nodakai Jul 27 '14 at 09:18
@nodakai the main problem in your post is the claim about printf "having capability or being responsible for typecasting". It is solely the programmer who has the capability and is responsible for that. `printf` just sees some bytes. – M.M Jul 27 '14 at 09:20
@MattMcNabb I'm sure you can agree that `printf` has to pop a 4-byte integer from the stack and somehow decide to render it as a C `signed int` or C `unsigned int`. The decision is made according to whether the format specifier is %d or %u (forget about %f etc. for now). So you think it was too much abuse of the word "typecast" to refer to this step? The decision has to be made independently of whether the 4-byte integer on the stack originates from C `signed int` or C `unsigned int`. (Whoops, my browser didn't reload correctly.) – nodakai Jul 27 '14 at 09:33
@nodakai it might have originated from neither of those. A typecast is the construct `(typename) value`. Even if you meant "converting" rather than "typecast", that is still wrong. It looks somewhere, grabs bytes, and assumes *they are already* the bytes for whatever the format specifier says. No conversion applies. Also, there might not even be a stack. Perhaps the system uses one register for passing signed ints, and a different register for unsigned ints. Then it will not even get any of the bytes that were passed. – M.M Jul 27 '14 at 10:45
The size of `int` is `sizeof (int)` bytes; there's no good reason to assume that `sizeof (int) == 4`. A "cast" is an explicit conversion, which clearly is not happening here. The word ["typecast"](http://en.wikipedia.org/wiki/Typecasting_(acting)) is not used by the C standard. There is no conversion; the behavior is undefined, and it typically behaves as I've already described. You're assuming that arguments are popped from the "stack"; the C standard doesn't use that word, and not all C implementations use a stack. (Passing arguments in registers is common.) – Keith Thompson Jul 27 '14 at 18:46
If you call `printf` with an argument that's inconsistent with the format string, then `printf` has no responsibility at all. – Keith Thompson Jul 27 '14 at 18:46
@MattMcNabb The implementation's choice must tolerate passing a `signed int` in `...` and reading it out with `va_arg` using `unsigned int`, or vice versa, if the value is representable in both types (this is allowed by `va_arg`'s specification). However, the `*printf` family of functions has a special rule that says the type must exactly match. – T.C. Jul 27 '14 at 19:31
@TC yeah, so the printf family can have a different implementation to user-defined variadic functions, e.g. perform some optimization for simple calls that don't go via a v*printf function – M.M Jul 27 '14 at 21:18
