How to turn a character array into uint8_t


Solution 1

If you're on an architecture where uint8_t is a typedef to unsigned char (you most likely are), then simply take the first char and cast it to uint8_t:

length = (uint8_t)(message_received[0]);

It should work.
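
As a minimal sketch of that cast, assuming the length byte has already been received into the first element of the buffer. The helper name `length_from_first_byte` is made up for illustration and is not part of the question's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interprets the first byte of a received buffer
 * as an unsigned 8-bit length. The cast to uint8_t is well defined
 * whether plain char is signed or unsigned, because conversion to an
 * unsigned type reduces the value modulo 256. */
static uint8_t length_from_first_byte(const char *buf)
{
    assert(buf != NULL);
    return (uint8_t)buf[0];
}
```

With the question's variables this would be called as `length = length_from_first_byte(message_received);`.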

Solution 2

A char is exactly one byte (per definition of the C standard). Unless bytes aren't exactly 8 bits on your system (such systems exist, but I bet you've never used or even seen one), uint8_t and char are exactly the same data type.

char c = 5;
uint8_t u = c;

And if you can do something like that with a data type, then you can just cast pointers as you wish between these two data types:

 char c[] = { 'H', 'e', 'l', 'l', 'o' };
 uint8_t * u = (uint8_t *)c;
 uint8_t x = u[1];
 // x is 101, which is the ASCII char code of 'e'

Actually you can even do that with strings, as a string is also just an array of characters, just one that is NUL terminated.

 const char * c = "Hello";
 // "Hello" is in fact just { 'H', 'e', 'l', 'l', 'o', '\0' }
 // (string literals must not be modified, hence const)
 const uint8_t * u = (const uint8_t *)c;
 uint8_t x = u[1];
 // x is 101, which is the ASCII char code of 'e'

The only thing you need to be careful about is that the C standard does not define whether char is signed or unsigned. Unlike the other integer types, which are signed by default and only unsigned if you request it (long vs unsigned long, for example), a plain char may be either. So if you need one or the other, you must use signed char or unsigned char as the data type. In practice that plays no role unless you perform certain math or logic operations on char values (which you probably shouldn't do in modern C code to begin with).
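
To make the signedness point concrete, here is a small sketch, assuming a two's complement platform with 8-bit bytes. The function names are invented for this example. It stores the byte 200 in a plain char and then looks at it both through a cast to uint8_t and as a plain int:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Stores a raw byte in a plain char, then converts it to uint8_t.
 * The cast reduces the value modulo 256, so on a two's complement
 * platform the original byte comes back whether char is signed or not. */
static uint8_t byte_via_char(unsigned char raw)
{
    char c;
    memcpy(&c, &raw, 1);   /* copy the bit pattern, no value conversion */
    return (uint8_t)c;
}

/* Same byte widened to int without a cast: this is where the signedness
 * of plain char becomes visible. */
static int byte_as_int(unsigned char raw)
{
    char c;
    memcpy(&c, &raw, 1);
    return c;
}

/* Hypothetical self-check showing both behaviours for the byte 200. */
static void signedness_demo(void)
{
    assert(byte_via_char(200) == 200);
    if (CHAR_MIN < 0) {
        assert(byte_as_int(200) < 0);     /* plain char is signed */
    } else {
        assert(byte_as_int(200) == 200);  /* plain char is unsigned */
    }
}
```

So a length byte above 127 survives the trip through a char buffer as long as it is cast to uint8_t before being used as a number.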

And since there is no way that your message can be bigger than 255 characters (as otherwise the length would not fit into uint8_t) and the length is always exactly one byte, I'd write the code as follows:

uint8_t messageLength = 0;
ssize_t bytesRead = recv(clients_sd, &messageLength, 1, 0);
if (bytesRead == -1) {
     // Handle read error
}
if (bytesRead == 0) {
     // Handle end of stream
}

char message[256];
bytesRead = recv(clients_sd, message, messageLength, 0);
if (bytesRead == -1) {
     // Handle read error
}
if (bytesRead == 0) {
     // Handle end of stream
}
if (bytesRead < messageLength) {
     // Handle truncated message (message too small)
}
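
On a stream socket, recv may legitimately return fewer bytes than requested even when nothing is wrong, so the "truncated message" case is often handled by reading in a loop. The following is only a sketch under that assumption; `recv_exact` and `recv_message` are hypothetical helper names, and retrying after EINTR is left out for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keeps calling recv() until exactly `len` bytes
 * have arrived. Returns 0 on success, -1 on a read error or if the
 * peer closed the connection early. */
static int recv_exact(int sd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = recv(sd, p, len, 0);
        if (n <= 0) {
            return -1;          /* -1: read error, 0: end of stream */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: reads one length-prefixed message into `message`,
 * which must hold at least 256 bytes (255 payload bytes plus a NUL).
 * Returns the payload length, or -1 on failure. */
static int recv_message(int sd, char message[256])
{
    uint8_t messageLength = 0;
    assert(message != NULL);
    if (recv_exact(sd, &messageLength, 1) != 0) {
        return -1;
    }
    if (recv_exact(sd, message, messageLength) != 0) {
        return -1;
    }
    message[messageLength] = '\0';  /* make the payload usable as a string */
    return messageLength;
}
```

A zero-length message is treated as valid here: the second call then has nothing to read and returns immediately.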

And once more, as apparently some people fail to understand my second sentence already: everything I wrote above is under the assumption that bytes on your system are 8 bits long. This code is not portable to platforms where bytes have more or less than 8 bits. That's already what my second sentence clearly points out, but it doesn't make my reply wrong. If it wasn't allowed to write C code that is not portable to all existing platforms on earth, 90% of all existing C code would be forbidden. You know the platform you are working with and you know the platforms you are targeting with your app, so it's your responsibility to make sure the code is correct for all these platforms.
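
One way to write that assumption down so the compiler enforces it is a compile-time check. This is a sketch assuming a C11 or later compiler; the message strings are arbitrary:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint8_t, UINT8_MAX */

/* Compilation fails on a platform whose bytes are not 8 bits wide. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* uint8_t is optional; UINT8_MAX is only defined when the type exists. */
#ifndef UINT8_MAX
#error "uint8_t is not available on this platform"
#endif
```

Placed at the top of the source file, this turns a silent portability problem into a build error on any platform that does not meet the assumption.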

Author by

fatalError

Trying to become a better developer

Updated on July 09, 2022

Comments

  • fatalError
    fatalError almost 2 years

    I have a server-client application that I'm working on that basically simulates a chat room. This is an assignment for school and the protocol specifications are somewhat strict.

    I have a char array which will store all messages from a client.

    The client must first send the length of the message as a uint8_t and then the message itself as a char array.

    My problem is I need to store the uint8_t value that is sent before the actual message is sent but I can only use the message array to store any information coming from the client.

    If I'm not mistaken the char array will not store the uint8_t that gets sent over unless I cast it somehow.

    How can I convert the uint8_t to characters and back to uint8_t?

    I've tried looking for a similar problem on here but couldn't find an example.

    server.c

    char msg[100];
    recv(clients_sd, msg, sizeof(msg), 0);
    uint8_t len; /* store the length of the message here */
    char message_received[len];
    recv(clients_sd, message_received, sizeof(message_received), 0); /* get and store the message here */
    

    client.c

    uint8_t length = 21;
    char clients_message[] = "Hi how are you today?";
    send(servers_sd, &length, sizeof(length), 0);
    send(servers_sd, &clients_message, sizeof(clients_message), 0);
    
  • fatalError
    fatalError over 7 years
    I'll give that a shot. Thanks!
  • 2501
    2501 over 7 years
    uint8_t isn't guaranteed to be a synonym for a character type. Thus using it to alias all types, as character types may be, is not portable.
  • Mecki
    Mecki over 7 years
    @2501 Please explain which part of "Unless bytes aren't exactly 8 bits on your system (such systems exist, but I bet you've never used or even seen one), uint8_t and char are exactly the same data type." exactly didn't you understand? You do know the meaning of the word unless? Please stop putting words in my mouth that I have never said. And please also refrain from commenting answers that you have clearly not read.
  • Mecki
    Mecki over 7 years
    @2501 A char is always exactly a byte, the C standard demands that! And if a byte has 8 bits on your system (and I explicitly warned that this doesn't have to be the case!!!), then uint8_t is also a byte. Please explain to the world, how a byte cannot be a byte, as the C standard also demands that if a native 8 bit type exists, uint8_t must be that type and if a char is 8 bits, then such a native type must exist. So your comment doesn't even make sense.
  • 2501
    2501 over 7 years
    I'm not putting any "words in your mouth". Let's put that aside as it isn't relevant. Let me quote you: "uint8_t and char are exactly the same data type." And I have responded to this: "uint8_t isn't guaranteed to be a synonym for a character type." Thus they are not the same type. The standard allows that uint8_t is not a synonym for a character type, regardless of how many bits per byte there are. This is possible because of extended integer types. Please see 7.20 §4 of the current standard. (Also please refrain from ad hominem if you wish to have a debate with me.)
  • 2501
    2501 over 7 years
    The comment I made should have included your assumption of 8 bits per byte. This wasn't the issue with my argument, as the conclusion is the same. The C standard allows uintN_t types to be defined as extended integer types which are distinct from character types (7.20 §4 of the current standard). This is true regardless of how many bits per byte there are. Therefore an implementation may define 8 bits per byte and uint8_t as an extended integer type, which is a distinct type.
  • 2501
    2501 over 7 years
    I'm not harassing you. That accusation is blatantly false. Your interpretation of my intent is incorrect. I don't care who posts content, my only intent is to guarantee correct information on this site. Posting constructive comments, aimed exclusively at the content of the posts and not the user, is one of those ways. My comment history with you and the fact that I criticized many other posts proves this. You by freely publishing content on this site, understanding that it will be read by the public, open that content to scrutiny.
  • Mecki
    Mecki over 7 years
    @2501 "Therefore an implementation may define 8 bits per byte and uint8_t as an extended integer type, which is a distinct type." A distinct C type, I never said it isn't, but that's irrelevant, as it is the same type in memory. ISO C standard, 7.20.1.1: "The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits." Which part of "exactly 8 bits" and "no padding" didn't you understand? If a system has [u]int8_t as a type (optional!), it is exactly a byte.
  • 2501
    2501 over 7 years
    Yes, its size is exactly a byte. But the type also matters, especially with pointer conversions, exactly those showcased in your answer. It can be defined as a different type than char, signed char, or unsigned char. This is important because of strict aliasing, violating which causes undefined behavior.
  • Mecki
    Mecki over 7 years
    @2501 In that case it does not matter. uint8_t is 8 bit, exactly, with no padding (7.20.1.1 P1). If a platform cannot provide that, it must not offer uint8_t as data type at all (7.20.1.1 P3). So I can rely on uint8_t[10] being 80 contiguous bits in memory, where every 8 of them make up an int value. A char is always exactly one byte (3.6 to 3.7.1); a byte doesn't have to be 8 bits, but if bytes are 8 bits, then a char must be 8 bits. Casting 8 bits to 8 bits is safe, casting a mem ptr to an 8 bit array to a mem ptr to an 8 bit array is safe, if both are unpadded (guaranteed by standard).
  • 2501
    2501 over 7 years
    I agree with that, except for: "Casting 8 bits to 8 bits is safe." This is not correct in the sense that the pointer is then used to access the object (type punning). There are exceptions for character types, but not for extended integer types. The current C standard has aliasing rules, which must be followed, otherwise you get UB. You can find material on this if you search for the keyword "strict aliasing" and under heading 6.5 of the current standard. Research this, you will be surprised. Bye.
  • Mecki
    Mecki over 7 years
    @2501 Casting 8 bits to 8 bits here is still safe, also under everything that 6.5 says, as like I already pointed out: Both data types are guaranteed to have no padding and both data types are guaranteed to have the same in-memory alignment (as long as bytes are 8 bit; otherwise both might be untrue). Char may have a different bit encoding, so char c = 'e'; uint8_t x = *(uint8_t *)&c; does not strictly guarantee that x is 101, it could also be an entirely different number, but that doesn't make the cast wrong. Doing that cast on a 7- or 9-bit byte platform would be very wrong, though.
  • 2501
    2501 over 7 years
    The aliasing rules are under 6.5 §7 and there are no exceptions for uintN_t types. A character type may always alias any type. Under the assumption that uint8_t is not a synonym for a character type (which is permitted by the C standard), the type uint8_t may not alias a character type. Doing so will cause UB. You should really research strict aliasing.
  • Mecki
    Mecki over 7 years
    @2501 Sorry, but what you write is totally unrelated to my answer and totally unrelated to my comments. Of course a char pointer can alias any type, as a char pointer is a byte pointer and any data type is made out of bytes and bytes are the smallest data entity in C. But which part of "a type compatible with the effective type of the object" didn't you understand? On a system with 8-bit bytes, uint8_t IS such a compatible type. Please refrain from quoting the standard, as apparently you are not very good at reading it and I'm sick of always having to prove you wrong.
  • 2501
    2501 over 7 years
    I understand it completely. However, you're ignoring my assumption, which says: there are 8 bits per byte and uint8_t is defined as an extended integer type. If you didn't know, even though I have mentioned it twice already, any uintN_t type may be defined as an extended integer type. Yes, an implementation may choose this for performance reasons, even with 8 bits per byte. This is permitted under rule 7.20 §4 of the current standard. Extended integer types are distinct types from character types. In other words, uint8_t is not the same type as any character type.
  • 2501
    2501 over 7 years
    Two types may alias only if their types are compatible (6.5 §7). Quote from 6.2.7 §1: "Two types have compatible type if their types are the same." I have established that uint8_t and any character type are not the same type. There are also additional rules under 6.5 §7, but none allow the type uint8_t to alias any character type. Please note that a character type may of course alias any type. But not vice versa. This is relevant because your example in the answer presents a case where the type uint8_t aliases a type char.
  • 2501
    2501 over 7 years
    Under my assumptions this causes undefined behavior when the type uint8_t is used to access a char type. Undefined behavior happens because behavior which would define access of a char type through a uint8_t type, also called aliasing, is not defined.
  • 2501
    2501 over 7 years
    I'm certain that any reader approaching this debate with an open mind will understand or at least try to understand my conclusion and learn at the same time. I'm ending this debate since I have established my position sufficiently, and I don't see that any more progress can be made. Let me remind you again to not use ad hominem attacks if you wish to have a debate with me in the future. Have a nice day.
  • Mecki
    Mecki over 7 years
    @2501 Everything you say is correct for [u]int_least8_t and similar types, but not for [u]intX_t. [u]intX_t must not have padding (the standard is very clear about that, no doubt possible) and thus are by definition never bigger than what they claim to be (not for performance reasons, not for system limitations, not because the moon turned green). Anything else you wrote is totally irrelevant. SO is all about being helpful, which you are not at all to anyone here. You argue about how green the grass is and that based on a standard which you clearly don't understand.
  • Mecki
    Mecki over 7 years
    @2501 On a 8-bit byte system, uint8_t can alias char because it is a "compatible type". I never claimed this works on all systems. And no, the C standard doesn't say that anywhere because it doesn't have to. The C standard says any compatible type may alias any concrete type and uint8_t is a compatible type on a 8-bit byte system (not in general, only on that system and as that standard doesn't know or care for your system, it won't point out these system specific facts, but that doesn't make them wrong). Saying "compatible type" is all the standard has to say here.
  • Mecki
    Mecki over 7 years
    @2501 Two data types which have the same bit width, the same alignment and the same padding are of course compatible and can always alias each other. Anything else would be like questioning the existence of gravity. The size/alignment/padding of uint8_t is clear and the same is true for char. This triple doesn't have to be the same for both, definitely not, but if bytes are 8 bits on your system, then yes, then it must be the same, there is no doubt, even if you think there is.
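
For readers who would rather not depend on the outcome of the aliasing debate above, the following sketch shows two things, assuming a C11 or later compiler: a compile-time probe that reports whether uint8_t is the same type as unsigned char on the implementation at hand, and a memcpy-based copy that never accesses a char object through a uint8_t lvalue. The names `UINT8_IS_UNSIGNED_CHAR` and `chars_to_bytes` are invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 if uint8_t is the same type as unsigned char on this implementation,
 * 0 if it is some other (for example an extended) integer type. */
#define UINT8_IS_UNSIGNED_CHAR \
    _Generic((uint8_t)0, unsigned char: 1, default: 0)

/* Copies `n` bytes from a char buffer into a uint8_t buffer. memcpy works
 * on the object representation, so no char object is read through a
 * uint8_t lvalue and the aliasing question does not arise. */
static void chars_to_bytes(uint8_t *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}
```

Where the probe evaluates to 1, uint8_t is itself a character type and the pointer cast from the answer falls under the character-type exception in 6.5 §7; where it evaluates to 0, copying is the conservative choice.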