What if a null character is present in the middle of a string?

Solution 1

The sizeof operator does not give you the length of a string but instead the size of the type of its operand. Since in your code the operand is an array, sizeof gives you the size of the whole array, including both null characters.

If it were like this

const char *string = "This is a large text\0This is another string";
printf("%zu %zu\n", strlen(string), sizeof(string));

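the result would be very different because string is a pointer and not an array: sizeof(string) is then the size of the pointer itself, while strlen(string) still stops at the first null character and returns 20.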

Note: Use the "%zu" specifier for size_t, which is the type that strlen() returns and the type of the value that sizeof yields.

Solution 2

strlen() doesn't care about the actual size of the string. It looks for a null byte and stops when it sees the first null byte.

But the sizeof operator knows the total size. It doesn't care what bytes are in the string literal. You might as well have all null bytes in the string and sizeof would still give the correct size of the array (strlen() would return 0 in that case).

They are not comparable; they do different things.

Solution 3

strlen() computes the length of the string. It does this by returning the number of characters before (and not including) the first '\0' character. (See the manual page below.)

sizeof returns the number of bytes of the given variable (or data type). Note that your example "Hello\0Hi" has 9 characters, and your question suggests it is not clear where the ninth one comes from. Let me explain the given string first. Your example string is:

"Hello\0Hi"

This can be written as the following array:

['H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0']

Note the last '\0' character. When you use string quotes, the compiler ends the string with a '\0' character. This means "" is also ['\0'] and thus has 1 element.

BEWARE that sizeof does NOT return the number of elements in the array. It returns the number of bytes. A char is 1 byte, and therefore sizeof happens to return the number of elements here. With any other data type the two differ: calling sizeof on an int array holding [1, 2, 3, 4] returns 16 on a platform where int is 4 bytes, since the array has 4 elements.

BEWARE that passing an array as a parameter only passes a pointer. If you pass s to another function and call sizeof on the parameter there, it returns the size of the pointer, which is the same as sizeof(void *). This is a fixed size, independent of the array.

STRLEN(3)                BSD Library Functions Manual                STRLEN(3)

NAME
     strlen, strnlen -- find length of string

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <string.h>

     size_t
     strlen(const char *s);

     size_t
     strnlen(const char *s, size_t maxlen);

DESCRIPTION
     The strlen() function computes the length of the string s.  The strnlen()
     function attempts to compute the length of s, but never scans beyond the
     first maxlen bytes of s.

RETURN VALUES
     The strlen() function returns the number of characters that precede the
     terminating NUL character.  The strnlen() function returns either the
     same result as strlen() or maxlen, whichever is smaller.

SEE ALSO
     string(3), wcslen(3), wcswidth(3)

STANDARDS
     The strlen() function conforms to ISO/IEC 9899:1990 (``ISO C90'').
     The strnlen() function conforms to IEEE Std 1003.1-2008 (``POSIX.1'').

BSD                            February 28, 2009                           BSD

Solution 4

If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing?

strlen only works for strings (null-terminated character arrays), whereas sizeof works for every data type. sizeof yields the exact amount of memory occupied by any given object or type, whereas strlen gives the length of a string (NOT including the null terminator \0). So in normal cases, this is true for a typical character array s:

char s[] = "Hello";
strlen( s ) + 1 == sizeof( s ); // true: +1 for the \0

In your case it's different because you have a null character in the middle of the character array s:

char s[] = "Hello\0Hi";

Here, strlen detects the first \0 and gives the length as 5. sizeof, however, yields the total number of bytes needed to hold the whole character array, including both \0 characters, which is why it gives 9 as the second output.

Solution 5

As the name itself implies, a string literal is a sequence of characters enclosed in double quotes. A terminating zero is implicitly appended to this sequence of characters.

So any character enclosed in the double quotes is a part of the string literal.

When a string literal is used to initialize a character array, all its characters, including the terminating zero, serve as initializers of the corresponding elements of the character array.

Each string literal in turn has the type of a character array.

For example, the string literal "Hello\0Hi" in C has type char[9]: 8 characters enclosed in the quotes plus the implicit terminating zero.

So in memory this string literal is stored like

{ 'H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0' }

The sizeof operator returns the number of bytes occupied by an object. So for the string literal above, sizeof returns the value 9, which is the number of bytes occupied by the literal in memory.

If you write "Hello\0Hi", the compiler may not simply drop the part Hi from the literal. It has to store it in memory along with the other characters of the literal enclosed in quotes.

The sizeof operator returns the size in bytes of any object in C, not only of character arrays.

In general, character arrays can store any raw data, for example binary data read from a binary file. In that case neither the user nor the program treats the data as strings, and as a result it is processed differently from strings.

The standard C function strlen is written specifically to find the length of a string stored in a character array. It does not know what data is stored in the array or how it was written there. All it does is search for the first zero character in the character array and return the number of characters that precede it.

You can store several strings sequentially in one character array. For example

char s[12];

strcpy( s, "Hello" );
strcpy( s + sizeof( "Hello" ), "World" );

puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"

If you define a two-dimensional array like this

char t[2][6] = { "Hello", "World" };

then in memory it will be stored the same way as the one-dimensional array above. So you can write

char *s = ( char * )t;

puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"

Another example: the standard C function strtok can split one string stored in a character array into several strings by replacing the delimiters specified by the user with zero bytes. As a result, the character array will contain several strings.

For example

char s[] = "Hello World";

printf( "%zu\n", sizeof( s ) ); // outputs 12

strtok( s, " " );

puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"

printf( "%zu\n", sizeof( s ) ); // outputs 12

The last printf statement outputs the same value, 12, because the array still occupies the same number of bytes. Only one byte in the memory allocated for the array was changed, from ' ' to '\0'.

Author: Ranjan Srinivas

Updated on June 04, 2022

Comments

  • Ranjan Srinivas almost 2 years

    I understand that the end of a string is indicated by a null character, but I cannot understand the output of the following code.

    #include <stdio.h>
    #include <string.h>
    
    int
    main(void)
    {
        char s[] = "Hello\0Hi";
        printf("%d %d", strlen(s), sizeof(s));
    }
    

    OUTPUT: 5 9

    If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing? Even if it doesn't do the same thing, isn't '\0' a null character (i.e., only one character), so shouldn't the answer be 8?