Warning: array subscript has type char

c codeblocks

79,522

Solution 1

Simply change

char j;

to

unsigned char j;

or to a plain int or unsigned int:

unsigned int j;
int j;

From GCC Warnings

-Wchar-subscripts Warn if an array subscript has type char. This is a common cause of error, as programmers often forget that this type is signed on some machines. This warning is enabled by -Wall.

The compiler doesn't want you to inadvertently specify a negative array index, hence the warning.
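A minimal sketch of that change, applied to a lookup similar to the one in the question. The function name letter_weight is hypothetical, and the code assumes that the letters 'a' to 'z' have consecutive codes (true for ASCII, not for EBCDIC):

```c
#include <stddef.h>

/* Hypothetical helper: weight 26 for 'a' down to 1 for 'z', 0 otherwise. */
int letter_weight(char letter)
{
    int num[26];
    size_t j;   /* the index is not of type char, so -Wchar-subscripts stays quiet */

    for (j = 0; j < 26; j++) {
        num[j] = 26 - (int)j;
    }
    if (letter < 'a' || letter > 'z') {
        return 0;   /* not a lowercase letter */
    }
    /* letter - 'a' has type int and is known to be in the range 0..25 here */
    return num[letter - 'a'];
}
```

The character is still stored in a char; only the variable used as the array index has a different type.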

Solution 2

This is a typical case where GCC uses overly bureaucratic and indirect wording in its diagnostics, which makes it difficult to understand the real issue behind this useful warning.

// Bad code example
int demo(char ch, int *data) {
    return data[ch];
}

The root problem is that the C programming language defines several data types for "characters":

char can hold a "character from the basic execution character set" (which includes at least A-Z, a-z, 0-9 and several punctuation characters).
unsigned char can hold values from at least the range 0 to 255.
signed char can hold values from at least the range -127 to 127.

The C standard defines that the type char behaves in the same way as either signed char or unsigned char. Which of these types is actually chosen depends on the compiler and the operating system and must be documented by them.
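Which choice a particular compiler made can be checked through <limits.h>: CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. A minimal sketch, in which the function name char_is_signed is made up for illustration:

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With GCC, the default can be overridden with the options -fsigned-char and -funsigned-char, so code should not rely on either answer.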

When an element of an array is accessed by the arr[index] expression, GCC calls the index a subscript. In most situations, this array index is a non-negative integer. This is common programming style, and languages like Java or Go raise a runtime error (an exception in Java, a panic in Go) if the array index is negative.

In C, out-of-bounds array indices are simply defined as invoking undefined behavior. The compiler cannot reject negative array indices in all cases since the following code is perfectly valid:

const char *hello = "hello, world";
const char *world = hello + 7;
char comma = world[-2];   // negative array index

There is one place in the C standard library that is difficult to use correctly, and that is the character classification functions from the header <ctype.h>, such as isspace. The expression isspace(ch) looks as if it would take a character as its argument:

isspace(' ');
isspace('!');
isspace('ä');

The first two cases are ok since the space and the exclamation mark come from the basic execution character set and are thus defined to be represented the same, no matter whether the compiler defines char as signed or as unsigned.

But the last case, the umlaut 'ä', is different. It typically lies outside the basic execution character set. In the character encoding ISO 8859-1, which was popular in the 1990s, the character 'ä' is represented like this:

unsigned char auml_unsigned = 'ä';   // == 228
signed   char auml_signed   = 'ä';   // == -28

Now imagine that the isspace function is implemented using an array:

static const int isspace_table[256] = {
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 1, 1, 1, 1, 0, 0,   // '\t', '\n', '\v', '\f', '\r' (assuming ASCII)
    // and so on
};
int isspace(int ch)
{
    return isspace_table[ch];
}

This implementation technique is typical.

Now return to the call isspace('ä'), and assume that the compiler has defined char to be signed char and that the encoding is ISO 8859-1. When the function is called, the value of the character is -28, and this value is converted to an int, preserving the value.

This results in the expression isspace_table[-28], which accesses the table outside the bounds of the array. This invokes undefined behavior.

It is exactly this scenario that is described by the compiler warning.
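For illustration only, here is a sketch of a table-based classifier that refuses out-of-range arguments instead of indexing with them. The names my_isspace and space_table are hypothetical, this is not how any particular C library implements isspace, and the code assumes ASCII and 8-bit bytes:

```c
#include <stdio.h>   /* for EOF */

/* Hypothetical lookup table: 1 for the six whitespace characters of the "C" locale. */
static const unsigned char space_table[256] = {
    ['\t'] = 1, ['\n'] = 1, ['\v'] = 1, ['\f'] = 1, ['\r'] = 1, [' '] = 1
};

/* Hypothetical defensive variant of a table-based isspace. */
int my_isspace(int ch)
{
    if (ch < 0 || ch > 255) {
        return 0;   /* EOF, or a negative value from a signed char */
    }
    return space_table[ch];
}
```

The real functions from <ctype.h> make no such promise: passing them a value that is neither EOF nor representable as unsigned char is undefined behavior, so the caller has to get the argument right.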

The correct way to call the functions from the <ctype.h> header is either:

// Correct example: reading bytes from a file
int ch;
while ((ch = getchar()) != EOF) {
    isspace(ch);
}
// Correct example: checking the bytes of a string
const char *str = "hello, Ümläute";
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((unsigned char) str[i]);
}
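The second pattern can be wrapped in a small helper so that the cast is written only once. A minimal sketch; the function name count_spaces is made up for illustration:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the whitespace bytes in a NUL-terminated string. */
size_t count_spaces(const char *str)
{
    size_t count = 0;

    for (size_t i = 0; str[i] != '\0'; i++) {
        /* The cast keeps bytes of 0x80 and above from becoming negative. */
        if (isspace((unsigned char)str[i])) {
            count++;
        }
    }
    return count;
}
```

For example, count_spaces("a b\tc\n") returns 3 in the default "C" locale.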

There are also several ways that look very similar but are wrong.

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace(str[i]);   // WRONG: the cast to unsigned char is missing
}
// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((int) str[i]);   // WRONG: the cast must be to unsigned char
}

The above examples convert the character value -28 directly to the int value -28, thereby leading to a negative array index.

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((unsigned int) str[i]);   // WRONG: the cast must be to unsigned char
}

This example converts the character value -28 directly to unsigned int. Assuming a platform with a 32-bit unsigned int, the value -28 is converted by repeatedly adding 2^32 until the value is in the range of unsigned int; a single addition suffices here. This results in the array index 4294967268 (2^32 - 28), which is much too large.
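The three conversions can be compared side by side. This is a sketch with hypothetical function names; each function returns the index that would reach a lookup table for a given character value, and the values in the comments assume 8-bit bytes:

```c
#include <limits.h>

/* Cast to int: the value is preserved, so -28 stays -28. */
int index_via_int(signed char ch)
{
    return (int)ch;
}

/* Cast to unsigned int: -28 becomes UINT_MAX - 27. */
unsigned int index_via_unsigned_int(signed char ch)
{
    return (unsigned int)ch;
}

/* Cast to unsigned char: -28 becomes 228, a valid index into a 256-entry table. */
int index_via_unsigned_char(signed char ch)
{
    return (unsigned char)ch;
}
```

Only the last variant yields a value in the range 0 to UCHAR_MAX, which is what the <ctype.h> functions accept besides EOF.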

Author by

Rasmi Ranjan Nayak

Always ready for programming. Favorite Programming Languages C, C++, Python, Java, Android

Updated on July 09, 2022

Comments

Rasmi Ranjan Nayak 6 months

When I compile this program, I get the warning "array subscript has type 'char'". Please help me find where it is going wrong. I am using the Code::Blocks IDE.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
void NoFive()
{
    long long int cal;
    char alpha[26];   /* 26 letters: the loop below writes indices 0..25 */
    char given[100] = "the quick brown fox jumped over the cow";
    int num[26];      /* one weight per letter */
    int i, k;
    char j;
    j = 'a';
    k = 26;
    cal = 1;
    for(i = 0; i <= 25; i++)
    {
        alpha[i] = j++;
        num[i] = k--;
      //  printf("%c = %d \n", alpha[i], num[i]);
    }
    for(i = 0; i <= (strlen(given) - 1); i++)
    {
        for(j = 0; j <= 25; j++)
        {
         if(given[i] == alpha[j]) // Warning: array subscript has type char
         {
            cal = cal * num [j]; // Warning: array subscript has type char
         }
         else
         {
         }
        }
    }
printf(" The value of cal is %I64u ", cal);
}
main()
{
NoFive();
}

alk over 10 years

Using an array index of type int does not lead to any warning, although it also would allow negative indexes ... @Pavan Manjunath
Pavan Manjunath over 10 years

@alk Ahh. It was a typo. I meant unsigned char rather than just unsigned. Anyway, I was just making the point about negative indexes. Nevertheless, I edited my post to make it clear to future visitors :)
supercat over 10 years

@alk: A couple of differences between int and char: (1) There aren't any compilers (at least none that wouldn't be considered even remotely "normal") where int might reasonably be expected to be unsigned; (2) Code which uses type char as an array subscript is more likely than code which uses type int, to assume that all character literals, or all the characters in string literals, represent positive values. I'm not certain if all characters in the "C character set" are required to be positive, but I know characters outside that set are not.
AlastairG about 9 years

I have got this warning with the following code: context->ptr[0] = (char)toupper(c); where "ptr" is of type "char *". Is the compiler thinking that 0 is a signed char and hence might be negative?
ad absurdum almost 3 years

"the type char is equivalent to either signed char or to unsigned char": char must behave as and have the same representation as either signed char or unsigned char, but char, signed char, and unsigned char are three distinct types in C. "negative array indices are simply defined as invoking undefined behavior.": array indexing with negative values is perfectly well-defined in C since arr[n] is equivalent to *(arr + n). One way this can lead to undefined behavior is if the pointer arithmetic leads to an out-of-bounds access.
Roland Illig over 2 years

Reported as a GCC bug
Roland Illig almost 2 years

Simply changing the type from char to int or unsigned int is wrong. Read any good manual about the <ctype.h> function to learn the details.
stefanct about 1 year

I think that blindly casting the parameters to unsigned char might introduce regressions if the remaining code relies on the respective ctype function to check for EOF. At least in some hypothetical cases, depending on the value resulting from the cast, EOF might erroneously be deemed a member of one of the classes by the ctype functions. You are elegantly avoiding this possibility in your examples, but I think it's a possible pitfall worth mentioning. Also, mentioning macro implementations of the ctype functions would make it clear why the compiler actually warns with this specific warning(?)
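The EOF pitfall from the last comment can be sketched as two separate helpers, one per kind of input. Both function names are hypothetical:

```c
#include <ctype.h>
#include <stdio.h>

/* For a char taken from a string or buffer: cast before classifying. */
int is_space_byte(char c)
{
    return isspace((unsigned char)c) != 0;
}

/* For an int returned by getchar() or fgetc(): pass it through unchanged,
   so that EOF stays EOF instead of turning into a byte value. */
int is_space_input(int ch)
{
    return isspace(ch) != 0;
}
```

Since EOF is negative, (unsigned char)EOF is always a different, non-negative value (255 where EOF is -1 and bytes have 8 bits), and a ctype function may well classify that byte as a member of some class.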