C++ vector size types


Solution 1

C++ is a language for library writing*, and allowing the author to be as general as possible is one of its key strengths. Rather than requiring the standard containers to use one particular integer type for their sizes, the more general approach is to require that each container expose a size_type member type. This allows for greater flexibility and genericity. For example, consider this generic code:

template <template <typename...> class Container, typename T>
void doStuff(const Container<T> & c)
{
  typename Container<T>::size_type n = c.size();
  // ...
}

This code will work on any container template (that can be instantiated with a single argument), and we don't impose any unnecessary restrictions on the user of our code.

(In practice, most size types will resolve to std::size_t, which in turn is an unsigned type, commonly unsigned int, unsigned long, or unsigned long long depending on the platform -- but why should we have to know that?)
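A minimal sketch of the same idea, assuming a simpler signature than the one above: the helper name elementCount is hypothetical, and it takes the whole container type as its template parameter so that no template template parameter is needed. The static_assert lines check only what the standard guarantees, namely that size_type is an unsigned type.

```cpp
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: reports the element count without naming a
// concrete integer type; the container supplies it via size_type.
template <typename Container>
typename Container::size_type elementCount(const Container& c)
{
    typename Container::size_type n = c.size();
    return n;
}

// Guaranteed by the standard: a container's size_type is unsigned.
static_assert(std::is_unsigned<std::vector<double>::size_type>::value,
              "vector size_type must be unsigned");
static_assert(std::is_unsigned<std::list<std::string>::size_type>::value,
              "list size_type must be unsigned");
```

Calling elementCount on a std::vector<double> or on a std::list<std::string> compiles unchanged, and the caller never has to spell out which unsigned type is involved.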

*) I'm not sure what the corresponding statement for Java would be.

Solution 2

Java does not have unsigned integer types, so they have to go with int.

In contrast, C++ has unsigned types and uses them where negative values would be nonsensical, the canonical example being the length of something like an array.

Solution 3

The C++ standard says that a container's size_type is an unsigned integral type, but it doesn't specify which one; one implementation might use unsigned int and another might use unsigned long, for example.

C++ isn't "shielded" from platform-specific implementation details as much as Java is. The size_type alias helps to shield your code from such details, so it'll work properly regardless of what actual type should be used to represent a vector's size.
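As a rough illustration of that shielding, the sketch below indexes a vector with its own size_type, so the loop counter always has the width the implementation chose. The function name sumByIndex is hypothetical and not part of any library.

```cpp
#include <vector>

// Hypothetical helper: sums the elements by index. The index type comes
// from the container, so it matches whatever unsigned type the
// implementation uses for sizes on this platform.
double sumByIndex(const std::vector<double>& values)
{
    double total = 0.0;
    for (std::vector<double>::size_type i = 0; i != values.size(); ++i) {
        total += values[i];
    }
    return total;
}
```

Had the counter been declared as unsigned int, the loop could stop matching the range of size() on a platform where size_type is wider.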

Solution 4

The book you’re reading states that if you want to extract the size of a vector of type double (for example), you should do something like:

    vector<double>::size_type vector_size;
    vector_size = myVector.size();

Whereas in Java you might do

    int vector_size;
    vector_size = myVector.size();

Both are inferior options in C++. The first is extremely verbose and unsafe (mostly due to the implicit conversions that occur when an unsigned value is mixed with signed ones). The second is verbose and extremely unsafe (int may be too narrow to hold the size).

In C++, do

    ptrdiff_t const vectorSize = myVector.size();

Note that

  • ptrdiff_t, from the stddef.h header, is a signed type that is guaranteed large enough.

  • Initialization is done in the declaration (this is better C++ style).

  • The same naming convention has been applied to both variables.

In summary, doing the right thing is shorter and safer.
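A sketch of how the signed approach might be applied defensively. Whether ptrdiff_t can always hold the size is disputed in the comments below, so this hypothetical helper, checkedSignedSize, checks the range before converting instead of relying on the guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns v.size() as a signed std::ptrdiff_t and
// throws if the value does not fit in that type.
template <typename T>
std::ptrdiff_t checkedSignedSize(const std::vector<T>& v)
{
    const auto n = v.size();
    // Compare in the widest unsigned type so neither operand is truncated.
    const auto limit = static_cast<std::uintmax_t>(
        std::numeric_limits<std::ptrdiff_t>::max());
    if (static_cast<std::uintmax_t>(n) > limit) {
        throw std::overflow_error("size does not fit in std::ptrdiff_t");
    }
    return static_cast<std::ptrdiff_t>(n);
}
```

With a signed result, an expression such as checkedSignedSize(v) - 5 on a three-element vector gives -2 rather than wrapping around to a huge positive value.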

Cheers & hth.,

Solution 5

My personal feeling is that this is for better code safety and readability.

For me int is a type which conveys no special meaning: it can number apples, bananas, or anything.

size_type, which is probably a typedef for size_t, has a stronger meaning: it indicates a size (for a container, the number of elements).

That makes it easier to know what a variable means. Of course, following this rationale, there could be many different types for different units. But a "buffer size" is such a common case that it deserves a dedicated type.

Another aspect is code maintainability: if the container's size_type changes, say from uint64_t to unsigned int, code that uses size_type does not have to be changed in every source file that relies on it.

Author by

Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years

    I just started learning C++ and have a question about vectors. The book I'm reading states that if I want to extract the size of a vector of type double (for example), I should do something like:

    vector<double>::size_type vector_size;
    vector_size = myVector.size();
    

    Whereas in Java I might do

    int vector_size;
    vector_size = myVector.size();
    

    My question is, why is there a type named vector::size_type? Why doesn't C++ just use int?

  • xanatos
    xanatos over 12 years
    @Kerrek Unless you are on a 64 bit system where std::size_t is probably a 64 bit type.
  • Kerrek SB
    Kerrek SB over 12 years
    @xanatos: that's true. Maybe we should dig up some of the common typedefs, out of curiosity.
  • Admin
    Admin over 12 years
    I'm not sure I understand why the first method is unsafe. What do you mean by implicit promotions?
  • James Kanze
    James Kanze over 12 years
    The reason why the standard uses size_t, rather than int, is because on some smaller machines, int wouldn't have been large enough (and going to long would have incurred a performance penalty). The use of an unsigned type here is generally recognized as a flaw, since unsigned types in C and C++ work in somewhat funny ways. (Basically, an expression like abs(i1 - i2) should correctly give the distance between two index types. It doesn't if i1 or i2 are unsigned.)
  • James Kanze
    James Kanze over 12 years
    Unless they've fixed it in C++11, ptrdiff_t is not guaranteed to be large enough. It's possible to have objects the size of which cannot be represented in a ptrdiff_t.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 12 years
    @James: there's not room enough here to discuss how an array of char might be too large in a 16-bit address space. the answer for a 3 GiB array of char in 32-bit address space is that it's not supported by the standard (ptrdiff_t is required to be large enough), so deal specially with it if you must have that beast. Personally, I have never encountered the 3 GiB array of char. :-)
  • James Kanze
    James Kanze over 12 years
    @AlfP.Steinbach I've never needed a 3 GiB array of char either, but I've used 32 bit implementations which supported them (Sun CC under Solaris, g++ under Linux on a 32 bit PC). The C standard definitely allows pointer subtraction to overflow (resulting in undefined behavior), and as far as I know, the C++ standard just passes the buck on to the C standard here.
  • Steve Jessop
    Steve Jessop over 12 years
    @James: there seems to me to be a dispute here between 5.7 and 18.2 in C++11. 5.7 says that overflow is possible, 18.2 says that ptrdiff_t "can hold the difference of two subscripts in an array object, as described in 5.7". This is different from the text in C89/C99, which just says that ptrdiff_t is the type of the result of pointer subtraction, not that it can in general hold the value. It reads to me as if whoever wrote it was expecting the issue to be resolved in C++11, but there's still a difference in language between 18.2/6 (big enough for any object) and /5 (an array object).
  • Steve Jessop
    Steve Jessop over 12 years
    They could have said explicitly in 18.2/5 that ptrdiff_t is big enough to hold the result of any valid pointer/index subtraction, and removed the text about overflow in 5.7. They chose not to. I don't see how this decision can be interpreted to mean anything other than that ptrdiff_t is not guaranteed big enough as James says, but it's possible that the committee has the non-obvious meaning in mind, or that there's text elsewhere that helps.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 12 years
    @Steve: the only reasonable interpretation is that "an" means "any", because otherwise it refers to some specific array object that might be one byte for all we know, and then the statement is meaningless (imposes no constraints and imparts no new information).
  • James Kanze
    James Kanze over 12 years
    @SteveJessop The difference between C and C++ is interesting; I would have expected the C++ standard to simply say that these types are defined as in C. I'd be inclined to agree with Alf, that the only reasonable interpretation of "in an array object" is "in any array object", except that §5.7/6 clearly and explicitly says the opposite. I sort of think that a DR is in order. (I also know that a certain number of the committee would strongly oppose any difference with C here. It would be highly undesirable that ptrdiff_t be different types in C and in C++.)
  • Steve Jessop
    Steve Jessop over 12 years
    @Alf: I'd agree, except that 18.2/5 immediately backs off from that by saying "as described in 5.7", which explicitly describes the possibility of ptrdiff_t being too small. When different language is used in adjacent paragraphs, I'm inclined to read that as significant, with different meaning intended, so in this case "an" != "any" . I would not even be surprised if the committee is divided on the issue, with some genuinely believing that they've fixed it and others genuinely believing that they've retained consistency with C. I suppose one would have to check the minutes.
  • Steve Jessop
    Steve Jessop over 12 years
    @James: certainly there's a flaw - either text in 5.7 is redundant (and is misleading you and me) since in fact ptrdiff_t is big enough and overflow is impossible, or else that text isn't redundant and 18.2/5 is misleading Alf since in fact it just means "this is the type of an offset in an array", not "this type can represent all offsets in all arrays". Since the answer to the question affects how people write quite simple code (specifically - a lot of people want to avoid unsigned types, so a signed type that holds sizes is very useful), it does matter.
  • James Kanze
    James Kanze over 12 years
    @SteveJessop To add to what has been said already: existing practice says that ptrdiff_t can overflow. It certainly could on all of the 16 bit machines I used, and it can with g++ under Linux and Sun CC under Solaris. (I suspect that in all such cases, the compiler authors simply thought that C++ was compatible with C here, and did what they'd done in C.)
  • Steve Jessop
    Steve Jessop over 12 years
    @James: since nobody claims to fully support C++11 yet anyway, I wonder if any of them has on their unimplemented feature lists, "stick a wrapper around system memory allocation to prevent it returning a block bigger than ptrdiff_t can hold", referencing the text of 18.2/5 as a new requirement.
  • James Kanze
    James Kanze over 12 years
    @SteveJessop I hadn't noticed that that §18.2 (which was §18.1) had changed in C++11; §18.1 in C++03 says about what I expected (i.e. it forwards to C). Since this is clearly a change, it makes me think that the intent was to require ptrdiff_t to be big enough to not overflow (which means making it long long on most 32 bit machines), and that nobody noticed the discrepancy in §5.7. I'll raise a DR tonight (if I remember).
  • Steve Jessop
    Steve Jessop over 12 years
    @James: But as you said earlier, it needs to be consistent with C. If you're following an ABI in which ptrdiff_t is already defined to be 32-bit on a 32-bit machine, you can't change it to long long (i.e. a breaking change in the C ABI) just for C++11. So if that was the intent of the author, it won't work. Instead you need to cripple malloc to prevent 2GB allocations (presuming that it previously allowed them), and come to think of it you need to cripple it in C too, to prevent C code from passing jumbo arrays to C++ code.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 12 years
    @Steve: I think the C requirement of at least 17 bits ptrdiff_t (not 16 bits but 17, implied by required value range) indicates that in C the intent is for ptrdiff_t to be large enough, with resolution of the conflict on 16-bit architecture in favor of large ptrdiff_t or special compiler support, instead of crippling malloc.
  • James Kanze
    James Kanze over 12 years
    @SteveJessop That would be the other solution, but it would break existing code as well. (I've got one C++ program under Linux which does a new char[0x80000000]. That would break.) I don't think the vendors will go along with crippling malloc and new, and I don't think they'll go along with breaking their public API (and Posix, at least, uses ptrdiff_t in its API). Which means that whatever the committee really intended, nothing will actually change.
  • Steve Jessop
    Steve Jessop over 12 years
    @Alf: yes, I think it's only 32-bit architectures where there's confusion, arguably whoever designed the C ABIs should have done something different -- just because the C standard permits overflow doesn't mean they should necessarily have taken advantage of that to make ptrdiff_t 32 bits wide. On 64-bit architectures malloc is naturally crippled so the issue doesn't arise there either, ptrdiff_t and size_t can be the same size, at least until someone wants a 2^63 byte array backed by sparse storage.
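The abs(i1 - i2) pitfall raised earlier in this thread can be sketched as follows. Both function names are hypothetical; the second is one possible workaround that stays in unsigned arithmetic, not the only one.

```cpp
#include <cstddef>

// Naive version: unsigned subtraction wraps around when i1 < i2, so the
// result is a very large value instead of a small distance.
std::size_t naiveDistance(std::size_t i1, std::size_t i2)
{
    return i1 - i2;
}

// Hypothetical fix: always subtract the smaller index from the larger.
std::size_t indexDistance(std::size_t i1, std::size_t i2)
{
    return i1 > i2 ? i1 - i2 : i2 - i1;
}
```

For the indices 2 and 5, indexDistance returns 3 in either argument order, while naiveDistance(2, 5) wraps around to the same value as static_cast<std::size_t>(-3).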