C++ vector size types
Solution 1
C++ is a language for library writing*, and allowing the author to be as general as possible is one of its key strengths. Rather than prescribing that the standard containers use any particular data type, the more general approach is to decree that each container expose a size_type member type. This allows for greater flexibility and genericity. For example, consider this generic code:
template <template <typename...> class Container, typename T>
void doStuff(const Container<T> & c)
{
    // Use the container's own size type instead of guessing int or unsigned.
    typename Container<T>::size_type n = c.size();
    // ...
}
This code will work on any container template (that can be instantiated with a single argument), and we don't impose any unnecessary restrictions on the user of our code.
(In practice, most size types will resolve to std::size_t, which in turn is an unsigned type, usually unsigned int or unsigned long -- but why should we have to know that?)
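As a minimal sketch of the same idea, the function below takes the container type itself as the template parameter (a simpler formulation than a template-template parameter) and lets the container name its own size type. The name elementCount is hypothetical, chosen only for illustration.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: returns the number of elements using whatever
// size type the container itself declares.
template <typename Container>
typename Container::size_type elementCount(const Container& c)
{
    typename Container::size_type n = c.size();
    return n;
}
```

The same function accepts a std::vector<double>, a std::list<int>, or a std::string, and the caller never has to know which integer type sits behind size_type.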
*) I'm not sure what the corresponding statement for Java would be.
Solution 2
Java does not have unsigned integer types, so they have to go with int.
C++, by contrast, does have them and uses them where appropriate (where negative values are nonsensical), the canonical example being the length of something like an array.
Solution 3
The C++ standard says that a container's size_type is an unsigned integral type, but it doesn't specify which one; one implementation might use unsigned int and another might use unsigned long, for example.
C++ isn't "shielded" from platform-specific implementation details as much as Java is. The size_type alias helps to shield your code from such details, so it'll work properly regardless of what actual type should be used to represent a vector's size.
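The guarantee can be checked at compile time. The following is a small sketch, assuming a C++11 compiler; it asserts only what the standard promises (unsignedness) and computes the width rather than assuming one. The names VecSize and sizeTypeBits are made up for this illustration.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>
#include <vector>

// Alias so the underlying integer type never has to be spelled out.
using VecSize = std::vector<double>::size_type;

// The standard guarantees an unsigned type...
static_assert(std::is_unsigned<VecSize>::value,
              "size_type must be unsigned");

// ...but not its width, so compute it instead of hard-coding it.
// (Hypothetical helper, named here for illustration only.)
constexpr std::size_t sizeTypeBits()
{
    return sizeof(VecSize) * CHAR_BIT;
}
```

The value returned by sizeTypeBits() depends on the platform and standard library, which is exactly the detail the alias hides.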
Solution 4
The book you’re reading states that if you want to extract the size of a vector of type double (for example), you should do something like:
vector<double>::size_type vector_size;
vector_size = myVector.size();
Whereas in Java you might do
int vector_size;
vector_size = myVector.size();
Both are inferior options in C++. The first is extremely verbose and unsafe (mostly due to implicit promotions). The second is verbose and extremely unsafe (due to number range).
In C++, do
ptrdiff_t const vectorSize = myVector.size();
Note that:
ptrdiff_t, from the stddef.h header, is a signed type that is guaranteed large enough.
Initialization is done in the declaration (this is better C++ style).
The same naming convention has been applied to both variables.
In summary, doing the right thing is shorter and safer.
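One place where a signed size pays off is arithmetic that can drop below zero. The sketch below assumes the vector's size fits in ptrdiff_t (true for ordinary vectors on mainstream platforms, but treat it as an assumption); lastIndexOf is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the last element equal to `value`, or -1 if absent.
// Assumption: v.size() fits in std::ptrdiff_t.
std::ptrdiff_t lastIndexOf(const std::vector<double>& v, double value)
{
    std::ptrdiff_t const n = static_cast<std::ptrdiff_t>(v.size());
    // With a signed index, `i >= 0` ends the loop as expected;
    // with an unsigned index the same condition would always be true.
    for (std::ptrdiff_t i = n - 1; i >= 0; --i)
    {
        if (v[static_cast<std::size_t>(i)] == value)
        {
            return i;
        }
    }
    return -1;
}
```

The -1 "not found" result is only expressible because the return type is signed.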
Cheers & hth.,
Solution 5
My personal feeling about this is that it is for better code safety and readability.
For me, int is a type which conveys no special meaning: it can number apples, bananas, or anything.
size_type, which is probably a typedef for size_t, has a stronger meaning: it indicates a size (for a container, a number of elements).
That is, it is easier to know what a variable means. Of course, following this rationale, there could be a lot of different types for different units. But a "buffer size" is really a common case so it somehow deserves a dedicated type.
Another aspect is code maintainability: if the container suddenly changes its size_type from, say, uint64_t to unsigned int, code that uses size_type does not have to be changed in every source file relying on it.
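A rough sketch of that maintenance argument follows. TinyBuffer is an invented container, not a standard one, whose size_type is deliberately narrow; the generic halfSize function (also a made-up name) compiles unchanged for it and for std::vector, because it never names the underlying integer type.

```cpp
#include <type_traits>
#include <vector>

// Invented container for illustration; its size type is deliberately small.
struct TinyBuffer
{
    using size_type = unsigned short;
    size_type count;
    size_type size() const { return count; }
};

// Generic code that follows whatever size_type the container declares.
template <typename Container>
typename Container::size_type halfSize(const Container& c)
{
    typename Container::size_type const n = c.size();
    // The cast keeps the result in the container's own size type even
    // when integer promotion widens the intermediate value.
    return static_cast<typename Container::size_type>(n / 2);
}
```

If TinyBuffer later switched its size_type to a wider type, halfSize would pick up the change without being edited.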
Admin
Updated on July 09, 2022
Comments
-
Admin almost 2 years
I just started learning C++ and have a question about vectors. The book I'm reading states that if I want to extract the size of a vector of type double (for example), I should do something like:
vector<double>::size_type vector_size; vector_size = myVector.size();
Whereas in Java I might do
int vector_size; vector_size = myVector.size();
My question is, why is there a type named vector::size_type? Why doesn't C++ just use int?
-
xanatos over 12 years
@Kerrek Unless you are on a 64 bit system where std::size_t is probably a 64 bit type.
-
Kerrek SB over 12 years
@xanatos: that's true. Maybe we should dig up some of the common typedefs, out of curiosity.
-
Admin over 12 years
I'm not sure I understand why the first method is unsafe. What do you mean by implicit promotions?
-
James Kanze over 12 years
The reason why the standard uses size_t, rather than int, is because on some smaller machines, int wouldn't have been large enough (and going to long would have incurred a performance penalty). The use of an unsigned type here is generally recognized as a flaw, since unsigned types in C and C++ work in somewhat funny ways. (Basically, an expression like abs(i1 - i2) should correctly give the distance between two index types. It doesn't if i1 or i2 are unsigned.)
-
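To make the pitfall concrete, here is a small sketch (the function names are invented for this example). Subtracting a larger unsigned value from a smaller one wraps around modulo 2^N instead of going negative, so the naive difference is wrong; comparing first avoids the wrap.

```cpp
#include <cstddef>

// Naive distance: wraps around when i1 < i2, because std::size_t is unsigned.
std::size_t naiveDistance(std::size_t i1, std::size_t i2)
{
    return i1 - i2;
}

// Safe distance: compare first, then subtract the smaller from the larger.
std::size_t safeDistance(std::size_t i1, std::size_t i2)
{
    return (i1 > i2) ? (i1 - i2) : (i2 - i1);
}
```

With i1 = 3 and i2 = 5, naiveDistance yields a huge value near the top of the size_t range, while safeDistance yields 2.
-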
James Kanze over 12 years
Unless they've fixed it in C++11, ptrdiff_t is not guaranteed to be large enough. It's possible to have objects the size of which cannot be represented in a ptrdiff_t.
-
Cheers and hth. - Alf over 12 years
@James: there's not room enough here to discuss how an array of char might be too large in a 16-bit address space. The answer for a 3 GiB array of char in a 32-bit address space is that it's not supported by the standard (ptrdiff_t is required to be large enough), so deal specially with it if you must have that beast. Personally, I have never encountered the 3 GiB array of char. :-)
-
James Kanze over 12 years
@AlfP.Steinbach I've never needed a 3 GiB array of char either, but I've used 32 bit implementations which supported them (Sun CC under Solaris, g++ under Linux on a 32 bit PC). The C standard definitely allows pointer subtraction to overflow (resulting in undefined behavior), and as far as I know, the C++ standard just passes the buck on to the C standard here.
-
Steve Jessop over 12 years
@James: there seems to me to be a dispute here between 5.7 and 18.2 in C++11. 5.7 says that overflow is possible, 18.2 says that ptrdiff_t "can hold the difference of two subscripts in an array object, as described in 5.7". This is different from the text in C89/C99, which just says that ptrdiff_t is the type of the result of pointer subtraction, not that it can in general hold the value. It reads to me as if whoever wrote it was expecting the issue to be resolved in C++11, but there's still a difference in language between 18.2/6 (big enough for any object) and /5 (an array object).
-
Steve Jessop over 12 years
They could have said explicitly in 18.2/5 that ptrdiff_t is big enough to hold the result of any valid pointer/index subtraction, and removed the text about overflow in 5.7. They chose not to. I don't see how this decision can be interpreted to mean anything other than that ptrdiff_t is not guaranteed big enough as James says, but it's possible that the committee has the non-obvious meaning in mind, or that there's text elsewhere that helps.
-
Cheers and hth. - Alf over 12 years
@Steve: the only reasonable interpretation is that "an" means "any", because otherwise it refers to some specific array object that might be one byte for all we know, and then the statement is meaningless (imposes no constraints and imparts no new information).
-
James Kanze over 12 years
@SteveJessop The difference between C and C++ is interesting; I would have expected the C++ standard to simply say that these types are defined as in C. I'd be inclined to agree with Alf, that the only reasonable interpretation of "in an array object" is "in any array object", except that §5.7/6 clearly and explicitly says the opposite. I sort of think that a DR is in order. (I also know that a certain number of the committee would strongly oppose any difference with C here. It would be highly undesirable that ptrdiff_t be different types in C and in C++.)
-
Steve Jessop over 12 years
@Alf: I'd agree, except that 18.2/5 immediately backs off from that by saying "as described in 5.7", which explicitly describes the possibility of ptrdiff_t being too small. When different language is used in adjacent paragraphs, I'm inclined to read that as significant, with different meaning intended, so in this case "an" != "any". I would not even be surprised if the committee is divided on the issue, with some genuinely believing that they've fixed it and others genuinely believing that they've retained consistency with C. I suppose one would have to check the minutes.
-
Steve Jessop over 12 years
@James: certainly there's a flaw - either text in 5.7 is redundant (and is misleading you and me) since in fact ptrdiff_t is big enough and overflow is impossible, or else that text isn't redundant and 18.2/5 is misleading Alf since in fact it just means "this is the type of an offset in an array", not "this type can represent all offsets in all arrays". Since the answer to the question affects how people write quite simple code (specifically - a lot of people want to avoid unsigned types, so a signed type that holds sizes is very useful), it does matter.
-
James Kanze over 12 years
@SteveJessop To add to what has been said already: existing practice says that ptrdiff_t can overflow. It certainly could on all of the 16 bit machines I used, and it can with g++ under Linux and Sun CC under Solaris. (I suspect that in all such cases, the compiler authors simply thought that C++ was compatible with C here, and did what they'd done in C.)
-
Steve Jessop over 12 years
@James: since nobody claims to fully support C++11 yet anyway, I wonder if any of them has on their unimplemented feature lists, "stick a wrapper around system memory allocation to prevent it returning a block bigger than ptrdiff_t can hold", referencing the text of 18.2/5 as a new requirement.
-
James Kanze over 12 years
@SteveJessop I hadn't noticed that §18.2 (which was §18.1) had changed in C++11; §18.1 in C++03 says about what I expected (i.e. it forwards to C). Since this is clearly a change, it makes me think that the intent was to require ptrdiff_t to be big enough to not overflow (which means making it long long on most 32 bit machines), and that nobody noticed the discrepancy in §5.7. I'll raise a DR tonight (if I remember).
-
Steve Jessop over 12 years
@James: But as you said earlier, it needs to be consistent with C. If you're following an ABI in which ptrdiff_t is already defined to be 32-bit on a 32-bit machine, you can't change it to long long (i.e. a breaking change in the C ABI) just for C++11. So if that was the intent of the author, it won't work. Instead you need to cripple malloc to prevent 2GB allocations (presuming that it previously allowed them), and come to think of it you need to cripple it in C too, to prevent C code from passing jumbo arrays to C++ code.
-
Cheers and hth. - Alf over 12 years
@Steve: I think the C requirement of at least 17 bits ptrdiff_t (not 16 bits but 17, implied by required value range) indicates that in C the intent is for ptrdiff_t to be large enough, with resolution of the conflict on 16-bit architecture in favor of large ptrdiff_t or special compiler support, instead of crippling malloc.
-
James Kanze over 12 years
@SteveJessop That would be the other solution, but it would break existing code as well. (I've got one C++ program under Linux which does a new char[0x80000000]. That would break.) I don't think the vendors will go along with crippling malloc and new, and I don't think they'll go along with breaking their public API (and Posix, at least, uses ptrdiff_t in its API). Which means that whatever the committee really intended, nothing will actually change.
-
Steve Jessop over 12 years
@Alf: yes, I think it's only 32-bit architectures where there's confusion. Arguably whoever designed the C ABIs should have done something different -- just because the C standard permits overflow doesn't mean they should necessarily have taken advantage of that to make ptrdiff_t 32 bits wide. On 64-bit architectures malloc is naturally crippled so the issue doesn't arise there either: ptrdiff_t and size_t can be the same size, at least until someone wants a 2^63 byte array backed by sparse storage.
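
For code that wants a signed size despite the uncertainty discussed in this thread, one defensive option is to check the range before converting. This is only a sketch; fitsInPtrdiff and toSignedSize are hypothetical names, and throwing std::length_error is one possible policy among several.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// True if n can be represented as a std::ptrdiff_t without overflow.
bool fitsInPtrdiff(std::size_t n)
{
    // Compare in the widest unsigned type so neither operand is truncated.
    return static_cast<std::uintmax_t>(n) <=
           static_cast<std::uintmax_t>(std::numeric_limits<std::ptrdiff_t>::max());
}

// Converts a size to a signed value, refusing sizes that would overflow.
std::ptrdiff_t toSignedSize(std::size_t n)
{
    if (!fitsInPtrdiff(n))
    {
        throw std::length_error("size does not fit in ptrdiff_t");
    }
    return static_cast<std::ptrdiff_t>(n);
}
```

A call such as toSignedSize(myVector.size()) then either returns a usable signed value or fails loudly, rather than silently producing a negative size.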