Scalar vs. primitive data type - are they the same thing?

programming-languages types terminology primitive-types scalar

65,885

Solution 1

I don't think they're interchangeable. They are frequently similar, but differences do exist, and they seem to lie mainly in what each term is contrasted with and what is relevant in context.

Scalars are typically contrasted with compounds, such as arrays, maps, sets, structs, etc. A scalar is a "single" value - integer, boolean, perhaps a string - while a compound is made up of multiple scalars (and possibly references to other compounds). "Scalar" is used in contexts where the relevant distinction is between single/simple/atomic values and compound values.

Primitive types, however, are contrasted with e.g. reference types, and the term is used when the relevant distinction is "Is this directly a value, or is it a reference to something that contains the real value?", as in Java's primitive types vs. references. I see this as a somewhat lower-level distinction than scalar/compound, though not entirely so.

It really depends on context (and frequently what language family is being discussed). To take one, possibly pathological, example: strings. In C, a string is a compound (an array of characters), while in Perl, a string is a scalar. In Java, a string is an object (or reference type). In Python, everything is (conceptually) an object/reference type, including strings (and numbers).
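
As a rough illustration of the Python point, the sketch below shows that numbers and strings are ordinary objects, and that whether a string counts as compound depends on the convention you pick. The helper `is_compound` and its `strings_are_scalar` flag are hypothetical names made up for this example, not part of any library.

```python
from collections.abc import Collection

def is_compound(value, strings_are_scalar=True):
    """Hypothetical helper: report whether a value is a container of other values.

    strings_are_scalar=True mimics the Perl view (a string is one value);
    False mimics the C view (a string is an array of characters).
    """
    if isinstance(value, (str, bytes)):
        return not strings_are_scalar
    return isinstance(value, Collection)

# Every Python value is an object, including numbers and strings.
everything_is_object = all(isinstance(v, object) for v in (42, 3.14, "abc", [1, 2]))

# Assignment copies the reference, not the value: both names refer to one object.
a = [1, 2, 3]
b = a
same_object = a is b
```

Under the Perl-like convention `is_compound("abc")` is false, while `is_compound("abc", strings_are_scalar=False)` is true, which is the same disagreement the paragraph above describes between languages.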

Solution 2

There's a lot of confusion and misuse of these terms. Often one is used to mean another. Here is what those terms actually mean.

"Native" refers to types that are built into to the language, as opposed to being provided by a library (even a standard library), regardless of how they're implemented. Perl strings are part of the Perl language, so they are native in Perl. C provides string semantics over pointers to chars using a library, so pointer to char is native, but strings are not.

"Atomic" refers to a type that can no longer be decomposed. It is the opposite of "composite". Composites can be decomposed into a combination of atomic values or other composites. Native integers and floating point numbers are atomic. Fractions, complex numbers, containers/collections, and strings are composite.

"Scalar" -- and this is the one that confuses most people -- refers to values that can express scale (hence the name), such as size, volume, counts, etc. Integers, floating point numbers, and fractions are scalars. Complex numbers, booleans, and strings are NOT scalars. Something that is atomic is not necessarily scalar and something that is scalar is not necessarily atomic. Scalars can be native or provided by libraries.

Some types have odd classifications. BigNumber types, usually implemented as an array of digits or integers, are scalars, but they're technically not atomic. They can appear to be atomic if the implementation is hidden and you can't access the internal components. But the components are only hidden, so the atomicity is an illusion. They're almost invariably provided in libraries, so they're not native, but they could be. In the Mathematica programming language, for example, big numbers are native and, since there's no way for a Mathematica program to decompose them into their building blocks, they're also atomic in that context, despite the fact that they're composites under the covers (where you're no longer in the world of the Mathematica language).
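
Python is another language where arbitrary-precision integers are built in, so it can serve as a small illustration of the same point. This is only a sketch assuming Python 3: the value below is too large for a 64-bit machine word, yet from inside the language it is a plain `int` with no visible components.

```python
big = 2 ** 100  # far beyond what a 64-bit machine word can hold

is_plain_int = type(big) is int                  # same type as small integers
needs_more_than_64_bits = big.bit_length() > 64  # so storage must be multi-word
```

Whatever array of digits the interpreter uses underneath is an implementation detail; the program only ever sees one number.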

These definitions are independent of the language being used.

Solution 3

Put simply, it would appear that a 'scalar' type refers to a single item, as opposed to a composite or collection. So scalars include both primitive values as well as things like an enum value.

http://ee.hawaii.edu/~tep/EE160/Book/chap5/section2.1.3.html

Perhaps the 'scalar' term may be a throwback to C:

"where scalars are primitive objects which contain a single value and are not composed of other C++ objects"

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1995/N0774.pdf

I'm curious whether the term refers to these items having a value of 'scale', such as counting numbers.

Solution 4

I like Scott Langeberg's answer because it is concise and backed by authoritative links. I would up-vote Scott's answer if I could.

I suppose that a "primitive" data type could be considered a primary data type, so that secondary data types are derived from primary data types. The derivation is through combining, such as with a C++ struct. A struct can be used to combine data types (such as an int and a char) to get a secondary data type. The struct-defined data type is always a secondary data type. Primary data types are not derived from anything; rather, they are a given in the programming language.

I have a parallel to primitive being the nomenclature meaning primary. That parallel is "regular expression". I think the nomenclature "regular" can be understood as "regulating". Thus you have an expression that regulates the search.

Scalar etymology (http://www.etymonline.com/index.php?allowed_in_frame=0&search=scalar&searchmode=none) means ladder-like. I think the way this relates to programming is that a ladder has only one dimension: How many rungs from the end of the ladder. A scalar data type has only one dimension, thus represented by a single value.

I think in usage, primitive and scalar are interchangeable. Is there any example of a primitive that is not scalar, or of a scalar that is not primitive?

Although interchangeable, primitive refers to the data-type being a basic building block of other data types, and a primitive is not composed of other data types.

Scalar refers to its having a single value. Scalar contrasts with the mathematical vector. A vector is not represented by a single value because (using one kind of vector as an example) one value is needed to represent the vector's direction and another value needed to represent the vector's magnitude.
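
A small Python sketch of that contrast, with made-up example values: the scalar is fully described by one number, while the two-dimensional vector needs two, whether you store them as components or as magnitude and direction.

```python
import math

temperature = 21.5                 # scalar: one number is the whole value

velocity = (3.0, 4.0)              # vector: x and y components
magnitude = math.hypot(*velocity)  # length of the vector
direction = math.atan2(velocity[1], velocity[0])  # angle in radians
```

Dropping either `magnitude` or `direction` loses information about `velocity`, which is why a vector is not a scalar.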

Reference links: http://whatis.techtarget.com/definition/primitive http://en.wikipedia.org/wiki/Primitive_data_type

Ben Pearson

Currently an Android software engineer at RdyDev.

Updated on March 02, 2022

Comments

Ben Pearson about 2 years

In various articles I have read, there are sometimes references to primitive data types and sometimes there are references to scalars.

My understanding of each is that they are data types of something simple like an int, boolean, char, etc.

Is there something I am missing that means you should use particular terminology, or are the terms simply interchangeable? The Wikipedia pages for each one don't show anything obvious.

If the terms are simply interchangeable, which is the preferred one?
Joe Bowbeer almost 8 years

Also to be considered in a discussion of reference types and primitive types are "value" types. Regarding the equivalence of scalars and primitives, it depends on the language. According to the PHP manual, for example, only half of its primitive types are scalars: php.net/manual/en/language.types.intro.php
Bert almost 8 years

I was taught (a very long time ago in school) that the term was derived from 'scalar processor' in contrast to a 'vector processor'. A scalar processor is a CPU that can only handle one piece of data at a time. These processors were/are named after the arithmetic terms. Interestingly enough, when you look up 'scalar' on wikipedia, you get redirected to 'variable'.
lleaff about 7 years

Although this definition of a scalar type makes the most sense to me, this doesn't seem to be the most commonly accepted one.
clockworkpc almost 4 years

Thanks for a clear definition of 'Scalar'. Even though, as @lleaff points out most people do not use it in this specific sense, it would be better if they did.
Jerry almost 4 years

Excellent linguistic definitions. This answer should be read along with Michael Ekstrand's answer for a fuller discussion. In the context of programming languages, scalar has different meanings, unfortunately.
snnsnn about 3 years

I think reference types also represent composite values, since they hold a memory address and the data type. C strings are also compound because they use pointers. Scalar implies magnitude, so contrasting scalars with compounds feels unintuitive or simply wrong. Also, booleans do not signify magnitude, so they are not scalar. It appears programmers name things without paying attention to their meaning or implications.
Michael Ekstrand almost 3 years

@snnsnn In a statically-typed language, the data type is not stored with the memory address - only the address is stored. In dynamically-typed languages, the pointer usually points to some kind of language structure that includes data types; however, while the implementation may be compound, its semantics for the implemented language may be "scalar". And my purpose here is to describe terms as they are actually used to describe PL semantics; Perl uses scalar as discussed here. I believe R also does. Many PLs don't use the term.
snnsnn almost 3 years

@MichaelEkstrand My bad, what I meant was address and the data.
Arnel Enero about 2 years

In JavaScript, for example, string is a primitive. But in the general sense, regardless of language, string is not a scalar because (1) it has no finite range of values and (2) you cannot compare a string as greater or less than another string.
Arnel Enero about 2 years

Since you asked about an example of primitive that is non-scalar... how about string in JavaScript. See my answer for the rationale behind this.