Can I determine type of an awk variable?
Solution 1
Awk has 4 types: "number", "string", "numeric string" (strnum) and "undefined". Here is a function to detect them:
function o_class(obj, q, x, z) {
q = CONVFMT
CONVFMT = "% g"
split(" " obj "\1" obj, x, "\1")
x[1] = obj == x[1]
x[2] = obj == x[2]
x[3] = obj == 0
x[4] = obj "" == +obj
CONVFMT = q
z["0001"] = z["1101"] = z["1111"] = "number"
z["0100"] = z["0101"] = z["0111"] = "string"
z["1100"] = z["1110"] = "strnum"
z["0110"] = "undefined"
return z[x[1] x[2] x[3] x[4]]
}
For the third argument of split, you need something that is not a space and not part of obj, or else it will be treated as a delimiter. I chose \1 based on Stéphane's suggestion. The function toggles CONVFMT internally, so it should return the correct result regardless of the value of CONVFMT at the time of the function call:
split("12345.6", q); print 1, o_class(q[1])
CONVFMT = "%.5g"; split("12345.6", q); print 2, o_class(q[1])
split("nan", q); print 3, o_class(q[1])
CONVFMT = "%.6G"; split("nan", q); print 4, o_class(q[1])
Result:
1 strnum
2 strnum
3 strnum
4 strnum
Full test suite:
print 1, o_class(0)
print 2, o_class(1)
print 3, o_class(123456.7)
print 4, o_class(1234567.8)
print 5, o_class(+"inf")
print 6, o_class(+"nan")
print 7, o_class("")
print 8, o_class("0")
print 9, o_class("1")
print 10, o_class("inf")
print 11, o_class("nan")
split("00", q); print 12, o_class(q[1])
split("01", q); print 13, o_class(q[1])
split("nan", q); print 14, o_class(q[1])
split("12345.6", q); print 15, o_class(q[1])
print 16, o_class()
Result:
1 number
2 number
3 number
4 number
5 number
6 number
7 string
8 string
9 string
10 string
11 string
12 strnum
13 strnum
14 strnum
15 strnum
16 undefined
The notable weakness is: if you provide a "numeric string" that is any of the following, the function will incorrectly return "number":
- integer
- inf
- -inf
For integers, this is explained:
A numeric value that is exactly equal to the value of an integer shall be converted to a string by the equivalent of a call to the sprintf function with the string %d as the fmt argument
However, inf and -inf behave this way as well; that is to say, none of the above can be influenced by the CONVFMT variable:
CONVFMT = "% g"
print "" .1
print "" (+"nan")
print "" 1
print "" (+"inf")
print "" (+"-inf")
Result:
0.1
nan
1
inf
-inf
In practice this doesn’t really matter, see the Duck test.
Solution 2
With gawk, PROCINFO["identifiers"] is an array with information about variables. Use it like: PROCINFO["identifiers"]["your_variable_name"]. The value returned is one of "array", "builtin", "extension", "scalar", "untyped", "user".
There is only a general scalar, which includes both strings and numbers. The gawk interpreter just does its best to pick a sensible interpretation in each context.
That is why you will sometimes see a seemingly redundant variable + 0 somewhere: it ensures awk treats the variable as a numeric one.
See this paragraph for some of the trickery with implicit conversions.
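The effect of the variable + 0 idiom can be seen in a small sketch. The wrapper name force_numeric_demo and the variable names are made up for illustration; the awk program inside uses only standard features, so it should behave the same in any POSIX awk.

```shell
# Hypothetical demo wrapper; only the awk program inside matters.
force_numeric_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"        # string constants, so plain strings
        print (a < b)            # string comparison: "10" sorts before "9"
        print (a + 0 < b + 0)    # + 0 forces numbers: 10 < 9 is false
    }'
}
force_numeric_demo
```

This should print 1 and then 0: without the + 0 the two values are compared as strings.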
Solution 3
To clarify, only strings that are coming from a few sources (here quoting the POSIX spec):
- Field variables
- Input from the getline() function
- FILENAME
- ARGV array elements
- ENVIRON array elements
- Array elements created by the split() function
- A command line variable assignment
- Variable assignment from another numeric string variable
are to be considered a numeric string if their value happens to be numerical (allowing leading and trailing blanks, with variations between implementations in support for hex, octal, inf, nan...).
The "3.14" literal string constant is a string, not a strnum, because it doesn't come from one of those sources.
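As a rough illustration of how the source of a value matters, assuming a POSIX-conforming awk: the same characters are compared as a number when they come from split() or a -v assignment, and as a string when written as a literal. The wrapper name strnum_source_demo and the variable names are invented for this sketch.

```shell
# Hypothetical demo wrapper; only the awk program inside matters.
strnum_source_demo() {
    awk -v cmdline=3.1400 'BEGIN {
        split("3.1400", parts)      # parts[1] comes from split(): strnum
        literal = "3.1400"          # string constant: plain string
        print (parts[1] == 3.14)    # numeric comparison
        print (cmdline == 3.14)     # numeric comparison
        print (literal == 3.14)     # string comparison: "3.1400" vs "3.14"
    }'
}
strnum_source_demo
```

This should print 1, 1 and 0: only the literal is compared as a string.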
x = "3.14"; if (x == 3.14) print "yes"
prints yes, but that's because it's doing a lexical comparison (depending on the implementation, using memcmp(), strcmp() or strcoll()) of the string "3.14" and the conversion to a string (via the CONVFMT format string, %.6g in gawk and many other implementations) of the number 3.14. That is, with that value of CONVFMT, (x == 3.14) is the same as (x == "3.14").
(x < 12) would be false, because 3.14 sorts lexically after 12 (same as ("3.14" < "12")). With CONVFMT = "%.6e", (x == 3.14) would also return false because that becomes ("3.14" == "3.140000e+00").
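The three comparisons described above can be put together in one sketch. The wrapper name convfmt_compare_demo is hypothetical, and the results assume an awk whose default CONVFMT is %.6g, as in gawk.

```shell
# Hypothetical demo wrapper; only the awk program inside matters.
convfmt_compare_demo() {
    awk 'BEGIN {
        x = "3.14"               # string constant, so comparisons are lexical
        print (x == 3.14)        # 3.14 becomes "3.14" under the default CONVFMT
        print (x < 12)           # "3.14" < "12" compared as strings
        CONVFMT = "%.6e"
        print (x == 3.14)        # 3.14 now becomes "3.140000e+00"
    }'
}
convfmt_compare_demo
```

This should print 1, 0 and 0: the first comparison is true only because the default CONVFMT happens to reproduce the literal text.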
On the other hand, in:
"echo \"3.1400 \"" | getline x
if (x == 3.14) print "yes"
if (x < 12) print "yes"
yes is printed twice whatever the value of CONVFMT, because a numerical comparison is performed. x is a strnum because it comes from getline and has a numeric value.
It still retains its string value though. print x will print "3.1400 " whatever the value of OFMT or CONVFMT.
And:
"echo 3.14 foo" | getline x
if (x == 3.14) print "yes"
Doesn't print yes. x comes from getline but doesn't have a numeric value (because of the foo). It is a normal string, as if you had written x = "3.14 foo". Still, you will be able to do numeric operations with it:
print x + 1
Will output 4.14. Above, because it is involved in a numeric operation, the string is converted to a number by looking at the initial part (past any leading blanks) that looks like a number.
So (x+0 == 3.14) and (x+0 < 12) will also return true. x+0 is numeric, so we've got a numeric comparison.
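The same behaviour can be checked without getline by using the equivalent string constant. This is a minimal sketch; the wrapper name leading_number_demo is made up, and the 4.14 output assumes the default OFMT of %.6g.

```shell
# Hypothetical demo wrapper; only the awk program inside matters.
leading_number_demo() {
    awk 'BEGIN {
        x = "3.14 foo"             # plain string with a numeric prefix
        print (x == 3.14)          # string comparison, not equal
        print x + 1                # numeric prefix 3.14 is used
        print (x + 0 == 3.14)      # numeric comparison
        print (x + 0 < 12)         # numeric comparison
    }'
}
leading_number_demo
```

This should print 0, 4.14, 1 and 1.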
Note that inf, nan and Infinity are not recognised as constants for the floating point inf or nan special values, but in several awk implementations you can use ("inf"+0) instead.
Solution 4
From GNU Awk 4.2, there is a new function typeof() to check this, as indicated in the release notes from the beta release:
- The new typeof() function can be used to indicate if a variable or array element is an array, regexp, string or number. The isarray() function is deprecated in favor of typeof().
So now you can say:
$ awk 'BEGIN {print typeof("a")}'
string
$ awk 'BEGIN {print typeof(1)}'
number
$ awk 'BEGIN {print typeof(a[1])}'
unassigned
$ awk 'BEGIN {a[1]=1; print typeof(a)}'
array
$ echo ' 1 ' | awk '{print typeof($0)}'
strnum
Utku
Updated on September 18, 2022
Comments
-
Utku almost 2 years
I have the gawk version of awk. In this part of gawk manual, it is stated that awk variables have "attributes", which are used to determine how to treat them in various operations.
For example, a string of the form " +3.14" which is obtained by parsing the input has the STRNUM attribute, which makes it behave as a number in a comparison with a number, whereas the same string defined in an awk program does not have this attribute.
OTOH, a string like "3.14" apparently has the STRNUM attribute, even if it was defined in the program, because the code x = "3.14" { print x == 3.14 } prints 1. Whereas if we define it as "+3.14" or " 3.14", it does not have the STRNUM attribute, since x = "+3.14" { print x == 3.14 } or x = " 3.14" { print x == 3.14 } prints 0.
I think that such succinctness in variable typing may cause subtle bugs. Hence, in order to aid in debugging such situations, is there a way to learn what type of "attributes" a variable has? I.e., can we learn what the type of a variable is?
-
Utku about 8 years
@123 I know that unless I use arithmetic operators on it, then it will be treated as a string. But if I use arithmetic operators on it, it will be treated as a number, whereas this is not the case for such string manually defined in an awk program.
-
123 about 8 years
If you use arithmetic operators on any variable then it will be treated as a number no matter how it was defined.
-
Stéphane Chazelas about 7 years
Sorry for the confusion, I didn't say it was only a problem with the original awk implementation but with those implementations based on (derived from) it. That includes the awk of current versions of Solaris, FreeBSD, macOS and I suppose most other commercial Unices. Yes, that's a bug, but a widespread one with an easy work around. Note that gawk is almost as ancient (1986). The real awk still maintained by Brian Kernighan (the k in awk) does incorporate features from gawk.
-
Justin about 7 years
Any way to have this check if the value is an array as well?
-
fedorqui over 6 years
@Stéphane it is funny how echo ' 1 ' | awk '{print typeof($0)}' returns "strnum", while awk 'BEGIN{print typeof(" 1 ")}' returns "string". Any hint on why it is like this?
-
Stéphane Chazelas over 6 years
That's the whole point of this Q&A. See my answer or Steven's.