What are the rules for valid identifiers (e.g., functions, variables, etc.) in Bash?
Solution 1
From the manual:
Shell Function Definitions
...
name () compound-command [redirection]
function name [()] compound-command [redirection]
name
is defined elsewhere:
name A word consisting only of alphanumeric characters and underscores, and beginning with an alphabetic character or an underscore. Also referred to as an identifier.
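The manual's definition of a name can be checked with a case pattern. Below is a minimal sketch. The helper is_name is hypothetical and is not part of bash. It assumes the C locale, so that A-Z and a-z mean ASCII letters only:

```shell
#!/usr/bin/env bash
# is_name (hypothetical helper): succeed only if $1 is a "name" in the
# manual's sense: letters, digits and underscores, not starting with a digit.
# Assumes the C locale, so [A-Za-z] means ASCII letters only.
is_name() {
  case $1 in
    '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

for word in my_func _tmp2 my-func 2fast; do
  if is_name "$word"; then
    echo "$word: valid name"
  else
    echo "$word: not a valid name"
  fi
done
```

This reports my_func and _tmp2 as valid, and my-func (hyphen) and 2fast (leading digit) as not valid.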
So hyphens are not valid. And yet, on my system, they do work...
$ bash --version
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
Solution 2
Command identifiers and variable names have different syntaxes. A variable name is restricted to alphanumeric characters and underscore, not starting with a digit. A command name, on the other hand, can be just about anything which doesn't contain bash metacharacters (and even then, they can be quoted).
In bash, function names can be command names, as long as they would be parsed as a WORD without quotes. (Except that, for some reason, they cannot be integers.) However, that is a bash extension. If the target machine is using some other shell (such as dash), it might not work, since the POSIX standard shell grammar only allows "NAME" in the function definition form (and also prohibits the use of reserved words).
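The difference can be seen by running the same one-liner through bash in its default mode and in POSIX mode. This is a sketch: the name my-func is just an example. The second run should fail only on bash versions whose POSIX mode enforces the name rule (the comments further down cite §6.11 of the manual for this):

```shell
#!/usr/bin/env bash
# The same one-line script, run by bash in its default mode and in POSIX mode.
# my-func is not a valid POSIX "name" because of the hyphen.
snippet='my-func() { echo "my-func ran"; }; my-func'

echo "default mode:"
bash -c "$snippet"

echo "POSIX mode:"
# On failure, $? here is the exit status of the bash --posix run.
bash --posix -c "$snippet" 2>&1 || echo "(failed with exit status $?)"
```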
Solution 3
The question was about "the rules", which have been answered in two different ways, each correct in some sense, depending on what you want to call "the rules". Just to flesh out @rici's point that you can shove just about any character into a function name, I wrote a small bash script that tries every possible (0-255) character as a function name, and also as the second character of a function name:
#!/bin/bash
# Printable names for the control characters 0-32
ASCII=( nul soh stx etx eot enq ack bel bs tab nl vt np cr so si dle \
        dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp )
for((i=33; i < 127; ++i)); do
    printf -v Hex "%x" $i
    printf -v Chr "\x$Hex"
    ASCII[$i]="$Chr"
done
ASCII[127]=del
for((i=128; i < 256; ++i)); do
    ASCII[$i]=$(printf "0X%x" $i)
done
# ASCII table is now defined

function Test(){
    Illegal=""
    for((i=1; i <= 255; ++i)); do
        # Note: $(...) strips trailing newlines, so for i=10 Name ends up empty
        Name="$(printf \\$(printf '%03o' $i))"
        eval "function $1$Name(){ return 0; }; $1$Name ;" 2>/dev/null
        if [[ $? -ne 0 ]]; then
            Illegal+=" ${ASCII[$i]}"
            # echo Illegal: "${ASCII[$i]}"
        fi
    done
    printf "Illegal: %s\n" "$Illegal"
}

echo "$BASH_VERSION"
Test
Test "x"
# can we really do funky crap like this?
function [}{(){
    echo "Let me take you to, funkytown!"
}
[}{ # why yes, we can!
# though editor auto-indent modes may punish us
I actually skip NUL (0x00), as that's the one character bash may object to finding in the input stream. The output from this script was:
4.4.0(1)-release
Illegal: soh tab nl sp ! " # $ % & ' ( ) * 0 1 2 3 4 5 6 7 8 9 ; < > \ ` { | } ~ del
Illegal: soh " $ & ' ( ) ; < > [ \ ` | del
Let me take you to, funkytown!
Note that bash happily lets me name my function "[}{". Probably my code is not quite rigorous enough to provide the exact rules for legality-in-practice, but it should give a flavor of what manner of abuse is possible. I wish I could mark this answer "For mature audiences only."
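The test above has two weak spots that the comments below touch on. Command substitution strips the newline (byte 10), and the unquoted call lets characters like * or ~ be expanded before the function is looked up. A variant that avoids both might look like this. It is a sketch of my own, not the original author's code, and try_name is a hypothetical helper. Results vary by bash version, so no output is given here:

```shell
#!/usr/bin/env bash
# Sketch: try every byte 1..255 as a one-character function name.
#  - printf -v builds the name, so byte 10 stays a newline instead of being
#    stripped by command substitution and tested as an empty name;
#  - the function is called as a quoted word, so names such as * or ~ are
#    not glob- or tilde-expanded before lookup.
try_name() {
  # Subshell: a bad name can only break this one attempt.
  (
    eval "function $1 { return 0; }" 2>/dev/null || exit 1
    "$1"
  ) 2>/dev/null
}

illegal=()
for ((i = 1; i < 256; i++)); do
  printf -v oct '%03o' "$i"
  printf -v name "\\$oct"   # e.g. the format \052 yields "*"
  try_name "$name" || illegal+=("$i")
done

echo "bash $BASH_VERSION"
echo "bytes rejected as a one-character name (decimal): ${illegal[*]}"
```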
Solution 4
From 3.3 Shell Functions:
Shell functions are a way to group commands for later execution using a single name for the group. They are executed just like a "regular" command. When the name of a shell function is used as a simple command name, the list of commands associated with that function name is executed. Shell functions are executed in the current shell context; no new process is created to interpret them.
Functions are declared using this syntax:
name () compound-command [ redirections ]
or
function name [()] compound-command [ redirections ]
and from 2 Definitions:
name
A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names. Also referred to as an identifier.
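The two declaration forms quoted above, and the point that a function runs in the current shell context, can be illustrated together. In this sketch, bump and reset_counter are hypothetical names. The variable changed by the function is visible to the caller unless the call is wrapped in an explicit subshell:

```shell
#!/usr/bin/env bash
# Both declaration forms, plus a check that a function runs in the current
# shell: the variable it changes is visible to the caller.
counter=0

bump() { counter=$((counter + 1)); }      # name () compound-command
function reset_counter { counter=0; }     # function name compound-command

bump
bump
echo "after two calls: $counter"          # the caller sees 2

( bump )                                  # explicit subshell: change is lost
echo "after a subshell call: $counter"    # still 2

reset_counter
echo "after reset: $counter"              # 0
```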
Solution 5
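Note: The biggest correction here is that newline is never allowed in a function name.
My answer (in the character classes below, \N stands for the character with decimal code N, e.g. \9 is tab and \10 is newline, and \xNN is hexadecimal):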
- Bash --posix:
[a-zA-Z_][0-9a-zA-Z_]*
- Bash 3.0-4.4:
[^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
- Bash 5.0:
[^#%0-9\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
- \1 and \x7f work now
- Bash 5.1:
[^#%\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
- Numbers can come first?! Yep!
- Any bash 3-5:
[^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
- Same as 3.0-4.4
- My suggestion (opinion):
[^#%0-9\0-\f "$&'();<>\`|\x7f-\xff][^\0-\f "$&'();<>\`|\x7f-\xff]*
- Positive version:
[!*+,-./:=?@A-Z\[\]^_a-z{}~][#%0-9!*+,-./:=?@A-Z\[\]^_a-z{}~]*
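The "positive version" above is written in a PCRE-like notation. Inside a POSIX ERE bracket expression, a backslash is not an escape character, so the pattern has to be rearranged before bash's =~ can use it. Below is a sketch of that translation. The helper is_fn_name_candidate is hypothetical. It only checks the character set from the list, assumes the C locale, and does not catch other restrictions such as reserved words:

```shell
#!/usr/bin/env bash
# Hypothetical helper: test a function name against the "positive version"
# character classes, rewritten as a POSIX ERE for [[ =~ ]].
# Inside a bracket expression, ] must come first and - last to be literal,
# and backslash is not an escape character, so \[ \] become plain [ ].
LC_ALL=C   # assume the C locale so A-Z and a-z are plain ASCII ranges

first='[][!*+,./:=?@A-Z^_a-z{}~-]'
rest='[][#%0-9!*+,./:=?@A-Z^_a-z{}~-]'
fn_name_re="^$first$rest*\$"

is_fn_name_candidate() {
  [[ $1 =~ $fn_name_re ]]
}

for word in my_func my-func 'source?' '[}{' 2fast 'a b'; do
  if is_fn_name_candidate "$word"; then
    echo "ok:       $word"
  else
    echo "rejected: $word"
  fi
done
```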
My version of the test:
for ((x=1; x<256; x++)); do
  hex="$(printf "%02x" $x)"
  name="$(printf \\x${hex})"
  if [ "${x}" = "10" ]; then
    name=$'\n'
  fi
  if [ "$(echo -n "${name}" | xxd | awk '{print $2}')" != "${hex}" ]; then
    echo "$x failed first sanity check"
  fi
  (
    eval "function ${name}(){ echo ${x};}" &>/dev/null
    if test "$("${name}" 2>/dev/null)" != "${x}"; then
      eval "function ok${name}doe(){ echo ${x};}" &>/dev/null
      if test "$(type -t okdoe 2>/dev/null)" = "function"; then
        echo "${x} failed second sanity test"
      fi
      if test "$("ok${name}doe" 2>/dev/null)" != "${x}"; then
        echo "${x}(${name}) never works"
      else
        echo "${x}(${name}) cannot be first"
      fi
    else
      # Just assume everything over 128 is hard, unless this says otherwise
      if test "${x}" -gt 127; then
        if declare -pF | grep -q "declare -f \x${hex}"; then
          echo "${x} works, but is actually not difficult"
          declare -pF | grep "declare -f \x${hex}" | xxd
        fi
      elif ! declare -pF | grep -q "declare -f \x${hex}"; then
        echo "${x} works, but is difficult in bash"
      fi
    fi
  )
done
Some additional notes:
- Characters 1-31 are less than ideal, as they are more difficult to type.
- Characters 128-255 are even less ideal in bash (except with bash 3.2 on macOS, which might be compiled differently), because commands like declare -pF do not render the special characters, even though they are there in memory. This means any introspection code will incorrectly assume that these functions are not there. However, features like compgen still render the characters correctly.
- Out of my testing scope, but some Unicode does work too, although it's extra hard to paste/type on macOS over ssh.
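Because compgen lists function names faithfully, a script could check its own function names at runtime and flag the ones that will not survive sh or bash --posix. Below is a sketch. The function names are hypothetical, and it assumes bash was built with programmable completion (the usual default), since compgen belongs to that feature:

```shell
#!/usr/bin/env bash
# Sketch of a runtime self-check: compgen -A function prints every defined
# function name, and the case pattern flags names that are not POSIX
# "names" (letters, digits and underscores, no leading digit).
my-helper() { :; }     # hypothetical non-portable name, for demonstration
good_helper() { :; }   # hypothetical portable name

list_nonportable_functions() {
  local fn
  while IFS= read -r fn; do
    case $fn in
      [0-9]* | *[!A-Za-z0-9_]*) printf '%s\n' "$fn" ;;
    esac
  done < <(compgen -A function)
}

list_nonportable_functions
```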
labyrinth
Updated on June 23, 2022
Comments
-
labyrinth almost 2 years
What are the syntax rules for identifiers, especially function and variable names, in Bash?
I wrote a Bash script and tested it on various versions of Bash on Ubuntu, Debian, Red Hat 5 and 6, and even an old Solaris 8 box. The script ran well, so it shipped.
Yet when a user tried it on SUSE machines, it gave a "not a valid identifier" error. Fortunately, my guess that there was an invalid character in the function name was right. The hyphens were messing it up.
The fact that a script that was at least somewhat tested would have completely different behaviour on another Bash or distro was disconcerting. How can I avoid this?
-
labyrinth over 9 years: However, the SUSE machines were using Bash (according to echo $SHELL). I'm not sure why it was unhappy with the hyphens when the stock Bash on other major distros didn't care.
-
rici over 9 years: @labyrinth: "echo $SHELL" does not tell you which shell is executing. It tells you what the current user's default shell is. So it's quite possible that the command was running in a different shell, for whatever reason (for example, it was in a script starting with a shebang line with #!/bin/sh). Another possibility, although it seems less likely, is that the script was running in a "posix" shell, either because the environment variable POSIXLY_CORRECT was set, or because set -o posix was executed, or because --posix was included in the command-line options for bash.
-
labyrinth over 9 years: No, there is no shebang line in the script, so if it was not using the default login shell, it must have been for some other reason.
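To see rici's point in practice, one can write a small script with no shebang line and run it explicitly with bash and with sh. This sketch uses a hypothetical function name, my-func. Where sh is dash, or bash in POSIX mode on versions that enforce the name rule, the sh run is expected to report an error:

```shell
#!/usr/bin/env bash
# Write a script with NO shebang line that uses a hyphenated function name,
# then run it explicitly with bash and with sh.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT

cat > "$tmp" <<'EOF'
my-func() { echo "my-func ran"; }
my-func
EOF

echo "bash: $(bash "$tmp" 2>&1)"
echo "sh:   $(sh "$tmp" 2>&1)"
```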
-
rici over 9 years: @labyrinth: If the script has no shebang line, then it is not well defined which shell will be invoked to execute it. It might be /bin/sh (which is the behaviour of the execlp library call), or it might be the current shell (bash and ksh handle missing shebang lines this way). If the script was explicitly invoked with sh script, then of course it will be sh that executes it, regardless of the shebang line or $SHELL setting.
-
Ar5hv1r about 6 years: It would be helpful to include the Bash version you ran your script against.
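A small report at the top of a script can help here. It can record which shell is actually running the script, as opposed to $SHELL, along with its version and whether bash's POSIX mode is on. Below is a sketch in plain POSIX sh, so it also runs under dash. report_shell is a hypothetical name. BASH_VERSION and SHELLOPTS are only set by bash:

```shell
#!/bin/sh
# Report what is actually running this script, as opposed to $SHELL
# (which is only the user's login shell).
report_shell() {
  echo "login shell (\$SHELL): ${SHELL:-unset}"
  echo "invoked as (\$0): $0"
  if [ -n "${BASH_VERSION:-}" ]; then
    echo "running under bash $BASH_VERSION"
    # SHELLOPTS is bash's colon-separated list of enabled set -o options.
    case ":${SHELLOPTS:-}:" in
      *:posix:*) echo "POSIX mode: on" ;;
      *) echo "POSIX mode: off" ;;
    esac
  else
    echo "not running under bash"
  fi
}

report_shell
```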
-
PJ Eby about 6 years: You can't define a function with an all-number name, but you can start with a number. Also, some of the supposed illegalities here are because of bad quoting in the invocation of the function. For example, you can name a function * ... but the way this test does it, it produces an error. So the actual list of characters you can't have in the first place of a function name is smaller than the list here... but the list of second-place characters is longer, because again, this script isn't invoking the function correctly.
-
muru almost 6 years: @PJEby You can? On 4.4.18(1)-release, *() { echo foo; } gives me bash: syntax error near unexpected token `}', and \*() { echo foo; } gives bash: `\*': not a valid identifier. (Similarly for '*'() ...)
-
ilkkachu almost 6 years: You can use function * { echo hello; } (in Bash 4.4), but you probably need to escape it when calling it to avoid globbing. Same for ~ (tilde expansion), and ! (negates exit code). ? would be the same as *, if you have single-character filenames. You can also define a function %, but I can't find a way to call it, since just giving a word starting with % acts the same as running fg on it, even if escaped. And if you disable interactive_comments, you can use # as a function name only in an interactive shell (non-interactive shells always take # as a comment marker).
-
PJ Eby almost 6 years: @muru - you need to have a space between the * and the () for it to work. No `\` needed.
-
codeforester about 5 years: The rules for function names and variable names are slightly different. For example, function names can have a . or a - (and other special characters too) in them, but variable names cannot.
-
codeforester about 5 years: See this answer for a more accurate description of this.
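The points in the last few comments can be checked together. This sketch runs in bash's default mode only, and the names are hypothetical examples. It shows that . and - are accepted in function names, and that a function named * can be defined with the function keyword and called quoted. It also shows that the same characters are rejected in variable names:

```shell
#!/usr/bin/env bash
# Function names accept characters that variable names do not
# (bash default mode; not portable to sh or bash --posix).
my.func() { echo "my.func ran"; }
my-func() { echo "my-func ran"; }
function * { echo "star ran"; }   # keyword form, as suggested in the comments

my.func
my-func
'*'    # quoted, so it is not glob-expanded before the function lookup

# The same kinds of names are rejected as variable names:
declare my-var=1 2>/dev/null || echo "my-var: not a valid variable name"
declare my.var=1 2>/dev/null || echo "my.var: not a valid variable name"
```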
-
Ar5hv1r over 4 years: In --posix mode hyphens are not permitted (see §6.11), so supporting hyphens is a Bash-ism. I'm not sure from the manual what other characters Bash implicitly supports, but I know that : and . can also be used in function names (though :, at least, breaks tab-completion).
-
Keyur Padalia over 4 years: Ew, gross. Incidentally, the fact that : is permitted forms the basis of the shortest forkbomb.
-
Franklin Yu over 4 years: A question mark is sometimes allowed if the function is defined with the keyword function. For example, I have a function source? which checks for existence before sourcing.
-
niieani almost 4 years: It gets even crazier: you can even use emoji as function names. At least in bash 5, it works! 😱
-
xpusostomos over 2 years: It's not surprising that you can use a function name containing [, because in standard Unix [ is the name of a command, an alias for /bin/test.
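Franklin Yu's source? helper is not shown in the thread. A guess at what such a function might look like is below, and its body is my assumption. Because ? is a glob character, it is safest to call it quoted. Unquoted, source? would be replaced by any matching filename in the current directory, such as a file named sources:

```shell
#!/usr/bin/env bash
# A guess at a "source?" helper: source the file if it exists and is
# readable, otherwise do nothing and report success.
function source? {
  if [ -r "$1" ]; then
    source "$1"
  fi
}

conf=$(mktemp) || exit 1
printf 'greeting=hello\n' > "$conf"

'source?' "$conf"                 # quoted so ? is not glob-expanded
'source?' /nonexistent/file.conf  # silently skipped
echo "greeting=$greeting"
rm -f "$conf"
```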