What defines the maximum size for a single command argument?
Solution 1
Answers
- Definitely not a bug.
- The parameter which defines the maximum size for one argument is MAX_ARG_STRLEN. There is no documentation for this parameter other than the comments in binfmts.h:

/*
 * These are the maximum length and maximum number of strings passed to the
 * execve() system call. MAX_ARG_STRLEN is essentially random but serves to
 * prevent the kernel from being unduly impacted by misaddressed pointers.
 * MAX_ARG_STRINGS is chosen to fit in a signed 32-bit integer.
 */
#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
#define MAX_ARG_STRINGS 0x7FFFFFFF

As shown, Linux also has a (very large) limit on the number of arguments to a command.
- A limit on the size of a single argument (which differs from the overall limit on arguments plus environment) does appear to be specific to Linux. This article gives a detailed comparison of ARG_MAX and equivalents on Unix-like systems. MAX_ARG_STRLEN is discussed for Linux, but there is no mention of any equivalent on other systems. The above article also states that MAX_ARG_STRLEN was introduced in Linux 2.6.23, along with a number of other changes relating to command argument maximums (discussed below). The log/diff for the commit can be found here.
-
It is still not clear what accounts for the additional discrepancy between the result of getconf ARG_MAX and the actual maximum possible size of arguments plus environment. Stephane Chazelas' related answer suggests that part of the space is accounted for by pointers to each of the argument/environment strings. However, my own investigation suggests that these pointers are not created early in the execve system call, when it may still return an E2BIG error to the calling process (although pointers to each argv string are certainly created later). Also, the strings are contiguous in memory as far as I can see, so there are no memory gaps due to alignment here, although alignment is very likely a factor in whatever does use up the extra memory. Understanding what uses the extra space requires a more detailed knowledge of how the kernel allocates memory (which is useful knowledge to have, so I will investigate and update later).
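The single-argument limit is easy to demonstrate directly. The following is a sketch assuming Linux with 4 KiB pages, so MAX_ARG_STRLEN = 131072; the kernel's length check counts the terminating null byte, so 131071 characters is the longest string that fits:

```shell
# Assumes MAX_ARG_STRLEN = 131072 (PAGE_SIZE 4096 * 32). The kernel's
# length check includes the terminating NUL, so a 131071-character
# argument occupies exactly 131072 bytes and is accepted.
arg=$(head -c 131071 /dev/zero | tr '\0' 'x')
/bin/true "$arg" && echo "131071 bytes: accepted"
# One character more and execve() rejects the string with E2BIG.
/bin/true "${arg}x" 2>/dev/null || echo "131072 bytes: E2BIG"
```

/bin/true is used because the limit only applies to externally executed commands; a shell builtin would accept far more.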
ARG_MAX Confusion
Since Linux 2.6.23 (as a result of this commit), command argument maximums are handled in a way that makes Linux differ from other Unix-like systems. In addition to the new MAX_ARG_STRLEN and MAX_ARG_STRINGS, the result of getconf ARG_MAX now depends on the stack size and may differ from ARG_MAX in limits.h.
Normally the result of getconf ARG_MAX will be 1/4 of the stack size. Consider the following in bash, using ulimit to get the stack size:
$ echo $(( $(ulimit -s)*1024 / 4 )) # ulimit output in KiB
2097152
$ getconf ARG_MAX
2097152
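The same comparison can be made programmatically; a sketch assuming only the behaviour described above (ulimit -s reports KiB, getconf ARG_MAX reports bytes):

```shell
# Compare getconf ARG_MAX against a quarter of the soft stack limit.
stack_kib=$(ulimit -s)
if [ "$stack_kib" = unlimited ]; then
    echo "stack unlimited: getconf ARG_MAX = $(getconf ARG_MAX)"
elif [ "$(getconf ARG_MAX)" -eq $(( stack_kib * 1024 / 4 )) ]; then
    echo "getconf ARG_MAX = stack/4 = $(getconf ARG_MAX)"
else
    # The two can differ when the limits.h hard floor of 131072 applies.
    echo "getconf ARG_MAX = $(getconf ARG_MAX), stack/4 = $(( stack_kib * 1024 / 4 ))"
fi
```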
However, the above behaviour was changed slightly by this commit (added in Linux 2.6.25-rc4~121). ARG_MAX in limits.h now serves as a hard lower bound on the result of getconf ARG_MAX. If the stack size is set such that 1/4 of the stack size is less than ARG_MAX in limits.h, then the limits.h value will be used:
$ grep ARG_MAX /usr/include/linux/limits.h
#define ARG_MAX 131072 /* # bytes of args + environ for exec() */
$ ulimit -s 256
$ echo $(( $(ulimit -s)*1024 / 4 ))
65536
$ getconf ARG_MAX
131072
Note also that if the stack size is set lower than the minimum possible ARG_MAX, then the size of the stack (RLIMIT_STACK) becomes the upper limit on argument/environment size before E2BIG is returned (although getconf ARG_MAX will still show the value from limits.h).
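A quick way to observe both behaviours at once (assuming a kernel new enough to apply the limits.h floor) is to lower the stack limit in a throwaway subshell:

```shell
# Run in a subshell so the lowered limit does not stick to the shell.
(
    ulimit -s 100      # 100 KiB stack, i.e. stack/4 is well below 131072
    getconf ARG_MAX    # still reports the limits.h floor of 131072
    # ...but execve() fails long before 131072 bytes of args are reached.
    /bin/true "$(head -c 120000 /dev/zero | tr '\0' x)" 2>/dev/null \
        || echo "E2BIG below getconf ARG_MAX"
)
```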
A final thing to note is that if the kernel is built without CONFIG_MMU (support for memory management hardware), then the check against ARG_MAX is disabled, so that limit does not apply. MAX_ARG_STRLEN and MAX_ARG_STRINGS still apply, however.
Further Reading
- Related answer by Stephane Chazelas - https://unix.stackexchange.com/a/110301/48083
- A detailed page covering most of the above, including a table of ARG_MAX (and equivalent) values on other Unix-like systems - http://www.in-ulm.de/~mascheck/various/argmax/
- Seemingly the introduction of MAX_ARG_STRLEN caused a bug with Automake, which was embedding shell scripts into Makefiles using sh -c - http://www.mail-archive.com/[email protected]/msg05522.html
Solution 2
In eglibc-2.18/NEWS
* ARG_MAX is not anymore constant on Linux. Use sysconf(_SC_ARG_MAX).
Implemented by Ulrich Drepper.
In eglibc-2.18/debian/patches/kfreebsd/local-sysdeps.diff
+ case _SC_ARG_MAX:
+ request[0] = CTL_KERN;
+ request[1] = KERN_ARGMAX;
+ if (__sysctl(request, 2, &value, &len, NULL, 0) == -1)
+ return ARG_MAX;
+ return (long)value;
In linux/include/uapi/linux/limits.h
#define ARG_MAX 131072 /* # bytes of args + environ for exec() */
And 131072 is your $(getconf ARG_MAX)/16-1; perhaps you should start at 0.
You are dealing with glibc, and Linux. It would be good to also patch getconf in order to get the "right" ARG_MAX value returned.
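As a sanity check on the arithmetic (assuming the common defaults of an 8 MiB stack and 4 KiB pages), the /16 relationship observed in the question is a coincidence of those defaults:

```shell
echo $(( 8 * 1024 * 1024 / 4 ))        # stack/4: getconf ARG_MAX = 2097152
echo $(( 8 * 1024 * 1024 / 4 / 16 ))   # the question's ARG_MAX/16 = 131072
echo $(( 4096 * 32 ))                  # MAX_ARG_STRLEN = PAGE_SIZE * 32 = 131072
```

Change the stack limit and the /16 factor no longer holds, while MAX_ARG_STRLEN stays fixed.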
Edit:
To clarify a little (after a short but hot discussion):

The ARG_MAX constant which is defined in limits.h gives the max length of one argument passed with exec.

The getconf ARG_MAX command returns the maximum cumulative size of the arguments plus environment passed to exec.
Solution 3
So @StephaneChazelas rightly corrects me in the comments below - the shell itself does not dictate in any way the maximum argument size permitted by your system, but rather it's set by your kernel.
As several others have already said, it seems the kernel limits to 128 KiB the maximum size of a single argument you can hand to a new process when first execing it. You experience this problem specifically due to the many nested $(command substitution) subshells that must execute in place and hand the entirety of their output from one to the next.
And this one's kind of a wild guess, but as the ~5 KB discrepancy seems so close to the standard system page size, my suspicion is that it is dedicated to the page bash uses to handle the subshell your $(command substitution) requires to ultimately deliver its output, and/or the function stack it employs in associating your array table with your data. I can only assume neither comes free.
I demonstrate below that, while it might be a little tricky, it is possible to pass very large shell variable values off to new processes at invocation, so long as you can manage to stream it.
In order to do so, I primarily used pipes. But I also evaluated the shell array in a here-document pointed at cat's stdin. Results below.
But one last note - if you've no particular need for portable code, it strikes me that mapfile might simplify your shell jobs a little.
time bash <<-\CMD
( for arg in `seq 1 6533` ; do
printf 'args+=(' ; printf b%.0b `seq 1 6533` ; echo ')'
done ;
for arg in `seq 1 6533` ; do
printf %s\\n printf\ '%s\\n'\ \""\${args[$arg]}"\" ;
done ) | . /dev/stdin >&2
CMD
bash <<<'' 66.19s user 3.75s system 84% cpu 1:22.65 total
Possibly you could double this up and then do so again if you did it in streams - I'm not morbid enough to find out - but definitely it works if you stream it.
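A trivial sketch of the same principle: data far too large for any single execve() argument flows through a pipe untouched, because no argument list is ever built:

```shell
# 500000 bytes - well over the 128 KiB single-argument limit - but no
# process ever receives them as an argument, only on stdin.
head -c 500000 /dev/zero | tr '\0' 'x' | wc -c   # prints 500000
```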
I did try changing the printf generator part in line two to:
printf \ b%.0b
It also works:
bash <<<'' 123.78s user 5.42s system 91% cpu 2:20.53 total
So maybe I'm a little morbid. I use zero padding here and add in the previous "$arg" value to the current "$arg" value. I get way beyond 6500...
time bash <<-\CMD
( for arg in `seq 1 33` ; do
echo $arg >&2
printf 'args+=('"${args[$((a=arg-1))]}$(printf "%0${arg}0d" \
`seq 1 6533` ; printf $((arg-1)))"')\n'
done ;
for arg in `seq 1 33` ; do
printf '/usr/bin/cat <<HERE\n%s\nHERE\n' "\${args[$arg]}"
done ) | . /dev/stdin >&2
CMD
bash <<<'' 14.08s user 2.45s system 94% cpu 17.492 total
And if I change the cat line to look like this:
printf '/usr/bin/cat <<HERE | { printf '$arg'\ ; wc -c ;}
%s\nHERE\n' "\${args[$arg]}"
I can get byte counts from wc. Remember these are the sizes of each key in the args array. The array's total size is the sum of all these values.
1 130662
2 195992
3 261322
4 326652
5 391982
6 457312
7 522642
8 587972
9 653302
10 718633
11 783963
12 849293
13 914623
14 979953
15 1045283
16 1110613
17 1175943
18 1241273
19 1306603
20 1371933
21 1437263
22 1502593
23 1567923
24 1633253
25 1698583
26 1763913
27 1829243
28 1894573
29 1959903
30 2025233
31 2090563
32 2155893
33 2221223
Graeme
Updated on September 18, 2022

Comments
-
Graeme almost 2 years
I was under the impression that the maximum length of a single argument was not the problem here so much as the total size of the overall argument array plus the size of the environment, which is limited to ARG_MAX. Thus I thought that something like the following would succeed:

env_size=$(cat /proc/$$/environ | wc -c)
(( arg_size = $(getconf ARG_MAX) - $env_size - 100 ))
/bin/echo $(tr -dc [:alnum:] </dev/urandom | head -c $arg_size) >/dev/null
With the - 100 being more than enough to account for the difference between the size of the environment in the shell and the echo process. Instead I got the error:

bash: /bin/echo: Argument list too long
After playing around for a while, I found that the maximum was a full hex order of magnitude smaller:

/bin/echo \
  $(tr -dc [:alnum:] </dev/urandom | head -c $(($(getconf ARG_MAX)/16-1))) \
  >/dev/null
When the minus one is removed, the error returns. Seemingly the maximum for a single argument is actually ARG_MAX/16 and the -1 accounts for the null byte placed at the end of the string in the argument array.

Another issue is that when the argument is repeated, the total size of the argument array can be closer to ARG_MAX, but still not quite there:

args=( $(tr -dc [:alnum:] </dev/urandom | head -c $(($(getconf ARG_MAX)/16-1))) )
for x in {1..14}; do
  args+=( ${args[0]} )
done
/bin/echo "${args[@]}" "${args[0]:6534}" >/dev/null
Using "${args[0]:6533}" here makes the last argument 1 byte longer and gives the Argument list too long error. This difference is unlikely to be accounted for by the size of the environment, given:

$ cat /proc/$$/environ | wc -c
1045
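For what it's worth, the numbers above can be accounted for almost exactly by counting null terminators (this assumes getconf ARG_MAX = 2097152, as on the asker's system):

```shell
# 15 full arguments of 131071 bytes (each followed by a NUL in the
# argument array) plus the truncated 16th argument "${args[0]:6534}".
echo $(( 15 * (131071 + 1) + (131071 - 6534 + 1) ))   # 2090618: observed max
echo $(( 2097152 - 2090618 ))      # 6534: apparent shortfall from ARG_MAX
echo $(( 6534 - 1045 ))            # ~5 KB unexplained after the 1045-byte env
```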
Questions:
- Is this correct behaviour, or is there a bug somewhere?
- If not, is this behaviour documented anywhere? Is there another parameter which defines the maximum for a single argument?
- Is this behaviour limited to Linux (or even particular versions of such)?
- What accounts for the additional ~5KB discrepancy between the actual maximum size of the argument array, plus the approximate size of the environment, and ARG_MAX?
Additional info:
uname -a
Linux graeme-rock 3.13-1-amd64 #1 SMP Debian 3.13.5-1 (2014-03-04) x86_64 GNU/Linux
-
Stéphane Chazelas over 10 years
On Linux, it's hard coded to 32 pages (128 KiB). See MAX_ARG_STRLEN in the source.
-
Stéphane Chazelas over 10 years
Most of the information you're looking for is in this answer to CP: max source files number arguments for copy utility
-
derobert over 10 years
At least on my machine, getconf ARG_MAX depends on the current ulimit -s. Set it to unlimited, and get an amazing 4611686018427387903 for ARG_MAX.
-
Ciro Santilli Путлер Капут 六四事 about 9 years
-
Stéphane Chazelas over 10 years
That ARG_MAX is the minimum guaranteed for the arg+env size limit; it's not the max size of a single argument (though it happens to be the same value as MAX_ARG_STRLEN).
-
Stéphane Chazelas over 10 years
No, nothing to do with the shell; it's the execve(2) system call returning E2BIG when a single argument is over 128 KiB.
-
Graeme over 10 years
Consider also that there is no limit on shell builtins - echo $(tr -dc [:alnum:] </dev/urandom | head -c $(($(getconf ARG_MAX)*10))) >/dev/null will run fine. It is only when you use an external command that there is a problem.
Graeme over 10 years
Do you have a date for your eglibc-2.18/NEWS snippet? It would be good to pin this down to a particular kernel version.
user312543 over 10 years
@StephaneChazelas: I'm just too lazy to find the part, but if arg exceeds the max value it isn't necessary to figure out the env size.
-
user312543 over 10 years
@Graeme: I also have some older Linuxes running where the getconf value shows 131072. I think this belongs to newer Linuxes with eglibc > ?? only. Congrats, you found a bug BTW.
-
mikeserv over 10 years
@Graeme Well, I did this with cat as well - no problem. The variable is evaluated in a heredoc at the end. See my last edit. I did cut down the total count to 33 because I'm adding in the last value each time. And the zero padding...
-
Stéphane Chazelas over 10 years
You're looking at glibc code; that's irrelevant here. The libc doesn't care what size of arguments you're passing. The code you're quoting is about sysconf, an API that gives users an idea of the maximum size (whatever that means) of argv+env passed to an execve(2). It's the kernel that accepts or not the arg and env list passed along an execve() system call. The getconf ARG_MAX is about the cumulative size of arg+env (variable in recent Linux, see ulimit -s and the other question I linked); it's not about the max length of a single arg, for which there's no sysconf/getconf query.
user312543 over 10 years
@Graeme: 131072 on Ubuntu 8.04
-
mikeserv over 10 years
@StephaneChazelas - so am I getting around that by evaluating the argument in a heredoc stream? Or is bash compressing it somehow?
Graeme over 10 years
The error message will be printed by the shell when whichever exec function it uses returns E2BIG. The Argument list too long part will come from strerror or perror.
user312543 over 10 years
I think this is right; bash tried to exec echo with arguments, and strerror and perror are in glibc.
-
Stéphane Chazelas over 10 years
@mikeserv, I can't see anywhere in your code any instance of you executing a command with a large arg list. printf is a builtin so is not executed, and AFAICT, your cat is not given any argument.
Graeme over 10 years
But ultimately the E2BIG error and the error strings are defined in the kernel source. Look at the copy_strings function in fs/exec.c; this is where the error originates. This is the code that Stephane has looked at.
mikeserv over 10 years
@StephaneChazelas It's given the heredoc - just stdin - that's true, but the heredoc is only given "${args[$arg]}". But I think I get your meaning - that's just an iohere call - a pipe - not an execve, right? So it would just be a shell builtin as well, if not system itself. That's something to think about. Thanks.
-
mikeserv over 10 years
@StephaneChazelas - here they are - in bash and zsh both. I do know that in dash they're anonymous pipes. POSIX only specifies the iohere, so I guess it could be anything depending on the shell. I wonder if I could do this in dash with set --. It shouldn't make any difference, right?
-
mikeserv over 10 years
Nah. I'm done. I don't even use arrays - or bash for that matter. I was just curious about how to keep it from exploding. Guess I get the gist now.
-
mikeserv over 10 years
@StephaneChazelas - good point though; that's the wrong place for the heredoc. Bash must be writing in the whole key=value every time to the tmpfile. A little finagling and a file descriptor and I could switch the pipe and the heredoc maybe... C'est la vie, I guess; I only put it in there because the quotes were getting messy.
-
user312543 over 10 years
After a short edit, I'm done with this topic.
-
Graeme over 10 years
The maximum for a single argument is MAX_ARG_STRLEN. If you're still not convinced, see the comments in binfmts.h. The ARG_MAX in limits.h is a hard lower bound on the maximum size of the argument/environment array. Normally this is a function of the stack size, which may be changed - this is what you get with getconf ARG_MAX.
. -
mikeserv over 10 years
This is a good answer, certainly better than mine - I upvoted it. But the answer we ask for isn't always the answer we should get - that's why we're asking, because we don't know. It doesn't address the problem with your workflow that brought you head to head with this issue in the first place. I demonstrate how that might be mitigated in my own answer, and how single shell variable string arguments over 2 MB in length can be passed to newly execed processes with just a couple of lines of shell script.
-
nh2 over 5 years
I've made a Python script that demonstrates the 32 * 4KB pages = 128 KB limit of environment variables on default Linux.