Portable way to get file size (in bytes) in the shell

Solution 1

wc -c < filename (short for word count; -c prints the byte count) is a portable, POSIX solution. Only the output format may differ across platforms: some spaces may be prepended (as is the case on Solaris).

Do not omit the input redirection. When the file is passed as an argument, the file name is printed after the byte count.

I was worried it wouldn't work for binary files, but it works OK on both Linux and Solaris. You can try it with wc -c < /usr/bin/wc. Moreover, POSIX utilities are guaranteed to handle binary files, unless specified otherwise explicitly.
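As a minimal sketch (the file path is made up for illustration), the redirection form can be combined with a whitespace strip for platforms like Solaris that pad the count:

```shell
# Create a small demo file (hypothetical path, for illustration only).
printf 'hello' > /tmp/sizedemo.txt

# With input redirection, wc prints only the byte count,
# possibly padded with leading spaces on some platforms (e.g. Solaris).
size=$(wc -c < /tmp/sizedemo.txt)

# Re-expanding without quotes strips any leading/trailing whitespace.
size=$(echo $size)

echo "$size"   # 5
```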

Solution 2

I ended up writing my own program (really small) to display just the size. More information is in bfsize - print file size in bytes (and just that).

The two cleanest ways in my opinion with common Linux tools are:

stat -c %s /usr/bin/stat

50000


wc -c < /usr/bin/wc

36912

But I just don't want to be typing parameters or piping the output just to get a file size, so I'm using my own bfsize.
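If typing the format string each time is the annoyance, a tiny wrapper function gives the same one-word command without a separate binary. A sketch assuming GNU coreutils stat (the function name fsize is made up):

```shell
# fsize: print a file's size in bytes (assumes GNU coreutils stat).
fsize() { stat -c %s -- "$1"; }

printf 'abc' > /tmp/fsize_demo
fsize /tmp/fsize_demo   # 3
```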

Solution 3

Even though du usually prints disk usage and not actual data size, the GNU Core Utilities du can print a file's "apparent size" in bytes:

du -b FILE

But it won't work under BSD, Solaris, macOS, etc.
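A short sketch of what this looks like in practice (GNU du only; the demo path is made up). du separates the size and the name with a tab, so cut -f1 extracts just the number:

```shell
# Ten bytes, no trailing newline (GNU du assumed).
printf '1234567890' > /tmp/du_demo

# -b is shorthand for --apparent-size --block-size=1.
# Output is "<bytes><TAB><name>"; cut's default delimiter is the tab.
size=$(du -b /tmp/du_demo | cut -f1)
echo "$size"   # 10
```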

Solution 4

BSD systems have stat with different options from the GNU Core Utilities one, but with similar capabilities.

stat -f %z <file name>

This works on macOS (tested on 10.12), FreeBSD, NetBSD and OpenBSD.
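For scripts that must run on both families, one common pattern is to try the GNU syntax first and fall back to the BSD one, with wc -c as a last resort. A sketch, not a definitive implementation (the function name filesize is made up):

```shell
# filesize: try GNU stat, then BSD stat, then fall back to wc -c.
# Error output of the failing variant is discarded.
filesize() {
    stat -c %s -- "$1" 2>/dev/null \
        || stat -f %z "$1" 2>/dev/null \
        || wc -c < "$1"
}

printf 'abcd' > /tmp/stat_demo
filesize /tmp/stat_demo   # 4
```

Note that the wc -c fallback may pad the number with spaces on some platforms, so strip whitespace before comparing.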

Solution 5

When processing ls -n output, as an alternative to poorly portable shell arrays, you can use the positional arguments, which form the only array and are the only local variables in the standard shell. Wrap the overwriting of the positional arguments in a function to preserve the original arguments to your script or function.

getsize() { set -- $(ls -dn "$1") && echo $5; }
getsize FILE

This splits the output of ls -dn according to the current IFS environment variable settings, assigns it to the positional arguments and echoes the fifth one. The -d ensures directories are handled properly and the -n ensures that user and group names do not need to be resolved, unlike with -l. Also, user and group names containing whitespace could theoretically break the expected line structure; they are usually disallowed, but this possibility still makes the programmer stop and think.
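A quick demonstration that the function's set -- does not clobber the caller's positional parameters, since the positional parameters are local to a function in every POSIX shell (file path and sample arguments are made up):

```shell
getsize() { set -- $(ls -dn "$1") && echo $5; }

set -- first second              # caller's positional parameters
printf 'xyzzy' > /tmp/ls_demo    # 5 bytes

getsize /tmp/ls_demo             # 5
echo "$1 $2"                     # first second: unchanged by the call
```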

Updated on July 14, 2022
Comments

  • Admin
    Admin almost 2 years

    On Linux, I use stat --format="%s" FILE, but the Solaris machine I have access to doesn't have the stat command. What should I use then?

    I'm writing Bash scripts and can't really install any new software on the system.

    I've considered already using:

    perl -e '@x=stat(shift);print $x[7]' FILE
    

    or even:

    ls -nl FILE | awk '{print $5}'
    

    But neither of these looks sensible - running Perl just to get file size? Or running two programs to do the same?

  • Admin
    Admin over 14 years
    Well, I do a lot of Perl writing myself, but sometimes the tool is chosen for me, not by me :)
  • caf
    caf over 14 years
    Or just wc -c < file if you don't want the filename appearing.
  • Admin
    Admin about 13 years
    First line of problem description states that stat is not an option, and the wc -c is the top answer for over a year now, so I'm not sure what is the point of this answer.
  • jmtd
    jmtd about 13 years
    If I'm not mistaken, though, wc in a pipeline must read() the entire stream to count the bytes. The ls/awk solutions (and similar) use a system call to get the size, which should be linear time (versus O(size))
  • Camilo Martin
    Camilo Martin almost 12 years
    I recall wc being very slow the last time I did that on a full hard disk. It was slow enough that I could re-write the script before the first one finished, came here to remember how I did it lol.
  • yo'
    yo' over 11 years
    The point is in people like me who find this SO question in Google and stat is an option for them.
  • Robert Calhoun
    Robert Calhoun about 11 years
    I'm working on an embedded system where wc -c takes 4090 msec on a 10 MB file vs "0" msec for stat -c %s, so I agree it's helpful to have alternative solutions even when they don't answer the exact question posed.
  • Orwellophile
    Orwellophile about 11 years
    "stat -c" is not portable / does not accept the same arguments on MacOS as it does on Linux. "wc -c" will be very slow for large files.
  • Haravikk
    Haravikk almost 11 years
    I wouldn't use wc -c; it looks much neater, but ls + awk is better for speed/resource use. Also, note that you need to post-process the result of wc, because on some systems it has whitespace before the number, which you may need to strip before you can do comparisons.
  • Palec
    Palec about 10 years
    FYI maxdepth is not needed. It could be rewritten as size=$(test -f filename && find filename -printf '%s').
  • SourceSeeker
    SourceSeeker about 10 years
    @Palec: The -maxdepth is intended to prevent find from being recursive (since the stat which the OP needs to replace is not). Your find command is missing a -name and the test command isn't necessary.
  • Palec
    Palec about 10 years
    @DennisWilliamson find searches its parameters recursively for files matching given criteria. If the parameters are not directories, the recursion is… quite simple. Therefore I first test that filename is really an existing ordinary file, and then I print its size using find that has nowhere to recurse.
  • pbies
    pbies almost 10 years
    stat gives the size of a locked file, whereas wc does not (e.g. c:\pagefile.sys under Cygwin on Windows).
  • Rdpi
    Rdpi over 9 years
    What would be then the best option to print the result in human friendly format? e.g. MB, KB
  • Admin
    Admin almost 9 years
    stat is not portable either. stat -c %s /usr/bin/stat gives: stat: illegal option -- c; usage: stat [-FlLnqrsx] [-f format] [-t timefmt] [file ...]
  • Orwellophile
    Orwellophile almost 9 years
    I did say that. Try my answer; based on ls, it should be quite portable: stackoverflow.com/a/15522969/912236
  • Silas
    Silas over 8 years
    wc -c is great, but it will not work if you don't have read access to the file.
  • Jose Alban
    Jose Alban about 8 years
    On MacOS X, brew install coreutils and gdu -b will achieve the same effect
  • Luciano
    Luciano about 8 years
    Or put it in a shell script: ls -Lon "$1" | awk '{ print $4 }'
  • Orwellophile
    Orwellophile almost 8 years
    @Luciano I think you have totally missed the point of not forking and doing a task in bash rather than using bash to string a lot of unix commands together in an inefficient fashion.
  • CousinCocaine
    CousinCocaine over 7 years
    I prefer this method because wc needs to read the whole file before giving a result; du is immediate.
  • Palec
    Palec almost 7 years
    The stat and ls utilities just execute the lstat syscall and get the file length without reading the file. Thus, they do not need the read permission and their performance does not depend on the file's length. wc actually opens the file and usually reads it, making it perform much worse on large files. But GNU coreutils wc optimizes when only the byte count of a regular file is wanted: it uses the fstat and lseek syscalls to get the count. See the comment with (dd ibs=99k skip=1 count=0; ./wc -c) < /etc/group in its source.
  • Palec
    Palec almost 7 years
    find . -maxdepth 1 -type f -name filename -printf '%s' works only if the file is in the current directory, and it may still examine each file in the directory, which might be slow. Better use (even shorter!) find filename -maxdepth 1 -type f -printf '%s'.
  • Palec
    Palec almost 7 years
    POSIX mentions du -b in a completely different context in du rationale.
  • Palec
    Palec almost 7 years
    Actually, units could be converted, but this shows disk usage instead of file data size ("apparent size").
  • Palec
    Palec almost 7 years
    This shows disk usage instead of file data size ("apparent size").
  • Palec
    Palec almost 7 years
    This uses just the lstat call, so its performance does not depend on file size. Shorter than stat -c '%s', but less intuitive and works differently for folders (prints size of each file inside).
  • Palec
    Palec almost 7 years
    FreeBSD du can get close using du -A -B1, but it still prints the result in multiples of 1024B blocks. I did not manage to get it to print a byte count. Even setting BLOCKSIZE=1 in the environment does not help, because 512B blocks are used then.
  • Palec
    Palec almost 7 years
    Solaris does not have stat utility at all, though.
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    I always wondered why the stat CLI utility was never included in POSIX.
  • alper
    alper over 5 years
    Would it be efficient to use wc -c < FILE for very large files such as 100GB ? @Carl Smotricz
  • Carl Smotricz
    Carl Smotricz over 5 years
    @alper: I haven't tested, but I suspect that redirecting a large file like this is terribly slow. My answer was about measuring file size portably, not efficiently. For a quick size based on the directory data, you'd probably be better off looking at some of the other answers here.
  • Jason Martin
    Jason Martin about 3 years
    Busybox doesn't support that structure: stat: unrecognized option: % BusyBox v1.32.1 () multi-call binary.
  • Andrew Henle
    Andrew Henle over 2 years
    Use the flag --b M or --b G for the output in megabytes or gigabytes. Note, though, that neither of those is portable. pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/…
  • Peter Mortensen
    Peter Mortensen over 2 years
    Where does it work? Only on Linux?