Calculate sum of several sizes of files in Bash

Solution 1

Use stat instead of du:

#!/bin/bash

# skip comment lines and dovecot index caches in the list
totalsize=0
for i in $(grep -v '^#' ~/cache_temp | grep -v 'dovecot.index.cache'); do
    [ -f "$i" ] && totalsize=$((totalsize + $(stat -c '%s' "$i")))
done
echo "totalsize: $totalsize bytes"
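
Note that -c is the GNU stat format flag; on BSD or macOS the equivalent option is -f with %z (an assumption about your platform, so adjust as needed):

[ -f "$i" ] && totalsize=$((totalsize + $(stat -f '%z' "$i")))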

Solution 2

If you need to read the list of files from a file, this snippet is hopefully efficient:

xargs -a cache_temp stat --format="%s" | paste -sd+ | bc -l

The xargs is there to avoid overflowing the argument-length limit while still packing the maximum number of files into each invocation of stat.
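
If the list needs the same filtering as in Solution 1, the pipeline can be fed through grep first; a sketch assuming GNU xargs, whose -d '\n' option splits strictly on newlines so paths with unusual characters survive:

grep -v '^#' ~/cache_temp | grep -v 'dovecot.index.cache' | xargs -d '\n' stat --format="%s" | paste -sd+ | bc -l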

Solution 3

According to du(1), there is a -c option whose purpose is to produce the grand total.

% du -chs * /etc/passwd
92K ABOUT-NLS
196K    NEWS
12K README
48K THANKS
8,0K    TODO
4,0K    /etc/passwd
360K    total
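
To apply the same -c flag to the question's list instead of a glob, GNU du can read NUL-separated names from stdin via --files0-from; a sketch assuming GNU coreutils, with tr rewriting the newline-separated list:

tr '\n' '\0' < ~/cache_temp | du -ch --files0-from=- | tail -1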

Solution 4

If you remove the "-h" flag from your "du" command, you'll get plain numeric sizes, in 1 KiB blocks by default rather than raw bytes (GNU du's "-b" gives bytes). You can then add them with the ((a += b)) syntax:

a=0
for i in $(find . -type f -print0 | xargs -0 du -s | awk '{print $1}')
do
  ((a += i))
done
echo $a

The -print0 and -0 flags to find/xargs use null-terminated strings to preserve whitespace.
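
The loop can also be collapsed into a single awk pass; a sketch assuming GNU du, whose -b flag reports the apparent size in bytes:

find . -type f -print0 | xargs -0 du -sb | awk '{ s += $1 } END { print s }'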

EDIT: turns out I type slower than @HBruijn comments!

Solution 5

Well... For better or worse, here's my implementation of this. I've always preferred using "while" to read lines from files.

#!/bin/bash

SUM=0
while read -r file; do
    SUM=$(( SUM + $(stat "$file" | awk '/Size:/ { print $2 }') ))
done < cache_temp
echo $SUM

Per janos' recommendation below:

#!/bin/bash

while read -r file; do
    stat "$file"
done < cache_temp | awk 'BEGIN { s=0 } $1 == "Size:" { s=s+$2 } END { print s }'
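
Parsing stat's default human-oriented output is fragile; with GNU stat you can request the size directly, as in Solution 1. A sketch assuming GNU coreutils, keeping the existence check from the question's script:

while read -r file; do
    [ -f "$file" ] && stat -c '%s' "$file"
done < cache_temp | awk '{ s += $1 } END { print s }'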

Author: Piduna

Updated on September 18, 2022

Comments

  • Piduna
    Piduna over 1 year

    I have list of files in a file, cache_temp.

    In file cache_temp:

    /home/maildir/mydomain.com/een/new/1491397868.M395935P76076.nm1.mydomain.com,S=1740,W=1777
    /home/maildir/mydomain.com/een/new/1485873821.M199286P14170.nm1.mydomain.com,S=440734,W=446889
    /home/maildir/mydomain.com/td.pr/cur/1491397869.M704928P76257.nm1.mydomain.com,S=1742,W=1779:2,Sb
    /home/maildir/mydomain.com/td.pr/cur/1501571359.M552218P73116.nm1.mydomain.com,S=1687,W=1719:2,Sa
    /home/maildir/mydomain.com/td.pr/cur/1498562257.M153946P22434.nm1.mydomain.com,S=1684,W=1717:2,Sb
    

    I have a simple script for getting the size of files from cache_temp:

    #!/bin/bash
    
    for i in `grep -v ^# ~/cache_temp | grep -v "dovecot.index.cache"`; do
        if [ -f "$i" ]; then
            size=$(du -sh "$i" | awk '{print $1}')
            echo $size
        fi
    done
    

    I have a list of sizes of files:

    4,0K
    4,0K
    4,0K
    432K
    4,0K
    

    How can I calculate the sum of them?

    • HBruijn
      HBruijn over 6 years
      Don't use the -h switch for basic size calculations; taking the K, M, or G suffixes into account is going to be horribly complex for a simple shell script. Simply adding the raw numbers is trivial: tldp.org/LDP/abs/html/arithexp.html
  • foxfabi
    foxfabi over 6 years
    or (( totalsize += $(stat -c "%s" "$i") ))
  • foxfabi
    foxfabi over 6 years
    And to just get the human-readable total size: du -chs * | tail -1 | cut -f1
  • Xen2050
    Xen2050 over 6 years
    @glennjackman Actually the "reason" link is here mywiki.wooledge.org/DontReadLinesWithFor but both links are useful
  • Xen2050
    Xen2050 over 6 years
    Good option, and so close to a full answer... combined with reading files from cache_temp and maybe xargs in case of large lines & add them... I guess you'd have shearn89's answer...
  • tripleee
    tripleee over 6 years
    The stat --format implies Linux which means you have xargs -a to avoid the explicit redirection if you like.
  • Matthew Ife
    Matthew Ife over 6 years
    @tripleee Now that's a valid criticism ;) I changed the suggestion.
  • janos
    janos over 6 years
    Please use modern $(...) subshells instead of backticks
  • Erik
    Erik over 6 years
    I see what you're saying... no sense running stat AND awk "wc -l cache_temp" times. Use awk once to roll everything up at the end.