Handy parsing for numbers with unit suffixes?

Solution 1

Based on my answer at one of the questions you linked to:

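awk '{
    # Position of the last character in "KMGTPEZY" (K=1, M=2, ...); 0 if there is no suffix
    ex = index("KMGTPEZY", substr($1, length($1)))
    # Strip the suffix only when one was found; substr() positions start at 1, not 0
    val = ex ? substr($1, 1, length($1) - 1) : $1

    prod = val * 10^(ex * 3)

    sum += prod
}
END {print sum}'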

Another method that's used:

sed 's/G/ * 1000 M/;s/M/ * 1000 K/;s/K/ * 1000/; s/$/ +\\/; $a0' | bc
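Both snippets above produce a single total. If the converted numbers should stay in the pipeline so that a later awk can sum them, the same index() trick can rewrite just the first field. This is a minimal sketch, not part of the original answer: the function name suffix_to_number and its optional base argument are made up for illustration, and it assumes the size is the first whitespace-separated field.

```shell
# Hypothetical helper: replace the first field of each line with a plain number.
# Usage: ... | suffix_to_number [base]
# base defaults to 1000 (as in the awk answer above); pass 1024 for binary units.
suffix_to_number() {
    awk -v base="${1:-1000}" '
    !NF { print; next }                 # pass blank lines through untouched
    {
        # K=1, M=2, G=3, ...; 0 means the field has no recognised suffix
        ex = index("KMGTPEZY", toupper(substr($1, length($1))))
        val = ex ? substr($1, 1, length($1) - 1) : $1
        # Assigning $1 rebuilds the line with single spaces between fields
        $1 = sprintf("%.0f", val * base ^ ex)
        print
    }'
}
```

It could then fill the gap in the question's pipeline, for example: ... | grep 200907 | suffix_to_number 1024 | awk '{s+=$1} END {print s}'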

Solution 2

You can use perl regular expressions to do this. For example,

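$value = 0;
# Anchor at the start of the line and capture only the suffix letter, so the
# whitespace and path that follow the size do not end up in $2
if ($line =~ /^\s*(\d+\.?\d*)([KMG]?)\s+/) {
   $amplifier = 1;                              # no suffix: plain number
   $amplifier = 1024 if ($2 eq 'K');
   $amplifier = 1024 * 1024 if ($2 eq 'M');
   $amplifier = 1024 * 1024 * 1024 if ($2 eq 'G');
   $value = $1 * $amplifier;
}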

This is a simple script that you can use as a starting point. Hope it helps!

Solution 3

Personally, I'd just not use the -h flag in the first place. The "human readable" version rounds off numbers which will need to be rounded again when you convert back, getting even less accurate. (For instance, 2.7MiB is 2831155.2 bytes. What did you do with the other 0.8th of a byte??!)
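As a rough sketch of that first suggestion: du -k reports sizes in whole 1024-byte units, so nothing has to be parsed back from rounded decimals. The function name sum_kib_as_bytes is made up for illustration, and it assumes the size is the first field of each line.

```shell
# Hypothetical helper: sum the first column of `du -k`-style input
# (sizes in units of 1024 bytes) and print the total in bytes.
sum_kib_as_bytes() {
    awk '{ s += $1 } END { printf "%.0f\n", s * 1024 }'
}
```

Usage would look like: du -k /var/crazyface/courses/* | grep 200907 | sum_kib_as_bytes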

Otherwise, you can ask the units utility to convert KiB/MiB/GiB to plain bytes ("B"), but you'd have to do something like this (assuming your output is tab-separated; otherwise adjust the cut accordingly):

{your output} | cut -f1 '-d{tab}' | xargs -I {} units -1t {}iB B | awk '{s+=$1}END{printf "%d\n",s}'

Solution 4

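VALUE=$1

# Expand each suffix into a multiplication chain, e.g. 2G -> 2*1024m -> 2*1024*1024k -> 2*1024*1024*1024
VALUE=${VALUE//[gG]/*1024m}
VALUE=${VALUE//[mM]/*1024k}
VALUE=${VALUE//[kK]/*1024}

# With the asterisks stripped, a valid size must be a positive integer (decimals are not supported)
[ "${VALUE//\*/}" -gt 0 ] && echo VALUE=$((VALUE)) || echo "ERROR: invalid size, please enter a correct size"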
Author by

Muhammad Danish

Updated on September 17, 2022

Comments

  • Muhammad Danish
    Muhammad Danish over 1 year

    Let's say you have data with quantities in human-readable format, such as the output of du -h, and want to further operate on those numbers. Let's say you want to pipe your data through grep to do a summation of a sub-set of that data. You do this ad-hoc on many systems you've never seen before, and have only minimal utilities. You want suffix conversions for all the standard 10^n suffixes.

    Is there a GNU/Linux utility that converts the suffixed numbers to real numbers within a pipeline? Do you have a bash function written to do this, or some perl which might be easy to remember, instead of a long chain of regex replacements or several sed steps?

    38M     /var/crazyface/courses/200909-90147
    2.7M    /var/crazyface/courses/200909-90157
    1.1M    /var/crazyface/courses/200909-90159
    385M    /var/crazyface/courses/200909-90161
    1.3M    /var/crazyface/courses/200909-90169
    376M    /var/crazyface/courses/200907-90171
    8.0K    /var/crazyface/courses/200907-90173
    668K    /var/crazyface/courses/200907-90175
    564M    /var/crazyface/courses/200907-90178
    4.0K    /var/crazyface/courses/200907-90179
    

    | grep 200907 | <amazing suffix conversion> | awk '{s+=$1} END {print s}'


    • Tony
      Tony over 8 years
      You rarely need to use grep and awk. If you are using awk, then use awk. Just add /200907/ in front of your per-line code, e.g. awk '/200907/{s+=$1} END {print s}'
  • Muhammad Danish
    Muhammad Danish about 13 years
    Indeed, this is one way. I've also found stackoverflow.com/questions/2557649/….
  • Muhammad Danish
    Muhammad Danish about 13 years
    Well noted that there is a loss of precision. Feeding the input to units also works, but I found units missing on my minimal distro! I think we'd all do this differently if we had full control of everything.
  • djuarez
    djuarez almost 5 years
    for the second method, what if the suffix is s?
  • Dennis Williamson
    Dennis Williamson almost 5 years
    @djuarez: What multiplier does the s stand for?
  • djuarez
    djuarez almost 5 years
    None, just extrapolating on other unit cases.
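
Two points from these comments can be combined in one awk call: filtering with a pattern instead of a separate grep, and deciding what to do with a suffix that has no known multiplier, such as "s". The following is only a sketch. The function name sum_matching, its arguments, and the choice to skip unknown suffixes with a warning are assumptions, not something taken from the answers above.

```shell
# Hypothetical helper: sum the sizes on lines matching a regex, in one awk call.
# Usage: ... | sum_matching REGEX [base]     (base defaults to 1000)
sum_matching() {
    awk -v pat="$1" -v base="${2:-1000}" '
    $0 ~ pat {
        last = substr($1, length($1))
        ex = index("KMGTPEZY", toupper(last))
        if (ex == 0 && last !~ /[0-9.]/) {
            # Unknown suffix such as "s": warn on stderr and leave it out of the sum
            print "skipping unrecognised size: " $1 > "/dev/stderr"
            next
        }
        val = ex ? substr($1, 1, length($1) - 1) : $1
        sum += val * base ^ ex
    }
    END { printf "%.0f\n", sum }'
}
```

With the sample data from the question this would be called as: du -h ... | sum_matching 200907 1024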