Calculating a round order of magnitude

11,289

Solution 1

Your log10/floor code is perfectly readable, and its performance cost will likely be dwarfed by that of the string formatting you will subsequently be doing on your output.

However, suppose you were to really need the performance...

Note that log10(x) == log2(x) / log2(10) == log2(x) * 1/log2(10)

1/log2(10) is a constant

log2(x) can usually be performed cheaply in the integer pipeline on modern architectures using instructions such as CLZ or a bit twiddling hack, yielding a number between 0 and 63 for a 64-bit integer. That fits in 6 bits, leaving us up to 58 bits after the radix point usable for fixed point arithmetic in a 64-bit type.

So we can then use fixed-point arithmetic to find the log10:

unsigned long long integer_log10( unsigned long long _in )
{
    unsigned long long log10fp6x58 = 0x134413509f79ff0llu; // (unsigned long long) (double(1llu<<58) / log2(10.0))
    return (((integer_log2(_in)) * log10fp6x58)+(1llu<<57)) >> 58;
}

The implementation of integer_log2 is compiler/platform-dependent; e.g. on GCC/PowerPC, it's

unsigned long long integer_log2( unsigned long long _in )
{
    return 63 - __cntlzd(_in);
}

This approach can be generalised for finding the logarithm of any base, simply calculate the appropriate constant as described above.

Solution 2

This is the most straightforward and simple method i can think of ... and maybe it will be a bit faster than computing the logarithm:

postfixes = {{1e12, "T"},
             {1e9,  "G"},
             {1e6,  "M"},
             {1e3,  "K"}}

for each postfix in postfixes{
    if(x > postfix.value){
        return (x / postfix.value) + postfix.letter;
    }
}

return x;

Solution 3

Convert the number into a string and use the strings length. This is certainly not faster, but will be very accurate. You can then go on and use the string directly to build the result by slicing it appropriately.

Solution 4

Don't fiddle with the number, instead s(n)printf the number into a string using "%E", then substitute appropriately for E+00 E+03 E+09 (etc) (IIRC, you should only get powers of 3 with scientific notation - which is what you want).

char number_buff[30];
snprintf(number_buff, 29, "%E", x);
char *powered_number_string = substitute_powers(number_buff);

char *substitute_powers(const char *number_buff) is messy in C.

sed would be something like

-e s/E+0// -e s/E+3/K/ -e s/E+6/M/ -e s/E+9/G/

Share:
11,289
tstenner
Author by

tstenner

Updated on June 05, 2022

Comments

  • tstenner
    tstenner almost 2 years

    For a simple project I have to make large numbers (e.g. 4294967123) readable, so I'm writing only the first digits with a prefix (4294967123 -> 4.29G, 12345 -> 12.34K etc.)

    The code (simplified) looks like this:

    const char* postfixes=" KMGT";
    char postfix(unsigned int x)
    {
         return postfixes[(int) floor(log10(x))];
    }
    

    It works, but I think that there's a more elegant/better solution than computing the full precision logarithm, rounding it and casting it down to an int again.

    Other solutions I thought of:

    int i=0;
    for(; x >= 1000 ; ++i) x/=1000;
    return postfixes[i];
    

    (This is significantly slower, but easier to read)

    The numbers are distributed between according to Benford's Law and the number should be treated as unsigned 64 bit-number, as there should be no rounding error near 10^x (e.g. in python math.log(1000,10) returns 2.999996, which gets floored to 2). Is there any fast, accurate other way I'm missing?

  • tstenner
    tstenner over 14 years
    it works, however it is way slower than the for-loop and it it's quite hard to understand in the beginning.
  • Drakosha
    Drakosha over 14 years
    It'll be probably better to change: unsigned long long log10fp6x58 => static const unsigned long long log10fp6x58
  • Josh The Geek
    Josh The Geek almost 11 years
    How would I use this on a modern setup with Clang/LLVM and Intel? (OS X)
  • Isaac
    Isaac over 9 years
    This is a nice solution but the approximation for 1/log_2_(10) means the result isn't guaranteed to be correct. A fast, accurate solution would be to include a static array with [1, 10^3, 10^6, etc], and a matching ["", "K", "M", "G", ...]. Then at runtime, use binary search to find the closest match in the first array, and index into the second array for the char.
  • Mike Kaganski
    Mike Kaganski about 2 years
    Not only the approximation for 1/log_2_(10), but also the absolutely incorrect result because of the implementation of integer_log2, which can only return 64 different results, and so will return the same end result for any number with given most significant bit set.