Printing long integers in awk

Solution 1

I believe the underlying numeric format in this case is an IEEE double, so the changed value is the result of floating-point rounding. If you actually need to treat the large values as numbers and keep them exact, it might be better to use something like Perl, Ruby, or Python, which can handle arbitrary-precision arithmetic (possibly via extensions).
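
A minimal sketch of that rounding, assuming an awk that stores numbers as IEEE 754 doubles and is not running in an arbitrary-precision mode. The value from the question lies between 2^54 and 2^55, where consecutive doubles are 4 apart, so 19769904399993903 is stored as the nearest multiple of 4. `%.0f` is used instead of `%d` only because it prints the stored double consistently across awk implementations; the variable names are illustrative.

```shell
# Convert the field to a number, then print the double that awk actually stores.
stored=$(echo 19769904399993903 | awk '{ printf "%.0f\n", $1 + 0 }')

# Print the same field as a string; no numeric conversion takes place.
as_text=$(echo 19769904399993903 | awk '{ printf "%s\n", $1 }')

echo "stored as number: $stored"    # 19769904399993904
echo "kept as string:   $as_text"   # 19769904399993903
```

This also suggests why the number goes up by one rather than being truncated: the conversion picks the closest representable value, which here happens to be the next integer up.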

Solution 2

UPDATE: Recent versions of GNU awk (4.1 and later) support arbitrary-precision arithmetic through the -M (--bignum) option. See the GNU awk manual for more info.

ORIGINAL POST CONTENT: XMLgawk supports arbitrary precision arithmetic on floating-point numbers. So, if installing xgawk is an option:

zsh-4.3.11[drado]% awk --version |head -1; xgawk --version | head -1
GNU Awk 4.0.0
Extensible GNU Awk 3.1.6 (build 20080101) with dynamic loading, and with statically-linked extensions

zsh-4.3.11[drado]% awk 'BEGIN {
  x=665857
  y=470832
  print x^4 - 4 * y^4 - 4 * y^2
  }'
11885568

zsh-4.3.11[drado]% xgawk -lmpfr 'BEGIN {
  MPFR_PRECISION = 80
  x=665857
  y=470832
  print mpfr_sub(mpfr_sub(mpfr_pow(x, 4), mpfr_mul(4, mpfr_pow(y, 4))), 4 * y^2)
  }'
1.0000000000000000000000000

Solution 3

This was partially answered by @Mark Wilkins and @Dennis Williamson already, but I found out that integers stored in a 64-bit double are only guaranteed to be exact up to 2^53; beyond that, precision is lost. See awk's reference page: http://www.gnu.org/software/gawk/manual/gawk.html#Integer-Programming
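
The 2^53 limit can be checked with a short sketch, again assuming an awk that stores numbers as doubles and is not running in an arbitrary-precision mode. 2^53 + 1 has no exact double representation, so it collapses back to 2^53; the variable names are illustrative.

```shell
# 2^53 is exactly representable; 2^53 + 1 is not and rounds back to 2^53.
limit=$(awk 'BEGIN { printf "%.0f\n", 2^53 }')
beyond=$(awk 'BEGIN { printf "%.0f\n", 2^53 + 1 }')

echo "2^53     = $limit"    # 9007199254740992
echo "2^53 + 1 = $beyond"   # 9007199254740992
```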

(Sorry if my answer comes too late. I figured I'd still share it for the next person, before they spend too much time on this like I did.)

Solution 4

You're running into awk's floating-point representation issues. I don't think you can find a workaround within awk itself to perform arithmetic on huge numbers accurately.

The only possible (and crude) way I can think of is to break the huge number into smaller chunks, perform your math on the chunks, and join them again. Better yet, use a scripting language such as Perl/PHP/TCL/bsh that is more powerful than awk.
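
As a rough sketch of the chunking idea, the following adds a small integer to a long decimal string by splitting it into a high and a low part, so that neither part ever gets near 2^53. The 9-digit chunk size, the `add` variable and the other names are illustrative choices, not something from the original post, and the sketch assumes a non-negative input longer than 9 digits and an addend below 10^9.

```shell
# Add a small integer to a long decimal string by working on two chunks.
sum=$(echo 19769904399993903 | awk -v add=700000000 '{
    n  = length($1)
    hi = substr($1, 1, n - 9)        # leading digits
    lo = substr($1, n - 8)           # trailing 9 digits
    lo = lo + add                    # do the math on the low chunk only
    carry = int(lo / 1000000000)     # overflow past 9 digits
    lo = lo - carry * 1000000000     # keep 9 digits in the low chunk
    hi = hi + carry                  # propagate the carry
    printf "%d%09d\n", hi, lo        # rejoin, zero-padding the low chunk
}')

echo "$sum"    # 19769905099993903
```

Anything beyond simple addition (multiplication, division, negative values) needs noticeably more bookkeeping, which is why a language with built-in big integers is usually the easier route.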

Author: jaypal singh

Updated on July 16, 2022

Comments

  • jaypal singh
    jaypal singh almost 2 years

    I have a pipe-delimited feed file which has several fields. Since I only need a few, I thought of using awk to capture them for my testing purposes. However, I noticed that printf changes the value if I use "%d". It works fine if I use "%s".

    Feed File Sample:

    [jaypal:~/Temp] cat temp

    302610004125074|19769904399993903|30|15|2012-01-13 17:20:02.346000|2012-01-13 17:20:03.307000|E072AE4B|587244|316|13|GSM|1|SUCC|0|1|255|2|2|0|213|2|0|6|0|0|0|0|0|10|16473840051|30|302610|235|250|0|7|0|0|0|0|0|10|54320058002|906|722310|2|0||0|BELL MOBILITY CELLULAR, INC|BELL MOBILITY CELLULAR, INC|Bell Mobility|AMX ARGENTINA SA.|Claro aka CTI Movil|CAN|ARG|

    I am interested in capturing the second column which is 19769904399993903.

    Here are my tests:

    [jaypal:~/Temp] awk -F"|" '{printf ("%d\n",$2)}' temp
    19769904399993904   # Value is changed
    

    However, the following two tests work fine:

    [jaypal:~/Temp] awk -F"|" '{printf ("%s\n",$2)}' temp
    19769904399993903   # Value remains same
    
    [jaypal:~/Temp] awk -F"|" '{print $2}' temp
    19769904399993903   # Value remains same
    

    So is this a limitation of "%d", that it cannot handle long integers? If that's the case, why would it add one to the number instead of, say, truncating it?

    I have tried this with BSD and GNU versions of awk.

    Version Info:

    [jaypal:~/Temp] gawk --version
    GNU Awk 4.0.0
    Copyright (C) 1989, 1991-2011 Free Software Foundation.
    
    [jaypal:~/Temp] awk --version
    awk version 20070501
    
  • jaypal singh
    jaypal singh over 12 years
    Thanks Mark, so how can we handle such numbers with printf? It's not a show stopper for me but just wanted to know for learning purposes
  • Mark Wilkins
    Mark Wilkins over 12 years
    I don't think it is possible to represent a number that large accurately in awk. My understanding (which may be incorrect) is that awk always uses double precision to store numeric values. As long as you don't need to perform math operations, the best bet is to print/use them as strings (which you already found out).
  • jaypal singh
    jaypal singh over 12 years
    Thanks Anubhava. That sounds right, because when I do this at the command line, it prints fine:

    [jaypal:~/Temp] printf "%d" 19769904399993903
    19769904399993903
  • SourceSeeker
    SourceSeeker over 12 years
    Correct. According to info gawk: "The internal representation of all numbers, including integers, uses double-precision floating-point numbers. On most modern systems, these are in IEEE 754 standard format."
  • Peter Cordes
    Peter Cordes almost 9 years
    sourceforge.net/projects/gawkextlib/files/xgawk says that GNU awk 4.1 obsoletes xgawk as a separate binary. It recommends gawk with gawkextlib. And your xgawk link is dead. I wasn't sure which link would be best, so I didn't edit your post myself.