How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy?

Solution 1

The issue you're having between -0. and +0. is part of the specification of how floats are supposed to behave (IEEE754). In some circumstances, one needs this distinction. See, for example, the docs linked from the numpy docs for around.

It's also worth noting that the two zeros should compare as equal, so

np.array(-0.)==np.array(+0.) 
# True

That is, I think the problem is more likely with your uniqueness comparison. For example:

a = np.array([-1., -0., 0., 1.])
np.unique(a)
#  array([-1., -0.,  1.])

If you want to keep the numbers as floating point but have all the zeros the same, you could use:

x = np.linspace(-2, 2, 6)
#  array([-2. , -1.2, -0.4,  0.4,  1.2,  2. ])
y = x.round()
#  array([-2., -1., -0.,  0.,  1.,  2.])
y[y==0.] = 0.
#  array([-2., -1.,  0.,  0.,  1.,  2.])

# or, starting again from y = x.round():
y += 0.
#  array([-2., -1.,  0.,  0.,  1.,  2.])

Note, though, that you do have to do this bit of extra work, since you are trying to work around part of the floating-point specification.

Note also that this isn't due to a rounding error. For example,

np.fix(np.array(-.4)).tostring().encode('hex')
# '0000000000000080'
np.fix(np.array(-0.)).tostring().encode('hex')
# '0000000000000080'

That is, the resulting numbers are exactly the same, but

np.fix(np.array(0.)).tostring().encode('hex')
# '0000000000000000'

is different. This is why your method is not working: it's comparing the binary representations of the numbers, which differ for the two zeros. Therefore, I think the problem is more the method of comparison than the general idea of comparing floating point numbers for uniqueness.
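
For reference, the snippets above use Python 2's tostring()/.encode('hex'); on Python 3 the same byte-level check can be written along these lines (a sketch using tobytes() and bytes.hex() instead):

import numpy as np

np.fix(np.array(-.4)).tobytes().hex()
# '0000000000000080'   sign bit set
np.fix(np.array(-0.)).tobytes().hex()
# '0000000000000080'   same bytes as above
np.fix(np.array(0.)).tobytes().hex()
# '0000000000000000'   sign bit clear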

A quick timeit test for the various approaches:

data0 = np.fix(4*np.random.rand(1000000,)-2)
#   [ 1. -0.  1. -0. -0.  1.  1.  0. -0. -0. .... ]

N = 100
data = np.array(data0)
print timeit.timeit("data += 0.", setup="from __main__ import np, data", number=N)
#  0.171831846237
data = np.array(data0)
print timeit.timeit("data[data==0.] = 0.", setup="from __main__ import np, data", number=N)
#  0.83500289917
data = np.array(data0)
print timeit.timeit("data.astype(np.int).astype(np.float)", setup="from __main__ import np, data", number=N)
#  0.843791007996
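
The timing code above uses Python 2 print statements; on Python 3 the same comparison can be run roughly as follows (a sketch; absolute timings will vary by machine and numpy version, and np.int64 is spelled out because the bare np.int alias has been removed from recent numpy):

import timeit
import numpy as np

# Same test data and repeat count as above.
data0 = np.fix(4*np.random.rand(1000000,)-2)
N = 100

data = np.array(data0)
print(timeit.timeit("data += 0.", globals=globals(), number=N))
data = np.array(data0)
print(timeit.timeit("data[data==0.] = 0.", globals=globals(), number=N))
data = np.array(data0)
print(timeit.timeit("data.astype(np.int64).astype(np.float64)", globals=globals(), number=N))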

I agree with @senderle's point that if you want simple and exact comparisons and can get by with ints, ints will generally be easier. But if you want unique floats, you can do that too, though you need to do it a bit more carefully. The main issue with floats is that small differences can be introduced by calculations and may not show up in a normal print, but this isn't a huge barrier, especially not after a round, fix, or rint over a reasonable range of floats.
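
To make that concrete, one careful approach (a sketch; normalize_for_unique is just an illustrative name, not an existing numpy function) is to round to the precision you care about and then collapse -0. into +0. before doing the set-like operation:

import numpy as np

def normalize_for_unique(a, decimals=0):
    # Round to the requested precision, then add 0. so any -0. becomes +0.
    return np.round(a, decimals) + 0.

a = np.array([-1e-6, 0., 1., -0.4])
np.unique(normalize_for_unique(a))
#  array([0., 1.])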

Solution 2

I think the fundamental problem is that you're using set-like operations on floating-point numbers -- which is something to avoid as a general rule, unless you have a very good reason and a deep understanding of floating-point numbers.

The obvious reason to follow this rule is that even a very small difference between two floats registers as an absolute difference, so numerical error can cause set-like operations to produce unexpected results. Now, in your use case, it might initially seem that you've avoided that problem by rounding first, thereby limiting the range of possible values. But it turns out that unexpected results are still possible, as this corner case shows. Floating-point numbers are hard to reason about.
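
For context, the byte-level uniqueness test the question links to typically follows the ascontiguousarray/np.void recipe; a sketch of that general pattern (not the asker's exact function) shows how the two zeros end up as distinct rows:

import numpy as np

def unique_rows(a):
    # View each row as one opaque block of bytes, then deduplicate those blocks.
    row_bytes = np.ascontiguousarray(a).view(
        np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    _, idx = np.unique(row_bytes, return_index=True)
    return a[np.sort(idx)]

unique_rows(np.array([[0., 1.], [-0., 1.]]))
#  array([[ 0.,  1.],
#         [-0.,  1.]])   both rows kept, since 0. and -0. differ at the byte level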

I think the correct fix is to round and then to convert to int using astype.

>>> a
array([-0.5,  2. ,  0.2, -3. , -0.2])
>>> numpy.fix(a)
array([-0.,  2.,  0., -3., -0.])
>>> numpy.fix(a).astype(int)    # could also use 'i8', etc...
array([ 0,  2,  0, -3,  0])

Since you're already rounding, this shouldn't throw away any information, and it will be more stable and predictable for set-like operations later. This is one of those cases where it's best to use the correct abstraction!

If you need floats, you can always convert back. The only problem with this is that it creates another copy; but most of the time that's not really a problem. numpy is fast enough that the overhead of copying is pretty tiny!
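
For instance, continuing the interactive session above (just a sketch of the pattern, using the same array a), you can do the set-like work on the integer array and convert back only at the end:

>>> unique_ints = numpy.unique(numpy.fix(a).astype(int))
>>> unique_ints
array([-3,  0,  2])
>>> unique_ints.astype(float)    # the extra copy mentioned above
array([-3.,  0.,  2.])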

I'll add that if your case really demands the use of floats, then tom10's answer is a good one. But I feel that the number of cases in which both floats and set-like operations are genuinely necessary is very small.

Comments

  • Arash_D_B almost 2 years

    I have a simple question about the fix and floor functions in numpy. When rounding negative numbers that are larger than -1 towards zero, numpy rounds them off correctly to zero but leaves a negative sign. This negative sign interferes with my custom unique_rows function, since it uses ascontiguousarray to compare elements of the array and the sign disturbs the uniqueness. Both round and fix behave the same in this regard.

    >>> np.fix(-1e-6)
    Out[1]: array(-0.0)
    >>> np.round(-1e-6)
    Out[2]: -0.0
    

    Any insights on how to get rid of the sign? I thought about using the np.sign function but it comes with extra computational cost.

  • tom10 over 9 years
    I agree with your solution (so +1), but I think the reason is that the IEEE754 standard specifies 0. and -0. to be different (although they should compare as equal).
  • senderle over 9 years
    @tom10, the OP seems to be aware of that, don't you think? But it's even more complex than you've suggested because we're talking about rounding in particular. I have no idea what the standard specifies regarding signed zeros in any of the four rounding rules it defines. And presumably numpy could disregard those rules and round only to positive zero, if it wanted! I think these issues would be difficult regardless of the particular standard in use.
  • tom10 over 9 years
    I'll delete my comment and write my own answer. You clearly state here that the problem is "numerical error" and I'm trying to say that this isn't the issue. But I'll delete both of these comments in a few minutes so as not to muddy the waters.
  • senderle over 9 years
    @tom10, I don't see a need to delete your comment. I guess my answer wasn't clear enough -- but I didn't say that the problem is numerical error. I said that the problem is using floating point numbers in set-like operations -- period. I'll rephrase to clarify.
  • senderle over 9 years
    I agree that this is a good approach if it's necessary to stick with floats. (I wonder how it compares to Mark Ransom's idea of adding 0.0.) Also, I think that positive and negative zero come up as different because the uniqueness test linked to in the question casts the data to np.void.
  • Arash_D_B over 9 years
    Thanks to @Mark Ransom and @tom10. Adding 0.0 to the result of the fix or round command gets rid of the extra negative sign, for the reasons elaborated above. After resolving this issue I was able to write a Python function to find the unique rows in a numpy array, with the option of accepting a precision (number of decimals). This function can be found here.