numpy boolean array with 1 bit entries
Solution 1
You might like to take a look at bitstring (documentation here).
If you create a ConstBitArray or ConstBitStream from a file, it will use mmap and not load the file into memory. In that case it won't be mutable, so if you want to make changes it will have to be loaded into memory.
For example, to create one without loading the file into memory:
>>> import bitstring
>>> a = bitstring.ConstBitArray(filename='your_file')
or
>>> b = bitstring.ConstBitStream(a_file_object)
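If bitstring isn't an option, a rough NumPy-only analogue is to memory-map the packed bytes with np.memmap and extract single bits on demand. This is a sketch, assuming the bits were written with np.packbits in its default big-endian bit order (first entry in the most significant bit). The file name and the get_bit helper are hypothetical. Also note that the original bit count has to be stored separately, because the file is padded to whole bytes.

```python
import os
import tempfile

import numpy as np

# Pack a small boolean array and write the packed bytes to disk (hypothetical file name).
bits = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1], dtype=bool)
path = os.path.join(tempfile.mkdtemp(), "bits.bin")
np.packbits(bits).tofile(path)  # 10 bits -> 2 bytes on disk

# Memory-map the file read-only: pages are read on demand, not loaded up front.
packed = np.memmap(path, dtype=np.uint8, mode="r")

def get_bit(mm, i):
    """Return bit i, assuming packbits' default big-endian bit order (MSB first)."""
    byte = int(mm[i // 8])
    return bool((byte >> (7 - i % 8)) & 1)

print([get_bit(packed, i) for i in range(bits.size)])
```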
Solution 2
To do this you can use numpy's packbits and unpackbits:
import numpy as np
# original boolean array
A1 = np.array([
[0, 1, 1, 0, 1],
[0, 0, 1, 1, 1],
[1, 1, 1, 1, 1],
], dtype=bool)
# packed data
A2 = np.packbits(A1, axis=None)
# checking the size
print(A1.nbytes) # 15 bytes
print(A2.nbytes) # 2 bytes (ceil(15/8))
# reconstructing from packed data: count drops the padding bits, reshape restores the shape
A3 = np.unpackbits(A2, count=A1.size).reshape(A1.shape).view(bool)
# and the arrays are equal
print(np.array_equal(A1, A3)) # True
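Because the packed array no longer remembers its shape or exact length, it can help to keep the shape next to it. Below is a minimal sketch, with hypothetical helper names pack_bool and unpack_bool, that wraps the calls above (it needs NumPy 1.17 or later for count) and checks the 8x saving on a million entries.

```python
import numpy as np

def pack_bool(arr):
    """Pack a boolean array at 8 entries per byte; also return its shape."""
    return np.packbits(arr, axis=None), arr.shape

def unpack_bool(packed, shape):
    """Inverse of pack_bool (requires NumPy >= 1.17 for the count argument)."""
    n = int(np.prod(shape))
    return np.unpackbits(packed, count=n).reshape(shape).view(bool)

rng = np.random.default_rng(0)
big = rng.random((1000, 1000)) < 0.5  # 1,000,000 booleans, 1 byte each

packed, shape = pack_bool(big)
print(big.nbytes, packed.nbytes)  # 1000000 125000
restored = unpack_bool(packed, shape)
print(np.array_equal(big, restored))  # True
```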
Prior to NumPy 1.17.0, packbits was just as straightforward to use, but reconstruction required additional manipulation, because unpackbits had no count parameter to drop the padding bits. Here is an example:
import numpy as np
# original boolean array
A1 = np.array([
[0, 1, 1, 0, 1],
[0, 0, 1, 1, 1],
[1, 1, 1, 1, 1],
], dtype=bool)
# packed data
A2 = np.packbits(A1, axis=None)
# checking the size
print(A1.nbytes) # 15 bytes
print(A2.nbytes) # 2 bytes (ceil(15/8))
# reconstructing from packed data: slice off the padding bits, then reshape
A3 = np.unpackbits(A2, axis=None)[:A1.size].reshape(A1.shape).astype(bool)
# and the arrays are equal
print(np.array_equal(A1, A3)) # True
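The slice is needed because packbits pads the last byte with zero bits, and unpackbits without count always returns a multiple of 8 entries. A tiny sketch of that padding on a 3-bit array:

```python
import numpy as np

bits = np.array([1, 0, 1], dtype=bool)  # 3 bits
packed = np.packbits(bits)              # padded with zeros to a full byte
print(packed)                           # [160]  (0b10100000)

unpacked = np.unpackbits(packed)        # always a multiple of 8 entries
print(unpacked)                         # [1 0 1 0 0 0 0 0]

# Trim the padding with a slice (what count= does in NumPy >= 1.17)
restored = unpacked[:bits.size].astype(bool)
print(restored)                         # [ True False  True]
```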
Solution 3
You want a bitarray:
efficient arrays of booleans -- C extension
This module provides an object type which efficiently represents an array of booleans. Bitarrays are sequence types and behave very much like usual lists. Eight bits are represented by one byte in a contiguous block of memory. The user can select between two representations; little-endian and big-endian. All of the functionality is implemented in C. Methods for accessing the machine representation are provided. This can be useful when bit level access to binary files is required, such as portable bitmap image files (.pbm). Also, when dealing with compressed data which uses variable bit length encoding, you may find this module useful...
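The little-/big-endian choice mentioned above refers to the order of bits inside each byte. NumPy 1.17 and later expose the same choice through the bitorder parameter of packbits/unpackbits. This matters if you move raw bytes between bitarray and NumPy: both sides must agree on the bit order. The sketch below is NumPy-only and does not use bitarray itself.

```python
import numpy as np

bits = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)

big = np.packbits(bits, bitorder="big")        # first entry -> most significant bit
little = np.packbits(bits, bitorder="little")  # first entry -> least significant bit
print(big, little)  # [192] [3]

# Unpacking must use the same bit order, otherwise each byte comes back reversed.
print(np.unpackbits(little, bitorder="little").astype(bool).tolist() == bits.tolist())  # True
print(np.unpackbits(little).astype(bool).tolist() == bits.tolist())  # False
```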
Andrea Zonca
Updated on January 07, 2022
Comments
-
Andrea Zonca over 2 years
Is there a way in numpy to create an array of booleans that uses just 1 bit for each entry?
The standard np.bool type is 1 byte, but this way I use 8 times the required memory. On Google I found that C++ has std::vector<bool>.
-
Salvador Dali almost 7 years
-
Andrea Zonca about 13 years: Thanks! It looks very useful. The only thing missing is that the I/O routines load the whole file into memory, whereas with numpy.load I could use a memory map.
-
Andrea Zonca about 13 years: As I write the data just once, I do not need to modify it.
-
Mad Physicist about 6 years: Just as an addendum, I am working on improving this answer: github.com/numpy/numpy/pull/10855. The goal is to make packbits and unpackbits completely invertible without the reshaping.
-
Salvador Dali about 6 years: @MadPhysicist This would be great. Thanks for doing this. When you are done, please either write your own answer or edit mine.
-
georges abitbol about 5 years: @SalvadorDali It seems the PR made it to master.
-
Kareem Jeiroudi over 4 years: Is it easy to convert this bitarray back to a numpy array? Could you add a small example demonstrating that, please?
-
rizerphe over 4 years: The only problem here is that compressing an array means holding both the compressed and uncompressed arrays in memory, but sometimes you simply don't have enough RAM to hold the bigger one, and that's exactly why you'd want a packed format.
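One way around holding both arrays at once is to generate and pack the data in chunks whose length is a multiple of 8, writing each packed piece into a preallocated output. This is a sketch under that assumption. pack_in_chunks and the make_chunk callback are hypothetical names, and peak memory is roughly one unpacked chunk plus the packed result.

```python
import numpy as np

def pack_in_chunks(make_chunk, n_bits, chunk_bits=8 * 2**20):
    """Pack n_bits booleans, produced chunk by chunk, into a uint8 array.

    make_chunk(start, stop) is a hypothetical callback returning a bool array
    for entries [start, stop). chunk_bits must be a multiple of 8 so that every
    chunk starts on a byte boundary and the pieces line up exactly.
    """
    assert chunk_bits % 8 == 0
    out = np.empty((n_bits + 7) // 8, dtype=np.uint8)
    for start in range(0, n_bits, chunk_bits):
        stop = min(start + chunk_bits, n_bits)
        out[start // 8 : (stop + 7) // 8] = np.packbits(make_chunk(start, stop))
    return out

# Stand-in data: bit i is True when i is divisible by 3.
n = 1_000_003
packed = pack_in_chunks(lambda a, b: np.arange(a, b) % 3 == 0, n, chunk_bits=4096)
full = np.packbits(np.arange(n) % 3 == 0)
print(np.array_equal(packed, full))  # True
```

If even the packed result is too large for RAM, out could instead be created with np.memmap(path, dtype=np.uint8, mode="w+", shape=((n_bits + 7) // 8,)) so that it lives on disk.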
-
Mad Physicist over 2 years: @SalvadorDali Almost 3 years after you made the offer, I finally updated your answer with the information from the PR :)