Converting pandas.DataFrame to bytes
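You can use df.to_records() to convert your DataFrame to a NumPy record array (recarray), then call .tobytes() to convert that to a string of bytes. Because to_records() keeps each column's own dtype, the uint64 maximum in column b survives the round trip. (Older code uses .tostring() and np.fromstring() for this; both are deprecated in current NumPy in favor of .tobytes() and np.frombuffer().)
import numpy as np

rec = df.to_records(index=False)
print(repr(rec))
# rec.array([(10, 18446744073709551615, 13240000000.0), (15, 230498234019, 3.14159),
#            (20, 32094812309, 234.1341)],
#           dtype=[('a', '|u1'), ('b', '<u8'), ('c', '<f8')])
s = rec.tobytes()
rec2 = np.frombuffer(s, rec.dtype)
print(np.all(rec2 == rec))
# True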
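For an end-to-end check, here is a self-contained sketch that rebuilds the example DataFrame from the question, packs it with to_records() and tobytes(), and reads it back with np.frombuffer(). It assumes a NumPy version that provides tobytes(); passing the structured array straight to the pd.DataFrame constructor is one way to restore the frame, not necessarily the only one.

```python
import numpy as np
import pandas as pd

# Example data from the question: one uint8, one uint64 and one float64 column.
df = pd.DataFrame([10, 15, 20], dtype='u1', columns=['a'])
df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')

# DataFrame -> structured array; each field keeps its column's dtype.
rec = df.to_records(index=False)
data_bytes = rec.tobytes()

# Fields are laid end to end without padding: 1 + 8 + 8 = 17 bytes per row.
print(rec.dtype.itemsize, len(data_bytes))

# bytes -> structured array -> DataFrame (the dtype must be known separately).
restored = pd.DataFrame(np.frombuffer(data_bytes, dtype=rec.dtype))
print(restored.dtypes.tolist())
print(int(restored['b'].iloc[0]) == 2**64 - 1)
```

Note that the byte string carries no schema of its own, so whoever decodes it needs rec.dtype (field names, types and byte order) from somewhere else.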
Paul Joireman
Updated on April 28, 2022
Paul Joireman asked:
I need to convert the data stored in a pandas.DataFrame into a byte string where each column can have a separate data type (integer or floating point). Here is a simple set of data:
import numpy as np
import pandas as pd

df = pd.DataFrame([10, 15, 20], dtype='u1', columns=['a'])
df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')
and df looks something like this:
    a                     b             c
0  10  18446744073709551615  1.324000e+10
1  15          230498234019  3.141590e+00
2  20           32094812309  2.341341e+02
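To make the target concrete: the byte string being asked for is each row's fields laid end to end. The sketch below packs the first row by hand with Python's standard struct module. The format string '<BQd' is an illustrative assumption (little-endian, no padding, a as uint8, b as uint64, c as float64), not something the question specifies.

```python
import struct

# Assumed row layout: little-endian ('<'), no padding between fields.
#   B = uint8 (column a), Q = uint64 (column b), d = float64 (column c)
ROW_FORMAT = '<BQd'

row0 = struct.pack(ROW_FORMAT, 10, 2**64 - 1, 1.324e10)
print(struct.calcsize(ROW_FORMAT), len(row0))

# Unpacking with the same format recovers the original values exactly.
a, b, c = struct.unpack(ROW_FORMAT, row0)
print(a, b, c)
```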
The DataFrame knows the type of each column (df.dtypes), so I'd like to do something like this:
data_to_pack = [tuple(record) for _, record in df.iterrows()]
data_array = np.array(data_to_pack, dtype=list(zip(df.columns, df.dtypes)))
data_bytes = data_array.tostring()
This typically works fine, but in this case (due to the maximum value stored in df['b'][0]) the second line above, which converts the list of tuples to an np.array with the given set of types, raises the following error:
OverflowError: Python int too large to convert to C long
I believe the error stems from the first line, which extracts each record as a Series with a single data type (defaulting to float64). float64 cannot represent the maximum uint64 value (2**64 - 1) exactly; it rounds to 2**64, which is out of range for uint64, so the value cannot be converted back.
1) Since the DataFrame already knows the type of each column, is there a way to avoid creating a list of row tuples as input to the typed numpy.array constructor? Or is there a better way than the one outlined above to preserve the type information in such a conversion?
2) Is there a way to go directly from a DataFrame to a byte string representing the data, using the type information for each column?
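The float64 upcast described above can be reproduced directly, and it points to one possible workaround for question 1: DataFrame.itertuples(index=False, name=None) builds each tuple from the individual columns rather than from a single upcast row Series, so the row-tuple approach keeps exact integers. This is a sketch assuming a reasonably recent pandas in which iterating a numeric column yields per-column scalars; the to_records() approach in the answer above avoids the per-row loop entirely.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([10, 15, 20], dtype='u1', columns=['a'])
df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')

# iterrows() returns each row as one Series, so the mixed uint8/uint64/float64
# columns are upcast to a single common dtype.
_, first_row = next(df.iterrows())
print(first_row.dtype)  # float64

# float64 cannot hold 2**64 - 1 exactly; it rounds up to 2**64.
print(float(2**64 - 1) == 2.0**64)  # True

# itertuples(name=None) yields plain tuples assembled column by column,
# so the uint64 maximum is not routed through float64.
rows = list(df.itertuples(index=False, name=None))
typed = np.array(rows, dtype=list(zip(df.columns, df.dtypes)))
print(int(typed['b'][0]) == 2**64 - 1)  # True
```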