Packing and unpacking variable length array/string using the struct module in Python
Solution 1
The struct module only supports fixed-length structures. For variable-length strings, your options are either:
- Dynamically construct your format string (a str will have to be converted to bytes before passing it to pack()):
  s = bytes(s, 'utf-8') # Or other appropriate encoding
  struct.pack("I%ds" % (len(s),), len(s), s)
- Skip struct and just use normal string methods to add the string to your pack()-ed output:
  struct.pack("I", len(s)) + s
For unpacking, you just have to unpack a bit at a time:
(i,), data = struct.unpack("I", data[:4]), data[4:]
s, data = data[:i], data[i:]
If you're doing a lot of this, you can always add a helper function which uses calcsize to do the string slicing:
def unpack_helper(fmt, data):
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, data[:size]), data[size:]
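The pieces above can be combined into a small round trip. This is only a sketch: the names pack_string and unpack_string are hypothetical, and the explicit little-endian "<I" prefix is an assumption made here so that the layout does not depend on native alignment.

```python
import struct

LENGTH_FMT = "<I"  # assumption: 4-byte little-endian length prefix


def pack_string(text, encoding="utf-8"):
    """Encode text and prepend its byte length."""
    raw = text.encode(encoding)
    # Build the format dynamically, e.g. "<I5s" for a 5-byte payload.
    return struct.pack("%s%ds" % (LENGTH_FMT, len(raw)), len(raw), raw)


def unpack_string(data, encoding="utf-8"):
    """Return (decoded string, remaining bytes)."""
    (length,) = struct.unpack_from(LENGTH_FMT, data, 0)
    start = struct.calcsize(LENGTH_FMT)
    raw = data[start:start + length]
    return raw.decode(encoding), data[start + length:]


packed = pack_string("héllo") + b"rest"
text, remainder = unpack_string(packed)
```

Note that the prefix stores the length in bytes, not characters, which matters as soon as the text contains non-ASCII characters.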
Solution 2
I googled this question and found a couple of solutions.
construct
An elaborate, flexible solution.
Instead of writing imperative code to parse a piece of data, you declaratively define a data structure that describes your data. As this data structure is not code, you can use it in one direction to parse data into Pythonic objects, and in the other direction, convert (“build”) objects into binary data.
The library provides both simple, atomic constructs (such as integers of various sizes), as well as composite ones which allow you to form hierarchical structures of increasing complexity. Construct features bit and byte granularity, easy debugging and testing, an easy-to-extend subclass system, and lots of primitive constructs to make your work easier:
Updated for Python 3.x and construct 2.10.67. The library now also has a native PascalString, so the example is renamed to myPascalString.
from construct import *
myPascalString = Struct(
    "length" / Int8ul,
    "data" / Bytes(lambda ctx: ctx.length)
)
>>> myPascalString.parse(b'\x05helloXXX')
Container(length=5, data=b'hello')
>>> myPascalString.build(Container(length=6, data=b"foobar"))
b'\x06foobar'
myPascalString2 = ExprAdapter(myPascalString,
    encoder=lambda obj, ctx: Container(length=len(obj), data=obj),
    decoder=lambda obj, ctx: obj.data
)
>>> myPascalString2.parse(b"\x05hello")
b'hello'
>>> myPascalString2.build(b"i'm a long string")
b"\x11i'm a long string"
Edit: Also pay attention to ExprAdapter. Once the native PascalString no longer does what you need, this is the approach you will end up using.
netstruct
A quick solution if you only need a struct extension for variable-length byte sequences. Nesting a variable-length structure can be achieved by packing the results of the first pack.
NetStruct supports a new formatting character, the dollar sign ($). The dollar sign represents a variable-length string, encoded with its length preceding the string itself.
Edit: The format character before the $ sets the data type used for the string's length. Thus, the maximum length of a variable-length string is 255 with a 1-byte unsigned length, 65535 with a 2-byte one, and so on.
import netstruct
>>> netstruct.pack(b"b$", b"Hello World!")
b'\x0cHello World!'
>>> netstruct.unpack(b"b$", b"\x0cHello World!")
[b'Hello World!']
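To see how the width of the length prefix caps the string length, here is a sketch using only the standard struct module. It is not netstruct's implementation; pack_prefixed and unpack_prefixed are hypothetical names, and the "!" (network byte order) prefix is an assumption.

```python
import struct


def pack_prefixed(payload, length_fmt="B"):
    """Pack bytes behind a length prefix whose width is set by length_fmt."""
    # "!" selects network byte order with standard sizes and no padding.
    fmt = "!%s%ds" % (length_fmt, len(payload))
    return struct.pack(fmt, len(payload), payload)


def unpack_prefixed(data, length_fmt="B"):
    """Return (payload, remaining bytes)."""
    prefix = "!" + length_fmt
    (length,) = struct.unpack_from(prefix, data, 0)
    start = struct.calcsize(prefix)
    return data[start:start + length], data[start + length:]


short = pack_prefixed(b"Hello World!")      # 1-byte length prefix
wide = pack_prefixed(b"Hello World!", "H")  # 2-byte length prefix

# A 256-byte payload does not fit a 1-byte unsigned length.
try:
    pack_prefixed(b"x" * 256, "B")
    overflowed = False
except struct.error:
    overflowed = True
```

Choosing "H" instead of "B" raises the limit to 65535 bytes at the cost of one extra byte per string.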
Solution 3
An easy way I found to handle a variable length when packing a string is:
pack('{}s'.format(len(string)), string)
Unpacking works much the same way:
unpack('{}s'.format(len(data)), data)
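A short round trip shows what this approach does and where it stops working. The variable names are illustrative only. Because no length is stored, the packed result is just the raw bytes, so unpacking with len(data) only works when the buffer contains nothing but the string.

```python
from struct import pack, unpack

text = "variable length"
raw = text.encode("utf-8")  # pack() needs bytes in Python 3

# "15s" here: the count is taken from the payload itself.
packed = pack("{}s".format(len(raw)), raw)

# Unpacking assumes the whole buffer is the string.
(unpacked,) = unpack("{}s".format(len(packed)), packed)
result = unpacked.decode("utf-8")
```

If other fields follow the string in the same buffer, a length prefix or a terminator is needed to find where the string ends.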
Solution 4
Here are some wrapper functions I wrote that help; they seem to work.
Here's the unpacking helper:
import re
import struct

def unpack_from(fmt, data, offset=0):
    # Split off an optional byte-order character; default to native ('@').
    if fmt and fmt[0] in ('@', '=', '<', '>', '!'):
        byte_order, fmt = fmt[0], fmt[1:]
    else:
        byte_order = '@'
    args = ()
    # Isolate each 'p' so it can be sized from the data itself.
    for sub_fmt in filter(None, re.sub("p", "\tp\t", fmt).split('\t')):
        if sub_fmt == 'p':
            # The first byte of a Pascal string holds its length.
            (str_len,) = struct.unpack_from('B', data, offset)
            sub_fmt = str(str_len + 1) + 'p'
            sub_size = str_len + 1
        else:
            sub_fmt = byte_order + sub_fmt
            sub_size = struct.calcsize(sub_fmt)
        args += struct.unpack_from(sub_fmt, data, offset)
        offset += sub_size
    return args
Here's the packing helper:
def pack(fmt, *args):
    if fmt and fmt[0] in ('@', '=', '<', '>', '!'):
        byte_order, fmt = fmt[0], fmt[1:]
    else:
        byte_order = '@'
    data = b''  # bytes, not str, in Python 3
    for sub_fmt in filter(None, re.sub("p", "\tp\t", fmt).split('\t')):
        if sub_fmt == 'p':
            # One argument per 'p'; size the field to fit it plus the length byte.
            sub_args, args = (args[0],), args[1:]
            sub_fmt = str(len(sub_args[0]) + 1) + 'p'
        else:
            # Assumes one argument per format character (no repeat counts).
            sub_args, args = args[:len(sub_fmt)], args[len(sub_fmt):]
            sub_fmt = byte_order + sub_fmt
        data += struct.pack(sub_fmt, *sub_args)
    return data
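These helpers build on the standard struct format character 'p' (Pascal string), whose count covers the length byte plus the data. A minimal sketch of that behaviour with plain struct calls, using illustrative variable names:

```python
import struct

# "6p" reserves 6 bytes in total: 1 length byte plus up to 5 data bytes.
packed = struct.pack("6p", b"hello")
(value,) = struct.unpack("6p", packed)

# A larger count pads the field with null bytes, yet unpacking
# still returns only the bytes counted by the length byte.
padded = struct.pack("8p", b"hello")
(padded_value,) = struct.unpack("8p", padded)
```

Since the length is stored in a single byte, a 'p' field can carry at most 255 bytes of data.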
Solution 5
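To pack, use
packed = bytes('sample string', 'utf-8')
To unpack, use
string = packed.decode('utf-8')
This is a quite simple workaround and works only for UTF-8 strings. (Slicing the repr instead, as in str(packed)[2:][:-1], only round-trips printable ASCII text that needs no escaping.)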
Hayo Friese
Updated on October 21, 2021
Comments
-
Hayo Friese over 2 years
I am trying to get a grip on the packing and unpacking of binary data in Python 3. It's actually not that hard to understand, except for one problem:
what if I have a variable-length text string and want to pack and unpack it in the most elegant manner?
As far as I can tell from the manual, I can only unpack fixed-size strings directly. In that case, is there any elegant way of getting around this limitation without padding lots and lots of unnecessary zeroes?
-
Hayo Friese over 13 years
If adding the length/charcount to the binary data, how would you unpack it?
-
jonesy over 13 years
The OP's question mentions Python 3 specifically, and this answer doesn't work in Python 3 because string objects no longer support the buffer interface.
-
jscs over 11 years
@jonesy: The only part that didn't work was the first snippet, passing a str to pack(); this has now been addressed.
-
Thinkeye almost 7 years
For unpacking a C-style string in a binary data block, something like this also works: s.rstrip(b'\x00').decode("utf_8").
-
Reinier Torenbeek about 6 years
You wrote about netstruct: "looks like it only uses one byte for a string length". However, the format character before the $ sign indicates the format to be used for its length. You chose b, which is a 1-byte integer. If you had chosen h, netstruct would have used a 2-byte integer to represent the length.
-
MolbOrg over 2 years
Good answer, the alpha and omega of my learning curve with construct: I started from this answer and, after fiddling for a day, came back to ExprAdapter as the solution for my "almost works as I need it" problem. I wish my answers were that helpful.