Python: how to unpack a text or set of strings


I think the first question to answer is: why are you packing strings like this in the first place? Unless you are passing this data structure to a library that accepts the format you created above, you should not need to do that: store strings as text in text files, not as binary. If you need space or performance, use an SQL engine; SQLite will work fine.
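As a rough illustration of the SQLite suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The table name `variables` and its columns are hypothetical, chosen only to mirror the question's gvrDatatype, gvrVarname and gvrTxt values; an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

# In-memory database for the example; pass a filename instead to persist it.
conn = sqlite3.connect(":memory:")

# Hypothetical schema mirroring the question's three fields.
conn.execute("CREATE TABLE variables (datatype INTEGER, varname TEXT, txt TEXT)")

# Parameter placeholders let sqlite3 handle quoting and text encoding.
conn.execute(
    "INSERT INTO variables VALUES (?, ?, ?)",
    (3, "greeting", "hello world"),
)
conn.commit()

# Read the row back: no manual length bookkeeping is required.
row = conn.execute(
    "SELECT datatype, varname, txt FROM variables WHERE varname = ?",
    ("greeting",),
).fetchone()
print(row)  # (3, 'greeting', 'hello world')
```

The database stores each string's length for you, so there is no length prefix to pack or parse.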

Also note that the code above is hard-to-read Python. Concatenation may be the only way to interpolate strings and numbers in JavaScript, but in Python you should use:

txt = struct.pack('B%ds' % len(gvrTxt), len(gvrTxt), gvrTxt)

instead of

txt = struct.pack('B' + str(len(gvrTxt)) + 's', len(gvrTxt), gvrTxt)
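A small sketch, assuming Python 3 (where an 's' field must be given bytes, not str), showing that both ways of building the format produce the same format string and the same packed result. The value of `gvrTxt` here is a made-up sample.

```python
import struct

gvrTxt = b"hello"  # sample value; in Python 3, 's' fields require bytes

# Concatenation, as in the question
fmt_concat = 'B' + str(len(gvrTxt)) + 's'
# %-formatting, as suggested above
fmt_percent = 'B%ds' % len(gvrTxt)

assert fmt_concat == fmt_percent == 'B5s'

# One length byte followed by the raw string bytes
txt = struct.pack(fmt_percent, len(gvrTxt), gvrTxt)
print(txt)  # b'\x05hello'
```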

You can't unpack this with struct.unpack without slicing it manually first, as you've noted, since unpack requires the byte string passed to it to be exactly the size the format describes (the size reported by struct.calcsize).
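To make the size requirement concrete, this sketch (Python 3) shows struct.unpack rejecting a buffer longer than struct.calcsize says the format needs. It also shows struct.unpack_from, a standard-library alternative not mentioned above, which only requires the buffer to be at least as long as the format and ignores any extra bytes.

```python
import struct

text = b"\x05hello"  # a length byte followed by 5 bytes of data

print(struct.calcsize("B"))  # 1: the format "B" needs exactly one byte

try:
    # The buffer is 6 bytes but the format describes 1, so this fails.
    struct.unpack("B", text)
except struct.error as exc:
    print("unpack failed:", exc)

# unpack_from reads from the start of a larger buffer without complaint.
(length,) = struct.unpack_from("B", text)
print(length)  # 5
```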

You can retrieve the length by first slicing off the first field (a single byte) and passing it to unpack, and then unpacking the rest of the data:

import struct
length = struct.unpack("B", text[0:1])[0]  # the first byte holds the length
gvrTxt = struct.unpack("%ds" % length, text[1:1 + length])[0]  # then exactly `length` bytes
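The two steps can be wrapped in a small helper. `unpack_pstring` is a hypothetical name used only for this sketch; it uses struct.unpack_from with an offset so it can read a length-prefixed string from the middle of a larger buffer, and it returns the offset just past the string so the next field can be read from there.

```python
import struct

def unpack_pstring(data, offset=0):
    """Read a 1-byte length prefix and the bytes that follow it.

    Hypothetical helper for illustration. Returns (value, next_offset);
    raises struct.error if the buffer is too short.
    """
    (length,) = struct.unpack_from("B", data, offset)
    (value,) = struct.unpack_from("%ds" % length, data, offset + 1)
    return value, offset + 1 + length

text = b"\x05hello"
gvrTxt, end = unpack_pstring(text)
print(gvrTxt, end)  # b'hello' 6
```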

But of course, if you are only dealing with byte strings, you could skip struct altogether:

gvrTxt = text[1:]
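A sketch of the same idea in Python 3, where indexing a bytes object already returns an int, so the length byte can be read without struct or ord. It also uses the length to stop at the end of the string rather than taking everything after the first byte, which matters once more data follows it.

```python
text = b"\x05hello"

length = text[0]             # in Python 3, text[0] is already an int (5)
gvrTxt = text[1:1 + length]  # take exactly `length` bytes after the prefix
print(length, gvrTxt)  # 5 b'hello'
```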

If you concatenate the above structures together, doing something like:

data = dataType + varName + txt

you have to unpack them separately, using the length of varName to find where txt starts:

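import struct
datatype = struct.unpack("H", data[0:2])[0]    # 2-byte unsigned short
lenvarname = struct.unpack("B", data[2:3])[0]  # length byte of varName
varName = data[3:3 + lenvarname]               # varName's bytes
txt = data[4 + lenvarname:]                    # skip txt's own length byte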
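Putting it together, here is an end-to-end sketch (Python 3) that packs the three fields the way the question does, concatenates them, and reads them back with a running offset. The sample values for gvrDatatype, gvrVarname and gvrTxt are made up. The formats use a leading '<' (little-endian, standard sizes, no padding), which is my assumption rather than the question's native format, so the 'H' field is exactly 2 bytes on every platform.

```python
import struct

# Sample values (made up); strings must be bytes in Python 3.
gvrDatatype = 7
gvrVarname = b"name"
gvrTxt = b"hello world"

# Pack each field as in the question; '<' means standard sizes, no padding.
dataType = struct.pack('<H', gvrDatatype)
varName = struct.pack('<B%ds' % len(gvrVarname), len(gvrVarname), gvrVarname)
txt = struct.pack('<B%ds' % len(gvrTxt), len(gvrTxt), gvrTxt)
data = dataType + varName + txt

# Unpack, moving an offset forward past each field.
offset = 0
(datatype,) = struct.unpack_from('<H', data, offset)
offset += 2

(lenvarname,) = struct.unpack_from('<B', data, offset)
offset += 1
varname = data[offset:offset + lenvarname]
offset += lenvarname

(lentxt,) = struct.unpack_from('<B', data, offset)
offset += 1
text = data[offset:offset + lentxt]
offset += lentxt

print(datatype, varname, text)  # 7 b'name' b'hello world'
```

Reading the txt length byte, instead of slicing to the end, keeps the parser working if more fields are appended later.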

But I insist: this is usually not needed in a Python program. You will only need it if you are generating a very specific file format for another application to consume, or calling a native-code library that has rather poor Python bindings.

Also, be aware that these techniques do not handle "text" properly, since there is no guarantee that text content will have one byte per character (the length prefix counts bytes, not characters). Please read http://www.joelonsoftware.com/articles/Unicode.html before coding any further, even if you think you will never meet a non-ASCII character in your life. Because you will, and your program will hit them first.
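To see the byte-versus-character issue in practice, here is a sketch (Python 3) that encodes to UTF-8 before packing, so the prefix holds the byte count, and decodes after unpacking. UTF-8 is an assumption here; use whatever encoding the consuming application expects. With a one-byte 'B' prefix, the encoded string can be at most 255 bytes, and struct.pack raises struct.error for a larger length.

```python
import struct

word = "café"
encoded = word.encode("utf-8")  # 'é' becomes two bytes in UTF-8

print(len(word), len(encoded))  # 4 5

# The length prefix must be the *byte* count, not the character count.
packed = struct.pack('B%ds' % len(encoded), len(encoded), encoded)

# Reading back: slice by the byte length, then decode to get text again.
length = packed[0]
decoded = packed[1:1 + length].decode("utf-8")
print(decoded == word)  # True
```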

Author: vinu

Updated on December 05, 2020

Comments

  • vinu, over 3 years ago

    How can I unpack them separately? I want to get back the length of the strings used, and the strings themselves, from txt. Any help?

    dataType = struct.pack('H', gvrDatatype)
    varName = struct.pack('B' + str(len(gvrVarname)) + 's', len(gvrVarname), gvrVarname)
    txt = struct.pack('B' + str(len(gvrTxt)) + 's', len(gvrTxt), gvrTxt)