Parse BSON string in Python?
You can initialize a BSON instance directly from the string and then decode it:
>>> from bson import BSON
>>> s = '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
>>> bson_obj = BSON(s)
>>> bson_obj.decode()
{u'hello': u'world'}
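On Python 3 the same call expects a bytes object (for example a `b'...'` literal) rather than a text string. To make clear what `decode()` does with those bytes, here is a minimal standard-library sketch of the layout. It is not PyMongo's implementation: `decode_document` is a hypothetical helper, and it only handles the element types that appear in this thread (string, embedded document, array, null, int32).

```python
import struct

def decode_document(data, offset=0):
    """Decode one BSON document at `offset`; return (dict, offset_after_document)."""
    # Every document starts with its total size as a little-endian int32.
    (size,) = struct.unpack_from("<i", data, offset)
    end = offset + size
    if size < 5 or end > len(data) or data[end - 1] != 0:
        raise ValueError("bad document length: %d" % size)
    pos = offset + 4
    result = {}
    while pos < end - 1:
        type_byte = data[pos]
        # Element name is a NUL-terminated C string.
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if type_byte == 0x02:
            # String: int32 byte count (including trailing NUL), then the bytes.
            (strlen,) = struct.unpack_from("<i", data, pos)
            value = data[pos + 4:pos + 4 + strlen - 1].decode("utf-8")
            pos += 4 + strlen
        elif type_byte in (0x03, 0x04):
            # Embedded document; an array is a document keyed "0", "1", ...
            value, pos = decode_document(data, pos)
            if type_byte == 0x04:
                value = list(value.values())
        elif type_byte == 0x0A:
            value = None  # null carries no payload
        elif type_byte == 0x10:
            (value,) = struct.unpack_from("<i", data, pos)
            pos += 4
        else:
            raise ValueError("element type 0x%02x not handled in this sketch" % type_byte)
        result[name] = value
    return result, end

s = b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
print(decode_document(s)[0])  # {'hello': 'world'}
```

For real data the bson package remains the right tool; the sketch is only meant to show why the first four bytes matter so much.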
Author: frazman
Updated on June 08, 2022
Comments
-
frazman almost 2 years
I have a BSON-formatted string in a file.
I want to read that file and get the JSON it encodes.
I was looking into the example here:
>>> from bson import BSON
>>> bson_string = BSON.encode({"hello": "world"})
>>> bson_string
'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
>>> bson_string.decode()
{u'hello': u'world'}
from http://docs.mongodb.org/meta-driver/latest/legacy/bson/
But what I have is, say:
string = '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
And now I want to parse this into JSON. How do I do this? Thanks
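Since the data lives in a file, it has to be read in binary mode so the bytes arrive unmodified. A hedged sketch using only the standard library: `iter_bson_chunks` is a hypothetical helper (not part of the bson package) that relies on the fact that every BSON document begins with its own total length as a little-endian int32, and yields the raw bytes of each document stored in the file.

```python
import struct

def iter_bson_chunks(path):
    """Yield the raw bytes of each BSON document stored back to back in a file."""
    with open(path, "rb") as fh:  # binary mode: no newline or encoding translation
        while True:
            header = fh.read(4)
            if not header:
                return  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated length prefix")
            (size,) = struct.unpack("<i", header)
            if size < 5:
                raise ValueError("invalid document length: %d" % size)
            body = fh.read(size - 4)
            # A complete document has exactly `size` bytes and ends with NUL.
            if len(body) != size - 4 or body[-1] != 0:
                raise ValueError("truncated or malformed document")
            yield header + body
```

Each yielded chunk could then be passed to `BSON(chunk).decode()` as in the answer above. If the file instead contains the escaped text (a literal backslash followed by `x16` and so on) rather than the raw bytes, this check will fail early, which is itself a useful hint.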
Can you try to parse this BSON-formatted string:
s = """'\x93\x01\x00\x00\x02_id\x00\x1a\x00\x00\x00auromotiveengineering.com\x00\x04name_servers\x00_\x00\x00\x00\x020\x00\x17\x00\x00\x00ns-2.activatedhost.com\x00\x021\x00\x17\x00\x00\x00ns-1.activatedhost.com\x00\x022\x00\x17\x00\x00\x00ns-3.activatedhost.com\x00\x00\nreputation\x00\x04categories\x00\x05\x00\x00\x00\x00\x03host_act\x00\xd7\x00\x00\x00\x03bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x00\x00'"""
So this is what I did. I started from a JSON string:
s = """'{ "_id" : "auromotiveengineering.com", "categories" : [ ], "host_act" : { "bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 }, "bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 }, "bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 } }, "name_servers" : [ \t"ns-2.activatedhost.com", \t"ns-1.activatedhost.com", \t"ns-3.activatedhost.com" ], "reputation" : null }"""
Then I loaded this string and encoded it:
jsn = json.loads(s)
bson_string = BSON.encode(jsn)
And then I copy-pasted bson_string, so:
bson_string = """'\x93\x01\x00\x00\x02_id\x00\x1a\x00\x00\x00auromotiveengineering.com\x00\x04name_servers\x00_\x00\x00\x00\x020\x00\x17\x00\x00\x00ns-2.activatedhost.com\x00\x021\x00\x17\x00\x00\x00ns-1.activatedhost.com\x00\x022\x00\x17\x00\x00\x00ns-3.activatedhost.com\x00\x00\nreputation\x00\x04categories\x00\x05\x00\x00\x00\x00\x03host_act\x00\xd7\x00\x00\x00\x03bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x00\x00 """
and when I try to decode this, it throws an error :(
Another string where I get an error:
._idbrusselscityreporter.comcategorieshost_act�bnMzMC5kb21haW5jb250cm9sLmNvbQ==$seen_first�hLseen_last��NbnMyOS5kb21haW5jb250cm9sLmNvbQ==$seen_first�hLseen_last��Nname_serversA0ns30.domaincontrol.com1ns29.domaincontrol.com
-
Paulo Almeida over 10 years
It gives me an error, objsize too large. I don't know if that is really the case or if the string is just malformed.
-
Paulo Almeida over 10 years
It worked for me. I wrote the bson string to a file, opened and read it back, decoded it and got the original json. But I removed the initial ' from your jsonstring, which is not closed. Was that a typo when you copied the string here, or maybe that's the problem? Edit: It should be a typo, or it wouldn't have loaded.
-
frazman over 10 years
ehh.. can you take a look at this string.. :( the last one with weird characters in it :(
-
Paulo Almeida over 10 years
@Fraz, that last one doesn't look like json or bson. How did you generate it? Maybe you can manually turn it into a json string, since the fields seem to be there. Or do you have many others like it? If that is the case, it would probably be better to try to understand the format and parse it somehow.
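The `objsize too large` error mentioned in the comments is consistent with a length prefix that does not match the data. The sketch below uses only the standard library; `check_length_prefix` is a hypothetical helper, and the stray-quote scenario is an assumption about what may have gone wrong, not a confirmed diagnosis. It compares the size a document declares with the number of bytes actually present, and shows how a single extra character at the front shifts the prefix.

```python
import struct

def check_length_prefix(data):
    """Return (declared_size, actual_size) for a would-be BSON document."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for the length prefix")
    (declared,) = struct.unpack_from("<i", data, 0)  # little-endian int32
    return declared, len(data)

good = b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
print(check_length_prefix(good))  # (22, 22)

# Assumed failure mode: a quote character pasted in front of the data.
bad = b"'" + good
print(check_length_prefix(bad))   # (5671, 23)
```

When the two numbers differ, the bytes were most likely altered on the way in (extra quotes, text-mode reads, or copy-pasting a printed representation), so it is worth checking that before suspecting the decoder.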