Parse a BSON string in Python?

12,996

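You can initialize a BSON instance directly from the raw string and then decode it (on Python 3 the data must be a bytes object, e.g. a b'...' literal, rather than a str):

>>> from bson import BSON
>>> s = '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
>>> bson_obj = BSON(s)
>>> bson_obj.decode()
{u'hello': u'world'}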
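
When decoding fails, a quick sanity check is to compare the size the document declares for itself with the number of bytes actually read. The sketch below uses only the standard library; `bson_sizes` is a hypothetical helper name, and it assumes the data holds exactly one BSON document, read from the file in binary mode (`open(path, 'rb')`). A BSON document starts with its total size as a little-endian int32, so any stray leading character (such as a quote pasted in along with a printed repr) shifts that prefix and makes the declared size nonsensical.

```python
import struct

def bson_sizes(data):
    """Return (declared_size, actual_size) for one BSON document in `data`."""
    if len(data) < 5:
        raise ValueError("too short to be a BSON document")
    # The first four bytes hold the total document size (little-endian int32).
    (declared,) = struct.unpack_from("<i", data, 0)
    return declared, len(data)

good = b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
print(bson_sizes(good))   # (22, 22): the prefix matches the data

# Same bytes with a stray leading quote character in front.
bad = b"'" + good
print(bson_sizes(bad))    # (5671, 23): the prefix no longer makes sense
```

If the two numbers differ, the data was probably altered before it reached the decoder (extra quotes, text-mode reads, or a copy-paste of the printed form), which may be what is behind errors such as "objsize too large".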
Author: frazman

Updated on June 08, 2022

Comments

  • frazman
    frazman almost 2 years

    I have a BSON-formatted string in a file.

    I want to read that file and get the JSON it encodes.

    I was looking into the example here:

    >>> from bson import BSON
    >>> bson_string = BSON.encode({"hello": "world"})
    >>> bson_string
    '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
    >>> bson_string.decode()
    {u'hello': u'world'}
    

    from http://docs.mongodb.org/meta-driver/latest/legacy/bson/

    But what I have is, say:

    string = '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
    

    Now I want to parse this into JSON. How do I do this? Thanks.


    Can you try to parse this BSON-formatted string:

    s = """'\x93\x01\x00\x00\x02_id\x00\x1a\x00\x00\x00auromotiveengineering.com\x00\x04name_servers\x00_\x00\x00\x00\x020\x00\x17\x00\x00\x00ns-2.activatedhost.com\x00\x021\x00\x17\x00\x00\x00ns-1.activatedhost.com\x00\x022\x00\x17\x00\x00\x00ns-3.activatedhost.com\x00\x00\nreputation\x00\x04categories\x00\x05\x00\x00\x00\x00\x03host_act\x00\xd7\x00\x00\x00\x03bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x00\x00'"""
    

    So this is what I did. Given a JSON string:

       s = """'{ "_id" : "auromotiveengineering.com", "categories" : [ ], "host_act" : { "bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 }, "bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 }, "bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 } }, "name_servers" : [ \t"ns-2.activatedhost.com", \t"ns-1.activatedhost.com", \t"ns-3.activatedhost.com" ], "reputation" : null }"""
    

    I then loaded this string and encoded it:

    jsn = json.loads(s)
    
    bson_string = BSON.encode(jsn)
    
    And then I copy-pasted the printed value of bson_string,
    
    so bson_string = """'\x93\x01\x00\x00\x02_id\x00\x1a\x00\x00\x00auromotiveengineering.com\x00\x04name_servers\x00_\x00\x00\x00\x020\x00\x17\x00\x00\x00ns-2.activatedhost.com\x00\x021\x00\x17\x00\x00\x00ns-1.activatedhost.com\x00\x022\x00\x17\x00\x00\x00ns-3.activatedhost.com\x00\x00\nreputation\x00\x04categories\x00\x05\x00\x00\x00\x00\x03host_act\x00\xd7\x00\x00\x00\x03bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x00\x00
    """
    

    When I try to decode this, it throws an error :(

    Another string where I get an error:

    ._idbrusselscityreporter.comcategorieshost_act�bnMzMC5kb21haW5jb250cm9sLmNvbQ==$seen_first�hLseen_last��NbnMyOS5kb21haW5jb250cm9sLmNvbQ==$seen_first�hLseen_last��Nname_serversA0ns30.domaincontrol.com1ns29.domaincontrol.com
    
  • Paulo Almeida
    Paulo Almeida over 10 years
    It gives me an error, objsize too large. I don't know if that is really the case or just malformed.
  • Paulo Almeida
    Paulo Almeida over 10 years
    It worked for me. I wrote the bson string to a file, opened and read it back, decoded and got the original json. But I removed the initial ' from your jsonstring, which is not closed. Was that a typo when you copied the string here or maybe that's the problem? Edit: It should be a typo, or it wouldn't have loaded.
  • frazman
    frazman over 10 years
    ehh.. can you take a look at this string.. :( the last one with weird characters in it :(
  • Paulo Almeida
    Paulo Almeida over 10 years
    @Fraz, that last one doesn't look like json or bson. How did you generate it? Maybe you can manually turn it into a json string, the fields seem to be there. Or do you have many others like it? If that is the case, it would probably be better to try to understand the format and parse it somehow.
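
As a rough illustration of what "understanding the format" involves, here is a minimal standard-library sketch of a BSON reader. It is not the PyMongo implementation: `decode_document` and `_cstring` are hypothetical names, and only the element types that appear in the example documents in this thread are handled (string, embedded document, array, null, int32). Anything else raises an error rather than guessing.

```python
import struct

def _cstring(data, pos):
    """Read a NUL-terminated key starting at `pos`; return (text, next_pos)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1

def decode_document(data, pos=0):
    """Decode one BSON document starting at `pos`; return (dict, next_pos)."""
    (size,) = struct.unpack_from("<i", data, pos)
    end = pos + size
    # A document is at least 5 bytes and must end with a NUL terminator.
    if size < 5 or end > len(data) or data[end - 1:end] != b"\x00":
        raise ValueError("declared size does not fit the data")
    pos += 4
    doc = {}
    while pos < end - 1:
        type_byte = data[pos:pos + 1]
        pos += 1
        key, pos = _cstring(data, pos)
        if type_byte == b"\x02":    # string: int32 length (incl. NUL), bytes, NUL
            (n,) = struct.unpack_from("<i", data, pos)
            doc[key] = data[pos + 4:pos + 4 + n - 1].decode("utf-8")
            pos += 4 + n
        elif type_byte == b"\x03":  # embedded document
            doc[key], pos = decode_document(data, pos)
        elif type_byte == b"\x04":  # array: a document keyed "0", "1", ...
            items, pos = decode_document(data, pos)
            doc[key] = [items[k] for k in sorted(items, key=int)]
        elif type_byte == b"\x0a":  # null: no payload
            doc[key] = None
        elif type_byte == b"\x10":  # int32
            (doc[key],) = struct.unpack_from("<i", data, pos)
            pos += 4
        else:
            raise ValueError("unsupported BSON type %r" % type_byte)
    return doc, end

doc, _ = decode_document(
    b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00')
print(doc)
```

Data that fails the size check at the very first step, like the last string in the question, has most likely lost or gained bytes somewhere between the encoder and the file, so it is worth checking how it was written before trying to parse it.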