Python 3 Decoding Strings


Solution 1

string = "\x22my quote\x22"
print(string)

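You don't need to decode anything here: Python 3 interprets the escape sequence for you. You do need the correct code for the double quote, though. \x22 is the ASCII double quote ", whereas \x94 is not a quote character in Unicode.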
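To make the difference between the two escapes concrete, here is a small sketch using only the standard library. The variable names are illustrative and not part of the original answer.

```python
import unicodedata

# "\x22" is the escape for code point 0x22, the ASCII double quote.
straight = "\x22my quote\x22"
print(straight)  # "my quote"

# "\x94" is code point U+0094, a C1 control character, not a quotation mark.
raw = "\x94my quote\x94"
print(unicodedata.category(raw[0]))  # Cc
```

The category Cc ("control") is why the character has no visible glyph when the string is printed as-is.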

If, however, your data is in a different character set (it appears to be Windows-1252, where byte 0x94 is the right curly quote ”), you need to decode the byte string from that character set:

str(b"\x94my quote\x94", "windows-1252")
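As a quick check, and assuming the data really is Windows-1252, the two-argument str() call should give the same result as the more common bytes.decode() method. The names below are illustrative only.

```python
data = b"\x94my quote\x94"

via_str = str(data, "windows-1252")
via_decode = data.decode("windows-1252")

# Both spellings perform the same decoding step.
assert via_str == via_decode

# ascii() shows the code point explicitly: U+201D, the right curly quote.
print(ascii(via_decode))  # '\u201dmy quote\u201d'
```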

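If your string isn't a byte string, you have to encode it first. The latin-1 encoding works here because it maps code points 0 to 255 onto the identical byte values, so the original bytes are recovered unchanged: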

string = "\x94my quote\x94"
str(string.encode("latin-1"), "windows-1252")
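The round trip can be wrapped in a small helper so the result is captured in a variable. This is a minimal sketch: the function name redecode_cp1252 is hypothetical, and it assumes every character in the input has a code point below 256 and that the bytes are valid Windows-1252.

```python
def redecode_cp1252(text: str) -> str:
    """Reinterpret code points 0-255 in `text` as Windows-1252 bytes."""
    # Latin-1 maps code points 0-255 to the identical byte values,
    # so this step recovers the original bytes unchanged.
    raw_bytes = text.encode("latin-1")
    return raw_bytes.decode("windows-1252")


fixed = redecode_cp1252("\x94my quote\x94")
print(ascii(fixed))  # '\u201dmy quote\u201d'

# Optional: swap the curly quotes for straight ASCII quotes.
straight = fixed.replace("\u201c", '"').replace("\u201d", '"')
print(straight)  # "my quote"

# Encoding with UTF-8 instead produces two bytes per character,
# so decoding those bytes as Windows-1252 leaves a stray extra character.
print("\x94".encode("utf-8"))  # b'\xc2\x94'
```

Note that Windows-1252 renders 0x94 as a curly quote (U+201D), so the final replace step is only needed if straight quotes are required.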

Solution 2

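I don't know if this is what you mean, but this works:

some_binary = b"\x94my quote\x94"
result = some_binary.decode("windows-1252")

Calling decode() with no arguments assumes UTF-8 and raises UnicodeDecodeError for these bytes, so the encoding has to be named. If you don't know which encoding to choose, you can use chardet.detect from the third-party chardet package: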

import chardet
chardet.detect(some_binary)
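If installing chardet is not an option, a rough standard-library alternative is to try a short list of candidate encodings in order. This is a different technique from statistical detection, and the function name and the candidate list below are assumptions chosen for illustration.

```python
def decode_with_fallback(data, encodings=("utf-8", "windows-1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(b"\x94my quote\x94")
print(used)  # windows-1252
```

The order matters: UTF-8 is strict and rejects most non-UTF-8 input, so it is tried first, while single-byte encodings accept almost any byte sequence and belong at the end of the list.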

Solution 3

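Did you try it like this? I think you need to call decode as a method of the bytes object and pass utf-8 as the argument. Add b in front of the string literal too. Be aware that the 'ignore' error handler silently drops the \x94 bytes, because they are not valid UTF-8, so the quotes disappear from the result.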

string = b"\x94my quote\x94"
decoded_str = string.decode('utf-8', 'ignore')
print(decoded_str)
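To see what the error handler actually does with these bytes, the following sketch compares 'ignore', 'replace', and the default strict mode. The variable names are illustrative.

```python
data = b"\x94my quote\x94"

ignored = data.decode("utf-8", "ignore")    # drops undecodable bytes
replaced = data.decode("utf-8", "replace")  # substitutes U+FFFD for each one
print(repr(ignored))    # 'my quote'
print(ascii(replaced))  # '\ufffdmy quote\ufffd'

try:
    data.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)
```

None of the three modes recovers the quote characters, which suggests the bytes are not UTF-8 in the first place.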
Author by

rmorshea


Updated on July 23, 2022

Comments

  • rmorshea
    rmorshea almost 2 years

    I understand that this is likely a repeat question, but I'm having trouble finding a solution.

    In short I have a string I'd like to decode:

    raw = "\x94my quote\x94"
    string = decode(raw)
    

    expected from string

    '"my quote"'
    

    Last point of note is that I'm working with Python 3 so raw is unicode, and thus is already decoded. Given that, what exactly do I need to do to "decode" the "\x94" characters?

  • CIsForCookies
    CIsForCookies almost 7 years
    If you think, you should verify your solution
  • rmorshea
    rmorshea almost 7 years
    Hmmm, well "\x94" is not an input of my choosing, but rather comes from a website I'm parsing, and while print may send the decoded string to stdout, I need to capture it as a variable.
  • Matthew Plemmons
    Matthew Plemmons almost 7 years
    My fault, corrected it. And you're right, SO is addicting but when my responses start getting that sloppy, it's time for bed. (:
  • CodeMonkey
    CodeMonkey almost 7 years
    It is captured as a variable. If I just write str in Python it will output '"myquote"'.
  • CodeMonkey
    CodeMonkey almost 7 years
    @rmorshea I amended my answer to include decoding the string from a different character set.
  • rmorshea
    rmorshea almost 7 years
    What if I'm not given the string as binary? Am I forced to encode it somehow and then decode it? My best guess is "\x94my quote\x94".encode("utf-8").decode('windows-1252'), but this is wrong: I get ”my quote”