Can I make decode(errors="ignore") the default for all strings in a Python 2.7 program?


Solution 1

You cannot redefine methods on built-in types, and you cannot change the default value of the errors parameter of str.decode(). There are other ways to achieve the desired behaviour, though.

The slightly nicer way: Define your own decode() function:

def decode(s, encoding="ascii", errors="ignore"):
    return s.decode(encoding=encoding, errors=errors)

Now you will need to call decode(s) instead of s.decode(), but that's not too bad, is it?
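A quick check of the helper, as a minimal sketch: the byte string is a made-up example containing one non-ASCII byte (0xE9). A `b""` literal is just a plain str on Python 2.7, so the snippet also runs unchanged on Python 3.

```python
def decode(s, encoding="ascii", errors="ignore"):
    """Decode a byte string, dropping undecodable bytes by default."""
    return s.decode(encoding=encoding, errors=errors)

raw = b"caf\xe9 au lait"   # hypothetical input; 0xE9 is not valid ASCII

print(repr(decode(raw)))                      # the bad byte is silently dropped
print(repr(decode(raw, encoding="latin-1")))  # the right encoding loses nothing
```

Passing `encoding` explicitly still works, so the helper can be pointed at the real encoding once it is known.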

The hack: You can't change the default value of the errors parameter, but you can overwrite what the handler for the default errors="strict" does:

import codecs
def strict_handler(exception):
    return u"", exception.end
codecs.register_error("strict", strict_handler)

This will essentially change the behaviour of errors="strict" to the standard "ignore" behaviour. Note that this will be a global change, affecting all modules you import.
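The handler protocol is easier to see without the global override. The sketch below registers the same kind of function under a new name (`skip_bad_bytes`, chosen here purely for illustration) and opts into it per call, which leaves the real "strict" behaviour untouched for every other module. The sample bytes are invented.

```python
import codecs

def skip_bad_bytes(exception):
    """Error handler that replaces the offending bytes with nothing.

    The codec calls this with a UnicodeDecodeError whose .start and .end
    delimit the undecodable bytes. Returning (u"", end) tells the codec to
    emit nothing for them and to resume decoding at position `end`.
    """
    return u"", exception.end

# Hypothetical handler name; registering it does not change "strict".
codecs.register_error("skip_bad_bytes", skip_bad_bytes)

raw = b"na\xefve caf\xe9"
text = raw.decode("ascii", "skip_bad_bytes")
print(repr(text))
```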

I recommend neither of these two ways. The real solution is to get your encodings right. (I'm well aware that this isn't always possible.)

Solution 2

As mentioned in my thread on the issue, Sven Marnach's hack works even without defining a new function, because the codecs module already exposes the standard "ignore" handler as codecs.ignore_errors:

import codecs
codecs.register_error("strict", codecs.ignore_errors)
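`codecs.ignore_errors` follows the same protocol as `strict_handler` in the first answer. The sketch below calls it directly on a caught exception to show what it returns, without registering anything globally. The sample bytes are made up.

```python
import codecs

raw = b"caf\xe9"   # hypothetical input; the byte at index 3 is not valid ASCII

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    bad_range = (exc.start, exc.end)                 # where the bad bytes sit
    replacement, resume_at = codecs.ignore_errors(exc)

# "ignore" substitutes nothing for the bad range and resumes right after it.
print("bad bytes at %d:%d -> %r, resume at %d"
      % (bad_range[0], bad_range[1], replacement, resume_at))
```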

Solution 3

I'm not sure what your setup is exactly, but you can derive a class from str and override its decode method:

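class easystr(str):
    def decode(self, encoding="ascii", errors="ignore"):
        # Same signature as str.decode(), but errors default to "ignore"
        return str.decode(self, encoding, errors)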

If you then convert all incoming strings to easystr, errors will be silently ignored:

line = easystr(input.readline())
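A runnable sketch of the wrapping idea, with `io.BytesIO` standing in for the real input file (the data is hypothetical). It subclasses `bytes`, which is simply an alias of `str` on Python 2.7, so the same code also runs on Python 3.

```python
import io

class easystr(bytes):   # bytes is str on Python 2.7
    """A byte string whose decode() ignores errors unless told otherwise."""
    def decode(self, encoding="ascii", errors="ignore"):
        return bytes.decode(self, encoding, errors)

# Stand-in for the real input file (hypothetical data).
stream = io.BytesIO(b"caf\xe9 au lait\nplain ascii\n")

line = easystr(stream.readline())
print(repr(line.decode()))            # bad byte dropped
print(repr(line.decode("latin-1")))   # an explicit encoding still works
```

One caveat of this design: slicing, `+` or `.strip()` on an `easystr` return a plain string, so the wrapper has to be reapplied after such operations.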

That said, decoding a string converts it to unicode, and that conversion should never have to be lossy. Could you figure out which encoding your strings use and pass it as the encoding argument to decode()? That would be a better solution (and you can still make it the default in the way shown above).
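A small illustration of why the right encoding beats errors="ignore", assuming for the example only that the data is really UTF-8: ignoring errors silently drops characters, decoding with the true encoding recovers the text exactly, and a permissive but wrong guess such as Latin-1 never raises yet produces garbled text ("cafÃ©" instead of "café").

```python
# Hypothetical data: the word "cafe" with an acute e, encoded as UTF-8.
original = u"caf\xe9"
raw = original.encode("utf-8")            # b"caf\xc3\xa9"

lossy = raw.decode("ascii", "ignore")     # both non-ASCII bytes vanish
correct = raw.decode("utf-8")             # matches the encoding actually used
wrong_guess = raw.decode("latin-1")       # never fails, but gives mojibake

print(repr(lossy))
print(repr(correct))
print(repr(wrong_guess))
```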

Yet another thing to try is reading your data differently. Open the file like this, and the decoding errors may well disappear:

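import codecs
# Decode while reading; "latin-1" is only an example, use the data's real encoding.
input = codecs.open(filename, "r", encoding="latin-1")
line = input.readline()  # already a unicode object, no .decode() needed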
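An end-to-end sketch of the codecs.open approach; the temporary file and its Latin-1 contents are invented for the example. Every line read back is already unicode, so no further .decode() call (and no errors argument) is needed.

```python
import codecs
import os
import tempfile

# Write raw bytes that are not valid ASCII (0xE9 and 0xEF are Latin-1 letters).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw_file:
    raw_file.write(b"caf\xe9\nna\xefve\n")

# Read through a decoding reader; "latin-1" is an assumption about the data.
with codecs.open(path, "r", encoding="latin-1") as infile:
    lines = infile.readlines()

os.remove(path)
print(lines)
```

Opening with mode "w" and an encoding gives the mirror image: a writer that encodes unicode text as it is written.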
Author: Paul Hoffman

Updated on March 23, 2020

Comments

  • Paul Hoffman, about 4 years ago

    I have a Python 2.7 program that writes out data from various external applications. I continually get bitten by exceptions when I write to a file until I add .decode(errors="ignore") to the string being written out. (FWIW, opening the file as mode="wb" doesn't fix this.)

    Is there a way to say "ignore encoding errors on all strings in this scope"?