csv class writer: a bytes-like object is required, not 'str'

You may be trying to write to a BytesIO object, but csv.writer() deals in strings only. From the csv writer objects documentation:

A row must be an iterable of strings or numbers

The emphasis on "strings or numbers" is mine. csv.writer() also requires a text file to write to; the object produces strings:

[...] converting the user’s data into delimited strings on the given file-like object.

Either use an io.StringIO object instead, or wrap the BytesIO object in an io.TextIOWrapper object to handle encoding for you. Either way, you'll need to pass in Unicode text to csv.writer().
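Both options can be sketched as follows. This is a minimal illustration, not the code from the question: the sample rows and the variable names are made up, and UTF-8 is assumed as the target encoding for the wrapped BytesIO.

```python
import csv
import io

# Hypothetical sample data; csv.writer accepts strings and numbers.
rows = [["name", "age"], ["Zoë", 42]]

# Writing straight to a BytesIO fails, because csv.writer emits str.
try:
    csv.writer(io.BytesIO()).writerow(rows[0])
    bytes_write_failed = False
except TypeError:
    bytes_write_failed = True

# Option 1: write to an in-memory text buffer.
text_buffer = io.StringIO(newline="")
csv.writer(text_buffer, quoting=csv.QUOTE_ALL).writerows(rows)
text_result = text_buffer.getvalue()  # a str

# Option 2: keep the BytesIO, but let TextIOWrapper do the encoding.
byte_buffer = io.BytesIO()
wrapper = io.TextIOWrapper(byte_buffer, encoding="utf-8", newline="")
csv.writer(wrapper, quoting=csv.QUOTE_ALL).writerows(rows)
wrapper.flush()  # push buffered text through to the BytesIO
bytes_result = byte_buffer.getvalue()  # bytes, UTF-8 encoded
```

In both cases the values handed to the writer are plain str or numbers; only the second option ever produces bytes, and it does so inside the wrapper rather than in your own code.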

Because you later treat the s.getvalue() data as a string again (using regular expressions defined as strings, and encoding to Latin-1), you probably want to write to a text file object (so StringIO).

That f.write(BOM_UTF8) fails is a separate issue. f was opened in text mode ('wt'), so it expects strings, not bytes. If you want to write text to a file encoded as UTF-8 with a UTF-8 BOM at the start, you can use the utf-8-sig encoding when opening the file:

open(path, 'w', encoding='utf-8-sig')

Generally, you appear to be mixing bytes and strings in all the wrong ways. Leave text as text for as long as possible, and only encode at the last possible moment. Here that moment would be when writing to the file at location path, and you can leave the encoding entirely to the file object.
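As a rough sketch of that advice, the CSV writer can write directly to a file opened with utf-8-sig, so no manual BOM or encode() call is needed. The sample rows and the temporary path below are assumptions for illustration; they are not taken from the package in the question.

```python
import codecs
import csv
import os
import tempfile

# Hypothetical sample data and output location.
rows = [["name (STRING)", "age (INTEGER)"], ["Zoë", 42]]
path = os.path.join(tempfile.mkdtemp(), "table.csv")

# Text stays text: the file object adds the BOM and encodes to UTF-8.
# newline="" is what the csv module documentation recommends for files.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)

# Read the raw bytes back to confirm what ended up on disk.
with open(path, "rb") as f:
    raw = f.read()

has_bom = raw.startswith(codecs.BOM_UTF8)
body = raw[len(codecs.BOM_UTF8):].decode("utf-8")
```

Any post-processing on the CSV text (such as the re.sub() call in the question) would then be done on a str taken from a StringIO before writing it to a file opened this way.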

Author by

FSRubyc

Updated on June 05, 2022

Comments

  • FSRubyc
    FSRubyc about 2 years

    For a personal project I'm trying to upgrade the pattern package to Python 3. Currently I'm running the test:db.py test, but I'm stuck on the following error in the '__init__.py' file, in a csv class:

    This is the code snippet of the save() function: there, we define s as a BytesIO() stream, so the function is asked to stream bytes to a CSV file. The error comes from the line:

    w.writerows([[csv_header_encode(name, type) for name, type in self.fields]])
    
    TypeError: a bytes-like object is required, not 'str' (the code for this function is also below)
    

    csv_header_encode is supposed to return bytes, and I checked that it does, but somehow, in the conversion to a list, it changes to 'str'. And if I change s to a StringIO, then the complaint comes from:

     f.write(BOM_UTF8)
    

    Any help will be appreciated.

    def save(self, path, separator=",", encoder=lambda v: v, headers=False, password=None, **kwargs):
        """ Exports the table to a unicode text file at the given path.
            Rows in the file are separated with a newline.
            Columns in a row are separated with the given separator (by default, comma).
            For data types other than string, int, float, bool or None, a custom string encoder can be given.
        """
        # Optional parameters include all arguments for csv.writer(), see:
        # http://docs.python.org/library/csv.html#csv.writer
        kwargs.setdefault("delimiter", separator)
        kwargs.setdefault("quoting", csvlib.QUOTE_ALL)
        # csv.writer will handle str, int, float and bool:
        s = BytesIO()
        w = csvlib.writer(s,  **kwargs)
        if headers and self.fields is not None:
            w.writerows([[csv_header_encode(name, type) for name, type in self.fields]])
        w.writerows([[encode_utf8(encoder(v)) for v in row] for row in self])
        s = s.getvalue()
        s = s.strip()
        s = re.sub("([^\"]|^)\"None\"", "\\1None", s)
        s = (s if not password else encrypt_string(s, password)).encode('latin-1')
        f = open(path, "wt")
        f.write(BOM_UTF8)
        f.write(s)
        f.close()
    
    def csv_header_encode(field, type=STRING):
        # csv_header_encode("age", INTEGER) => "age (INTEGER)".
        t = re.sub(r"^varchar\(.*?\)", "string", (type or ""))
        t = t and " (%s)" % t or ""
        return "%s%s" % (encode_utf8(field or ""), t.upper())
    
  • FSRubyc
    FSRubyc about 7 years
    Right. As I said, changing to StringIO solves the first error and raises the second. I changed the open call as you suggested { f = open(path, "w", encoding='utf-8-sig') }, but then I get the error: { write() argument must be str, not bytes }
  • Martijn Pieters
    Martijn Pieters about 7 years
    @FSRubyc: which is why we ask people to create a minimal reproducible example of a problem, and ask separate questions about separate issues.
  • Martijn Pieters
    Martijn Pieters about 7 years
    @FSRubyc: yes, as I state in my answer, keep everything as text, not bytes.
  • Martijn Pieters
    Martijn Pieters about 7 years
    @FSRubyc: also, why are you attempting to encode something as Latin-1 bytes but prepend it with a UTF-8 BOM? Pick an encoding and stick to it. Preferably leave it to the file object to handle encoding.
  • FSRubyc
    FSRubyc about 7 years
    I picked a Python 2.7 package (github.com/clips/pattern) and I'm trying to make the minimum changes needed to make it Python 3 compatible.
  • Martijn Pieters
    Martijn Pieters about 7 years
    @FSRubyc: ah, oh dear. That code is not very well written and has multiple issues even in the Python 2 implementation. I'm not sure that's really worth your time, especially if you are not well versed in Python yet.
  • FSRubyc
    FSRubyc about 7 years
    The encoding to 'latin-1' was a change I made to test the problem, but I have already taken it out.
  • FSRubyc
    FSRubyc about 7 years
    Thanks Martijn for your time. Well, sometimes you win and sometimes you learn.