csv.DictWriter delimiter set to space implies text in " "

20,821

Normally, to suppress the quoting of data fields in the csv file all you need to do is add a quoting=csv.QUOTE_NONE keyword argument to your csv.DictWriter() constructor call.

However, just doing that in this case will cause an exception to be raised because some of the data fields contain the delimiter character you're trying to use, ' ', and a different escapechar was not also defined. This is mentioned in the documentation where it says:

If escapechar is not set, the writer will raise csv.Error if any characters that require escaping are encountered.

This all makes total sense if you think about it — a csv file would be unparsable (i.e. invalid) if the data in the fields themselves contained unescaped delimiter characters because there would no way to tell the data from the delimiters—they'd be non-identifiable—if the results looked like this:

1152901 Identifable X
1152902 Non identfiable X

So changing the delimiter to "\t" will get rid of the quotes (as you already discovered), or alternatively you could define an escapechar="\\" (or whatever) in the constructor call and leave the delimiter set to " " — because either there would no longer be a conflict between the file's delimiter and the contents of its data fields or there would be way to escape them when they're encountered.

If you really want the results to be the way you stated for some unknown reason, you can always write the file out manually, I suppose.

Update: Here's one possible workaround which changes any space characters in the data fields to '\xa0' which is a non-breaking space on my system but displays the same as a regular space character.

with open('fileC.csv', 'w') as F:
    head_fields = golds_two[0].keys()
    head_fields.remove('ID')
    print(head_fields)
    head_fields = sorted(head_fields, reverse=True)
    csvdw = csv.DictWriter(F, delimiter=" ", fieldnames=head_fields,
                           quoting=csv.QUOTE_NONE)
    #headers = dict((n, n) for n in head_fields)

    for z in golds_two:
        z.pop('ID', None)
        print(z)
        for k,v in z.iteritems():
            z[k] = v.replace(' ', '\xa0')  # Convert space chars to non-breaking.
        csvdw.writerow(z)
Share:
20,821
andilabs
Author by

andilabs

Andrzej Kostanski always open for cooperation. See my cv: http://andilabs.com/cv/cv.pdf experienced backend engineer working mostly with python based stacks. founder of Andilabs Python dev shop from Warsaw, Poland http://andilabs.com/ after hours I am a big fan of an active lifestyle and triathlon 🏊🚴🏃 http://anditriathlon.com

Updated on February 21, 2021

Comments

  • andilabs
    andilabs about 3 years

    Hi I noticed that while using DictWriter and delimiter=' ' instead of ',' the string are saved to file in "" while by use of comma without. How set things up to have strings without " "?

    CODE

    import csv
    golds_two =[]
    mydict ={}
    
    with open ('fileA.csv','rU') as csvfile:
        wszystkie=csv.DictReader(csvfile,delimiter=',')
        for w in wszystkie:
            mydict[(w['URL']).split('/')[-1]]=w['MediaObject ID']
    
    
    with open ('fileB.csv','rU') as csvfile:
        golds=csv.DictReader(csvfile,delimiter=';')
        for g in golds:
            g['MediaObject ID']=mydict[g['ID']]
            golds_two.append(g)
    
    
    with open('fileC.csv','w') as F: 
            head_fields =golds_two[0].keys()
            head_fields.remove('ID')
            print head_fields
            head_fields=sorted(head_fields,reverse=True)
            csvdw = csv.DictWriter(F,delimiter=" ",fieldnames=head_fields)
            headers = dict( (n,n) for n in head_fields)
    
            for z in golds_two:
                z.pop('ID',None)
                print z
                csvdw.writerow(z)
    

    fileA.csv:

    MediaObject ID,URL
    1152901,http://foo.bar/TRU716565.jpg
    1152902,http://foo.bar/TRU716566.jpg
    

    fileB.csv:

    ID;Gold label
    TRU716565.jpg;Identifable X
    TRU716566.jpg;Non identfiable X
    

    resulting fileC.csv:

    1152901 "Identifable X"
    1152902 "Non identfiable X"