Python opens text file with a space between every character
Solution 1
The post by recursive is probably right: the contents of the file are likely encoded with a multi-byte charset. If that is the case, you can read the file in Python itself, without having to convert it outside of Python first.
Try something like:
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding; in this example I've used utf-16, but YMMV. The decode call turns the raw bytes into a unicode string. If the file really does contain multi-byte characters, I don't recommend converting it to ASCII, as you may end up losing a lot of the characters in the process.
EDIT: Thanks for uploading the file. There are two bytes at the front of the file (a byte order mark) which indicate that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested; in the text view you'll see something like 'I.D.|.' (etc). Each dot is the extra byte that accompanies every character.
The code snippet above seems to work on my machine with that file.
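For readers on Python 3, a minimal sketch of the same idea: open() can do the decoding itself when given an encoding. This assumes the file is UTF-16 and pipe-delimited (the asker mentions the | separator in the comments); the sample contents written here are made up so the example runs on its own.

```python
import csv
import os
import tempfile

# Build a small UTF-16 sample file so the sketch is self-contained.
# The contents are invented; only the '|' delimiter comes from the thread.
path = os.path.join(tempfile.mkdtemp(), "input.csv")
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write("ID|Name\r\n1|Zo\u00eb\r\n")

# Let open() decode while reading. newline="" is what the csv module
# documentation recommends for files handed to csv.reader.
with open(path, "r", encoding="utf-16", newline="") as f:
    rows = list(csv.reader(f, delimiter="|"))

print(rows)
```

With this approach no intermediate ASCII copy of the file is needed, and non-ASCII characters survive.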
Solution 2
The file is encoded in some Unicode encoding, but you are reading it as ASCII. Try converting the file to ASCII before using it in Python.
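One way to narrow down which Unicode encoding a file uses is to look at its first bytes for a byte order mark (BOM). The sketch below is only a heuristic, and guess_utf_bom is a hypothetical helper name; a file without a BOM gives no answer. It also shows why the text appears to have a "space" after each character: in UTF-16, every ASCII character is paired with a zero byte.

```python
import codecs

def guess_utf_bom(raw):
    """Return a codec name based on a leading byte order mark, or None."""
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None

# Made-up sample; Python prepends a BOM when encoding with plain "utf-16".
sample = "ID|Name".encode("utf-16")
print(guess_utf_bom(sample))

# Without a BOM, each ASCII character is followed by a zero byte.
no_bom = "ID|".encode("utf-16-le")
print(no_bom)
```

For a real file, the same check can be run on the first few bytes returned by open(path, 'rb').read(4).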
Solution 3
Isn't a CSV just a plain text file with values separated by commas? Try opening it in a text editor to check that the file is correctly formed.
wlindner
Updated on June 05, 2022
Comments
-
wlindner over 1 year
Whenever I try to open a .csv file with the python command
fread = open('input.csv', 'r')
it always opens the file with spaces between every single character. I'm guessing something is wrong with the text file, because I can open other text files with the same command and they load correctly. Does anyone know why a text file would load like this in Python? Thanks.
Update
OK, I got it working with the help of Jarret Hardie's post.
This is the code that I used to convert the file to ASCII:
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
mytext = mytext.encode('ascii', 'ignore')
fwrite = open('input-ascii.csv', 'wb')
fwrite.write(mytext)
Thanks!
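A note on the 'ignore' argument used above: it silently drops any character that ASCII cannot represent. The short sketch below illustrates this with a made-up sample string (not taken from the original file) and compares it with UTF-8, which keeps every character.

```python
# Hypothetical sample text, written with escapes so the source stays ASCII.
text = "caf\u00e9|na\u00efve|Z\u00fcrich"

# errors="ignore" discards every character outside the ASCII range.
lossy = text.encode("ascii", "ignore").decode("ascii")

# UTF-8 can represent every Unicode character, so it round-trips cleanly.
utf8_roundtrip = text.encode("utf-8").decode("utf-8")

print(lossy)
print(utf8_roundtrip == text)
```

If the file only ever contains ASCII characters, the conversion is harmless; otherwise writing UTF-8 is likely the safer choice.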
-
wlindner over 14 years
Well, it is a text file, properly formatted with | characters instead of commas, but the problem actually occurs before I ever try to read it into the csv reader.
-
wlindner over 14 years
Yeah, I think it's in Unicode. Is there a way to open the file in Python, convert it to ASCII, write it out, then reopen it to load it as a CSV?
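A rough Python 3 sketch of that open, convert, write, reopen workflow follows. It assumes the source is UTF-16 and writes UTF-8 rather than ASCII so that no characters are dropped. The transcode helper, the file names and the sample contents are all hypothetical.

```python
import csv
import os
import tempfile

def transcode(src, dst, src_encoding="utf-16", dst_encoding="utf-8"):
    """Copy a text file from one encoding to another (hypothetical helper)."""
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, newline="") as fout:
        for line in fin:  # stream line by line instead of reading it all
            fout.write(line)

# Made-up sample data so the sketch is runnable on its own.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.csv")
dst = os.path.join(workdir, "input-utf8.csv")
with open(src, "w", encoding="utf-16", newline="") as f:
    f.write("ID|Name\r\n1|Zo\u00eb\r\n")

transcode(src, dst)

# Reopen the converted file and parse it as pipe-delimited CSV.
with open(dst, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f, delimiter="|"))
print(rows)
```

The context managers close both files as soon as the copy finishes, so the converted file is fully written before it is reopened.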