Python decoding excel sheet without pandas
You need to unzip the xlsx file first, before you can read its contents (assuming that is the format you are using).
jake wong
Updated on June 04, 2022
Comments
-
jake wong over 1 year
I am trying to read an Excel file in Python without using `pandas` or `xlrd`, and I have been trying to convert the results from `bytes` to `utf-8` without any success.
Data from the xls file:
colA  colB  colC
spc   1D0   20190705
spd   1D0   20190705
spe   1D0   20190705
... (goes on for 500k lines)
code
with open(file, 'rb') as f:
    data = f.readlines(1)  # Just to check the first line that is printed out
    print(data[0].decode('utf-8'))
The error I receive is
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
If I were to print `data` without decoding it, the result is:
[b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00>\x00\x03\x00\xfe\xff\t\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9e\x00\x00\x00\x9dN\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\xfe\xff\xff\xff\x00\x00\x00\x00\xfeM\x00\x00\x01\x00\x00\x00\xffM\x00\x00\x00N\x00\x00\x01N\x00\x00\x02N\x00\x00\x03N\x00\x00\x04N\x00\x00\x05N\x00\x00\x06N\x00\x00\x07N\x00\x00\x08N\x00\x00\tN\x00\x00\n']
There isn't any particular reason why I don't want to use `pandas` or `xlrd`; I am just trying to parse the data with just the standard libraries if required.
Any thoughts?
-
amanb over 4 years
The error says that there is a specific byte in the Excel file that cannot be decoded as 'utf-8'. You could try a different encoding, but it's still not known what sort of characters may be lurking in the document. Perhaps you should give pandas a try with `pd.read_excel(file)` and see what you get.
-
lenz over 4 years
Excel is a binary format, not plain text. If you don't want to use `xlrd` or `pd.read_excel`, you'll have to reimplement what those libraries do.
-
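As a rough illustration of the "binary format" point: the first eight bytes in the question's output, `\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1`, are the signature of the OLE2 compound file container that legacy .xls files use, while .xlsx files are ZIP archives and start with `PK\x03\x04`. The sketch below checks for both using only the standard library. The function names `sniff_excel_format` and `sniff_excel_file` are hypothetical, and the result is only a guess, since other document types share these containers.

```python
# Signatures of the two container formats Excel files normally use.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # compound file, e.g. legacy .xls
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP archive, e.g. .xlsx


def sniff_excel_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file.

    Hypothetical helper: other formats (.doc, .docx, plain .zip) share
    these signatures, so treat the answer as a hint, not a guarantee.
    """
    if header.startswith(OLE2_MAGIC):
        return "xls"
    if header.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


def sniff_excel_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and guess its format."""
    with open(path, "rb") as f:
        return sniff_excel_format(f.read(8))
```

Applied to the file in the question, this would report "xls", which is consistent with `decode('utf-8')` failing on the very first byte.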
John Y over 4 years
Even if you want to parse .xlsx files, which are considerably easier than .xls, you still have quite a bit of work to do. I guess you are doing this as a learning exercise? If so, then I think you should take a look at this question to find out where to read about the .xlsx specifications. If you are truly trying to learn about .xls files, I urge you to reconsider. There are plenty of other things you could be learning about that are more useful and less painful.
-
-
lenz over 4 years
Ideally, you should show some code for how to do this (e.g. using the std-lib `zipfile` module) and then how to proceed once the xlsx archive is unpacked (which file to process, how to access the data of a cell, etc.).
-
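A minimal sketch of what that could look like for .xlsx only, using just `zipfile` and `xml.etree.ElementTree`. It assumes the first worksheet is stored at `xl/worksheets/sheet1.xml`, which is typical but not guaranteed (the authoritative mapping lives in `xl/workbook.xml` and its relationships file). The function name `read_xlsx_rows` is hypothetical. Every value comes back as a string, and empty cells are simply absent from the XML, so columns can shift in sparse rows.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by SpreadsheetML parts inside an .xlsx archive.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}
T_TAG = "{%s}t" % MAIN_NS


def read_xlsx_rows(path, sheet="xl/worksheets/sheet1.xml"):
    """Return the sheet as a list of rows, each a list of cell strings.

    Hypothetical helper; the default sheet path is an assumption.
    """
    with zipfile.ZipFile(path) as zf:
        # Text cells usually store an index into a shared string table.
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in sst.findall("m:si", NS):
                # Join all <t> runs so rich-text strings are handled too.
                shared.append("".join(t.text or "" for t in si.iter(T_TAG)))
        sheet_root = ET.fromstring(zf.read(sheet))

    rows = []
    for row in sheet_root.iterfind("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            v = cell.find("m:v", NS)
            cell_type = cell.get("t")
            if cell_type == "s" and v is not None:
                values.append(shared[int(v.text)])  # shared string lookup
            elif cell_type == "inlineStr":
                values.append("".join(t.text or "" for t in cell.iter(T_TAG)))
            else:
                values.append(v.text if v is not None else "")  # raw value
        rows.append(values)
    return rows
```

Calling `read_xlsx_rows("book.xlsx")` (the file name is a placeholder) would return one list per row. Dates, number formats, and formulas are not interpreted here, and none of this applies to legacy .xls files, which are not ZIP archives.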
pygri over 4 years
It would probably be wise to wait for confirmation that xlsx is indeed the format the OP is trying to read before embarking on such an enterprise...
-
Eiríkr Útlendi over 3 years
See also this comment in another thread, presenting a solution to reading an `*.xlsx` Excel file using just standard library functionality.
-
George Crowther almost 2 years
From the description the OP has given (though they have not been specific), this does not appear to answer the question posed. Your solution is for a text-based file; the OP appears to be struggling with an (assumed) .xls or .xlsx file.