pandas.read_csv from string or package data
Solution 1
The following worked for me in Python 3.3:
>>> import numpy as np, pandas as pd
>>> import io, pkgutil
>>> wells = pkgutil.get_data('pymc.examples', 'data/wells.dat')
>>> type(wells)
<class 'bytes'>
>>> df = pd.read_csv(io.BytesIO(wells), encoding='utf8', sep=" ", index_col="id", dtype={"switch": np.int8})
>>> df.head()
    switch  arsenic       dist  assoc  educ
id
1        1     2.36  16.826000      0     0
2        1     0.71  47.321999      0     0
3        0     2.07  20.966999      0    10
4        1     1.15  21.486000      0    12
5        1     1.10  40.874001      1    14

[5 rows x 5 columns]
N.B. I had to manually put wells.dat in that location, so I can't swear that I copied it correctly or that there isn't trailing whitespace, because I deleted some. But passing read_csv a BytesIO object and an encoding parameter should work. (Actually, you can probably get away without the encoding parameter, but it's a good habit. io.TextIOWrapper might be another option.)
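As a minimal sketch of the two options mentioned above, the snippet below feeds the same bytes to read_csv twice: once through io.BytesIO with an explicit encoding, and once through io.TextIOWrapper. The sample bytes and column names are made up for illustration and stand in for whatever pkgutil.get_data returns. They are not the contents of wells.dat.

```python
import io

import pandas as pd

# Hypothetical stand-in for the bytes returned by pkgutil.get_data(...).
raw = b"id switch arsenic\n1 1 2.36\n2 1 0.71\n3 0 2.07\n"

# Option 1: wrap the bytes in BytesIO and let pandas decode them.
df_bytes = pd.read_csv(io.BytesIO(raw), encoding="utf8", sep=" ", index_col="id")

# Option 2: wrap the byte stream in TextIOWrapper so pandas sees a text stream.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf8")
df_text = pd.read_csv(text_stream, sep=" ", index_col="id")

# Both routes should yield the same frame.
print(df_bytes.equals(df_text))
```

Either route avoids decoding the whole payload by hand; which one reads better is mostly a matter of taste.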
Solution 2
To pass a string to pandas read_csv(), you can use io.StringIO, i.e.:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("csv string..."))
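When the data starts out as bytes (for example from pkgutil.get_data), io.StringIO needs text, so the bytes have to be decoded first. A minimal sketch, assuming the data is UTF-8; the CSV content and column names are invented for illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV payload as bytes; real data might come from pkgutil.get_data(...).
raw = b"id,name,score\n1,alice,3.5\n2,bob,4.0\n"

# StringIO only accepts str, so decode the bytes before wrapping them.
csv_text = raw.decode("utf-8")
df = pd.read_csv(StringIO(csv_text), index_col="id")

print(df)
```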
John Salvatier
Large scale statistics, functional programming, etc
Updated on July 09, 2022
Comments
-
John Salvatier almost 2 years
I have some csv text data in a package which I want to read using read_csv. I was doing this by
from pkgutil import get_data
from StringIO import StringIO
data = read_csv(StringIO(get_data('package.subpackage', 'path/to/data.csv')))
However, StringIO.StringIO disappears in Python 3, and io.StringIO only accepts Unicode. Is there a simple way to do this?
Edit: the following does not appear to work
import pandas as pd
import pkgutil
from io import StringIO

def get_data_file(pkg, path):
    f = StringIO()
    contents = unicode(pkgutil.get_data('pymc.examples', 'data/wells.dat'))
    f.write(contents)
    return f

wells = get_data_file('pymc.examples', 'data/wells.dat')
data = pd.read_csv(wells, delimiter=' ', index_col='id', dtype={'switch': np.int8})
failing with
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 401, in parser_f
    return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 209, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 509, in __init__
    self._make_engine(self.engine)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 611, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 893, in __init__
    self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 441, in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3940)
File "parser.pyx", line 551, in pandas._parser.TextReader._get_header (pandas/src/parser.c:5096)
pandas._parser.CParserError: Passed header=0 but only 0 lines in file
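One likely reason for the "only 0 lines in file" error, offered as a guess rather than a confirmed diagnosis: after f.write(contents) the buffer's position sits at the end, so a reader starting from there sees nothing. The stdlib-only sketch below uses made-up text to show the effect and the f.seek(0) fix:

```python
from io import StringIO

# Made-up space-separated text standing in for the package data.
contents = "id switch\n1 1\n2 0\n"

f = StringIO()
f.write(contents)

# The position is now at the end of the buffer, so reading yields nothing.
read_without_rewind = f.read()

# Rewinding makes the full contents visible to the next reader.
f.seek(0)
read_after_rewind = f.read()

print(repr(read_without_rewind))
print(read_after_rewind == contents)
```

Constructing the buffer directly as StringIO(contents) sidesteps the problem, since the position then starts at 0.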
-
John Salvatier over 10 years
Thanks, I had figured out how to do it with io.StringIO(unicode(wells)), but this seems better.
-
addicted over 6 years
Thanks! This is very helpful. I am losing my mind over how to read the <class 'bytes'> file using pd.read_csv
-
Nikhil VJ almost 6 years
Thanks! I had CSV file contents uploaded through form data in a POST request. This worked:
df = pd.read_csv(io.BytesIO(self.request.files['file1'][0]['body']))
-
embulldogs99 almost 4 years
This works well for .dat files as well. I am replacing the .dat file's spaces with commas, then using the above code to convert the comma-separated string into a pandas DataFrame.
-
InnocentBystander over 2 years
@embulldogs99 You don't need to replace spaces; just use a space as the field separator with sep=' '
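A small sketch of that suggestion, using invented space-separated data in the style of a .dat file (the column names and values are assumptions for illustration only):

```python
from io import StringIO

import pandas as pd

# Invented space-separated sample; not taken from any real .dat file.
dat_text = "id dist educ\n1 16.8 0\n2 47.3 10\n"

# sep=" " splits fields on single spaces, so no comma substitution is needed.
df = pd.read_csv(StringIO(dat_text), sep=" ", index_col="id")

print(df.columns.tolist())
```

If fields may be separated by runs of several spaces, sep=r"\s+" is a common alternative.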