Load CSV to Pandas MultiIndex DataFrame
36,155
Solution 1
You could use pd.read_csv
:
>>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
>>> df
dep freq arr code mode
from to
RGBOXFD RGBPADTON 127 0 27 99999 2
RGBPADTON 127 0 33 99999 2
RGBRDLEY 127 0 1425 99999 2
RGBCHOLSEY 127 0 52 99999 2
RGBMDNHEAD 127 0 91 99999 2
RGBDIDCOTP RGBPADTON 127 0 46 99999 2
RGBPADTON 127 0 3 99999 2
RGBCHOLSEY 127 0 61 99999 2
RGBRDLEY 127 0 1430 99999 2
RGBPADTON 127 0 115 99999 2
where I've used skipinitialspace=True
to get rid of those annoying spaces in the header row.
Solution 2
from_csv() works similarly:
import pandas as pd
df = pd.DataFrame.from_csv(
'data.txt',
index_col = [0, 1]
)
print df
--output:--
dep freq arr code mode
from to
RGBOXFD RGBPADTON 127 0 27 99999 2
RGBPADTON 127 0 33 99999 2
RGBRDLEY 127 0 1425 99999 2
RGBCHOLSEY 127 0 52 99999 2
RGBMDNHEAD 127 0 91 99999 2
RGBDIDCOTP RGBPADTON 127 0 46 99999 2
RGBPADTON 127 0 3 99999 2
RGBCHOLSEY 127 0 61 99999 2
RGBRDLEY 127 0 1430 99999 2
RGBPADTON 127 0 115 99999 2
From this discussion,
https://github.com/pydata/pandas/issues/4916
it looks like read_csv() was implemented to allow you to set more options, which makes from_csv() superfluous.
Author by
handloomweaver
Updated on March 30, 2020Comments
-
handloomweaver about 4 years
I have a 719mb CSV file that looks like:
from, to, dep, freq, arr, code, mode (header row) RGBOXFD,RGBPADTON,127,0,27,99999,2 RGBOXFD,RGBPADTON,127,0,33,99999,2 RGBOXFD,RGBRDLEY,127,0,1425,99999,2 RGBOXFD,RGBCHOLSEY,127,0,52,99999,2 RGBOXFD,RGBMDNHEAD,127,0,91,99999,2 RGBDIDCOTP,RGBPADTON,127,0,46,99999,2 RGBDIDCOTP,RGBPADTON,127,0,3,99999,2 RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2 RGBDIDCOTP,RGBRDLEY,127,0,1430,99999,2 RGBDIDCOTP,RGBPADTON,127,0,115,99999,2 and so on...
I want to load in to a pandas DataFrame. Now I know there is a load from csv method:
r = pd.DataFrame.from_csv('test_data2.csv')
But I specifically want to load it as a 'MultiIndex' DataFrame where from and to are the indexes:
So ending up with:
dep, freq, arr, code, mode RGBOXFD RGBPADTON 127 0 27 99999 2 RGBRDLEY 127 0 33 99999 2 RGBCHOLSEY 127 0 1425 99999 2 RGBMDNHEAD 127 0 1525 99999 2
etc. I'm not sure how to do that?
-
Jeff Tratner over 10 yearsFYI: from_csv is deprecated (it does the same thing as read_csv, but with some surprising settings, and sometimes behaves strangely with date parsing).
-
7stud over 10 yearsfrom_csv is deprecated I'm not seeing that.
-
Jeff Tratner over 10 yearsYou're absolutely right, pandas hasn't deprecated it yet. Sorry! That said, we've been thinking about deprecating. It's also just a little weird, like how it defaults to parses ints as dates, and we've been discussing deprecating it I guess).
-
7stud over 10 yearsNp. I read the discussion at your link as well as the discussion it links to in the last post, which is the link I included in my post.
-
safay about 2 yearsAnd if you're looking for column-wise multi-index, instead call:
df = pd.read_csv("data.csv", header=[0,1], index_col=0))