Load CSV to Pandas MultiIndex DataFrame

python csv numpy pandas

36,155

Solution 1

You could use pd.read_csv:

>>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
>>> df
                       dep  freq   arr   code  mode
from       to                                      
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBPADTON   127     0    33  99999     2
           RGBRDLEY    127     0  1425  99999     2
           RGBCHOLSEY  127     0    52  99999     2
           RGBMDNHEAD  127     0    91  99999     2
RGBDIDCOTP RGBPADTON   127     0    46  99999     2
           RGBPADTON   127     0     3  99999     2
           RGBCHOLSEY  127     0    61  99999     2
           RGBRDLEY    127     0  1430  99999     2
           RGBPADTON   127     0   115  99999     2

where I've used skipinitialspace=True to get rid of those annoying spaces in the header row.

Solution 2

from_csv() works similarly:

import pandas as pd

df = pd.DataFrame.from_csv(
    'data.txt',
    index_col = [0, 1]
)

print df

--output:--
                        dep   freq   arr   code   mode
from        to                                        
RGBOXFD    RGBPADTON    127      0    27  99999      2
           RGBPADTON    127      0    33  99999      2
           RGBRDLEY     127      0  1425  99999      2
           RGBCHOLSEY   127      0    52  99999      2
           RGBMDNHEAD   127      0    91  99999      2
RGBDIDCOTP RGBPADTON    127      0    46  99999      2
           RGBPADTON    127      0     3  99999      2
           RGBCHOLSEY   127      0    61  99999      2
           RGBRDLEY     127      0  1430  99999      2
           RGBPADTON    127      0   115  99999      2

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html#pandas.DataFrame.from_csv

From this discussion,

https://github.com/pydata/pandas/issues/4916

it looks like read_csv() was implemented to allow you to set more options, which makes from_csv() superfluous.

36,155

Author by

handloomweaver

Updated on March 30, 2020

Comments

handloomweaver about 4 years

I have a 719mb CSV file that looks like:

from, to, dep, freq, arr, code, mode   (header row)
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBOXFD,RGBCHOLSEY,127,0,52,99999,2
RGBOXFD,RGBMDNHEAD,127,0,91,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBPADTON,127,0,3,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
RGBDIDCOTP,RGBRDLEY,127,0,1430,99999,2
RGBDIDCOTP,RGBPADTON,127,0,115,99999,2
and so on...

I want to load in to a pandas DataFrame. Now I know there is a load from csv method:

 r = pd.DataFrame.from_csv('test_data2.csv')

But I specifically want to load it as a 'MultiIndex' DataFrame where from and to are the indexes:

So ending up with:

                   dep, freq, arr, code, mode
RGBOXFD RGBPADTON  127     0   27  99999    2
        RGBRDLEY   127     0   33  99999    2
        RGBCHOLSEY 127     0 1425  99999    2
        RGBMDNHEAD 127     0 1525  99999    2

etc. I'm not sure how to do that?

Jeff Tratner over 10 years

FYI: from_csv is deprecated (it does the same thing as read_csv, but with some surprising settings, and sometimes behaves strangely with date parsing).
7stud over 10 years

from_csv is deprecated I'm not seeing that.
Jeff Tratner over 10 years

You're absolutely right, pandas hasn't deprecated it yet. Sorry! That said, we've been thinking about deprecating. It's also just a little weird, like how it defaults to parses ints as dates, and we've been discussing deprecating it I guess).
7stud over 10 years

Np. I read the discussion at your link as well as the discussion it links to in the last post, which is the link I included in my post.
safay about 2 years

And if you're looking for column-wise multi-index, instead call: df = pd.read_csv("data.csv", header=[0,1], index_col=0))