converting .rda to pandas dataframe

Solution 1

Thank you for the useful question. I tried the two approaches proposed in the other answers (feather and rpy2). With feather, I ran into this error:

pyarrow.lib.ArrowInvalid: Not a Feather V1 or Arrow IPC file

For rpy2, as mentioned by @0range: "pandas2ri.ri2py_dataframe does not seem to exist any longer in rpy2 version 3.0.3" or later.

I looked for another workaround and found pyreadr, which worked for me and may help others facing the same problems: https://github.com/ofajardo/pyreadr

Usage: https://gist.github.com/LeiG/8094753a6cc7907c716f#gistcomment-2795790

$ pip install pyreadr

import pyreadr

result = pyreadr.read_r('/path/to/file.RData')  # also works for .rds and .rda files

# done! let's see what we got
# result is a dictionary: the keys are the names of the R objects and the
# values are the corresponding Python objects (pandas data frames)
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1
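
The Feather error quoted in this answer means the reader did not find a Feather or Arrow signature at the start of the file. One possible cause (an assumption, since the answer does not say) is that the .rda file itself was handed to the Feather reader instead of a file written by write_feather. The sketch below checks the first bytes of a file before a reader is chosen. The function name sniff_format and the returned labels are made up for illustration, and the signature list is not exhaustive; gzip, bzip2 and xz are the compression options of R's save(), so one of those labels suggests a compressed R file rather than a Feather file.

```python
# Leading byte signatures, checked in order. The labels are illustrative only.
SIGNATURES = [
    (b"FEA1", "feather-v1"),             # Feather V1 file
    (b"ARROW1", "arrow-ipc"),            # Arrow IPC file (Feather V2)
    (b"RDX2\n", "rdata-uncompressed"),   # R save() output, format version 2
    (b"RDX3\n", "rdata-uncompressed"),   # R save() output, format version 3
    (b"\x1f\x8b", "gzip"),               # gzip stream (R's default compression)
    (b"BZh", "bzip2"),                   # bzip2 stream
    (b"\xfd7zXZ\x00", "xz"),             # xz stream
]


def sniff_format(path):
    """Return a label describing the file at `path`, based on its first bytes."""
    # Read only a few bytes so that multi-gigabyte files are not loaded.
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"
```

If sniff_format('/path/to/file.RData') returns "gzip", "bzip2", "xz" or "rdata-uncompressed", the file is more likely an R file for pyreadr or R itself than something a Feather reader can open.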

Solution 2

You could try the feather library, a language-agnostic data frame file format that can be written and read from both R and Python. First write the data frame out from R:

# Install feather
devtools::install_github("wesm/feather/R")

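library(feather)
# datafile must be a data frame that already exists in the R session,
# e.g. one of the objects created by load("datafile.rda")
path <- "your_file_path"
write_feather(datafile, path)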

Then install the Python package:

$ pip install feather-format

And load your data file:

import feather
path = 'your_file_path'
datafile = feather.read_dataframe(path)
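
The two halves of this answer (write from R, read from Python) can also be driven from a single Python script if R is installed. The following is a rough sketch under several assumptions: Rscript is on the PATH, the R feather package is installed, and one .feather file per data frame is wanted. The names R_SCRIPT, build_command and rda_to_feather are hypothetical helpers, not part of any library.

```python
import subprocess
import tempfile
from pathlib import Path

# R code run by Rscript: load the .rda into a private environment and write
# every data frame found there to <out_dir>/<object name>.feather.
R_SCRIPT = """
args <- commandArgs(trailingOnly = TRUE)
env <- new.env()
for (name in load(args[1], envir = env)) {
  obj <- get(name, envir = env)
  if (is.data.frame(obj)) {
    feather::write_feather(obj, file.path(args[2], paste0(name, ".feather")))
  }
}
"""


def build_command(script_path, rda_path, out_dir, rscript="Rscript"):
    """Build the argument list: Rscript <script> <input .rda> <output dir>."""
    return [rscript, str(script_path), str(rda_path), str(out_dir)]


def rda_to_feather(rda_path, out_dir, rscript="Rscript"):
    """Convert the data frames in an .rda file; return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "rda_to_feather.R"
        script_path.write_text(R_SCRIPT)
        # check=True raises CalledProcessError if R exits with an error.
        subprocess.run(build_command(script_path, rda_path, out_dir, rscript),
                       check=True)
    return sorted(out_dir.glob("*.feather"))
```

Each path returned by rda_to_feather("datafile.rda", "feather_out") could then be passed to feather.read_dataframe as shown in this answer.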

Solution 3

As mentioned, consider loading the .rda file in R and collecting its objects with R's mget or eapply, then building a Python dictionary of data frames from them.

RPy2

import pandas as pd

import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

pandas2ri.activate()

# Load every object stored in the .rda file into R's global environment
base = importr('base')
base.load("datafile.rda")
rdf_List = base.mget(base.ls())

# ITERATE THROUGH LIST OF R DFs
# (pandas2ri.ri2py_dataframe is the rpy2 2.x API; per the comments it is gone as of rpy2 3.0.3)
pydf_dict = {}

for i,f in enumerate(base.names(rdf_List)):
    pydf_dict[f] = pandas2ri.ri2py_dataframe(rdf_List[i])

for k,v in pydf_dict.items():
    print(v.head())
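
For rpy2 3.x, where pandas2ri.ri2py_dataframe is no longer available, the same idea can be written with a local converter. This is an untested sketch, not a drop-in replacement: it assumes rpy2 3.x with pandas and R installed, the function name load_rda_frames is hypothetical, and objects that are not data frames are simply skipped (unusual R objects might still fail during conversion). Note that R's load() returns the names of the objects it created, not the objects themselves, so the names are used to fetch each object afterwards.

```python
def load_rda_frames(path):
    """Return {R object name: pandas.DataFrame} for data frames in an .rda file."""
    # Imported here so this module can still be imported where R or rpy2
    # is not installed; the function itself needs both.
    import pandas as pd
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    r_load = ro.r["load"]  # R's load(): loads objects, returns their names
    r_get = ro.r["get"]    # R's get(): fetches an object by name

    # The objects land in R's global environment, as in the answer's code.
    names = list(r_load(path))

    frames = {}
    # Inside this block, results of R calls are converted with the pandas
    # rules, so an R data.frame comes back as a pandas DataFrame.
    with localconverter(ro.default_converter + pandas2ri.converter):
        for name in names:
            obj = r_get(name)
            if isinstance(obj, pd.DataFrame):
                frames[name] = obj
    return frames
```

Calling load_rda_frames("datafile.rda") would give a dictionary comparable to pydf_dict above, which can be iterated in the same way.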

Author: Matina G

Updated on June 06, 2022

Comments

  • Matina G, almost 2 years ago

    I have some .rda files that I need to access with Python. My code looks like this:

    import rpy2.robjects as robjects
    from rpy2.robjects import r, pandas2ri
    
    pandas2ri.activate()
    df = robjects.r.load("datafile.rda")
    df2 = pandas2ri.ri2py_dataframe(df)
    

    where df2 is a pandas dataframe. However, it only contains the header of the .rda file! I have searched back and forth. None of the solutions proposed seem to be working.

    Does anyone have an idea how to efficiently convert an .rda dataframe to a pandas dataframe?

  • Nick, almost 6 years ago
    Why do you need to write out as rds and load back in? I am new to rpy2 but in your "python combined" code you could seemingly run it as far as the line dfList = base.mget(base.ls()). Then use a for loop over the elements of base.names(dfList) to populate df_dict with the command df_dict[i] = pandas2ri.ri2py_dataframe(robjects.globalenv[i]). At least, that seemed to work for me...
  • Parfait, almost 6 years ago
    You are in fact correct, @Nick. Since the question is five months old, the answer can be streamlined a bit without saving .rds files to disk. I think I got caught in the weeds and did not see the whole picture. Hindsight is always 20/20, right?
  • 0range, almost 5 years ago
    pandas2ri.ri2py_dataframe does not seem to exist any longer in rpy2 version 3.0.3.
  • Marc Maxmeister, almost 4 years ago
    I tried this on a .rda file and got this error: pyreadr.custom_errors.LibrdataError: The file is compressed using an unsupported compression scheme -- any workarounds?
  • Hoa Nguyen, almost 4 years ago
    Hi @MarcMaxmeister, is it possible to share the file? That package still has some limitations: github.com/ofajardo/pyreadr. I converted the rda files from this repository: github.com/clauswilke/dviz.supp/tree/master/data and it worked quite well (41 out of 48 were converted successfully). My converted files are saved in TSV format here: github.com/nguyenhoa93/data-visualization-practice/tree/master/….
  • Marc Maxmeister, almost 4 years ago
    The .rda file is too big to share; I think it is several gigabytes. It was a genomics database used by a defunct R library.
  • Marc Maxmeister, almost 4 years ago
    I figured out a fix: I had to install R, save the data to feather from R, and then load it with read_feather in pandas.
  • lgautier, over 3 years ago
    Note: if you are interested in using rpy2 with Arrow, there is github.com/rpy2/rpy2-arrow