converting .rda to pandas dataframe
Solution 1
Thank you for your useful question. I tried the two approaches proposed in the other answers (feather and rpy2) to handle my problem.
For feather, I faced this issue:
pyarrow.lib.ArrowInvalid: Not a Feather V1 or Arrow IPC file
For rpy2, as mentioned by @0range: "pandas2ri.ri2py_dataframe does not seem to exist any longer in rpy2 version 3.0.3" or later.
I searched for another workaround and found pyreadr useful, both for me and perhaps for others facing the same problems: https://github.com/ofajardo/pyreadr
Usage: https://gist.github.com/LeiG/8094753a6cc7907c716f#gistcomment-2795790
pip install pyreadr
import pyreadr
result = pyreadr.read_r('/path/to/file.RData') # also works for Rds, rda
# done! let's see what we got
# result is a dictionary where keys are the name of objects and the values python
# objects
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1
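When an .rda file holds several objects, the dictionary returned above can be written out one file per object. The following is only a minimal sketch, assuming every value in the result is a pandas DataFrame. The function name rda_to_tsv and its read_r parameter are hypothetical; read_r exists only so the helper can be tried without pyreadr installed, and it defaults to pyreadr.read_r.

```python
from pathlib import Path


def rda_to_tsv(rda_path, out_dir, read_r=None):
    """Write every data frame found in an R data file to <out_dir>/<name>.tsv.

    read_r is a hypothetical hook: any callable that takes a path and
    returns a mapping of object name -> DataFrame-like object.
    """
    if read_r is None:
        import pyreadr  # imported lazily; only needed for real files

        read_r = pyreadr.read_r

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for name, frame in read_r(str(rda_path)).items():
        target = out_dir / f"{name}.tsv"
        # Tab-separated output without the pandas index column
        frame.to_csv(target, sep="\t", index=False)
        written.append(target)
    return written
```

Calling rda_to_tsv('/path/to/file.RData', 'converted') would then leave one .tsv file per R object in the converted directory.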
Solution 2
You could try feather, a language-agnostic data frame file format that can be written and read from both R and Python. In R, with the data frame from the .rda file already loaded as datafile, write it out:
# Install feather
devtools::install_github("wesm/feather/R")
library(feather)
path <- "your_file_path"
write_feather(datafile, path)
Then install the Python package:
$ pip install feather-format
And load your data file:
import feather
path = 'your_file_path'
datafile = feather.read_dataframe(path)
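If the read fails with "Not a Feather V1 or Arrow IPC file" (the error quoted in Solution 1), one thing worth checking is whether the path really points at a file written by write_feather rather than at the original .rda. A rough sketch of such a check, based on the leading magic bytes of the two Feather formats ("FEA1" for V1, "ARROW1" for the Arrow IPC file format); the helper name feather_version is hypothetical:

```python
def feather_version(path):
    """Return 'v1', 'v2' or None depending on the file's leading magic bytes."""
    with open(path, "rb") as fh:
        head = fh.read(6)
    if head.startswith(b"FEA1"):
        return "v1"  # original Feather format
    if head == b"ARROW1":
        return "v2"  # Arrow IPC file format (Feather V2)
    return None  # not a Feather file, e.g. an .rda passed by mistake
```

If the helper returns None, the file has to be converted in R first. Otherwise pandas.read_feather(path) is the pandas entry point for reading it, provided pyarrow is installed.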
Solution 3
As mentioned, consider splitting the .rda file into its individual objects using R's mget or eapply to build a Python dictionary of dataframes.
RPy2
import os
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr
pandas2ri.activate()
base = importr('base')
base.load("datafile.rda")
rdf_List = base.mget(base.ls())
# ITERATE THROUGH LIST OF R DFs
pydf_dict = {}
for i, f in enumerate(base.names(rdf_List)):
    pydf_dict[f] = pandas2ri.ri2py_dataframe(rdf_List[i])
for k, v in pydf_dict.items():
    print(v.head())
Matina G
Updated on June 06, 2022
Comments
-
Matina G almost 2 years
I have some .rda files that I need to access with Python. My code looks like this:
import rpy2.robjects as robjects
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()
df = robjects.r.load("datafile.rda")
df2 = pandas2ri.ri2py_dataframe(df)
where df2 is a pandas dataframe. However, it only contains the header of the .rda file! I have searched back and forth. None of the solutions proposed seem to be working. Does anyone have an idea how to efficiently convert an .rda dataframe to a pandas dataframe?
-
Nick almost 6 years
Why do you need to write out as rds and load back in? I am new to rpy2 but in your "python combined" code you could seemingly run it as far as the line dfList = base.mget(base.ls()). Then use a for loop over the elements of base.names(dfList) to populate df_dict with the command df_dict[i] = pandas2ri.ri2py_dataframe(robjects.globalenv[i]). At least, that seemed to work for me...
-
Parfait almost 6 years
You are in fact correct, @Nick. Given the five month old question, answer can be streamlined a bit without saving .rds's to disk. I think I got caught up in the weeds and did not see whole picture. Hindsight is always 20-20 right?
-
0range almost 5 years
pandas2ri.ri2py_dataframe does not seem to exist any longer in rpy2 version 3.0.3.
-
Marc Maxmeister almost 4 years
I tried this on a .rda file and got this error: pyreadr.custom_errors.LibrdataError: The file is compressed using an unsupported compression scheme -- any workarounds?
-
Hoa Nguyen almost 4 years
Hi @MarcMaxmeister, is it possible to share the file? Actually, that package still has some limitations: github.com/ofajardo/pyreadr. I converted rda files from this repository: github.com/clauswilke/dviz.supp/tree/master/data and it worked quite well (41 out of 48 are successfully converted). My converted files were saved as tsv format here: github.com/nguyenhoa93/data-visualization-practice/tree/master/….
-
Marc Maxmeister almost 4 years
The .rda file is too big to share. I think gigabytes. It was a genomics database used by a defunct R library.
-
Marc Maxmeister almost 4 years
I figured out a fix - I had to install R, then save to feather, and then load from_feather in python Pandas.
-
lgautier over 3 years
Note: If interested in using rpy2 with Arrow, there is this - github.com/rpy2/rpy2-arrow
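On the "unsupported compression scheme" error reported in the comments: R's save() can compress an .rda file with gzip, bzip2 or xz, and the scheme can be identified from the first bytes of the file. The sketch below detects the scheme and rewrites a bzip2- or xz-compressed file as gzip using only the Python standard library. It assumes that the outer compression layer is the only obstacle; whether a given reader then accepts the result has not been verified here. The function names detect_compression and recompress_as_gzip are hypothetical.

```python
import bz2
import gzip
import lzma
import shutil

# Leading magic bytes of the compression formats R's save() can produce
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def detect_compression(path):
    """Return 'gzip', 'bzip2', 'xz', or None if no known magic bytes match."""
    with open(path, "rb") as fh:
        head = fh.read(6)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None


def recompress_as_gzip(src, dst):
    """Stream a bzip2- or xz-compressed file into a gzip-compressed copy.

    Returns the scheme that was detected on src.
    """
    openers = {"bzip2": bz2.open, "xz": lzma.open}
    scheme = detect_compression(src)
    if scheme not in openers:
        raise ValueError(f"nothing to recompress (detected: {scheme})")
    # copyfileobj works in chunks, so large files are not held in memory
    with openers[scheme](src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return scheme
```

For example, recompress_as_gzip("datafile.rda", "datafile_gz.rda") leaves the original untouched and writes a gzip-compressed copy that can then be tried with the reader of choice.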