Convert Pandas dataframe to Dask dataframe
I think you can use dask.dataframe.from_pandas:
from dask import dataframe as dd
sd = dd.from_pandas(df, npartitions=3)
print (sd)
dd.DataFrame<from_pa..., npartitions=2, divisions=(0, 1, 2)>
EDIT:
I found a solution:
import pandas as pd
import dask.dataframe as dd
from dask.dataframe.utils import make_meta
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
dsk = {('x', 0): df}
meta = make_meta({'a': 'i8', 'b': 'i8'}, index=pd.Index([], 'i8'))
d = dd.DataFrame(dsk, name='x', meta=meta, divisions=[0, 1, 2])
print (d)
dd.DataFrame<x, npartitions=2, divisions=(0, 1, 2)>
Author: rey
Updated on September 28, 2020
Comments
-
rey over 3 years
Suppose I have a pandas dataframe:
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
When I convert it into a dask dataframe, what should the name and divisions parameters consist of?
from dask import dataframe as dd
sd=dd.DataFrame(df.to_dict(),divisions=1,meta=pd.DataFrame(columns=df.columns,index=df.index))
TypeError: __init__() missing 1 required positional argument: 'name'
Edit: Suppose I create a pandas dataframe like:
pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
Similarly, how do I create a dask dataframe, given that it needs three additional arguments: name, divisions and meta?
sd=dd.DataFrame({'a':[1,2,3],'b':[4,5,6]},name=,meta=,divisions=)
Thank you for your reply.
-
rey over 7 years
Thanks for the reply, but I want to know what the name and divisions parameters are when creating a dask dataframe. I have gone through the documentation but couldn't understand it.
-
jezrael over 7 years
I am not a dask expert, but I think you need rom-raw-dask-graphs. I think the author of dask can explain more.
-
rey over 7 years
Thank you, I'll try to figure it out and wait for other answers.
-
MRocklin over 7 years
@jezrael is correct. You should create a Dask.DataFrame using the from-pandas method. You only need to use the constructor in advanced situations.
-
rey over 7 years
@MRocklin I got it. Creating a dataframe in pandas is easy, as mentioned in the edit, so how do I similarly create a simple dask dataframe directly, not from pandas? My question was about pandas, so @jezrael is correct, but I just wanted to know how to create a sample dataframe directly.
-
Arco Bast over 7 years
I agree, this would be interesting to know.
-
jezrael over 7 years
@MRocklin - I added a solution, can you check it? Thank you.
-
jezrael over 7 years
@rey - I found a solution, please check it.
-
rey over 7 years
@jezrael thanks for adding a solution. I had searched through the dask GitHub repository but couldn't find it.