Databricks display() function equivalent or alternative to Jupyter

Solution 1

When you use Jupyter, use myDF.limit(10).toPandas().head() instead of df.show(). Because pandas truncates the view when a DataFrame has many columns, also set the pandas column display option so that all columns are shown.

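# Alternative to the Databricks display() function.
import pandas as pd

# Show every column instead of truncating wide DataFrames.
pd.set_option('display.max_columns', None)

# Limit to 10 rows in Spark before converting, so only those rows reach the driver.
myDF.limit(10).toPandas().head()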

Solution 2

First recommendation: in Jupyter, don't use df.show(); use df.limit(10).toPandas().head() instead, which renders as a formatted table, arguably even better than the Databricks display().

Second recommendation: use a Zeppelin notebook, where you can just call z.show(df.limit(10)).

Additionally, in Zeppelin:

  1. Register your DataFrame as a SQL table: df.createOrReplaceTempView('tableName')
  2. Insert a new paragraph beginning with %sql, then query your table to get a rich table display.

Solution 3

In recent IPython, you can just use display(df) if df is a pandas DataFrame; it will just work. On older versions you might need to run from IPython.display import display first. A DataFrame is also displayed automatically if it is the result of the last expression in a cell. See this notebook for an example. Of course, the representation depends on the library you use to make your DataFrame. If you are using PySpark and it does not define a nice representation by default, then you'll need to teach IPython how to display the Spark DataFrame. For example, here is a project that teaches IPython how to display Spark Contexts and Spark Sessions.
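
As a rough illustration of "teaching IPython" how to display a Spark DataFrame, here is a minimal sketch. The helper names rows_to_html and register_spark_html are made up for this example and are not part of IPython, PySpark, or the project mentioned above. The sketch assumes a PySpark DataFrame exposes .columns and .take(n), and it uses IPython's per-type formatter registration (for_type on the text/html formatter).

```python
from html import escape


def rows_to_html(columns, rows):
    """Render column names and row tuples as an HTML table string.

    Hypothetical helper: works on any iterable of column names and any
    iterable of row sequences, so it can be tried without Spark.
    """
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


def register_spark_html(max_rows=10):
    """Ask IPython to render Spark DataFrames as HTML tables.

    Assumption: IPython and PySpark are installed and this is called
    from inside a running notebook kernel.
    """
    from IPython import get_ipython
    from pyspark.sql import DataFrame

    html_formatter = get_ipython().display_formatter.formatters["text/html"]
    # Only the first max_rows rows are collected to the driver.
    html_formatter.for_type(
        DataFrame,
        lambda df: rows_to_html(df.columns, df.take(max_rows)),
    )
```

After calling register_spark_html() once in a notebook, leaving a Spark DataFrame as the last expression of a cell (or passing it to display) should show a table rather than the plain text repr. The HTML builder is kept separate from the registration so it can be checked on ordinary tuples.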

Author by

Luis Leal

Updated on September 16, 2022

Comments

  • Luis Leal over 1 year

    I'm in the process of migrating my current Databricks Spark notebooks to Jupyter notebooks. Databricks provides a convenient and beautiful display(data_frame) function for visualizing Spark DataFrames and RDDs, but there's no direct equivalent in Jupyter (I'm not sure, but I think it's a Databricks-specific function). I tried:

    dataframe.show()
    

    But that is only a text rendering, and it breaks when there are many columns, so I'm trying to find an alternative to display() that renders Spark DataFrames better than show(). Is there any equivalent or alternative?