What is Apache Zeppelin?


Solution 1

What is a notebook interface?

An interface for interactively running code and for exploring and visualizing data. Notebooks let you mix narrative, rich media and data in one document.


Short answer: a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Long answer:

  1. A Zeppelin notebook gives you an easy, straightforward way to execute arbitrary code in a web notebook. You can execute Scala, SQL and more, and even schedule a notebook to run at a regular interval with a cron expression.

  2. It's easy to mix languages in the same notebook: you can run some SQL, then some Scala, then document it all together in Markdown. You can also easily switch your notebook to a presentation style, for example to present it to management or to use it in dashboards.

  3. It is similar to the Jupyter Notebook (formerly the IPython Notebook), which has been extremely popular in the Python community. I wouldn't say Zeppelin "replaces" Jupyter; rather, it is a similar kind of tool.

Furthermore:

  • Zeppelin supports Spark, PySpark, SparkR and Spark SQL, with a dependency loader.

  • Zeppelin lets you connect to any JDBC data source seamlessly: PostgreSQL, MySQL, MariaDB, Redshift, Apache Hive and so on.

  • Python is supported with Matplotlib, Conda, Pandas SQL and PySpark integrations.

Solution 2

Zeppelin is a great tool. It lets you use different backends/languages in a single notebook. Here is a simple use case:

  1. Write a description using Markdown
  2. Prepare data using the shell, e.g. download files with curl/wget and load them into HDFS
  3. Do the data analysis with Spark
  4. Create simple visualisations with SQL
  5. Export the results with the shell
  6. Publish a graph via a link

All of these steps can be done in a single notebook, and much more besides.
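What makes this possible is that each paragraph of a Zeppelin note starts with an interpreter directive such as `%md`, `%sh`, `%spark` or `%sql`, which selects the backend that runs it. The Python snippet below is a toy model of that routing, not Zeppelin's actual implementation. It assumes the note's default interpreter is Spark, and the sample note (its URL, HDFS paths and `region` column) is hypothetical, written only to mirror steps 1 to 5 above.

```python
from dataclasses import dataclass

# Assumption: this hypothetical note's default interpreter binding is Spark.
DEFAULT_INTERPRETER = "spark"


@dataclass
class Paragraph:
    interpreter: str  # e.g. "md", "sh", "spark", "sql"
    body: str         # the text handed to that interpreter


def parse_paragraph(text: str, default: str = DEFAULT_INTERPRETER) -> Paragraph:
    """Split a leading %name directive from the rest of the paragraph."""
    stripped = text.strip()
    if stripped.startswith("%"):
        # The directive ends at the first whitespace (space or newline).
        parts = stripped.split(maxsplit=1)
        body = parts[1] if len(parts) > 1 else ""
        return Paragraph(parts[0][1:], body)
    # No directive: the paragraph goes to the default interpreter.
    return Paragraph(default, stripped)


# Hypothetical note; the URL, paths and column name are made up for illustration.
NOTE = [
    "%md\n## Sales by region",
    "%sh\ncurl -sO https://example.com/sales.csv\n"
    "hdfs dfs -put -f sales.csv /tmp/sales.csv",
    '%spark\nval df = spark.read.option("header", "true").csv("/tmp/sales.csv")\n'
    'df.createOrReplaceTempView("sales")\n'
    'df.groupBy("region").count().write.mode("overwrite").csv("/tmp/by_region")',
    "%sql\nselect region, count(*) as n from sales group by region",
    "%sh\nhdfs dfs -getmerge /tmp/by_region ./by_region.csv",
]

if __name__ == "__main__":
    # Show which interpreter would receive each paragraph.
    for paragraph in map(parse_paragraph, NOTE):
        first_line = paragraph.body.splitlines()[0]
        print(f"{paragraph.interpreter:>5} | {first_line}")
```

Running it prints one line per paragraph, showing which interpreter would receive it. In real Zeppelin, the `%spark`, `%sql` and `%pyspark` paragraphs of a note share the same SparkContext, which is why the `sql` paragraph can query the `sales` view registered in the `spark` paragraph.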

Zeppelin is very close to the Databricks.com online solution.

Author: Farooque

Updated on June 07, 2022

Comments

  • Farooque (almost 2 years ago)

    Since we hear about Apache Zeppelin so often, a few questions come to mind:

    1. What is Apache Zeppelin?
    2. What new or extra does it add to the big data ecosystem?
    3. Is it a replacement for some of the frameworks/tools that already exist in the big data ecosystem?
  • Chris Ivan (over 2 years ago)
    The Databricks notebook, you mean. There's a lot more to Databricks as a service than just its implementation of notebooks, but it's fair to say that they built everything on open source, possibly including extending Zeppelin.