What is Apache Zeppelin?
Apache Zeppelin is a web-based notebook that supports 20+ language backends, including SQL, R, Python, and Scala. This multi-purpose notebook covers:
- Data ingestion
- Data discovery
- Analytics
- Visualization and real-time collaboration
Zeppelin's interpreter concept allows any language or data-processing backend to be plugged into a notebook. Available interpreters include Apache Spark, Python, R, JDBC, Apache Flink, Markdown, and Shell. Zeppelin also provides built-in data visualization, such as basic charts and pivot charts, and supports creating dynamic input forms using paragraph and note templates.
How does Apache Zeppelin help?
Apache Zeppelin is an open-source, multi-purpose notebook that brings data-driven analytics to Spark and Hadoop. It helps data scientists build, execute, and share code alongside visualized results, and it supports interactive execution of long workflows.
Because any language or data backend can be plugged in as an interpreter, Zeppelin can support a growing ecosystem of data sources. It offers built-in Apache Spark integration, with no separate module, plugin, or library required. It gives data scientists a highly interactive experience through real-time collaboration and dynamic forms, backed by an active community.
Key Features of Apache Zeppelin
Improved Spark Interpreter
With Apache Zeppelin 0.10, the Spark interpreter offers a Python and R user experience comparable to Jupyter Notebook. It supports multiple languages and execution models, inline visualization, multi-tenancy, and interactive development.
Zeppelin SDK
Zeppelin can be used as a job server through the Zeppelin SDK. The Zeppelin client API wraps Zeppelin’s REST API, making it easy to integrate notebooks into other systems. It supports programmatic operations such as creating, deleting, and running notes and paragraphs.
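As a rough illustration of the kind of calls such a client wraps, the sketch below builds (but does not send) the HTTP requests for creating, running, and deleting a note. It assumes the notebook REST endpoints `/api/notebook` and `/api/notebook/job/{noteId}`. The server address, the note ID, and the helper function names are hypothetical, and authentication is left out.

```python
import json
import urllib.request

# Assumption: a Zeppelin server reachable at this (hypothetical) address.
BASE_URL = "http://localhost:8080"


def create_note_request(name, base_url=BASE_URL):
    """Build a POST request asking Zeppelin to create a note with this name."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/notebook",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_note_request(note_id, base_url=BASE_URL):
    """Build a POST request that runs all paragraphs of the given note."""
    return urllib.request.Request(
        f"{base_url}/api/notebook/job/{note_id}", method="POST"
    )


def delete_note_request(note_id, base_url=BASE_URL):
    """Build a DELETE request that removes the given note."""
    return urllib.request.Request(
        f"{base_url}/api/notebook/{note_id}", method="DELETE"
    )
```

To actually execute one of these against a running server, pass the request to `urllib.request.urlopen` and parse the JSON response. Building the requests separately from sending them keeps the sketch easy to inspect without a live Zeppelin instance.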
Inline generic configuration
Inline generic configuration provides fine-grained, flexible control over interpreter settings. ConfInterpreter is a generic interpreter that can be used inside a note to configure another interpreter with custom settings.
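A configuration paragraph of this kind is typically written as a `%<interpreter>.conf` header followed by property-file style lines, and it has to run before the target interpreter process starts. The sketch below only renders such paragraph text from a dictionary. The helper name and the Spark property values are illustrative assumptions, not recommended settings.

```python
def conf_paragraph(interpreter, settings):
    """Render paragraph text for a generic conf interpreter, e.g. %spark.conf.

    `settings` maps property names to values; each becomes one
    "key value" line in property-file format.
    """
    lines = [f"%{interpreter}.conf", ""]
    for key, value in settings.items():
        lines.append(f"{key} {value}")
    return "\n".join(lines)


# Illustrative values only.
print(conf_paragraph("spark", {
    "spark.executor.memory": "4g",
    "spark.executor.instances": "2",
}))
```

The rendered text can be pasted into the first paragraph of a note, or submitted as paragraph text through the REST API.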
Interpreter lifecycle manager
The interpreter lifecycle manager releases unused resources by automatically terminating an interpreter process once it reaches its idle timeout.
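The idle-timeout idea can be sketched in a few lines. This is a conceptual illustration only, not Zeppelin's implementation: the class name, method names, and threshold are hypothetical, and "terminating" a process is reduced to dropping it from a dictionary.

```python
import time


class IdleTimeoutManager:
    """Track last-use times and report interpreters idle past a threshold."""

    def __init__(self, threshold_seconds, clock=time.monotonic):
        self.threshold = threshold_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_used = {}         # interpreter name -> last activity time

    def touch(self, name):
        """Record activity for an interpreter (e.g. a paragraph was run)."""
        self.last_used[name] = self.clock()

    def reap_idle(self):
        """Remove and return interpreters idle for at least the threshold."""
        now = self.clock()
        idle = [n for n, t in self.last_used.items()
                if now - t >= self.threshold]
        for name in idle:
            del self.last_used[name]  # a real manager would stop the process
        return idle
```

A real lifecycle manager would call something like `reap_idle` on a periodic check interval and shut down each returned interpreter process, so the timeout is enforced without any user action.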