Dask
Category: Orchestration

Released: Mar 2015
License: BSD-3-Clause License
GitHub open issues: 718
GitHub stars: 8926
GitHub last commit: 21 Oct
Stack Overflow questions: 3719

What is Dask?

Dask is an open-source Python library for parallel computing that scales Python packages from a single machine to a compute cluster. It parallelizes workloads that involve large datasets or long-running computations.

Dask has two parts: dynamic task scheduling optimized for interactive computational workloads, and parallel data collections, such as arrays and dataframes, that run on top of those schedulers. These collections support the analysis of large datasets, in a role similar to Spark or big-array libraries.
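To make the relationship between collections and schedulers concrete, here is a minimal sketch using `dask.array`. The array shape and chunk sizes are arbitrary values chosen for illustration, not recommendations.

```python
import dask.array as da

# A 1000x1000 array of ones, split into 250x250 chunks (a 4x4 grid of blocks).
x = da.ones((1000, 1000), chunks=(250, 250))

# Collection operations are lazy: this line only builds a task graph.
total = (x + x.T).sum()

# compute() hands the graph to a scheduler, which runs the per-chunk tasks
# and combines their partial sums into a single number.
result = total.compute()
print(result)
```

Each chunk is an ordinary NumPy array, so the scheduler only has to decide when and where each small NumPy operation runs.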


How Does Dask Help?

Data analytics in Python is built largely on computational libraries such as pandas, NumPy, and scikit-learn. However, these packages were not designed to scale beyond a single machine. Dask extends them, and the wider Python ecosystem, to multi-core machines and distributed clusters.

Dask complements the Python ecosystem by adhering to its common standards and protocols, so existing code can take advantage of parallel and distributed computing with little extra coordination.

Dask also supports parallelizing complex applications, which is difficult with traditional big-data technologies. Examples include tasks involving advanced statistics, ML algorithms, time series, or local operations.

Dask gives users responsive feedback through a suite of investigative and diagnostic tools, including a real-time dashboard and a statistical profiler.
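As a small illustration of the diagnostic tooling, the sketch below uses `Profiler` and `ProgressBar` from `dask.diagnostics`, which work with Dask's local schedulers. This is an assumption about which tools to show: the real-time dashboard mentioned above belongs to the distributed scheduler and is not covered here. The array size is arbitrary.

```python
import dask.array as da
from dask.diagnostics import Profiler, ProgressBar

# A random 2000x2000 array in 500x500 chunks (sizes chosen for illustration).
x = da.random.random((2000, 2000), chunks=(500, 500))

# ProgressBar prints progress while tasks run; Profiler records one entry
# per executed task, including its start and end times.
with Profiler() as prof, ProgressBar():
    value = x.std().compute(scheduler="threads")

# Find the task that took longest, using the recorded timestamps.
slowest = max(prof.results, key=lambda t: t.end_time - t.start_time)
print(f"std = {value:.4f}, tasks recorded = {len(prof.results)}")
print("slowest task key:", slowest.key)
```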


Key Features of Dask

Familiar API

Dask scales pandas, NumPy, and scikit-learn workflows using APIs and data structures that mirror the originals. Workflows can move from a single system to a distributed cluster with minimal rewriting.

Scalability

Dask scales up to clusters with thousands of cores. It also scales down, running the same workloads on a laptop in a single process.
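One way to see the scale-down side is to run the same computation under two of Dask's local schedulers. This is a minimal sketch: the array size is arbitrary, and scaling up to a cluster would typically involve the separate `dask.distributed` package, which is not shown here.

```python
import dask.array as da

# One million values in ten chunks (sizes chosen for illustration).
x = da.arange(1_000_000, chunks=100_000, dtype="float64")
mean_of_squares = (x ** 2).mean()

# Same task graph, two execution strategies:
# "synchronous" runs every task in the calling thread of a single process,
single = mean_of_squares.compute(scheduler="synchronous")
# while "threads" spreads the tasks over a local thread pool.
threaded = mean_of_squares.compute(scheduler="threads")

print(single, threaded)
```

The synchronous scheduler is also convenient for debugging, since tasks run in the calling thread and ordinary debuggers work as usual.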

Native

Dask enables distributed and parallel computing in pure Python, building on the PyData stack. It integrates natively with Python code and with computational libraries such as pandas, NumPy, and scikit-learn.

Speedy performance

Dask operates with low latency, low overhead, and minimal serialization, which keeps numerical computations and algorithms fast.

Responsive feedback

Dask is designed for interactive computing: it keeps users informed through fast feedback and diagnostic tools.

Flexibility

Dask offers a task scheduling interface for handling custom workloads and integrating with different projects.
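For workloads that do not fit an array or dataframe, `dask.delayed` is one entry point to the task scheduling interface. The sketch below wires three plain Python functions into a task graph; the function names `load`, `clean`, and `summarize` are hypothetical stand-ins for real pipeline steps.

```python
from dask import delayed

@delayed
def load(n):
    # Stand-in for reading a data source: returns the integers 0..n-1.
    return list(range(n))

@delayed
def clean(values):
    # Keep only the even numbers.
    return [v for v in values if v % 2 == 0]

@delayed
def summarize(parts):
    # Count the items that survived cleaning across all parts.
    return sum(len(p) for p in parts)

# Calling delayed functions builds a graph; nothing has run yet.
parts = [clean(load(n)) for n in (4, 6, 8)]
total = summarize(parts)

# The three load/clean branches are independent, so the scheduler
# is free to run them in parallel before the final summarize step.
result = total.compute()
print(result)  # 9
```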

Companies using Dask

Pangeo
Prefect
Sidewalk Labs
