What is Horovod?
Horovod is Uber’s open-source framework for distributed deep learning. It supports the major deep learning toolkits, including Keras, TensorFlow, PyTorch, and Apache MXNet, and lets an existing single-GPU training script run on hundreds of GPUs with only a few lines of Python changes. It can be installed on-premises or run on cloud platforms such as AWS, Azure, and Databricks.
Horovod can also run on top of Apache Spark, unifying data processing and model training on the same infrastructure while letting teams switch among PyTorch, TensorFlow, and MXNet. It also ships with several optimization methods that make distributed training faster.
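As a minimal sketch of the Spark integration, the snippet below uses Horovod's horovod.spark.run API to launch one Horovod worker per Spark task. The trivial train function and the worker count of 4 are placeholders for illustration, and an active SparkSession is assumed:

    import horovod.spark
    import horovod.torch as hvd

    def train():
        # Each Spark task runs one copy of this function as a Horovod worker.
        hvd.init()
        return hvd.rank()

    # Launch 4 Horovod workers on the current Spark cluster and collect
    # each worker's return value on the driver, ordered by rank.
    ranks = horovod.spark.run(train, num_proc=4)
    print(ranks)  # e.g. [0, 1, 2, 3]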
How Does Horovod Help?
Horovod simplifies scaling a single-GPU training script to run on many GPUs in parallel. Two aspects matter most here: how much code must change, and how fast the distributed training runs.
Horovod builds on Message Passing Interface (MPI) concepts, which let it scale training scripts with minimal code changes, unlike earlier solutions such as Distributed TensorFlow with parameter servers. Once modified, the same script can run on a single GPU, multiple GPUs, or even multiple hosts without further code changes, as the sketch below illustrates.
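As a rough illustration (not Horovod's official tutorial code, though the calls used are standard Horovod API), here are the typical changes for a PyTorch script; the linear model is a stand-in for an existing model:

    import torch
    import torch.nn as nn
    import horovod.torch as hvd

    hvd.init()                                   # one worker process per GPU
    if torch.cuda.is_available():
        torch.cuda.set_device(hvd.local_rank())  # pin this process to one GPU

    model = nn.Linear(10, 1)                     # stand-in for an existing model
    # Scale the learning rate by the number of workers (a common heuristic).
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Wrap the optimizer so gradients are averaged across workers via allreduce.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Start every worker from rank 0's initial weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)
    # The training loop itself is unchanged.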
Although installing MPI and NCCL is not hassle-free, it is a one-time job, after which the rest of the team can scale their ML training scripts seamlessly.
Key Features of Horovod
Stand-alone Python package
Horovod is a stand-alone Python package that brings the ring-allreduce algorithm to existing frameworks without requiring an upgrade to the latest version of TensorFlow or patches to the installed one. Installation takes anywhere from a few minutes to an hour, depending on the hardware.
Effective NCCL implementation
Horovod replaced Baidu’s ring-allreduce implementation with NVIDIA’s NCCL library, which provides highly optimized collective communication primitives, including an optimized version of ring-allreduce. NCCL 2 added the ability to run ring-allreduce across multiple machines, improving performance further.
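To check whether a given Horovod build was compiled with NCCL (and MPI) support, the library exposes introspection helpers; a quick sketch:

    import horovod.torch as hvd

    hvd.init()
    # Report which communication backends this Horovod build includes.
    print("NCCL built:", hvd.nccl_built())
    print("MPI built:",  hvd.mpi_built())
    print("Gloo built:", hvd.gloo_built())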
MPI concepts
Horovod’s core principles are based on Message Passing Interface (MPI) concepts such as rank, size, local rank, allreduce, allgather, alltoall, and broadcast; a short demonstration follows.
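The snippet below sketches what a few of these primitives look like in Horovod's PyTorch API (the tensor values are arbitrary examples):

    import torch
    import horovod.torch as hvd

    hvd.init()
    # rank: this worker's global index; size: total number of workers;
    # local_rank: this worker's index on its own host.
    print(f"rank {hvd.rank()} of {hvd.size()}, local rank {hvd.local_rank()}")

    # allreduce: average (by default) a tensor across all workers.
    avg = hvd.allreduce(torch.tensor([float(hvd.rank())]))

    # broadcast: copy rank 0's tensor to every worker.
    shared = hvd.broadcast(torch.tensor([42.0]), root_rank=0)

    # allgather: concatenate each worker's tensor along dimension 0.
    gathered = hvd.allgather(torch.tensor([[float(hvd.rank())]]))

Such a script is typically launched with Horovod's CLI, for example horovodrun -np 4 python demo.py to start four workers.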
Supported frameworks
Horovod supports TensorFlow (including XLA), Keras, PyTorch, and Apache MXNet.
Efficiency
Horovod scales training to hundreds of GPUs with upwards of 90% scaling efficiency, combined with easy-to-use mechanisms and portability.