Distributed Deep Learning Training with Enroot on PARAM Rudra

Tuesday, June 10, 2025 3:00 PM to Thursday, June 12, 2025 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Women in HPC Poster
HPC in the Cloud and HPC Containers

Information

The poster is on display and will be presented at the poster pitch session.
The poster presents an approach to scalable, secure, and efficient containerization for high-performance computing (HPC) workloads, focusing on distributed deep learning training on the PARAM Rudra supercomputing facility. The methodology leverages Enroot, a lightweight containerization tool, integrated with SLURM via the Pyxis plugin, to optimize resource utilization and enable seamless deployment across nodes.
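The abstract does not describe how jobs are launched on PARAM Rudra. The Python sketch below illustrates the general Enroot + Pyxis + SLURM pattern under some stated assumptions. It assumes the usual setup of one task per GPU. It relies on the Pyxis `srun` options `--container-image` and `--container-mounts`. Inside each container, the Slurm per-task variables are mapped to the rank variables that PyTorch-style training scripts conventionally read. The helper names, image tag, mount paths, and `train.py` are hypothetical placeholders and are not taken from the poster.

```python
import shlex


def build_srun_args(image, nodes, gpus_per_node, mounts, script):
    """Return an srun argument list (hypothetical helper) that asks the Pyxis
    plugin to run every task inside a container created by Enroot."""
    args = [
        "srun",
        f"--nodes={nodes}",
        f"--ntasks-per-node={gpus_per_node}",  # assumption: one training process per GPU
        f"--gpus-per-node={gpus_per_node}",
        f"--container-image={image}",  # Pyxis: image that Enroot imports and starts
    ]
    if mounts:
        # Pyxis: comma-separated SRC:DST bind mounts, e.g. a dataset directory
        args.append(f"--container-mounts={','.join(mounts)}")
    args += ["python", script]
    return args


def dist_env_from_slurm(env):
    """Translate per-task Slurm variables into the rank variables that
    PyTorch-style distributed training scripts conventionally read."""
    return {
        "RANK": env["SLURM_PROCID"],         # global rank across all nodes
        "WORLD_SIZE": env["SLURM_NTASKS"],   # total number of tasks in the job step
        "LOCAL_RANK": env["SLURM_LOCALID"],  # rank within the node, used to pick a GPU
    }


if __name__ == "__main__":
    # Hypothetical values: the image tag, mount paths, and script name are placeholders.
    args = build_srun_args(
        image="nvcr.io#nvidia/pytorch:24.05-py3",
        nodes=2,
        gpus_per_node=4,
        mounts=["/scratch/datasets:/data"],
        script="train.py",
    )
    print(shlex.join(args))
```

Inside each container, a training script could call `dist_env_from_slurm(os.environ)` before it initializes its process group. `MASTER_ADDR` and `MASTER_PORT` must still be set separately, for example from the first host in the allocation. That step is omitted from this sketch.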
The solution highlights the advantages of containerization, including portability, ease of use, and reproducibility, while addressing the challenges of traditional approaches. The work also showcases the capabilities of the PARAM Rudra system and offers insights into how containerized distributed training can advance AI research and scientific applications.
Format
On Demand, On Site