pyGinkgo: A Sparse Linear Algebra Operator Framework for Python

Tuesday, June 10, 2025 3:00 PM to Thursday, June 12, 2025 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Women in HPC Poster
AI Applications powered by HPC Technologies · Computational Physics · Engineering · Numerical Libraries · Performance Measurement

Information

Poster is on display and will be presented at the poster pitch session.
Over the past decade, machine learning has achieved significant advancements, with applications spanning diverse fields such as physics, medicine, and economics. A pressing challenge in contemporary machine learning is optimizing models for time and energy efficiency. One effective approach to enhancing time efficiency is sparsification of neural network weights.

Despite extensive research on sparse computations, there is a lack of high-performance libraries for sparse neural networks, especially in Python. While contemporary machine learning libraries such as PyTorch and TensorFlow offer well-optimized kernels for dense matrix computations, their performance for sparse matrix operations often falls short. Ginkgo is a high-performance linear algebra library with a special focus on sparse linear systems, and it provides one of the fastest sparse matrix-vector product (SpMV) kernels, the core operation in sparse neural networks. To bridge the performance gap between dense and sparse computations in the Python world, we present pyGinkgo, Python bindings for the Ginkgo library. pyGinkgo enables Python users to leverage Ginkgo's advanced capabilities for sparse computations, offering significant potential for improving the performance of sparse neural networks and beyond.
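To illustrate the intended workflow, the sketch below shows how a Python user might hand a SciPy CSR matrix to Ginkgo for an SpMV. Since pyGinkgo is not yet published, the module name pyginkgo and every call shown for it (the executor constructor, Csr.from_scipy, apply) are hypothetical placeholders modeled on Ginkgo's C++ executor and Csr::apply design, not the actual binding API; only the SciPy portion runs as written.

```python
# Hypothetical sketch: pyGinkgo is unpublished, so the module name
# "pyginkgo" and all commented-out calls are illustrative placeholders
# modeled on Ginkgo's C++ API (executors + Csr::apply), not a documented
# Python interface.
import numpy as np
import scipy.sparse as sp

# import pyginkgo as pygko                 # assumed module name

n = 10_000
A = sp.random(n, n, density=0.001, format="csr", dtype=np.float64)
x = np.ones(n)

# exec_ = pygko.OmpExecutor()              # assumed: select the OpenMP backend
# A_gko = pygko.Csr.from_scipy(exec_, A)   # assumed: wrap the CSR data
# y = A_gko.apply(x)                       # assumed: y = A @ x via Ginkgo SpMV

y_ref = A @ x                              # SciPy reference result
```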

To demonstrate the utility of pyGinkgo, we benchmarked the performance of SpMV kernels across widely used Python libraries, including PyTorch, PETSc, SciPy, TensorFlow, and CuPy, on CPU and GPU platforms. pyGinkgo already supports CUDA, HIP, and OpenMP-enabled devices, matching the backends offered by Ginkgo. The benchmarks were executed on the HoreKa supercomputer, which features Intel Xeon CPUs and NVIDIA A100 GPUs. We assessed single-core and multi-core performance on CPUs, alongside GPU execution performance.
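As a sketch of how such a comparison can be measured from Python, SpMV throughput is commonly reported as 2·nnz/time GFLOP/s, i.e. one multiply and one add per stored nonzero. The harness below is an illustrative assumption (repetition count, warm-up policy, and the synthetic matrix are placeholders), not the poster's benchmark code:

```python
import time

import numpy as np
import scipy.sparse as sp

def spmv_gflops(A: sp.csr_matrix, x: np.ndarray, reps: int = 100) -> float:
    """Time y = A @ x and report GFLOP/s, counting 2 flops per nonzero."""
    A @ x                                   # warm-up to exclude one-time setup
    t0 = time.perf_counter()
    for _ in range(reps):
        A @ x
    dt = (time.perf_counter() - t0) / reps  # average seconds per SpMV
    return 2 * A.nnz / dt / 1e9

# Synthetic stand-in for a SuiteSparse matrix: 10^5 rows, ~0.01% density.
A = sp.random(100_000, 100_000, density=1e-4, format="csr", dtype=np.float64)
x = np.ones(A.shape[1])
print(f"SciPy SpMV: {spmv_gflops(A, x):.2f} GFLOP/s")
```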

Benchmarks were conducted on a dataset of 30 sparse matrices from the SuiteSparse Matrix Collection. The matrices had dimensions of up to 10^6 rows and columns, and most had densities below 1%.
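Density here means nnz / (rows × cols). A minimal sketch for reproducing these statistics, assuming a SuiteSparse matrix has already been downloaded in Matrix Market format (the filename is a placeholder):

```python
import scipy.io
import scipy.sparse as sp

# Placeholder path: any SuiteSparse matrix exported as Matrix Market (.mtx).
A = sp.csr_matrix(scipy.io.mmread("matrix.mtx"))

rows, cols = A.shape
density = A.nnz / (rows * cols)
print(f"{rows} x {cols}, nnz = {A.nnz}, density = {density:.4%}")
```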

On GPU, pyGinkgo demonstrated significantly better performance than the other libraries. For the evaluated matrices, the libraries reached a similar peak performance of about 230 GFLOPS; PyTorch peaked slightly lower, at around 207 GFLOPS, and proved to be at least a factor of two slower than pyGinkgo in most cases. PETSc ranked as the second-fastest library, achieving performance similar to pyGinkgo's. CuPy was slower by a factor of 2-12, whereas TensorFlow was slower by a factor of 5-24 in most cases.

On a single CPU core, SciPy performed better than all the other libraries, although its performance did not scale with the number of threads. pyGinkgo and PETSc, on the other hand, scaled very well: pyGinkgo's SpMV execution time improved by a factor of at least 25 for larger matrices when scaling to 128 threads, and PETSc's performance was again similar to pyGinkgo's. At 128 threads, TensorFlow was about 40-80 times slower and PyTorch around 10-60 times slower than pyGinkgo in most cases.

pyGinkgo is still under active development and will be released soon. In this poster, we share initial benchmark results that demonstrate pyGinkgo's potential to enhance the performance of sparse neural networks within Python-based workflows.
Format
On Demand · On Site