The rapid expansion of the AI market has caused low-precision compute throughput to scale significantly faster than its high-precision counterpart. However, an even more critical bottleneck has emerged: memory bandwidth, which lags behind both.
While recent research has successfully leveraged low-precision matrix multiplication units to emulate high-precision dense operations, these advances often fail to accelerate the memory- and latency-bound workloads typical of scientific simulations. This talk explores how a combination of tailored engineering approaches and innovative numerical methods can use mixed precision to overcome these bandwidth limitations and speed up scientific computing.