As we navigate the new age of supercomputing powered by artificial intelligence (AI), we find ourselves standing at the threshold of transformative computing power.
This leap has increased the complexity and diversity of high-performance computing (HPC) systems. Hardware heterogeneity is on the rise at every level: in the precision types supported by individual processors, in the classes of processor within a node, and in the cluster architectures of different systems. Concurrently, the application domain is broadening, fueled by the escalating computational needs of multiple disciplines, the exponential growth in AI capabilities, and the convergence of HPC with AI methodologies. This expansion has led to significant fragmentation in system architectures and application requirements, necessitating a wide array of computing resources and environments. To navigate this fragmentation, a collaborative co-design approach involving hardware, software, and application stakeholders is essential to bridge the divides. We need unified, portable programming models to handle varying precision and hardware types, and common frameworks for developing applications across different systems. Furthermore, software infrastructure must be architected to standardize both systems and applications, providing portable, high-performance, and system-agnostic support. By comprehensively addressing these challenges, we can harness the full potential of the post-exascale era, charting a new course for the future of supercomputing.