

Self‑Evolving Specialized LLMs on HPC: Efficiency via 1‑bit Quantization and AI Computing Broker, Toward NPUs
Wednesday, June 24, 2026 2:20 PM to 2:40 PM · 20 min. (Europe/Berlin)
Hall H, Booth L01 - Ground Floor
HPC Solutions Forum
Emerging Computing Technologies · Large Language Models and Generative AI in HPC · Resource Management and Scheduling · Sovereignty in AI
Information
While foundation models have achieved strong general-purpose performance, their deployment in enterprise environments remains constrained by requirements for domain specificity, energy efficiency, and data sovereignty. This talk presents Takane, Fujitsu’s research initiative on self‑evolving, domain‑specialized large language models, and discusses a practical path to scale from today’s HPC/GPU-based deployments toward future heterogeneous computing platforms.
At the core of Takane is a self‑adaptation learning framework, in which multiple specialized AI agents autonomously generate training data, select learning strategies, and iteratively refine task-specific models through continual learning and reinforcement learning. Rather than relying on manually engineered fine‑tuning pipelines, Takane enables specialized LLMs to evolve in response to changes in business rules, operational contexts, and feedback from real usage.
To make such self‑evolving models practical under strict power and infrastructure constraints, we focus on software-level efficiency techniques that are already deployable on current HPC and GPU platforms, including 1‑bit quantization and domain‑specific model distillation. These techniques reduce memory footprint and compute demand while preserving task accuracy, enabling compact specialized LLMs suitable for on‑premise and sovereign environments.
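To make the memory argument concrete, the following is a minimal, illustrative sketch of weight binarization in the classic sign-plus-scale style (as in BinaryConnect/XNOR-Net); it is not Takane's actual quantization method, and the function names (`binarize_weights`, `binary_matmul`) are hypothetical:

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Quantize a weight matrix to {-1, +1} with one per-tensor scale.

    Returns the sign matrix and alpha = mean(|w|), so that
    alpha * sign(w) approximates w. Stored as signs, each weight
    needs 1 bit instead of 16 (FP16), a ~16x memory reduction.
    """
    alpha = float(np.abs(w).mean())
    w_bin = np.where(w >= 0, 1.0, -1.0)
    return w_bin, alpha

def binary_matmul(x: np.ndarray, w_bin: np.ndarray, alpha: float):
    """Matrix multiply against binarized weights: x @ (alpha * w_bin)."""
    return alpha * (x @ w_bin)

# Toy usage on a random 16x16 layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16)).astype(np.float32)
x = rng.normal(size=(4, 16)).astype(np.float32)
w_bin, alpha = binarize_weights(w)
y_approx = binary_matmul(x, w_bin, alpha)
```

In practice, 1-bit LLM schemes combine such quantized linear layers with quantization-aware training or distillation to recover task accuracy; the sketch only shows the storage/compute trade at a single layer.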
In addition, we briefly introduce AI Computing Broker (ACB) as an optimization middleware that improves GPU utilization by enabling efficient GPU sharing across AI workloads, thereby reducing infrastructure cost and power consumption without sacrificing performance. In published demonstrations, ACB achieves substantial throughput gains and can reduce the required number of GPUs (e.g., up to roughly half in representative scenarios).
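The intuition behind such GPU-count reductions can be shown with a toy capacity model (purely illustrative; this is not ACB's API or scheduling algorithm, and `Job`/`gpus_needed` are hypothetical names): when each job leaves its GPU idle during I/O or preprocessing phases, sharing lets several jobs' busy phases be packed onto fewer devices.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Job:
    name: str
    gpu_busy: float  # seconds of actual GPU compute per iteration
    gpu_idle: float  # seconds per iteration the GPU sits idle (I/O, preprocessing)

def gpus_needed(jobs: List[Job], shared: bool) -> int:
    """Toy capacity model.

    Dedicated mode pins one GPU per job. Shared mode assumes perfect
    interleaving: each job occupies a GPU only for its busy fraction
    of wall time, so summed fractions determine the GPU count.
    """
    if not shared:
        return len(jobs)
    utilization = [j.gpu_busy / (j.gpu_busy + j.gpu_idle) for j in jobs]
    return max(1, math.ceil(sum(utilization)))

# Three jobs that each keep their GPU busy only part of the time.
jobs = [
    Job("train-a", gpu_busy=1.0, gpu_idle=1.0),      # 50% busy
    Job("train-b", gpu_busy=0.5, gpu_idle=1.5),      # 25% busy
    Job("finetune-c", gpu_busy=0.5, gpu_idle=1.5),   # 25% busy
]
dedicated = gpus_needed(jobs, shared=False)  # one GPU per job
shared = gpus_needed(jobs, shared=True)      # packed by busy fraction
```

Under these assumed utilization figures the dedicated setup needs 3 GPUs while the shared one needs 1; real brokers face switching overheads and contention, so actual savings depend on workload mix.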
Looking ahead, we discuss how these deployable efficiency and orchestration techniques can be combined with emerging NPU architectures through longer‑term hardware–software co‑design, extending specialized LLMs into more resource‑constrained sovereign settings.
HPC Solutions Forum Questions
What is most important: maximizing performance in a given power envelope, minimizing power costs, or being green? Do you have to choose?
What solution is already well accepted among hyperscale use cases that will revolutionize on-premises computing?
We hear about CPUs and GPUs. Is there another choice that’s better?
Format
on-site

