Supermicro NVIDIA HGX™ B200 Systems Set New AI Performance Benchmarks with MLPerf® Inference v5.0


Introduction to AI Performance Leadership

In the rapidly evolving landscape of artificial intelligence, performance benchmarks serve as critical indicators of a system's capability. Super Micro Computer, Inc. (SMCI), a prominent Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, has recently set a new standard. The company announced industry-leading performance on the MLPerf® Inference v5.0 benchmarks, utilizing their advanced systems equipped with the NVIDIA HGX™ B200 8-GPU. This achievement underscores Supermicro's commitment to delivering cutting-edge solutions that push the boundaries of AI computation.

Supermicro's First-to-Market Advantage with NVIDIA HGX™ B200

Supermicro has distinguished itself by being among the first to market with systems featuring the powerful NVIDIA HGX™ B200 8-GPU configuration. This strategic positioning allows customers to access state-of-the-art AI hardware capabilities sooner. The company's announcement highlights the exceptional performance of both its 4U liquid-cooled and 10U air-cooled systems. These configurations have not only met but exceeded expectations in several key MLPerf® Inference v5.0 benchmarks, showcasing a remarkable leap in performance compared to previous generations of hardware.

Quantifiable Performance Gains: 3X Token Generation Per Second

The most striking aspect of Supermicro's announcement is the quantifiable performance improvement. The company reported that its NVIDIA HGX™ B200 systems demonstrated more than three times the token generation per second (Token/s) for large language models such as Llama2-70B and Llama3.1-405B when compared to systems equipped with the previous generation H200 8-GPU. This significant uplift in token generation speed is crucial for applications involving natural language processing, content generation, and complex AI inferencing tasks, where rapid and efficient output is paramount.
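The claimed uplift is a simple throughput ratio. A minimal sketch of that arithmetic (the B200 figure is the offline Llama3.1-405B result quoted later in this article; the H200 baseline below is a hypothetical placeholder chosen only to illustrate the ">3x" claim, not a published number):

```python
def speedup(new_tokens_per_s: float, baseline_tokens_per_s: float) -> float:
    """Return the throughput ratio between two systems."""
    return new_tokens_per_s / baseline_tokens_per_s

# B200 offline result for Llama3.1-405B quoted in this article.
b200_tps = 1521.74
# Hypothetical previous-generation H200 baseline (illustrative only).
h200_tps = 500.0

ratio = speedup(b200_tps, h200_tps)
print(f"Speedup: {ratio:.2f}x")  # a ratio above 3.0 matches the ">3x" claim
```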

CEO's Perspective on Innovation and Collaboration

Charles Liang, president and CEO of Supermicro, emphasized the company's sustained leadership in the AI sector. He attributed this success to Supermicro's agile "building block architecture," which facilitates rapid market entry for a diverse array of systems optimized for various workloads. Liang also highlighted the crucial, close collaboration with NVIDIA, stating, "We continue to collaborate closely with NVIDIA to fine-tune our systems and secure a leadership position in AI workloads." This symbiotic relationship ensures that Supermicro's hardware is optimized to harness the full potential of NVIDIA's latest GPU technologies.

MLPerf® Inference v5.0: A Standard for AI Performance

The MLPerf® Inference v5.0 benchmarks, managed by MLCommons, provide a standardized and transparent framework for evaluating AI inference performance. Supermicro's participation and leading results in these benchmarks are particularly noteworthy. The company is the sole system vendor to publish record-setting MLPerf® inference performance on select benchmarks for both its air-cooled and liquid-cooled NVIDIA HGX™ B200 8-GPU systems. Importantly, both system configurations were operational and validated before the official MLCommons benchmark start date, showcasing Supermicro's readiness and proactive approach. The results were achieved through meticulous optimization of both the systems and the underlying software by Supermicro engineers, adhering strictly to MLCommons rules to ensure reproducibility and auditability.

Specific Benchmark Highlights

The performance leadership of Supermicro's systems is evident across several demanding benchmarks:

  • Mixtral 8x7B Inference (Mixture of Experts): Both the 4U liquid-cooled (SYS-421GE-NBRT-LCC) and 10U air-cooled (SYS-A21GE-NBRT) NVIDIA HGX™ B200 systems achieved leadership performance, running the Mixtral 8x7B benchmark with an impressive 129,000 tokens/second.
  • Llama3.1-405B (Large Language Model): The Supermicro NVIDIA HGX™ B200 systems delivered over 1,000 tokens/second for inference on the massive Llama3.1-405B model. Specifically, the SYS-421GE-NBRT-LCC achieved 1521.74 Tokens/s in offline mode, while the SYS-A21GE-NBRT achieved 1080.31 Tokens/s in server mode (for an 8-GPU node). This dramatically surpasses the capabilities of previous-generation GPU systems.
  • Llama2-70B (Interactive Inference): For the Llama2-70B benchmark, the SYS-A21GE-NBRT system with the NVIDIA B200 SXM-180GB installed demonstrated the highest performance from a Tier 1 system supplier, achieving 62,265.70 Tokens/s.
  • Stable Diffusion XL (Image Generation): The SYS-A21GE-NBRT system also led in image generation tasks, achieving 28.92 queries/s for Stable Diffusion XL (Server).

David Kanter, Head of MLPerf at MLCommons, congratulated Supermicro, noting, "Customers will be pleased by the performance improvements achieved which are validated by the neutral, representative and reproducible MLPerf results."

Advanced Cooling Technologies for Peak Performance

To support the immense computational power of the NVIDIA HGX™ B200 8-GPU systems, Supermicro has implemented next-generation cooling technologies. The 4U systems feature newly developed cold plates and a 250kW coolant distribution unit (CDU) that more than doubles the cooling capacity compared to previous generations, all within the same 4U form factor. This advanced liquid-cooling solution is crucial for maintaining optimal operating temperatures and sustained performance under heavy loads. The 10U air-cooled system also benefits from a redesigned chassis with expanded thermal headroom, capable of accommodating eight 1000W TDP Blackwell GPUs. This design ensures that even the air-cooled variant offers substantial performance gains, up to 15x for inference and 3x for training compared to previous generations, while maintaining similar rack density.

Scalability and Rack Integration

Supermicro's rack-scale design further enhances the deployment and scalability of these powerful AI systems. Available in 42U, 48U, or 52U configurations, the integration of new vertical coolant distribution manifolds (CDM) in the liquid-cooled racks frees up valuable rack units. This allows for a higher density of compute nodes, enabling configurations with up to eight systems (64 NVIDIA Blackwell GPUs) in a 42U rack, and up to twelve systems (96 NVIDIA Blackwell GPUs) in a 52U rack. Similarly, the 10U air-cooled systems can be densely packed, with up to four units fitting into a standard rack, maintaining the same density as previous generations while delivering significantly higher performance.
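The rack-level GPU counts above follow directly from the per-system GPU count. A quick sketch of that arithmetic, using only the node counts cited in this article:

```python
GPUS_PER_NODE = 8  # each NVIDIA HGX B200 system carries 8 GPUs

def gpus_per_rack(nodes: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Total Blackwell GPUs in a rack holding the given number of systems."""
    return nodes * gpus_per_node

# Liquid-cooled configurations cited in the article.
print(gpus_per_rack(8))   # 42U rack with 8 systems -> 64 GPUs
print(gpus_per_rack(12))  # 52U rack with 12 systems -> 96 GPUs
```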

Supermicro's Comprehensive AI Portfolio

Beyond the groundbreaking HGX™ B200 systems, Supermicro offers a broad AI portfolio comprising over 100 GPU-optimized systems. These systems come with a variety of cooling options (air-cooled and liquid-cooled) and CPU choices, catering to diverse needs from single-socket to 8-way multiprocessor configurations. Supermicro's approach, centered around its flexible Server Building Block Solutions®, empowers customers to tailor infrastructure precisely to their specific workload and application requirements. This includes a comprehensive selection of form factors, processors, memory, GPUs, storage, networking, power, and cooling solutions, all designed to optimize Total Cost of Ownership (TCO) and promote Green Computing initiatives.

About Super Micro Computer, Inc.

Supermicro (NASDAQ: SMCI), headquartered in San Jose, California, is a recognized global leader in providing Application-Optimized Total IT Solutions. Since its inception, the company has been at the forefront of innovation in Enterprise, Cloud, AI, and 5G Telco/Edge IT Infrastructure. As a comprehensive Total IT Solutions provider, Supermicro offers an extensive range of servers, AI systems, storage solutions, IoT devices, network switches, software, and support services. The company's in-house expertise in motherboard, power, and chassis design fuels its ability to deliver next-generation innovations. Supermicro emphasizes global operations for scale and efficiency, with manufacturing in the US, Taiwan, and the Netherlands, focusing on TCO optimization and environmental sustainability. Their award-winning Server Building Block Solutions® provide customers with the flexibility to configure systems for optimal performance across a wide spectrum of computing demands.
