Unveiling the Performance of AMD Ryzen AI Max+ "Strix Halo" with ROCm 7.0
This article examines the performance of AMD's Ryzen AI Max+ "Strix Halo" processors under the ROCm 7.0 compute stack on Ubuntu Linux. We cover the setup process, the benchmarks conducted, and the resulting performance metrics, offering a comprehensive overview for enthusiasts and professionals alike.
System Configuration and Setup
The testing was performed on a Framework Desktop equipped with the AMD Ryzen AI Max+ 395 processor, featuring 16 cores and 32 threads with a maximum clock speed of 5.19GHz. The system was configured with 128GB of LPDDR5X-8000MT/s memory, a 2TB WD_BLACK SN7100 NVMe SSD, and integrated Radeon 8060S graphics with 64GB of memory allocated as VRAM. The operating system was Ubuntu 24.04.3 LTS running a Linux 6.14 kernel. Crucially, the ROCm 7.0 compute stack was installed alongside the latest AMDGPU DKMS driver. This setup was chosen to evaluate the Strix Halo platform with the latest AMD compute software, even though Strix Halo SoCs are not officially listed in the ROCm 7.0 supported GPU list.
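Before running any benchmarks on such a setup, it is worth confirming that the ROCm stack actually sees the integrated GPU. Below is a minimal sanity check of our own, assuming a ROCm build of PyTorch is installed; it is not part of the article's test procedure. On ROCm builds, the torch.cuda.* APIs map to HIP devices.

```python
# Quick sanity check that the ROCm stack can see the Radeon 8060S iGPU.
# Assumes a ROCm build of PyTorch; on such builds, torch.cuda.* maps to
# HIP devices and torch.version.hip reports the HIP runtime version.
import torch

print("HIP runtime version:", torch.version.hip)    # None on CUDA/CPU builds
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    props = torch.cuda.get_device_properties(0)
    print(f"VRAM: {props.total_memory / 2**30:.1f} GiB")
```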
Benchmarking Methodology
The performance evaluation was carried out using a suite of benchmarks designed to stress various aspects of the hardware, with a particular focus on AI and machine learning workloads. The primary tools were vLLM and Llama.cpp, two popular frameworks for large language model (LLM) inference, driven through the Phoronix Test Suite. These benchmarks capture metrics such as latency, throughput, prompt processing, and text generation speeds across different LLM architectures and quantization levels. Each test was run multiple times with statistical analysis to ensure reliable results. The goal was to provide a clear picture of how the Strix Halo hardware performs under demanding AI computations on the ROCm 7.0 stack.
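To make the "multiple runs and statistical analysis" concrete, the harness below sketches how per-run timings can be collected and summarized into the percentiles reported in the next section. It is a generic illustration, not the Phoronix Test Suite's internal logic; run_once is a stand-in for a single benchmark invocation.

```python
# Generic repeated-run timing harness; run_once stands in for one
# benchmark invocation (a vLLM batch, a llama.cpp generation, etc.).
import time
import statistics

def time_runs(run_once, warmup=2, runs=10):
    for _ in range(warmup):          # warm-up iterations are discarded
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return samples

def report(samples):
    # quantiles(n=100) yields 99 cut points; index p-1 approximates P<p>
    qs = statistics.quantiles(samples, n=100, method="inclusive")
    for p in (10, 50, 75, 90, 99):
        print(f"P{p}: {qs[p - 1]:.3f} s")
    print(f"mean: {statistics.fmean(samples):.3f} s, "
          f"stdev: {statistics.stdev(samples):.3f} s")

if __name__ == "__main__":
    report(time_runs(lambda: sum(i * i for i in range(10**6))))
```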
vLLM Benchmarks: Latency and Throughput
The vLLM benchmarks focused on the latency and throughput of various LLMs on the Strix Halo platform. For models like Hermes-3-Llama-3.2-3B, average latency was measured in seconds, with percentile latencies (P10, P50, P75, P90, P99) also recorded to characterize the distribution of response times. The same latency metrics were captured for larger models such as Hermes-3-Llama-3.1-8B and deepseek-moe-16b-chat. Throughput was assessed in requests per second, total tokens per second, and output tokens per second. The results showed strong LLM inference performance from the Strix Halo, with models such as Hermes-3-Llama-3.1-8B achieving competitive request and token generation rates.
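For readers who want to reproduce this kind of measurement, the sketch below shows how the three throughput metrics can be derived from a single batched run using vLLM's Python API. The model id, batch size, and prompts are illustrative assumptions, not the article's exact test parameters.

```python
# Derive requests/s, total tokens/s, and output tokens/s from one
# batched vLLM run. Assumes a ROCm-enabled vLLM install; the model id
# and prompt set are placeholders for illustration.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="NousResearch/Hermes-3-Llama-3.1-8B")
params = SamplingParams(max_tokens=256, temperature=0.0, ignore_eos=True)
prompts = [f"Request {i}: summarize the ROCm compute stack." for i in range(32)]

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

prompt_toks = sum(len(o.prompt_token_ids) for o in outputs)
output_toks = sum(len(c.token_ids) for o in outputs for c in o.outputs)

print(f"Requests/s:      {len(outputs) / elapsed:.2f}")
print(f"Total tokens/s:  {(prompt_toks + output_toks) / elapsed:.1f}")
print(f"Output tokens/s: {output_toks / elapsed:.1f}")
```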
Llama.cpp Performance: CPU vs. Vulkan vs. ROCm
The Llama.cpp benchmarks provided a comparative analysis of different backends: CPU BLAS, Vulkan, and ROCm HIP. The tests covered models such as Qwen3-8B-Q8_0, gpt-oss-20b-Q8_0, and Llama-3.1-Tulu-3-8B-Q8_0, with performance measured in tokens per second for both text generation and prompt processing. Vulkan often showed strong performance, especially in prompt processing, while the ROCm HIP backend was competitive in text generation, a sign of the growing maturity of AMD's compute stack for these workloads. The optimal backend varied with the specific model and task, underscoring the value of flexibility in AI development; a sketch of how such a comparison can be scripted follows.
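One way to run this kind of backend comparison is to build llama.cpp separately for each backend and drive the bundled llama-bench tool from a small script. The binary and model paths below are illustrative assumptions, and the JSON field names (n_prompt, n_gen, avg_ts) reflect llama-bench's JSON output format at the time of writing.

```python
# Compare llama.cpp backends by invoking llama-bench from each build.
# Assumes three separate llama.cpp builds (CPU/BLAS, Vulkan, ROCm HIP)
# at the paths below and a GGUF model on disk; all paths are placeholders.
import json
import subprocess

MODEL = "models/Qwen3-8B-Q8_0.gguf"
BUILDS = {
    "cpu":    "llama.cpp-cpu/build/bin/llama-bench",
    "vulkan": "llama.cpp-vulkan/build/bin/llama-bench",
    "rocm":   "llama.cpp-hip/build/bin/llama-bench",
}

for name, binary in BUILDS.items():
    # -p 512: prompt-processing test, -n 128: text-generation test,
    # -ngl 99: offload all layers to the GPU (no effect on the CPU build)
    result = subprocess.run(
        [binary, "-m", MODEL, "-p", "512", "-n", "128",
         "-ngl", "99", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    for entry in json.loads(result.stdout):
        print(f"{name}: n_prompt={entry['n_prompt']} "
              f"n_gen={entry['n_gen']} avg t/s={entry['avg_ts']:.1f}")
```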
System Stability and Additional Notes
Throughout the benchmarking process, the ROCm 7.0 stack worked reliably on the AMD Ryzen AI Max+ "Strix Halo" Framework Desktop, despite these GPUs not appearing on the official support list. Most tests ran smoothly, but some models in the vLLM suite, such as Qwen3-14B-FP8-dynamic and Qwen3-Coder-30B-A3B-Instruct-FP8, along with several Llama.cpp variants, hit segmentation faults or initialization errors, likely a reflection of how early driver and software optimization for this hardware still is. System configuration notes, including kernel versions, compiler details, and security mitigations, were recorded to ensure reproducibility and provide context for the results. Overall, the hardware is clearly capable, but ongoing software and driver updates will be crucial for unlocking its full potential in AI workloads.
Conclusion
The benchmarks indicate that AMD's Ryzen AI Max+ "Strix Halo" processors, paired with ROCm 7.0, offer a compelling platform for AI and machine learning work, particularly large language model inference. Running these demanding workloads effectively on hardware not yet in the official ROCm support matrix reflects well on both the underlying architecture and AMD's investment in its AI ecosystem. While some specific tests hit issues, the overall performance and stability observed provide a strong foundation for future development and optimization, and further gains can be expected as the software stack matures.