AMD and Oracle Forge Strategic Alliance for AI Supercluster Deployment

In a move that signals a significant escalation in the race for AI compute supremacy, Oracle Cloud Infrastructure (OCI) has announced a landmark partnership with AMD. OCI is set to become the first hyperscaler to offer a publicly accessible AI supercluster powered by an initial deployment of 50,000 AMD Instinct MI450 Series GPUs. This ambitious project, scheduled to commence in the third quarter of 2026, with further expansions planned for 2027 and beyond, represents a substantial commitment to bolstering AI infrastructure and directly challenges the established dominance of NVIDIA in the AI accelerator market.

A New Era of AI Compute Powered by AMD's Helios Architecture

The cornerstone of this new AI supercluster is AMD's innovative "Helios" rack design. This vertically-optimized, rack-scale architecture is meticulously engineered to deliver unparalleled performance, scalability, and energy efficiency, crucial for handling the immense demands of next-generation AI training and inference workloads. The Helios design integrates AMD's cutting-edge MI450 Series GPUs, which are built using TSMC's advanced 2nm fabrication technology. Complementing the GPUs are next-generation AMD EPYC™ CPUs, codenamed "Venice," and advanced AMD Pensando™ networking hardware, codenamed "Vulcano." This comprehensive system approach is designed to provide customers with a powerful, cohesive solution for their most demanding AI applications.

Unprecedented Performance and Memory Capabilities

The AMD Instinct MI450 Series GPUs are engineered to push the boundaries of AI performance. Each GPU is equipped with up to 432 GB of HBM4 memory and offers an astounding 20 TB/s of memory bandwidth. This significant increase in memory capacity and bandwidth allows customers to train and infer models that are up to 50 percent larger than previous generations, all within the GPU's memory. This capability is critical for handling the ever-growing complexity of large language models (LLMs) and other sophisticated AI applications, reducing the need for cumbersome model partitioning and improving overall efficiency.
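As a rough illustration of what 432 GB of on-package memory means in practice, the sketch below estimates how many model parameters fit entirely in HBM at common weight precisions. The bytes-per-parameter figures are standard industry conventions, not AMD specifications, and the estimate counts weights only, ignoring activations, optimizer state, and KV caches.

```python
# Back-of-envelope estimate: how many model parameters fit entirely in
# one MI450's stated 432 GB of HBM4, counting weights only. The
# bytes-per-parameter values are common conventions, not AMD specs.

HBM_CAPACITY_GB = 432  # per-GPU HBM4 capacity from the article

BYTES_PER_PARAM = {
    "fp16/bf16": 2,  # 16-bit weights
    "fp8": 1,        # 8-bit weights
}

def max_params_billions(capacity_gb: float, bytes_per_param: int) -> float:
    """Largest parameter count (in billions) whose weights fit in capacity_gb."""
    # GB divided by bytes-per-parameter gives billions of parameters directly.
    return capacity_gb / bytes_per_param

for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{max_params_billions(HBM_CAPACITY_GB, nbytes):.0f}B parameters")
```

At 16-bit precision that is roughly a 216-billion-parameter model resident on a single GPU, which is consistent with the article's point about reducing the need for model partitioning.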

Optimized for Scale and Efficiency: The Helios Rack Design

AMD's "Helios" rack design is a testament to efficient, high-density computing. It enables customers to operate at extreme scales while optimizing for performance density, cost, and energy efficiency through dense, liquid-cooled racks, each housing 72 GPUs. The architecture incorporates UALoE (UALink over Ethernet) scale-up connectivity and Ethernet-based, Ultra Ethernet Consortium (UEC)-aligned scale-out networking. This sophisticated networking infrastructure is designed to minimize latency and maximize throughput across entire pods and racks, ensuring seamless communication and data flow critical for large-scale distributed AI training.
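The figures in this section support some simple scale arithmetic. The sketch below, using only the 50,000-GPU deployment size and the 72-GPUs-per-rack density from the text, estimates the rack count and per-rack aggregates; these are straightforward multiplications, not published Helios specifications.

```python
# Rough deployment scale, using only figures quoted in the article:
# 50,000 MI450 GPUs in liquid-cooled Helios racks of 72 GPUs each,
# with 432 GB of HBM4 and 20 TB/s of memory bandwidth per GPU.
# The per-rack aggregates are simple multiplication, not AMD specs.
import math

TOTAL_GPUS = 50_000
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 432
MEM_BW_PER_GPU_TBS = 20

racks = math.ceil(TOTAL_GPUS / GPUS_PER_RACK)
hbm_per_rack_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000
mem_bw_per_rack_tbs = GPUS_PER_RACK * MEM_BW_PER_GPU_TBS

print(f"racks for the initial deployment: ~{racks}")
print(f"aggregate HBM4 per rack: {hbm_per_rack_tb:.1f} TB")
print(f"aggregate memory bandwidth per rack: {mem_bw_per_rack_tbs} TB/s")
```

On these numbers the initial deployment works out to roughly 695 racks, each holding about 31 TB of HBM4.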

Empowering AI Workloads with Advanced CPUs and Networking

The "Venice" EPYC CPUs serve as powerful head nodes within the supercluster, designed to maximize cluster utilization and streamline large-scale workflows by accelerating job orchestration and data processing. Furthermore, these CPUs will feature confidential computing capabilities and built-in security enhancements, providing a robust safeguard for sensitive AI workloads. The networking infrastructure is equally advanced, powered by fully programmable AMD Pensando DPU technology. This DPU-accelerated converged networking facilitates line-rate data ingestion, enhancing performance and security for massive AI and cloud workloads. Each GPU can be outfitted with up to three 800 Gbps AMD Pensando "Vulcano" AI-NICs, providing lossless, high-speed, and programmable connectivity that adheres to advanced RoCE (RDMA over Converged Ethernet) and UEC standards.
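Taking the stated NIC configuration at face value, a quick calculation shows the aggregate scale-out bandwidth available to each GPU. The only assumptions beyond the article's figures are the decimal unit conversions (1 Tbps = 1,000 Gbps; 8 bits per byte).

```python
# Aggregate scale-out bandwidth per GPU implied by the article's figure
# of up to three 800 Gbps "Vulcano" AI-NICs per GPU. Only assumptions:
# decimal units (1 Tbps = 1,000 Gbps) and 8 bits per byte.

NICS_PER_GPU = 3
GBPS_PER_NIC = 800

total_gbps = NICS_PER_GPU * GBPS_PER_NIC  # 2,400 Gbps
total_tbps = total_gbps / 1000            # 2.4 Tbps
total_gb_per_s = total_gbps / 8           # 300 GB/s

print(f"per-GPU scale-out bandwidth: {total_tbps} Tbps ({total_gb_per_s:.0f} GB/s)")
```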

Open Standards and Scalability for Future Growth

A key aspect of this collaboration is the emphasis on open standards and interoperability. The integration of the UALink and UALoE fabric is designed to help customers efficiently expand workloads, reduce memory bottlenecks, and orchestrate massive multi-trillion-parameter models. UALink, an open, high-speed interconnect standard purpose-built for AI accelerators, minimizes hops and latency by enabling direct, hardware-coherent networking and memory sharing among GPUs within a rack. This commitment to open standards, supported by a broad industry ecosystem, provides customers with the flexibility, scalability, and reliability needed to run their most demanding AI workloads on open, standards-based infrastructure. The open-source AMD ROCm™ software stack further enhances this by offering a flexible programming environment with popular frameworks, libraries, compilers, and runtimes, simplifying migration and fostering rapid innovation.

AI Summary

The partnership between AMD and Oracle marks a significant advancement in the AI infrastructure landscape, with Oracle Cloud Infrastructure (OCI) poised to launch the industry's first publicly accessible AI supercluster powered by 50,000 AMD Instinct MI450 Series GPUs. This deployment, slated to begin in the third quarter of 2026 and extend into 2027 and beyond, represents a substantial expansion of AI compute capacity.

The supercluster will be built upon AMD's innovative "Helios" rack design, a vertically-optimized, rack-scale architecture engineered for maximum performance, scalability, and energy efficiency. This design integrates the MI450 GPUs, which feature TSMC's cutting-edge 2nm fabrication technology, along with next-generation AMD EPYC CPUs codenamed "Venice" and advanced AMD Pensando networking codenamed "Vulcano." Each MI450 GPU boasts up to 432 GB of HBM4 memory and 20 TB/s of memory bandwidth, enabling the training and inference of models up to 50% larger than previous generations entirely in memory.

The Helios rack design itself is engineered for dense, liquid-cooled configurations of 72 GPUs, incorporating UALoE scale-up connectivity and Ethernet-based Ultra Ethernet Consortium (UEC)-aligned scale-out networking to minimize latency and maximize throughput. The "Venice" EPYC CPUs will provide powerful head node capabilities, accelerating job orchestration and data processing, while also offering confidential computing and built-in security features. The networking infrastructure, powered by AMD Pensando DPUs, will enable line-rate data ingestion and enhanced security. Furthermore, the system will leverage ultra-fast distributed training and optimized collective communication through an open networking fabric, with each GPU potentially equipped with up to three 800 Gbps AMD Pensando "Vulcano" AI-NICs.
The integration of UALink and UALoE fabric is crucial for efficient workload expansion and reducing memory bottlenecks, especially for multi-trillion-parameter models. This collaboration underscores a strategic effort by both AMD and Oracle to provide a robust, open, secure, and scalable cloud foundation for AI, aiming to offer a compelling alternative to the current market leader, NVIDIA. The initiative also aligns with a broader industry trend of seeking hardware diversity and open standards in AI infrastructure, as evidenced by Oracle's simultaneous launch of its Autonomous AI Lakehouse platform, which emphasizes interoperability and vendor neutrality.
