
At GTC 2026, NVIDIA officially ushered in a new paradigm for artificial intelligence, moving beyond simple model training and deployment. The company unveiled the NVIDIA Vera Rubin platform, a transformative computing architecture explicitly engineered to power the era of agentic AI. This launch marks a significant departure from traditional standalone chip releases, presenting instead a fully integrated, massive-scale system designed to function as a singular, coherent supercomputer.
NVIDIA founder and CEO Jensen Huang declared Vera Rubin a "generational leap," emphasizing that the inflection point for autonomous, reasoning-capable agents has arrived. As enterprises shift their focus toward complex workflows—where models must execute multi-step logic, validate results, and operate autonomously—the underlying infrastructure must evolve from discrete components to comprehensive AI factories. The Vera Rubin platform is the manifestation of this vision, integrating seven distinct chip types into a cohesive infrastructure capable of delivering 60 exaflops of compute performance.
The core innovation of the Vera Rubin platform is its extreme co-design philosophy. Rather than optimizing chips in isolation, NVIDIA has developed an ecosystem of seven specialized chips that operate in perfect synchronization across networking, storage, and compute layers. This approach aims to eliminate traditional bottlenecks in memory movement and communication, which have historically plagued high-performance computing (HPC) for large-scale AI.
The seven chip types of the Vera Rubin silicon architecture span the compute, networking, and data-processing layers, anchored by the Rubin GPU and the Vera CPU and rounded out by interconnect and data-movement silicon such as NVLink 6, BlueField-4, and Spectrum-6.
At the center of this announcement is the Vera Rubin POD, a massive 40-rack supercomputer configuration. By integrating these seven chips into five purpose-built rack-scale systems, the POD is designed for maximum throughput and efficiency at data-center scale.
These five systems—the NVL72 GPU rack, the Groq 3 LPX rack, the Vera CPU rack, the BlueField-4 STX rack, and the Spectrum-6 SPX rack—are designed to work in concert to support modern agentic AI paradigms, including mixture-of-experts (MoE) routing and long-context memory storage.
| Component System | Primary Function | Key Performance Metric |
|---|---|---|
| Vera Rubin NVL72 | Training and Inference Engine | 72 Rubin GPUs with NVLink 6 |
| Vera CPU Rack | RL and Orchestration | 256 Vera CPUs for logic control |
| Groq 3 LPX Rack | Decode Acceleration | 256 LPUs for low-latency inference |
| BlueField-4 STX Rack | Data/KV Cache Storage | Enhanced memory throughput |
| Spectrum-6 SPX Rack | Networking Backbone | High-speed Ethernet synchronization |
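One of the agentic workload patterns these racks target, mixture-of-experts routing, reduces at its core to picking the top-k experts for each token and softmax-normalizing their gate weights. A minimal sketch of that routing step (illustrative only, not an NVIDIA API):

```python
import numpy as np

def moe_route(logits: np.ndarray, k: int = 2):
    """Select the top-k experts per token and softmax-normalize their weights.

    logits: (num_tokens, num_experts) router scores.
    Returns (indices, weights), each of shape (num_tokens, k).
    """
    # Indices of the k highest-scoring experts per token, best first.
    idx = np.argsort(logits, axis=-1)[:, -k:][:, ::-1]
    top = np.take_along_axis(logits, idx, axis=-1)
    # Softmax over only the selected experts, as in standard top-k gating.
    exp = np.exp(top - top.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return idx, weights

# Example: 3 tokens routed across 8 experts with top-2 gating.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8))
idx, w = moe_route(logits)
```

In a real MoE layer the selected experts then process the token and their outputs are combined using these weights; the point here is only that routing decisions are per-token, which is why cross-rack bandwidth matters so much for this workload.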
The scale is staggering: a full Vera Rubin POD configuration encompasses nearly 20,000 NVIDIA dies, totaling 1.2 quadrillion transistors. The configuration delivers 60 exaflops of performance and 10 PB/s of bandwidth, addressing the compute-heavy demands of next-generation AI agents, which run constant validation and iteration loops.
The transition to agentic AI—where systems must "reason" rather than just predict the next token—places unique demands on hardware. Traditional inference systems often suffer from high latency and prohibitive costs when scaling to the level of autonomy required for mission-critical decisions. NVIDIA’s Vera Rubin platform specifically targets these issues by decoupling the prefill (compute-intensive) and decode (latency-sensitive) phases of inference.
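The prefill/decode split can be illustrated with a toy request lifecycle: prefill processes the whole prompt in one compute-bound pass and produces the KV cache, while decode emits one token per latency-bound step against that cache. The data structures and pool names below are stand-ins, not NVIDIA APIs:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: list                                  # token ids
    kv_cache: list = field(default_factory=list)  # one entry per processed token
    output: list = field(default_factory=list)    # generated token ids

def prefill(req: Request, pool: str = "gpu-prefill") -> Request:
    # Compute-bound: process the entire prompt in one pass, building the KV cache.
    req.kv_cache = [("kv", t) for t in req.prompt]  # stand-in for real KV tensors
    return req

def decode_step(req: Request, pool: str = "lpu-decode") -> int:
    # Latency-bound: one token per step, reusing and extending the KV cache.
    next_tok = len(req.kv_cache)                    # stand-in for real sampling
    req.kv_cache.append(("kv", next_tok))
    req.output.append(next_tok)
    return next_tok

req = prefill(Request(prompt=[101, 102, 103]))
for _ in range(4):
    decode_step(req)
```

Because the two phases stress hardware so differently, a disaggregated serving system can hand the KV cache off from a compute-optimized pool to a latency-optimized one, which is the pattern the Rubin GPU / Groq 3 LPU pairing described below exploits.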
By pairing the Rubin GPU for compute-heavy prefill tasks with the Groq 3 LPU for the decode phase, NVIDIA claims the architecture can deliver significantly higher inference throughput per megawatt. This improvement is critical for companies running trillion-parameter models, as it allows for a more sustainable operational model.
Furthermore, the Vera CPU plays a crucial role in "CPU-native" workloads, such as reinforcement learning environments where agents test and validate code. With 1.2 terabytes per second of memory bandwidth and full Arm compatibility, the Vera CPU ensures that GPUs are not kept waiting for control instructions, addressing one of the most common utilization bottlenecks in modern AI data centers.
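The "GPUs kept waiting" problem comes down to pipelining: if CPU-side environment stepping and accelerator-side inference run strictly in sequence, each device idles while the other works. A minimal sketch of overlapping the two with a thread pool (the `env_step` and `model_forward` functions are hypothetical stand-ins for real RL environment and model code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def env_step(batch):
    # Stand-in for CPU-side control logic / RL environment stepping.
    time.sleep(0.01)
    return [x + 1 for x in batch]

def model_forward(batch):
    # Stand-in for an accelerator-side forward pass.
    time.sleep(0.01)
    return [x * 2 for x in batch]

def pipelined_rollout(batch, steps=3):
    """Overlap CPU env stepping for step t+1 with model inference for step t."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        obs = batch
        fwd = pool.submit(model_forward, obs)
        for _ in range(steps - 1):
            nxt = pool.submit(env_step, obs)  # CPU works while the model runs
            results.append(fwd.result())
            obs = nxt.result()
            fwd = pool.submit(model_forward, obs)
        results.append(fwd.result())
    return results

out = pipelined_rollout([1, 2, 3])
```

With fast CPU memory bandwidth, the `env_step` leg of each iteration finishes before the forward pass does, so the accelerator never stalls on control logic; that is the bottleneck the article attributes the Vera CPU's role to.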
As the industry moves toward 2026 and beyond, the definition of an "AI factory" is becoming clearer. It is no longer defined by the capability of a single GPU, but by the efficiency of the entire system stack. The NVIDIA Vera Rubin platform, with its focus on system-wide co-design, energy efficiency, and scalability, sets a new benchmark for global AI infrastructure.
For enterprises and hyperscalers aiming to deploy complex autonomous agents, the message from GTC 2026 is clear: the hardware bottleneck is being addressed through deep integration. As Vera Rubin-based products move toward full production in the second half of the year, the race to build the infrastructure capable of powering the next wave of intelligent, reasoning-based agents has officially begun.