Nvidia Unveils Vera Rubin: A New Era in AI Infrastructure


Nvidia on Monday unveiled Vera Rubin, a comprehensive AI platform built around seven newly developed chips, backed by major players including Anthropic, OpenAI, Meta, and Mistral AI. The move underscores Nvidia’s continued dominance in the rapidly evolving AI landscape. The platform promises up to ten times better inference performance per watt and a tenfold cost reduction per token compared to existing Blackwell systems. CEO Jensen Huang described it as “a generational leap” driving “the greatest infrastructure buildout in history,” with backing from all major cloud providers and over 80 manufacturing partners.
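Taken at face value, the per-token economics claim is straightforward arithmetic. A minimal sketch of what the two multipliers would mean in practice, using hypothetical Blackwell baseline figures (the numbers below are placeholders for illustration, not published specs):

```python
# Hypothetical Blackwell baselines -- placeholders, not published specs.
blackwell_tokens_per_sec_per_watt = 2.0
blackwell_cost_per_million_tokens = 1.00  # USD, illustrative only

# Nvidia's stated multipliers: 10x perf per watt, 10x lower cost per token.
rubin_tokens_per_sec_per_watt = blackwell_tokens_per_sec_per_watt * 10
rubin_cost_per_million_tokens = blackwell_cost_per_million_tokens / 10

print(rubin_tokens_per_sec_per_watt)   # 20.0
print(rubin_cost_per_million_tokens)   # 0.1
```

The point of the sketch is that both claims compound for an operator: the same inference workload would draw a tenth of the energy per token and cost a tenth as much to serve, if the figures hold up under independent benchmarking.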

The Core of Vera Rubin: A Seven-Chip Architecture

The Vera Rubin platform integrates the Nvidia Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the Groq 3 LPU – a specialized inference accelerator. These components function as a unified supercomputer within five interlocking rack-scale systems. The flagship NVL72 rack combines 72 Rubin GPUs and 36 Vera CPUs, connected by NVLink 6. Nvidia claims this configuration can train large models using fewer GPUs than Blackwell, potentially reshaping the economics of frontier AI development.

The Vera CPU rack houses 256 liquid-cooled processors, sustaining over 22,500 concurrent CPU environments – essential for running AI agents. Nvidia touts it as the first processor designed specifically for agentic AI, featuring 88 custom Olympus cores and LPDDR5X memory with 1.2 terabytes per second bandwidth at half the power of traditional CPUs. The Groq 3 LPX rack contains 256 inference processors with 128 gigabytes of on-chip SRAM, targeting low-latency processing for trillion-parameter models. BlueField-4 STX provides high-speed storage for AI reasoning, while Spectrum-6 SPX Ethernet ties everything together with co-packaged optics for improved efficiency.
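The concurrency figure lines up neatly with the core count: 256 processors with 88 Olympus cores each gives just over 22,500, which suggests roughly one environment per core. A quick check of that reading (the one-environment-per-core mapping is an assumption for illustration, not a stated Nvidia design detail):

```python
# Check whether "over 22,500 concurrent CPU environments" matches
# 256 Vera CPUs x 88 Olympus cores, assuming one environment per core
# (an illustrative assumption, not a confirmed design detail).
cpus_per_rack = 256
cores_per_cpu = 88

total_cores = cpus_per_rack * cores_per_cpu
print(total_cores)  # 22528 -- i.e. "over 22,500"
```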

The Shift to Agentic AI: Why This Matters

Nvidia’s strategy centers on the transition from chatbots to “agentic AI” – systems capable of autonomous reasoning, software execution, and continuous improvement. This requires a shift in infrastructure design. Unlike chatbot queries that consume milliseconds of GPU time, agentic systems demand sustained CPU, GPU, and storage resources for tasks like drug discovery or code debugging. This necessitates a different balance of compute, memory, storage, and networking, which Vera Rubin aims to provide.

To support this evolution, Nvidia introduced the Agent Toolkit, including OpenShell, an open-source runtime enforcing security and privacy guardrails for autonomous agents. Major enterprises like Adobe, Atlassian, and Salesforce are integrating this toolkit into their platforms. The company also launched Dynamo 1.0, described as an “operating system” for AI inference at scale, adopted by AWS, Azure, Google Cloud, and others.

Open Models and Ecosystem Growth

Nvidia’s expansion into open models reflects a strategic effort to cultivate a developer ecosystem that fuels demand for its hardware. The Nemotron Coalition, a global collaboration of AI labs, will jointly develop open frontier models trained on Nvidia’s DGX Cloud. Founding members include Mistral AI and Perplexity, contributing data and expertise. The first model, co-developed with Mistral AI, will underpin the Nemotron 4 family.

Nvidia has also enhanced its open model portfolio with Nemotron 3 Ultra, Nemotron 3 Omni, and Nemotron 3 VoiceChat, delivering improved performance and multimodal capabilities. GR00T N2, a next-generation robot foundation model, demonstrates advances in robotic task completion. This push into open models serves a dual purpose: encouraging developer adoption while positioning Nvidia as a neutral platform provider.

Beyond the Data Center: Vertical Applications

Vera Rubin’s applications extend far beyond traditional data centers. Roche is deploying over 3,500 Blackwell GPUs for biological foundation models and drug discovery, accelerating development timelines. BYD, Geely, and Nissan are integrating Nvidia’s Drive Hyperion platform into Level 4 autonomous vehicles, with plans for expansion through a partnership with Uber. Nvidia also released the first domain-specific physical AI platform for healthcare robotics, leveraging Open-H, the world’s largest healthcare robotics dataset.

The platform also extends to space computing, with the Vera Rubin Space Module offering up to 25x more AI compute for orbital inference, attracting partners like Aetherflux and Axiom Space. Nvidia also launched the DGX Station, a deskside supercomputer for local AI experimentation, supporting models of up to one trillion parameters.

The AI Factory Blueprint: Scaling Intelligence Production

Nvidia’s most ambitious move is the Vera Rubin DSX AI Factory reference design, a blueprint for building entire facilities optimized for AI production. This includes integrating compute, networking, storage, power, and cooling into systems maximizing “tokens per watt.” The software stack includes DSX Max-Q for dynamic power provisioning and DSX Flex for connecting to grid services. Nscale and Caterpillar are constructing one of the world’s largest AI factories in West Virginia using this reference design.

Conclusion

Nvidia’s Vera Rubin platform represents a significant step towards a future where AI infrastructure is optimized for autonomous agents. While performance claims require independent verification, the scale and coherence of this integrated stack position Nvidia as a central force in the next phase of AI development. The company’s vision extends beyond hardware to encompass software, ecosystems, and even entire factories dedicated to producing intelligence.