NVIDIA does not lack for ambition. At CES 2026, the chipmaker unveiled its Vera Rubin platform, named after the astronomer whose galaxy-rotation measurements provided the first compelling evidence for dark matter, and announced that it was entering full production months ahead of schedule. The message was clear: the company that already controls 92% of the discrete GPU market has no intention of slowing down.
The headline number is 3.3 times the performance of Blackwell Ultra, its previous flagship. That is a remarkable leap for a single generation. The platform pairs a custom 88-core Arm CPU, Vera, with two new Rubin GPUs, each capable of 50 petaflops of NVFP4 compute for AI inference. For perspective, the world's fastest supercomputer of 2010, Tianhe-1A, managed about 2.6 petaflops on the LINPACK benchmark; a single Rubin GPU's NVFP4 figure is nearly twenty times that, though the two measure very different workloads. The rack-scale version, the NVL72, strings 72 GPUs together and is the first such system to deliver NVIDIA Confidential Computing across the CPU, GPU and NVLink domains, a feature that matters greatly to enterprises handling sensitive data in the cloud.
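A quick back-of-envelope, using only the figures above plus Tianhe-1A's published LINPACK score, makes the scale concrete. This is a minimal sketch; as the comments note, 4-bit inference throughput and FP64 LINPACK are not directly comparable measures.

```python
# Back-of-envelope arithmetic using the figures quoted above.
# Caveat: NVFP4 (4-bit) inference throughput and FP64 LINPACK
# measure very different kinds of work.

RUBIN_GPU_NVFP4_PFLOPS = 50        # per Rubin GPU, as announced
GPUS_PER_PLATFORM = 2              # two Rubin GPUs per Vera Rubin board
GPUS_PER_NVL72_RACK = 72           # rack-scale NVL72 configuration
TIANHE_1A_FP64_PFLOPS = 2.57       # world's fastest supercomputer, 2010

print(f"Per board: {RUBIN_GPU_NVFP4_PFLOPS * GPUS_PER_PLATFORM} PF NVFP4")               # 100 PF
print(f"Per rack: {RUBIN_GPU_NVFP4_PFLOPS * GPUS_PER_NVL72_RACK / 1000:.1f} EF NVFP4")   # 3.6 EF
print(f"One GPU vs 2010's best: {RUBIN_GPU_NVFP4_PFLOPS / TIANHE_1A_FP64_PFLOPS:.0f}x")  # ~19x
```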
Performance per watt has improved by 40% compared with the previous generation. This is not a minor footnote. Data centres are consuming electricity at rates that strain national grids; in some markets, power availability has replaced silicon supply as the binding constraint on AI expansion. A 40% efficiency gain translates directly into lower operating costs and more headroom to scale. It also gives NVIDIA a response to critics who worry that AI's energy appetite is becoming a liability.
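The arithmetic is worth spelling out: a 40% gain in performance per watt means the same work takes roughly 71% of the energy, assuming utilisation and overheads are otherwise unchanged. A minimal sketch:

```python
# A 40% improvement in performance per watt cuts the energy needed
# for a fixed workload to 1/1.4 of its previous level, assuming
# utilisation and cooling overheads are otherwise unchanged.

PERF_PER_WATT_GAIN = 0.40

energy_ratio = 1 / (1 + PERF_PER_WATT_GAIN)
print(f"Energy per unit of work: {energy_ratio:.0%} of the previous generation")  # 71%
print(f"Energy saved on the same workload: {1 - energy_ratio:.0%}")               # 29%
```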
Why this matters
The timing of Vera Rubin's launch reflects something important about the AI industry's trajectory. Demand for compute is not moderating. Meta has announced plans to spend up to $135bn on AI infrastructure in 2026 alone. OpenAI and NVIDIA have struck a strategic partnership to deploy ten gigawatts of NVIDIA systems — a number that would have seemed fantastical five years ago. AWS, Google Cloud, Microsoft and OCI have all confirmed they will be among the first to deploy Vera Rubin hardware in the second half of 2026.
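To get a feel for what ten gigawatts means, here is a rough and entirely illustrative conversion into racks and GPUs. Both constants are hypothetical assumptions of ours; neither NVIDIA nor OpenAI has published per-rack power or facility-overhead figures for these deployments.

```python
# Rough scale of a ten-gigawatt deployment. Both constants below are
# ASSUMPTIONS for illustration, not NVIDIA or OpenAI figures.

DEPLOYMENT_GW = 10
RACK_KW = 150   # ASSUMPTION: per-rack draw, in line with current rack-scale systems
PUE = 1.3       # ASSUMPTION: facility overhead (cooling, power conversion)

it_load_kw = DEPLOYMENT_GW * 1_000_000 / PUE   # power left for the racks themselves
racks = it_load_kw / RACK_KW
gpus = racks * 72                              # 72 GPUs per NVL72 rack

print(f"~{racks:,.0f} racks, ~{gpus:,.0f} GPUs (order of magnitude only)")
```

On those assumptions, ten gigawatts works out to tens of thousands of racks and several million GPUs, which is the sense in which the number would have seemed fantastical five years ago.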
These are not speculative commitments. They are capacity reservations from organisations that have concluded, correctly or otherwise, that the race for AI capability is far from over and that the cost of falling behind outweighs the cost of over-investing. NVIDIA sits at the centre of this dynamic. Its chips are not merely a component in the AI supply chain; they are its rate-limiting step. Whoever controls the supply of the best chips controls, to a significant degree, the pace of the entire industry.
This is an uncomfortable position for everyone except NVIDIA's shareholders. The company's 92% market share in discrete GPUs is the kind of dominance that tends to attract regulatory attention. AMD and Intel are trying to close the gap; custom-silicon efforts from Google (TPUs), Amazon (Trainium) and others nibble at the edges of specific workloads. None has yet mounted a serious challenge to NVIDIA's full-stack advantage: the combination of hardware, the CUDA software ecosystem and the network effects that come from more than a decade of developer lock-in.
The Rubin edge
What makes Vera Rubin more than an incremental upgrade is its architecture. Integrating the 88-core Vera CPU with the Rubin GPUs in a coherent platform, rather than treating the CPU as an afterthought bolted onto a GPU system, reflects how AI workloads have evolved. Modern inference is not purely a matrix-multiplication problem; it involves memory management, data preprocessing and increasingly complex orchestration that benefits from tight CPU-GPU coupling. The NVL72's rack-scale confidential computing capability addresses a genuine market need: enterprises in finance, healthcare and government want the performance of cloud-scale AI without surrendering control of their data.
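To see why that coupling matters, consider a generic inference loop in which the CPU prepares the next batch while the GPU computes the current one. The sketch below uses PyTorch's pinned memory and CUDA streams; it is standard CUDA practice offered for illustration, not NVIDIA's Vera Rubin software stack, and the model is a stand-in.

```python
# Minimal sketch of CPU-GPU overlap in an inference-style loop, using
# PyTorch pinned memory and CUDA streams. Generic CUDA practice for
# illustration only; not NVIDIA's Vera Rubin software stack.
import torch

device = torch.device("cuda")
model = torch.nn.Linear(4096, 4096).to(device)  # stand-in for a real model
copy_stream = torch.cuda.Stream()

def preprocess(batch_id: int) -> torch.Tensor:
    # CPU-side work (decoding, tokenisation, batching) would happen here.
    # Pinned memory allows the host-to-device copy to run asynchronously.
    return torch.randn(64, 4096).pin_memory()

for i in range(8):
    host_batch = preprocess(i)
    with torch.cuda.stream(copy_stream):        # copy on a side stream
        dev_batch = host_batch.to(device, non_blocking=True)
    torch.cuda.current_stream().wait_stream(copy_stream)
    with torch.no_grad():
        out = model(dev_batch)                  # GPU computes this batch
    # Kernel launches are asynchronous, so the CPU loops ahead to
    # preprocess the next batch while the GPU is still working on this one.

torch.cuda.synchronize()
```

The tighter the coupling between the two sides, the less time either spends waiting on the other, which is the design logic behind putting Vera and Rubin on one coherent platform.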
The 50-petaflop figure for NVFP4 inference deserves a caveat. Performance numbers in the chip industry are almost always measured under conditions that flatter the hardware. Real-world gains depend on model architecture, memory bandwidth, interconnect topology and software optimisation. The 3.3x claim against Blackwell Ultra will hold in some scenarios and fall short in others. That said, even a fraction of that improvement, delivered at 40% better energy efficiency, represents a meaningful advance.
What comes next
Vera Rubin is not the end of NVIDIA's roadmap. The company has already indicated that future generations are in development, maintaining the relentless cadence that has kept rivals perpetually one step behind. For the major cloud providers, this creates a peculiar dynamic: they must buy today's best chips to remain competitive, knowing that next year's chips will make them look dated. The capital commitment required to stay current is enormous and growing.
For the rest of the industry, Vera Rubin's arrival poses a straightforward question: if the best AI hardware is getting 3.3 times more powerful every generation or two, how quickly will today's models and applications look primitive? The honest answer is: faster than most organisations are planning for. The chips are ready. Whether the software, the business models and the regulatory frameworks can keep pace is a different matter entirely.