The gap is almost gone. A year ago, closed models from OpenAI and Anthropic led their open-source counterparts by 17.5 percentage points on the MMLU benchmark, the standard test of language-model breadth. Today that lead has shrunk to 0.3 percentage points. The proprietary advantage, long treated as self-evident, has all but evaporated.
This is not a story about one breakout model. It is a story about a structural shift in who builds AI, how it spreads, and where the frontier now lives.
A year of compression
Twelve months ago, GPT-4 class performance was the exclusive province of a handful of American labs. Replicating it required vast compute, vast data and — usually — a closed API. That barrier is gone. Meta's Llama 3.3 70B runs on a single high-end GPU. DeepSeek R1, trained in China for a fraction of what its Western counterparts cost, matches GPT-4 across a wide range of reasoning tasks and publishes its weights for anyone to use.
The benchmark numbers matter less than what they represent. MMLU is a proxy, not a verdict. But when a freely downloadable model scores within a rounding error of a model that cost hundreds of millions of dollars to train and is sold by subscription, the economics of the industry change. Enterprises that once had no choice but to pay for API access now have a credible alternative. Developers who relied on proprietary tools can self-host. The leverage that closed labs held over the market has diminished.
The geography of AI shifted too
Something else happened in the summer of 2025: total model downloads tipped from American-dominant to Chinese-dominant. DeepSeek, Moonshot's Kimi K2 and Alibaba's Qwen3 together drew more Hugging Face download traffic than Western open releases did. For an industry that had long assumed Silicon Valley set the pace, this was a clarifying moment.
The Chinese models did not merely match their rivals on benchmarks. They competed on efficiency. DeepSeek in particular became known for doing more with less — a consequence, in part, of operating under export restrictions that limit access to the most powerful Nvidia chips. Necessity produced ingenuity. The resulting models are leaner and, in several configurations, faster to run.
DeepSeek-V3.2 has since emerged as one of the best open-source options for reasoning and agentic workloads, the tasks that matter most for real-world deployment. When a model can plan, use tools and recover from errors, it stops being a novelty and starts being infrastructure.
The West did not stand still
It would be wrong to read this as a story of American decline. Western open releases accelerated sharply in the second half of 2025. Mistral, the French startup that built its reputation on efficient open models, launched Mistral 3. AI2's Olmo 3 — notable for full transparency, publishing training data and intermediate checkpoints alongside weights — pushed the boundary of what reproducible AI science looks like. Smaller labs and academic groups released competitive models that would have seemed implausible two years ago.
The pattern emerging is not one of East versus West but of open versus closed. The open ecosystem is global, collaborative and fast-moving. It iterates in public. Improvements in one model get absorbed into the next release from a different team on a different continent. The closed ecosystem is more controlled, often more polished, and still ahead on the very hardest tasks. But the gap on everyday tasks has closed.
What this means in practice
For most users and most use cases, the practical difference between a top open-source model and a top proprietary one is now negligible. Writing, summarisation, coding assistance, basic reasoning — these are commodities. The competition has moved to the margins: very long contexts, multimodal inputs, real-time voice, the subtle reliability that enterprise customers demand.
This matters for the economics of AI adoption. Costs fall when competition is real. Developers who can self-host have more control over latency, privacy and cost. Companies in regulated industries — finance, healthcare, law — gain viable options for keeping data on their own infrastructure. None of this was plausible at scale eighteen months ago.
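The self-hosting option described above can be sketched in a few lines. Everything specific here is an illustrative assumption, not a detail from the article: the local endpoint URL and model name are hypothetical, and the sketch assumes an open-weights model served on your own hardware behind an OpenAI-compatible API (as servers such as vLLM or Ollama provide), so prompts and responses never leave your network:

```python
import json
import urllib.request

# Hypothetical self-hosted endpoint; assumes an OpenAI-compatible chat API
# running on your own infrastructure (URL and port are placeholders).
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"


def build_request(prompt: str, model: str = "llama-3.3-70b-instruct") -> bytes:
    """Build the JSON body an OpenAI-compatible chat endpoint expects.

    The model name is an assumed identifier for a locally served
    open-weights model; substitute whatever your server registers.
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")


def ask(prompt: str) -> str:
    """Send the prompt to the self-hosted model and return its reply.

    Because the endpoint is local, the data stays on your infrastructure,
    which is the point for regulated industries.
    """
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swapping providers, or moving from a paid API to a self-hosted one, then amounts to changing the endpoint URL and model string, which is precisely the leverage the paragraph describes.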
The proprietary labs are not finished. They still lead on the most demanding benchmarks, still invest at a scale that most open efforts cannot match, and still attract the talent and compute to push further. But they are no longer operating in a different league. They are operating in the same league, against an opponent that does not charge for access.
The obvious question
If a freely available model is nearly as good as a paid one, what are you paying for? The honest answer varies by use case. For routine tasks, perhaps not much. For cutting-edge capability, the closed models retain an edge — for now. For trust, reliability and support at enterprise scale, the picture is murkier.
The open-source AI community spent years being told it was permanently behind. That argument is no longer available. The gap was 17.5 points. Now it is 0.3. At that rate, the more interesting question is not whether open models will close it entirely, but what the closed labs will do when they do.