// AI Deep Dive
The Frog in the Boiling Water
Every technology boom of the last thirty years was built on open source we owned. The AI boom is the first one we are renting — and the rent is our data.

EXECUTIVE SUMMARY
For three decades, every platform shift handed power to the people building on top of it. The web ran on Apache and Linux, mobile ran on Android, and the cloud ran on open source, nearly all the way down. You could read the code, fork it, host it yourself, and walk away from any vendor that got greedy. The generative AI boom broke that pattern. The defining models are sold as a service, the weights stay locked in someone else's data center, and the price of admission is a steady outbound stream of our most sensitive data. Open alternatives exist and they are good. The catch is that the best of them are Chinese, censored, and arriving just as the American frontier labs lock the doors. The water is warming, and most companies have not noticed.
// Open source quietly owns the foundation of every prior boom: Linux runs about 63% of the world's web-facing servers and 100% of the top 500 supercomputers, Android powers roughly 72% of phones, and open source makes up an estimated 77% of the code in the average commercial codebase.
// This boom inverted the model: closed models account for about 80% of all model usage, even though open models reach roughly 90% of their performance at release and cost about a sixth as much.
// The dependency is also financial and circular: OpenAI lost an estimated $8 billion on $12 billion of revenue in 2025, the industry has committed more than $1 trillion against under $50 billion of revenue, and much of it flows through circular deals where Nvidia, OpenAI, and the hyperscalers fund one another.
// The exit ramp mostly speaks Mandarin: around 80% of a16z startups already run open Chinese models, and the leading one, DeepSeek, refuses or scrubs answers on Tiananmen Square and the Uyghurs.
The strategic posture is not to pick a side in an open-versus-closed religious war. It is to keep an exit ramp and own your data before the dependency hardens into something you cannot reverse.
// The Deep Dive
In 2015, a group of technologists started a research lab with a promise baked into its name. OpenAI would be open. Its founding charter pledged to freely collaborate and share its research with the world, a nonprofit counterweight to the closed labs at Google and Facebook. Four years later, it created a capped-profit arm, took a multibillion-dollar investment from Microsoft, and stopped publishing the details that mattered. By the time GPT-4 arrived, the company would not even disclose the model's size, citing competition and safety concerns. The most influential AI company on earth kept the word open in its name and almost nothing else.

I have spent my whole career on the other side of that story. I built businesses on Linux, Apache, MySQL, and the open web — a stack you could download, read, run on a cheap server, and own outright. When mobile arrived, the open option, Android, ended up on three of every four phones on earth. When the cloud arrived, it was open source nearly all the way down. Every boom I have lived through rewarded the people who built on a foundation they controlled. The vendor was a convenience, never a landlord.
This boom feels different, and it took me a while to name why. We are not buying tools this time. We are renting access to someone else's brain, one API call at a time, and feeding it our questions, our code, our customer records, and our strategy documents in the process. The thing we are most dependent on is the thing we understand least and control not at all. That is a new posture for our industry, and I do not think enough leaders have sat with what it costs.
The honest question is not whether open source AI is good enough to matter. It is. The question is whether we will notice the water heating before we have boiled away the two things that gave technology buyers their power for thirty years: optionality and ownership of our own data.
Every prior boom was built on things we owned
Start with the receipts, because the pattern is almost monotonous. Linux runs about 63% of the world's web-facing servers and every one of the top 500 supercomputers. Android, built on the Linux kernel, sits on roughly 72% of the world's phones. The modern enterprise codebase is an estimated 77% open source by volume, and 96% of commercial codebases contain open source components. The internet you are reading this on is, in a real sense, a public works project that private companies decorated and monetized.
This was not charity. Open source won because it was the better deal for the buyer. You got to inspect the code, so trust did not require faith. You got to fork it, so no vendor could hold your roadmap hostage. You got to run it yourself, so your costs scaled with your skill instead of someone else's pricing committee. The booms of the last thirty years were, underneath the marketing, a long transfer of leverage from sellers to builders. Open source was the mechanism.
The cultural memory of that transfer is why open source AI sounds reassuring. We assume the same gravity applies, that the open option will inevitably win because it always has. That assumption is exactly the kind of thing worth checking before you build a company on it.
So why does this one feel like rent?
Because the economics and the architecture both point in the opposite direction. The most capable models are delivered as a service you reach through an API. You do not possess the weights. You cannot read the training data. You cannot host the thing on your own hardware and walk away. When the provider changes its pricing, terms, or model behavior, your only options are to accept it or undertake a painful migration you probably did not architect for.
The usage data show how thoroughly the service model has won out. MIT Sloan researchers found that closed models account for roughly 80% of all model usage, even though open models reach about 90% of closed-model performance at release and cost users roughly six times less. The same analysis estimated that shifting demand toward open models could save the global AI economy about $25 billion a year. Buyers are knowingly paying a large premium for the convenience of renting. That is not what a market does when ownership is easy. It is what a market does when the seller has made ownership feel hard.
And the dependency runs deeper than a monthly invoice. Switching between frontier providers is not a configuration change. Prompts are tuned to a specific model's quirks, evaluation suites are calibrated to its outputs, and entire product behaviors are built around one vendor's strengths. Analysts now describe this as genuine vendor lock-in, the kind that makes the second year of a contract more expensive than the first by design.
The open alternative is real, and most of it speaks Mandarin
Here is the twist that complicates any tidy just-go-open advice. The open frontier is no longer a hobbyist backwater. DeepSeek disclosed in a peer-reviewed Nature paper that it trained its R1 reasoning model for $294,000, a rounding error next to American training budgets, and it shipped DeepSeek V4 as an open-weight model in April 2026. Stanford's HAI index reported that China has effectively erased the United States' lead in AI. The Economist found that roughly 80% of a16z's startups were running open Chinese models, a statistic Scott Galloway seized on in his 2026 predictions.
Galloway's framing is worth quoting because it captures the strategic stakes better than any benchmark:
This is how the correction/crash begins: AI dumping from China, US AI firms' growth slows, markets reprice the sector, which in turn reprices everything.
He argues that China can do to AI what it did to steel and solar — flood the world with capable, cheap, open models and crush the pricing power of the American incumbents. If you are a buyer, that sounds like salvation. Cheaper, ownable models breaking the rental cartel are exactly the open-source ending we are conditioned to expect.
But open and trustworthy are not the same word. The leading open models carry the values of the state that produced them. DeepSeek refuses, or self-censors, on Tiananmen Square, Taiwan, and the persecution of the Uyghurs, and a Promptfoo audit cataloged more than 1,100 questions on which it follows Beijing's line. You can run the weights yourself and strip some of that conditioning, but you are still building on a foundation engineered with a worldview baked in. The open exit ramp from American dependency may route straight through Beijing. That is a real choice, not a free lunch.
Meanwhile, the American open options were never fully open to begin with. The Open Source Initiative has stated plainly and repeatedly that Meta's Llama license is not open source, because it restricts commercial use by companies with more than 700 million monthly active users and withholds the training data. And Meta, after a disappointing Llama 4, has signaled a retreat from its open posture. The Western flag-bearer for open weights is quietly lowering it.
The boiling water is your data
The frog does not die from the temperature. It dies from failing to notice the change. In this boom, the slow change is data.
Every prompt is a disclosure. When your teams paste contracts, source code, patient notes, financial models, and roadmaps into a hosted model, you are continuously exporting your proprietary context to a third party as a condition of using the tool. The enterprise contracts say the right things — OpenAI and Anthropic both commit not to train on business API data by default — but defaults are not guarantees, and the ground keeps shifting. In 2025, Anthropic changed its consumer terms to retain chats for up to five years and use them for training unless users opted out, a sixtyfold increase in retention. OpenAI spent much of the year under a court order to preserve user data it had promised to delete. Policies you relied on can change with a blog post or a subpoena.
The deeper issue is that data is the one asset you cannot buy back. You can switch model providers. You can renegotiate a price. You cannot un-disclose the decade of institutional knowledge you have already streamed into someone else's system. If the relationship between your data and a handful of model providers becomes the substrate your business runs on, you have recreated the exact dependency that open source spent thirty years dismantling, except this time the lock-in is made of your own information.
Is it a bubble, a dependency, or both?
It can be both, and the two risks compound. The financial structure of this boom has started to worry sober analysts. GMO points out that AI revenue this year is estimated at under $50 billion against more than $1 trillion of committed investment, and that OpenAI lost roughly $8 billion on $12 billion of revenue, with losses forecast to widen for years. Bloomberg has mapped the web of circular deals in which Nvidia, OpenAI, Oracle, and the hyperscalers invest in and buy from each other, the kind of arrangement that inflates reported demand and, as UBS and others note, rhymes uncomfortably with the late-1990s telecom bubble.
For a buyer, the dependency and the bubble are the same problem wearing two hats. If you have bet your operations on a single frontier vendor and that vendor's economics break — because Chinese dumping crushes margins, because the capex never earns out, because the circular financing unwinds — you do not just lose a stock position. You lose a supplier you built your company around. Open source never had this failure mode. A community-maintained project does not get repriced to zero by the public markets and vanish overnight. A venture-funded model provider burning tens of billions a year can.
The bull case is real too, and I want to be fair to it. The technology works, the productivity gains are not imaginary, and as with the actual dot-com bust, the infrastructure built during the mania will outlast the companies that overspent. JPMorgan's view is that the better question is not whether the deals resemble 1999 but whether the fundamentals do, and the fundamentals are stronger than they were for pets.com. But "the technology survives" and "your chosen vendor survives on terms you can afford" are very different promises, and only one of them is yours to control.
Common Missteps
Misstep 1: Mistaking open weights for open source. Downloading Llama or DeepSeek and calling your stack open is comforting and mostly wrong. As the OSI keeps pointing out, these are open-weight models with restrictive licenses and undisclosed training data, not open source in the sense that won the last three booms. You get some portability, but you do not get the transparency or the legal freedom you are imagining, and you should plan accordingly.
Misstep 2: Treating the API as risk-free because the contract says so. Default terms are favorable today, which lulls teams into streaming sensitive data without governance. But retention policies change, court orders override promises, and consumer-grade tools with weaker terms creep into the org through the side door. Assume any data you send out could one day be retained longer or used differently than the version of the policy you signed up under.
Misstep 3: Going all-in on a single frontier vendor. Standardizing on one provider is the fastest way to ship, and the fastest way to build a dependency you cannot unwind. When that vendor faces a repricing event — a competitive shock from cheap Chinese models, a funding crunch, a strategy pivot — you inherit its problem with no exit ramp. The convenience of a single throat to choke becomes a single point of failure.
Misstep 4: Equating open with safe or neutral. Open weights solve lock-in. They do not solve governance, values, or geopolitics. The leading open models are Chinese and ship with censorship baked in, and the Western open option is run by a company now backing away from openness. Choosing open is a real strategy, but it trades one set of dependencies for another, and pretending otherwise leads to ugly surprises.
// Key Takeaways
1. Build behind a model gateway, not a model. Put an abstraction layer between your applications and any provider so swapping models is a config change, not a rewrite. The goal is to make your most important AI decision, which model, reversible, because in this market it will need reversing.
2. Run at least one open model in production today. Even routing a fraction of your traffic to an open-weight model preserves the institutional muscle and the pricing leverage you will want when negotiations get hard. MIT Sloan's numbers say you are likely leaving real money and optionality on the table by going all-closed by default.
3. Treat your data as an asset you cannot get back. Own your retrieval layer, your vector stores, and your retention policy, and route sensitive workloads to deployments with contractual or self-hosted guarantees. You can replace a vendor in a quarter, but you can never un-disclose what you have already sent.
4. Put the dependency on the budget line, not just the token price. Score every AI vendor decision on switching cost, financial durability, and data exposure alongside capability and price. The cheapest token from the most fragile supplier is not a bargain if the supplier reprices or disappears.
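For the engineers in the room, the first two takeaways can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a production gateway: the `ModelGateway` and `Route` names, the two stand-in provider functions, and the 80/20 traffic split are all hypothetical. The point it shows is that applications call one interface, a config entry decides which model answers, and sensitive work is pinned to a deployment you control.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Any callable that turns a prompt into a completion can be a provider.
Provider = Callable[[str], str]


@dataclass(frozen=True)
class Route:
    provider: str  # key into the provider registry
    weight: float  # relative share of non-sensitive traffic


class ModelGateway:
    """Hypothetical gateway: apps call complete(); config decides the model."""

    def __init__(self, providers: Dict[str, Provider], routes: List[Route],
                 sensitive_provider: str,
                 rng: Optional[random.Random] = None) -> None:
        referenced = [r.provider for r in routes] + [sensitive_provider]
        missing = sorted({name for name in referenced if name not in providers})
        if missing:
            raise ValueError(f"unknown providers in config: {missing}")
        self.providers = providers
        self.routes = routes
        self.sensitive_provider = sensitive_provider
        self.rng = rng if rng is not None else random.Random()

    def pick(self, sensitive: bool = False) -> str:
        """Choose a provider: pinned for sensitive work, weighted otherwise."""
        if sensitive:
            return self.sensitive_provider
        names = [r.provider for r in self.routes]
        weights = [r.weight for r in self.routes]
        return self.rng.choices(names, weights=weights, k=1)[0]

    def complete(self, prompt: str, sensitive: bool = False) -> str:
        return self.providers[self.pick(sensitive)](prompt)


# Stand-ins for real clients. A real adapter would wrap a vendor SDK or a
# self-hosted inference server behind the same prompt -> text signature.
def hosted_closed_model(prompt: str) -> str:
    return f"[hosted-closed] {prompt}"


def self_hosted_open_model(prompt: str) -> str:
    return f"[self-hosted-open] {prompt}"


gateway = ModelGateway(
    providers={"closed": hosted_closed_model, "open": self_hosted_open_model},
    routes=[Route("closed", 0.8), Route("open", 0.2)],  # illustrative split
    sensitive_provider="open",
)

print(gateway.complete("Summarize this press release."))
print(gateway.complete("Review this draft contract.", sensitive=True))
```

Swapping vendors in this sketch means editing the `routes` list, not the applications. What it leaves out is the hard part: prompts and evaluation suites tuned to one model still have to be re-validated against the next one.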
The playbook you can take into a budget meeting is short. Adopt a gateway this quarter. Stand up one open model behind it and send it real traffic. Inventory exactly what data is leaving your walls and through which terms, and move the sensitive workloads to deployments you control. Add a line to every AI vendor review for switching cost and supplier risk, and refuse to approve any architecture that has no exit ramp. None of this requires betting against AI. It requires refusing to bet your company on a single landlord.
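The vendor-review line can be as plain as a scorecard. The sketch below is one hypothetical way to do it: the five criteria come from the fourth takeaway, but the 1-to-5 scale, the equal weights, the exit-ramp floor, and the two example vendors are invented for illustration and are not real assessments of anyone.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VendorScore:
    """Scores run from 1 (worst for the buyer) to 5 (best for the buyer)."""
    name: str
    capability: int            # 5 = strongest model for your workload
    price: int                 # 5 = cheapest per unit of work
    switching_cost: int        # 5 = easiest to leave
    financial_durability: int  # 5 = least likely to reprice or disappear
    data_exposure: int         # 5 = data stays under your control


# Equal weights are an assumption; tune them to your own risk appetite.
WEIGHTS: Dict[str, float] = {
    "capability": 1.0,
    "price": 1.0,
    "switching_cost": 1.0,
    "financial_durability": 1.0,
    "data_exposure": 1.0,
}


def weighted_score(vendor: VendorScore,
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the criteria, on the same 1-to-5 scale."""
    total = sum(weights.values())
    return sum(getattr(vendor, key) * w for key, w in weights.items()) / total


def has_exit_ramp(vendor: VendorScore, floor: int = 2) -> bool:
    """Reject any architecture whose switching score falls below the floor."""
    return vendor.switching_cost >= floor


# Made-up entries, purely to show the mechanics.
candidates = [
    VendorScore("vendor_a", capability=5, price=2, switching_cost=1,
                financial_durability=2, data_exposure=2),
    VendorScore("vendor_b", capability=4, price=4, switching_cost=4,
                financial_durability=3, data_exposure=5),
]

for vendor in candidates:
    verdict = "ok" if has_exit_ramp(vendor) else "no exit ramp"
    print(f"{vendor.name}: {weighted_score(vendor):.2f} ({verdict})")
```

The exit-ramp check is deliberately separate from the average, so a vendor cannot buy its way past a lock-in problem with a high capability score.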
We have done dependency before and clawed our way out with open source, painfully, over many years. The difference this time is that the thing we are handing over is not our infrastructure but our information, and information does not come back. The water is comfortable right now. That is exactly the moment to find the edge of the pot.

