What's Novel About DeepSeek's Approach?
It's technical, but at a high level, here's the approach the team at DeepSeek took – each point is illustrated with a rough, hypothetical code sketch after the list:
- DeepSeek used 8-bit floating point (FP8) numbers rather than 32-bit (FP32) floating point numbers. This isn't as precise, but it is far more efficient with computational resources.
- The model uses multi-token prediction rather than predicting a single token at a time. Tokens are the model's outputs (i.e., the pieces of the answer to a prompt). Predicting multiple tokens per step roughly doubles inference efficiency while remaining almost as accurate as single-token prediction.
- They used a mixture-of-experts (MoE) architecture with some innovation around load balancing. This allows them to have a massive model… but only a small portion of that model is active at any time, and which portion is active depends on the task being solved. Doing so reduces computational resources and improves efficiency.
- DeepSeek uses a form of reinforcement learning called Group Relative Policy Optimization (GRPO) to improve reasoning capabilities. GRPO scores a group of candidate responses against one another and reinforces the better ones, which greatly reduces computational overhead compared to the prevailing critic-model approach.
- Compression of the key-value (KV) cache – the attention keys and values the model stores for each token – results in compression ratios above 90% and dramatically lower memory requirements.
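To make these ideas concrete, here are a few rough sketches. First, the precision trade-off: a minimal illustration (assuming PyTorch 2.1+, which ships an FP8 dtype; the matrix sizes are arbitrary) of how 8-bit floats cut memory fourfold at the cost of some rounding error.

```python
# Sketch: FP32 vs. FP8 storage for the same weights (PyTorch assumed).
import torch

w32 = torch.randn(1024, 1024, dtype=torch.float32)  # 4 bytes per weight
w8 = w32.to(torch.float8_e4m3fn)                     # 1 byte per weight

print(w32.nelement() * w32.element_size())  # 4,194,304 bytes
print(w8.nelement() * w8.element_size())    # 1,048,576 bytes -- 4x smaller

# The cost: FP8 can't represent every FP32 value, so rounding error creeps in.
err = (w32 - w8.to(torch.float32)).abs().mean().item()
print(f"mean absolute rounding error: {err:.6f}")
```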
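Next, multi-token prediction. This toy decoding loop shows why emitting k tokens per forward pass cuts the number of passes roughly by k; the `model` callable is a hypothetical stand-in, not DeepSeek's actual interface.

```python
# Sketch: a decoding loop that asks the model for k tokens per forward pass.
def generate(model, prompt, max_new_tokens, k=2):
    tokens, steps = list(prompt), 0
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(model(tokens, num_predict=k))  # k tokens per pass
        steps += 1
    return tokens, steps

def dummy_model(tokens, num_predict):
    return [0] * num_predict  # placeholder: always predicts token id 0

_, single = generate(dummy_model, [1, 2, 3], 8, k=1)  # 8 forward passes
_, multi = generate(dummy_model, [1, 2, 3], 8, k=2)   # 4 forward passes
print(single, multi)  # halving the passes is the "doubled efficiency"
```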
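Third, a bare-bones mixture-of-experts layer (PyTorch again; all sizes are illustrative, and DeepSeek's real design layers its load-balancing innovation on top of something far larger). The router sends each token to only its top-k experts, so most of the model's parameters sit idle on any given input.

```python
# Sketch: top-k expert routing -- only 2 of 8 experts run per token.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):  # each token runs through its top_k experts only
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

y = TinyMoE()(torch.randn(4, 64))  # 4 tokens; 2 of 8 experts active per token
```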
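Fourth, the heart of GRPO: score a group of sampled responses against each other and use the group's own statistics as the baseline. The rewards below are made-up numbers for one prompt; the point is that no separate learned critic model is needed.

```python
# Sketch: group-relative advantages, the core idea of GRPO (illustrative rewards).
import statistics

rewards = [0.2, 0.9, 0.4, 0.7]  # scores for 4 sampled responses to one prompt
mu = statistics.mean(rewards)
sigma = statistics.pstdev(rewards) or 1.0  # guard against a zero spread

advantages = [(r - mu) / sigma for r in rewards]
print(advantages)
# Above-average responses get positive advantages and are reinforced;
# below-average ones are discouraged. The group itself is the baseline,
# so no separate critic (value) network has to be trained or run.
```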
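Finally, the KV-cache compression. DeepSeek's actual mechanism is called multi-head latent attention; this is a simplified low-rank illustration of where the storage saving comes from, with made-up sizes.

```python
# Sketch: cache a small latent per token instead of full keys and values.
import torch
import torch.nn as nn

dim, latent = 4096, 512                     # illustrative sizes
down = nn.Linear(dim, latent, bias=False)   # compress the per-token KV state
up_k = nn.Linear(latent, dim, bias=False)   # rebuild keys when needed
up_v = nn.Linear(latent, dim, bias=False)   # rebuild values when needed

h = torch.randn(1, dim)                     # one token's hidden state
cached = down(h)                            # store 512 floats, not 2 x 4096
k, v = up_k(cached), up_v(cached)

print(cached.nelement(), 2 * dim)           # 512 vs. 8,192 floats per token --
                                            # a ~94% cache-size reduction
```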
It's not necessary to understand all of the technical details of how DeepSeek accomplished this feat for so little. What's key to understand is the result – DeepSeek is about 45 times more efficient than OpenAI o1.
And this means that it is dramatically cheaper to use:
| | OpenAI o1 | DeepSeek-R1 |
| --- | --- | --- |
| Price per 1M input tokens | $15.00 | $0.14 |
| Price per 1M cached tokens | $7.50 | $0.55 |
| Price per 1M output tokens | $60.00 | $2.19 |
Tokens are small chunks of text – a word, part of a word, a piece of software code, and so on – used as both the inputs and outputs of large language models. Measured by these prices, DeepSeek is more than 95% cheaper than OpenAI's o1.
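As a back-of-the-envelope check using the published prices above, here's the cost of a hypothetical job that consumes one million input tokens and produces one million output tokens (ignoring caching).

```python
# Sketch: cost comparison at the posted per-million-token prices.
o1_cost = 15.00 + 60.00   # $75.00 for 1M in + 1M out on OpenAI o1
r1_cost = 0.14 + 2.19     # $2.33 for the same job on DeepSeek-R1

print(f"o1: ${o1_cost:.2f}, R1: ${r1_cost:.2f}")
print(f"savings: {1 - r1_cost / o1_cost:.1%}")  # about 96.9% cheaper
```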
And that's not all. The software has been open-sourced.
This raises the question again: Why are companies spending hundreds of billions of dollars to build these foundational models… when the software is open-sourced and can be built for a few million?
And this is where there are some large caveats that the market is missing entirely.
The Story No One Is Telling You About DeepSeek
As I've been tracking the finance and tech community's response to this news over the weekend, here's the part of the story that I haven't seen communicated…
- DeepSeek V3 may have only cost around $5.5 million to build and train – and it is more efficient to run – but they absolutely spent a lot more money developing earlier versions as they ramped up the technology. I suspect they spent tens of millions of dollars getting to V3 – none of which has been mentioned. Still, yes, it's a lot less than hundreds of millions. But it's not as little as is being proclaimed.
- DeepSeek also benefited from other openly released models, such as Alibaba's Qwen 2.5 and Meta's Llama 3, which were used in parts of the development of DeepSeek's models.
- DeepSeek is not actually open-source. They have released the model's weights so that developers can work with DeepSeek, but they have not released any details about the training data or the training code. Said another way, there is no transparency.
Many were also quick to test DeepSeek-R1 for any influence by the Chinese Communist Party (CCP). After all, any meddling here could imply a far greater concern.
Sadly, when the model is queried about Tiananmen Square, China's human rights abuses against the Uyghurs in Xinjiang, or China's efforts to erase Tibetan culture, we simply won't get an answer.
DeepSeek shows the same kind of political bias that both Meta's and Google's models have demonstrated.
There's also an important point about the model's accuracy that we'll have to save for another day.
The point is that sacrifices were made in the name of efficiency for DeepSeek… and many of those sacrifices had to do with the accuracy of the outputs.
We haven't yet seen how DeepSeek performs on the most difficult AI benchmarks – the most critical piece on the road to building an AGI.
But none of this has stopped people from downloading DeepSeek – the AI Assistant smartphone app. In a matter of days, DeepSeek overtook ChatGPT on the App Store.
This all happened basically overnight. And I can't help but find the timing suspicious.
Is It a Coincidence?
TikTok has not only been a tool for China to conduct massive psyops on Western civilization; it is also the single largest intelligence-gathering tool of the CCP, collecting data from U.S. consumers' phones and sending that information back to Beijing.
While President Trump has paused the ban for 75 days to pursue a resolution that can protect U.S. national security, the working assumption of the Chinese government has been that TikTok would be banned.
Is it that much of a leap to think that the CCP funded the development of a new, wildly popular app that would be downloaded as a new data-surveillance tool to replace TikTok? While I'm just connecting the dots and speculating, this would make perfect sense to me.
And China – which has been way behind in the development of artificial intelligence – wants two things to happen:
- It wants to catch up, even if it means the theft of Western intellectual property and software models, a well-known practice with a long history
- It wants to find a way to slow down U.S. development in artificial intelligence
Since we don't have access to DeepSeek's training data or its software code – despite the model being called "open source" – it's likely that DeepSeek wasn't just the result of great ingenuity.
And what better way to disrupt the capital formation in Silicon Valley around artificial intelligence than to present a model that was "built for just $5.5 million," suggesting that Project Stargate and all the efforts of Amazon, OpenAI, Microsoft, xAI, Anthropic, Apple, Perplexity, and so many others are the "wrong," wasteful approach?
China just threw a curveball at Silicon Valley and its backers, wanting them to hit "pause" and question their approach to AI.
But let's think about that for a moment…
Will it work? Will they slow down? Will they stop buying NVIDIA and AMD semiconductors?
Will the U.S. pivot… reduce investment… and instead adopt a more frugal, DeepSeek-style approach to research and development?
I think not.
The sheer threat that a small Chinese company might catch up to the best-in-class models out of the U.S. will only light a fire under the U.S. government and Big Tech to lean in even more. It is a matter of national security. And the more legitimate a threat becomes, the more focus the technology will receive.
Whether the radical curveball came from DeepSeek or some other company… whether it's something to fear or not…
This is a no-holds-barred race for AI supremacy. Anything goes. This technology will not only unleash the greatest productivity boom in history… it can also be used for both offensive and defensive capabilities against adversaries.
Never before has so much focus been on a single technology. With so much mindshare, capital, and technical talent concentrated here, it's inevitable that we'll see spikes of innovation and big leaps forward.
It's all evolving quickly, in real time.
And we shouldn't forget that in the weeks ahead, we'll be seeing the latest releases from Anthropic, xAI, Google, and OpenAI, at a minimum.
As a reminder, the end game isn't just a fabulous AI assistant (with agentic AI). It's artificial general intelligence.
And the more capable AI becomes, the more utilization will skyrocket. The focus will soon shift from training to inference, which will benefit AMD, Cerebras, and Groq even more due to their unique semiconductor architectures.
The growth in AI adoption will, by far, outweigh any efficiency gains in training and running large language models.
It's not like we haven't seen this story before.
Computing used to be centered around mainframe computers. And as semiconductors got more powerful and software code improved, computers not only got smaller, they got a lot cheaper – cheap enough to have several in every home and one in every hand (smartphones).
Did that shrink the market for computing… or grow it?
It grew it… exponentially.
And that's precisely what's going to happen with AI.
Jeff
P.S. I've put my boots on the excavators clearing land for the next hyperscale data centers across the country. I've seen the chips, tested the software, and been researching this space closely for years now.
The AI buildout is a trend I've been tracking, and one we're heavily invested in. Do I believe this is a buying opportunity for many of the stocks getting hammered right now? Absolutely.
For more as we track DeepSeek in the days and weeks to come, I encourage all my readers to join our flagship investment advisory, The Near Future Report.