DeepSeek: How 10,000 GPUs and a Quant Trader Sparked an AI Revolution
Jan 26, 2025
Let’s talk about DeepSeek, the open-source AI model that’s been quietly reshaping the landscape of generative AI. If you’ve been following the chatter on social media, you’ve probably seen its name popping up more and more. But what’s the story behind it? How did it go from a quant trader’s passion project to one of the most talked-about models in the AI space?
Let’s dive in.
DeepSeek Origins: A Quant Trader’s Obsession
DeepSeek was founded in 2023 by Liang Wenfeng, a Zhejiang University alum (fun fact: he attended the same university as our CEO and co-founder Sean @xiangrenNLP, before Sean went on to Stanford and USC!). Liang’s background in quantitative trading at High-Flyer gave him a unique perspective on AI’s potential. Long before the generative AI boom, he was stockpiling 10,000+ NVIDIA A100 GPUs—yes, you read that right. By 2021, he had already built a compute infrastructure that would make most AI labs jealous!
His mission? To pioneer AGI (Artificial General Intelligence) through algorithmic innovation, not brute-force compute. This focus on efficiency became a necessity due to US chip export restrictions, but it also set DeepSeek apart from the start.
DeepSeek Model Evolution: From V1 to R1
DeepSeek’s journey began with DeepSeek-V1/V2, which introduced novel architectures like Multi-head Latent Attention (MLA) and DeepSeekMoE. These innovations reduced compute costs while improving inference efficiency, laying the groundwork for what was to come.
Then came DeepSeek-V3 in December 2024—a 671B-parameter MoE model (with 37B active parameters per token) trained on 14.8 trillion tokens. V3 achieved GPT-4-level performance with 1/11th the activated parameters of Llama 3.1-405B, at a total training cost of $5.6M. Key innovations like auxiliary-loss-free load balancing for the MoE layers, multi-token prediction (MTP), and an FP8 mixed-precision training framework made it a standout.
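To make the “active parameters” idea concrete, here is a toy sketch of top-k expert routing in a Mixture-of-Experts layer. This is not DeepSeek’s actual code, and the expert count, top-k value, and hidden size are made-up illustration values: the point is simply that all experts’ weights exist in the model, but each token is routed to only a few of them, so only a small slice of the total parameters does work per token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: many experts exist, but each token uses only a few.
NUM_EXPERTS = 16   # illustrative; real models use far more experts
TOP_K = 2          # experts activated per token
D_MODEL = 8        # hidden size (tiny for the sketch)

# Each "expert" is just a small feed-forward weight matrix here.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token's hidden state through its top-k experts."""
    scores = x @ router_w                   # affinity of this token to each expert
    top_k = np.argsort(scores)[-TOP_K:]     # indices of the k best-matching experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()                # normalize gate weights over chosen experts
    # Only TOP_K of NUM_EXPERTS weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k))

token = rng.normal(size=D_MODEL)
print(moe_forward(token).shape)             # (8,) -- same hidden size, ~k/N of the params used
```

Load balancing, which DeepSeek-V3 handles without an auxiliary loss term, is about keeping tokens spread evenly across experts so no single expert becomes a bottleneck.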
But the real game-changer was DeepSeek-R1 in January 2025. This 671B-parameter reasoning specialist excels in math, code, and logic tasks, trained with reinforcement learning (RL) and minimal labeled data. It’s open-sourced under the MIT license and matches or beats OpenAI’s o1 on benchmarks like AIME 2024 (79.8% vs. 79.2%).
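Because the weights are openly released, you can get hands-on with them. Below is a minimal sketch of chatting with one of the smaller distilled checkpoints published alongside R1 using the Hugging Face transformers library; the exact model ID and generation settings are our illustrative assumptions (check the GitHub repo linked below for the official list), and the full 671B model needs a multi-GPU serving stack rather than a single-process pipeline like this.

```python
# A minimal sketch of chatting with a distilled R1 checkpoint via
# Hugging Face transformers. The model ID and settings are illustrative
# assumptions, not an official recipe. Requires transformers + accelerate.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # small distilled variant (assumed ID)
    device_map="auto",
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
result = generator(messages, max_new_tokens=512)
print(result[0]["generated_text"])
```

The distilled checkpoints exist precisely so you can experiment with R1-style reasoning without 671B-scale hardware.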
DeepSeek Team: Young, Bold, and Resourceful
DeepSeek’s core team is a powerhouse of young talent, fresh out of top universities in China. The culture? Think OpenAI’s early days: flat hierarchy, resource freedom (anyone can request GPU clusters), and a focus on curiosity-driven research. It’s no wonder they’ve been able to iterate so quickly and effectively.
Their Impact on Today’s AI Ecosystem
DeepSeek has proven that high performance doesn’t require exorbitant compute. V3’s ~$5.6M training cost is a fraction of GPT-4o’s ~$100M, and R1’s open-source release has democratized access to state-of-the-art AI. This has put significant pressure on closed-source rivals, making DeepSeek a leader in the open-source AI movement.
The results speak for themselves: DeepSeek-R1 is ranked #4 on Chatbot Arena (as of January 2025), and it and DeepSeek-V3 are the only open-source models in the top 10!
Dive Deeper into DeepSeek
For the technically inclined, here are some resources to explore:
DeepSeek-V3 Technical Report: arxiv.org/abs/2412.19437v1
DeepSeek-R1 GitHub Repo & Paper: github.com/deepseek-ai/DeepSeek-R1
Founder’s Philosophy (Interviews on LessWrong): lesswrong.com/posts/kANyEjDDFWkhSKbcK
DeepSeek and Sahara AI
We recognized DeepSeek's potential early in 2024 and made it a core part of our work. This quarter, R1 will be one of the flagship models in our AI Studio launch, alongside other leading models.
We can’t wait to show you everything we’re building. Join our Developer Early Access Program here to be one of the first to try out our upcoming AI development platform: https://hi.saharalabs.ai/dev-early-access