Thoughts on DeepSeek

In summary, DeepSeek poses little threat to major tech companies but represents a significant challenge for ordinary people and knowledge workers.

Yann LeCun, Meta's Chief AI Scientist, described DeepSeek as:

"A victory for the open-source community."

The Inevitability of DeepSeek's Emergence

Why do I emphasize Yann LeCun's statement?

As an open-source foundational model, LLaMA not only gave birth to DeepSeek but also inspired other specialized models like Alibaba Cloud's Qwen and MediaTek's Breeze. This aligns perfectly with Meta's open-source strategy: leveraging global innovation to refine models while incorporating feedback to enhance their own large model development.

For Meta, technological breakthroughs are inevitable; the only uncertainty lies in which team will achieve them and when. Today, it might be DeepSeek; tomorrow, it could be MediaTek.

The Serendipity of DeepSeek's Success

Interestingly, DeepSeek initially focused on cryptocurrency mining and quantitative trading. They claim that the V3 model was essentially a side project. While some dismiss this claim, I personally agree with it. As mentioned earlier, companies fine-tuning LLaMA models are not primarily AI-focused but are experimenting with new methods through this open-source framework. DeepSeek, like many AI labs, stumbled upon an efficient solution by chance.

Here’s my speculation: early mining teams likely accumulated deep expertise in squeezing performance out of GPU clusters. Reports suggest they even wrote PTX (Nvidia’s low-level GPU instruction set, a layer below CUDA C++) to boost performance, not to mention mastering mixed-precision training, MoE (Mixture of Experts), and multi-head latent attention. Their results are truly impressive.
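To make the MoE idea concrete, here is a minimal sketch of top-k expert routing in plain Python. The function names and toy experts are my own illustration, not DeepSeek’s implementation; in a real model the gate scores come from a learned layer and each expert is a feed-forward network:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, gate_scores, top_k=2):
    """Route one token through the top-k experts picked by the gate.

    `experts` is a list of callables standing in for expert networks;
    `gate_scores` stands in for the output of a learned gating layer.
    Only the selected experts actually run, which is where MoE saves compute.
    """
    probs = softmax(gate_scores)
    # Keep the k highest-probability experts and renormalize over them.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](token) for i in top)
```

With `top_k=1` and a gate that overwhelmingly favors one expert, only that expert’s output is returned; larger `top_k` blends the chosen experts’ outputs by their renormalized gate weights.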

Why would a quantitative trading company delve into AI models? While their exact motivations are unclear, if an AI could analyze global markets and identify profitable opportunities, it’s no surprise that resourceful companies would pursue such developments.

The Impact of DeepSeek

DeepSeek has disrupted pricing significantly, challenging the business model of AI companies that rely solely on large model APIs for revenue. If DeepSeek's approach proves effective, major players could adopt it to develop smaller, domain-specific models—such as customer service, legal advisory, or single-specialty medical consultation models—reducing API costs without relying on massive, all-purpose models.

Recall OpenAI’s “12 Days of OpenAI” event last December, where they previewed reinforcement fine-tuning, which lets users adapt a model with only a handful of graded examples. I had a hunch that 2025 would be the year of low-cost, domain-specific fine-tuning, and DeepSeek has accelerated this trend by drastically lowering the barriers to model training.
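Reinforcement fine-tuning can be pictured as a loop in which a grader scores sampled outputs and the model is nudged toward high-scoring answers. The toy below is my own sketch using a REINFORCE update on a categorical “model” over fixed candidate answers; it illustrates the idea only and is not OpenAI’s actual mechanism:

```python
import math
import random

def reinforcement_finetune(candidates, grader, steps=500, lr=0.5, seed=0):
    """Toy RFT loop: sample an answer, let the grader score it, and push
    the policy's logits toward answers that beat a running reward baseline.
    Returns the final probability distribution over `candidates`.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # running average reward, reduces update variance
    for _ in range(steps):
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = grader(candidates[i])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE gradient of log p(i) w.r.t. logit j: [j == i] - probs[j]
        for j in range(len(logits)):
            logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With a grader that rewards only the correct answer, probability mass concentrates on that answer after a few hundred steps, even though the grader labels only the samples it happens to see.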

Now, the question arises: Is it better to train a model from scratch or fine-tune an existing one?

The Good News Ends Here; Here’s the Bad News...

DeepSeek’s experiments show that, beyond simply riding the scaling laws, there are numerous ways to enhance model performance and reduce training costs. Some worry this might erode the advantages of the major AI companies. I believe the opposite: it is a boon for them, since their computational resources would let them replicate DeepSeek’s entire pipeline within days and spin up small teams to branch into new domains. Even mid-sized companies can build or rent enough compute to create reasoning-capable AI models by following DeepSeek R1’s playbook (as some student teams have already done).

Why is this bad news?

As the barriers to training and deployment plummet, more companies, teams, and individuals will dive into developing niche models. Professionals may soon find their expertise being systematically "cracked" by AI. For businesses, the pressure to adopt AI internally will intensify; this is a forced upgrade for everyone, and early adopters will gain a competitive edge.

In short, AI will begin replacing certain jobs.

Beyond job displacement, there’s an even darker side: DeepSeek R1’s alignment capabilities are notably weak. With some background knowledge, it’s relatively easy to bypass its restrictions, making it a potential tool for malicious activities—think advanced fraud or even generating harmful content. DeepSeek is like an uncontrollable wildfire: it has the potential to create wonders but could also cause widespread devastation.

Moving forward, we must remain vigilant and critically evaluate the information we consume.

Conclusion

While some aspects are concerning, DeepSeek accelerates humanity’s journey toward AGI (Artificial General Intelligence). If AGI is inevitable, the challenges it brings must be confronted sooner or later. The democratization of technology is a double-edged sword: it fosters innovation but can also exacerbate societal divides. As we march toward AGI, mitigating AI’s negative impacts will no longer be just an academic warning but a collective responsibility.

This is a golden era of discovery and creation. We are all witnesses and participants in this transformative journey. I look forward to more groundbreaking breakthroughs and hope we can navigate this path wisely.

Finally, I’ll close with an image of Nvidia’s Project DIGITS. Given everything discussed, I believe this product will become standard for every company: a tool for every task.

James Huang, February 3, 2025