Thoughts on DeepSeek

TL;DR: DeepSeek, an open-source AI model, poses little threat to the tech giants but promises significant disruption for individuals and knowledge workers. It exemplifies the power of democratized tech innovation, challenging existing AI business models and accelerating the shift toward smaller, domain-specific AI applications. However, this democratization also brings risks such as job displacement and potential misuse.

Introduction

In the evolving landscape of artificial intelligence, DeepSeek emerges as a notable player, not by threatening major tech companies, but by reshaping the technological landscape for individuals and knowledge workers. According to Yann LeCun, Meta's Chief AI Scientist, DeepSeek is "a victory for the open-source community." This development highlights the unique capabilities and challenges that open-source AI models present in today's world.

The Inevitability of DeepSeek's Emergence

Why focus on Yann LeCun's statement? It underscores the strategic foresight embedded in Meta's open-source approach. LLaMA, a foundational model, paved the way for DeepSeek and inspired other specialized models like Alibaba Cloud's Qwen and MediaTek's Breeze. Meta's strategy leverages global innovation to refine and enhance their AI models through community feedback.

For Meta, the technological breakthroughs brought by models like DeepSeek were anticipated; the uncertainty lay in which team would achieve them first. Today, it's DeepSeek; tomorrow, another entity might take the lead.

The Serendipity of DeepSeek's Success

Remarkably, DeepSeek initially targeted cryptocurrency mining and quantitative trading. Their V3 model reportedly began as a side project—a claim that, though questioned by some, resonates with me. Companies fine-tuning LLaMA models often explore AI as a new frontier, experimenting with methods within this open-source framework.

Early mining teams amassed deep technical expertise in optimizing GPU cluster computing. Reports indicate they even programmed in PTX, Nvidia's low-level GPU assembly language that sits beneath CUDA C++, to squeeze out maximum performance. It's no wonder their achievements are impressive.

But why would a quantitative trading company venture into AI modeling? The motivation could lie in the potential for AI to analyze global markets and identify lucrative opportunities—an attractive prospect for resourceful organizations.

The Impact of DeepSeek

DeepSeek significantly disrupts pricing structures, challenging AI companies that rely on large-model APIs for revenue. If its low-cost training recipe proves out, it lets major players develop smaller, specialized models for areas like customer service, legal advice, or medical consultation, cutting API costs without depending on vast, generalized models.

Not long ago, OpenAI introduced reinforcement fine-tuning during its "12 Days of OpenAI" event, letting developers fine-tune models with only a handful of examples. I anticipated 2025 as the year for cost-effective, domain-specific fine-tuning. However, DeepSeek has accelerated this trajectory, lowering the barriers to model training.

This raises an important question: Is it better to train a model from scratch or fine-tune an existing one?
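A rough back-of-envelope calculation shows why this question matters so much. A widely used approximation puts training compute at about 6 FLOPs per parameter per token, i.e. roughly 6 × N × D total FLOPs for N parameters trained on D tokens. The sketch below uses illustrative numbers only (not DeepSeek's actual figures) to compare pretraining from scratch with fine-tuning an existing model:

```python
def training_flops(params: float, tokens: float) -> float:
    """Standard back-of-envelope estimate: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

# Illustrative numbers: pretraining a 7B-parameter model on 2T tokens
# versus fine-tuning the same model on 1B domain-specific tokens.
pretrain = training_flops(7e9, 2e12)
finetune = training_flops(7e9, 1e9)

print(f"pretraining : {pretrain:.2e} FLOPs")
print(f"fine-tuning : {finetune:.2e} FLOPs")
print(f"ratio       : {pretrain / finetune:.0f}x")
```

Under these assumed numbers, fine-tuning costs roughly three orders of magnitude less compute than pretraining, which is why cheaper training recipes shift the calculus so sharply toward small, domain-specific models.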

The Good News Ends Here; Here’s the Bad News…

DeepSeek’s experiments reveal numerous methods to boost model performance and cut training costs beyond traditional Scaling Laws. While some fear this might undermine major AI companies' advantages, I see it as beneficial. Companies with substantial computational resources can replicate DeepSeek’s models in days, enabling small teams to enter new domains. Even mid-sized firms can harness or rent the computing power necessary to create advanced AI models, as some student teams have already demonstrated.

Why is this bad news?

As training and deployment barriers fall, more entities will develop niche models, potentially "cracking" professional expertise with AI. Businesses will feel increased pressure to adopt AI internally—a necessary upgrade for all. Early adopters will gain a competitive edge.

In short, AI will begin replacing certain jobs.

Beyond job displacement, there's a darker aspect: DeepSeek R1's safety alignment is weak. With a little know-how, its guardrails are easy to bypass, making it a potential tool for malicious use such as sophisticated fraud or harmful content generation. DeepSeek is akin to an uncontrolled wildfire: it carries the potential for innovation, but also for widespread harm.

Conclusion

Despite the challenges, DeepSeek pushes humanity closer to AGI (Artificial General Intelligence). If AGI is inevitable, we must face its challenges sooner rather than later. While democratized technology spurs innovation, it can also deepen societal divides. As we advance toward AGI, mitigating AI’s adverse impacts will transition from academic caution to a shared responsibility.

We are living in a golden era of discovery and creation. As a witness and participant, I eagerly anticipate more groundbreaking advancements and hope we can navigate this transformative path wisely.

On a final note, given the trends discussed above, I believe Nvidia's Project Digits will become a standard across companies—a versatile tool for diverse tasks.

James Huang, 4 February 2025