AI News

Chinese open-source DeepSeek trained top-notch AI with $294,000, far less than what has been reported for its American competitors

Open-source dark horse: DeepSeek rewrites the rules of the global AI technology race

Beijing/San Francisco – An “earthquake” over the future of artificial intelligence is coming from the East, and its epicenter is a Chinese company called DeepSeek. A recent paper in the authoritative scientific journal Nature disclosed an astonishing figure: DeepSeek trained its advanced reasoning model R1 at a cost of only $294,000. Like a depth charge dropped into a calm lake, this figure not only looks vanishingly small next to the hundreds of millions of dollars reportedly spent by American rivals such as OpenAI and Google, but has also sparked deep reflection across the global tech industry about efficiency, innovation, and who will ultimately dominate the field.

This is not merely a story about costs. It is a story about ingenuity, strategy and resilience. In a field dominated by giants, where capital and top-tier chips seem to decide everything, this “dark horse” from China is challenging Silicon Valley’s established “law of power” with a disruptive posture. Its rise not only prompted Microsoft CEO Satya Nadella to call it a new “industry benchmark”, but also forces all of us to rethink: is the road to the “Rome” of artificial intelligence really paved only with trillions in capital?

The story of DeepSeek offers a valuable perspective: innovation does not always equal massive investment, and the power of openness and collaboration can sometimes unleash more vitality than a closed “arms race”.

The “cost myth” and the technological fog

In Silicon Valley, $294,000 might only cover the salaries of a few top AI engineers for a few months, or a handful of racks in a high-performance computing cluster. Yet with this sum, DeepSeek claims to have completed the core training of a world-class AI model. How on earth was this achieved?

The Nature paper explains that the DeepSeek R1 model was trained on 512 Nvidia H800 chips. A crucial piece of background: the H800 is a chip Nvidia supplies specifically to the Chinese market following the export controls the United States imposed in October 2022. Its performance, especially the high-speed interconnect between chips, is deliberately reduced compared with the top-tier H100 and A100 chips sold on the global market.

It is like entering a race with a restricted engine. DeepSeek did not give up; instead, through extreme software optimization, innovative algorithms and clever scheduling of computing resources, it squeezed astonishing efficiency out of this “constrained engine”. This illustrates a profound truth: the limits of hardware can be overcome by the ingenuity of software.

DeepSeek’s hardware story has not been smooth sailing. In June, US officials claimed that DeepSeek had obtained “a large quantity” of top-of-the-line H100 chips after export controls took effect. Nvidia denied this, insisting that DeepSeek legally uses the H800.

In the supplementary materials of the Nature paper, DeepSeek acknowledged for the first time that it does own more powerful A100 chips, stating that they were used in the “preparation stage” of the project, such as experiments and validation on smaller models. The core R1 model was then moved to the cluster of 512 H800 chips, where it underwent a total of 80 hours of high-intensity training.
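A quick sanity check of these figures (a sketch in Python using only the numbers reported above; the per-GPU-hour rate is an implication of those numbers, not a figure stated in the paper):

```python
# Back-of-the-envelope check of the reported R1 training cost.
# All inputs come from the figures cited above; the derived rate
# is an implication, not a number stated in the Nature paper.

num_gpus = 512          # Nvidia H800 chips in the training cluster
train_hours = 80        # reported length of the main R1 training run
total_cost_usd = 294_000

gpu_hours = num_gpus * train_hours            # 40,960 GPU-hours
implied_rate = total_cost_usd / gpu_hours     # about $7.18 per GPU-hour

print(f"{gpu_hours:,} GPU-hours -> ${implied_rate:.2f} per GPU-hour")
```

Note that this covers only the final R1 training run; the A100 “preparation stage” experiments described above are not included in the $294,000 figure.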

This detail also partly explains how DeepSeek was able to attract China’s top AI talent: an A100 supercomputing cluster was an extremely rare and valuable resource in China at the time.

Another controversy surrounding DeepSeek concerns the “distillation” technique it uses. In simple terms, distillation trains a smaller “student” AI model on the outputs of a more powerful “teacher” AI model. The advantage is that the student can inherit much of the teacher’s “wisdom” and capability at a fraction of the cost, without repeating the teacher’s expensive from-scratch training.
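To make the mechanism concrete, here is a minimal sketch of classic knowledge distillation in PyTorch. It illustrates the general technique described above, not DeepSeek’s actual training code; the toy model sizes, the temperature and the loss weighting are arbitrary choices for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a larger "teacher" and a smaller "student" classifier.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0      # temperature: softens the teacher's output distribution
alpha = 0.7  # weight of the distillation term vs. the hard-label term

def distill_step(x, hard_labels):
    with torch.no_grad():          # the teacher is frozen; it only "teaches"
        teacher_logits = teacher(x)
    student_logits = student(x)

    # The student mimics the teacher's softened output distribution...
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # ...while still learning from ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One toy update on random data.
x = torch.randn(64, 128)
y = torch.randint(0, 10, (64,))
print(distill_step(x, y))
```

With large language models the same idea is often applied more loosely: the “student” is simply fine-tuned on text generated by the “teacher”, which is why one model’s outputs circulating on the web can, as DeepSeek argues below, end up in another model’s training data.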

American officials, advisers and some experts in the AI field have accused DeepSeek of “deliberately distilling” OpenAI’s models. DeepSeek has consistently maintained that distillation is a legal and efficient technique that can sharply reduce the cost of training and running AI, thereby letting more people enjoy the benefits of the technology, a point of real practical significance given the enormous energy consumption of AI models.

In the Nature paper, DeepSeek offered a more nuanced explanation. It acknowledged that the training data for its V3 model came from crawled public web pages, which contained “a large number of responses generated by OpenAI models”, so its model may have “indirectly acquired knowledge from other powerful models”. DeepSeek stresses, however, that this was not intentional but incidental to scraping vast amounts of Internet data, a kind of “knowledge seepage” that cannot be completely avoided.

This explanation turns a sharp intellectual-property question into a complex discussion about the Internet’s information ecosystem: in a world where AI-generated content (AIGC) is increasingly pervasive, how do we draw the boundary between “original data” and “AI-contaminated data”?

[Figure: the multi-step training pipeline of DeepSeek-R1]

The Power of Openness – “This is not just a model; it’s a movement.”

The rise of DeepSeek is not merely a breakthrough in technology and cost; the deeper reason lies in its aggressive open-source strategy. In an era when most top AI models are guarded by tech giants like “family heirlooms”, DeepSeek has chosen a completely different path.

While closed-source models such as OpenAI’s GPT series and Anthropic’s Claude series charge enterprises high fees for API access, DeepSeek has released its most advanced models (such as V3.1) as open weights on platforms like Hugging Face, allowing developers worldwide to download, study, modify and use them for free.
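In practice, “open weights on Hugging Face” means anyone can pull a model with a few lines of standard tooling. Below is a minimal sketch using the transformers library; the repository ID follows DeepSeek’s naming pattern on Hugging Face but should be verified against the actual model card, and a model of this size requires a multi-GPU server rather than a laptop.

```python
# Minimal sketch: loading an open-weights model from Hugging Face
# with the `transformers` library. The repo ID is an assumption
# based on DeepSeek's naming pattern; check huggingface.co/deepseek-ai.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3.1"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # shard the weights across available GPUs
    trust_remote_code=True,  # DeepSeek repos may ship custom model code
)

prompt = "Explain knowledge distillation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This is exactly the freedom closed APIs do not offer: the weights sit on your own hardware, and you can fine-tune, quantize or audit them without anyone’s permission.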

This move directly challenges the core business logic behind America’s AI leadership. One open-source advocate put it this way: “This is not just a model; it’s a movement.” The movement’s core belief is that openness, collaboration and transparency are the best way to make AI technology develop faster and better.

The release of DeepSeek V3.1 was carefully timed, landing shortly after OpenAI’s GPT-5 and Anthropic’s Claude Opus 4.1. In several key benchmarks its performance is on par with these top closed-source models, and in some respects it even surpasses them. When some users were disappointed by GPT-5, a powerful and completely free open-source alternative, DeepSeek V3.1, naturally caught the attention of developers worldwide.

This strategy has taken DeepSeek from an unknown Chinese startup to a standard-bearer of the global AI open-source community in under two years. Within a week of its release, the app of the same name topped the free-download chart in the United States, ample proof of its appeal.

For ordinary households, DeepSeek’s open-source models offer an unprecedented opportunity: no expensive API keys are needed to access and learn from the world’s most advanced AI, which greatly lowers the cost of learning and innovation and brings AI closer to genuine “inclusiveness”.

The “DeepSeek Moment” in Silicon Valley and the reshaping of the global AI landscape

The shockwaves brought by DeepSeek have profoundly influenced the strategic thinking of global tech giants.

In a public talk, Microsoft CEO Satya Nadella spoke warmly of DeepSeek. He was “super impressed” that a team of only about 200 people could build an AI product that topped the app store, and he made clear that DeepSeek has become a new “benchmark” against which Microsoft measures its own efficiency and success in AI.

Ironically, the US chip export controls intended to slow China’s AI development may, to some extent, have “forced” DeepSeek’s innovation. As some analysts have pointed out, when the most advanced chips are hard to obtain, Chinese engineers must pour more energy into algorithmic optimization and computational efficiency.

This “scarcity-driven innovation” has let companies like DeepSeek forge a technological path distinct from the American creed that “massive compute works miracles”. In the long run, the efficiency honed under resource constraints may prove a more resilient competitive advantage.

DeepSeek’s success paints a more exciting picture of global AI’s future. It suggests that the world of AI need not be monopolized by a few tech giants: teams from any country or background, as long as they have outstanding talent and fresh ideas, have a chance to take part in this great technological revolution.

The open-source model will give rise to a more prosperous and diverse AI ecosystem. Developers can build countless applications tailored to specific industries and needs on top of foundation models like DeepSeek’s, much as the Android ecosystem grew out of the open-source Linux kernel. This will greatly accelerate the adoption of AI in education, healthcare, entertainment and beyond, ultimately benefiting every family.

The story of DeepSeek is far from over. It is led by a low-key founder, Liang Wenfeng (also co-founder of the top Chinese quantitative hedge fund High-Flyer, 幻方), and its future holds both great possibility and great uncertainty. But it has already proved several things to the world:

Innovation is not judged by its origin: Disruptive ideas can be born anywhere, not just in the garages of Silicon Valley.

Efficiency is the new force: In the AI era, how to use resources more intelligently may be more important than how many resources one possesses.

Openness is the key to the future: sharing knowledge and collaborative innovation are the most effective ways for humanity to address common challenges and accelerate technological progress.

For each of us, whether tech professional, investor or educator, DeepSeek’s story offers a valuable lesson: in a rapidly changing world, keeping an open mind, watching the “game-breakers” who dare to challenge convention, and encouraging the next generation to learn and embrace technologies that empower more people may be the best strategy for winning the future.

[Infographic: isometric GPU illustration, “Chinese open-source dark horse DeepSeek rewrites the rules of the global technology race”]

Thoughts on this report

Question 1: What do you think of the credibility or relevance of this report?

I believe this report (based on the Nature paper and related facts) is highly credible and highly relevant.

Credibility

Authoritative core source: Nature is a top global scientific journal whose papers undergo strict peer review. The data, co-authored by the DeepSeek team (including founder Liang Wenfeng), carries the credibility of a peer-reviewed scientific publication.
Multi-party cross-verification: key details in the report, such as the chip models (H800/A100), the comments from Microsoft’s CEO and the reaction of the open-source community, can all be cross-checked against Reuters, technology media and public records, forming a coherent chain of evidence.
Logical consistency: however astonishing the cost figure, the low-cost, high-efficiency story is internally consistent. The “constrained” chips, the efficient “distillation” technique and the open-source strategy fit together, making the claim plausible rather than fanciful.

Relevance

It touches the industry’s core issues: AI training costs, dependence on computing power, US-China technological competition and the open-source-versus-closed-source debate are the most closely watched questions in technology today, and the DeepSeek case sits at the intersection of all of them.
It triggered chain reactions in markets and policy: DeepSeek’s emergence once set off a sell-off in technology stocks and forced US policymakers to re-examine the effectiveness of their export-control strategy, showing that its influence reaches beyond the tech world into capital markets and international relations.
It carries profound implications for the future: this is not just news about one company; it reveals a new model that could alter the trajectory of AI development, making the case highly valuable to anyone following the future of technology, innovation strategy and the global competitive landscape.

This report is a key entry point for understanding the current dynamics and future trends of the global AI competition.

Question 2: What’s your opinion on this topic?

My view is that this is a landmark “paradigm shift” event, brimming with exciting possibilities yet accompanied by complex real-world challenges.

Exciting aspects

The dawn of AI “democratization”: What excites me most is the potential for “AI democratization” in DeepSeek’s open-source strategy. It breaks the grip of a few giants on top-tier AI and dramatically lowers the cost for small and medium-sized enterprises, research institutions and even individual developers worldwide to join cutting-edge innovation. Like the invention of the printing press, which ended knowledge’s status as the preserve of a few, it is bound to unleash an incalculable wave of innovation.

A powerful rebuttal of the “compute is everything” theory: For a long time the AI field has seemed trapped in the myth of a “computing arms race”, the belief that only unlimited capital and the most advanced chips can drive progress. DeepSeek’s success shows that algorithmic ingenuity, engineering excellence and strategic wisdom are equally, and perhaps more sustainably, powerful engines of innovation. That is hopeful news for participants with limited resources but abundant intellectual capital.

A healthier global competitive landscape: A strong competitor, wherever it comes from, keeps the leaders vigilant. DeepSeek’s existence puts positive pressure on companies like OpenAI and Google to innovate faster, cut costs and improve services; ultimately the beneficiaries are users around the world.

Complex real-world challenges

The gray area of intellectual property: The controversy over “distillation”, and over AI-generated content “accidentally” entering training data, exposes how far existing intellectual-property law lags behind the AI era. Defining where an AI model’s “learning” ends and “plagiarism” begins, and protecting original creators without stifling innovation, will be an enormous challenge for legal systems worldwide.
The geopolitical game: DeepSeek’s success will inevitably be read through the grand narrative of US-China technological competition. That may subject it to extra scrutiny and distrust as it expands into Western markets, and its technical and business decisions may likewise be shaped by geopolitics.

The “double-edged sword” of open source: Open source brings inclusiveness and innovation, but also security risks. Ensuring that powerful open-source AI models are not put to malicious use requires a global framework for responsible AI governance, which is undoubtedly extremely hard to build.

DeepSeek is like a “wall-breaker”, shattering many preconceived notions about how AI must develop. The efficient, open and pragmatic spirit of innovation it embodies is among the most precious assets of this era. We should view it with an open, learning mindset, appreciating its technical achievements and contributions to the open-source community while also facing squarely the complex problems it has raised.

The most important lesson for families and the next generation: never be intimidated by superficial “barriers”, whether technological, financial or geographical. True breakthroughs often come from creative thinking that finds freedom within limitations and opportunity within challenges.

Watch the accompanying analysis of DeepSeek’s new AI model. The video examines the DeepSeek V3.1 model in detail, comparing its performance and cost-effectiveness with competitors such as GPT-5, echoing the core theme of this article.

Background explainers

What is “training cost”?
Training a large language model essentially means having racks of powerful computing chips repeatedly “practice” on enormous amounts of text and code until the model learns to write, answer questions and reason. The process runs continuously for weeks to months, and the chip time and electricity it consumes are the main sources of training cost.

What is a GPU (such as A100, H100, H800)?
A GPU (Graphics Processing Unit) is the “engine” for training models. Different models (A100, H100, H800) differ in computing power and energy efficiency. Because the United States controls the export of certain high-performance chips to China, manufacturers have introduced market-specific variants (such as the H800) and alternative solutions. A chip model is not simply “the newer the better”; the right choice depends on the task, the cost and availability.

What is “distillation”?
Distillation is a way of standing on a giant’s shoulders: a large, expensive “teacher” model teaches a smaller, cheaper “student” model, letting the latter perform better at a lower cost. The key questions are whether the teacher model’s usage rights, sources and training data are legal and transparent.
