DeepSeek: Democratizing AI with Lower Costs

DeepSeek Day Two: A Shift Towards Enterprise AI Adoption

DeepSeek, a rising Chinese AI startup, is making waves with foundation models that cost far less to train and run than those of its U.S. rivals. The move could transform AI adoption for businesses by tackling one of the most significant barriers: cost.

The High Cost of AI Adoption

According to analysts Brad Sills and Carly Liu of BofA Global Research, the expense of AI applications is the primary obstacle to their widespread implementation. Their report, released on Tuesday, January 28th, argues that breakthroughs like DeepSeek’s could push prices lower still and, in turn, lift adoption rates.

DeepSeek’s announcement on Monday, January 27th, sent shockwaves through the AI industry, knocking down the shares of several AI companies. The company revealed that it trained a foundation model for a mere $5.58 million using 2,048 Nvidia H800 chips. That figure stands in stark contrast to the estimated training costs at OpenAI and Anthropic, which range from $100 million to $1 billion and involve thousands of Nvidia’s AI chips.

Roy Benesh, CTO at eSIMple, emphasized the transformative potential of DeepSeek’s achievement, stating that it empowers smaller companies, individual developers, and even researchers to leverage the power of AI without incurring exorbitant costs. This increased accessibility can foster the development of innovative ideas and technologies, leading to greater competitiveness in the field. As a result, customers can benefit from new options, while established AI companies are likely to lower their prices and accelerate technological advancements.

The BofA analysts provided examples of the costs associated with existing AI applications. Microsoft’s 365 Copilot Chat charges between 1 cent and 30 cents per prompt, depending on the complexity of the request. Salesforce’s Agentforce for Service Cloud charges a flat rate of $2 per conversation.
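
To put those per-use prices in perspective, here is a back-of-envelope comparison of monthly spend. The per-unit prices are the ones cited above; the usage volumes are illustrative assumptions, not vendor figures.

```python
# Back-of-envelope comparison of the per-use AI pricing cited above.
# Per-unit prices come from the BofA examples (Copilot Chat: $0.01-$0.30 per
# prompt; Agentforce: $2 per conversation); the monthly volumes below are
# illustrative assumptions, not vendor figures.

COPILOT_PRICE_RANGE = (0.01, 0.30)   # USD per prompt, depending on complexity
AGENTFORCE_PRICE = 2.00              # USD per conversation (flat)

def monthly_copilot_cost(prompts_per_month: int) -> tuple[float, float]:
    """Return the (low, high) monthly spend for a given prompt volume."""
    low, high = COPILOT_PRICE_RANGE
    return prompts_per_month * low, prompts_per_month * high

def monthly_agentforce_cost(conversations_per_month: int) -> float:
    return conversations_per_month * AGENTFORCE_PRICE

if __name__ == "__main__":
    # Hypothetical mid-sized team: 50,000 prompts and 10,000 conversations a month.
    lo, hi = monthly_copilot_cost(50_000)
    print(f"Copilot Chat: ${lo:,.0f} - ${hi:,.0f} per month")
    print(f"Agentforce:   ${monthly_agentforce_cost(10_000):,.0f} per month")
```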

While BofA acknowledged that DeepSeek’s $5.58 million figure is somewhat misleading because it excludes spending on research, experiments, architectures, algorithms, and data, the analysts emphasized the significance of the startup’s innovations in demonstrating that less costly training methods are feasible.

Pre-Training vs. Inferencing: Understanding the Costs

Foundation AI models, such as OpenAI’s GPT-4o and Google’s Gemini, undergo a process called pre-training, where they are exposed to vast amounts of data, such as the entire internet, to develop general knowledge. However, to make these models more relevant and useful for specific companies and industries, enterprises need to further train or fine-tune them using their own data.

Once the AI model has been fine-tuned, it can process user prompts and generate relevant responses. Each of those prompts, however, incurs inferencing costs: the fees charged every time the model is asked to process new data and produce an output.
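
In practice, inferencing is simply a call to a hosted model endpoint, billed per request or per token. A minimal sketch follows, using the openai Python client; the base URL, model name, and environment variable are illustrative assumptions, so check your provider’s documentation for the real values.

```python
# A minimal sketch of the inferencing step: a prompt is sent to a hosted model
# and the provider bills for the tokens processed. Uses the `openai` Python
# client; the base URL, model name, and environment variable are illustrative
# assumptions -- consult your provider's documentation for the real values.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PROVIDER_API_KEY"],          # assumed env var
    base_url="https://api.example-provider.com/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="example-chat-model",  # assumed model name
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "Summarize our refund policy in two sentences."},
    ],
)

print(response.choices[0].message.content)
# Token counts drive the inferencing bill; most providers report them per call.
print(response.usage)
```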

It’s important to note that most companies do not bear the cost of training foundation models. This responsibility lies with the developers of these models, including OpenAI, Google, Meta, Amazon, Microsoft, Anthropic, Cohere, Hugging Face, Mistral AI, Stability AI, xAI, IBM, Nvidia, certain research labs, and Chinese tech giants like Baidu and Alibaba.

Businesses primarily incur inferencing costs for processing AI workloads, which constitute the majority of AI-related expenses.

The China Connection: DeepSeek’s Inferencing Costs and Privacy Concerns

DeepSeek offers its own inferencing services at significantly lower costs compared to Silicon Valley companies. However, there are certain considerations to keep in mind when using these services.

According to DeepSeek’s privacy policy, user information is stored on servers located in China. The company also states that it will comply with legal obligations and perform tasks in the public interest or to protect the vital interests of its users and other people.

China’s national intelligence law, specifically article 7, mandates that all organizations and citizens support, assist, and cooperate with national intelligence efforts in accordance with the law and protect national intelligence work secrets they are aware of.

Kevin Surace, CEO of Appvance, raised concerns about privacy, stating that data collection from users is a common practice in China. He advised users to exercise caution.

In an experiment conducted by PYMNTS, DeepSeek’s chatbot was asked to explain how the 1989 Tiananmen Square protests have influenced Chinese politics. The chatbot responded, ‘Sorry, I’m not sure how to approach this type of question yet.’

Tim Enneking, CEO at Presearch, pointed out that DeepSeek is a 100% Chinese-owned company located in China. He noted that the chatbot’s refusal to provide information on Tiananmen Square or senior Chinese government figures suggests limits to the technology’s objectivity. While Enneking acknowledged the exciting potential of the technology, he expressed concerns about who controls it.

However, Enneking also highlighted the open-source nature of DeepSeek’s models, which allows for revisions to remove government and corporate controls. He believes that the company’s engineering creativity creates opportunities for smaller companies and countries to participate and succeed in the generative AI landscape.

DeepSeek’s Potential to Lower Inference Costs for All

DeepSeek’s innovative approach to training foundation models at a lower cost has positive implications for companies like Microsoft, which can continue to reduce the cost of AI computing and drive scale. According to Sills and Liu, lower computing costs can lead to improved margins on AI-enabled offerings.

In a separate research note, BofA analysts Alkesh Shah, Andrew Moss, and Brad Sills suggested that lower AI compute costs could enable broader AI services across various sectors, from automobiles to smartphones.

While it’s unlikely that foundation model developers like OpenAI will immediately achieve training costs as low as DeepSeek’s, the analysts believe that DeepSeek’s innovative training and post-training techniques will be adopted by competing frontier-model developers to enhance efficiencies. However, they emphasize that current models will still require significant investment as they form the foundation for AI agents.

In the long term, the analysts anticipate accelerated enterprise adoption of AI as chatbots, copilots, and agents become both smarter and cheaper, with falling costs driving greater total usage, a dynamic known as the Jevons paradox.

Microsoft CEO Satya Nadella echoed this sentiment on X, stating that the Jevons paradox is at play as AI becomes more efficient and accessible. He believes that this will lead to a surge in AI usage, transforming it into a commodity that we cannot get enough of.

A Deeper Dive into Foundation Models and Their Impact

Foundation models, the backbone of modern AI, are revolutionizing how businesses operate and interact with technology. These models, trained on vast datasets, possess the ability to perform a wide array of tasks, from natural language processing to image recognition. The development and deployment of these models, however, involve a complex interplay of factors, including training costs, inferencing costs, data privacy, and ethical considerations.

Understanding Foundation Models

At their core, foundation models are large neural networks trained on massive datasets. This training process allows them to learn patterns and relationships within the data, enabling them to perform a variety of tasks with remarkable accuracy. Some examples of foundation models include:

  • GPT-4o: A multimodal model developed by OpenAI, capable of generating human-quality text, translating languages, and answering questions in a comprehensive manner.
  • Google’s Gemini: A multimodal AI model that can process and understand various types of data, including text, images, and audio.

These models are not limited to specific tasks but can be adapted to a wide range of applications, making them versatile tools for businesses.

The Role of Pre-Training and Fine-Tuning

The development of a foundation model typically involves two key stages: pre-training and fine-tuning.

  • Pre-training: In this stage, the model is trained on a massive dataset, such as the entire internet, to learn general knowledge and language skills. This process equips the model with the ability to understand and generate text, translate languages, and perform other basic tasks.
  • Fine-tuning: In this stage, the pre-trained model is further trained on a smaller, more specific dataset related to a particular task or industry. This process allows the model to adapt its knowledge and skills to the specific needs of the application.

For example, a pre-trained language model could be fine-tuned on a dataset of customer service interactions to create a chatbot that responds effectively to customer inquiries. Fine-tuning allows for specialization and optimization for niche applications, enhancing the model’s performance in targeted areas. This customization is crucial for businesses looking to leverage foundation models for specific use cases, ensuring that the AI solutions are tailored to their unique needs and data. Fine-tuning on carefully curated data can also help mitigate biases present in the pre-training data, though it does not remove them automatically.
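
As a concrete illustration, below is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries. The base model name and the JSONL file of customer service transcripts are placeholders; substitute a checkpoint and dataset you are licensed to use.

```python
# A minimal fine-tuning sketch using Hugging Face `transformers` and `datasets`.
# The base model and the JSONL file of customer-service transcripts are
# placeholders -- substitute a checkpoint and dataset you are licensed to use.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "gpt2"  # placeholder; any causal LM checkpoint works the same way
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# Each record is assumed to look like {"text": "Customer: ...\nAgent: ..."}.
dataset = load_dataset("json", data_files="support_transcripts.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="support-finetune",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized,
    # mlm=False gives the standard next-token (causal) training objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("support-finetune")
```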

The Cost of Training and Inferencing

The costs associated with foundation models can be divided into two main categories: training costs and inferencing costs.

  • Training costs: These costs involve the computational resources, data, and expertise required to train the foundation model. Training a large foundation model can be extremely expensive, often requiring millions of dollars in investment. The scale of the data and the complexity of the models necessitate significant computational power, often requiring specialized hardware like GPUs or TPUs. Moreover, the expertise of skilled AI researchers and engineers is essential for designing the model architecture, optimizing the training process, and ensuring the model’s performance meets the desired standards. Data acquisition and preparation also contribute significantly to the overall training costs.
  • Inferencing costs: These costs involve the computational resources required to use the trained model to make predictions or generate outputs. Inferencing costs can vary depending on the size and complexity of the model, the amount of data being processed, and the infrastructure being used. Larger models with more parameters typically require more computational power for inference, leading to higher costs. The volume of requests and the latency requirements also play a crucial role in determining the overall inferencing expenses. Optimizing the model for efficient inference, utilizing hardware acceleration, and leveraging cloud-based services can help reduce these costs.

DeepSeek’s innovation lies in its ability to significantly reduce the training costs associated with foundation models, making them more accessible to a wider range of businesses and organizations.
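
A rough decomposition shows how a headline figure like that breaks down into GPU-hours multiplied by an hourly rate. The $5.58 million cost and the 2,048-chip cluster come from the report described above; the $2-per-GPU-hour rental price is an assumption used only for illustration.

```python
# Rough decomposition of a headline training cost into GPU-hours x hourly rate.
# The $5.58M figure and the 2,048-chip cluster size come from the article; the
# $2-per-GPU-hour rental rate is an assumption used purely for illustration.

headline_cost_usd = 5_580_000
cluster_gpus = 2_048
assumed_rate_per_gpu_hour = 2.00  # USD, illustrative rental price for an H800

gpu_hours = headline_cost_usd / assumed_rate_per_gpu_hour
wall_clock_days = gpu_hours / cluster_gpus / 24

print(f"Implied GPU-hours:       {gpu_hours:,.0f}")
print(f"Implied wall-clock time: ~{wall_clock_days:.0f} days on {cluster_gpus} GPUs")
# Under these assumptions the headline cost implies roughly 2.8M GPU-hours, or
# close to two months of continuous training -- and it still excludes research,
# experiments, and data costs, as the BofA analysts note above.
```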

Addressing Privacy and Ethical Concerns

The use of foundation models raises important questions about data privacy and ethical considerations. Foundation models are trained on massive datasets, which may contain sensitive or personal information. It is crucial to ensure that these models are used in a responsible and ethical manner, respecting user privacy and avoiding bias. The potential for misuse of these models, such as generating fake news or creating deepfakes, necessitates robust safeguards and ethical guidelines. Transparency in model development and deployment is essential to build trust and ensure accountability.

Some strategies for addressing these concerns include:

  • Data anonymization: Removing or masking personal information from the training data to protect user privacy. Techniques like differential privacy and federated learning can be employed to train models without directly accessing sensitive data.
  • Bias detection and mitigation: Identifying and addressing biases in the training data to ensure that the model does not perpetuate harmful stereotypes or discriminatory practices. Tools and techniques for bias detection can help identify and quantify biases in the training data and the resulting models (a minimal audit sketch follows this list). Mitigation strategies can then be applied to reduce or eliminate these biases, ensuring fairness and equity.
  • Transparency and accountability: Providing clear information about how the model works and how it is being used, and establishing mechanisms for accountability in case of errors or unintended consequences. Model cards and documentation can provide transparency into the model’s capabilities, limitations, and intended use cases. Establishing clear lines of responsibility and accountability is crucial for addressing any unintended consequences or ethical concerns.
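
The audit sketch referenced above: one simple bias check is the demographic parity gap, the difference in positive-outcome rates across groups in a log of model decisions. The column names and the CSV file are hypothetical.

```python
# A minimal sketch of one simple bias check: the demographic parity gap, i.e.
# how much the rate of positive model outcomes differs across groups. The
# column names and the CSV of model decisions are hypothetical.
import pandas as pd

def demographic_parity_gap(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Return max - min positive-outcome rate across groups (0.0 = parity)."""
    rates = df.groupby(group_col)[outcome_col].mean()
    print(rates)  # per-group positive-outcome rates, for inspection
    return float(rates.max() - rates.min())

if __name__ == "__main__":
    decisions = pd.read_csv("model_decisions.csv")  # hypothetical audit log
    gap = demographic_parity_gap(decisions, group_col="group", outcome_col="approved")
    print(f"Demographic parity gap: {gap:.3f}")
    # A large gap flags the model for closer review; mitigation (reweighting,
    # constrained training, post-processing) would follow from such an audit.
```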

As foundation models become more prevalent, it is essential to address these privacy and ethical concerns proactively to ensure that they are used for the benefit of society.

The Future of Foundation Models

Foundation models are rapidly evolving, and their potential impact on society is immense. In the future, we can expect to see:

  • More powerful and versatile models: As researchers continue to develop new architectures and training techniques, foundation models will become even more powerful and versatile, capable of performing a wider range of tasks with greater accuracy. Innovations in areas like transformers, attention mechanisms, and neural architecture search are driving significant improvements in model performance.
  • Increased accessibility: As training costs decrease and cloud-based AI platforms become more prevalent, foundation models will become more accessible to businesses of all sizes. The rise of cloud-based AI services and the development of more efficient training techniques are making foundation models more affordable and accessible to a wider range of organizations.
  • New applications and use cases: Foundation models will continue to be applied to new and innovative use cases across various industries, from healthcare to finance to education. Applications like personalized medicine, fraud detection, and adaptive learning are just a few examples of the transformative potential of foundation models.

The rise of foundation models represents a paradigm shift in the field of artificial intelligence. By understanding their capabilities, costs, and ethical considerations, we can harness their power to create a better future. The ongoing research and development in this field promise even more exciting advancements and transformative applications in the years to come. As foundation models become increasingly integrated into various aspects of our lives, it is crucial to ensure their responsible and ethical deployment.

DeepSeek’s Contribution to Democratizing AI

DeepSeek’s achievement in significantly reducing the cost of training foundation models marks a pivotal moment in the democratization of AI. By lowering the barrier to entry, DeepSeek is empowering a broader range of organizations and individuals to participate in the AI revolution.

The Impact on Smaller Businesses

Smaller businesses often lack the resources and expertise to develop and deploy their own AI models. DeepSeek’s cost-effective foundation models provide these businesses with access to cutting-edge AI technology that was previously out of reach. This can level the playing field, allowing smaller businesses to compete more effectively with larger, more established companies. The ability to leverage AI for tasks like customer service, marketing, and product development can significantly improve their efficiency and competitiveness.

For example, a small e-commerce business could use DeepSeek’s models to personalize product recommendations for its customers, improve its customer service, or automate its marketing campaigns. By analyzing customer data and preferences, the model can generate tailored product recommendations, enhancing the customer experience and driving sales. AI-powered chatbots can provide instant customer support, resolving queries and addressing concerns efficiently. Automated marketing campaigns can be optimized based on data-driven insights, improving targeting and maximizing ROI.
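
For the product-recommendation case, a minimal sketch of embedding-based ranking is shown below. The embed function is a stand-in for any embedding model, hosted or local; it is stubbed with random unit vectors here so the ranking logic runs on its own.

```python
# A minimal sketch of embedding-based product recommendations. `embed` is a
# placeholder for any embedding model (hosted API or local); it is stubbed out
# with random unit vectors so the ranking logic is runnable on its own.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: replace with a real embedding model. Returns unit vectors."""
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 64))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

products = [
    "waterproof hiking boots",
    "trail running shoes",
    "insulated camping mug",
    "ultralight backpacking tent",
]

product_vecs = embed(products)
query_vec = embed(["lightweight gear for a weekend hike"])[0]

# Cosine similarity reduces to a dot product because the vectors are unit-length.
scores = product_vecs @ query_vec
for idx in np.argsort(scores)[::-1][:3]:
    print(f"{scores[idx]:+.2f}  {products[idx]}")
```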

The Empowerment of Individual Developers

DeepSeek’s models also empower individual developers and researchers to explore new AI applications and innovations. With access to affordable foundation models, developers can experiment with different ideas, develop new AI-powered tools, and contribute to the advancement of AI technology. This fosters a culture of innovation and accelerates the development of novel solutions across various domains.

This can lead to a surge in innovation, as more people have the opportunity to participate in the development of AI. Individual developers can create niche applications tailored to specific needs, contributing to a more diverse and vibrant AI ecosystem. The accessibility of foundation models allows them to experiment with different architectures, training techniques, and deployment strategies, pushing the boundaries of AI capabilities.

The Potential for Open-Source Collaboration

DeepSeek’s open-source approach further promotes collaboration and innovation in the AI community. By making its models available to the public, DeepSeek encourages developers to contribute to their improvement, identify and fix bugs, and develop new features. This collaborative approach accelerates the pace of innovation and ensures the models are robust and reliable.

Open-source development fosters transparency and allows for community-driven improvements, resulting in more secure and trustworthy AI systems, and it helps ensure the technology is used for the benefit of all. The collective intelligence of the open-source community can identify and address potential biases and ethical concerns, supporting the responsible development and deployment of AI.

The Acceleration of AI Adoption

By lowering the cost of AI, DeepSeek is accelerating the adoption of AI across various industries. As AI becomes more affordable and accessible, more businesses will be able to integrate it into their operations, leading to increased productivity, efficiency, and innovation. This widespread adoption of AI has the potential to transform industries and drive economic growth.

This can have a profound impact on the global economy, driving growth and creating new opportunities. AI-powered automation can streamline processes, reduce costs, and improve efficiency across various sectors, from manufacturing to healthcare to finance. The ability to analyze large datasets and extract valuable insights can lead to better decision-making and improved outcomes. The adoption of AI can also create new jobs and opportunities in areas like AI development, deployment, and maintenance.

A More Inclusive AI Ecosystem

DeepSeek’s efforts to democratize AI are contributing to a more inclusive AI ecosystem, where more people have the opportunity to participate in the development and use of AI. This can help to ensure that AI is used in a way that benefits all members of society, rather than just a select few. By promoting diversity and inclusivity, we can ensure that AI reflects the values and perspectives of a wider range of stakeholders.

By empowering smaller businesses, individual developers, and researchers, DeepSeek is fostering a more diverse and innovative AI landscape. This inclusivity can lead to the development of AI solutions that address the needs of underserved communities and promote social good. By ensuring that AI is developed and deployed in a responsible and ethical manner, we can create a more equitable and just society for all. The long-term success of AI depends on its ability to benefit all members of society, and DeepSeek’s efforts to democratize AI are a significant step in that direction.