Ant’s Innovative Approach to AI Model Training
Ant Group, the fintech affiliate of Alibaba backed by Jack Ma, has reached a notable milestone in artificial intelligence: it has developed and deployed techniques for training AI models on domestically produced Chinese semiconductors, cutting training costs by roughly 20%. According to people familiar with the matter, the company used chips from several Chinese suppliers, including its affiliate Alibaba Group Holding Ltd. and Huawei Technologies Co., to train models with the Mixture of Experts (MoE) machine-learning approach, a technique gaining traction in the AI community.
The performance Ant Group achieved with these Chinese-made chips was reportedly comparable to results from Nvidia Corp.'s high-end chips such as the H800, a powerful processor designed for AI workloads whose export to China is currently restricted by the U.S. government. While Ant continues to use Nvidia hardware for some of its AI development, it is increasingly turning to alternatives for its latest models, including chips from Advanced Micro Devices Inc. (AMD) and, notably, the aforementioned Chinese-manufactured chips.
Entering the AI Race: China vs. U.S.
Ant Group's entry into AI model development places it at the center of an increasingly competitive contest between Chinese and U.S. technology companies. That rivalry has intensified since DeepSeek, a Chinese AI startup, showed that highly capable AI models could be trained for far less than the sums spent by industry leaders such as OpenAI and Alphabet Inc.'s Google, which have invested billions of dollars in AI research and development. Ant's achievement underscores the growing determination of Chinese companies to use locally sourced alternatives to the most advanced Nvidia semiconductors, which are subject to export controls.
The Promise of Cost-Effective AI Inferencing
The research paper Ant Group published earlier this month highlights the potential of its newly developed models, claiming they outperform models from Meta Platforms Inc. on certain benchmark tests. These claims have not been independently verified by outlets such as Bloomberg News. If the models perform as claimed, however, they could mark a substantial step forward for Chinese artificial intelligence, chiefly because of their potential to dramatically reduce the cost of AI inferencing, the crucial process of deploying and running trained models to power services and applications.
Mixture of Experts: A Game-Changer in AI
As companies continue to invest heavily in artificial intelligence, MoE models have emerged as a popular and efficient approach to training and deploying AI systems. The technique, also used by Google and the Hangzhou-based startup DeepSeek, divides a model into specialized sub-networks, or "experts," and activates only the few experts relevant to each input rather than the entire model. The approach can be likened to a team of specialists, each focusing on a specific segment of a larger task. By distributing the workload in this manner, training and inference can be made markedly more efficient.
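The routing idea behind MoE can be sketched in a few lines of plain Python. This is a toy illustration only, not Ant's implementation: the experts here are trivial arithmetic functions, the router's scores are a dummy stand-in for a learned softmax gate, and the names (`route`, `moe_forward`) are invented for this example.

```python
# Toy sketch of a Mixture of Experts (MoE) layer. Real MoE layers use
# learned neural routers and expert networks; everything here is a
# simplified, hypothetical stand-in to show the top-k routing pattern.

def route(x, num_experts, k):
    """Score each expert for input x and return the top-k expert ids
    with normalized weights (a stand-in for a learned softmax router)."""
    scores = [(x * (i + 1)) % 7 for i in range(num_experts)]  # dummy scores
    top = sorted(range(num_experts), key=lambda i: -scores[i])[:k]
    total = sum(scores[i] for i in top) or 1
    return [(i, scores[i] / total) for i in top]

# Each "expert" specializes in part of the problem; here they are
# trivial functions that just add their own index.
EXPERTS = [lambda x, i=i: x + i for i in range(4)]

def moe_forward(x, k=2):
    """Run only the k routed experts and blend their weighted outputs.
    The efficiency win: 2 of 4 experts execute per input, not all 4."""
    return sum(w * EXPERTS[i](x) for i, w in route(x, len(EXPERTS), k))
```

The key property is sparsity: for every input, compute is spent on a small subset of the model's parameters, which is why MoE training can run on less capable hardware than a dense model of the same total size.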
Overcoming the GPU Bottleneck
Traditionally, the training of MoE models has been heavily reliant on high-performance computing hardware, particularly graphics processing units (GPUs) manufactured by Nvidia. The significant cost associated with these specialized chips has historically been a major barrier to entry for many smaller firms, effectively limiting the widespread adoption and development of MoE models. However, Ant Group has been actively working on developing methods to train large language models (LLMs) more efficiently, thereby circumventing this constraint. The title of their research paper, which explicitly states the goal of scaling a model “without premium GPUs,” clearly reflects this objective and their commitment to democratizing access to advanced AI technologies.
Challenging Nvidia’s Dominance
Ant Group’s approach directly challenges the prevailing strategy advocated by Nvidia’s CEO, Jensen Huang. Huang has consistently maintained that computational demand will continue to increase, even with the emergence of more efficient models like DeepSeek’s R1. He believes that companies will ultimately require more powerful and sophisticated chips to generate higher revenue, rather than cheaper alternatives to reduce costs. Consequently, Nvidia has remained focused on developing and manufacturing large GPUs with enhanced processing capabilities, including more processing cores, transistors, and increased memory capacity.
Quantifying the Cost Savings
Ant Group has provided concrete figures to demonstrate the cost-effectiveness of its optimized approach to AI model training. Tokens are the fundamental units of information an AI model processes in order to learn about the world and generate relevant responses to user queries. The company stated that training a model on 1 trillion tokens using high-performance hardware, such as Nvidia's top-tier GPUs, would cost approximately 6.35 million yuan ($880,000); by using lower-specification hardware combined with its proprietary optimization techniques, Ant can reduce that cost to 5.1 million yuan, a saving of roughly 20%. A reduction of that scale could open new avenues for AI development and deployment.
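The reported figures can be checked against the headline saving with simple arithmetic, as in this minimal snippet (the variable names are illustrative, not from Ant's paper):

```python
# Sanity check on Ant's reported training costs for 1 trillion tokens:
# the gap between the two figures should match the ~20% saving cited.

high_end_cost_yuan = 6_350_000   # reported cost on top-tier Nvidia GPUs
optimized_cost_yuan = 5_100_000  # reported cost on lower-spec hardware

saving = high_end_cost_yuan - optimized_cost_yuan      # 1,250,000 yuan
saving_pct = 100 * saving / high_end_cost_yuan         # ~19.7%

print(f"Saving: {saving:,} yuan ({saving_pct:.1f}%)")
```

The computed figure of about 19.7% is consistent with the roughly 20% cost reduction reported earlier in the article.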
Leveraging AI Breakthroughs for Industrial Solutions
Ant Group plans to capitalize on its recent advancements in large language models, specifically its Ling-Plus and Ling-Lite models, to develop and deploy industrial AI solutions tailored for specific sectors. These sectors include healthcare and finance, where AI has the potential to significantly improve efficiency, accuracy, and decision-making. The Ling models are designed to address the unique needs and challenges of these industries, providing customized solutions that leverage the power of AI.
Expanding AI Applications in Healthcare
Ant Group’s commitment to the healthcare sector is evident in its integration of the Chinese online platform Haodf.com into its artificial intelligence services. Through the creation of AI Doctor Assistant, Ant Group aims to support Haodf’s extensive network of 290,000 doctors by assisting with various tasks, including medical record management and administrative processes. This application of AI has the potential to significantly improve efficiency and accuracy in healthcare delivery, ultimately benefiting both doctors and patients.
AI-Powered Assistance for Everyday Life
Beyond healthcare, Ant Group has also developed an AI-powered “life assistant” application called Zhixiaobao and a financial advisory AI service named Maxiaocai. These applications demonstrate Ant Group’s broader ambition to integrate artificial intelligence into various aspects of daily life, providing users with personalized and intelligent assistance in a wide range of tasks and activities.
Benchmarking Performance: Ling Models vs. Competitors
In its research paper, Ant Group claims that the Ling-Lite model outperformed one of Meta’s Llama models in a key benchmark test for English-language understanding. Furthermore, both the Ling-Lite and Ling-Plus models demonstrated superior performance compared to DeepSeek’s equivalent models on Chinese-language benchmarks. These results highlight Ant Group’s competitive position in the rapidly evolving AI landscape and its ability to develop models that can compete with those from leading international companies.
As Robin Yu, chief technology officer of Beijing-based AI solution provider Shengshang Tech Co., aptly stated, “If you find one point of attack to beat the world’s best kung fu master, you can still say you beat them, which is why real-world application is important.” This emphasizes the significance of practical applications and the ability to demonstrate tangible results in real-world scenarios.
Open-Sourcing for Collaboration and Innovation
Ant Group has made the Ling models open source, a strategic decision intended to foster collaboration and innovation within the broader AI community. By making the models publicly available, Ant encourages other researchers and developers to build on its work, contribute improvements, and explore new applications. Ling-Lite comprises 16.8 billion parameters, the adjustable settings that control a model's behavior and performance. Ling-Plus is significantly larger at 290 billion parameters, placing it among the bigger language models currently available. For context, experts estimate that OpenAI's GPT-4.5 has approximately 1.8 trillion parameters, while DeepSeek-R1 has 671 billion.
Addressing Challenges in Model Training
Ant Group’s journey in developing these advanced AI models has not been without its challenges. The company encountered difficulties in certain areas of training, particularly concerning the stability of the models. Even minor alterations in the hardware configuration or the model’s internal structure could lead to significant issues, including fluctuations in the models’ error rate and overall performance. This underscores the complexity and sensitivity involved in training large and sophisticated AI models, and the need for meticulous attention to detail.
Real-World Deployment in Healthcare
Ant Group’s commitment to practical applications is further demonstrated by its deployment of healthcare-focused large model machines. These machines are currently being utilized by seven hospitals and healthcare providers in major cities across China, including Beijing and Shanghai. The large model leverages a combination of technologies, including DeepSeek R1, Alibaba’s Qwen, and Ant Group’s own LLM, to provide a range of medical consultancy services. These services aim to assist doctors in diagnosis, treatment planning, and patient management.
AI Agents for Enhanced Healthcare Services
In addition to the large model machines, Ant Group has introduced two specialized medical AI agents: Angel and Yibaoer. Angel has already been deployed in over 1,000 medical facilities, providing support for various tasks and workflows. Yibaoer focuses on medical insurance services, streamlining processes and improving efficiency for both patients and providers. Furthermore, in September of the previous year, Ant Group launched the AI Healthcare Manager service within its widely used Alipay payments application, further expanding its reach in the healthcare sector.

These initiatives demonstrate Ant Group's dedication to using artificial intelligence to transform healthcare delivery across China, making it more accessible, efficient, and effective. The combination of large language models, specialized AI agents, and integration with existing platforms like Alipay positions Ant Group as a significant player in the rapidly evolving field of AI-powered healthcare.